Most Cited 2024 "gated memory unit" Papers
Posterior Distillation Sampling
Juil Koo, Chanho Park, Minhyuk Sung
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
Zhipeng Du, Miaojing Shi, Jiankang Deng
AffineQuant: Affine Transformation Quantization for Large Language Models
Yuexiao Ma, Huixia Li, Xiawu Zheng et al.
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
Haonan Wang, Qixiang Zhang, Yi Li et al.
BAMM: Bidirectional Autoregressive Motion Model
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang et al.
Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations
Xiaogang Jia, Denis Blessing, Xinkai Jiang et al.
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
Yihan Wang, Si Si, Daliang Li et al.
Learning the 3D Fauna of the Web
Zizhang Li, Dor Litvak, Ruining Li et al.
Curriculum reinforcement learning for quantum architecture search under hardware errors
Yash J. Patel, Akash Kundu, Mateusz Ostaszewski et al.
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou et al.
Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
Kiran Chhatre, Radek Danecek, Nikos Athanasiou et al.
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
Yiwen Ye, Yutong Xie, Jianpeng Zhang et al.
Exploiting Diffusion Prior for Generalizable Dense Prediction
Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee et al.
A Watermark-Conditioned Diffusion Model for IP Protection
Rui Min, Sen Li, Hongyang Chen et al.
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
Lin Li, Haoyan Guan, Jianing Qiu et al.
SemCity: Semantic Scene Generation with Triplane Diffusion
Jumin Lee, Sebin Lee, Changho Jo et al.
HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios
HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp et al.
Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach
Guoqiang Liang, Kanghao Chen, Hangyu Li et al.
Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
Linlan Huang, Xusheng Cao, Haori Lu et al.
Vision-and-Language Navigation via Causal Learning
Liuyi Wang, Zongtao He, Ronghao Dang et al.
PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine
Chenrui Zhang, Lin Liu, Chuyuan Wang et al.
Benchmarking and Improving Generator-Validator Consistency of Language Models
Xiang Li, Vaishnavi Shrivastava, Siyan Li et al.
Generative Proxemics: A Prior for 3D Social Interaction from Images
Lea Müller, Vickie Ye, Georgios Pavlakos et al.
FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization
Shuai Tan, Bin Ji, Ye Pan
DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation
Yifei Li, Hsiaoyu Chen, Egor Larionov et al.
Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios
Jie Xu, Yazhou Ren, Xiaolong Wang et al.
T-MARS: Improving Visual Representations by Circumventing Text Feature Learning
Pratyush Maini, Sachin Goyal, Zachary Lipton et al.
MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding
Lirong Wu, Yijun Tian, Yufei Huang et al.
TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling
Shimin Zhang, Qu Yang, Chenxiang Ma et al.
A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
Kai Katsumata, Duc Minh Vo, Hideki Nakayama
On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy
Letian Huang, Jiayang Bai, Jie Guo et al.
Few-Shot Detection of Machine-Generated Text using Style Representations
Rafael Rivera Soto, Kailin Koch, Aleem Khan et al.
DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
Zhengxiang Shi, Aldo Lipani
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin, Yupeng Zheng, Pengfei Li et al.
Texture-GS: Disentangle the Geometry and Texture for 3D Gaussian Splatting Editing
Tian-Xing Xu, Wenbo Hu, Yu-Kun Lai et al.
TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
Matic Fučka, Vitjan Zavrtanik, Danijel Skočaj
UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
David Rozenberszki, Or Litany, Angela Dai
WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series
Irina Rish, Kartik Ahuja, Mohammad Javad Darvishi Bayazi et al.
ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection
Yichen Bai, Zongbo Han, Bing Cao et al.
A Diffusion-Based Framework for Multi-Class Anomaly Detection
Haoyang He, Jiangning Zhang, Hongxu Chen et al.
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Qiao Gu, Zhaoyang Lv, Duncan Frost et al.
Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
Jiangming Shi, Xiangbo Yin, Yeyun Chen et al.
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee et al.
Prompt Learning via Meta-Regularization
Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu, Jun Li, Hongtao Xie et al.
ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
Hallee E. Wong, Marianne Rakic, John Guttag et al.
Scene Adaptive Sparse Transformer for Event-based Object Detection
Yansong Peng, Li Hebei, Yueyi Zhang et al.
A Vision Check-up for Language Models
Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad et al.
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
Donghyun Kim, Byeongho Heo, Dongyoon Han
Attribute-Missing Graph Clustering Network
Wenxuan Tu, Renxiang Guan, Sihang Zhou et al.
Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer with Adaptive Channel Expansion
Shenghong Luo, Xuhang Chen, Weiwen Chen et al.
NOPE: Novel Object Pose Estimation from a Single Image
Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin et al.
Does CLIP’s generalization performance mainly stem from high train-test similarity?
Prasanna Mayilvahanan, Thaddäus Wiedemer, Evgenia Rusak et al.
Stream Query Denoising for Vectorized HD-Map Construction
Shuo Wang, Fan Jia, Weixin Mao et al.
GalLop: Learning global and local prompts for vision-language models
Marc Lafon, Elias Ramzi, Clément Rambour et al.
Multi-Architecture Multi-Expert Diffusion Models
Yunsung Lee, Jin-Young Kim, Hyojun Go et al.
STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction
Yu-Hsuan Wu, Jerry Hu, Weijian Li et al.
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Yibo Wang, Ruiyuan Gao, Kai Chen et al.
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
Harshit Sikchi, Qinqing Zheng, Amy Zhang et al.
PAD: Patch-Agnostic Defense against Adversarial Patch Attacks
Lihua Jing, Rui Wang, Wenqi Ren et al.
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
Chao Xu, Ang Li, Linghao Chen et al.
No Prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation
Nimesh Agrawal, Anuj Sirohi, Sandeep Kumar et al.
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.
Provable Offline Preference-Based Reinforcement Learning
Wenhao Zhan, Masatoshi Uehara, Nathan Kallus et al.
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
Qingping Sun, Yanjun Wang, Ailing Zeng et al.
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Jiangpeng He
GLACE: Global Local Accelerated Coordinate Encoding
Fangjinhua Wang, Xudong Jiang, Silvano Galliani et al.
ZeroShape: Regression-based Zero-shot Shape Reconstruction
Zixuan Huang, Stefan Stojanov, Anh Thai et al.
Controllable Mind Visual Diffusion Model
Bohan Zeng, Shanglin Li, Xuhui Liu et al.
PolyGCL: Graph Contrastive Learning via Learnable Spectral Polynomial Filters
Jingyu Chen, Runlin Lei, Zhewei Wei
Towards Continual Knowledge Graph Embedding via Incremental Distillation
Jiajun Liu, Ke Wenjun, Peng Wang et al.
NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models
Yusuf Dalva, Pinar Yanardag
Learning Diffusion Texture Priors for Image Restoration
Tian Ye, Sixiang Chen, Wenhao Chai et al.
Quality-Diversity through AI Feedback
Herbie Bradley, Andrew Dai, Hannah Teufel et al.
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang, Liu Weisong, Guangyao Li et al.
AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction
Kethmi Hirushini Hettige, Jiahao Ji, Shili Xiang et al.
Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
Mengting Chen, Xi Chen, Zhonghua Zhai et al.
TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts
Hyunwook Lee, Sungahn Ko
Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data
YongKyung Oh, Dongyoung Lim, Sungil Kim
Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection
Soopil Kim, Sion An, Philip Chikontwe et al.
Test-Time Domain Generalization for Face Anti-Spoofing
Qianyu Zhou, Ke-Yue Zhang, Taiping Yao et al.
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng et al.
Latent Space Editing in Transformer-Based Flow Matching
Vincent Tao Hu, Wei Zhang, Meng Tang et al.
Better Call SAL: Towards Learning to Segment Anything in Lidar
Aljoša Ošep, Tim Meinhardt, Francesco Ferroni et al.
Text-Guided Molecule Generation with Diffusion Language Model
Haisong Gong, Qiang Liu, Shu Wu et al.
Multi-view Aggregation Network for Dichotomous Image Segmentation
Qian Yu, Xiaoqi Zhao, Youwei Pang et al.
XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar, Ali Etemad
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
Mengqi Huang, Zhendong Mao, Mingcong Liu et al.
Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector
An Lao, Qi Zhang, Chongyang Shi et al.
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Yang Zhou, Hao Shao, Letian Wang et al.
When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations
Aleksandar Petrov, Philip Torr, Adel Bibi
PINNACLE: PINN Adaptive ColLocation and Experimental points selection
Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng et al.
Revisiting Single Image Reflection Removal In the Wild
Yurui Zhu, Bo Li, Xueyang Fu et al.
Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning
Shangchao Su, Mingzhao Yang, Bin Li et al.
6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
Matteo Bortolon, Theodoros Tsesmelis, Stuart James et al.
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
Xu Yang, Changxing Ding, Zhibin Hong et al.
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li, Yunfei Wu, Xinghua Jiang et al.
Video Question Answering with Procedural Programs
Rohan Choudhury, Koichiro Niinuma, Kris Kitani et al.
SMooDi: Stylized Motion Diffusion Model
Lei Zhong, Yiming Xie, Varun Jampani et al.
Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
Zhicai Wang, Longhui Wei, Tan Wang et al.
SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models
Feifei Wang, Zhentao Tan, Tianyi Wei et al.
HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models
Li Pang, Xiangyu Rui, Long Cui et al.
MathAttack: Attacking Large Language Models towards Math Solving Ability
Zihao Zhou, Qiufeng Wang, Mingyu Jin et al.
SlowTrack: Increasing the Latency of Camera-Based Perception in Autonomous Driving Using Adversarial Examples
Chen Ma, Ningfei Wang, Qi Alfred Chen et al.
Making RL with Preference-based Feedback Efficient via Randomization
Runzhe Wu, Wen Sun
FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models
Jinglin Xu, Yijie Guo, Yuxin Peng
Disentangled Prompt Representation for Domain Generalization
De Cheng, Zhipeng Xu, Xinyang Jiang et al.
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
JunDa Cheng, Wei Yin, Kaixuan Wang et al.
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He, Henghui Ding, Xudong Jiang et al.
STEM: Unleashing the Power of Embeddings for Multi-Task Recommendation
Liangcai Su, Junwei Pan, Ximei Wang et al.
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Niccolò Cavagnero, Gabriele Rosi, Claudia Cuttano et al.
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation
Moayed Haji Ali, Guha Balakrishnan, Vicente Ordonez
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
Fangfu Liu, Diankun Wu, Yi Wei et al.
A Unified and General Framework for Continual Learning
Zhenyi Wang, Yan Li, Li Shen et al.
Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection
Yajing Liu, Shijun Zhou, Xiyao Liu et al.
Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion
Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch et al.
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Zhao Tianchen, Xuefei Ning, Tongcheng Fang et al.
DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
Jiuming Liu, Dong Zhuo, Zhiheng Feng et al.
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
Yuhao Sun, Lingyun Yu, Hongtao Xie et al.
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.
MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation
Shuzhao Xie, Weixiang Zhang, Chen Tang et al.
Control4D: Efficient 4D Portrait Editing with Text
Ruizhi Shao, Jingxiang Sun, Cheng Peng et al.
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Ziyang Chen, Israel D. Gebru, Christian Richardt et al.
Pyramid Diffusion for Fine 3D Large Scene Generation
Yuheng Liu, Xinke Li, Xueting Li et al.
Mitigating Motion Blur in Neural Radiance Fields with Events and Frames
Marco Cannici, Davide Scaramuzza
Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning
Mohamed Elsayed, A. Rupam Mahmood
SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
Lukas Hoyer, David Tan, Muhammad Ferjad Naeem et al.
Distilling Semantic Priors from SAM to Efficient Image Restoration Models
Quan Zhang, Xiaoyu Liu, Wei Li et al.
Vamos: Versatile Action Models for Video Understanding
Shijie Wang, Qi Zhao, Minh Quan et al.
Score-Guided Diffusion for 3D Human Recovery
Anastasis Stathopoulos, Ligong Han, Dimitris N. Metaxas
OpenTab: Advancing Large Language Models as Open-domain Table Reasoners
Kezhi Kong, Jiani Zhang, Zhengyuan Shen et al.
DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion
Junjie Guo, Chenqiang Gao, Fangcen Liu et al.
How to Configure Good In-Context Sequence for Visual Question Answering
Li Li, Jiawei Peng, Huiyi Chen et al.
Towards Efficient Replay in Federated Incremental Learning
Yichen Li, Qunwei Li, Haozhao Wang et al.
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
XuDong Wang, Ishan Misra, Ziyun Zeng et al.
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
Yuchen Hu, Chen Chen, Chao-Han Huck Yang et al.
U-mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting
Xiang Ma, Xuemei Li, Lexin Fang et al.
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz, Yair Kittenplon, Aviad Aberdam et al.
FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning
Rishub Tamirisa, Chulin Xie, Wenxuan Bao et al.
Amodal Completion via Progressive Mixed Context Diffusion
Katherine Xu, Lingzhi Zhang, Jianbo Shi
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi, Hirofumi Inaguma, Xutai Ma et al.
DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception
Jiayu Zou, Kun Tian, Zheng Zhu et al.
R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
Zheyuan Zhou, Wang Le, Naiyu Fang et al.
GenZI: Zero-Shot 3D Human-Scene Interaction Generation
Lei Li, Angela Dai
Communication-Efficient Federated Learning with Accelerated Client Gradient
Geeho Kim, Jinkyu Kim, Bohyung Han
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
Kirolos Ataallah, Xiaoqian Shen, Eslam Mohamed Abdelrahman et al.
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
Xunjiang Gu, Guanyu Song, Igor Gilitschenski et al.
DragVideo: Interactive Drag-style Video Editing
Yufan Deng, Ruida Wang, Yuhao Zhang et al.
MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
Mara Finkelstein, Markus Freitag
Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models
Shuang Li, Jiangjie Chen, Siyu Yuan et al.
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang, Runyu Ding, Ellis L Brown et al.
NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views
Han Huang, Yulun Wu, Junsheng Zhou et al.
MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
Yang Zhao, Zhisheng Xiao, Yanwu Xu et al.
RobustSAM: Segment Anything Robustly on Degraded Images
Wei-Ting Chen, Yu Jiet Vong, Sy-Yen Kuo et al.
Mono3DVG: 3D Visual Grounding in Monocular Images
Yangfan Zhan, Yuan Yuan, Zhitong Xiong
FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
Ke Fan, Junshu Tang, Weijian Cao et al.
Exploiting Label Skews in Federated Learning with Model Concatenation
Yiqun Diao, Qinbin Li, Bingsheng He
Tokenize Anything via Prompting
Ting Pan, Lulu Tang, Xinlong Wang et al.
LION: Implicit Vision Prompt Tuning
Haixin Wang, Jianlong Chang, Yihang Zhai et al.
Interactive Continual Learning: Fast and Slow Thinking
Biqing Qi, Xinquan Chen, Junqi Gao et al.
Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
Dingkang Yang, Mingcheng Li, Dongling Xiao et al.
ICP-Flow: LiDAR Scene Flow Estimation with ICP
Yancong Lin, Holger Caesar
UnO: Unsupervised Occupancy Fields for Perception and Forecasting
Ben Agro, Quinlan Sykora, Sergio Casas et al.
How to Fine-Tune Vision Models with SGD
Ananya Kumar, Ruoqi Shen, Sebastien Bubeck et al.
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
Fei Wang, Dan Guo, Kun Li et al.
UGG: Unified Generative Grasping
Jiaxin Lu, Hao Kang, Haoxiang Li et al.
SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization
Zhenlong Yuan, Jiakai Cao, Zhaoxin Li et al.
Multi-granularity Correspondence Learning from Long-term Noisy Videos
Yijie Lin, Jie Zhang, Zhenyu Huang et al.
Do Generated Data Always Help Contrastive Learning?
Yifei Wang, Jizhe Zhang, Yisen Wang
Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
Zhen Zhao, Zicheng Wang, Dian Yu et al.
Making Large Language Models Better Planners with Reasoning-Decision Alignment
Zhijian Huang, Tao Tang, Shaoxiang Chen et al.
GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding
Changshuo Wang, Meiqing Wu, Siew-Kei Lam et al.
Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma, Varun Jampani, Yuanzhen Li et al.
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Dazhong Shen, Guanglu Song, Zeyue Xue et al.
Towards Energy Efficient Spiking Neural Networks: An Unstructured Pruning Framework
Xinyu Shi, Jianhao Ding, Zecheng Hao et al.
MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
Hanzhe Hu, Zhizhuo Zhou, Varun Jampani et al.
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi et al.
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
Kailin Li, Jingbo Wang, Lixin Yang et al.
Disentangled Clothed Avatar Generation from Text Descriptions
Jionghao Wang, Yuan Liu, Zhiyang Dou et al.
Transformer-Based No-Reference Image Quality Assessment via Supervised Contrastive Learning
Jinsong Shi, Pan Gao, Jie Qin
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction
Xiao Chen, Quanyi Li, Tai Wang et al.
ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction
Zhicheng Zhang, Junyao Hu, Wentao Cheng et al.
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
Yi Wu, Ziqiang Li, Heliang Zheng et al.
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
Shiming Chen, Wenjin Hou, Salman Khan et al.
Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection
Yuzhen Lin, Wentang Song, Bin Li et al.
Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models
Liqi He, Zuchao Li, Xiantao Cai et al.
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen, Yuyuan Liu, Hu Wang et al.
HiFi-123: Towards High-fidelity One Image to 3D Content Generation
Wangbo Yu, Li Yuan, Yanpei Cao et al.
Neural Redshift: Random Networks are not Random Functions
Damien Teney, Armand Nicolicioiu, Valentin Hartmann et al.
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos et al.
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model
Decheng Liu, Xijun Wang, Chunlei Peng et al.
Generalizable Human Gaussians for Sparse View Synthesis
Youngjoong Kwon, Baole Fang, Yixing Lu et al.
Prompting Language-Informed Distribution for Compositional Zero-Shot Learning
Wentao Bao, Lichang Chen, Heng Huang et al.
Active Generalized Category Discovery
Shijie Ma, Fei Zhu, Zhun Zhong et al.
LingoQA: Video Question Answering for Autonomous Driving
Ana-Maria Marcu, Long Chen, Jan Hünermann et al.
The Consensus Game: Language Model Generation via Equilibrium Search
Athul Jacob, Yikang Shen, Gabriele Farina et al.
SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
Qingwen Zhang, Yi Yang, Peizheng Li et al.
Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection
Junjie Huang, Yun Ye, Zhujin Liang et al.
SafeDreamer: Safe Reinforcement Learning with World Models
Weidong Huang, Jiaming Ji, Chunhe Xia et al.
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka