Most Cited CVPR "autoregressive visual generation" Papers
5,589 papers found • Page 26 of 28
Conference
Continuous Adverse Weather Removal via Degradation-Aware Distillation
Xin Lu, Jie Xiao, Yurui Zhu et al.
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou et al.
High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model
Yiyang Shen, Kun Zhou, He Wang et al.
ACAttack: Adaptive Cross Attacking RGB-T Tracker via Multi-Modal Response Decoupling
Xinyu Xiang, Qinglong Yan, HAO ZHANG et al.
Subspace Constraint and Contribution Estimation for Heterogeneous Federated Learning
Xiangtao Zhang, Sheng Li, Ao Li et al.
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
Kai Wang, Zekai Li, Zhi-Qi Cheng et al.
RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction
Baptiste Brument, Robin Bruneau, Yvain Queau et al.
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang, Sayak Paul, Boyang Zheng et al.
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
Dongliang Luo, Hanshen Zhu, Ziyang Zhang et al.
Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning
Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.
Distilling Long-tailed Datasets
Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang et al.
Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems
Alejandro Castañeda Garcia, Jan Warchocki, Jan van Gemert et al.
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li, Mi Zhang, Yiming Sun et al.
AnyMap: Learning a General Camera Model for Structure-from-Motion with Unknown Distortion in Dynamic Scenes
Andrea Porfiri Dal Cin, Georgi Dikov, Jihong Ju et al.
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li, Liansheng Zhuang, Xiao Long et al.
Knowledge Memorization and Rumination for Pre-trained Model-based Class-Incremental Learning
Zijian Gao, Wangwang Jia, Xingxing Zhang et al.
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
Pengcheng Xu, Boyuan Jiang, Xiaobin Hu et al.
SeqMvRL: A Sequential Fusion Framework for Multi-view Representation Learning
Ren Wang, Haoliang Sun, Yuxiu Lin et al.
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Yicheng Chen, Xiangtai Li, Yining Li et al.
FASTer: Focal token Acquiring-and-Scaling Transformer for Long-term 3D Objection Detection
Chenxu Dang, Pei An, Xinmin Zhang et al.
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
Xiao Ma, Sumit Patidar, Iain Haughton et al.
Structure-from-Motion with a Non-Parametric Camera Model
Yihan Wang, Linfei Pan, Marc Pollefeys et al.
EventPSR: Surface Normal and Reflectance Estimation from Photometric Stereo Using an Event Camera
Bohan Yu, Jin Han, Boxin Shi et al.
LAL: Enhancing 3D Human Motion Prediction with Latency-aware Auxiliary Learning
Xiaoning Sun, Dong Wei, Huaijiang Sun et al.
Sea-ing in Low-light
Nisha Varghese, A. N. Rajagopalan
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes
Hyeonggon Ryu, Seongyu Kim, Joon Chung et al.
Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
Junjie Chen, Weilong Chen, Yifan Zuo et al.
Heterogeneous Skeleton-Based Action Representation Learning
Xiaoyan Ma, jidong kuang, Hongsong Wang et al.
DiskVPS: Vanishing Point Detector via Hough Transform in a Disk Region
Jianping Wu
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants
Chong Yu, Tao Chen, Zhongxue Gan
Towards Autonomous Micromobility through Scalable Urban Simulation
Wayne Wu, Honglin He, Chaoyuan Zhang et al.
Seeing is Not Believing: Adversarial Natural Object Optimization for Hard-Label 3D Scene Attacks
Daizong Liu, Wei Hu
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion
Ding Ding, Yueming Pan, Ruoyu Feng et al.
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Eduard Poesina, Adriana Valentina Costache, Adrian-Gabriel Chifu et al.
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices
Mariamma Antony, Rajiv Porana, Sahil M. Lathiya et al.
Learning-enabled Polynomial Lyapunov Function Synthesis via High-Accuracy Counterexample-Guided Framework
Hanrui Zhao, Niuniu Qi, Mengxin Ren et al.
Advancing Adversarial Robustness in GNeRFs: The IL2-NeRF Attack
Nicole Meng, Caleb Manicke, Ronak Sahu et al.
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin, Cheng-En Wu, Huanran Li et al.
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li, Boyang Li
SINR: Sparsity Driven Compressed Implicit Neural Representations
Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe et al.
Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses
Yongfan Liu, Hyoukjun Kwon
Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
Yufei Guo, Xiaode Liu, Yuanpei Chen et al.
NoiseCtrl: A Sampling-Algorithm-Agnostic Conditional Generation Method for Diffusion Models
Longquan Dai, He Wang, Jinhui Tang
Robotic Visual Instruction
Yanbang Li, ZiYang Gong, Haoyang Li et al.
Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues
Yuhui Liu, Liangxun Ou, Qiang Fu et al.
Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction
Teng Hu, Jiangning Zhang, Ran Yi et al.
Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways
Yi Liu, Hao Zhou, Benlei Cui et al.
Query Efficient Black-Box Visual Prompting with Subspace Learning
Haozhen Zhang, Zhaogeng Liu, Hualin Zhang et al.
Fingerprinting Denoising Diffusion Probabilistic Models
Huan Teng, Yuhui Quan, Chengyu Wang et al.
PCM : Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
Junhyuk So, Jiwoong Shin, Chaeyeon Jang et al.
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin, Yaqi Zhao, Mingwu Zheng et al.
RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability
Minh Kha Do, Kang Han, Phu Lai et al.
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
Xiaozhong Ji, Xiaobin Hu, Zhihong Xu et al.
MESC-3D:Mining Effective Semantic Cues for 3D Reconstruction from a Single Image
Shaoming Li, Qing Cai, Songqi KONG et al.
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Xuweiyi Chen, Markus Marks, Zezhou Cheng
T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving
Changsheng Lv, Mengshi Qi, Liang Liu et al.
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
Zirun Guo, Tao Jin
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning
Jeongryong Lee, Yejee Shin, Geonhui Son et al.
MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining
Shanglin Liu, Jianming Lv, Jingdan Kang et al.
AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation
Jingyi Xie, Jintao Yang, Zhunchen Luo et al.
Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution
ZELIN LI, Chenwei Wang, Zhaoke Huang et al.
SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling
Qi Zhu, Jiangwei Lao, Deyi Ji et al.
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
Bingjie Gao, Xinyu Gao, Xiaoxue Wu et al.
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR
Xugong Qin, peng zhang, Jun Jie Ou Yang et al.
DreamTrack: Dreaming the Future for Multimodal Visual Object Tracking
Mingzhe Guo, Weiping Tan, Wenyu Ran et al.
Learned Image Compression with Dictionary-based Entropy Model
Jingbo Lu, Leheng Zhang, Xingyu Zhou et al.
Training Data Provenance Verification: Did Your Model Use Synthetic Data from My Generative Model for Training?
Yuechen Xie, Jie Song, Huiqiong Wang et al.
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
Haoran Xu, Peixi Peng, Guang Tan et al.
Shadow Generation Using Diffusion Model with Geometry Prior
Haonan Zhao, Qingyang Liu, Xinhao Tao et al.
Knowledge Bridger: Towards Training-Free Missing Modality Completion
Guanzhou Ke, Shengfeng He, Xiao-Li Wang et al.
AniGrad: Anisotropic Gradient-Adaptive Sampling for 3D Reconstruction From Monocular Video
Noah Stier, Alex Rich, Pradeep Sen et al.
BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects
Wanyue Zhang, Rishabh Dabral, Vladislav Golyanik et al.
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song, weixing chen, Yang Liu et al.
BOOTPLACE: Bootstrapped Object Placement with Detection Transformers
Hang Zhou, Xinxin Zuo, Rui Ma et al.
Q-PART: Quasi-Periodic Adaptive Regression with Test-time Training for Pediatric Left Ventricular Ejection Fraction Regression
Jie Liu, Tiexin Qin, Hui Liu et al.
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue, Yulin Wang, Chenxin Tao et al.
Joint Vision-Language Social Bias Removal for CLIP
Haoyu Zhang, Yangyang Guo, Mohan Kankanhalli
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Yuhao Wang, Yongfeng Lv, Pingping Zhang et al.
RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection
Yunfei Long, Abhinav Kumar, Xiaoming Liu et al.
How to Merge Your Multimodal Models Over Time?
Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth et al.
HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories
Eric Hedlin, Munawar Hayat, Fatih Porikli et al.
iG-6DoF: Model-free 6DoF Pose Estimation for Unseen Object via Iterative 3D Gaussian Splatting
Tuo Cao, Fei LUO, Jiongming Qin et al.
Easy-editable Image Vectorization with Multi-layer Multi-scale Distributed Visual Feature Embedding
Ye Chen, Zhangli Hu, Zhongyin Zhao et al.
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
Yudong Han, Qingpei Guo, Liyuan Pan et al.
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
Lanyun Zhu, Tianrun Chen, Qianxiong Xu et al.
Automated Proof of Polynomial Inequalities via Reinforcement Learning
Banglong Liu, Niuniu Qi, Xia Zeng et al.
Active Hyperspectral Imaging Using an Event Camera
Bohan Yu, Jinxiu Liang, Zhuofeng Wang et al.
Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression
Lucas Relic, Roberto Azevedo, Yang Zhang et al.
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang, Junkai Chen, Beier Zhu et al.
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
Haijie Li, Yanmin Wu, Jiarui Meng et al.
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
Xuewu Lin, Tianwei Lin, Alan Huang et al.
MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
Riku Murai, Eric Dexheimer, Andrew J. Davison
UNIALIGN: Scaling Multimodal Alignment within One Unified Model
bo zhou, Liulei Li, Yujia Wang et al.
Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models
Yankai Jiang, Peng Zhang, Donglin Yang et al.
Online Task-Free Continual Learning via Dynamic Expansionable Memory Distribution
Fei Ye, Adrian Bors
OffsetOPT: Explicit Surface Reconstruction without Normals
Huan Lei
Zero-Shot 4D Lidar Panoptic Segmentation
Yushan Zhang, Aljoša Ošep, Laura Leal-Taixe et al.
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
Jingzhou Luo, Yang Liu, weixing chen et al.
Efficient Motion-Aware Video MLLM
Zijia Zhao, Yuqi Huo, Tongtian Yue et al.
Hyperspectral Pansharpening via Diffusion Models with Iteratively Zero-Shot Guidance
Jin-Liang Xiao, Ting-Zhu Huang, Liang-Jian Deng et al.
Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control
Basim Azam, Naveed Akhtar
Generalized Gaussian Entropy Model for Point Cloud Attribute Compression with Dynamic Likelihood Intervals
Changhao Peng
LEMON: Learning 3D Human-Object Interaction Relation from 2D Images
Yuhang Yang, Wei Zhai, Hongchen Luo et al.
Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation
Zhuoran ZHAO, Linlin Yang, Pengzhan Sun et al.
Toward Robust Neural Reconstruction from Sparse Point Sets
Amine Ouasfi, Shubhendu Jena, Eric Marchand et al.
Just Dance with pi! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection
Snehashis Majhi, Giacomo D'Amicantonio, Antitza Dantcheva et al.
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
Andrea Maracani, Savas Ozkan, Sijun Cho et al.
Towards Smart Point-and-Shoot Photography
Jiawan Li, Fei Zhou, Zhipeng Zhong et al.
HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction
Yuan Wang, Yali Li, Lixiang Li et al.
SoftShadow: Leveraging Soft Masks for Penumbra-Aware Shadow Removal
Xinrui Wang, Lanqing Guo, Xiyu Wang et al.
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
wenqiao Li, Yao Gu, Xintao Chen et al.
Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations
Xunzhi Zheng, Dan Xu
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong, Yean Cheng, Zhuoyi Yang et al.
Image Reconstruction from Readout-Multiplexed Single-Photon Detector Arrays
Shashwath Bharadwaj, Ruangrawee Kitichotkul, Akshay Agarwal et al.
Learning on Model Weights using Tree Experts
Eliahu Horwitz, Bar Cavia, Jonathan Kahana et al.
Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method
Pan Yin, Kaiyu Li, Xiangyong Cao et al.
Adaptive Parameter Selection for Tuning Vision-Language Models
Yi Zhang, Yi-Xuan Deng, Meng-Hao Guo et al.
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
Kailin Li, Puhao Li, Tengyu Liu et al.
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting
Jiaxin Zhang, Junjun Jiang, Youyu Chen et al.
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Jieming Cui, Tengyu Liu, Ziyu Meng et al.
Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels
Jiyuan Liu, Xinwang Liu, chuankun Li et al.
The Impact Label Noise and Choice of Threshold has on Cross-Entropy and Soft-Dice in Image Segmentation
Marcus Nordström, Atsuto Maki, Henrik Hult
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models
Yinan Liang, Ziwei Wang, Xiuwei Xu et al.
Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering
Yutao Feng, Xiang Feng, Yintong Shang et al.
Improving Accuracy and Calibration via Differentiated Deep Mutual Learning
Han Liu, Peng Cui, Bingning Wang et al.
Motions as Queries: One-Stage Multi-Person Holistic Human Motion Capture
Kenkun Liu, Yurong Fu, Weihao Yuan et al.
ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting
Guo Junfu, Yu Xin, Gaoyi Liu et al.
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Jiahao Cui, Hui Li, Qingkun Su et al.
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
Fengfan Zhou, Bangjie Yin, Hefei Ling et al.
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment
Huakai Lai, Guoxin Xiong, Huayu Mai et al.
OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation
Xiao Cui, Yulei Qin, Wengang Zhou et al.
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
Zhiwei Jia, Yuesong Nan, Huixi Zhao et al.
Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning
Buzhen Huang, Chen Li, Chongyang Xu et al.
CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth
Zhiyu Qu, Yunqi Miao, Zhensong Zhang et al.
Incremental Object Keypoint Learning
Mingfu Liang, Jiahuan Zhou, Xu Zou et al.
OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction
Gehui Li, Bin Chen, Chen Zhao et al.
M3GYM: A Large-Scale Multimodal Multi-view Multi-person Pose Dataset for Fitness Activity Understanding in Real-world Settings
Qingzheng Xu, Ru Cao, Xin Shen et al.
Star with Bilinear Mapping
Zelin Peng, Yu Huang, Zhengqin Xu et al.
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model
Shuhan Tan, John Wheatley Lambert, Hong Jeon et al.
Learning Extremely High Density Crowds as Active Matters
Feixiang He, Jiangbei Yue, Jialin Zhu et al.
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
Junha Lee, Chunghyun Park, Jaesung Choe et al.
Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes
JunYong Choi, Min-Cheol Sagong, SeokYeong Lee et al.
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data
Zifan Wang, Ziqing Chen, Junyu Chen et al.
Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
Zeqing Wang, Qingyang Ma, Wentao Wan et al.
RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges
Thibaut Loiseau, Guillaume Bourmaud
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie, Zequn Zeng, Hao Zhang et al.
ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing
Zhongze Wang, Haitao Zhao, Jingchao Peng et al.
Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction
Dubing Chen, Huan Zheng, Jin Fang et al.
Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks
Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia et al.
LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians
Jiamin WU, Kenkun Liu, Han Gao et al.
STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
Haiyi Qiu, Minghe Gao, Long Qian et al.
Stop Learning it all to Mitigate Visual Hallucination, Focus on the Hallucination Target.
Dokyoon Yoon, Youngsook Song, Woomyoung Park
Shape and Texture: What Influences Reliable Optical Flow Estimation?
Libo Long, Xiao Hu, Jochen Lang
AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark
Li Lin, Santosh Santosh, Mingyang Wu et al.
Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability
Jianyang Zhang, Qianli Luo, Guowu Yang et al.
HOT: Hadamard-based Optimized Training
Seonggon Kim, Juncheol Shin, Seung-taek Woo et al.
Embodied Scene Understanding for Vision Language Models via MetaVQA
Weizhen Wang, Chenda Duan, Zhenghao Peng et al.
Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation
Tal Zeevi, Ravid Shwartz-Ziv, Yann LeCun et al.
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
Yao Mu, Tianxing Chen, Zanxin Chen et al.
Beyond Background Shift: Rethinking Instance Replay in Continual Semantic Segmentation
Hongmei Yin, Tingliang Feng, Fan Lyu et al.
DKC: Differentiated Knowledge Consolidation for Cloth-Hybrid Lifelong Person Re-identification
Zhenyu Cui, Jiahuan Zhou, Yuxin Peng
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement
Qianhan Feng, Wenshuo Li, Tong Lin et al.
Learning Flow Fields in Attention for Controllable Person Image Generation
Zijian Zhou, Shikun Liu, Xiao Han et al.
Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation
Yuanqi Yao, Siao Liu, Haoming Song et al.
Rectification-specific Supervision and Constrained Estimator for Online Stereo Rectification
Rui Gong, Kim-Hui Yap, Weide Liu et al.
BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance
Xin Ye, Burhan Yaman, Sheng Cheng et al.
Dual Focus-Attention Transformer for Robust Point Cloud Registration
Kexue Fu, Ming'zhi Yuan, Changwei Wang et al.
Forming Auxiliary High-confident Instance-level Loss to Promote Learning from Label Proportions
Tianhao Ma, Han Chen, Juncheng Hu et al.
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou, Dan Guo, Ruohao Guo et al.
Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning
Xueyi Ke, Satoshi Tsutsui, Yayun Zhang et al.
Traversing Distortion-Perception Tradeoff using a Single Score-Based Generative Model
Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang et al.
IceDiff: High Resolution and High-Quality Arctic Sea Ice Forecasting with Generative Diffusion Prior
Jingyi Xu, Siwei Tu, Weidong Yang et al.
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Zichen Miao, WEI CHEN, Qiang Qiu
MVBoost: Boost 3D Reconstruction with Multi-View Refinement
Xiangyu Liu, Xiaomei Zhang, Zhiyuan Ma et al.
Learning Textual Prompts for Open-World Semi-Supervised Learning
Yuxin Fan, Junbiao Cui, Jiye Liang
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma, Luoxin Ye, Nessa McWeeney et al.
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
Zhanhao Liang, Yuhui Yuan, Shuyang Gu et al.
Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection
Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou et al.
VinaBench: Benchmark for Faithful and Consistent Visual Narratives
Silin Gao, Sheryl Mathew, Li Mi et al.
Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining
Shangquan Sun, Wenqi Ren, Juxiang Zhou et al.
MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices
Jianwen Jiang, Gaojie Lin, Zhengkun Rong et al.
Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising
Yongli Xiang, Ziming Hong, Lina Yao et al.
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
Haoyang He, Jiangning Zhang, Yuxuan Cai et al.
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
Sagar Soni, Akshay Dudhane, Hiyam Debary et al.
Learning Endogenous Attention for Incremental Object Detection
Xiang Song, Yuhang He, Jingyuan Li et al.
VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction
Ziyue Zhu, Shenlong Wang, Jin Xie et al.
Beyond Clean Training Data: A Versatile and Model-Agnostic Framework for Out-of-Distribution Detection with Contaminated Training Data
Yuchuan Li, Jae-Mo Kang, Il-Min Kim
Minimizing Labeled, Maximizing Unlabeled: An Image-Driven Approach for Video Instance Segmentation
Fangyun Wei, Jinjing Zhao, Kun Yan et al.
SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting
Jiahui Zhang, Fangneng Zhan, Ling Shao et al.
DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image
Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh et al.
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation
Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park et al.
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura, Antoine Yang, Cordelia Schmid et al.
Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration
Yiyang Chen, Tianyu Ding, Lei Wang et al.
CARL: A Framework for Equivariant Image Registration
Hastings Greer, Lin Tian, François-Xavier Vialard et al.
Perceptual Inductive Bias Is What You Need Before Contrastive Learning
Junru Zhao, Tianqin Li, Dunhan Jiang et al.
Animate and Sound an Image
Xihua Wang, Ruihua Song, Chongxuan Li et al.
Towards Explainable and Unprecedented Accuracy in Matching Challenging Finger Crease Patterns
Zhenyu Zhou, Chengdong Dong, Ajay Kumar
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
Datao Tang, Xiangyong Cao, Xuan Wu et al.
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
Chen Tang, Xinzhu Ma, Encheng Su et al.
Dynamic Prompt Optimizing for Text-to-Image Generation
Wenyi Mo, Tianyu Zhang, Yalong Bai et al.