Most Cited CVPR "distance-over-gradient technique" Papers
Multimodal Representation Learning by Alternating Unimodal Adaptation
Xiaohui Zhang, Jaehong Yoon, Mohit Bansal et al.
Compositional Video Understanding with Spatiotemporal Structure-based Transformers
Hoyeoung Yun, Jinwoo Ahn, Minseo Kim et al.
Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation
Markus Karmann, Onay Urfalioglu
Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
Junshu Tang, Yanhong Zeng, Ke Fan et al.
Stabilizing and Accelerating Autofocus with Expert Trajectory Regularized Deep Reinforcement Learning
Shouhang Zhu, Chenglin Li, Yuankun Jiang et al.
Coherent Temporal Synthesis for Incremental Action Segmentation
Guodong Ding, Hans Golong, Angela Yao
Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
ChangHee Yang, ChanHee Kang, Kyeongbo Kong et al.
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
Yifang Men, Yuan Yao, Miaomiao Cui et al.
Estimating Extreme 3D Image Rotations using Cascaded Attention
Shay Dekel, Yosi Keller, Martin Čadík
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models
Greg Heinrich, Mike Ranzinger, Danny Yin et al.
Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
Yong Shu, Liquan Shen, Xiangyu Hu et al.
Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Huajian Huang, Longwei Li, Hui Cheng et al.
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang, Mengping Yang, Qin Zhou et al.
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
Jaskirat Singh, Jianming Zhang, Qing Liu et al.
GraCo: Granularity-Controllable Interactive Segmentation
Yian Zhao, Kehan Li, Zesen Cheng et al.
Segment Every Out-of-Distribution Object
Wenjie Zhao, Jia Li, Xin Dong et al.
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Shangchen Zhou, Peiqing Yang, Jianyi Wang et al.
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Fanghua Yu, Jinjin Gu, Zheyuan Li et al.
Masked and Shuffled Blind Spot Denoising for Real-World Images
Hamadi Chihaoui, Paolo Favaro
Font-Agent: Enhancing Font Understanding with Large Language Models
Yingxin Lai, Cuijie Xu, Haitian Shi et al.
Open-Vocabulary Object 6D Pose Estimation
Jaime Corsetti, Davide Boscaini, Changjae Oh et al.
Generative Region-Language Pretraining for Open-Ended Object Detection
Chuang Lin, Yi Jiang, Lizhen Qu et al.
Multi-Modal Contrastive Masked Autoencoders: A Two-Stage Progressive Pre-training Approach for RGBD Datasets
Muhammad Abdullah Jamal, Omid Mohareri
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Yurui Qian, Qi Cai, Yingwei Pan et al.
STINR: Deciphering Spatial Transcriptomics via Implicit Neural Representation
Yisi Luo, Xile Zhao, Kai Ye et al.
Discovering Syntactic Interaction Clues for Human-Object Interaction Detection
Jinguo Luo, Weihong Ren, Weibo Jiang et al.
3D-SLNR: A Super Lightweight Neural Representation for Large-scale 3D Mapping
Chenhui Shi, Fulin Tang, Ning An et al.
Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture
Juanwu Lu, Can Cui, Yunsheng Ma et al.
R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization
Xudong Jiang, Fangjinhua Wang, Silvano Galliani et al.
Generative Latent Coding for Ultra-Low Bitrate Image Compression
Zhaoyang Jia, Jiahao Li, Bin Li et al.
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim, Jungwon Park, Wonjong Rhee
A Unified Image-Dense Annotation Generation Model for Underwater Scenes
Hongkai Lin, Dingkang Liang, Zhenghao Qi et al.
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Yaxu Xie, Alain Pagani, Didier Stricker
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Zongjian Li, Bin Lin, Yang Ye et al.
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features
Thomas Wimmer, Peter Wonka, Maks Ovsjanikov
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun, Ye Fang, Tong Wu et al.
DemoFusion: Democratising High-Resolution Image Generation With No $$$
Ruoyi Du, Dongliang Chang, Timothy Hospedales et al.
Activity-Biometrics: Person Identification from Daily Activities
Shehreen Azad, Yogesh S. Rawat
Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras
Ashwath Shetty, Marc Habermann, Guoxing Sun et al.
Neighbor Relations Matter in Video Scene Detection
Jiawei Tan, Hongxing Wang, Jiaxin Li et al.
Fast ODE-based Sampling for Diffusion Models in Around 5 Steps
Zhenyu Zhou, Defang Chen, Can Wang et al.
Referring Image Editing: Object-level Image Editing via Referring Expressions
Chang Liu, Xiangtai Li, Henghui Ding
InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields
Dongqing Wang, Tong Zhang, Alaa Abboud et al.
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
Alice Heiman, Xiaoman Zhang, Emma Chen et al.
Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction
Dong Li, Wenqi Zhong, Wei Yu et al.
Temporally Consistent Object-Centric Learning by Contrasting Slots
Anna Manasyan, Maximilian Seitzer, Filip Radovic et al.
From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior
Jaeho Moon, Juan Luis Gonzalez Bello, Byeongjun Kwon et al.
Unsupervised Blind Image Deblurring Based on Self-Enhancement
Lufei Chen, Xiangpeng Tian, Shuhua Xiong et al.
Mask Grounding for Referring Image Segmentation
Yong Xien Chng, Henry Zheng, Yizeng Han et al.
SignGraph: A Sign Sequence is Worth Graphs of Nodes
Shiwei Gan, Yafeng Yin, Zhiwei Jiang et al.
Embracing Unimodal Aleatoric Uncertainty for Robust Multimodal Fusion
Zixian Gao, Xun Jiang, Xing Xu et al.
DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching
Shuzhe Wang, Juho Kannala, Daniel Barath
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.
FreeDrag: Feature Dragging for Reliable Point-based Image Editing
Pengyang Ling, Lin Chen, Pan Zhang et al.
Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model
Yue-Hua Han, Tai-Ming Huang, Kailung Hua et al.
SET: Spectral Enhancement for Tiny Object Detection
Huixin Sun, Runqi Wang, Yanjing Li et al.
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Jianing "Jed" Yang, Alexander Sax, Kevin Liang et al.
Illumination Spectrum Estimation for Multispectral Images via Surface Reflectance Modeling and Spatial-Spectral Feature Generation
Hyejin Oh, Woo-Shik Kim, Sangyoon Lee et al.
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
Yang Yue, Yulin Wang, Haojun Jiang et al.
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
Yushi Huang, Ruihao Gong, Jing Liu et al.
GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Shenhan Qian, Tobias Kirschstein, Liam Schoneveld et al.
CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model
Xiaoding Yuan, Shitao Tang, Kejie Li et al.
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
Jiajun Cao, Yuan Zhang, Tao Huang et al.
Explaining CLIP's Performance Disparities on Data from Blind/Low Vision Users
Daniela Massiceti, Camilla Longden, Agnieszka Słowik et al.
MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models
Yanting Wang, Hongye Fu, Wei Zou et al.
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
Hanrong Ye, Dan Xu
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
Tianyi Yan, Dongming Wu, Wencheng Han et al.
Revisiting Spatial-Frequency Information Integration from a Hierarchical Perspective for Panchromatic and Multi-Spectral Image Fusion
Jiangtong Tan, Jie Huang, Naishan Zheng et al.
Rethinking the Adversarial Robustness of Multi-Exit Neural Networks in an Attack-Defense Game
Keyizhi Xu, Chi Zhang, Zhan Chen et al.
EntropyMark: Towards More Harmless Backdoor Watermark via Entropy-based Constraint for Open-source Dataset Copyright Protection
Ming Sun, Rui Wang, Zixuan Zhu et al.
FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding
Jinglin Xu, Guohao Zhao, Sibo Yin et al.
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina, Massimiliano Mancini, Elia Cunegatti et al.
Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization
Guopeng Li, Ming Qian, Gui-Song Xia
UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection
Xin Jin, Haisheng Su, Kai Liu et al.
CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition
Qixuan Zheng, Ming Zhang, Hong Yan
FCS: Feature Calibration and Separation for Non-Exemplar Class Incremental Learning
Qiwei Li, Yuxin Peng, Jiahuan Zhou
VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond
Dabing Yu, Zheng Gao
Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation
Yi Zhang, Meng-Hao Guo, Miao Wang et al.
GALA: Generating Animatable Layered Assets from a Single Scan
Taeksoo Kim, Byungjun Kim, Shunsuke Saito et al.
Improving Graph Contrastive Learning via Adaptive Positive Sampling
Jiaming Zhuo, Feiyang Qin, Can Cui et al.
Hearing Anything Anywhere
Mason Wang, Ryosuke Sawata, Samuel Clarke et al.
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Chen Zhao, Shuming Liu, Karttikeya Mangalam et al.
Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation
Hyunwoo Ryu, Jiwoo Kim, Hyunseok An et al.
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
Wenqian Zhang, Molin Huang, Yuxuan Zhou et al.
GoLF-NRT: Integrating Global Context and Local Geometry for Few-Shot View Synthesis
You Wang, Li Fang, Hao Zhu et al.
Bayesian Exploration of Pre-trained Models for Low-shot Image Classification
Yibo Miao, Yu Lei, Feng Zhou et al.
SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow
Qingyuan Wang, Rui Song, Jiaojiao Li et al.
Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems
Song Xia, Yi Yu, Wenhan Yang et al.
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng, Xiaoyong Chen, Jiaming Liang et al.
Domain Adaptive Diabetic Retinopathy Grading with Model Absence and Flowing Data
Wenxin Su, Song Tang, Xiaofeng Liu et al.
RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation
Yi Rong, Haoran Zhou, Kang Xia et al.
4K4D: Real-Time 4D View Synthesis at 4K Resolution
Zhen Xu, Sida Peng, Haotong Lin et al.
Context-Guided Spatio-Temporal Video Grounding
Xin Gu, Heng Fan, Yan Huang et al.
Rethinking Personalized Aesthetics Assessment: Employing Physique Aesthetics Assessment as An Exemplification
Haobin Zhong, Shuai He, Anlong Ming et al.
Frequency-Biased Synergistic Design for Image Compression and Compensation
Jiaming Liu, Qi Zheng, Zihao Liu et al.
APT: Adaptive Personalized Training for Diffusion Models with Limited Data
JungWoo Chae, Jiyoon Kim, Jaewoong Choi et al.
Foundations of the Theory of Performance-Based Ranking
Sébastien Piérard, Anaïs Halin, Anthony Cioppa et al.
WISH: Weakly Supervised Instance Segmentation using Heterogeneous Labels
Hyeokjun Kweon, Kuk-Jin Yoon
TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation
Sai Kumar Dwivedi, Yu Sun, Priyanka Patel et al.
Convex Combination Star Shape Prior for Data-driven Image Semantic Segmentation
Xinyu Zhao, Jun Xie, Shengzhe Chen et al.
Re-thinking Data Availability Attacks Against Deep Neural Networks
Bin Fang, Bo Li, Shuang Wu et al.
Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning
Xiaohan Zou, Wenchao Ma, Shu Zhao
Logit Standardization in Knowledge Distillation
Shangquan Sun, Wenqi Ren, Jingzhi Li et al.
Rotation-Equivariant Self-Supervised Method in Image Denoising
Hanze Liu, Jiahong Fu, Qi Xie et al.
A Unified Approach for Text- and Image-guided 4D Scene Generation
Yufeng Zheng, Xueting Li, Koki Nagano et al.
CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral, Enis Simsar, Federico Tombari et al.
SPECAT: SPatial-spEctral Cumulative-Attention Transformer for High-Resolution Hyperspectral Image Reconstruction
Zhiyang Yao, Shuyang Liu, Xiaoyun Yuan et al.
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
Jijie He, Wenwu Yang
Neural Refinement for Absolute Pose Regression with Feature Synthesis
Shuai Chen, Yash Bhalgat, Xinghui Li et al.
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Jianyuan Wang, Nikita Karaev, Christian Rupprecht et al.
Boosting Image Restoration via Priors from Pre-trained Models
Xiaogang Xu, Shu Kong, Tao Hu et al.
CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing
Zhen Guo, Hongping Gan
GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects
Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.
PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
Xiaoyun Zheng, Liwei Liao, Xufeng Li et al.
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
Jiazi Bu, Pengyang Ling, Pan Zhang et al.
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
Yawar Siddiqui, Antonio Alliegro, Alexey Artemov et al.
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
Geonho Bang, Kwangjin Choi, Jisong Kim et al.
MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving
Zhi-Yuan Zhang, Xiaofan Li, Zhihao Xu et al.
Task-Conditioned Adaptation of Visual Features in Multi-Task Policy Learning
Pierre Marza, Laetitia Matignon, Olivier Simonin et al.
Disentangled Pose and Appearance Guidance for Multi-Pose Generation
Tengfei Xiao, Yue Wu, Yuelong Li et al.
VI^3NR: Variance Informed Initialization for Implicit Neural Representations
Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Sameera Ramasinghe et al.
Efficient Diffusion as Low Light Enhancer
Guanzhou Lan, Qianli Ma, Yuqi Yang et al.
EasyDrag: Efficient Point-based Manipulation on Diffusion Models
Xingzhong Hou, Boxiao Liu, Yi Zhang et al.
GliaNet: Adaptive Neural Network Structure Learning with Glia-Driven
Mengqiao Han, Liyuan Pan, Xiabi Liu
Learned Lossless Image Compression based on Bit Plane Slicing
Zhe Zhang, Huairui Wang, Zhenzhong Chen et al.
BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
Hongwei Zheng, Linyuan Zhou, Han Li et al.
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang, Yue Xu, Cewu Lu et al.
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue, Jie Cheng, Longteng Guo et al.
Frequency-Adaptive Dilated Convolution for Semantic Segmentation
Linwei Chen, Lin Gu, Dezhi Zheng et al.
Weakly Supervised Semantic Segmentation via Progressive Confidence Region Expansion
Xiangfeng Xu, Pinyi Zhang, Wenxuan Huang et al.
TexTile: A Differentiable Metric for Texture Tileability
Carlos Rodriguez-Pardo, Dan Casas, Elena Garces et al.
MatSynth: A Modern PBR Materials Dataset
Giuseppe Vecchio, Valentin Deschaintre
Image Processing GNN: Breaking Rigidity in Super-Resolution
Yuchuan Tian, Hanting Chen, Chao Xu et al.
VidSeg: Training-free Video Semantic Segmentation based on Diffusion Models
Qian Wang, Abdelrahman Eldesokey, Mohit Mendiratta et al.
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Suraj Patni, Aradhye Agarwal, Chetan Arora
MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework
Ping Guo, Cheng Gong, Fei Liu et al.
HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
Yi Zhou, Hui Zhang, Jiaqian Yu et al.
Riemannian Multinomial Logistics Regression for SPD Neural Networks
Ziheng Chen, Yue Song, Gaowen Liu et al.
LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising
Yuxing Duan
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
Takahiro Shirakawa, Seiichi Uchida
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Yutao Hu, Tianbin, Quanfeng Lu et al.
Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
Daniel Geng, Inbum Park, Andrew Owens
Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity
Huaxin Zhang, Xiaohao Xu, Xiang Wang et al.
SuperLightNet: Lightweight Parameter Aggregation Network for Multimodal Brain Tumor Segmentation
Feng Yu, Jiacheng Cao, Li Liu et al.
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.
Learning from Streaming Video with Orthogonal Gradients
Tengda Han, Dilara Gokay, Joseph Heyward et al.
Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
Yakun Chang, Yeliduosi Xiaokaiti, Yujia Liu et al.
Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering
Suyuan Liu, Ke Liang, Zhibin Dong et al.
Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
Bhargav Ghanekar, Salman Siddique Khan, Pranav Sharma et al.
Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations
Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu et al.
MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Action Anticipation
Olga Zatsarynna, Emad Bahrami, Yazan Abu Farha et al.
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang et al.
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Maitreya Patel, Changhoon Kim, Sheng Cheng et al.
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
Haoyu Chen, Wenbo Li, Jinjin Gu et al.
Neural Video Compression with Feature Modulation
Jiahao Li, Bin Li, Yan Lu
Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li, Yishuo Cai, Haowei Li et al.
Dual DETRs for Multi-Label Temporal Action Detection
Yuhan Zhu, Guozhen Zhang, Jing Tan et al.
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu, Wenjie Wang, Yongqi Li et al.
GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior
Zichen Tang, Yuan Yao, Miaomiao Cui et al.
GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes
Haozhe Lin, Chunyu Wei, Li He et al.
Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
Dongsu Zhang, Francis Williams, Žan Gojčič et al.
CDI: Copyrighted Data Identification in Diffusion Models
Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch et al.
Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
Mingqi Jiang, Saeed Khorram, Li Fuxin
Continual Segmentation with Disentangled Objectness Learning and Class Recognition
Yizheng Gong, Siyue Yu, Xiaoyang Wang et al.
Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai, Xichen Pan, Sainan Liu et al.
Bridging Gait Recognition and Large Language Models Sequence Modeling
Shaopeng Yang, Jilong Wang, Saihui Hou et al.
Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability
Yan Huang, Zhang Zhang, Qiang Wu et al.
Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection
Chen Chen, Jiahao Qi, Xingyue Liu et al.
Towards Practical Real-Time Neural Video Compression
Zhaoyang Jia, Bin Li, Jiahao Li et al.
Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization
Deng Li, Aming Wu, Yaowei Wang et al.
EscherNet: A Generative Model for Scalable View Synthesis
Xin Kong, Shikun Liu, Xiaoyang Lyu et al.
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Feng Liu, Shiwei Zhang, Xiaofeng Wang et al.
MVCPS-NeuS: Multi-view Constrained Photometric Stereo for Neural Surface Reconstruction
Hiroaki Santo, Fumio Okura, Yasuyuki Matsushita
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng, Chao Wen, Zhuo Su et al.
E-GPS: Explainable Geometry Problem Solving via Top-Down Solver and Bottom-Up Generator
Wenjun Wu, Lingling Zhang, Jun Liu et al.
MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
Nicolás Ugrinovic, Boxiao Pan, Georgios Pavlakos et al.
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao, Yuxuan Hu, Letian Wang et al.
ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation
Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng et al.
Cross-Rejective Open-Set SAR Image Registration
Shasha Mao, Shiming Lu, Zhaolong Du et al.
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
Xin Wang, Kai Chen, Jiaming Zhang et al.
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Shoukang Hu, Tao Hu, Ziwei Liu
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
Zhenxin Li, Shiyi Lan, Jose M. Alvarez et al.
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui, Tengyu Liu, Nian Liu et al.
HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses
Caoyuan Ma, Yu-Lun Liu, Zhixiang Wang et al.
Taxonomy-Aware Evaluation of Vision-Language Models
Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr et al.
SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection
Hyo-Jun Lee, Yeong Jun Koh, Hanul Kim et al.
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
Tao Hu, Fangzhou Hong, Ziwei Liu
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
Chenjie Cao, Yunuo Cai, Qiaole Dong et al.
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
Wenxuan Wang, Tongtian Yue, Yisi Zhang et al.
PanoPose: Self-supervised Relative Pose Estimation for Panoramic Images
Diantao Tu, Hainan Cui, Xianwei Zheng et al.
Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems
Haoquan Zhang, Ronggang Huang, Yi Xie et al.
Global and Local Prompts Cooperation via Optimal Transport for Federated Learning
Hongxia Li, Wei Huang, Jingya Wang et al.
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo, Nian Liu, Wangbo Zhao et al.
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
Haoyang Li, Liang Wang, Chao Wang et al.
FedCS: Coreset Selection for Federated Learning
Chenhe Hao, Weiying Xie, Daixun Li et al.
Dense Optical Tracking: Connecting the Dots
Guillaume Le Moing, Jean Ponce, Cordelia Schmid
Multi-agent Collaborative Perception via Motion-aware Robust Communication Network
Shixin Hong, Yu Liu, Zhi Li et al.
Ungeneralizable Examples
Jingwen Ye, Xinchao Wang
GraphI2P: Image-to-Point Cloud Registration with Exploring Pattern of Correspondence via Graph Learning
Lin Bie, Shouan Pan, Siqi Li et al.