Most Cited CVPR "listwise ranking" Papers
5,589 papers found • Page 6 of 28
Conference
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction
Xiao Chen, Quanyi Li, Tai Wang et al.
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
Chenlu Zhan, Gaoang Wang, Yu LIN et al.
Collaborating Foundation Models for Domain Generalized Semantic Segmentation
Yasser Benigmim, Subhankar Roy, Slim Essid et al.
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge, Xiaohui Zeng, Jacob Huffman et al.
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Hao Li, Changyao TIAN, Jie Shao et al.
ICP-Flow: LiDAR Scene Flow Estimation with ICP
Yancong Lin, Holger Caesar
CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation
Lingjun Zhao, Jingyu Song, Katherine Skinner
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
Hao Li, Ying Chen, Yifei Chen et al.
Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil, Raiymbek Akshulakov, YASSER ABDELAZIZ DAHOU DJILALI et al.
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
Fei Wang, Dan Guo, Kun Li et al.
Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma, Varun Jampani, Yuanzhen Li et al.
Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie Yang, Bingliang Li, Ailing Zeng et al.
LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis
Zehan Zheng, Fan Lu, Weiyi Xue et al.
Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
Axel Barroso-Laguna, Sowmya Munukutla, Victor Adrian Prisacariu et al.
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Enshen Zhou, Qi Su, Cheng Chi et al.
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos et al.
MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots
Tianchen Deng, Guole Shen, Chen Xun et al.
Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors
Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.
3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning
Yuncong Yang, Han Yang, Jiachen Zhou et al.
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu, Kun Yin, Haoyu Cao et al.
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
Qihao Zhao, Yalun Dai, Hao Li et al.
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities
Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.
How Far Can We Compress Instant-NGP-Based NeRF?
Yihang Chen, Qianyi Wu, Mehrtash Harandi et al.
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li, Laurence Yang, Bocheng Ren et al.
Learning Object State Changes in Videos: An Open-World Perspective
Zihui Xue, Kumar Ashutosh, Kristen Grauman
AutoAD III: The Prequel – Back to the Pixels
Tengda Han, Max Bain, Arsha Nagrani et al.
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng, Tri Cao, Zhirui Chen et al.
UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather
Haimei Zhao, Jing Zhang, Zhuo Chen et al.
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Zongjian Li, Bin Lin, Yang Ye et al.
Improved Implicit Neural Representation with Fourier Reparameterized Training
Kexuan Shi, Xingyu Zhou, Shuhang Gu
FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment
Jinglin Xu, Sibo Yin, Guohao Zhao et al.
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
Siteng Huang, Biao Gong, Yutong Feng et al.
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Chenyu You, Yifei Min, Weicheng Dai et al.
CoGS: Controllable Gaussian Splatting
Heng Yu, Joel Julin, Zoltán Á. Milacski et al.
Generative Gaussian Splatting for Unbounded 3D City Generation
Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.
One Diffusion to Generate Them All
Duong H. Le, Tuan Pham, Sangho Lee et al.
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang, Jong-Chyi Su, Samuel Schulter et al.
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos et al.
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Chengjian Feng, Yujie Zhong, Zequn Jie et al.
Relightable and Animatable Neural Avatar from Sparse-View Video
Zhen Xu, Sida Peng, Chen Geng et al.
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
Tiantian Geng, Jinrui Zhang, Qingni Wang et al.
Three Pillars Improving Vision Foundation Model Distillation for Lidar
Gilles Puy, Spyros Gidaris, Alexandre Boulch et al.
Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail
Luca Bartolomei, Fabio Tosi, Matteo Poggi et al.
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen, Yuyuan Liu, Hu Wang et al.
CoralSCOP: Segment any COral Image on this Planet
Zheng Ziqiang, Liang Haixin, Binh-Son Hua et al.
On the Content Bias in Fréchet Video Distance
Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar et al.
High-fidelity Person-centric Subject-to-Image Synthesis
Yibin Wang, Weizhong Zhang, Jianwei Zheng et al.
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang et al.
Simple Semantic-Aided Few-Shot Learning
Hai Zhang, Junzhe Xu, Shanlin Jiang et al.
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Fei Deng, Qifei Wang, Wei Wei et al.
ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction
Zhicheng Zhang, Junyao Hu, Wentao Cheng et al.
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.
Active Generalized Category Discovery
Shijie Ma, Fei Zhu, Zhun Zhong et al.
Revisiting Adversarial Training at Scale
Zeyu Wang, Xianhang li, Hongru Zhu et al.
REACTO: Reconstructing Articulated Objects from a Single Video
Chaoyue Song, Jiacheng Wei, Chuan-Sheng Foo et al.
On the Scalability of Diffusion-based Text-to-Image Generation
Hao Li, Yang Zou, Ying Wang et al.
OpenStreetView-5M: The Many Roads to Global Visual Geolocation
Guillaume Astruc, Nicolas Dufour, Ioannis Siglidis et al.
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Wei Li, Bing Hu, Rui Shao et al.
Language-driven Grasp Detection
An Dinh Vuong, Minh Nhat VU, Baoru Huang et al.
RoHM: Robust Human Motion Reconstruction via Diffusion
Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu et al.
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
Xiangheng Shan, Dongyue Wu, Guilin Zhu et al.
Empowering LLMs to Understand and Generate Complex Vector Graphics
XiMing Xing, Juncheng Hu, Guotao Liang et al.
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Zhongwei Ren, Yunchao Wei, Xun Guo et al.
DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion
Jinyuan Liu, Bowei Zhang, Qingyun Mei et al.
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
Zixuan Huang, Mark Boss, Aaryaman Vasishta et al.
World-consistent Video Diffusion with Explicit 3D Modeling
Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista et al.
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed, Arif Mahmood, IYYAKUTTI IYAPPAN GANAPATHI et al.
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.
G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
Yufei Ye, Abhinav Gupta, Kris Kitani et al.
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang, Jingdi Lei, Junxian Li et al.
EgoLife: Towards Egocentric Life Assistant
Jingkang Yang, Shuai Liu, Hongming Guo et al.
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Peijie Wang, Zhong-Zhi Li, Fei Yin et al.
MAS: Multi-view Ancestral Sampling for 3D Motion Generation Using 2D Diffusion
Roy Kapon, Guy Tevet, Daniel Cohen-Or et al.
3D-HGS: 3D Half-Gaussian Splatting
Haolin Li, Jinyang Liu, Mario Sznaier et al.
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Zixuan Ye, Huijuan Huang, Xintao Wang et al.
Relaxed Contrastive Learning for Federated Learning
Seonguk Seo, Jinkyu Kim, Geeho Kim et al.
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Zhikai Chen, Fuchen Long, Zhaofan Qiu et al.
Rethinking Generalizable Face Anti-spoofing via Hierarchical Prototype-guided Distribution Refinement in Hyperbolic Space
Chengyang Hu, Ke-Yue Zhang, Taiping Yao et al.
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.
Privacy-Preserving Face Recognition Using Trainable Feature Subtraction
Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding
Chaoyu Li, Eun Woo Im, Pooyan Fazli
ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis
Muhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie et al.
Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation
Wenxiao Deng, Wenbin Li, Tianyu Ding et al.
HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Haithem Turki, Vasu Agrawal, Samuel Rota Bulò et al.
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
Lital Binyamin, Yoad Tewel, Hilit Segev et al.
PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models
Minghao Chen, Roman Shapovalov, Iro Laina et al.
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui, Tengyu Liu, Nian Liu et al.
Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.
Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning
Dipam Goswami, Albin Soutif, Yuyang Liu et al.
Segment and Caption Anything
Xiaoke Huang, Jianfeng Wang, Yansong Tang et al.
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang, Bo Wu, Sunli Chen et al.
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
Yu Zhang, Songpengcheng Xia, Lei Chu et al.
GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
Mustafa Munir, William Avery, Md Mostafijur Rahman et al.
A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution
Zhixiong Yang, Jingyuan Xia, Shengxi Li et al.
Transductive Zero-Shot and Few-Shot CLIP
Ségolène Martin, Yunshi HUANG, Fereshteh Shakeri et al.
CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection
Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli et al.
Cross-modal Information Flow in Multimodal Large Language Models
Zhi Zhang, Srishti Yadav, Fengze Han et al.
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
Jianhao Zheng, Zihan Zhu, Valentin Bieri et al.
Inversion-Free Image Editing with Language-Guided Diffusion Models
Sihan Xu, Yidong Huang, Jiayi Pan et al.
Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
Huiyi Wang, Haodong Lu, Lina Yao et al.
AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search
Junghyup Lee, Bumsub Ham
Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models
Ruofan Liang, Žan Gojčič, Huan Ling et al.
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
Fan Zhang, Shaodi You, Yu Li et al.
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu, Sicheng Mo, Yin Li
Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM
Pingping Zhang, Tianyu Yan, Yang Liu et al.
Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation
Renshuai Liu, Bowen Ma, Wei Zhang et al.
Sieve: Multimodal Dataset Pruning using Image Captioning Models
Anas Mahmoud, Mostafa Elhoushi, Amro Abbas et al.
OmniViD: A Generative Framework for Universal Video Understanding
Junke Wang, Dongdong Chen, Chong Luo et al.
Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models
Zhejun Zhang, Peter Karkus, Maximilian Igl et al.
Plug and Play Active Learning for Object Detection
Chenhongyi Yang, Lichao Huang, Elliot Crowley
Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation
Xiao Lin, Wenfei Yang, Yuan Gao et al.
Towards Transferable Targeted 3D Adversarial Attack in the Physical World
Yao Huang, Yinpeng Dong, Shouwei Ruan et al.
DarkIR: Robust Low-Light Image Restoration
Daniel Feijoo, Juan C. Benito, Alvaro Garcia et al.
Color Shift Estimation-and-Correction for Image Enhancement
Yiyu Li, Ke Xu, Gerhard Hancke et al.
ViT-Lens: Towards Omni-modal Representations
Stan Weixian Lei, Yixiao Ge, Kun Yi et al.
Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Kexue Fu, Minghong Duan et al.
Physical Property Understanding from Language-Embedded Feature Fields
Albert J. Zhai, Yuan Shen, Emily Y. Chen et al.
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.
A Simple Recipe for Language-guided Domain Generalized Segmentation
Mohammad Fahes, TUAN-HUNG VU, Andrei Bursuc et al.
SpecNeRF: Gaussian Directional Encoding for Specular Reflections
Li Ma, Vasu Agrawal, Haithem Turki et al.
MonSter: Marry Monodepth to Stereo Unleashes Power
JunDa Cheng, Longliang Liu, Gangwei Xu et al.
Complexity Experts are Task-Discriminative Learners for Any Image Restoration
Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.
Towards Generalizable Multi-Object Tracking
Zheng Qin, Le Wang, Sanping Zhou et al.
Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation
Xiaoyang Wang, Huihui Bai, Limin Yu et al.
Single Domain Generalization for Crowd Counting
Zhuoxuan Peng, S.-H. Gary Chan
HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D
Sangmin Woo, byeongjun park, Hyojun Go et al.
VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment
Phong Tran, Egor Zakharov, Long Nhat Ho et al.
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Zirui Wang, Zhizhou Sha, Zheng Ding et al.
LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon et al.
Material Palette: Extraction of Materials from a Single Image
Ivan Lopes, Fabio Pizzati, Raoul de Charette
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu, Xintao Wang, Yixiao Ge et al.
LT3SD: Latent Trees for 3D Scene Diffusion
Quan Meng, Lei Li, Matthias Nießner et al.
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar et al.
Unified Language-driven Zero-shot Domain Adaptation
Senqiao Yang, Zhuotao Tian, Li Jiang et al.
UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence
Ruihai Wu, Haoran Lu, Yiyan Wang et al.
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo, Pedro Morgado
IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing
Chun Gu, Xiaofei Wei, Zixuan Zeng et al.
Modular Blind Video Quality Assessment
Wen Wen, Mu Li, Yabin ZHANG et al.
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Hanzhi Chen, Boyang Sun, Anran Zhang et al.
Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts
Jiayi Chen, Benteng Ma, Hengfei Cui et al.
Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model
Runmin Dong, Shuai Yuan, Bin Luo et al.
Contextrast: Contextual Contrastive Learning for Semantic Segmentation
Changki Sung, Wanhee Kim, Jungho An et al.
VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe, Hanan Gani, Wenqi Zhu et al.
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
Takahiro Shirakawa, Seiichi Uchida
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection
Jiaming Li, Jiacheng Zhang, Jichang Li et al.
PhysGen3D: Crafting a Miniature Interactive World from a Single Image
Boyuan Chen, Hanxiao Jiang, Shaowei Liu et al.
Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection
Wei Luo, Yunkang Cao, Haiming Yao et al.
FlowIE: Efficient Image Enhancement via Rectified Flow
Yixuan Zhu, Wenliang Zhao, Ao Li et al.
Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps
Octave Mariotti, Oisin Mac Aodha, Hakan Bilen
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri et al.
Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining
Jiahao Nie, Yun Xing, Gongjie Zhang et al.
Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer
Wenqiao Zhang, Zheqi Lv
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
xin zhang, Jiawei Du, Weiying Xie et al.
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
Luca Barsellotti, Roberto Amoroso, Marcella Cornia et al.
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu, Xiangtai Li, Chenyang Si et al.
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Minghan LI, Shuai Li, Xindong Zhang et al.
Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering
Vivek Gopalakrishnan, Neel Dey, Polina Golland
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang, Liu Liu, Tianheng Cheng et al.
Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili, Rohun Agrawal, Yisong Yue et al.
APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
Weizhao He, Yang Zhang, Wei Zhuo et al.
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang, Mengping Yang, Qin Zhou et al.
Diffusion-based Blind Text Image Super-Resolution
Yuzhe Zhang, jiawei zhang, Hao Li et al.
PREGO: Online Mistake Detection in PRocedural EGOcentric Videos
Alessandro Flaborea, Guido M. D&, #x27 et al.
Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation
Yunhe Gao
InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
Jihyun Lee, Shunsuke Saito, Giljoo Nam et al.
Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It
Adam Lilja, Junsheng Fu, Erik Stenborg et al.
AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark
Li Lin, Santosh Santosh, Mingyang Wu et al.
PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv, Hong Chen, Jinyang Guo et al.
Can Biases in ImageNet Models Explain Generalization?
Paul Gavrikov, Janis Keuper
Learning to Transform Dynamically for Better Adversarial Transferability
Rongyi Zhu, Zeliang Zhang, Susan Liang et al.
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
Ziyao Huang, Fan Tang, Yong Zhang et al.
Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang, Hongbin Liu, Jinyuan Jia et al.
Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds
Yujia Liu, Anton Obukhov, Jan D. Wegner et al.
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes
Chensheng Peng, Chengwei Zhang, Yixiao Wang et al.
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang, Difan Liu, Yan Kang et al.
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
Chanyoung Kim, Woojung Han, Dayun Ju et al.
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu, Wentong Li, Song Wang et al.
A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling
Wentao Qu, Yuantian Shao, Lingwu Meng et al.
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
Bingfeng Zhang, Siyue Yu, Yunchao Wei et al.
SHAP-EDITOR: Instruction-Guided Latent 3D Editing in Seconds
Minghao Chen, Junyu Xie, Iro Laina et al.
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu, Fangyun Wei, Yanye Lu
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei, Mauricio Delbracio, Hossein Talebi et al.
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang, Fan Ma, Linchao Zhu et al.
LEOD: Label-Efficient Object Detection for Event Cameras
Ziyi Wu, Mathias Gehrig, Qing Lyu et al.
FreeDrag: Feature Dragging for Reliable Point-based Image Editing
Pengyang Ling, Lin Chen, Pan Zhang et al.
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Yang Chen, Yingwei Pan, haibo yang et al.
3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes
Jan Held, Renaud Vandeghen, Abdullah J Hamdi et al.
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
Ziheng Ouyang, Zhen Li, Qibin Hou
Universal Segmentation at Arbitrary Granularity with Language Instruction
Yong Liu, Cairong Zhang, Yitong Wang et al.
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
Sheng Zhou, Junbin Xiao, Qingyun Li et al.
3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
Chenfeng Xu, Huan Ling, Sanja Fidler et al.
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
Wenhao Li, Mengyuan Liu, Hong Liu et al.
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
Chengxu Liu, Xuan Wang, Xiangyu Xu et al.
EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition
Xu Zheng, Addison, Lin Wang
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
Stefan Andreas Baumann, Felix Krause, Michael Neumayr et al.
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal, Yonatan Bitton, Idan Szpektor et al.
Multi-Space Alignments Towards Universal LiDAR Segmentation
Youquan Liu, Lingdong Kong, Xiaoyang Wu et al.