Most Cited CVPR "semantic segmentation masks" Papers
5,589 papers found • Page 11 of 28
Conference
Diversified and Personalized Multi-rater Medical Image Segmentation
Yicheng Wu, Xiangde Luo, Zhe Xu et al.
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin, Haoran Chen, Yue Fan et al.
Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models
Gianni Franchi, Olivier Laurent, Maxence Leguéry et al.
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Tariq Berrada, Jakob Verbeek, camille couprie et al.
DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model
Zhenghao Pan, Haijin Zeng, Jiezhang Cao et al.
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
Bor Shiun Wang, Chien-Yi Wang, Wei-Chen Chiu
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Yunlong Tang, JunJia Guo, Hang Hua et al.
KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
Yu Wang, Xin Li, Shengzhao Wen et al.
InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields
Dongqing Wang, Tong Zhang, Alaa Abboud et al.
Generating Content for HDR Deghosting from Frequency View
Tao Hu, Qingsen Yan, Yuankai Qi et al.
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi, Boyi Li, Han Cai et al.
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Jeongsoo Choi, Se Jin Park, Minsu Kim et al.
Binarized Low-light Raw Video Enhancement
Gengchen Zhang, Yulun Zhang, Xin Yuan et al.
How to Merge Your Multimodal Models Over Time?
Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth et al.
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
Ronghao Dang, Yuqian Yuan, Wenqi Zhang et al.
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
Yinhuai Wang, Qihan Zhao, Runyi Yu et al.
Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
Qing Yu, Mikihiro Tanaka, Kent Fujiwara
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao, Kunyu Shi, Pengkai Zhu et al.
Adapters Strike Back
Jan-Martin Steitz, Stefan Roth
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su, Peihan Miao, Huanzhang Dou et al.
Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields
Haoyuan Wang, Wenbo Hu, Lei Zhu et al.
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Jun Zhou, Jiahao Li, Zunnan Xu et al.
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
Yixuan Zhu, Ao Li, Yansong Tang et al.
Plug-and-Play Diffusion Distillation
Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.
Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta, Srijan Das, Jacob Nielsen et al.
How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.
Multiple View Geometry Transformers for 3D Human Pose Estimation
Ziwei Liao, jialiang zhu, Chunyu Wang et al.
NeRF Director: Revisiting View Selection in Neural Volume Rendering
Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt-Aristizabal et al.
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang, Gil Avraham, Yan Zuo et al.
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo, Lijun Zhang, Mengyang Sun et al.
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
Taeheon Kim, Sebin Shin, Youngjoon Yu et al.
Degradation-Aware Feature Perturbation for All-in-One Image Restoration
Xiangpeng Tian, Xiangyu Liao, Xiao Liu et al.
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
Andreas Engelhardt, Amit Raj, Mark Boss et al.
ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis
Xiangjun Gao, Xiaoyu Li, Chaopeng Zhang et al.
Wired Perspectives: Multi-View Wire Art Embraces Generative AI
Zhiyu Qu, LAN YANG, Honggang Zhang et al.
Adversarial Score Distillation: When score distillation meets GAN
Min Wei, Jingkai Zhou, Junyao Sun et al.
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
Wenbo Wang, Fangyun Wei, Lei Zhou et al.
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong, Jiadong Pan, Liang Li et al.
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen, Jianwei Yang, Haiping Wu et al.
Programmable Motion Generation for Open-Set Motion Control Tasks
Hanchao Liu, Xiaohang Zhan, Shaoli Huang et al.
MaGGIe: Masked Guided Gradual Human Instance Matting
Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava et al.
TULIP: Transformer for Upsampling of LiDAR Point Clouds
Bin Yang, Patrick Pfreundschuh, Roland Siegwart et al.
Generative Unlearning for Any Identity
Juwon Seo, Sung-Hoon Lee, Tae-Young Lee et al.
Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
Mingqi Jiang, Saeed Khorram, Li Fuxin
Continual Segmentation with Disentangled Objectness Learning and Class Recognition
Yizheng Gong, Siyue Yu, Xiaoyang Wang et al.
A Noisy Elephant in the Room: Is Your Out-of-Distribution Detector Robust to Label Noise?
Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
Kangning Yin, Shihao Zou, Yuxuan Ge et al.
Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms
Joren Brunekreef, Eric Marcus, Ray Sheombarsing et al.
Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning
Tian Liu, Huixin Zhang, Shubham Parashar et al.
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang, Xinpeng Ding, Chunwei Wang et al.
Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.
VisionArena: 230k Real World User-VLM Conversations with Preference Labels
Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.
Scaling Properties of Diffusion Models For Perceptual Tasks
Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran et al.
MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation
Yuelong Li, Yafei Mao, Raja Bala et al.
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim, Sung Jin Um, Sangmin Lee et al.
OmniMotionGPT: Animal Motion Generation with Limited Data
Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan et al.
MVSAnywhere: Zero-Shot Multi-View Stereo
Sergio Izquierdo, Mohamed Sayed, Michael Firman et al.
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek et al.
Deformable One-shot Face Stylization via DINO Semantic Guidance
Yang Zhou, Zichong Chen, Hui Huang
Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration
Hong Chen, Pei Yan, sihe xiang et al.
R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
Lijun Sheng, Jian Liang, Zilei Wang et al.
Partial-to-Partial Shape Matching with Geometric Consistency
Viktoria Ehm, Maolin Gao, Paul Roetzer et al.
Multiway Point Cloud Mosaicking with Diffusion and Global Optimization
Shengze Jin, Iro Armeni, Marc Pollefeys et al.
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
Yuelin Zhang, Pengyu Zheng, Wanquan Yan et al.
AniDoc: Animation Creation Made Easier
Yihao Meng, Hao Ouyang, Hanlin Wang et al.
ManiFPT: Defining and Analyzing Fingerprints of Generative Models
Hae Jin Song, Mahyar Khayatkhoei, Wael AbdAlmageed
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei, Zixuan Pan, Andrew Owens
HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models
Yifan Yang, Dong Liu, Shuhai Zhang et al.
CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization
Junhao Xu, Yanan Zhang, Zhi Cai et al.
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
Xiangyuan Xue, Zeyu Lu, Di Huang et al.
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao, Daoyuan Chen, Yilun Huang et al.
MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints
Pengfei Xie, Wenqiang Xu, Tutian Tang et al.
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
Shiyao Li, Yingchun Hu, Xuefei Ning et al.
Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter
Zhengyi Zhong, Weidong Bao, Ji Wang et al.
Personalized Preference Fine-tuning of Diffusion Models
Meihua Dang, Anikait Singh, Linqi Zhou et al.
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.
NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
Vikas Thamizharasan, Difan Liu, Matthew Fisher et al.
Bidirectional Autoregessive Diffusion Model for Dance Generation
Canyu Zhang, Youbao Tang, NING Zhang et al.
FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs
Mothilal Asokan, Kebin wu, Fatima Albreiki
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation
Yiren Song, Pei Yang, Hai Ci et al.
One-Shot Structure-Aware Stylized Image Synthesis
Hansam Cho, Jonghyun Lee, Seunggyu Chang et al.
Assessing and Learning Alignment of Unimodal Vision and Language Models
Le Zhang, Qian Yang, Aishwarya Agrawal
X-Dyna: Expressive Dynamic Human Image Animation
Di Chang, Hongyi Xu, You Xie et al.
DRAWER: Digital Reconstruction and Articulation With Environment Realism
Hongchi Xia, Entong Su, Marius Memmel et al.
SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields
Quentin HERAU, Nathan Piasco, Moussab Bennehar et al.
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Jinxiu Liu, Shaoheng Lin, Yinxiao Li et al.
ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression
Wei Jiang, Junru Li, Kai Zhang et al.
LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes
Shanlin Sun, Bingbing Zhuang, Ziyu Jiang et al.
Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
Jiaming Zhang, Junhong Ye, Xingjun Ma et al.
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
Ziyang Luo, Haoning Wu, Dongxu Li et al.
Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
Xiaoyang Chen, Hao Zheng, Yuemeng LI et al.
S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting
Yecong Wan, Mingwen Shao, Yuanshuo Cheng et al.
On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving
Kaituo Feng, Changsheng Li, Dongchun Ren et al.
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
Xin Wang, Kai Chen, Jiaming Zhang et al.
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez et al.
Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability
Yingdong Shi, Changming Li, Yifan Wang et al.
Docopilot: Improving Multimodal Models for Document-Level Understanding
Yuchen Duan, Zhe Chen, Yusong Hu et al.
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Chiara Plizzari, Alessio Tonioni, Yongqin Xian et al.
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang, Zhimao Peng, Zhengyuan Xie et al.
Customization Assistant for Text-to-Image Generation
Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu et al.
GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu, Han Xue, Kunming Luo et al.
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis
Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng et al.
DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching
Shuzhe Wang, Juho Kannala, Daniel Barath
Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.
Leveraging Predicate and Triplet Learning for Scene Graph Generation
Jiankai Li, Yunhong Wang, Xiefan Guo et al.
Neural Clustering based Visual Representation Learning
Guikun Chen, Xia Li, Yi Yang et al.
TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
Xuying Zhang, Bo-Wen Yin, yuming chen et al.
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Yunhao Ge, Yihe Tang, Jiashu Xu et al.
DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans
Akash Sengupta, Thiemo Alldieck, NIKOS KOLOTOUROS et al.
BiPer: Binary Neural Networks using a Periodic Function
Edwin Vargas, Claudia Correa, Carlos Hinojosa et al.
En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
Yifang Men, Biwen Lei, Yuan Yao et al.
Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction
Jianping Jiang, xinyu zhou, Bingxuan Wang et al.
GenesisTex: Adapting Image Denoising Diffusion to Texture Space
Chenjian Gao, Boyan Jiang, Xinghui Li et al.
Learned Image Compression with Dictionary-based Entropy Model
Jingbo Lu, Leheng Zhang, Xingyu Zhou et al.
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets
Youngju Na, Woo Jae Kim, Kyu Han et al.
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
Chengxiang Fan, Muzhi Zhu, Hao Chen et al.
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
Zhihang Liu, Chen-Wei Xie, Pandeng Li et al.
ILIAS: Instance-Level Image retrieval At Scale
Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko et al.
SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models
Zilan Wang, Junfeng Guo, Jiacheng Zhu et al.
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
Xuesong Chen, Linjiang Huang, Tao Ma et al.
4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video
Qiang Hu, Zihan Zheng, Houqiang Zhong et al.
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan, Hongbo Liu, Mading Li et al.
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments
Liyuan Zhu, Shengyu Huang, Konrad Schindler et al.
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues
Youngjoon Jang, Haran Raajesh, Liliane Momeni et al.
Towards Universal Soccer Video Understanding
Jiayuan Rao, Haoning Wu, Hao Jiang et al.
MangaNinja: Line Art Colorization with Precise Reference Following
Zhiheng Liu, Ka Leong Cheng, Xi Chen et al.
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models
Haokun Chen, Hang Li, Yao Zhang et al.
Dynamic Camera Poses and Where to Find Them
Chris Rockwell, Joseph Tung, Tsung-Yi Lin et al.
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung, 書緯 呂, Yi-Hsuan Tsai et al.
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation
Ganlong Zhao, Guanbin Li, Weikai Chen et al.
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Myeongseob Ko, Feiyang Kang, Weiyan Shi et al.
Cyclic Learning for Binaural Audio Generation and Localization
Zhaojian Li, Bin Zhao, Yuan Yuan
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
Bin Wu, Wuxuan Shi, Jinqiao Wang et al.
FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction
Yifan Wang, Peishan Yang, Zhen Xu et al.
STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
Haiyi Qiu, Minghe Gao, Long Qian et al.
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
Li Maomao, Yu Li, Tianyu Yang et al.
RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds
Kang You, Tong Chen, Dandan Ding et al.
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
Simindokht Jahangard, Zhixi Cai, Shiki Wen et al.
Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification
Jiancheng Zhang, Haijin Zeng, Yongyong Chen et al.
NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs
Michael Fischer, Zhengqin Li, Thu Nguyen-Phuoc et al.
What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs
Alex Trevithick, Matthew Chan, Towaki Takikawa et al.
JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
Xiaoyong Lu, Songlin Du
IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera
Jian Huang, Chengrui Dong, Xuanhua Chen et al.
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
Chen Cheng, Jiacheng Wei, Tianrun Chen et al.
NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
Yufei Han, Heng Guo, Koki Fukai et al.
ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing
Jun-Kun Chen, Samuel Rota Bulò, Norman Müller et al.
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Sangmin Lee, Bolin Lai, Fiona Ryan et al.
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu et al.
Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective
Jinjing Zhao, Fangyun Wei, Chang Xu
Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Baoquan Zhang, Huaibin Wang, Luo Chuyao et al.
HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions
Hao Xu, Li Haipeng, Yinqiao Wang et al.
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
Guy Yariv, Yuval Kirstain, Amit Zohar et al.
Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Zihan Wang, Siyang Song, Cheng Luo et al.
Unifying Automatic and Interactive Matting with Pretrained ViTs
Zixuan Ye, Wenze Liu, He Guo et al.
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
Luke Rowe, Roger Girgis, Anthony Gosselin et al.
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects
Abhinav Kumar, Yuliang Guo, Xinyu Huang et al.
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
Tz-Ying Wu, Chih-Hui Ho, Nuno Vasconcelos
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen, Tianshu Zhang, Shiyu Huang et al.
MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
Zeren Jiang, Chen Guo, Manuel Kaufmann et al.
Unified Entropy Optimization for Open-Set Test-Time Adaptation
Zhengqing Gao, Xu-Yao Zhang, Cheng-Lin Liu
STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models
Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.
Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal
Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
Jingyun Wang, Guoliang Kang
Region-Based Representations Revisited
Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao et al.
SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective
Yu-Bang Zheng, Xile Zhao, Junhua Zeng et al.
Misalignment-Robust Frequency Distribution Loss for Image Transformation
Zhangkai Ni, Juncheng Wu, Zian Wang et al.
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models
Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz et al.
Functional Diffusion
Biao Zhang, Peter Wonka
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
Yawen Shao, Wei Zhai, Yuhang Yang et al.
Novel Class Discovery for Ultra-Fine-Grained Visual Categorization
Qi Jia, Yaqi Cai, Qi Jia et al.
Human Motion Instruction Tuning
Lei Li, Sen Jia, Jianhao Wang et al.
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
Khiem Vuong, Anurag Ghosh, Deva Ramanan et al.
EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild
Yumeng Liu, Xiaoxiao Long, Zemin Yang et al.
DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework
Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar Jr et al.
Pippo: High-Resolution Multi-View Humans from a Single Image
Yash Kant, Ethan Weber, Jin Kyu Kim et al.
Memory-based Adapters for Online 3D Scene Perception
Xiuwei Xu, Chong Xia, Ziwei Wang et al.
Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification
Tingting Zheng, Kui Jiang, Hongxun Yao
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu, Mingyu Liu, Zeyu Zhu et al.
Navigate Beyond Shortcuts: Debiased Learning Through the Lens of Neural Collapse
Yining Wang, Junjie Sun, Chenyue Wang et al.
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations
Kyungho Bae, Jinhyung Kim, Sihaeng Lee et al.
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang, Dacheng Yin, Yizhou Zhou et al.
Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz, Mohammad Sabokrou, Amir Rasouli
Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models
Xin Zhang, Yanzhao Zhang, Wen Xie et al.
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou et al.
SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing
Seokhyeon Hong, Chaelin Kim, Serin Yoon et al.
3D Neural Edge Reconstruction
Lei Li, Songyou Peng, Zehao Yu et al.
MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM
Vladimir Yugay, Theo Gevers, Martin R. Oswald
Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting
Runsong Zhu, Shi Qiu, ZHENGZHE LIU et al.
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
Junwen Xiong, Peng Zhang, Tao You et al.
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
Yifan Zhou, Zeqi Xiao, Shuai Yang et al.
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining
Mingjin Zhang, Xiaolong Li, Fei Gao et al.
MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading
Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got et al.
From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing
Jingxuan Wei, Cheng Tan, Qi Chen et al.
Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching
Rui Gong, Weide Liu, ZAIWANG GU et al.
HEAL-SWIN: A Vision Transformer On The Sphere
Oscar Carlsson, Jan E. Gerken, Hampus Linander et al.
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
Hanrong Ye, Dan Xu
Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples
Junhao Dong, Piotr Koniusz, Junxi Chen et al.
Explaining CLIP's Performance Disparities on Data from Blind/Low Vision Users
Daniela Massiceti, Camilla Longden, Agnieszka Słowik et al.