Most Cited CVPR "detailed depth estimation" Papers
5,589 papers found • Page 5 of 28
Conference
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang, Yuchang Su, Yiming Liu et al.
Efficient Visual State Space Model for Image Deblurring
Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.
FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance Head-pose and Facial Expression Features
Andre Rochow, Max Schwarz, Sven Behnke
6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation
Li Xu, Haoxuan Qu, Yujun Cai et al.
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
Yuan Dong, Chuan Fang, Liefeng Bo et al.
UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen, Yusuke Hirota, Mayu Otani et al.
Learning Equi-angular Representations for Online Continual Learning
Minhyuk Seo, Hyunseo Koh, Wonje Jeung et al.
SANeRF-HQ: Segment Anything for NeRF in High Quality
Yichen Liu, Benran Hu, Chi-Keung Tang et al.
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou, Yicong Liu, Yiman Hu et al.
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.
Language-Guided Image Tokenization for Generation
Kaiwen Zha, Lijun Yu, Alireza Fathi et al.
Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
Siteng Huang, Biao Gong, Yutong Feng et al.
GEARS: Local Geometry-aware Hand-object Interaction Synthesis
Keyang Zhou, Bharat Lal Bhatnagar, Jan Lenssen et al.
Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning
Bozhou Zhang, Nan Song, Xin Jin et al.
LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment
yiming ren, xiao han, Chengfeng Zhao et al.
Domain Prompt Learning with Quaternion Networks
Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.
Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers
Zhibo Yang, Sounak Mondal, Seoyoung Ahn et al.
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
feilong tang, Chengzhi Liu, Zhongxing Xu et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu, Meng Lou, Yizhou Yu
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu, Sheng Jin, Wenwei Zhang et al.
Spatio-Temporal Turbulence Mitigation: A Translational Perspective
Xingguang Zhang, Nicholas M Chimitt, Yiheng Chi et al.
StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation
Sidi Wu, Yizi Chen, Loic Landrieu et al.
Targeted Representation Alignment for Open-World Semi-Supervised Learning
Ruixuan Xiao, Lei Feng, Kai Tang et al.
Large Language Models are Good Prompt Learners for Low-Shot Image Classification
Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu et al.
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
Yudi Shi, Shangzhe Di, Qirui Chen et al.
Material Anything: Generating Materials for Any 3D Object via Diffusion
Xin Huang, Tengfei Wang, Ziwei Liu et al.
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Lang Lin, Xueyang Yu, Ziqi Pang et al.
Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension
Quan Liu, Hongzi Zhu, Zhenxi Wang et al.
COCONut: Modernizing COCO Segmentation
Xueqing Deng, Qihang Yu, Peng Wang et al.
Category-Level Multi-Part Multi-Joint 3D Shape Assembly
Yichen Li, Kaichun Mo, Yueqi Duan et al.
Mind Marginal Non-Crack Regions: Clustering-Inspired Representation Learning for Crack Segmentation
zhuangzhuang chen, Zhuonan Lai, Jie Chen et al.
GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection
Xiaotian Li, Baojie Fan, Jiandong Tian et al.
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
Razvan Pasca, Alexey Gavryushin, Muhammad Hamza et al.
BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
Xuanpu Zhang, Dan Song, pengxin zhan et al.
Semantic-aware SAM for Point-Prompted Instance Segmentation
Zhaoyang Wei, Pengfei Chen, Xuehui Yu et al.
Rethinking Multi-view Representation Learning via Distilled Disentangling
Guanzhou Ke, Bo Wang, Xiao-Li Wang et al.
Mind the Time: Temporally-Controlled Multi-Event Video Generation
Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.
Self-Supervised Multi-Object Tracking with Path Consistency
Zijia Lu, Bing Shuai, Yanbei Chen et al.
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
Zining Chen, Weiqiu Wang, Zhicheng Zhao et al.
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting
Yankai Jiang, Zhongzhen Huang, Rongzhao Zhang et al.
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar, Muzammal Naseer, Rao Anwer et al.
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Yannan He, Garvita Tiwari, Tolga Birdal et al.
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Zaijing Li, Yuquan Xie, Rui Shao et al.
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.
GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion
Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
wenlong deng, Christos Thrampoulidis, Xiaoxiao Li
PromptHMR: Promptable Human Mesh Recovery
Yufu Wang, Yu Sun, Priyanka Patel et al.
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Young Kyun Jang, Donghyun Kim, Zihang Meng et al.
UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement
yaofeng xie, Lingwei Kong, Kai Chen et al.
Neural Spline Fields for Burst Image Fusion and Layer Separation
Ilya Chugunov, David Shustin, Ruyu Yan et al.
PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation
Ruining Deng, Quan Liu, Can Cui et al.
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni, Bernhard Egger, Suhas Lohit et al.
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
Liao Wang, Kaixin Yao, Chengcheng Guo et al.
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
YONGWEI CHEN, Yushi Lan, Shangchen Zhou et al.
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
Linqi Zhou, Andy Shih, Chenlin Meng et al.
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering
Yifan Gao, Zihang Lin, Chuanbin Liu et al.
Real-time 3D-aware Portrait Video Relighting
Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen et al.
Clustering Propagation for Universal Medical Image Segmentation
Yuhang Ding, Liulei Li, Wenguan Wang et al.
Guided Slot Attention for Unsupervised Video Object Segmentation
Minhyeok Lee, Suhwan Cho, Dogyoon Lee et al.
Matrix3D: Large Photogrammetry Model All-in-One
Yuanxun Lu, Jingyang Zhang, Tian Fang et al.
Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation
Yuanbo Yang, Jiahao Shao, Xinyang Li et al.
MC^2: Multi-concept Guidance for Customized Multi-concept Generation
Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.
Language-Driven Anchors for Zero-Shot Adversarial Robustness
Xiao Li, Wei Zhang, Yining Liu et al.
3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
Songchun Zhang, Yibo Zhang, Quan Zheng et al.
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
Desai Xie, Jiahao Li, Hao Tan et al.
MonoHair: High-Fidelity Hair Modeling from a Monocular Video
Keyu Wu, LINGCHEN YANG, Zhiyi Kuang et al.
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An, Guolei Sun, Yun Liu et al.
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
Peng Dai, Yang Zhang, Tao Liu et al.
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang, Yang Sui, Jinqi Xiao et al.
Dual Prior Unfolding for Snapshot Compressive Imaging
Jiancheng Zhang, Haijin Zeng, Jiezhang Cao et al.
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal, Aditya Avinash, Neil Alldrin et al.
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
Jingtao Li, Yingyi Liu, XINYU WANG et al.
Distilling Vision-Language Models on Millions of Videos
Yue Zhao, Long Zhao, Xingyi Zhou et al.
Generative Image Layer Decomposition with Visual Effects
Jinrui Yang, Qing Liu, Yijun Li et al.
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks
Junying Wang, Hongyuan Zhang, Yuan Yuan
Steerers: A Framework for Rotation Equivariant Keypoint Descriptors
Georg Bökman, Johan Edstedt, Michael Felsberg et al.
One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning
Pei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen et al.
NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation
Jiahao Chen, Yipeng Qin, Lingjie Liu et al.
Video Depth without Video Models
Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Hao Wu, Huabin Liu, Yu Qiao et al.
360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Huajian Huang, Changkun Liu, Yipeng Zhu et al.
FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
ASAM: Boosting Segment Anything Model with Adversarial Tuning
Bo Li, Haoke Xiao, Lv Tang
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han, Fangrui Zhu, Qianru Lao et al.
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu, Zhixing Zhang, Yanyu Li et al.
AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
Tao Tang, Guangrun Wang, Yixing Lao et al.
You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting
Jinbo Yan, Rui Peng, Zhiyan Wang et al.
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark
Jakub Paplham, Vojtech Franc
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong, Qi Jiang, Jingyi Yu et al.
Generative Video Propagation
Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.
OccMamba: Semantic Occupancy Prediction with State Space Models
Heng Li, Yuenan Hou, Xiaohan Xing et al.
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation
Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.
Long-Tailed Anomaly Detection with Learnable Class Names
Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos
Structure-Guided Adversarial Training of Diffusion Models
Ling Yang, Haotian Qian, Zhilong Zhang et al.
D^3: Scaling Up Deepfake Detection by Learning from Discrepancy
Yongqi Yang, Zhihao Qian, Ye Zhu et al.
Learning to Predict Activity Progress by Self-Supervised Video Alignment
Gerard Donahue, Ehsan Elhamifar
MLP Can Be A Good Transformer Learner
Sihao Lin, Pumeng Lyu, Dongrui Liu et al.
Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee, Soyeong Kwon, Taehwan Kim
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
Ho-Joong Kim, Jung-Ho Hong, Heejo Kong et al.
Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket
Chengxu Zuo, Yiming Wang, Lishuang Zhan et al.
LSNet: See Large, Focus Small
Ao Wang, Hui Chen, Zijia Lin et al.
Improving Plasticity in Online Continual Learning via Collaborative Learning
Maorong Wang, Nicolas Michel, Ling Xiao et al.
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Jianqing Zhang, Yang Liu, Yang Hua et al.
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu, Yiming Zhao, Zhicong Tang et al.
TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video
Minye Wu, Zehao Wang, Georgios Kouros et al.
2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification
Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.
Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties
wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.
PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees
Chulin Xie, De-An Huang, Wenda Chu et al.
Multi-Level Neural Scene Graphs for Dynamic Urban Environments
Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulò et al.
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Xudong LU, Yinghao Chen, chencheng Chen et al.
Taming Teacher Forcing for Masked Autoregressive Video Generation
Deyu Zhou, Quan Sun, Yuang Peng et al.
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei, Tao Chen, Xiruo Jiang et al.
Any-Resolution AI-Generated Image Detection by Spectral Learning
Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
Zijing Hu, Fengda Zhang, Long Chen et al.
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man, De-An Huang, Guilin Liu et al.
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Xinyao Li, Yuke Li, Zhekai Du et al.
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Huancheng Chen, Haris Vikalo
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.
ParamISP: Learned Forward and Inverse ISPs using Camera Parameters
Woohyeok Kim, Geonu Kim, Junyong Lee et al.
Segmenting Maxillofacial Structures in CBCT Volumes
Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.
Robust Image Denoising through Adversarial Frequency Mixup
Donghun Ryou, Inju Ha, Hyewon Yoo et al.
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
Yexin Liu, Zhengyang Liang, Yueze Wang et al.
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen, Xingyu Chen, Anpei Chen et al.
Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
Chao Yi, Lu Ren, De-Chuan Zhan et al.
Unsupervised Keypoints from Pretrained Diffusion Models
Eric Hedlin, Gopal Sharma, Shweta Mahajan et al.
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang, Jiluan Fan, Anli Yan et al.
QMambaBSR: Burst Image Super-Resolution with Query State Space Model
Xin Di, Long Peng, Peizhe Xia et al.
GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?
Mohammad Reza Taesiri, Tianjun Feng, Cor-Paul Bezemer et al.
LiDAR-based Person Re-identification
Wenxuan Guo, Zhiyu Pan, Yingping Liang et al.
CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification
Yiyu Chen, Zheyi Fan, Zhaoru Chen et al.
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
Pancheng Zhao, Peng Xu, Pengda Qin et al.
eTraM: Event-based Traffic Monitoring Dataset
Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela et al.
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao et al.
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
Wei Dong, Xing Zhang, Bihui Chen et al.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
ziang yan, Zhilin Li, Yinan He et al.
Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement
Han Wu, Guanyan Ou, Weibin Wu et al.
RecDiffusion: Rectangling for Image Stitching with Diffusion Models
Tianhao Zhou, Li Haipeng, Ziyi Wang et al.
Dexterous Grasp Transformer
Guo-Hao Xu, Yi-Lin Wei, Dian Zheng et al.
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Fiona Ryan, Ajay Bati, Sangmin Lee et al.
The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement
Gabriele Trivigno, Carlo Masone, Barbara Caputo et al.
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang et al.
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
Hyojun Go, byeongjun park, Jiho Jang et al.
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu, Xucheng Wang, Xiangyang Yang et al.
Task-driven Image Fusion with Learnable Fusion Loss
Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.
HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields
Haozhe Qi, Chen Zhao, Mathieu Salzmann et al.
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kang Chen, Xiangqian Wu
Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection
Jiacheng Zhang, Jiaming Li, Xiangru Lin et al.
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.
NViST: In the Wild New View Synthesis from a Single Image with Transformers
Wonbong Jang, Lourdes Agapito
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.
Scalable 3D Registration via Truncated Entry-wise Absolute Residuals
Tianyu Huang, Liangzu Peng, Rene Vidal et al.
Improved Video VAE for Latent Video Diffusion Model
Pingyu Wu, Kai Zhu, Yu Liu et al.
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Wei Cheng, Juncheng Mu, Xianfang Zeng et al.
Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios
Shiyan Chen, Jiyuan Zhang, Zhaofei Yu et al.
EmoEdit: Evoking Emotions through Image Manipulation
Jingyuan Yang, Jiawei Feng, Weibin Luo et al.
Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer
Hyeongjin Nam, Daniel Jung, Gyeongsik Moon et al.
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
Zhenhua Xu, Yan Bai, Yujia Zhang et al.
LAN: Learning to Adapt Noise for Image Denoising
Changjin Kim, Tae Hyun Kim, Sungyong Baik
As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
Seungwoo Yoo, Kunho Kim, Vladimir G. Kim et al.
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
Zhendong Wang, Jianmin Bao, Shuyang Gu et al.
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.
ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios et al.
TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation
Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.
Discriminability-Driven Channel Selection for Out-of-Distribution Detection
Yue Yuan, Rundong He, Yicong Dong et al.
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval
Fang Kaipeng, Jingkuan Song, Lianli Gao et al.
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
Zikai Xiao, Guo-Ye Yang, Xue Yang et al.
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Xiao Wang, Fuling Wang, Yuehang Li et al.
Textured Gaussians for Enhanced 3D Scene Appearance Modeling
Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.
UniK3D: Universal Camera Monocular 3D Estimation
Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.
LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
Tuo Feng, Wenguan Wang, Fan Ma et al.
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan, Wang Lin, Zhongqi Yue et al.
HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
Yongliang Lin, Yongzhi Su, Praveen Nathan et al.
PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
Yufei Zhang, Jeffrey Kephart, Zijun Cui et al.
TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting
Bojun Xiong, Jialun Liu, JiaKui Hu et al.
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
Junfeng Ni, Yu Liu, Ruijie Lu et al.
Rotation-Agnostic Image Representation Learning for Digital Pathology
Saghir Alfasly, Abubakr Shafique, Peyman Nejat et al.
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang, Yinpeng Dong, Siyuan Zhang et al.
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Hong Huang, Weiming Zhuang, Chen Chen et al.
FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation
Chris Rockwell, Nilesh Kulkarni, Linyi Jin et al.
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
Yiheng Li, RuiBing Hou, Hong Chang et al.
Adaptive Rectangular Convolution for Remote Sensing Pansharpening
Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.
Video Motion Transfer with Diffusion Transformers
Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham, Chuong Huynh, Ser-Nam Lim et al.
Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang, Jiaming Liu, Chenxuan Li et al.
Ref-GS: Directional Factorization for 2D Gaussian Splatting
Youjia Zhang, Anpei Chen, Yumin Wan et al.
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu, Xingxuan Zhang, Renzhe Xu et al.
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Jiangyong Huang, Baoxiong Jia, Yan Wang et al.
RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou, Ziqi Pang, Yu-Xiong Wang
Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior
Chen Guo, Junxuan Li, Yash Kant et al.