Most Cited CVPR "underwater imaging models" Papers
5,589 papers found • Page 4 of 28
Conference
Wonderland: Navigating 3D Scenes from a Single Image
Hanwen Liang, Junli Cao, Vidit Goel et al.
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im, JeongYeon Nam, Nokyung Park et al.
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
Peng Lu, Tao Jiang, Yining Li et al.
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Rang Meng, Xingyu Zhang, Yuming Li et al.
Accelerating Diffusion Sampling with Optimized Time Steps
Shuchen Xue, Zhaoqiang Liu, Fei Chen et al.
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Orr Zohar, Xiaohan Wang, Yann Dubois et al.
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving
Zebin Xing, Xingyu Zhang, Yang Hu et al.
Multiple Object Tracking as ID Prediction
Ruopeng Gao, Ji Qi, Limin Wang
FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion
George Cazenavette, Avneesh Sud, Thomas Leung et al.
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang, Xiangyang Li, Jiahao Yang et al.
Relightful Harmonization: Lighting-aware Portrait Background Replacement
Mengwei Ren, Wei Xiong, Jae Shin Yoon et al.
Towards Text-guided 3D Scene Composition
Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin et al.
Open-Vocabulary Segmentation with Semantic-Assisted Calibration
Yong Liu, Sule Bai, Guanbin Li et al.
Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Ayush Sarkar, Hanlin Mai, Amitabh Mahapatra et al.
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation
Yamei Chen, Yan Di, Guangyao Zhai et al.
Visual In-Context Prompting
Feng Li, Qing Jiang, Hao Zhang et al.
Goku: Flow Based Video Generative Foundation Models
Shoufa Chen, Chongjian GE, Yuqi Zhang et al.
GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Jing Wen, Xiaoming Zhao, Jason Ren et al.
A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint
Xiaofeng Cong, Jie Gui, Jing Zhang et al.
Dense Optical Tracking: Connecting the Dots
Guillaume Le Moing, Jean Ponce, Cordelia Schmid
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Kiana Ehsani, Tanmay Gupta, Rose Hendrix et al.
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu, Chenlin Zhang, Chen Zhao et al.
Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation
Zhekai Du, Xinyao Li, Fengling Li et al.
Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Yuanxun Lu, Jingyang Zhang, Shiwei Li et al.
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
Yifang Men, Yuan Yao, Miaomiao Cui et al.
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Changhoon Kim, Kyle Min, Maitreya Patel et al.
MultiDiff: Consistent Novel View Synthesis from a Single Image
Norman Müller, Katja Schwarz, Barbara Roessle et al.
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao, Jiang Wang, Ze Wang et al.
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Zongsheng Yue, Kang Liao, Chen Change Loy
Few-Shot Object Detection with Foundation Models
Guangxing Han, Ser-Nam Lim
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models
Greg Heinrich, Mike Ranzinger, Danny Yin et al.
Text-Image Alignment for Diffusion-Based Perception
Neehar Kondapaneni, Markus Marks, Manuel Knott et al.
PointOBB: Learning Oriented Object Detection via Single Point Supervision
Junwei Luo, Xue Yang, Yi Yu et al.
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
Shivangi Aneja, Justus Thies, Angela Dai et al.
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Suraj Patni, Aradhye Agarwal, Chetan Arora
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Ganggui Ding, Canyu Zhao, Wen Wang et al.
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
Yikun Liu, Yajie Zhang, jiayin cai et al.
Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network
wenqiao Li, Xiaohao Xu, Yao Gu et al.
Consistent Prompting for Rehearsal-Free Continual Learning
Zhanxin Gao, Jun Cen, Xiaobin Chang
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet et al.
Enhancing Multimodal Cooperation via Sample-level Modality Valuation
Yake Wei, Ruoxuan Feng, Zihe Wang et al.
Describing Differences in Image Sets with Natural Language
Lisa Dunlap, Yuhui Zhang, Xiaohan Wang et al.
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Haofeng Liu, Chenshu Xu, Yifei Yang et al.
Dual Diffusion for Unified Image Generation and Understanding
Zijie Li, Henry Li, Yichun Shi et al.
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
Zike Wu, Pan Zhou, YI Xuanyu et al.
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Yining Hong, Zishuo Zheng, Peihao Chen et al.
SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments
Shibo Zhao, Yuanjun Gao, Tianhao Wu et al.
CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing
Ajian Liu, Shuai Xue, Gan Jianwen et al.
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
shiyu xuan, Qingpei Guo, Ming Yang et al.
AvatarGPT: All-in-One Framework for Motion Understanding Planning Generation and Beyond
Zixiang Zhou, Yu Wan, Baoyuan Wang
BigGait: Learning Gait Representation You Want by Large Vision Models
Dingqiang Ye, Chao Fan, Jingzhe Ma et al.
ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
Dar-Yen Chen, Hamish Tennent, Ching-Wen Hsu
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation
Aysim Toker, Marvin Eisenberger, Daniel Cremers et al.
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Junhao Zheng, Chenhao Lin, Jiahao Sun et al.
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Yutao Feng, Yintong Shang, Xuan Li et al.
Language-driven All-in-one Adverse Weather Removal
Hao Yang, Liyuan Pan, Yan Yang et al.
Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan, Yuankai Lin, Kun Wang et al.
Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects
Yijia Weng, Bowen Wen, Jonathan Tremblay et al.
PIGEON: Predicting Image Geolocations
Lukas Haas, Michal Skreta, Silas Alberti et al.
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Yabin Zhang, Wenjie Zhu, Hui Tang et al.
Jack of All Tasks Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick, Guangxing Han, Rui Hou et al.
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
Yu Zeng, Vishal M. Patel, Haochen Wang et al.
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen, Qihang Yu, Xiaohui Shen et al.
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou et al.
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jialong Zuo, Hanyu Zhou, Ying Nie et al.
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures
Zhangyang Xiong, Chenghong Li, Kenkun Liu et al.
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
Xiaozhong Ji, Xiaobin Hu, Zhihong Xu et al.
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
Katrin Renz, Long Chen, Elahe Arani et al.
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Shuai Yang, Yifan Zhou, Ziwei Liu et al.
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral, Enis Simsar, Federico Tombari et al.
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Jiayi Guo, Xingqian Xu, Yifan Pu et al.
On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?
Maxime Zanella, Ismail Ben Ayed
DAP: A Dynamic Adversarial Patch for Evading Person Detectors
Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Junyi Ma, Xieyuanli Chen, Jiawei Huang et al.
SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection
JUNSU KIM, Hoseong Cho, Jihyeon Kim et al.
Neural Markov Random Field for Stereo Matching
Tongfan Guan, Chen Wang, Yun-Hui Liu
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Yuhao Liu, Zhanghan Ke, Fang Liu et al.
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng, Zhicheng Guo, Jingwen Wu et al.
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
Mi Yan, Jiazhao Zhang, Yan Zhu et al.
Matching Anything by Segmenting Anything
Siyuan Li, Lei Ke, Martin Danelljan et al.
One-Shot Open Affordance Learning with Foundation Models
Gen Li, Deqing Sun, Laura Sevilla-Lara et al.
Towards Memorization-Free Diffusion Models
Chen Chen, Daochang Liu, Chang Xu
Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration
Mingyuan Meng, Dagan Feng, Lei Bi et al.
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.
Discovering and Mitigating Visual Biases through Keyword Explanation
Younghyun Kim, Sangwoo Mo, Minkyu Kim et al.
MatSynth: A Modern PBR Materials Dataset
Giuseppe Vecchio, Valentin Deschaintre
MatFuse: Controllable Material Generation with Diffusion Models
Giuseppe Vecchio, Renato Sortino, Simone Palazzo et al.
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
Fan-Yun Sun, Weiyu Liu, Siyi Gu et al.
TextureDreamer: Image-Guided Texture Synthesis Through Geometry-Aware Diffusion
Yu-Ying Yeh, Jia-Bin Huang, Changil Kim et al.
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
Haotong Lin, Sida Peng, Jingxiao Chen et al.
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
Yucheng Suo, Fan Ma, Linchao Zhu et al.
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders
Rui Chen, Jianfeng Zhang, Yixun Liang et al.
Communication-Efficient Collaborative Perception via Information Filling with Codebook
Yue Hu, Juntong Peng, Sifei Liu et al.
Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features
Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.
MS-DETR: Efficient DETR Training with Mixed Supervision
Chuyang Zhao, Yifan Sun, Wenhao Wang et al.
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Pingping Zhang, Yuhao Wang, Yang Liu et al.
Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach
Lingchen Sun, Rongyuan Wu, Zhiyuan Ma et al.
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain, Jianwei Yang, Humphrey Shi
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
Xin Zhou, Dingkang Liang, Wei Xu et al.
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
Hyelin Nam, Gihyun Kwon, Geon Yeong Park et al.
Generative Latent Coding for Ultra-Low Bitrate Image Compression
Zhaoyang Jia, Jiahao Li, Bin Li et al.
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
Baorui Ma, Huachen Gao, Haoge Deng et al.
OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
Han Liang, Jiacheng Bao, Ruichi Zhang et al.
LightIt: Illumination Modeling and Control for Diffusion Models
Peter Kocsis, Kalyan Sunkavalli, Julien Philip et al.
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen, Yunhao Gou, Runhui Huang et al.
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.
Generating Human Motion in 3D Scenes from Text Descriptions
Zhi Cen, Huaijin Pi, Sida Peng et al.
Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners
Keon Hee Park, Kyungwoo Song, Gyeong-Moon Park
Scaling Mesh Generation via Compressive Tokenization
Haohan Weng, Zibo Zhao, Biwen Lei et al.
Mosaic-SDF for 3D Generative Models
Lior Yariv, Omri Puny, Oran Gafni et al.
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di, Weidi Xie
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng, Yan Xie, Hao Zhang et al.
Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
Song Wang, Jiawei Yu, Wentong Li et al.
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
Phillip Howard, Avinash Madasu, Tiep Le et al.
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
Lewei Yao, Renjie Pi, Jianhua Han et al.
Breathing Life Into Sketches Using Text-to-Video Priors
Rinon Gal, Yael Vinker, Yuval Alaluf et al.
PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
Zhenyu Li, Shariq Bhat, Peter Wonka
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada, Kanta Kaneda, Daichi Saito et al.
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang, Jie Zhang, Zheng Yuan et al.
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Yurui Qian, Qi Cai, Yingwei Pan et al.
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
Linke Ouyang, Yuan Qu, Hongbin Zhou et al.
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
Mingjie Pan, Jiyao Zhang, Tianshu Wu et al.
One-Prompt to Segment All Medical Images
Wu, Min Xu
Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization
Siyan Dong, Shuzhe Wang, Shaohui Liu et al.
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He, Yiheng Deng, SHIXIANG TANG et al.
S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data
Xuyang Li, Danfeng Hong, Jocelyn Chanussot
HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
Yi ZHOU, Hui Zhang, Jiaqian Yu et al.
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
Jianjian Cao, Peng Ye, Shengze Li et al.
CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
Zhi Gao, Yuntao Du., Xintong Zhang et al.
Digital Life Project: Autonomous 3D Characters with Social Intelligence
Zhongang Cai, Jianping Jiang, Zhongfei Qing et al.
Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping
Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.
Abductive Ego-View Accident Video Understanding for Safe Driving Perception
Jianwu Fang, Lei-lei Li, Junfei Zhou et al.
DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting
Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.
GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang et al.
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan et al.
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
Hao Fei, Shengqiong Wu, Wei Ji et al.
SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction
Conghao Wong, Beihao Xia, Ziqian Zou et al.
Universal Actions for Enhanced Embodied Foundation Models
Jinliang Zheng, Jianxiong Li, Dongxiu Liu et al.
MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
Xiaolu Liu, Song Wang, Wentong Li et al.
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
Sanqing Qu, Tianpei Zou, Lianghua He et al.
LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry
Weirong Chen, Le Chen, Rui Wang et al.
Point Segment and Count: A Generalized Framework for Object Counting
Zhizhong Huang, Mingliang Dai, Yi Zhang et al.
Global and Local Prompts Cooperation via Optimal Transport for Federated Learning
Hongxia Li, Wei Huang, Jingya Wang et al.
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
Geonho Bang, Kwangjin Choi, Jisong Kim et al.
Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation
Haojie Zhang, Yongyi Su, Xun Xu et al.
M-LLM Based Video Frame Selection for Efficient Video Understanding
Kai Hu, Feng Gao, Xiaohan Nie et al.
Improving Image Restoration through Removing Degradations in Textual Representations
Jingbo Lin, Zhilu Zhang, Yuxiang Wei et al.
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
Zhipeng Du, Miaojing Shi, Jiankang Deng
Gradient-based Parameter Selection for Efficient Fine-Tuning
Zhi Zhang, Qizhe Zhang, Zijun Gao et al.
Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text
Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.
Towards Surveillance Video-and-Language Understanding: New Dataset Baselines and Challenges
Tongtong Yuan, Xuange Zhang, Kun Liu et al.
Learning Transferable Negative Prompts for Out-of-Distribution Detection
Tianqi Li, Guansong Pang, wenjun miao et al.
Towards Practical Real-Time Neural Video Compression
Zhaoyang Jia, Bin Li, Jiahao Li et al.
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
Lin Li, Haoyan Guan, Jianing Qiu et al.
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Inhwan Bae, Young-Jae Park, Hae-Gon Jeon
Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring
Xin Gao, Tianheng Qiu, Xinyu Zhang et al.
Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
Rui Song, Chenwei Liang, Hu Cao et al.
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto et al.
ReGenNet: Towards Human Action-Reaction Synthesis
Liang Xu, Yizhou Zhou, Yichao Yan et al.
Vision-and-Language Navigation via Causal Learning
Liuyi Wang, Zongtao He, Ronghao Dang et al.
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han, Shuai Zhang, Xingjian Shi et al.
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
Haonan Wang, Qixiang ZHANG, Yi Li et al.
CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali et al.
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li, Songyang Zhang, Dahua Lin et al.
Exploiting Diffusion Prior for Generalizable Dense Prediction
Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee et al.
Amodal Ground Truth and Completion in the Wild
Guanqi Zhan, Chuanxia Zheng, Weidi Xie et al.
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
Haoyang He, Jiangning Zhang, Yuxuan Cai et al.
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
Yi Yu, Xue Yang, Qingyun Li et al.
Multimodal Prompt Perceiver: Empower Adaptiveness Generalizability and Fidelity for All-in-One Image Restoration
Yuang Ai, Huaibo Huang, Xiaoqiang Zhou et al.
Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving
Ziying Song, Caiyan Jia, Lin Liu et al.
Posterior Distillation Sampling
Juil Koo, Chanho Park, Minhyuk Sung
KVQ: Kwai Video Quality Assessment for Short-form Videos
Yiting Lu, Xin Li, Yajing Pei et al.
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita, Naoto Inoue, Kotaro Kikuchi et al.
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
Zhenxin Li, Shiyi Lan, Jose M. Alvarez et al.
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions
Sirui Xu, Hung Yu Ling, Yu-Xiong Wang et al.
AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP
wenxin ma, Xu Zhang, Qingsong Yao et al.
A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
Qucheng Peng, Ce Zheng, Chen Chen
Sonata: Self-Supervised Learning of Reliable Point Representations
Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.
4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations
Wenbo Wang, Hsuan-I Ho, Chen Guo et al.
LLMs are Good Action Recognizers
Haoxuan Qu, Yujun Cai, Jun Liu
Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Hang Du, Sicheng Zhang, Binzhu Xie et al.
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana et al.
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou et al.
FreeKD: Knowledge Distillation via Semantic Frequency Prompt
Yuan Zhang, Tao Huang, Jiaming Liu et al.
MET3R: Measuring Multi-View Consistency in Generated Images
Mohammad Asim, Christopher Wewer, Thomas Wimmer et al.
Map-Relative Pose Regression for Visual Re-Localization
Shuai Chen, Tommaso Cavallari, Victor Adrian Prisacariu et al.
Mask Grounding for Referring Image Segmentation
Yong Xien Chng, Henry Zheng, Yizeng Han et al.
Scene Adaptive Sparse Transformer for Event-based Object Detection
Yansong Peng, Li Hebei, Yueyi Zhang et al.
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks
Dongshuo Yin, Leiyi Hu, Bin Li et al.
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng et al.
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
Hongyu Li, Jinyu Chen, Ziyu Wei et al.
WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng, Hyoungseob Park, Fengyu Yang et al.
Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity
Yuhang Chen, Wenke Huang, Mang Ye
SemCity: Semantic Scene Generation with Triplane Diffusion
Jumin Lee, Sebin Lee, Changho Jo et al.
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li et al.
On Exact Inversion of DPM-Solvers
Seongmin Hong, Kyeonghyun Lee, Suh Yoon Jeon et al.