Most Cited CVPR "gaussian feature maps" Papers
5,589 papers found • Page 6 of 28
DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos
Arjun Balasingam, Joseph Chandler, Chenning Li et al.
Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis
Mingyang Zhao, Jiang Jingen, Lei Ma et al.
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham, Chuong Huynh, Ser-Nam Lim et al.
DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Harsh Rangwani, Pradipto Mondal, Mayank Mishra et al.
HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images
Xihe Yang, Xingyu Chen, Daiheng Gao et al.
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang, Dingkang Yang, Zhaoyu Chen et al.
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
Junyan Wang, Zhenhong Sun, Stewart Tan et al.
DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds
Youyu Chen, Junjun Jiang, Kui Jiang et al.
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
Xin Zhang, Robby T. Tan
Understanding Video Transformers via Universal Concept Discovery
Matthew Kowal, Achal Dave, Rares Andrei Ambrus et al.
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
Yongming Zhu, Longhao Zhang, Zhengkun Rong et al.
Differentiable Information Bottleneck for Deterministic Multi-view Clustering
Xiaoqiang Yan, Zhixiang Jin, Fengshou Han et al.
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
Junha Hyung, Kinam Kim, Susung Hong et al.
Spiking Transformer with Spatial-Temporal Attention
Donghyun Lee, Yuhang Li, Youngeun Kim et al.
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jue Zhang, Xiaoting Qin et al.
CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data
Wei Fang, Yuxing Tang, Heng Guo et al.
PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection
Jianan Ye, Weiguang Zhao, Xi Yang et al.
Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation
Fahimeh Hosseini Noohdani, Parsa Hosseini, Aryan Yazdan Parast et al.
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua, Qing Liu, Lingzhi Zhang et al.
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.
Revisiting Adversarial Training Under Long-Tailed Distributions
Xinli Yue, Ningping Mou, Qian Wang et al.
HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
Wencan Cheng, Hao Tang, Luc Van Gool et al.
DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data
Qihao Liu, Yi Zhang, Song Bai et al.
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei, Chenxi Liu, Siyuan Qiao et al.
Neural Visibility Field for Uncertainty-Driven Active Mapping
Shangjie Xue, Jesse Dill, Pranay Mathur et al.
Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning
Youqi Pan, Wugen Zhou, Yingdian Cao et al.
Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
Yiming Li, Zhiheng Li, Nuo Chen et al.
Condition-Aware Neural Network for Controlled Image Generation
Han Cai, Muyang Li, Qinsheng Zhang et al.
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Min Yang, gaohuan, Ping Guo et al.
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
Haoxuanye Ji, Pengpeng Liang, Erkang Cheng
SfmCAD: Unsupervised CAD Reconstruction by Learning Sketch-based Feature Modeling Operations
Pu Li, Jianwei Guo, Huibin Li et al.
SuperNormal: Neural Surface Reconstruction via Multi-View Normal Integration
Xu Cao, Takafumi Taketomi
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset
Yiqun Mei, Mingming He, Li Ma et al.
Detecting Out-of-Distribution Through the Lens of Neural Collapse
Litian Liu, Yao Qin
Data Valuation and Detections in Federated Learning
Wenqian Li, Shuran Fu, Fengrui Zhang et al.
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li, Yifei Xing, Xiangyuan Lan et al.
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
Yuan Wang, Ouxiang Li, Tingting Mu et al.
eTraM: Event-based Traffic Monitoring Dataset
Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela et al.
Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
Zhuoman Liu, Weicai Ye, Yan Luximon et al.
HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos
Jinglei Zhang, Jiankang Deng, Chao Ma et al.
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang, Yinpeng Dong, Siyuan Zhang et al.
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
Vidit Goel, Elia Peruzzo, Yifan Jiang et al.
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
Yunzhi Zhang, Zizhang Li, Matt Zhou et al.
ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object
Zhe Shan, Yang Liu, Lei Zhou et al.
MLLM-as-a-Judge for Image Safety without Human Labeling
Zhenting Wang, Shuming Hu, Shiyu Zhao et al.
DreamOmni: Unified Image Generation and Editing
Bin Xia, Yuechen Zhang, Jingyao Li et al.
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu et al.
Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields
Haoyuan Wang, Wenbo Hu, Lei Zhu et al.
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Shuai Tan, Biao Gong, Yutong Feng et al.
Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen, Dapeng Chen, Ruijin Liu et al.
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Jeongsoo Choi, Se Jin Park, Minsu Kim et al.
Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes
Gaurav Shrivastava, Abhinav Shrivastava
Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects
Amir Barda, Matheus Gadelha, Vladimir G. Kim et al.
CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs
Haocheng Yuan, Jing Xu, Hao Pan et al.
Quantization without Tears
Minghao Fu, Hao Yu, Jie Shao et al.
MagicArticulate: Make Your 3D Models Articulation-Ready
Chaoyue Song, Jianfeng Zhang, Xiu Li et al.
Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception
Lei Fan, Mingfu Liang, Yunxuan Li et al.
Real-Time Simulated Avatar from Head-Mounted Sensors
Zhengyi Luo, Jinkun Cao, Rawal Khirodkar et al.
Diversified and Personalized Multi-rater Medical Image Segmentation
Yicheng Wu, Xiangde Luo, Zhe Xu et al.
C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction
Yiqun Lin, Jiewen Yang, Hualiang Wang et al.
Frozen Feature Augmentation for Few-Shot Image Classification
Andreas Bär, Neil Houlsby, Mostafa Dehghani et al.
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Nicolas Dufour, Vicky Kalogeiton, David Picard et al.
MambaIC: State Space Models for High-Performance Learned Image Compression
Fanhu Zeng, Hao Tang, Yihua Shao et al.
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
Du Chen, Tianhe Wu, Kede Ma et al.
Efficient Meshflow and Optical Flow Estimation from Event Cameras
Xinglong Luo, Ao Luo, Zhengning Wang et al.
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
Andreas Engelhardt, Amit Raj, Mark Boss et al.
Iterated Learning Improves Compositionality in Large Vision-Language Models
Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi et al.
Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization
Khiem Le, Tuan Long Ho, Cuong Do et al.
Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition
Hongda Liu, Yunfan Liu, Min Ren et al.
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Yunlong Tang, JunJia Guo, Hang Hua et al.
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
Ronghuan Wu, Wanchao Su, Jing Liao
MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
Wenyuan Zhang, Yixiao Yang, Han Huang et al.
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo, Lijun Zhang, Mengyang Sun et al.
LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
Dongkai Wang, Shiyu Xuan, Shiliang Zhang
MaGGIe: Masked Guided Gradual Human Instance Matting
Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava et al.
Object Pose Estimation via the Aggregation of Diffusion Features
Tianfu Wang, Guosheng Hu, Hongguang Wang
DART: Implicit Doppler Tomography for Radar Novel View Synthesis
Tianshu Huang, John Miller, Akarsh Prabhakara et al.
Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses
Inhee Lee, Byungjun Kim, Hanbyul Joo
Degradation-Aware Feature Perturbation for All-in-One Image Restoration
Xiangpeng Tian, Xiangyu Liao, Xiao Liu et al.
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong, Jiadong Pan, Liang Li et al.
EnvGS: Modeling View-Dependent Appearance with Environment Gaussian
Tao Xie, Xi Chen, Zhen Xu et al.
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
Junyi Wu, Bin Duan, Weitai Kang et al.
Commonsense Prototype for Outdoor Unsupervised 3D Object Detection
Hai Wu, Shijia Zhao, Xun Huang et al.
Programmable Motion Generation for Open-Set Motion Control Tasks
Hanchao Liu, Xiaohang Zhan, Shaoli Huang et al.
Interactive3D: Create What You Want by Interactive 3D Generation
Shaocong Dong, Lihe Ding, Zhanpeng Huang et al.
KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
Yu Wang, Xin Li, Shengzhao Wen et al.
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
Yixuan Zhu, Ao Li, Yansong Tang et al.
Revisiting MAE Pre-training for 3D Medical Image Segmentation
Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko et al.
Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment
Alireza Ganjdanesh, Shangqian Gao, Heng Huang
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Chen Duan, Pei Fu, Shan Guo et al.
SLICE: Stabilized LIME for Consistent Explanations for Image Classification
Revoti Prasad Bora, Kiran Raja, Philipp Terhörst et al.
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
Wei Deng, Mengshi Qi, Huadong Ma
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
Wenbo Wang, Fangyun Wei, Lei Zhou et al.
Day-Night Cross-domain Vehicle Re-identification
Hongchao Li, Jingong Chen, Aihua Zheng et al.
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
Ronghao Dang, Yuqian Yuan, Wenqi Zhang et al.
Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models
Matthew Kowal, Richard P. Wildes, Kosta Derpanis
PrEditor3D: Fast and Precise 3D Shape Editing
Ziya Erkoc, Can Gümeli, Chaoyang Wang et al.
Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models
Gianni Franchi, Olivier Laurent, Maxence Leguéry et al.
HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models
Yifan Yang, Dong Liu, Shuhai Zhang et al.
Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification
Jiancheng Zhang, Haijin Zeng, Yongyong Chen et al.
ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing
Jun-Kun Chen, Samuel Rota Bulò, Norman Müller et al.
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning
Tian Liu, Huixin Zhang, Shubham Parashar et al.
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su, Peihan Miao, Huanzhang Dou et al.
FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors
Chin-Yang Lin, Chung-Ho Wu, Changhan Yeh et al.
Scaling Properties of Diffusion Models For Perceptual Tasks
Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran et al.
A Noisy Elephant in the Room: Is Your Out-of-Distribution Detector Robust to Label Noise?
Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments
Liyuan Zhu, Shengyu Huang, Konrad Schindler et al.
Adapters Strike Back
Jan-Martin Steitz, Stefan Roth
Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
Qing Yu, Mikihiro Tanaka, Kent Fujiwara
Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.
NeRF Director: Revisiting View Selection in Neural Volume Rendering
Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt-Aristizabal et al.
TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
Xuying Zhang, Bo-Wen Yin, Yuming Chen et al.
JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
Yunlong Lin, Zixu Lin, Haoyu Chen et al.
Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Baoquan Zhang, Huaibin Wang, Luo Chuyao et al.
Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms
Joren Brunekreef, Eric Marcus, Ray Sheombarsing et al.
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation
Ganlong Zhao, Guanbin Li, Weikai Chen et al.
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin, Haoran Chen, Yue Fan et al.
Gaussian Shadow Casting for Neural Characters
Luis Bolanos, Shih-Yang Su, Helge Rhodin
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
Xiangyuan Xue, Zeyu Lu, Di Huang et al.
Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations
Shengeng Tang, Jiayi He, Lechao Cheng et al.
FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs
Mothilal Asokan, Kebin Wu, Fatima Albreiki
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang, Xinyi Chen, Yilun Chen et al.
Binarized Low-light Raw Video Enhancement
Gengchen Zhang, Yulun Zhang, Xin Yuan et al.
One-Shot Structure-Aware Stylized Image Synthesis
Hansam Cho, Jonghyun Lee, Seunggyu Chang et al.
GenesisTex: Adapting Image Denoising Diffusion to Texture Space
Chenjian Gao, Boyan Jiang, Xinghui Li et al.
S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting
Yecong Wan, Mingwen Shao, Yuanshuo Cheng et al.
Cyclic Learning for Binaural Audio Generation and Localization
Zhaojian Li, Bin Zhao, Yuan Yuan
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao, Kunyu Shi, Pengkai Zhu et al.
Bidirectional Autoregressive Diffusion Model for Dance Generation
Canyu Zhang, Youbao Tang, Ning Zhang et al.
FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution
Junyang Chen, Jinshan Pan, Jiangxin Dong
Adversarial Score Distillation: When score distillation meets GAN
Min Wei, Jingkai Zhou, Junyao Sun et al.
Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective
Jinjing Zhao, Fangyun Wei, Chang Xu
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
Li Maomao, Yu Li, Tianyu Yang et al.
Any6D: Model-free 6D Pose Estimation of Novel Object
Taeyeop Lee, Bowen Wen, Minjun Kang et al.
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei, Zixuan Pan, Andrew Owens
Dynamic Camera Poses and Where to Find Them
Chris Rockwell, Joseph Tung, Tsung-Yi Lin et al.
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang et al.
DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans
Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros et al.
Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models
Kota Sueyoshi, Takashi Matsubara
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
Kangning Yin, Shihao Zou, Yuxuan Ge et al.
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu et al.
ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis
Xiangjun Gao, Xiaoyu Li, Chaopeng Zhang et al.
Multiple View Geometry Transformers for 3D Human Pose Estimation
Ziwei Liao, Jialiang Zhu, Chunyu Wang et al.
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim, Sung Jin Um, Sangmin Lee et al.
ILIAS: Instance-Level Image retrieval At Scale
Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko et al.
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis
Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng et al.
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
Wenqi Jia, Miao Liu, Hao Jiang et al.
Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
Xiaoyang Chen, Hao Zheng, Yuemeng Li et al.
What, How, and When Should Object Detectors Update in Continually Changing Test Domains?
Jayeon Yoo, Dongkwan Lee, Inseop Chung et al.
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Jun Zhou, Jiahao Li, Zunnan Xu et al.
Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI
Chong Wang, Lanqing Guo, Yufei Wang et al.
MangaNinja: Line Art Colorization with Precise Reference Following
Zhiheng Liu, Ka Leong Cheng, Xi Chen et al.
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi, Boyi Li, Han Cai et al.
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen, Jianwei Yang, Haiping Wu et al.
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan, Huaibo Huang, Ran He
Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction
Jianping Jiang, Xinyu Zhou, Bingxuan Wang et al.
FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction
Yifan Wang, Peishan Yang, Zhen Xu et al.
OmniMotionGPT: Animal Motion Generation with Limited Data
Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan et al.
Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation
Xiyi Chen, Marko Mihajlovic, Shaofei Wang et al.
Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment
Aobo Li, Jinjian Wu, Yongxu Liu et al.
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
Xuesong Chen, Linjiang Huang, Tao Ma et al.
SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields
Quentin Herau, Nathan Piasco, Moussab Bennehar et al.
IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera
Jian Huang, Chengrui Dong, Xuanhua Chen et al.
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
Chen Cheng, Jiacheng Wei, Tianrun Chen et al.
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
Lei Fan, Dongdong Fan, Zhiguang Hu et al.
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Chiara Plizzari, Alessio Tonioni, Yongqin Xian et al.
Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
Weilong Yan, Ming Li, Li Haipeng et al.
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
Tz-Ying Wu, Chih-Hui Ho, Nuno Vasconcelos
R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
Lijun Sheng, Jian Liang, Zilei Wang et al.
SURE: SUrvey REcipes for building reliable and robust deep networks
Yuting Li, Yingyi Chen, Xuanlong Yu et al.
CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI
Siyuan Cheng, Lingjuan Lyu, Zhenting Wang et al.
Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples
Junhao Dong, Piotr Koniusz, Junxi Chen et al.
What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs
Alex Trevithick, Matthew Chan, Towaki Takikawa et al.
Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing
Bi'an Du, Xiang Gao, Wei Hu et al.
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
Khiem Vuong, Anurag Ghosh, Deva Ramanan et al.
Pippo: High-Resolution Multi-View Humans from a Single Image
Yash Kant, Ethan Weber, Jin Kyu Kim et al.
NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs
Michael Fischer, Zhengqin Li, Thu Nguyen-Phuoc et al.
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Xin Xie, Dong Gong
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
Xiaopei Wu, Yuenan Hou, Xiaoshui Huang et al.
Personalized Preference Fine-tuning of Diffusion Models
Meihua Dang, Anikait Singh, Linqi Zhou et al.
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
Ziyang Luo, Haoning Wu, Dongxu Li et al.
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu, Mingyu Liu, Zeyu Zhu et al.
ReCoRe: Regularized Contrastive Representation Learning of World Model
Rudra P. K. Poudel, Harit Pandya et al.
Reversible Decoupling Network for Single Image Reflection Removal
Hao Zhao, Mingjia Li, Qiming Hu et al.
In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging
Xin Wang, Lizhi Wang, Xiangtian Ma et al.
Unifying Automatic and Interactive Matting with Pretrained ViTs
Zixuan Ye, Wenze Liu, He Guo et al.
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
Junwen Xiong, Peng Zhang, Tao You et al.
GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
Rao Fu, Dingxi Zhang, Alex Jiang et al.
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation
Yiren Song, Pei Yang, Hai Ci et al.
Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models
Xin Zhang, Yanzhao Zhang, Wen Xie et al.
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou et al.
Assessing and Learning Alignment of Unimodal Vision and Language Models
Le Zhang, Qian Yang, Aishwarya Agrawal
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Yunhao Ge, Yihe Tang, Jiashu Xu et al.
X-Dyna: Expressive Dynamic Human Image Animation
Di Chang, Hongyi Xu, You Xie et al.
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.