Most Cited CVPR "data-driven noise schedules" Papers
5,589 papers found • Page 12 of 28
Conference
3D Dental Model Segmentation with Geometrical Boundary Preserving
Shufan Xi, Zexian Liu, Junlin Chang et al.
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
YUEJIAO SU, Yi Wang, Qiongyang Hu et al.
GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting
Zixuan Chen, Guangcong Wang, Jiahao Zhu et al.
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
Chengyue Huang, Brisa Maneechotesuwan, Shivang Chopra et al.
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction
Sinisa Stekovic, Arslan Artykov, Stefan Ainetter et al.
Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration
Haipeng Fang, Sheng Tang, Juan Cao et al.
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman, Haiwen Feng, Michael J. Black et al.
ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images
Yanqing Shen, Turcan Tuna, Marco Hutter et al.
Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions
Boran Wen, Dingbang Huang, Zichen Zhang et al.
SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost
Haiyang Mei, Pengyu Zhang, Mike Zheng Shou
Locally Adaptive Neural 3D Morphable Models
Michail Tarasiou, Rolandos Alexandros Potamias, Eimear O' Sullivan et al.
PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model
Xiang Gao, Shuai Yang, Jiaying Liu
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal, Xiang Yue, Erion Plaku et al.
Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels
Qiming Xia, Wenkai Lin, Haoen Xiang et al.
Full-DoF Egomotion Estimation for Event Cameras Using Geometric Solvers
Ji Zhao, Banglei Guan, Zibin Liu et al.
LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion.
Muchen Li, Sammy Christen, Chengde Wan et al.
Learning to Highlight Audio by Watching Movies
Chao Huang, Ruohan Gao, J. M. F. Tsang et al.
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
Song Wang, Xiaolu Liu, Lingdong Kong et al.
VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow
Yancong Lin, Shiming Wang, Liangliang Nan et al.
GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking
Weikang Bian, Zhaoyang Huang, Xiaoyu Shi et al.
Are Images Indistinguishable to Humans Also Indistinguishable to Classifiers?
Zebin You, Xinyu Zhang, Hanzhong Guo et al.
SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering
Hanxiao Sun, Yupeng Gao, Jin Xie et al.
Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization
Sihao Liu, Yibo Yang, Xiaojie Li et al.
Radio Frequency Ray Tracing with Neural Object Representation for Enhanced RF Modeling
Xingyu Chen, Zihao Feng, Kun Qian et al.
Segment Any-Quality Images with Generative Latent Space Enhancement
Guangqian Guo, Yong Guo, Xuehui Yu et al.
MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation
Zilong Chen, Yikai Wang, Wenqiang Sun et al.
Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators
Bohan Xiao, PEIYONG WANG, Qisheng He et al.
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva, Fadime Sener, Edoardo Remelli et al.
iSegMan: Interactive Segment-and-Manipulate 3D Gaussians
Yian Zhao, Wanshi Xu, Ruochong Zheng et al.
Towards Generalizable Trajectory Prediction using Dual-Level Representation Learning and Adaptive Prompting
Kaouther Messaoud, Matthieu Cord, Alex Alahi
FlashSloth : Lightning Multimodal Large Language Models via Embedded Visual Compression
Bo Tong, Bokai Lai, Yiyi Zhou et al.
Unity in Diversity: Video Editing via Gradient-Latent Purification
Junyu Gao, Kunlin Yang, Xuan Yao et al.
Robust-MVTON: Learning Cross-Pose Feature Alignment and Fusion for Robust Multi-View Virtual Try-On
Nannan Zhang, Yijiang Li, Dong Du et al.
Context-Aware Multimodal Pretraining
Karsten Roth, Zeynep Akata, Dima Damen et al.
Geometry in Style: 3D Stylization via Surface Normal Deformation
Nam Anh Dinh, Itai Lang, Hyunwoo Kim et al.
Multiplane Prior Guided Few-Shot Aerial Scene Rendering
Zihan Gao, Licheng Jiao, Lingling Li et al.
Neural Hierarchical Decomposition for Single Image Plant Modeling
Zhihao Liu, Zhanglin Cheng, Naoto Yokoya
Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability
Jaehui Hwang, Junghyuk Lee, Jong-Seok Lee
OmniStereo: Real-time Omnidireactional Depth Estimation with Multiview Fisheye Cameras
Jiaxi Deng, Yushen Wang, Haitao Meng et al.
TexVocab: Texture Vocabulary-conditioned Human Avatars
Yuxiao Liu, Zhe Li, Yebin Liu et al.
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Shaoan Xie, Lingjing Kong, Yujia Zheng et al.
Real-Time Neural BRDF with Spherically Distributed Primitives
Yishun Dou, Zhong Zheng, Qiaoqiao Jin et al.
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
Leqi Shen, Guoqiang Gong, Tianxiang Hao et al.
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding
Sai Wang, Yutian Lin, Yu Wu
LidarGait++: Learning Local Features and Size Awareness from LiDAR Point Clouds for 3D Gait Recognition
Chuanfu Shen, Rui Wang, Lixin Duan et al.
Anatomically Constrained Implicit Face Models
Prashanth Chandran, Gaspard Zoss
LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model
Xi Wang, Hongzhen Li, Heng Fang et al.
Anomize: Better Open Vocabulary Video Anomaly Detection
Fei Li, Wenxuan Liu, Jingjing Chen et al.
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations
Savya Khosla, Sethuraman T V, Alexander G. Schwing et al.
Reasoning to Attend: Try to Understand How <SEG> Token Works
Rui Qian, Xin Yin, Dejing Dou
BG-Triangle: Bézier Gaussian Triangle for 3D Vectorization and Rendering
Minye Wu, Haizhao Dai, Kaixin Yao et al.
Scalable Autoregressive Monocular Depth Estimation
Jinhong Wang, Jintai Chen, Jian liu et al.
Event Fields: Capturing Light Fields at High Speed, Resolution, and Dynamic Range
Ziyuan Qu, Zihao Zou, Vivek Boominathan et al.
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
Senmao Li, Lei Wang, Kai Wang et al.
LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging
Haoyang Ge, Qiao Feng, Hailong Jia et al.
Relation3D : Enhancing Relation Modeling for Point Cloud Instance Segmentation
Edward LOO, Jiacheng Deng
HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait Synthesis
Mengtian Li, Jinshu Chen, Wanquan Feng et al.
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
Shengqiong Wu, Hao Fei, Jingkang Yang et al.
Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans?
Renshuai Tao, Haoyu Wang, Yuzhe Guo et al.
ESCAPE: Equivariant Shape Completion via Anchor Point Encoding
Burak Bekci, Nassir Navab, Federico Tombari et al.
GCC: Generative Color Constancy via Diffusing a Color Checker
Chen-Wei Chang, Cheng-De Fan, Chia-Che Chang et al.
Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair
Jeonghoon Park, Chaeyeon Chung, Jaegul Choo
3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surfaces
Linyi Jin, Nilesh Kulkarni, David Fouhey
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images
Jiuchen Chen, Xinyu Yan, Qizhi Xu et al.
Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions
Namitha Padmanabhan, Matthew A Gwilliam, Pulkit Kumar et al.
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
Xiaoqi Li, Lingyun Xu, Mingxu Zhang et al.
Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis
Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.
WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression
Yu Mao, Jun Wang, Nan Guan et al.
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
Leonhard Sommer, Olaf Dünkel, Christian Theobalt et al.
Towards All-in-One Medical Image Re-Identification
Yuan Tian, Kaiyuan Ji, Rongzhao Zhang et al.
Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding
Alessandro Achille, Greg Ver Steeg, Tian Yu Liu et al.
dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis
Luyuan Xie, Tianyu Luan, Wenyuan Cai et al.
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks
Mohamed Afane, Gabrielle Ebbrecht, Ying Wang et al.
Steady Progress Beats Stagnation: Mutual Aid of Foundation and Conventional Models in Mixed Domain Semi-Supervised Medical Image Segmentation
Qinghe Ma, Jian Zhang, Zekun Li et al.
Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance
Sanchayan Santra, Vishal Chudasama, Pankaj Wasnik et al.
SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity
Ke Ma, Jiaqi Tang, Bin Guo et al.
Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features
Yuanbo Xiangli, Ruojin Cai, Hanyu Chen et al.
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
Lucas Morin, Valery Weber, Ahmed Nassar et al.
MoEdit: On Learning Quantity Perception for Multi-object Image Editing
Yanfeng Li, Ka-Hou Chan, Yue Sun et al.
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim et al.
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images
Yasamin Medghalchi, Moein Heidari, Clayton Allard et al.
PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection
Wei Li, Pin-Yu Chen, Sijia Liu et al.
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Joya Chen, Yiqi Lin, Ziyun Zeng et al.
Learning Affine Correspondences by Integrating Geometric Constraints
Pengju Sun, Banglei Guan, Zhenbao Yu et al.
SPIDeRS: Structured Polarization for Invisible Depth and Reflectance Sensing
Tomoki Ichikawa, Shohei Nobuhara, Ko Nishino
On Denoising Walking Videos for Gait Recognition
Dongyang Jin, Chao Fan, Jingzhe Ma et al.
GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation
Ziqin Huang, Gu Wang, Chenyangguang Zhang et al.
CASAGPT: Cuboid Arrangement and Scene Assembly for Interior Design
Weitao Feng, Hang Zhou, Jing Liao et al.
DaCapo: Score Distillation as Stacked Bridge for Fast and High-quality 3D Editing
Yufei Huang, Bangyan Liao, Yuqi Hu et al.
HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation
Hongye Cheng, Tianyu Wang, guangsi shi et al.
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen, Bingchen Zhao, Yilun Chen et al.
Building Optimal Neural Architectures using Interpretable Knowledge
Keith Mills, Fred Han, Mohammad Salameh et al.
OpenSDI: Spotting Diffusion-Generated Images in the Open World
Yabin Wang, Zhiwu Huang, Xiaopeng Hong
Fractal Calibration for Long-tailed Object Detection
Konstantinos Alexandridis, Ismail Elezi, Jiankang Deng et al.
FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting
Fangyu Wu, Yuhao Chen
Test-Time Visual In-Context Tuning
Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr et al.
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin, Zhu Xu, Yang Liu
USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting
Kang Chen, Jiyuan Zhang, Zecheng Hao et al.
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation
Xiang Li, Zixuan Huang, Anh Thai et al.
Multi-Label Prototype Visual Spatial Search for Weakly Supervised Semantic Segmentation
Songsong Duan, Xi Yang, Nannan Wang
GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning
Guangyan Chen, Te Cui, Meiling Wang et al.
BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting
Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang et al.
DEAL: Data-Efficient Adversarial Learning for High-Quality Infrared Imaging
Zhu Liu, Zijun Wang, Jinyuan Liu et al.
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao, Weijia Mao, Mike Zheng Shou
ZeroVO: Visual Odometry with Minimal Assumptions
Lei Lai, Zekai Yin, Eshed Ohn-Bar
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
Davide Caffagni, Sara Sarto, Marcella Cornia et al.
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
Shian Du, Menghan Xia, Chang Liu et al.
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
Juncheng Wang, Chao Xu, Cheng Yu et al.
VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors
Juil Koo, Paul Guerrero, Chun-Hao P. Huang et al.
Pose Adapted Shape Learning for Large-Pose Face Reenactment
Gee-Sern Hsu, Jie-Ying Zhang, Yu-Hsiang Huang et al.
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Chen Liu, Liying Yang, Peike Li et al.
AMR-Transformer: Enabling Efficient Long-range Interaction for Complex Neural Fluid Simulation
Zeyi Xu, Jinfan Liu, Kuangxu Chen et al.
Enhancing Dataset Distillation via Non-Critical Region Refinement
Minh-Tuan Tran, Trung Le, Xuan-May Le et al.
IDEA-Bench: How Far are Generative Models from Professional Designing?
Chen Liang, Lianghua Huang, Jingwu Fang et al.
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
Xiaoying Xing, Avinab Saha, Junfeng He et al.
Atom-Level Optical Chemical Structure Recognition with Limited Supervision
Martijn Oldenhof, Edward De Brouwer, Adam Arany et al.
Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing
Yanjun Li, Zhaoyang Li, Honghui Chen et al.
Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting
Jingyi Xu, Xieyuanli Chen, Junyi Ma et al.
QuCOOP: A Versatile Framework for Solving Composite and Binary-Parametrised Problems on Quantum Annealers
Natacha Kuete Meli, Vladislav Golyanik, Marcel Seelbach Benkner et al.
U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening
Sungpyo Kim, Jeonghyeok Do, Jaehyup Lee et al.
Dual-Agent Optimization framework for Cross-Domain Few-Shot Segmentation
Zhaoyang Li, Yuan Wang, Wangkai Li et al.
FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts
Tongyuan Bai, Wangyuanfan Bai, Dong Chen et al.
Noise-Resistant Video Anomaly Detection via RGB Error-Guided Multiscale Predictive Coding and Dynamic Memory
Han Hu, Wenli Du, Peng Liao et al.
GASP: Gaussian Avatars with Synthetic Priors
Jack Saunders, Charlie Hewitt, Yanan Jian et al.
Floxels: Fast Unsupervised Voxel Based Scene Flow Estimation
David T. Hoffmann, Syed Haseeb Raza, Hanqiu Jiang et al.
Learning to Remove Wrinkled Transparent Film with Polarized Prior
Jiaqi Tang, RUIZHENG WU, Xiaogang Xu et al.
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
Huiwon Jang, Sihyun Yu, Jinwoo Shin et al.
SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder
Dihan Zheng, Yihang Zou, Xiaowen Zhang et al.
TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser et al.
Order-One Rolling Shutter Cameras
Marvin Anas Hahn, Kathlén Kohn, Orlando Marigliano et al.
Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining
Guanglu Dong, Tianheng Zheng, Yuanzhouhan Cao et al.
Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras
Hoonhee Cho, Jae-Young Kang, Youngho Kim et al.
ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge
Radu Berdan, Beril Besbinar, Christoph Reinders et al.
MARBLE: Material Recomposition and Blending in CLIP-Space
Ta-Ying Cheng, Prafull Sharma, Mark Boss et al.
On the Out-Of-Distribution Generalization of Large Multimodal Models
Xingxuan Zhang, Jiansheng Li, Wenjing Chu et al.
Memories of Forgotten Concepts
Matan Rusanovsky, Shimon Malnick, Amir Jevnisek et al.
Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking
Hongkai Wei, YANG YANG, Shijie Sun et al.
HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver
Cong Wei, Haoxian Tan, Yujie Zhong et al.
Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations
Jeonghyeon Kim, Sangheum Hwang
POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning
Jiayi Guan, Li Shen, Ao Zhou et al.
Multi-modal Medical Diagnosis via Large-small Model Collaboration
Wanyi Chen, Zihua Zhao, Jiangchao Yao et al.
URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration
Rui Xu, Yuzhen Niu, Yuezhou Li et al.
Spectral State Space Model for Rotation-Invariant Visual Representation Learning
Sahar Dastani, Ali Bahri, Moslem Yazdanpanah et al.
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Kelvin C.K. Chan, Yang Zhao, Xuhui Jia et al.
BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions
Wonyong Seo, Jihyong Oh, Munchurl Kim
Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization
Jamie Wynn, Zawar Qureshi, Jakub Powierza et al.
SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization
Jianyu LAI, Sixiang Chen, yunlong lin et al.
Enhanced then Progressive Fusion with View Graph for Multi-View Clustering
Zhibin Dong, Meng Liu, Siwei Wang et al.
GENIUS: A Generative Framework for Universal Multimodal Search
Sungyeon Kim, Xinliang Zhu, Xiaofan Lin et al.
DistinctAD: Distinctive Audio Description Generation in Contexts
Bo Fang, Wenhao Wu, Qiangqiang Wu et al.
SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
Yaniv Benny, Lior Wolf
BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning
Hao Zhu, Yifei Zhang, Junhao Dong et al.
Ges3ViG : Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.
SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization
Junchen Yu, Siyuan Cao, Runmin Zhang et al.
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin, Ke Wu, Jie Li et al.
Understanding Multi-Task Activities from Single-Task Videos
Yuhan Shen, Ehsan Elhamifar
End-to-End Implicit Neural Representations for Classification
Alexander Gielisse, Jan van Gemert
SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens
Chi Su, Xiaoxuan Ma, Jiajun Su et al.
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Boseung Jeong, Jicheol Park, Sungyeon Kim et al.
Single Domain Generalization for Few-Shot Counting via Universal Representation Matching
Xianing Chen, Si Huo, Borui Jiang et al.
Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization
Xiran Wang, Jian Zhang, Lei Qi et al.
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting
Maochen Yang, Zekun Li, Jian Zhang et al.
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning
Sherry X. Chen, Misha Sra, Pradeep Sen
Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval
Fan Zhang, Xian-Sheng Hua, Chong Chen et al.
Gated Fields: Learning Scene Reconstruction from Gated Videos
Andrea Ramazzina, Stefanie Walz, Pragyan Dahal et al.
Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations
Haitong Liu, Kuofeng Gao, Yang Bai et al.
Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors
Zhengfei Kuang, Tianyuan Zhang, Kai Zhang et al.
SAM-REF: Introducing Image-Prompt Synergy during Interaction for Detail Enhancement in the Segment Anything Model
Chongkai Yu, Ting Liu, Li Anqi et al.
PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds
Barza Nisar, Steven L. Waslander
ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis
Yun Chang, Leonor Fermoselle, Duy Ta et al.
Enhancing Diversity for Data-free Quantization
Kai Zhao, zhihao zhuang, Miao Zhang et al.
Sufficient Invariant Learning for Distribution Shift
Taero Kim, Subeen Park, Sungjun Lim et al.
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Zihang Lai, Andrea Vedaldi
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
Boming Miao, Chunxiao Li, Xiaoxiao Wang et al.
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang, Wonmin Byeon, Jiarui Xu et al.
Reproducible Vision-Language Models Meet Concepts Out of Pre-Training
Ziliang Chen, Xin Huang, Xiaoxuan Fan et al.
CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation
Townim Chowdhury, Kewen Liao, Vu Minh Hieu Phan et al.
LOD-GS: Achieving Levels of Detail using Scalable Gaussian Soup
Jianxiong Shen, Yue Qian, Xiaohang Zhan
GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling
Yang Zheng, Menglei Chai, Delio Vicini et al.
Reasoning in Visual Navigation of End-to-end Trained Agents: A Dynamical Systems Approach
Steeven JANNY, Hervé Poirier, Leonid Antsfeld et al.
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
Yongshuo Zong, Qin ZHANG, DONGSHENG An et al.
Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation
Yiheng Li, Yang Yang, Zichang Tan et al.
GG-SSMs: Graph-Generating State Space Models
Nikola Zubic, Davide Scaramuzza
SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models
Jaerin Lee, Daniel Jung, Kanggeon Lee et al.
HalLoc: Token-level Localization of Hallucinations for Vision Language Models
Eunkyu Park, Minyeong Kim, Gunhee Kim
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Yizhe Tang, Zhimin Sun, Yuzhen Du et al.
PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter
Yaohua Zha, Yanzi Wang, Hang Guo et al.
FaceCom: Towards High-fidelity 3D Facial Shape Completion via Optimization and Inpainting Guidance
Yinglong Li, Hongyu Wu, Wang et al.
4D-Fly: Fast 4D Reconstruction from a Single Monocular Video
Diankun Wu, Fangfu Liu, Yi-Hsin Hung et al.
Scaling Down Text Encoders of Text-to-Image Diffusion Models
Lifu Wang, Daqing Liu, Xinchen Liu et al.
Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-space Diffusion
Emiel Hoogeboom, Thomas Mensink, Jonathan Heek et al.
FFR: Frequency Feature Rectification for Weakly Supervised Semantic Segmentation
Ziqian Yang, Xinqiao Zhao, Xiaolei Wang et al.
Action Detail Matters: Refining Video Recognition with Local Action Queries
Mengmeng Wang, Zeyi Huang, Xiangjie Kong et al.
From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification
Yan Jiang, Hao Yu, Xu Cheng et al.
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
Faridoun Mehri, Mahdieh Baghshah, Mohammad Taher Pilehvar
DFM: Differentiable Feature Matching for Anomaly Detection
Wu Sheng, Yimi Wang, Xudong Liu et al.
Reconstruction-free Cascaded Adaptive Compressive Sensing
Chenxi Qiu, Tao Yue, Xuemei Hu
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds
Zihui Zhang, Weisheng Dai, Hongtao Wen et al.
Previously on ... From Recaps to Story Summarization
Aditya Kumar Singh, Dhruv Srivastava, Makarand Tapaswi
Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories
Yan Zhang, Sergey Prokudin, Marko Mihajlovic et al.