Most Cited CVPR "recurrent propagation module" Papers
CASP: Consistency-aware Audio-induced Saliency Prediction Model for Omnidirectional Video
Zhaolin Wan, Han Qin, Zhiyang Li et al.
A Universal Scale-Adaptive Deformable Transformer for Image Restoration across Diverse Artifacts
Xuyi He, Yuhui Quan, Ruotao Xu et al.
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
Gabriele Berton, Alex Stoken, Barbara Caputo et al.
Neural Inverse Rendering from Propagating Light
Anagh Malik, Benjamin Attal, Andrew Xie et al.
PairDETR: Joint Detection and Association of Human Bodies and Faces
Ammar Ali, Georgii Gaikov, Denis Rybalchenko et al.
Close Imitation of Expert Retouching for Black-and-White Photography
Seunghyun Shin, Jisu Shin, Jihwan Bae et al.
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation
Ali Athar, Xueqing Deng, Liang-Chieh Chen
OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos
Dongyoung Choi, Hyeonjoong Jang, Min H. Kim
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang, Jian Tao, Jiafei Lyu et al.
A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models
Keyu Tu, Mengqi Huang, Zhuowei Chen et al.
Reconstructing Hands in 3D with Transformers
Georgios Pavlakos, Dandan Shan, Ilija Radosavovic et al.
XFeat: Accelerated Features for Lightweight Image Matching
Guilherme Potje, Felipe Cadar, André Araujo et al.
Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification
Zhe Huang, Ruijie Jiang, Shuchin Aeron et al.
GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation
Weiming Zhang, Yexin Liu, Xu Zheng et al.
VRP-SAM: SAM with Visual Reference Prompt
Yanpeng Sun, Jiahui Chen, Shan Zhang et al.
Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion
Haoyu Wang, Le Wang, Sanping Zhou et al.
Distilling Long-tailed Datasets
Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang et al.
DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.
Disentangling Safe and Unsafe Image Corruptions via Anisotropy and Locality
Ramchandran Muthukumar, Ambar Pal, Jeremias Sulam et al.
vid-TLDR: Training Free Token Merging for Light-weight Video Transformer
Joonmyung Choi, Sanghyeok Lee, Jaewon Chu et al.
Looking Similar Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh, Chih-Wei Wu, Iroro Orife et al.
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants
Chong Yu, Tao Chen, Zhongxue Gan
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
Dongliang Luo, Hanshen Zhu, Ziyang Zhang et al.
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng, Lin Song, Yixiao Ge et al.
Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks
Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache
Doppelgängers and Adversarial Vulnerability
George Kamberov
Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs
Hugh Blayney, Hanlin Tian, Hamish Scott et al.
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang, Sayak Paul, Boyang Zheng et al.
Initialization Matters for Adversarial Transfer Learning
Andong Hua, Jindong Gu, Zhiyu Xue et al.
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao, Samuel Schulter, Long Zhao et al.
MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention
Yuhan Wang, Fangzhou Hong, Shuai Yang et al.
Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning
Hao Jiang, Bingfeng Zhou, Yadong Mu
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer et al.
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation
Zijia Lu, Ehsan Elhamifar
Heterogeneous Skeleton-Based Action Representation Learning
Xiaoyan Ma, Jidong Kuang, Hongsong Wang et al.
Matrix-Free Shared Intrinsics Bundle Adjustment
Daniel Safari
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang et al.
Seeing More with Less: Human-like Representations in Vision Models
Andrey Gizdov, Shimon Ullman, Daniel Harari
ShapeMatcher: Self-Supervised Joint Shape Canonicalization Segmentation Retrieval and Deformation
Yan Di, Chenyangguang Zhang, Chaowei Wang et al.
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Rui Li, Tobias Fischer, Mattia Segu et al.
SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction
Yuan Li, Zhihao Liu, Bedrich Benes et al.
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
Jianyi Wang, Zhijie Lin, Meng Wei et al.
Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding
Jiaxin Shi, Mingyue Xiang, Hao Sun et al.
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
Xiao Ma, Sumit Patidar, Iain Haughton et al.
Patch2Self2: Self-supervised Denoising on Coresets via Matrix Sketching
Shreyas Fadnavis, Agniva Chowdhury, Joshua Batson et al.
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Ganggui Ding, Canyu Zhao, Wen Wang et al.
SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers
Nikaan Nikzad, Yi Liao, Yongsheng Gao et al.
AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction
Lingteng Qiu, Shenhao Zhu, Qi Zuo et al.
Generative Unlearning for Any Identity
Juwon Seo, Sung-Hoon Lee, Tae-Young Lee et al.
Fuzzy Multimodal Learning for Trusted Cross-modal Retrieval
Siyuan Duan, Yuan Sun, Dezhong Peng et al.
Traceable Federated Continual Learning
Qiang Wang, Bingyan Liu, Yawen Li
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange
Yanhao Wu, Tong Zhang, Wei Ke et al.
NVILA: Efficient Frontier Visual Language Models
Zhijian Liu, Ligeng Zhu, Baifeng Shi et al.
Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Zihan Wang, Siyang Song, Cheng Luo et al.
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
Kai Wang, Zekai Li, Zhi-Qi Cheng et al.
Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation
Sixian Zhang, Xinyao Yu, Xinhang Song et al.
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis
Wonjoon Jin, Qi Dai, Chong Luo et al.
SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective
Yu-Bang Zheng, Xile Zhao, Junhua Zeng et al.
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
Ting Lei, Shaofeng Yin, Yang Liu
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. Mok, Zi Li, Yunhao Bai et al.
PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization
Yanlu Cai, Weizhong Zhang, Yuan Wu et al.
On the Estimation of Image-matching Uncertainty in Visual Place Recognition
Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij
Subspace Constraint and Contribution Estimation for Heterogeneous Federated Learning
Xiangtao Zhang, Sheng Li, Ao Li et al.
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Shiyu Xuan, Qingpei Guo, Ming Yang et al.
LoS: Local Structure-Guided Stereo Matching
Kunhong Li, Longguang Wang, Ye Zhang et al.
ACAttack: Adaptive Cross Attacking RGB-T Tracker via Multi-Modal Response Decoupling
Xinyu Xiang, Qinglong Yan, Hao Zhang et al.
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
Oded Bialer, Yuval Haitman
OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation
Jisoo Jeong, Hong Cai, Risheek Garrepalli et al.
High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model
Yiyang Shen, Kun Zhou, He Wang et al.
FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer
Dongyeong Hwang, Hyunju Kim, Sunwoo Kim et al.
Mip-Splatting: Alias-free 3D Gaussian Splatting
Zehao Yu, Anpei Chen, Binbin Huang et al.
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
Guangyang Wu, Xiaohong Liu, Jun Jia et al.
UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation
Yinqiao Wang, Hao Xu, Pheng-Ann Heng et al.
ProMark: Proactive Diffusion Watermarking for Causal Attribution
Vishal Asnani, John Collomosse, Tu Bui et al.
MMM: Generative Masked Motion Model
Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee et al.
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
Jiawen Zhu, Guansong Pang
DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization
Zeqin Yu, Jiangqun Ni, Yuzhen Lin et al.
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Syed Talal Wasim, Muzammal Naseer, Salman Khan et al.
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images
Jie Mei, Chenyu Lin, Yu Qiu et al.
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling
Liwen Wu, Sai Bi, Zexiang Xu et al.
No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather
Junsung Park, HwiJeong Lee, Inha Kang et al.
Learning Partonomic 3D Reconstruction from Image Collections
Xiaoqian Ruan, Pei Yu, Dian Jia et al.
LOGICZSL: Exploring Logic-induced Representation for Compositional Zero-shot Learning
Peng Wu, Xiankai Lu, Hao Hu et al.
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
Gang Zhang, Chen Junnan, Guohuan Gao et al.
3D Student Splatting and Scooping
Jialin Zhu, Jiangbei Yue, Feixiang He et al.
Sheared Backpropagation for Fine-tuning Foundation Models
Zhiyuan Yu, Li Shen, Liang Ding et al.
On the Content Bias in Fréchet Video Distance
Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar et al.
RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction
Baptiste Brument, Robin Bruneau, Yvain Queau et al.
LEDiff: Latent Exposure Diffusion for HDR Generation
Chao Wang, Zhihao Xia, Thomas Leimkuehler et al.
AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data
Zengqun Zhao, Ziquan Liu, Yu Cao et al.
Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta, Srijan Das, Jacob Nielsen et al.
VINECS: Video-based Neural Character Skinning
Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann et al.
PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting
Cheng Zhang, Haofei Xu, Qianyi Wu et al.
Plug and Play Active Learning for Object Detection
Chenhongyi Yang, Lichao Huang, Elliot Crowley
Plug-and-Play Diffusion Distillation
Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.
CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration
Fu-Zhao Ou, Chongyi Li, Shiqi Wang et al.
GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction
Jinguang Tong, Xuesong Li, Fahira Afzal Maken et al.
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada, Kanta Kaneda, Daichi Saito et al.
XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold
Guangyu Wang, Jinzhi Zhang, Fan Wang et al.
Differentiable Micro-Mesh Construction
Yishun Dou, Zhong Zheng, Qiaoqiao Jin et al.
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
Mengqi Zhang, Yang Fu, Zheng Ding et al.
CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement
Qiang Zhu, Jinhua Hao, Yukang Ding et al.
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier et al.
ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu et al.
Learning from Synthetic Human Group Activities
Che-Jui Chang, Danrui Li, Deep Patel et al.
Can’t Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models
Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo et al.
Unsupervised 3D Structure Inference from Category-Specific Image Collections
Weikang Wang, Dongliang Cao, Florian Bernard
Video2Game: Real-time Interactive Realistic and Browser-Compatible Environment from a Single Video
Hongchi Xia, Chih-Hao Lin, Wei-Chiu Ma et al.
Identifying Important Group of Pixels using Interactions
Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera
Continuous Adverse Weather Removal via Degradation-Aware Distillation
Xin Lu, Jie Xiao, Yurui Zhu et al.
Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering
Jiawei Yao, Qi Qian, Juhua Hu
Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation
Hanyang Chi, Jian Pang, Bingfeng Zhang et al.
HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition
Zimo Wang, Cheng Wang, Taiki Yoshino et al.
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
Yukang Cao, Yan-Pei Cao, Kai Han et al.
Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero et al.
Are Conventional SNNs Really Efficient? A Perspective from Network Quantization
Guobin Shen, Dongcheng Zhao, Tenglong Li et al.
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
Zeyuan Yang, Liu Jiageng, Peihao Chen et al.
Sharingan: A Transformer Architecture for Multi-Person Gaze Following
Samy Tafasca, Anshul Gupta, Jean-Marc Odobez
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang et al.
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
Chenkai Zhang, Yiming Lei, Zeming Liu et al.
Dynamic Support Information Mining for Category-Agnostic Pose Estimation
Pengfei Ren, Yuanyuan Gao, Haifeng Sun et al.
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang, Jie Zhang, Zheng Yuan et al.
Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
Junjie Chen, Weilong Chen, Yifan Zuo et al.
MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation
Zhicheng Zhang, Pancheng Zhao, Eunil Park et al.
Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection
Jiangyi Wang, Na Zhao
MindBridge: A Cross-Subject Brain Decoding Framework
Shizun Wang, Songhua Liu, Zhenxiong Tan et al.
Loopy-SLAM: Dense Neural SLAM with Loop Closures
Lorenzo Liso, Erik Sandström, Vladimir Yugay et al.
CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection
Jiayi Zhu, Qing Guo, Felix Juefei Xu et al.
Neural Clustering based Visual Representation Learning
Guikun Chen, Xia Li, Yi Yang et al.
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.
Text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction
Shanshan Huang, Haoxuan Li, Chunyuan Zheng et al.
CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training
Yuxin Guo, Siyang Sun, Shuailei Ma et al.
Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling
Kranthi Kumar Rachavarapu, Kalyan Ramakrishnan, A. N. Rajagopalan
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang, Fan Ma, Linchao Zhu et al.
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Yicheng Xiao, Zhuoyan Luo, Yong Liu et al.
ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Nicolas Bourriez, Ihab Bendidi, Cohen Ethan et al.
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Hang Li, Chengzhi Shen, Philip H.S. Torr et al.
Number it: Temporal Grounding Videos like Flipping Manga
Yongliang Wu, Xinting Hu, Yuyang Sun et al.
VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift
Leyuan Liu, Yuhan Li, Yunqi Gao et al.
Autoregressive Sequential Pretraining for Visual Tracking
Shiyi Liang, Yifan Bai, Yihong Gong et al.
A Selective Re-learning Mechanism for Hyperspectral Fusion Imaging
Yuanye Liu, Jinyang Liu, Renwei Dian et al.
CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging
Zhiwei Ling, Yachen Chang, Hailiang Zhao et al.
Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline
Xiaoqi Zhao, Youwei Pang, Zhenyu Chen et al.
Point Transformer V3: Simpler Faster Stronger
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang et al.
Improving Distant 3D Object Detection Using 2D Box Supervision
Zetong Yang, Zhiding Yu, Christopher Choy et al.
Infrared Small Target Detection with Scale and Location Sensitivity
Qiankun Liu, Rui Liu, Bolun Zheng et al.
The Art of Deception: Color Visual Illusions and Diffusion Models
Alexandra Gomez-Villa, Kai Wang, C. Alejandro Parraga et al.
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin et al.
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha, Woo-Young Kang, Jonghwan Mun et al.
Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning
Yuzhuo Dai, Jiaqi Jin, Zhibin Dong et al.
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes
Hyeonggon Ryu, Seongyu Kim, Joon Chung et al.
Minority-Focused Text-to-Image Generation via Prompt Optimization
Soobin Um, Jong Chul Ye
Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez et al.
pFedMxF: Personalized Federated Class-Incremental Learning with Mixture of Frequency Aggregation
Yifei Zhang, Hao Zhu, Alysa Ziying Tan et al.
SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers
Jonathan F. Carter, Joao Jorge, Oliver Gibson et al.
Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences
Seungwook Kim, Kejie Li, Xueqing Deng et al.
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
Haochen Han, Qinghua Zheng, Guang Dai et al.
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
Yuxuan Wang, Yueqian Wang, Bo Chen et al.
Mamba-Reg: Vision Mamba Also Needs Registers
Feng Wang, Jiahao Wang, Sucheng Ren et al.
EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
Rui Jiang, Fangwen Tu, Yixuan Long et al.
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Hao Yin, Guangzong Si, Zilei Wang
Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano, Federico Magistri, Lucas Nunes et al.
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury, Sayan Nag, Joseph K J et al.
Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance
Yu, Jie Huang, Li et al.
READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning
Takeru Oba, Matthew Walter, Norimichi Ukita
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li, Songyang Zhang, Dahua Lin et al.
Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models
Hao Cheng, Erjia Xiao, Jiayan Yang et al.
MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction
Eric-Tuan Le, Antonios Kakolyris, Petros Koutras et al.
Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection
Fuyun Wang, Tong Zhang, Yuanzhi Wang et al.
Bayesian Differentiable Physics for Cloth Digitalization
Deshan Gong, Ningtao Mao, He Wang
MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation
Xiaolong Deng, Huisi Wu, Runhao Zeng et al.
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Zeyinzi Jiang, Chaojie Mao, Yulin Pan et al.
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Chengjian Feng, Yujie Zhong, Zequn Jie et al.
OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan, Lixin Yang, Yifei Zhao et al.
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
Jiasen Lu, Christopher Clark, Sangho Lee et al.
PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv, Hong Chen, Jinyang Guo et al.
Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition
Zhang Lintong, Kang Yin, Seong-Whan Lee
DiffLO: Semantic-Aware LiDAR Odometry with Diffusion-Based Refinement
Huang Yongshu, Chen Liu, Minghang Zhu et al.
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang, Sule Bai, Guangyi Chen et al.
RAEncoder: A Label-Free Reversible Adversarial Examples Encoder for Dataset Intellectual Property Protection
Fan Xing, Zhuo Tian, Xuefeng Fan et al.
Training-free Neural Architecture Search through Variance of Knowledge of Deep Network Weights
Ondrej Tybl, Lukas Neumann
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao, Jiang Wang, Ze Wang et al.
SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs
Junsheng Wang, Nieqing Cao, Yan Ding et al.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang, Li Chen, Yanan Sun et al.
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
Jinlong Li, Baolu Li, Zhengzhong Tu et al.
Shift the Lens: Environment-Aware Unsupervised Camouflaged Object Detection
Ji Du, Fangwei Hao, Mingyang Yu et al.
MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos
Zhengqi Li, Richard Tucker, Forrester Cole et al.
MAD: Memory-Augmented Detection of 3D Objects
Ben Agro, Sergio Casas, Patrick Wang et al.
Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion
Hao Ai, Addison, Lin Wang
Learning Triangular Distribution in Visual World
Ping Chen, Xingpeng Zhang, Chengtao Zhou et al.
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Nikita Starodubcev, Dmitry Baranchuk, Artem Fedorov et al.
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
Prashant Kumar, Kshitij Madhav Bhat, Vedang Bhupesh Shenvi Nadkarni et al.
Floating No More: Object-Ground Reconstruction from a Single Image
Yunze Man, Yichen Sheng, Jianming Zhang et al.
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang, Ruizhi Shao, Qi Zhang et al.
FIFA: Fine-grained Inter-frame Attention for Driver's Video Gaze Estimation
Daosong Hu, Mingyue Cui, Kai Huang
Unbiased Estimator for Distorted Conics in Camera Calibration
Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon et al.
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng et al.
Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain
Qunliang Xing, Mai Xu, Shengxi Li et al.
Dynamic Pseudo Labeling via Gradient Cutting for High-Low Entropy Exploration
Jae Hyeon Park, Joo Hyeon Jeon, Jae Yun Lee et al.
FASTer: Focal token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection
Chenxu Dang, Pei An, Xinmin Zhang et al.