Most Cited CVPR "motion prediction" Papers
5,589 papers found • Page 22 of 28
Conference
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Mark Hamilton, Andrew Zisserman, John Hershey et al.
LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
Hao Yang, Liyuan Pan, Yan Yang et al.
ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
Weiyao Wang, Pierre Gleize, Hao Tang et al.
Learning Large-Factor EM Image Super-Resolution with Generative Priors
Jiateng Shou, Zeyu Xiao, Shiyu Deng et al.
PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
Ying-Tian Liu, Yuan-Chen Guo, Guan Luo et al.
Amodal Ground Truth and Completion in the Wild
Guanqi Zhan, Chuanxia Zheng, Weidi Xie et al.
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
Jianan Fan, Dongnan Liu, Hang Chang et al.
Gaussian Splatting SLAM
Hidenobu Matsuki, Riku Murai, Paul Kelly et al.
PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought
Junyi Yao, Yijiang Liu, Zhen Dong et al.
A Versatile Framework for Continual Test-Time Domain Adaptation: Balancing Discriminability and Generalizability
Xu Yang, Xuan chen, Moqi Li et al.
POPDG: Popular 3D Dance Generation with PopDanceSet
Zhenye Luo, Min Ren, Xuecai Hu et al.
Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data
Xinting Liao, Weiming Liu, Chaochao Chen et al.
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
Phuc Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis et al.
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed et al.
Continual Forgetting for Pre-trained Vision Models
Hongbo Zhao, Bolin Ni, Junsong Fan et al.
Unsupervised Feature Learning with Emergent Data-Driven Prototypicality
Yunhui Guo, Youren Zhang, Yubei Chen et al.
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Huan Ling, Seung Wook Kim, Antonio Torralba et al.
URHand: Universal Relightable Hands
Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo et al.
Snapshot Lidar: Fourier Embedding of Amplitude and Phase for Single-Image Depth Reconstruction
Sarah Friday, Yunzi Shi, Yaswanth Kumar Cherivirala et al.
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
Jihao Liu, Jinliang Zheng, Yu Liu et al.
NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation
Sicheng Li, Hao Li, Yiyi Liao et al.
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
Mingyu Lee, Jongwon Choi
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita, Naoto Inoue, Kotaro Kikuchi et al.
Validating Privacy-Preserving Face Recognition under a Minimum Assumption
Hui Zhang, Xingbo Dong, YenLungLai et al.
IDGuard: Robust General Identity-centric POI Proactive Defense Against Face Editing Abuse
Yunshu Dai, Jianwei Fei, Fangjun Huang
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Zhongwei Zhang, Fuchen Long, Yingwei Pan et al.
Pose-Transformed Equivariant Network for 3D Point Trajectory Prediction
Ruixuan Yu, Jian Sun
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong, Zhiqi Li, Yuntao Chen et al.
Compositional Video Understanding with Spatiotemporal Structure-based Transformers
Hoyeoung Yun, Jinwoo Ahn, Minseo Kim et al.
Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
Junshu Tang, Yanhong Zeng, Ke Fan et al.
Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
ChangHee Yang, ChanHee Kang, Kyeongbo Kong et al.
Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular Stereo and RGB-D Cameras
Huajian Huang, Longwei Li, Hui Cheng et al.
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Fanghua Yu, Jinjin Gu, Zheyuan Li et al.
Masked and Shuffled Blind Spot Denoising for Real-World Images
Hamadi Chihaoui, Paolo Favaro
Generative Region-Language Pretraining for Open-Ended Object Detection
Chuang Lin, Yi Jiang, Lizhen Qu et al.
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Yurui Qian, Qi Cai, Yingwei Pan et al.
Discovering Syntactic Interaction Clues for Human-Object Interaction Detection
Jinguo Luo, Weihong Ren, Weibo Jiang et al.
Generative Latent Coding for Ultra-Low Bitrate Image Compression
Zhaoyang Jia, Jiahao Li, Bin Li et al.
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim, Jungwon Park, Wonjong Rhee
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features
Thomas Wimmer, Peter Wonka, Maks Ovsjanikov
DemoFusion: Democratising High-Resolution Image Generation With No $$$
Ruoyi DU, Dongliang Chang, Timothy Hospedales et al.
Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras
Ashwath Shetty, Marc Habermann, Guoxing Sun et al.
Neighbor Relations Matter in Video Scene Detection
Jiawei Tan, Hongxing Wang, Jiaxin Li et al.
Mask Grounding for Referring Image Segmentation
Yong Xien Chng, Henry Zheng, Yizeng Han et al.
SignGraph: A Sign Sequence is Worth Graphs of Nodes
Shiwei Gan, Yafeng Yin, Zhiwei Jiang et al.
Embracing Unimodal Aleatoric Uncertainty for Robust Multimodal Fusion
Zixian Gao, Xun Jiang, Xing Xu et al.
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
Yushi Huang, Ruihao Gong, Jing Liu et al.
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
Hanrong Ye, Dan Xu
Revisiting Spatial-Frequency Information Integration from a Hierarchical Perspective for Panchromatic and Multi-Spectral Image Fusion
Jiangtong Tan, Jie Huang, Naishan Zheng et al.
FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding
Jinglin Xu, Guohao Zhao, Sibo Yin et al.
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina, Massimiliano Mancini, Elia Cunegatti et al.
Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation
Yi Zhang, Meng-Hao Guo, Miao Wang et al.
GALA: Generating Animatable Layered Assets from a Single Scan
Taeksoo Kim, Byungjun Kim, Shunsuke Saito et al.
Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation
Hyunwoo Ryu, Jiwoo Kim, Hyunseok An et al.
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
Wenqian Zhang, Molin Huang, Yuxuan Zhou et al.
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng, Xiaoyong Chen, Jiaming Liang et al.
4K4D: Real-Time 4D View Synthesis at 4K Resolution
Zhen Xu, Sida Peng, Haotong Lin et al.
TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation
Sai Kumar Dwivedi, Yu Sun, Priyanka Patel et al.
Re-thinking Data Availability Attacks Against Deep Neural Networks
Bin Fang, Bo Li, Shuang Wu et al.
A Unified Approach for Text- and Image-guided 4D Scene Generation
Yufeng Zheng, Xueting Li, Koki Nagano et al.
CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral, Enis Simsar, Federico Tombari et al.
Neural Refinement for Absolute Pose Regression with Feature Synthesis
Shuai Chen, Yash Bhalgat, Xinghui Li et al.
PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
Xiaoyun Zheng, Liwei Liao, Xufeng Li et al.
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
Yawar Siddiqui, Antonio Alliegro, Alexey Artemov et al.
Task-Conditioned Adaptation of Visual Features in Multi-Task Policy Learning
Pierre Marza, Laetitia Matignon, Olivier Simonin et al.
Frequency-Adaptive Dilated Convolution for Semantic Segmentation
Linwei Chen, Lin Gu, Dezhi Zheng et al.
Image Processing GNN: Breaking Rigidity in Super-Resolution
Yuchuan Tian, Hanting Chen, Chao Xu et al.
Riemannian Multinomial Logistics Regression for SPD Neural Networks
Ziheng Chen, Yue Song, Gaowen Liu et al.
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
Takahiro Shirakawa, Seiichi Uchida
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang et al.
Neural Video Compression with Feature Modulation
Jiahao Li, Bin Li, Yan Lu
Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li, Yishuo Cai, Haowei Li et al.
Continual Segmentation with Disentangled Objectness Learning and Class Recognition
Yizheng Gong, Siyue Yu, Xiaoyang Wang et al.
EscherNet: A Generative Model for Scalable View Synthesis
Xin Kong, Shikun Liu, Xiaoyang Lyu et al.
E-GPS: Explainable Geometry Problem Solving via Top-Down Solver and Bottom-Up Generator
Wenjun Wu, Lingling Zhang, Jun Liu et al.
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
Zhenxin Li, Shiyi Lan, Jose M. Alvarez et al.
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui, Tengyu Liu, Nian Liu et al.
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
Tao Hu, Fangzhou Hong, Ziwei Liu
PanoPose: Self-supervised Relative Pose Estimation for Panoramic Images
Diantao Tu, Hainan Cui, Xianwei Zheng et al.
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo, Nian Liu, Wangbo Zhao et al.
Adapting to Length Shift: FlexiLength Network for Trajectory Prediction
Yi Xu, Yun Fu
CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification
Yuanmin Huang, Mi Zhang, Daizong Ding et al.
Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation
Jin Wang, Bingfeng Zhang, Jian Pang et al.
Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution
Longguang Wang, Juncheng Li, Yingqian Wang et al.
Detector-Free Structure from Motion
Xingyi He, Jiaming Sun, Yifan Wang et al.
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
Yijun Yang, Tianyi Zhou, kanxue Li et al.
One-Shot Open Affordance Learning with Foundation Models
Gen Li, Deqing Sun, Laura Sevilla-Lara et al.
Mean-Shift Feature Transformer
Takumi Kobayashi
Consistent Prompting for Rehearsal-Free Continual Learning
Zhanxin Gao, Jun Cen, Xiaobin Chang
Fast Adaptation for Human Pose Estimation via Meta-Optimization
Shengxiang Hu, Huaijiang Sun, Bin Li et al.
Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens
Zhiwen Chen, Zhiyu Zhu, Yifan Zhang et al.
Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels
Zhuohong Li, Wei He, Jiepan Li et al.
Multi-Modal Hallucination Control by Visual Information Grounding
Alessandro Favero, Luca Zancato, Matthew Trager et al.
Exploring Orthogonality in Open World Object Detection
Zhicheng Sun, Jinghan Li, Yadong Mu
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu et al.
ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images
Yiqi Shi, Duo Liu, Liguo Zhang et al.
Rethinking Multi-domain Generalization with A General Learning Objective
Zhaorui Tan, Xi Yang, Kaizhu Huang
Grounding and Enhancing Grid-based Models for Neural Fields
Zelin Zhao, FENGLEI FAN, Wenlong Liao et al.
What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
Yihua Cheng, Yaning Zhu, Zongji Wang et al.
TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation
Hoonhee Cho, Taewoo Kim, Yuhwan Jeong et al.
HUGS: Human Gaussian Splats
Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel et al.
Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
Jiye Lee, Hanbyul Joo
AVID: Any-Length Video Inpainting with Diffusion Model
Zhixing Zhang, Bichen Wu, Xiaoyan Wang et al.
KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation
Jihua Peng, Yanghong Zhou, Tracy P Y Mok
Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining
Jiahao Nie, Yun Xing, Gongjie Zhang et al.
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
Xianqi Wang, Gangwei Xu, Hao Jia et al.
Towards Co-Evaluation of Cameras HDR and Algorithms for Industrial-Grade 6DoF Pose Estimation
Agastya Kalra, Guy Stoppi, Dmitrii Marin et al.
Close Imitation of Expert Retouching for Black-and-White Photography
Seunghyun Shin, Jisu Shin, Jihwan Bae et al.
Reconstructing Hands in 3D with Transformers
Georgios Pavlakos, Dandan Shan, Ilija Radosavovic et al.
Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification
Zhe Huang, Ruijie Jiang, Shuchin Aeron et al.
GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation
WEIMING ZHANG, Yexin Liu, Xu Zheng et al.
VRP-SAM: SAM with Visual Reference Prompt
Yanpeng Sun, Jiahui Chen, Shan Zhang et al.
Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs
Hugh Blayney, Hanlin Tian, Hamish Scott et al.
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer et al.
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Rui Li, Tobias Fischer, Mattia Segu et al.
SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction
Yuan Li, Zhihao Liu, Bedrich Benes et al.
Patch2Self2: Self-supervised Denoising on Coresets via Matrix Sketching
Shreyas Fadnavis, Agniva Chowdhury, Joshua Batson et al.
Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Zihan Wang, Siyang Song, Cheng Luo et al.
Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation
Sixian Zhang, Xinyao Yu, Xinhang Song et al.
SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective
Yu-Bang Zheng, Xile Zhao, Junhua Zeng et al.
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
Ting Lei, Shaofeng Yin, Yang Liu
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
shiyu xuan, Qingpei Guo, Ming Yang et al.
OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation
Jisoo Jeong, Hong Cai, Risheek Garrepalli et al.
Mip-Splatting: Alias-free 3D Gaussian Splatting
Zehao Yu, Anpei Chen, Binbin Huang et al.
MMM: Generative Masked Motion Model
Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee et al.
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
Jiawen Zhu, Guansong Pang
VINECS: Video-based Neural Character Skinning
Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann et al.
Plug and Play Active Learning for Object Detection
Chenhongyi Yang, Lichao Huang, Elliot Crowley
CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration
Fu-Zhao Ou, Chongyi Li, Shiqi Wang et al.
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada, Kanta Kaneda, Daichi Saito et al.
XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold
Guangyu Wang, Jinzhi Zhang, Fan Wang et al.
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
Mengqi Zhang, Yang Fu, Zheng Ding et al.
Learning from Synthetic Human Group Activities
Che-Jui Chang, Danrui Li, Deep Patel et al.
Can’t Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models
Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo et al.
Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation
Hanyang Chi, Jian Pang, Bingfeng Zhang et al.
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
Zeyuan Yang, LIU JIAGENG, Peihao Chen et al.
Dynamic Support Information Mining for Category-Agnostic Pose Estimation
Pengfei Ren, Yuanyuan Gao, Haifeng Sun et al.
Neural Clustering based Visual Representation Learning
Guikun Chen, Xia Li, Yi Yang et al.
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.
ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Nicolas Bourriez, Ihab Bendidi, Cohen Ethan et al.
VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift
Leyuan Liu, Yuhan Li, Yunqi Gao et al.
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin et al.
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha, Woo-Young Kang, Jonghwan Mun et al.
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
Haochen Han, Qinghua Zheng, Guang Dai et al.
MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation
Xiaolong Deng, Huisi Wu, Runhao Zeng et al.
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Zeyinzi Jiang, Chaojie Mao, Yulin Pan et al.
PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv, Hong Chen, Jinyang Guo et al.
Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion
Hao Ai, Addison, Lin Wang
Learning Triangular Distribution in Visual World
Ping Chen, Xingpeng Zhang, Chengtao Zhou et al.
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
Prashant Kumar, Kshitij Madhav Bhat, Vedang Bhupesh Shenvi Nadkarni et al.
Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain
Qunliang Xing, Mai Xu, Shengxi Li et al.
TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
Zhiyuan Ren, Minchul Kim, Feng Liu et al.
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Linfeng Yuan, Miaojing Shi, Zijie Yue et al.
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou, Shiming Chen, Shuhuang Chen et al.
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
Yipeng Gao, Zeyu Wang, Wei-Shi Zheng et al.
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Chen Min, Dawei Zhao, Liang Xiao et al.
MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection
Haowen Sun, Yueqi Duan, Juncheng Yan et al.
CAD: Photorealistic 3D Generation via Adversarial Distillation
Ziyu Wan, Despoina Paschalidou, Ian Huang et al.
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.
PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
Qi Zhao, M. Salman Asif, Zhan Ma
Learned Trajectory Embedding for Subspace Clustering
Yaroslava Lochman, Christopher Zach, Carl Olsson
FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha, Yang Zou, Qiuyu Chen et al.
MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections
mude hui, Zihao Wei, Hongru Zhu et al.
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Yixun Liang, Xin Yang, Jiantao Lin et al.
Improved Implicit Neural Representation with Fourier Reparameterized Training
Kexuan Shi, Xingyu Zhou, Shuhang Gu
Gradient Alignment for Cross-Domain Face Anti-Spoofing
MINH BINH LE, Simon Woo
U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
You Wu, Kean Liu, Xiaoyue Mi et al.
Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
Qilong Zhangli, Jindong Jiang, Di Liu et al.
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei, Mauricio Delbracio, Hossein Talebi et al.
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He, Hengduo Li, Young Kyun Jang et al.
Towards Variable and Coordinated Holistic Co-Speech Motion Generation
Yifei Liu, Qiong Cao, Yandong Wen et al.
Parameter Efficient Self-Supervised Geospatial Domain Adaptation
Linus Scheibenreif, Michael Mommert, Damian Borth
MICap: A Unified Model for Identity-Aware Movie Descriptions
Haran Raajesh, Naveen Reddy Desanur, Zeeshan Khan et al.
Color Shift Estimation-and-Correction for Image Enhancement
Yiyu Li, Ke Xu, Gerhard Hancke et al.
Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection
Wenjun Hui, Zhenfeng Zhu, Shuai Zheng et al.
CSTA: CNN-based Spatiotemporal Attention for Video Summarization
Jaewon Son, Jaehun Park, Kwangsu Kim
Cinematic Behavior Transfer via NeRF-based Differentiable Filming
Xuekun Jiang, Anyi Rao, Jingbo Wang et al.
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu, Haoning Wu, Yujie Zhong et al.
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma, Can Cui, Xu Cao et al.
Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification
Tingting Zheng, Kui Jiang, Hongxun Yao
Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening
Yule Duan, Xiao Wu, Haoyu Deng et al.
A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling
Wentao Qu, Yuantian Shao, Lingwu Meng et al.
APISR: Anime Production Inspired Real-World Anime Super-Resolution
Boyang Wang, Fengyu Yang, Xihang Yu et al.
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im, JeongYeon Nam, Nokyung Park et al.
Distributionally Generative Augmentation for Fair Facial Attribute Classification
Fengda Zhang, Qianpei He, Kun Kuang et al.
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem, Conor McCullough, Randy Hsin et al.
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul, Zhizhong Li, Hao Yang et al.
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Minghan LI, Shuai Li, Xindong Zhang et al.
Inlier Confidence Calibration for Point Cloud Registration
Yongzhe Yuan, Yue Wu, Xiaolong Fan et al.
3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
Weijia Li, Haote Yang, Zhenghao Hu et al.
In Search of a Data Transformation That Accelerates Neural Field Training
Junwon Seo, Sangyoon Lee, Kwang In Kim et al.
A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion
Feng Yu, Teng Zhang, Gilad Lerman
DPHMs: Diffusion Parametric Head Models for Depth-based Tracking
Jiapeng Tang, Angela Dai, Yinyu Nie et al.
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Huan Liu, Zichang Tan, Chuangchuang Tan et al.
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng, Yan Xie, Hao Zhang et al.
Novel View Synthesis with View-Dependent Effects from a Single Image
Juan Luis Gonzalez Bello, Munchurl Kim
Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation
Hongwei Yan, Liyuan Wang, Kaisheng Ma et al.
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang, Linjie Li, Kevin Lin et al.
Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation
Qinghe Ma, Jian Zhang, Lei Qi et al.
iKUN: Speak to Trackers without Retraining
Yunhao Du, Cheng Lei, Zhicheng Zhao et al.