Most Cited ICCV "fixed-point iterations" Papers
2,701 papers found • Page 4 of 14
Conference
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
ZHIXIANG WEI, Guangting Wang, Xiaoxiao Ma et al.
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury, Hanan Gani, Nishit Anand et al.
Frequency-Dynamic Attention Modulation For Dense Prediction
Linwei Chen, Lin Gu, Ying Fu
Textured 3D Regenerative Morphing with 3D Diffusion Prior
Songlin Yang, Yushi LAN, Honghua Chen et al.
Snakes and Ladders: Two Steps Up for VideoMamba
Hui Lu, Albert Ali Salah, Ronald Poppe
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao, Isaac Chung, Imene Kerboua et al.
Differentially Private Fine-Tuning of Diffusion Models
Yu-Lin Tsai, Yizhe Li, Zekai Chen et al.
ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
Zifu Wan, Ce Zhang, Silong Yong et al.
R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception
Jonas Mirlach, Lei Wan, Andreas Wiedholz et al.
Augmented Mass-Spring Model for Real-Time Dense Hair Simulation
Jorge Herrera, Yi Zhou, Xin Sun et al.
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng et al.
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction
Yunheng Li, Yuxuan Li, Quan-Sheng Zeng et al.
Dual-Process Image Generation
Grace Luo, Jonathan Granskog, Aleksander Holynski et al.
Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction
Luyao Tang, Kunze Huang, Yuxuan Yuan et al.
ZeroStereo: Zero-shot Stereo Matching from Single Images
Xianqi Wang, Hao Yang, Gangwei Xu et al.
TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation
Yinda Chen, Haoyuan Shi, Xiaoyu Liu et al.
Driving View Synthesis on Free-form Trajectories with Generative Prior
Zeyu Yang, Zijie Pan, Yuankun Yang et al.
AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration
Javier Tirado-Garín, Javier Civera
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene
Xiao Chen, Tai Wang, Quanyi Li et al.
MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
Jiawei Mao, Yuhan Wang, Yucheng Tang et al.
LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition
Jinghan You, Shanglin Li, Yuanrui Sun et al.
SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization
Zhentao Tan, Ben Xue, Jian Jia et al.
CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization
Jan Ackermann, Jonas Kulhanek, Shengqu Cai et al.
CODA: Repurposing Continuous VAEs for Discrete Tokenization
Zeyu Liu, Zanlin Ni, Yeguo Hua et al.
DynamicFace: High-Quality and Consistent Face Swapping for Image and Video using Composable 3D Facial Priors
Runqi Wang, Yang Chen, Sijie Xu et al.
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining
Zhiqi Ge, Juncheng Li, Xinglei Pang et al.
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
Zichen Tang, Haihong E, Jiacheng Liu et al.
RadarSplat: Radar Gaussian Splatting for High-Fidelity Data Synthesis and 3D Reconstruction of Autonomous Driving Scenes
Pou-Chun Kung, Skanda Harisha, Ram Vasudevan et al.
GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting
Andrew Bond, Jui-Hsien Wang, Long Mai et al.
PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning
Yan Zhang, Yao Feng, Alpár Cseke et al.
CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image
Wonseok Roh, Hwanhee Jung, JongWook Kim et al.
FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation
Cui Miao, Tao Chang, meihan wu et al.
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu, Jinghe Wang, Yuan Meng et al.
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.
Adding Additional Control to One-Step Diffusion with Joint Distribution Matching
Yihong Luo, Tianyang Hu, Yifan Song et al.
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
Joonghyuk Shin, Alchan Hwang, Yujin Kim et al.
DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models
hongji yang, Wencheng Han, Yucheng Zhou et al.
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal
Shr-Ruei Tsai, Wei-Cheng Chang, Jie-Ying Lee et al.
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
Gengze Zhou, Yicong Hong, Zun Wang et al.
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
Jiahao Zhang, Anoop Cherian, Cristian Rodriguez-Opazo et al.
DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction
Rui Wang, Quentin Lohmeyer, Mirko Meboldt et al.
Information Density Principle for MLLM Benchmarks
Chunyi Li, Xiaozhe Li, Zicheng Zhang et al.
EDiT: Efficient Diffusion Transformers with Linear Compressed Attention
Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan et al.
Fine-grained Spatiotemporal Grounding on Egocentric Videos
Shuo LIANG, Yiwu Zhong, Zi-Yuan Hu et al.
MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
SHIBO WANG, Haonan He, Maria Parelli et al.
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
Ruifei Zhang, Wei Zhang, Xiao Tan et al.
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
Olaf Dünkel, Thomas Wimmer, Christian Theobalt et al.
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
Yecheng Wu, Han Cai, Junyu Chen et al.
BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes
Minkyun Seo, Hyungtae Lim, Kanghee Lee et al.
egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks
Björn Braun, Rayan Armani, Manuel Meier et al.
TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions
Ilya A. Petrov, Riccardo Marin, Julian Chibane et al.
StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting
Shakiba Kheradmand, Delio Vicini, George Kopanas et al.
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
Youwei Zheng, Yuxi Ren, Xin Xia et al.
CaO2: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation
Haoxuan Wang, Zhenghao Zhao, Junyi Wu et al.
Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation
Yujie Zhang, Bingyang Cui, Qi Yang et al.
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
jian ma, Qirong Peng, Xu Guo et al.
GUAVA: Generalizable Upper Body 3D Gaussian Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR
Christophe Bolduc, Yannick Hold-Geoffroy, Jean-Francois Lalonde
Improving Multimodal Learning via Imbalanced Learning
Shicai Wei, Chunbo Luo, Yang Luo
Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections
Youwei Zhou, Tianyang Xu, Cong Wu et al.
Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation
Tuna Meral, Enis Simsar, Federico Tombari et al.
PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model
Jinhua Zhang, Hualian Sheng, Sijia Cai et al.
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes
Han-Hung Lee, Qinghong Han, Angel Chang
Synergistic Prompting for Robust Visual Recognition with Missing Modalities
Zhihui Zhang, Luanyuan Dai, Qika Lin et al.
Street Gaussians without 3D Object Tracker
Ruida Zhang, Chengxi Li, Chenyangguang Zhang et al.
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava, Xiang Zhang, He Wen et al.
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
Yuanhan Zhang, Yunice Chew, Yuhao Dong et al.
HERO: Human Reaction Generation from Videos
Chengjun Yu, Wei Zhai, Yuhang Yang et al.
WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection
Haodong Zhu, Wenhao Dong, Linlin Yang et al.
InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior
Minghao Wen, Shengjie Wu, Kangkan Wang et al.
DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation
Donglin Di, He Feng, Wenzhang SUN et al.
NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion
Zihao Xu, Yuzhi Tang, Bowen Xu et al.
The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation
Ruoyu Wang, Huayang Huang, Ye Zhu et al.
Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features
Liying Yang, Chen Liu, Zhenwei Zhu et al.
Latent Diffusion Models with Masked AutoEncoders
Junho Lee, Jeongwoo Shin, Hyungwook Choi et al.
AnyI2V: Animating Any Conditional Image with Motion Control
Ziye Li, Xincheng Shuai, Hao Luo et al.
PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling
Hao Zhang, Haolan Xu, Chun Feng et al.
Φ-GAN:Physics-Inspired GAN for Generating SAR Images Under Limited Data
Xidan Zhang, Yihan Zhuang, Qian Guo et al.
G-DexGrasp: Generalizable Dexterous Grasping Synthesis Via Part-Aware Prior Retrieval and Prior-Assisted Generation
Juntao Jian, Xiuping Liu, Zixuanchen Zixuanchen et al.
Versatile Transition Generation with Image-to-Video Diffusion
Zuhao Yang, Jiahui Zhang, Yingchen Yu et al.
MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network
Jianfei Jiang, Qiankun Liu, Haochen Yu et al.
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
Rui Hu, Yuxuan Zhang, Lianghui Zhu et al.
Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking
Qiangqiang Wu, Yi Yu, Chenqi Kong et al.
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Wenchuan Wang, Mengqi Huang, Yijing Tu et al.
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.
EgoM2P: Egocentric Multimodal Multitask Pretraining
Gen Li, Yutong Chen, Yiqian Wu et al.
Scendi Score: Prompt‑Aware Diversity Evaluation via Schur Complement of CLIP Embeddings
Azim Ospanov, Mohammad Jalali, Farzan Farnia
Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
Qi Chen, Xinze Zhou, Chen Liu et al.
SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality
Sijie Li, Chen Chen, Jungong Han
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui et al.
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
Zhuoyan Luo, Yinghao Wu, Tianheng Cheng et al.
RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping
Dongming Wu, Yanping Fu, Saike Huang et al.
Auto-Regressively Generating Multi-View Consistent Images
JiaKui Hu, Yuxiao Yang, Jialun Liu et al.
Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning
yan wang, Da-Wei Zhou, Han-Jia Ye
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
shengyuan zhang, An Zhao, Ling Yang et al.
Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis
Kaiyang Ji, Ye Shi, Zichen Jin et al.
ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives
Yuqian Fu, Runze Wang, Bin Ren et al.
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo, Fan Ma, Linchao Zhu et al.
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
Jiahui Wang, Zuyan Liu, Yongming Rao et al.
Exploring the Visual Feature Space for Multimodal Neural Decoding
Weihao Xia, Cengiz Oztireli
Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy
JUNHAO WEI, YU ZHE, Jun Sakuma
SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image
Dimitrije Antić, Georgios Paschalidis, Shashank Tripathi et al.
Edicho: Consistent Image Editing in the Wild
Qingyan Bai, Hao Ouyang, Yinghao Xu et al.
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi, Eddy Ilg, Margret Keuper et al.
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
Junhao Cheng, Yuying Ge, Yixiao Ge et al.
Adversarial Robust Memory-Based Continual Learner
Xiaoyue Mi, Fan Tang, Zonghan Yang et al.
MOSCATO: Predicting Multiple Object State Change Through Actions
Parnian Zameni, Yuhan Shen, Ehsan Elhamifar
HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction
Sara Rojas Martinez, Matthieu Armando, Bernard Ghanem et al.
Streaming VideoLLMs for Real-Time Procedural Video Understanding
Dibyadip Chatterjee, Edoardo Remelli, Yale Song et al.
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
Marvin Heidinger, Snehal Jauhri, Vignesh Prasad et al.
Dynamic Dictionary Learning for Remote Sensing Image Segmentation
Xuechao Zou, Yue Li, Shun Zhang et al.
MikuDance: Animating Character Art with Mixed Motion Dynamics
Jiaxu Zhang, Xianfang Zeng, Xin Chen et al.
Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures
Tim Seizinger, Florin-Alexandru Vasluianu, Marcos Conde et al.
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye, Yongkun Du, Yunbo Tao et al.
GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments
Lin Zeng, Boming Zhao, Jiarui Hu et al.
GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts
Minwen Liao, Hao Dong, Xinyi Wang et al.
Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving
Junhao Ge, Zuhong Liu, Longteng Fan et al.
OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations
Peng-Hao Hsu, Ke Zhang, Fu-En Wang et al.
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations
Songchun Zhang, Huiyao Xu, Sitong Guo et al.
SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering
Byeongjun Park, Hyojun Go, Hyelin Nam et al.
DisTime: Distribution-based Time Representation for Video Large Language Models
yingsen zeng, Zepeng Huang, Yujie Zhong et al.
Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
Junyan Ye, Jun He, Weijia Li et al.
4D Visual Pre-training for Robot Learning
Chengkai Hou, Yanjie Ze, Yankai Fu et al.
Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?
Yuru Jia, Valerio Marsocci, Ziyang Gong et al.
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Shuangkang Fang, I-Chao Shen, Yufeng Wang et al.
DuCos: Duality Constrained Depth Super-Resolution via Foundation Model
Zhiqiang Yan, Zhengxue Wang, Haoye Dong et al.
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
Boqian Li, Zeyu Cai, Michael Black et al.
IDFace: Face Template Protection for Efficient and Secure Identification
Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.
Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning
Qi Wang, Zhipeng Zhang, Baao Xie et al.
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
Nan Chen, Mengqi Huang, Yihao Meng et al.
Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras
Shuang Guo, Friedhelm Hamann, Guillermo Gallego
DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model
Junjia Huang, Pengxiang Yan, Jinhang Cai et al.
Learning Normal Flow Directly From Events
Dehao Yuan, Levi Burner, Jiayi Wu et al.
MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance
Hallee Wong, Jose Javier Gonzalez Ortiz, John Guttag et al.
Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion
Tongyan Hua, Lutao Jiang, Ying-Cong Chen et al.
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations
Xiaohui Li, Yihao Liu, Shuo Cao et al.
External Knowledge Injection for CLIP-Based Class-Incremental Learning
Da-Wei Zhou, Kai-Wen Li, Jingyi Ning et al.
HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation
Qinqian Lei, Bo Wang, Robby Tan
Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis
Zhuokun Chen, Jugang Fan, Zhuowei Yu et al.
Multi-identity Human Image Animation with Structural Video Diffusion
Zhenzhi Wang, Yixuan Li, yanhong zeng et al.
SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding
Tianci Wen, Zhiang Liu, Yongchun Fang
Motion Synthesis with Sparse and Flexible Keyjoint Control
Inwoo Hwang, Jinseok Bae, Donggeun Lim et al.
C4D: 4D Made from 3D through Dual Correspondences
Shizun Wang, Zhenxiang Jiang, Xingyi Yang et al.
AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering
Michael Steiner, Thomas Köhler, Lukas Radl et al.
A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition
Jie Zhu, Yiyang Su, Minchul Kim et al.
ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation
Jimyeong Kim, Jungwon Park, Yeji Song et al.
Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration
Junyuan Deng, Wei Yin, Xiaoyang Guo et al.
SignRep: Enhancing Self-Supervised Sign Representations
Ryan Wong, Necati Cihan Camgoz, Richard Bowden
Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration
Darshan Thaker, Abhishek Goyal, Rene Vidal
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang, Hang Zhang, Xin Li et al.
Latent-Reframe: Enabling Camera Control for Video Diffusion Models without Training
Zhenghong Zhou, Jie An, Jiebo Luo
Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction
Zhirui Gao, Renjiao Yi, YaQiao Dai et al.
Precise Action-to-Video Generation Through Visual Action Prompts
Yuang Wang, Chao Wen, Haoyu Guo et al.
FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads
Weijie Lyu, Yi Zhou, Ming-Hsuan Yang et al.
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal, Reza Shirkavand, Heng Huang et al.
CAVIS: Context-Aware Video Instance Segmentation
Seunghun Lee, Jiwan Seo, Kiljoon Han et al.
IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
Wenxuan Guo, Xiuwei Xu, Hang Yin et al.
ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Shaofeng Yin, Ting Lei, Yang Liu
Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen, Xinyu Luo, Tiexin Qin et al.
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.
PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors
Kangan Qian, Jinyu Miao, Xinyu Jiao et al.
Generative Zoo
Tomasz Niewiadomski, Anastasios Yiannakidis, Hanz Cuevas Velasquez et al.
Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos
Changwoon Choi, Jeongjun Kim, Geonho Cha et al.
TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction
Zewei Zhou, Zhihao Zhao, Tianhui Cai et al.
GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology
Saarthak Kapse, Pushpak Pati, Srikar Yellapragada et al.
GAP: Gaussianize Any Point Clouds with Text Guidance
Weiqi Zhang, Junsheng Zhou, Haotian Geng et al.
Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization
Li, Yang Xiao, Jie Ji et al.
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
Samuel Clarke, Suzannah Wistreich, Yanjie Ze et al.
Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations
Jianhua Sun, Yuxuan Li, Jiude Wei et al.
Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning
Tianyi Zhao, Boyang Liu, Yanglei Gao et al.
Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
Jie Xu, Na Zhao, Gang Niu et al.
Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning
Gaurav Patel, Qiang Qiu
Inverse 3D Microscopy Rendering for Cell Shape Inference with Active Mesh
Sacha Ichbiah, Anshuman Sinha, Fabrice Delbary et al.
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving
Hao Zhou, Zhanning Gao, Zhili Chen et al.
HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis
Timo Teufel, xilong zhou, Umar Iqbal et al.
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou et al.
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
Yuxuan Wang, Yiqi Song, Cihang Xie et al.
SceneMI: Motion In-betweening for Modeling Human-Scene Interaction
Inwoo Hwang, Bing Zhou, Young Min Kim et al.
Occupancy Learning with Spatiotemporal Memory
Ziyang Leng, Jiawei Yang, Wenlong Yi et al.
ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models
Shadi Hamdan, Chonghao Sima, Zetong Yang et al.
FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing
Bizhu Wu, Jinheng Xie, Meidan Ding et al.
Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues
Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.
GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors
Kang DU, Zhihao Liang, Yulin Shen et al.
DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model
Rui Yu, Xianghang Zhang, Runkai Zhao et al.
CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
Zizhuo Li, Yifan Lu, Linfeng Tang et al.
HORT: Monocular Hand-held Objects Reconstruction with Transformers
Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen et al.
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Kelin Yu, Sheng Zhang, Harshit Soora et al.
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
Yuanrui Wang, Cong Han, Yafei Li et al.
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
Ronggang Huang, Haoxin Yang, Yan Cai et al.
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu, Khoi Nguyen, Preeti Mukherjee et al.
ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection
Yingjian Chen, Lei Zhang, Yakun Niu
VertexRegen: Mesh Generation with Continuous Level of Detail
Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.
Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation
Xincheng Shuai, Henghui Ding, Zhenyuan Qin et al.
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis
Peng Zheng, Junke Wang, Yi Chang et al.
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Kwon Byung-Ki, Qi Dai, Lee Hyoseok et al.
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu, Linxuan Li, Kai Wang et al.
LayerD: Decomposing Raster Graphic Designs into Layers
Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.
DuoLoRA : Cycle-consistent and Rank-disentangled Content-Style Personalization
Aniket Roy, Shubhankar Borse, Shreya Kadambi et al.
IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising
Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.
Boosting Multimodal Learning via Disentangled Gradient Learning
Shicai Wei, Chunbo Luo, Yang Luo