Most Cited ICCV "large language models training" Papers
2,701 papers found • Page 10 of 14
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Yuseung Lee, Jihyeon Je, Chanho Park et al.
GauUpdate: New Object Insertion in 3D Gaussian Fields with Consistent Global Illumination
Chengwei REN, Fan Zhang, Liangchao Xu et al.
Any-SSR: How Recursive Least Squares Works in Continual Learning of Large Language Model
Kai Tong, Kang Pan, Xiao Zhang et al.
Erasing More Than Intended? How Concept Erasure Degrades the Generation of Non-Target Concepts
Ibtihel Amara, Ahmed Imtiaz Humayun, Ivana Kajic et al.
MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation
Prerit Gupta, Jason Alexander Fotso-Puepi, Zhengyuan Li et al.
Removing Cost Volumes from Optical Flow Estimators
Simon Kiefhaber, Stefan Roth, Simone Schaub-Meyer
PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation
Zhihao ZHU, Yifan Zheng, Siyu Pan et al.
PanSt3R: Multi-view Consistent Panoptic Segmentation
Lojze Zust, Yohann Cabon, Juliette Marrie et al.
GARF: Learning Generalizable 3D Reassembly for Real-World Fractures
Sihang Li, Zeyu Jiang, Grace Chen et al.
Progressive Distribution Bridging: Unsupervised Adaptation for Large-scale Pre-trained Models via Adaptive Auxiliary Data
Weinan He, Yixin Zhang, Zilei Wang
AdaDCP: Learning an Adapter with Discrete Cosine Prior for Clear-to-Adverse Domain Generalization
Qi Bi, Yixian Shen, Jingjun Yi et al.
SummDiff: Generative Modeling of Video Summarization with Diffusion
Kwanseok Kim, Jaehoon Hahm, Sumin Kim et al.
Towards Performance Consistency in Multi-Level Model Collaboration
Qi Li, Runpeng Yu, Xinchao Wang
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim, Ju He, Qihang Yu et al.
VRM: Knowledge Distillation via Virtual Relation Matching
Weijia Zhang, Fei Xie, Weidong Cai et al.
ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization
Yuanhe Guo, Linxi Xie, Zhuoran Chen et al.
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng, Jieyu Zhang, Mohammadreza Salehi et al.
Gaze-Language Alignment for Zero-Shot Prediction of Visual Search Targets from Human Gaze Scanpaths
Sounak Mondal, Naveen Sendhilnathan, Ting Zhang et al.
DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic
Munish Monga, Vishal Chudasama, Pankaj Wasnik et al.
FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift
yong zhang, Feng Liang, Guanghu Yuan et al.
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining
Zhiqi Ge, Juncheng Li, Xinglei Pang et al.
External Knowledge Injection for CLIP-Based Class-Incremental Learning
Da-Wei Zhou, Kai-Wen Li, Jingyi Ning et al.
Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios
Deng Li, Aming WU, Yang Li et al.
Hyper-Depth: Hypergraph-based Multi-Scale Representation Fusion for Monocular Depth Estimation
Lin Bie, Siqi Li, Yifan Feng et al.
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
Hyundong Jin, Hyung Jin Chang, Eunwoo Kim
PRO-VPT: Distribution-Adaptive Visual Prompt Tuning via Prompt Relocation
Chikai Shang, Mengke Li, Yiqun Zhang et al.
Generalized Deep Multi-view Clustering via Causal Learning with Partially Aligned Cross-view Correspondence
Xihong Yang, Siwei Wang, Jiaqi Jin et al.
Less is More: Empowering GUI Agent with Context-Aware Simplification
Gongwei Chen, Xurui Zhou, Rui Shao et al.
EventUPS: Uncalibrated Photometric Stereo Using an Event Camera
Jinxiu Liang, Bohan Yu, Siqi Yang et al.
When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack
Hanqing Liu, Shouwei Ruan, Yao Huang et al.
SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis
Xiangyue Zhang, Jianfang Li, Jiaxu Zhang et al.
Guiding Diffusion Models with Adaptive Negative Sampling Without External Resources
Alakh Desai, Nuno Vasconcelos
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
Yiyang Wang, Xi Chen, Xiaogang Xu et al.
Training-Free Class Purification for Open-Vocabulary Semantic Segmentation
Qi Chen, Lingxiao Yang, Yun Chen et al.
Keep Your Friends Close, and Your Enemies Farther: Distance-aware Voxel-wise Contrastive Learning for Semi-supervised Multi-organ Segmentation
Haochen Zhao, Jianwei Niu, Xuefeng Liu et al.
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu, Khoi Nguyen, Preeti Mukherjee et al.
Hierarchical Divide-and-Conquer Grouping for Classification Adaptation of Pre-Trained Models
Ziqian Lu, Yunlong Yu, Qinyue Tong et al.
Lark: Low-Rank Updates After Knowledge Localization for Few-shot Class-Incremental Learning
Jinxin Shi, Jiabao Zhao, Yifan Yang et al.
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
Kaichen Zhang, Yifei Shen, Bo Li et al.
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
Jun Zhang, Desen Meng, Zhengming Zhang et al.
ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models
Hyun Jun Yook, Ga San Jhun, Cho Hyun et al.
Noise-Modeled Diffusion Models for Low-Light Spike Image Restoration
Ruonan Liu, Lin Zhu, Xijie Xiang et al.
Prototype-based Contrastive Learning with Stage-wise Progressive Augmentation for Self-Supervised Fine-Grained Learning
BaoFeng Tan, Xiu-Shen Wei, Lin Zhao
LMM-Det: Make Large Multimodal Models Excel in Object Detection
Jincheng Li, Chunyu Xie, Ji Ao et al.
ReTracker: Exploring Image Matching for Robust Online Any Point Tracking
Dongli Tan, Xingyi He, Sida Peng et al.
FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization
Seung-Wook Kim, Seongyeol Kim, Jiah Kim et al.
CMAD: Correlation-Aware and Modalities-Aware Distillation for Multimodal Sentiment Analysis with Missing Modalities
Yan Zhuang, Minhao Liu, Wei Bai et al.
Revelio: Interpreting and leveraging semantic information in diffusion models
Dahye Kim, Xavier Thomas, Deepti Ghadiyaram
CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting
Siyu Jiao, Haoye Dong, Yuyang Yin et al.
SplatTalk: 3D VQA with Gaussian Splatting
Anh Thai, Kyle Genova, Songyou Peng et al.
Improved Noise Schedule for Diffusion Training
Tiankai Hang, Shuyang Gu, Jianmin Bao et al.
Test-Time Prompt Tuning for Zero-Shot Depth Completion
Chanhwi Jeong, Inhwan Bae, Jin-Hwi Park et al.
One Encoder to Rule them All: Representation Learning for Model-free Visual Reinforcement Learning using Fourier Neural Operators
Parag Dutta, Mohd Ayyoob, Shalabh Bhatnagar et al.
TITAN: Query-Token based Domain Adaptive Adversarial Learning
Tajamul Ashraf, Janibul Bashir
StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data
Yixu Wang, Yan Teng, Yingchun Wang et al.
LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding
Amirhossein Kazerouni, Soroush Mehraban, Michael Brudno et al.
MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning
Mattia Segu, Marta Tintore Gazulla, Yongqin Xian et al.
Moderating the Generalization of Score-based Generative Model
Wan Jiang, He Wang, Xin Zhang et al.
LLM-assisted Entropy-based Adaptive Distillation for Unsupervised Fine-grained Visual Representation Learning
Jianfeng Dong, Danfeng Luo, Daizong Liu et al.
Boundary Probing for Input Privacy Protection When Using LMM Services
Xiaofei Hui, Haoxuan Qu, Ping Hu et al.
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
Shiming Chen, Bowen Duan, Salman Khan et al.
UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement
Xiao Zhang, Fei Wei, Yong Wang et al.
Dataset Distillation as Data Compression: A Rate-Utility Perspective
Youneng Bao, Yiping Liu, Zhuo Chen et al.
Open-set Cross Modal Generalization via Multimodal Unified Representation
Hai Huang, Yan Xia, Shulei Wang et al.
Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization
ZUYU ZHANG, Ning Chen, Yongshan Liu et al.
A Unified Framework to BRIDGE Complete and Incomplete Deep Multi-View Clustering under Non-IID Missing Patterns
Xiaorui Jiang, Buyun He, Peng Yuan Zhou et al.
Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning
Daniel DeAlcala, Aythami Morales, Julian Fierrez et al.
One-Shot Knowledge Transfer for Scalable Person Re-Identification
Longhua Li, Lei Qi, Xin Geng
EA-Vit: Efficient Adaptation for Elastic Vision Transformer
Chen Zhu, Wangbo Zhao, Huiwen Zhang et al.
Feature Coding in the Era of Large Models: Dataset, Test Conditions, and Benchmark
Changsheng Gao, Yifan Ma, Qiaoxi Chen et al.
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding, Wu Shenxi, Xiangyu Zhao et al.
Dataset Distillation via the Wasserstein Metric
Haoyang Liu, Peiran Wang, Yijiang Li et al.
AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving
Ruifei Zhang, Junlin Xie, Wei Zhang et al.
Depth Any Event Stream: Enhancing Event-based Monocular Depth Estimation via Dense-to-Sparse Distillation
Jinjing Zhu, Tianbo Pan, Zidong Cao et al.
Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention
Weida Wang, Changyong He, Jin Zeng et al.
MPBR: Multimodal Progressive Bidirectional Reasoning for Open-Set Fine-Grained Recognition
Junfu Tan, Peiguang Jing, Yu Zhu et al.
MAVias: Mitigate any Visual Bias
Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos et al.
AnnofreeOD: Detecting All Classes at Low Frame Rates Without Human Annotations
Boyi Sun, Yuhang Liu, Houxin He et al.
Controlling Multimodal LLMs via Reward-guided Decoding
Oscar Mañas, Pierluca D'Oro, Koustuv Sinha et al.
Class-Wise Federated Averaging for Efficient Personalization
Gyuejeong Lee, Daeyoung Choi
Towards Privacy-preserved Pre-training of Remote Sensing Foundation Models with Federated Mutual-guidance Learning
Jieyi Tan, Chengwei Zhang, Bo Dang et al.
Multi-view Gaze Target Estimation
Qiaomu Miao, Vivek Golani, Jingyi Xu et al.
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang, Yue Liao, RONG KANG et al.
Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou et al.
Dynamic Multi-Layer Null Space Projection for Vision-Language Continual Learning
Borui Kang, Lei Wang, Zhiping Wu et al.
Prototype Guided Backdoor Defense via Activation Space Manipulation
Venkat Adithya Amula, Sunayana Samavedam, Saurabh Saini et al.
RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction
Johannes Künzel, Anna Hilsmann, Peter Eisert
Analyzing Finetuning Representation Shift for Multimodal LLMs Steering
Pegah KHAYATAN, Mustafa Shukor, Jayneel Parekh et al.
AVAM: a Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering
Kang Zeng, Guojin Zhong, Jintao Cheng et al.
Staining and Locking Computer Vision Models Without Retraining
Oliver Sutton, Qinghua Zhou, George Leete et al.
Flexi-FSCIL: Adaptive Knowledge Retention for Breaking the Stability-Plasticity Dilemma in Few-Shot Class-Incremental Learning
Wufei Xie, Yalin Wang, Chenliang Liu et al.
Multispectral Demosaicing via Dual Cameras
SaiKiran Tedla, Junyong Lee, Beixuan Yang et al.
POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction
Songyan Zhang, Yongtao Ge, Jinyuan Tian et al.
Stochastic Interpolants for Revealing Stylistic Flows across the History of Art
Pingchuan Ma, Ming Gui, Johannes Schusterbauer et al.
Is Tracking really more challenging in First Person Egocentric Vision?
Matteo Dunnhofer, Zaira Manigrasso, Christian Micheloni
GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration
Li Mi, Manon Béchaz, Zeming Chen et al.
Learning Large Motion Estimation from Intermediate Representations with a High-Resolution Optical Flow Dataset Featuring Long-Range Dynamic Motion
Hoonhee Cho, Yuhwan Jeong, Kuk-Jin Yoon
MGSfM: Multi-Camera Geometry Driven Global Structure-from-Motion
peilin Tao, Hainan Cui, Diantao Tu et al.
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
Xiyao Wang, Zhengyuan Yang, Linjie Li et al.
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
Jiahao Zhang, Anoop Cherian, Cristian Rodriguez-Opazo et al.
Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations
Jianhua Sun, Yuxuan Li, Jiude Wei et al.
DRaM-LHM: A Quaternion Framework for Iterative Camera Pose Estimation
Chen Lin, Weizhi Du, Zhixiang Min et al.
Prior-aware Dynamic Temporal Modeling Framework for Sequential 3D Hand Pose Estimation
Pengfei Ren, Jingyu Wang, Haifeng Sun et al.
Epipolar Consistent Attention Aggregation Network for Unsupervised Light Field Disparity Estimation
Chen Gao, Shuo Zhang, Youfang Lin
SpatialTrackerV2: Advancing 3D Point Tracking with Explicit Camera Motion
Yuxi Xiao, Jianyuan Wang, Nan Xue et al.
A Simple yet Mighty Hartley Diffusion Versatilist for Generalizable Dense Vision Tasks
Qi Bi, Jingjun Yi, Huimin Huang et al.
IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
Wenxuan Guo, Xiuwei Xu, Hang Yin et al.
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
Taowen Wang, Cheng Han, James Liang et al.
Simultaneous Motion And Noise Estimation with Event Cameras
Shintaro Shiba, Yoshimitsu Aoki, Guillermo Gallego
Weakly-Supervised Learning of Dense Functional Correspondences
Stefan Stojanov, Linan Zhao, Yunzhi Zhang et al.
GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting
Andrew Bond, Jui-Hsien Wang, Long Mai et al.
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Dimitrios Mallis, Ahmet Karadeniz, Sebastian Cavada et al.
Exploring View Consistency for Scene-Adaptive Low-Light Light Field Image Enhancement
Shuo Zhang, Chen Gao, Youfang Lin
VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions
Yash Garg, Saketh Bachu, Arindam Dutta et al.
Tracking Tiny Drones against Clutter: Large-Scale Infrared Benchmark with Motion-Centric Adaptive Algorithm
Jiahao Zhang, Zongli Jiang, Gang Wang et al.
AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs
Yi-Ting Shen, Sungmin Eum, Doheon Lee et al.
Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing
Chengxu Liu, Lu Qi, Jinshan Pan et al.
H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction
Heng Jia, Na Zhao, Linchao Zhu
Find Any Part in 3D
Ziqi Ma, Yisong Yue, Georgia Gkioxari
Global Motion Corresponder for 3D Point-Based Scene Interpolation under Large Motion
Junru Lin, Chirag Vashist, Mikaela Uy et al.
SpikeDiff: Zero-shot High-Quality Video Reconstruction from Chromatic Spike Camera and Sub-millisecond Spike Streams
Siqi Yang, Jinxiu Liang, Zhaojun Huang et al.
EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks
Athinoulla Konstantinou, Georgios Leontidis, Mamatha Thota et al.
Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras
Shuang Guo, Friedhelm Hamann, Guillermo Gallego
6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting
Yufeng Jin, Vignesh Prasad, Snehal Jauhri et al.
Background Invariance Testing According to Semantic Proximity
Zukang Liao, Min Chen
RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration
Chong Cheng, Yu Hu, Sicheng Yu et al.
CObL: Toward Zero-Shot Ordinal Layering without User Prompting
Aneel Damaraju, Dean Hazineh, Todd Zickler
Hierarchical Material Recognition from Local Appearance
Matthew Beveridge, Shree Nayar
TopicGeo: An Efficient Unified Framework for Geolocation
Xin Wang, Xinlin Wang, Shuiping Gou
Partially Matching Submap Helps: Uncertainty Modeling and Propagation for Text to Point Cloud Localization
Mingtao Feng, Longlong Mei, Zijie Wu et al.
Beyond Pixel Uncertainty: Bounding the OoD Objects in Road Scenes
Huachao Zhu, Zelong Liu, Zhichao Sun et al.
AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
Peizheng Li, Shuxiao Ding, You Zhou et al.
Environment-Agnostic Pose: Generating Environment-independent Object Representations for 6D Pose Estimation
Shaobo Zhang, Yuhang Huang, Wanqing Zhao et al.
Online Dense Point Tracking with Streaming Memory
Qiaole Dong, Yanwei Fu
Test-Time Retrieval-Augmented Adaptation for Vision-Language Models
Xinqi Fan, Xueli CHEN, Luoxiao Yang et al.
Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos
Chengbo Yuan, Geng Chen, Li Yi et al.
MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation
Xinhang Liu, Jiawei Shi, Zheng Dang et al.
ReassembleNet: Learnable Keypoints and Diffusion for 2D Fresco Reconstruction
ADEELA ISLAM, Stefano Fiorini, Stuart James et al.
Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images
Philipp Wulff, Felix Wimbauer, Dominik Muhle et al.
LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling
Jiahao Wu, Rui Peng, Jianbo Jiao et al.
Combinative Matching for Geometric Shape Assembly
Nahyuk Lee, Juhong Min, Junhong Lee et al.
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Artem Zholus, Carl Doersch, Yi Yang et al.
Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes
Tom Fischer, Xiaojie Zhang, Eddy Ilg
A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition
Connor Malone, Somayeh Hussaini, Tobias Fischer et al.
Error Recognition in Procedural Videos using Generalized Task Graph
Shih-Po Lee, Ehsan Elhamifar
FaceShield: Defending Facial Image against Deepfake Threats
Jaehwan Jeong, Sumin In, Sieun Kim et al.
Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers
An Lun Liu, Yu-Wei Chao, Yi-Ting Chen
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
shanlin sun, Yifan Wang, Hanwen Zhang et al.
Im2Haircut: Single-view Strand-based Hair Reconstruction for Human Avatars
Vanessa Sklyarova, Egor Zakharov, Malte Prinzler et al.
TeRA: Rethinking Text-guided Realistic 3D Avatar Generation
Yanwen Wang, Yiyu Zhuang, Jiawei Zhang et al.
Open-World Skill Discovery from Unsegmented Demonstration Videos
Jingwen Deng, Zihao Wang, Shaofei Cai et al.
E-NeMF: Event-based Neural Motion Field for Novel Space-time View Synthesis of Dynamic Scenes
Yan Liu, Zehao Chen, Haojie Yan et al.
MonSTeR: a Unified Model for Motion, Scene, Text Retrieval
Luca Collorone, Matteo Gioia, Massimiliano Pappa et al.
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
shiduo zhang, Zhe Xu, Peiju Liu et al.
TrackVerse: A Large-Scale Object-Centric Video Dataset for Image-Level Representation Learning
Yibing Wei, Samuel Church, Victor Suciu et al.
Robust Test-Time Adaptation for Single Image Denoising Using Deep Gaussian Prior
Qing Ma, Pengwei Liang, Xiong Zhou et al.
Augmented Mass-Spring Model for Real-Time Dense Hair Simulation
Jorge Herrera, Yi Zhou, Xin Sun et al.
Punching Bag vs. Punching Person: Motion Transferability in Videos
Raiyaan Abdullah, Jared Claypoole, Michael Cogswell et al.
Laboring on less labors: RPCA Paradigm for Pan-sharpening
honghui xu, Chuangjie Fang, Yibin Wang et al.
WarpHE4D: Dense 4D Head Map toward Full Head Reconstruction
Jongseob Yun, Yong-Hoon Kwon, Min-Gyu Park et al.
MBTI: Masked Blending Transformers with Implicit Positional Encoding for Frame-rate Agnostic Motion Estimation
Jungwoo Huh, Yeseung Park, Seongjean Kim et al.
GENMO: A GENeralist Model for Human MOtion
Jiefeng Li, Jinkun Cao, Haotian Zhang et al.
Learning Efficient and Generalizable Human Representation with Human Gaussian Model
Yifan Liu, Shengjun Zhang, Chensheng Dai et al.
Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars
Tobias Kirschstein, Javier Romero, Artem Sevastopolsky et al.
TimeBooth: Disentangled Facial Invariant Representation for Diverse and Personalized Face Aging
Zepeng Su, zhulin liu, Zongyan Zhang et al.
GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule
Rui Wang, Yimu Sun, Jingxing Guo et al.
Scaling Action Detection: AdaTAD++ with Transformer-Enhanced Temporal-Spatial Adaptation
Tanay Agrawal, Abid Ali, Antitza Dantcheva et al.
FlowDPS: Flow-Driven Posterior Sampling for Inverse Problems
Jeongsol Kim, Bryan Sangwoo Kim, Jong Ye
ZFusion: Efficient Deep Compositional Zero-shot Learning for Blind Image Super-Resolution with Generative Diffusion Prior
Alireza Esmaeilzehi, Hossein Zaredar, Yapeng Tian et al.
Learning A Unified Template for Gait Recognition
Panjian Huang, Saihui Hou, Junzhou Huang et al.
GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation
Quanwei Yang, Luying Huang, Kaisiyuan Wang et al.
Latent-Reframe: Enabling Camera Control for Video Diffusion Models without Training
Zhenghong Zhou, Jie An, Jiebo Luo
MorphoGen: Efficient Unconditional Generation of Long-Range Projection Neuronal Morphology via a Global-to-Local Framework
Tianfang Zhu, Hongyang Zhou, Anan LI
GaussianSpeech: Audio-Driven Personalized 3D Gaussian Avatars
Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein et al.
A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition
Jie Zhu, Yiyang Su, Minchul Kim et al.
Capturing head avatar with hand contacts from a monocular video
Haonan He, Yufeng Zheng, Jie Song
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
Zhefei Gong, Pengxiang Ding, Shangke Lyu et al.
AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion
Yangyi Huang, Ye Yuan, Xueting Li et al.
Controllable Weather Synthesis and Removal with Video Diffusion Models
Chih-Hao Lin, Zian Wang, Ruofan Liang et al.
Unfolding-Associative Encoder-Decoder Network with Progressive Alignment for Pansharpening
Shijie Fang, Hongping Gan
MOERL: When Mixture-of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration
Tao Wang, Peiwen Xia, Bo Li et al.
DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-based Human Action Segmentation
Haitao Tian
EVDM: Event-based Real-world Video Deblurring with Mamba
Zhijing Sun, Senyan Xu, Kean Liu et al.
Q-Norm: Robust Representation Learning via Quality-Adaptive Normalization
Lanning Zhang, Ying Zhou, Fei Gao et al.
Proxy-Bridged Game Transformer for Interactive Extreme Motion Prediction
Yanwen Fang, Wenqi Jia, Xu Cao et al.
Metric Convolutions: A Unifying Theory to Adaptive Image Convolutions
Thomas Dagès, Michael Lindenbaum, Alfred Bruckstein
RobAVA: A Large-scale Dataset and Baseline Towards Video based Robotic Arm Action Understanding
Baoli Sun, Ning Wang, Xinzhu Ma et al.
IDFace: Face Template Protection for Efficient and Secure Identification
Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.
On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
Yiming Wu, Huan Wang, Zhenghao Chen et al.
Generic Event Boundary Detection via Denoising Diffusion
Jaejun Hwang, Dayoung Gong, Manjin Kim et al.
Not All Degradations Are Equal: A Targeted Feature Denoising Framework for Generalizable Image Super-Resolution
hongjun wang, Jiyuan Chen, Zhengwei Yin et al.
Fine-Grained 3D Gaussian Head Avatars Modeling from Static Captures via Joint Reconstruction and Registration
Yuan Sun, Xuan Wang, Cong Wang et al.
SEREP: Semantic Facial Expression Representation for Robust In-the-Wild Capture and Retargeting
Arthur Josi, Luiz Gustavo Hafemann, Abdallah Dib et al.
Morph: A Motion-free Physics Optimization Framework for Human Motion Generation
Zhuo Li, Mingshuang Luo, RuiBing Hou et al.
DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding
Thomas Kreutz, Max Mühlhäuser, Alejandro Sanchez Guinea
Efficient Concertormer for Image Deblurring and Beyond
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Wenhao Wang, Yi Yang
SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models
Pingchuan Ma, Xiaopei Yang, Ming Gui et al.
Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
Zhengyao Lyu, Chenyang Si, Tianlin Pan et al.