Most Cited ICCV "lightweight model architecture" Papers
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu, Qiang Lu, Meichen Dong et al.
RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text
Jiaben Chen, Xin Yan, Yihang Chen et al.
VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset
Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.
EA-KD: Entropy-based Adaptive Knowledge Distillation
Chi-Ping Su, Ching-Hsun Tseng, Bin Pu et al.
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
Hao Chen, Shell Xu Hu, Wayne Luk et al.
ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking
Xiaokun Feng, Shiyu Hu, Xuchen Li et al.
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
Haonan Wang, Qixiang Zhang, Lehan Wang et al.
CAP: Evaluation of Persuasive and Creative Image Generation
Aysan Aghazadeh, Adriana Kovashka
Details Matter for Indoor Open-vocabulary 3D Instance Segmentation
Sanghun Jung, Jingjing Zheng, Ke Zhang et al.
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh, Nimrod Shabtay, Eli Schwartz et al.
VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions
Marko Mihajlovic, Siwei Zhang, Gen Li et al.
Local Dense Logit Relations for Enhanced Knowledge Distillation
Liuchi Xu, Kang Liu, Jinshuai Liu et al.
PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization
Bing Fan, Yunhe Feng, Yapeng Tian et al.
LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling
Jiahao Wu, Rui Peng, Jianbo Jiao et al.
HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly
Chang Liu, Yunfan Ye, Fan Zhang et al.
Semantic Causality-Aware Vision-Based 3D Occupancy Prediction
Dubing Chen, Huan Zheng, Yucheng Zhou et al.
GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes
Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
Jonathan Roberts, Kai Han, Samuel Albanie
FREE-Merging: Fourier Transform for Efficient Model Merging
Shenghe Zheng, Hongzhi Wang
Open-set Cross Modal Generalization via Multimodal Unified Representation
Hai Huang, Yan Xia, Shulei Wang et al.
Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation
Xiuyu Yang, Shuhan Tan, Philipp Kraehenbuehl
Social Debiasing for Fair Multi-modal LLMs
Harry Cheng, Yangyang Guo, Qingpei Guo et al.
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
Zonglin Lyu, Chen Chen
X²-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
Weihao Yu, Yuanhao Cai, Ruyi Zha et al.
A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention
Qiyu Xu, Zhanxuan Hu, Yu Duan et al.
4D Gaussian Splatting SLAM
Yanyan Li, Youxu Fang, Zunjie Zhu et al.
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos, Cordelia Schmid, Josef Sivic
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
Yiting Yang, Hao Luo, Yuan Sun et al.
AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
Peizheng Li, Shuxiao Ding, You Zhou et al.
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Yusuke Hirota, Ryo Hachiuma, Boyi Li et al.
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei, Rama Chellappa
TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
Amin Karimi Monsefi, Mridul Khurana, Rajiv Ramnath et al.
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
Ziyang Luo, Nian Liu, Xuguang Yang et al.
PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection
Xiao Li, Yiming Zhu, Yifan Huang et al.
TemCoCo: Temporally Consistent Multi-modal Video Fusion with Visual-Semantic Collaboration
Gong Meiqi, Hao Zhang, Xunpeng Yi et al.
Moderating the Generalization of Score-based Generative Model
Wan Jiang, He Wang, Xin Zhang et al.
Uncertainty-Aware Gradient Stabilization for Small Object Detection
Huixin Sun, Yanjing Li, Linlin Yang et al.
ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers
Nicholas DiBrita, Jason Han, Tirthak Patel
ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction
Danhui Chen, Ziquan Liu, Chuxi Yang et al.
TokensGen: Harnessing Condensed Tokens for Long Video Generation
Wenqi Ouyang, Zeqi Xiao, Danni Yang et al.
Revisiting Image Fusion for Multi-Illuminant White-Balance Correction
David Serrano, Aditya Arora, Luis Herranz et al.
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
Ruyang Liu, Shangkun Sun, Haoran Tang et al.
Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation
Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.
LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding
Amirhossein Kazerouni, Soroush Mehraban, Michael Brudno et al.
Underwater Visual SLAM with Depth Uncertainty and Medium Modeling
Rui Liu, Sheng Fan, Wenguan Wang et al.
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.
Hierarchical Material Recognition from Local Appearance
Matthew Beveridge, Shree Nayar
Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection
Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie, Tengda Han, Max Bain et al.
FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation
Yunpeng Bai, Qixing Huang
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs
Hanyu Zhou, Gim Hee Lee
Memory-Efficient 4-bit Preconditioned Stochastic Optimization
Jingyang Li, Kuangyu Ding, Kim-chuan Toh et al.
DiMPLe - Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation
Umaima Rahman, Mohammad Yaqub, Dwarikanath Mahapatra
Generalizable Object Re-Identification via Visual In-Context Prompting
Zhizhong Huang, Xiaoming Liu
Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting
Guangben Lu, Yuzhen N/A, Zhimin Sun et al.
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
Gang Dai, Yifan Zhang, Yutao Qin et al.
VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
Hao Chen, Tao Han, Song Guo et al.
Online Language Splatting
Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo et al.
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
Jiaer Xia, Bingkui Tong, Yuhang Zang et al.
Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle
Miroslav Purkrabek, Jiri Matas
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth et al.
Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning
Zihua Zhao, Feng Hong, Mengxi Chen et al.
Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation
Zhenjun Yu, Wenqiang Xu, Pengfei Xie et al.
Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations
Hai Huang, Yan Xia, Sashuai Zhou et al.
Sparse Fine-Tuning of Transformers for Generative Tasks
Wei Chen, Jingxi Yu, Zichen Miao et al.
Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning
Wenxuan Bao, Ruxi Deng, Ruizhong Qiu et al.
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
Li Caoshuo, Zengmao Ding, Xiaobin Hu et al.
FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions
Yilei Jiang, Wei-Hong Li, Yiyuan Zhang et al.
ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation
Sherry Chen, Yi Wei, Luowei Zhou et al.
How Can Objects Help Video-Language Understanding?
Zitian Tang, Shijie Wang, Junho Cho et al.
SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer
Zerui Gong, Zhonghua Wu, Qingyi Tao et al.
Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
Zhaorui Tan, Xi Yang, Tan Pan et al.
SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
Yangyang Guo, Mohan Kankanhalli
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.
Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning
Yafei Zhang, Lingqi Kong, Huafeng Li et al.
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth, David Rozenberszki, Angela Dai
Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection
Jiasheng Guo, Xin Gao, Yuxiang Yan et al.
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
Congyi Fan, Jian Guan, Xuanjia Zhao et al.
HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding
Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho et al.
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai, Pengfei Zhou, Xu Pan et al.
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
Kaisi Guan, Zhengfeng Lai, Yuchong Sun et al.
Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention
Jeonghoon Park, Juyoung Lee, Chaeyeon Chung et al.
Color Matching Using Hypernetwork-Based Kolmogorov-Arnold Networks
Artem Nikonorov, Georgy Perevozchikov, Andrei Korepanov et al.
Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement
Priyank Pathak, Yogesh Rawat
Joint Self-Supervised Video Alignment and Action Segmentation
Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.
A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization
Chi-Jui Ho, Yash Belhe, Steve Rotenberg et al.
OmniVTON: Training-Free Universal Virtual Try-On
Zhaotong Yang, Yuhui Li, Shengfeng He et al.
Task Vector Quantization for Memory-Efficient Model Merging
Youngeun Kim, Seunghwan Lee, Aecheon Jung et al.
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation
Siqi Zhang, Yanyuan Qiao, Qunbo Wang et al.
Simultaneous Motion And Noise Estimation with Event Cameras
Shintaro Shiba, Yoshimitsu Aoki, Guillermo Gallego
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting
Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad et al.
Efficient Multi-Person Motion Prediction by Lightweight Spatial and Temporal Interactions
Yuanhong Zheng, Ruixuan Yu, Jian Sun
Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
Hyungjin Kim, Seokho Ahn, Young-Duk Seo
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders
Ilan Naiman, Emanuel Ben-Baruch, Oron Anschel et al.
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo, Shengkun Tang, Cong Zeng et al.
Holistic Tokenizer for Autoregressive Image Generation
Anlin Zheng, Haochen Wang, Yucheng Zhao et al.
PLA: Prompt Learning Attack against Text-to-Image Generative Models
Xinqi Lyu, Yihao Liu, Yanjie Li et al.
Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation
Hongyu Wen, Yiming Zuo, Venkat Subramanian et al.
MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers
Yang Tian, Zheng Lu, Mingqi Gao et al.
Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent
En Ci, Shanyan Guan, Yanhao Ge et al.
CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation
Jiannan Ge, Lingxi Xie, Hongtao Xie et al.
Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
Wenyao Zhang, Hongsi Liu, Bohan Li et al.
Backdoor Attacks on Neural Networks via One-Bit Flip
Xiang Li, Lannan Luo, Qiang Zeng
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
Revant Teotia, Candace Ross, Karen Ullrich et al.
Differentiable Room Acoustic Rendering with Multi-View Vision Priors
Derong Jin, Ruohan Gao
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Lorenzo Baraldi, Davide Bucciarelli, Federico Betti et al.
CVPT: Cross Visual Prompt Tuning
Lingyun Huang, Jianxu Mao, Junfei Yi et al.
Consensus-Driven Active Model Selection
Justin Kay, Grant Horn, Subhransu Maji et al.
FB-Diff: Fourier Basis-guided Diffusion for Temporal Interpolation of 4D Medical Imaging
Xin You, Runze Yang, Chuyan Zhang et al.
SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting
Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.
Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion
Yidi Liu, Dong Li, Yuxin Ma et al.
OpenAnimals: Revisiting Person Re-Identification for Animals Towards Better Generalization
Saihui Hou, Panjian Huang, Zengbin Wang et al.
Addressing Text Embedding Leakage in Diffusion-based Image Editing
Sunung Mun, Jinhwan Nam, Sunghyun Cho et al.
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim, Hyungjin Chung, Byung-Hoon Kim
Improving SAM for Camouflaged Object Detection via Dual Stream Adapters
Jiaming Liu, Linghe Kong, Guihai Chen
CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling
Trong-Thang Pham, Akash Awasthi, Saba Khan et al.
MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion
Fei Peng, Junqiang Wu, Yan Li et al.
SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions
Jessica Bader, Leander Girrbach, Stephan Alaniz et al.
StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance
Jaeseok Jeong, Junho Kim, Youngjung Uh et al.
SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark
Alex Costanzino, Pierluigi Zama Ramirez, Luigi Lella et al.
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
Zeyi Sun, Tong Wu, Pan Zhang et al.
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
Bimsara Pathiraja, Maitreya Patel, Shivam Singh et al.
Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation
Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu et al.
SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning
Xin Hu, Ke Qin, Guiduo Duan et al.
SEGA: A Stepwise Evolution Paradigm for Content-Aware Layout Generation with Design Prior
Bo Zhao, Haoran Wang, Jinghui Wang et al.
SAS: Segment Any 3D Scene with Integrated 2D Priors
Zhuoyuan Li, Jiahao Lu, Jiacheng Deng et al.
RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS
Chuanyu Fu, Yuqi Zhang, Kunbin Yao et al.
Multi-Modal Few-Shot Temporal Action Segmentation
Zijia Lu, Ehsan Elhamifar
SDMatte: Grafting Diffusion Models for Interactive Matting
Longfei Huang, Yu Liang, Hao Zhang et al.
LoRAverse: A Submodular Framework to Retrieve Diverse Adapters for Diffusion Models
Mert Sonmezer, Matthew Zheng, Pinar Yanardag
Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation
Guanyi Qin, Ziyue Wang, Daiyun Shen et al.
Autoregressive Denoising Score Matching is a Good Video Anomaly Detector
Hanwen Zhang, Congqi Cao, Qinyi Lv et al.
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Huy Ta, Duy Anh Huynh, Yutong Xie et al.
MatchDiffusion: Training-free Generation of Match-Cuts
Alejandro Pardo, Fabio Pizzati, Tong Zhang et al.
DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image
Jijun Xiang, Xuan Zhu, Xianqi Wang et al.
EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba
Quang Nguyen, Nhat Le, Baoru Huang et al.
IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features
Anand Kumar, Jiteng Mu, Nuno Vasconcelos
ETA: Energy-based Test-time Adaptation for Depth Completion
Younjoon Chung, Hyoungseob Park, Patrick Rim et al.
Consistency Trajectory Matching for One-Step Generative Super-Resolution
Weiyi You, Mingyang Zhang, Leheng Zhang et al.
SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers
Bhavna Gopal, Huanrui Yang, Mark Horton et al.
Improving Rectified Flow with Boundary Conditions
Xixi Hu, Runlong Liao, Bo Liu et al.
RePoseD: Efficient Relative Pose Estimation With Known Depth Information
Yaqing Ding, Viktor Kocur, Vaclav Vavra et al.
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling
Hayeon Kim, Ji Ha Jang, Se Young Chun
LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables
Xunpeng Yi, Yibing Zhang, Xinyu Xiang et al.
DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior
Junzhe Lu, Jing Lin, Hongkun Dou et al.
Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding
Xiaojie Zhang, Yuanfei Wang, Ruihai Wu et al.
Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
Li Xiaojie, Ronghui Li, Shukai Fang et al.
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng, Jieyu Zhang, Mohammadreza Salehi et al.
Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars
Yifan Zhan, Qingtian Zhu, Muyao Niu et al.
MR-FIQA: Face Image Quality Assessment with Multi-Reference Representations from Synthetic Data Generation
Fu-Zhao Ou, Chongyi Li, Shiqi Wang et al.
Gait-X: Exploring X modality for Generalized Gait Recognition
Zengbin Wang, Saihui Hou, Junjie Li et al.
On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
Yiming Wu, Huan Wang, Zhenghao Chen et al.
Metric Convolutions: A Unifying Theory to Adaptive Image Convolutions
Thomas Dagès, Michael Lindenbaum, Alfred Bruckstein
Understanding Co-speech Gestures in-the-wild
Sindhu Hegde, K R Prajwal, Taein Kwon et al.
VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data
Jian Shi, Peter Wonka
HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes
Mai Su, Zhongtao Wang, Huishan Au et al.
CarGait: Cross-Attention based Re-ranking for Gait recognition
Gavriel Habib, Noa Barzilay, Or Shimshi et al.
VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction
Martin de La Gorce, Charlie Hewitt, Tibor Takács et al.
Preacher: Paper-to-Video Agentic System
Jingwei Liu, Ling Yang, Hao Luo et al.
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
Xiangdong Zhang, Shaofeng Zhang, Junchi Yan
Demeter: A Parametric Model of Crop Plant Morphology from the Real World
Tianhang Cheng, Albert Zhai, Evan Chen et al.
Generative Modeling of Shape-Dependent Self-Contact Human Poses
Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito et al.
LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression
Wenjie Huang, Qi Yang, Shuting Xia et al.
Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration
Katie Luo, Minh-Quan Dao, Zhenzhen Liu et al.
Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition
Pulkit Kumar, Shuaiyi Huang, Matthew Walmer et al.
DisenQ: Disentangling Q-Former for Activity-Biometrics
Shehreen Azad, Yogesh Rawat
From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning
Yuhui Zeng, Haoxiang Wu, Wenjie Nie et al.
Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models
In Cho, Youngbeom Yoo, Subin Jeon et al.
SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing
Heyi Sun, Cong Wang, Tian-Xing Xu et al.
Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion
Mutian Xu, Chongjie Ye, Haolin Liu et al.
ChartCap: Mitigating Hallucination of Dense Chart Captioning
Junyoung Lim, Jaewoo Ahn, Gunhee Kim
Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs
Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.
SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation
Chen Yi Lu, Mehrab Tanjim, Ishita Dasgupta et al.
RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction
Johannes Künzel, Anna Hilsmann, Peter Eisert
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization
Xavier Thomas, Deepti Ghadiyaram
IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A
Chen Li, Chinthani Sugandhika, Ee Yeo Keat et al.
Blind2Sound: Self-Supervised Image Denoising without Residual Noise
Jiazheng Liu, Zejin Wang, Bohao Chen et al.
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng, Kyle Genova, Songyou Peng et al.
Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition
Jeonghyeok Do, Munchurl Kim
PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image
Geonhee Sim, Gyeongsik Moon
Synchronization of Multiple Videos
Avihai Naaman, Ron Shapira Weber, Oren Freifeld
Generative Active Learning for Long-tail Trajectory Prediction via Controllable Diffusion Model
Daehee Park, Monu Surana, Pranav Desai et al.
Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models
Young Kyun Jang, Ser-Nam Lim
ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction
Han Yu, Kehan Li, Dongbai Li et al.
MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances
Yunzhe Shao, Xinyu Yi, Lu Yin et al.
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha et al.
Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting
Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.
2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update
Jeongyun Kim, Seunghoon Jeong, Giseop Kim et al.
Identity Preserving 3D Head Stylization with Multiview Score Distillation
Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Güzelant et al.
Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
Shaowei Liu, Chuan Guo, Bing Zhou et al.
MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions
Qingyuan Zhou, Yuehu Gong, Weidong Yang et al.
Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching
Tianli Liao, Chenyang Zhao, Lei Li et al.
SMGDiff: Soccer Motion Generation using Diffusion Probabilistic Models
Hongdi Yang, Chengyang Li, Zhenxuan Wu et al.
SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
Yana Hasson, Pauline Luc, Liliane Momeni et al.
Generate, Transduct, Adapt: Iterative Transduction with VLMs
Oindrila Saha, Logan Lawrence, Grant Horn et al.
Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu et al.