Most Cited 2025 "numerical reconstruction" Papers
ConText: Driving In-context Learning for Text Removal and Segmentation
Fei Zhang, Pei Zhang, Baosong Yang et al.
Reaction Graph: Towards Reaction-Level Modeling for Chemical Reactions with 3D Structures
Yingzhao Jian, Yue Zhang, Ying Wei et al.
Advancing Personalized Learning with Neural Collapse for Long-Tail Challenge
Hanglei Hu, Yingying Guo, Zhikang Chen et al.
Learning the Electronic Hamiltonian of Large Atomic Structures
Chen Hao Xia, Manasa Kaniselvan, Alexandros Nikolaos Ziogas et al.
Diffusion Counterfactual Generation with Semantic Abduction
Rajat Rasal, Avinash Kori, Fabio De Sousa Ribeiro et al.
When Dynamic Data Selection Meets Data Augmentation: Achieving Enhanced Training Acceleration
Suorong Yang, Peng Ye, Furao Shen et al.
Non-Stationary Predictions May Be More Informative: Exploring Pseudo-Labels with a Two-Phase Pattern of Training Dynamics
Hongbin Pei, Jingxin Hai, Yu Li et al.
Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence
Gouki Minegishi, Hiroki Furuta, Shohei Taniguchi et al.
Weakly-Supervised Contrastive Learning for Imprecise Class Labels
Zi-Hao Zhou, Jun-Jie Wang, Tong Wei et al.
Maintaining Proportional Committees with Dynamic Candidate Sets
Chris Dong, Jannik Peters
Solving Satisfiability Modulo Counting Exactly with Probabilistic Circuits
Jinzhao Li, Nan Jiang, Yexiang Xue
Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs
Andrey Kofnov, Daniel Kapla, Ezio Bartocci et al.
Reward Translation via Reward Machine in Semi-Alignable MDPs
Yun Hua, Haosheng Chen, Wenhao Li et al.
TUMTraf VideoQA: Dataset and Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes
Xingcheng Zhou, Konstantinos Larintzakis, Hao Guo et al.
Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries
Junhyuck Kim, Jongho Park, Jaewoong Cho et al.
Catching Two Birds with One Stone: Reward Shaping with Dual Random Networks for Balancing Exploration and Exploitation
Haozhe Ma, Fangling Li, Jing Lim et al.
Refined generalization analysis of the Deep Ritz Method and Physics-Informed Neural Networks
Xianliang Xu, Ye Li, Zhongyi Huang
On the Out-of-Distribution Generalization of Self-Supervised Learning
Wenwen Qiang, Jingyao Wang, Zeen Song et al.
Leveraging Diffusion Model as Pseudo-Anomalous Graph Generator for Graph-Level Anomaly Detection
Jinyu Cai, Yunhe Zhang, Fusheng Liu et al.
Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime
Diyuan Wu, Marco Mondelli
Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning
Brett Barkley, David Fridovich-Keil
AtlasD: Automatic Local Symmetry Discovery
Manu Bhat, Jonghyun Park, Jianke Yang et al.
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou, Zengzhi Wang, Qian Liu et al.
The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models
Zichao Li, Xueru Wen, Jie Lou et al.
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Maya Pavlova, Erik Brinkman, Krithika Iyer et al.
Language Models May Verbatim Complete Text They Were Not Explicitly Trained On
Ken Ziyu Liu, Christopher A. Choquette Choo, Matthew Jagielski et al.
InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference
Tianyu Cui, Song-Jun Xu, Artem Moskalev et al.
Polynomial-Time Approximability of Constrained Reinforcement Learning
Jeremy McMahan
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective
Qingchuan Ma, Yuhang Wu, Xiawu Zheng et al.
No Soundness in the Real World: On the Challenges of the Verification of Deployed Neural Networks
Attila Szász, Balázs Bánhelyi, Mark Jelasity
ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification
Hyunseok Lee, Seunghyuk Oh, Jaehyung Kim et al.
Large Language Models are Demonstration Pre-Selectors for Themselves
Jiarui Jin, Yuwei Wu, Haoxuan Li et al.
WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting
Jiecheng Lu, Xu Han, Yan Sun et al.
Large Language-Geometry Model: When LLM meets Equivariance
Zongzhao Li, Jiacheng Cen, Bing Su et al.
Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
Jinghan Li, Zhicheng Sun, Yadong Mu
Open Materials Generation with Stochastic Interpolants
Philipp Höllmer, Thomas Egg, Maya Martirossyan et al.
AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment
Yuqin Cao, Xiongkuo Min, Yixuan Gao et al.
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
Jinuk Kim, Marwa El Halabi, Wonpyo Park et al.
Attributes Shape the Embedding Space of Face Recognition Models
Pierrick Leroy, Antonio Mastropietro, Marco Nurisso et al.
Collapse or Thrive: Perils and Promises of Synthetic Data in a Self-Generating World
Joshua Kazdan, Rylan Schaeffer, Apratim Dey et al.
Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting
Jiecheng Lu, Shihao Yang
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
Yinghui Li, Jiayi Kuang, Haojing Huang et al.
Probing Visual Language Priors in VLMs
Tiange Luo, Ang Cao, Gunhee Lee et al.
Control and Realism: Best of Both Worlds in Layout-to-Image without Training
Bonan Li, Yinhan Hu, Songhua Liu et al.
Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation
Tianyi Zhang, Junda Su, Aditya Desai et al.
Tracking Most Significant Shifts in Infinite-Armed Bandits
Joe Suk, Jung-hun Kim
When to Forget? Complexity Trade-offs in Machine Unlearning
Martin Van Waerebeke, Marco Lorenzi, Giovanni Neglia et al.
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Rei Higuchi, Taiji Suzuki
M2PDE: Compositional Generative Multiphysics and Multi-component PDE Simulation
Tao Zhang, Zhenhai Liu, Feipeng Qi et al.
Spherical Rotation Dimension Reduction with Geometric Loss Functions
Hengrui Luo, Jeremy E. Purvis, Didong Li
Improving the Scaling Laws of Synthetic Data with Deliberate Practice
Reyhane Askari Hemmat, Mohammad Pezeshki, Elvis Dohmatob et al.
Widening the Network Mitigates the Impact of Data Heterogeneity on FedAvg
Like Jian, Dong Liu
Federated Disentangled Tuning with Textual Prior Decoupling and Visual Dynamic Adaptation
Yihao Yang, Wenke Huang, Guancheng Wan et al.
Understanding High-Dimensional Bayesian Optimization
Leonard Papenmeier, Matthias Poloczek, Luigi Nardi
Learning Configurations for Data-Driven Multi-Objective Optimization
Zhiyang Chen, Hailong Yao, Xia Yin
End-to-End Learning Framework for Solving Non-Markovian Optimal Control
Xiaole Zhang, Peiyu Zhang, Xiongye Xiao et al.
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
Enze Xie, Junsong Chen, Yuyang Zhao et al.
Going Deeper into Locally Differentially Private Graph Neural Networks
Longzhu He, Chaozhuo Li, Peng Tang et al.
Federated Node-Level Clustering Network with Cross-Subgraph Link Mending
Jingxin Liu, Renda Han, Wenxuan Tu et al.
Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants
Daniele Tramontano, Yaroslav Kivva, Saber Salehkaleybar et al.
HuMoCon: Concept Discovery for Human Motion Understanding
Qihang Fang, Chengcheng Tang, Bugra Tekin et al.
Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization
Siyan Dong, Shuzhe Wang, Shaohui Liu et al.
Bridge Frame and Event: Common Spatiotemporal Fusion for High-Dynamic Scene Optical Flow
Hanyu Zhou, Haonan Wang, Haoyue Liu et al.
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen, Mohamed Elhoseiny
Invisible Backdoor Attack against Self-supervised Learning
Hanrong Zhang, Zhenting Wang, Boheng Li et al.
S^3-Face: SSS-Compliant Facial Reflectance Estimation via Diffusion Priors
Xingyu Ren, Jiankang Deng, Yuhao Cheng et al.
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
Jianyi Wang, Zhijie Lin, Meng Wei et al.
RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark
Xin Zhang, Xue Yang, Yuxuan Li et al.
Diffusion Model is Effectively Its Own Teacher
Xinyin Ma, Runpeng Yu, Songhua Liu et al.
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
Wenqiao Li, Yao Gu, Xintao Chen et al.
Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations
Xunzhi Zheng, Dan Xu
LiVOS: Light Video Object Segmentation with Gated Linear Matching
Qin Liu, Jianfeng Wang, Zhengyuan Yang et al.
Dynamic Content Prediction with Motion-aware Priors for Blind Face Video Restoration
Lianxin Xie, Bingbing Zheng, Si Wu et al.
BADGR: Bundle Adjustment Diffusion Conditioned by Gradients for Wide-Baseline Floor Plan Reconstruction
Yuguang Li, Ivaylo Boyadzhiev, Zixuan Liu et al.
Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model
Yue-Hua Han, Tai-Ming Huang, Kailung Hua et al.
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Zongjian Li, Bin Lin, Yang Ye et al.
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
Yifang Men, Yuan Yao, Miaomiao Cui et al.
Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection
Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou et al.
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou et al.
Parametric Point Cloud Completion for Polygonal Surface Reconstruction
Zhaiyu Chen, Yuqing Wang, Liangliang Nan et al.
RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives
Chirag Parikh, Deepti Rawat, Rakshitha R. T. et al.
AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data
Zengqun Zhao, Ziquan Liu, Yu Cao et al.
TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions
Wang Yu-Hang, Junkang Guo, Aolei Liu et al.
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
Xiang Xu, Lingdong Kong, Hui Shuai et al.
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen, Siyuan Liang, Jingzhi Li et al.
Descriptor-In-Pixel: Point-Feature Tracking For Pixel Processor Arrays
Laurie Bose, Piotr Dudek, Jianing Chen
Consistent Normal Orientation for 3D Point Clouds via Least Squares on Delaunay Graph
Rao Fu, Jianmin Zheng, Liang Yu
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction
Yuanbin Man, Ying Huang, Chengming Zhang et al.
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
Feng Liang, Haoyu Ma, Zecheng He et al.
Exploring Timeline Control for Facial Motion Generation
Yifeng Ma, Jinwei Qi, Chaonan Ji et al.
IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing
Chun Gu, Xiaofei Wei, Zixuan Zeng et al.
OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP
Mohamad Hassan N C, Divyam Gupta, Mainak Singha et al.
EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
Zhenghao Xing, Hao Chen, Binzhu Xie et al.
Learning Temporally Consistent Video Depth from Video Diffusion Priors
Jiahao Shao, Yuanbo Yang, Hongyu Zhou et al.
Yo’Chameleon: Personalized Vision and Language Generation
Thao Nguyen, Krishna Kumar Singh, Jing Shi et al.
PersonaBooth: Personalized Text-to-Motion Generation
Boeun Kim, Hea In Jeong, JungHoon Sung et al.
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery
Sara Al-Emadi, Yin Yang, Ferda Ofli
Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis
Tim Büchner, Christoph Anders, Orlando Guntinas-Lichius et al.
InsTaG: Learning Personalized 3D Talking Head from Few-Second Video
Jiahe Li, Jiawei Zhang, Xiao Bai et al.
Unseen Visual Anomaly Generation
Han Sun, Yunkang Cao, Hao Dong et al.
Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models
Jiacong Xu, Shao-Yuan Lo, Bardia Safaei et al.
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
Yuzheng Liu, Siyan Dong, Shuzhe Wang et al.
EchoMatch: Partial-to-Partial Shape Matching via Correspondence Reflection
Yizheng Xie, Viktoria Ehm, Paul Roetzer et al.
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Yiyang Ma, Xingchao Liu, Xiaokang Chen et al.
PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches
Dennis Jacob, Chong Xiang, Prateek Mittal
CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models
Kiet A. Nguyen, Adheesh Juvekar, Tianjiao Yu et al.
Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation
Jingxi Chen, Brandon Y. Feng, Haoming Cai et al.
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin et al.
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
Ege Özsoy, Chantal Pellegrini, Tobias Czempiel et al.
VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing
Juan Luis Gonzalez Bello, Xu Yao, Alex Whelan et al.
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Pedro Hermosilla, Christian Stippel, Leon Sick
Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models
Hao Ren, Yiming Zeng, Zetong Bi et al.
LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs
Zixuan Hu, Yongxian Wei, Li Shen et al.
TensoFlow: Tensorial Flow-based Sampler for Inverse Rendering
Chun Gu, Xiaofei Wei, Li Zhang et al.
STAR-Edge: Structure-aware Local Spherical Curve Representation for Thin-walled Edge Extraction from Unstructured Point Clouds
Zikuan Li, Honghua Chen, Yuecheng Wang et al.
ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation
Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis et al.
RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting
Qiyu Dai, Xingyu Ni, Qianfan Shen et al.
Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis
Feng Zhou, Ruiyang Liu, Chen Liu et al.
Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation
Joohyun Kwon, Hanbyel Cho, Junmo Kim
EventFly: Event Camera Perception from Ground to the Sky
Lingdong Kong, Dongyue Lu, Xiang Xu et al.
Exploiting Deblurring Networks for Radiance Fields
Haeyun Choi, Heemin Yang, Janghyeok Han et al.
ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models
Fernando Julio Cendra, Kai Han
Can Generative Video Models Help Pose Estimation?
Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo, Xiaodong Gu
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang, Junliang Guo, Xinyi Xie et al.
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
Stefan Andreas Baumann, Felix Krause, Michael Neumayr et al.
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong et al.
Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization
Peirong Liu, Ana Lawry Aguila, Juan Iglesias
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye, Yukang Gan, Xiaoke Huang et al.
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
Yuting Zhang, Hao Lu, Qingyong Hu et al.
Continuous Locomotive Crowd Behavior Generation
Inhwan Bae, Junoh Lee, Hae-Gon Jeon
A Unified Latent Schrödinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization
Shilhora Akshay, Niveditha Lakshmi Narasimhan, Jacob George et al.
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner, Christoph Lippert, Aravindh Mahendran
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu, Zhikai Li, Qingyi Gu
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
Wenxuan Guo, Xiuwei Xu, Ziwei Wang et al.
SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection
Phi Vu Tran
What’s in the Image? A Deep-Dive into the Vision of Vision Language Models
Omri Kaduri, Shai Bagon, Tali Dekel
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Peijie Wang, Zhong-Zhi Li, Fei Yin et al.
MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection
Hou-I Liu, Christine Wu, Jen-Hao Cheng et al.
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.
RestorGS: Depth-aware Gaussian Splatting for Efficient 3D Scene Restoration
Yuanjian Qiao, Mingwen Shao, Lingzhuang Meng et al.
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Shenghai Yuan, Jinfa Huang, Xianyi He et al.
Associative Transformer
Yuwei Sun, Hideya Ochiai, Zhirong Wu et al.
Blood Flow Speed Estimation with Optical Coherence Tomography Angiography Images
Wensheng Cheng, Zhenghong Li, Jiaxiang Ren et al.
World-consistent Video Diffusion with Explicit 3D Modeling
Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista et al.
DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework
Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar Jr et al.
OSDFace: One-Step Diffusion Model for Face Restoration
Jingkai Wang, Jue Gong, Lin Zhang et al.
Free-viewpoint Human Animation with Pose-correlated Reference Selection
Fa-Ting Hong, Zhan Xu, Haiyang Liu et al.
3D Gaussian Inpainting with Depth-Guided Cross-View Consistency
Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang
Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction
Cecilia Curreli, Dominik Muhle, Abhishek Saroha et al.
Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts
Yu Cao, Zengqun Zhao, Ioannis Patras et al.
Visual Representation Learning through Causal Intervention for Controllable Image Editing
Shanshan Huang, Haoxuan Li, Chunyuan Zheng et al.
Three-view Focal Length Recovery From Homographies
Yaqing Ding, Viktor Kocur, Zuzana Berger Haladova et al.
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang et al.
ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts
Dmitrii M Petrov, Pradyumn Goyal, Divyansh Shivashok et al.
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision
Yiming Zhao, Taein Kwon, Paul Streli et al.
SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction
Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li et al.
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting
Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen et al.
Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures
Guoxing Sun, Rishabh Dabral, Heming Zhu et al.
Scene-agnostic Pose Regression for Visual Localization
Junwei Zheng, Ruiping Liu, Yufan Chen et al.
Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)
Tomer Garber, Tom Tirer
Localizing Events in Videos with Multimodal Queries
Gengyuan Zhang, Mang Ling Ada Fok, Jialu Ma et al.
HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison
Yung-Hao Yang, Zitang Sun, Taiki Fukiage et al.
Realistic Test-Time Adaptation of Vision-Language Models
Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer et al.
GOAL: Global-local Object Alignment Learning
Hyungyu Choi, Young Kyun Jang, Chanho Eom
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang, Reuben Tan, Qianhui Wu et al.
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
Ethan Griffiths, Maryam Haghighat, Simon Denman et al.
Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields
Runfeng Li, Mikhail Okunev, Zixuan Guo et al.
Generative Photomontage
Sean J. Liu, Nupur Kumari, Ariel Shamir et al.
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh, Jan Kautz
MotiF: Making Text Count in Image Animation with Motion Focal Loss
Shijie Wang, Samaneh Azadi, Rohit Girdhar et al.
Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References
Yitang Li, Mingxian Lin, Zhuo Lin et al.
Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions
Quanyuan Ruan, Jiabao Lei, Wenhao Yuan et al.
Attention IoU: Examining Biases in CelebA using Attention Maps
Aaron Serianni, Tyler Zhu, Olga Russakovsky et al.
Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic
Jianwei Tang, Hong Yang, Tengyue Chen et al.
Feature Selection for Latent Factor Models
Rittwika Kansabanik, Adrian Barbu
Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation
Hadi Alzayer, Philipp Henzler, Jonathan T. Barron et al.
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
Yikun Liu, Yajie Zhang, Jiayin Cai et al.
DeepLA-Net: Very Deep Local Aggregation Networks for Point Cloud Analysis
Ziyin Zeng, Mingyue Dong, Jian Zhou et al.
ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate
Ming Yan, Xincheng Lin, Yuhua Luo et al.
MVDoppler-Pose: Multi-Modal Multi-View mmWave Sensing for Long-Distance Self-Occluded Human Walking Pose Estimation
Jae-Ho Choi, Soheil Hor, Shubo Yang et al.
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla et al.
Omni-ID: Holistic Identity Representation Designed for Generative Tasks
Guocheng Qian, Kuan-Chieh Wang, Or Patashnik et al.
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou, Hui Ren, Yijia Weng et al.
Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
Tianyi Zhu, Dongwei Ren, Qilong Wang et al.
Exploring Temporally-Aware Features for Point Tracking
Inès Hyeonsu Kim, Seokju Cho, Gabriel Huang et al.
Style-Editor: Text-driven Object-centric Style Editing
Jihun Park, Jongmin Gim, Kyoungmin Lee et al.
Locally Orderless Images for Optimization in Differentiable Rendering
Ishit Mehta, Manmohan Chandraker, Ravi Ramamoorthi
Efficient Event-Based Object Detection: A Hybrid Neural Network with Spatial and Temporal Attention
Soikat Hasan Ahmed, Jan Finkbeiner, Emre Neftci
A Dataset for Semantic Segmentation in the Presence of Unknowns
Zakaria Laskar, Tomas Vojir, Matej Grcic et al.
Light Transport-aware Diffusion Posterior Sampling for Single-View Reconstruction of 3D Volumes
Ludwic Leonard, Nils Thuerey, Rüdiger Westermann
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng et al.
Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport
Hao Tan, Zichang Tan, Jun Li et al.
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong, Yean Cheng, Zhuoyi Yang et al.
Adaptive Parameter Selection for Tuning Vision-Language Models
Yi Zhang, Yi-Xuan Deng, Meng-Hao Guo et al.
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization
Liang Pan, Zeshi Yang, Zhiyang Dou et al.
ImagineFSL: Self-Supervised Pretraining Matters on Imagined Base Set for VLM-based Few-shot Learning
Haoyuan Yang, Xiaoou Li, Jiaming Lv et al.
DarkIR: Robust Low-Light Image Restoration
Daniel Feijoo, Juan C. Benito, Alvaro Garcia et al.
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Chenyu Yang, Xuan Dong, Xizhou Zhu et al.
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting
Alex Hanson, Allen Tu, Vasu Singla et al.