Most Cited 2025 "video-depth generation" Papers
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Dimitris Oikonomou, Nicolas Loizou
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
Yandan Yang, Baoxiong Jia, Shujie Zhang et al.
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training
Zhanpeng Zhou, Mingze Wang, Yuchen Mao et al.
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li et al.
Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees
Yannis Montreuil, Axel Carlier, Lai Xing Ng et al.
6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering
Zhongpai Gao, Benjamin Planche, Meng Zheng et al.
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
Jiwon Song, Dongwon Jo, Yulhwa Kim et al.
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior
Penghao Wu, Shengnan Ma, Bo Wang et al.
Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation
Yihong Luo, Tianyang Hu, Weijian Luo et al.
Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models
Zheng Hu, Zhe Li, Ziyun Jiao et al.
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Hongkang Li, Songtao Lu, Pin-Yu Chen et al.
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min et al.
The Computational Complexity of Circuit Discovery for Inner Interpretability
Federico Adolfi, Martina G. Vilas, Todd Wareham
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu, Dongyang Dai, Zhiyong Wu
Jailbreaking as a Reward Misspecification Problem
Zhihui Xie, Jiahui Gao, Lei Li et al.
How Expressive are Knowledge Graph Foundation Models?
Xingyue Huang, Pablo Barcelo, Michael Bronstein et al.
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
Tatiana Zemskova, Dmitry Yudin
Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models
Sumeet Singh, Vikas Sindhwani, Stephen Tu
ViSAGe: Video-to-Spatial Audio Generation
Jaeyeon Kim, Heeseung Yun, Gunhee Kim
TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction
Yunfei Liu, Lei Zhu, Lijian Lin et al.
Reconstructing People, Places, and Cameras
Lea Müller, Hongsuk Choi, Anthony Zhang et al.
Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
Sohyun An, Ruochen Wang, Tianyi Zhou et al.
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Gihyun Kwon, Jong Chul Ye
PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems
Bocheng Zeng, Qi Wang, Mengtao Yan et al.
Glad: A Streaming Scene Generator for Autonomous Driving
Bin Xie, Yingfei Liu, Tiancai Wang et al.
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.
How Much Can We Forget about Data Contamination?
Sebastian Bordt, Suraj Srinivas, Valentyn Boreiko et al.
Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency
Yikai Wang, Chenjie Cao, Junqiu Yu et al.
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
Lu Li, Tianyu Zhang, Zhiqi Bu et al.
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding
Weihao Xuan, Junjue Wang, Heli Qi et al.
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
Sheng Wang, Liheng Chen, Pengan Chen et al.
Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference
Wenhao Shen, Mingliang Zhou, Yu Chen et al.
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo, Lajanugen Logeswaran, Justin Johnson et al.
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
Yabo Zhang, Yuxiang Wei, Xianhui Lin et al.
PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning
Angel Villar-Corrales, Sven Behnke
HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration
Yushi Huang, Zining Wang, Ruihao Gong et al.
ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
Angxiao Yue, Zichong Wang, Hongteng Xu
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
Chunlin Yu, Hanqing Wang, Ye Shi et al.
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty
Periodic Materials Generation using Text-Guided Joint Diffusion Model
Kishalay Das, Subhojyoti Khastagir, Pawan Goyal et al.
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Yue Cao, Yun Xing, Jie Zhang et al.
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao, Jinlong Li, Shuang Wang et al.
LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression
Laurent Condat, Artavazd Maranjyan, Peter Richtarik
Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs
Xander Davies, Eric Winsor, Alexandra Souly et al.
Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
Runa Eschenhagen, Aaron Defazio, Tsung-Hsien Lee et al.
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
Yatai Ji, Jiacheng Zhang, Jie Wu et al.
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
Zefeng Zhang, Hengzhu Tang, Jiawei Sheng et al.
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving
Haiming Zhang, Wending Zhou et al.
Multi-modal brain encoding models for multi-modal stimuli
Subba Reddy Oota, Khushbu Pahwa, Mounika Marreddy et al.
MergeBench: A Benchmark for Merging Domain-Specialized LLMs
Yifei He, Siqi Zeng, Yuzheng Hu et al.
CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification
Mingkun Zhang, Keping Bi, Wei Chen et al.
Safety Reasoning with Guidelines
Haoyu Wang, Zeyu Qin, Li Shen et al.
Gaussian Mixture Flow Matching Models
Hansheng Chen, Kai Zhang, Hao Tan et al.
SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving
Su Sun, Cheng Zhao, Zhuoyang Sun et al.
Flexible Frame Selection for Efficient Video Reasoning
Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.
RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis
Yifei Feng, Mx Yang, Shuhui Yang et al.
Spectral Image Tokenizer
Carlos Esteves, Mohammed Suhail, Ameesh Makadia
PENCIL: Long Thoughts with Short Memory
Chenxiao Yang, Nati Srebro, David McAllester et al.
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
Jikai Wang, Qifan Zhang, Yu-Wei Chao et al.
Improved Training Technique for Latent Consistency Models
Minh Quan Dao, Khanh Doan, Di Liu et al.
EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
Shengyuan Liu, Boyun Zheng, Wenting Chen et al.
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation
Yichen Xie, Runsheng Xu, Tong He et al.
Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
Julien Pourcel, Cédric Colas, Pierre-Yves Oudeyer
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.
Implicit In-context Learning
Zhuowei Li, Zihao Xu, Ligong Han et al.
LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models
Hantao Zhang, Yuhe Liu, Jiancheng Yang et al.
Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
Zidi Xiong, Shan Chen, Zhenting Qi et al.
Constrain Alignment with Sparse Autoencoders
Qingyu Yin, Chak Tou Leong, Hongbo Zhang et al.
GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians
Xiaobao Wei, Peng Chen, Ming Lu et al.
SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
Jinpeng Chen, Runmin Cong, Yuzhi Zhao et al.
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
Keda Tao, Jinjin Gu, Yulun Zhang et al.
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
Jianrong Zhang, Hehe Fan, Yi Yang
Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers
Hang Zhou, Yuezhou Ma, Haixu Wu et al.
Radiology Report Generation via Multi-objective Preference Optimization
Ting Xiao, Lei Shi, Peng Liu et al.
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
Shengyu Feng, Xiang Kong, Shuang Ma et al.
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li, Xuewen Liu, Dongrong Joe Fu et al.
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
Jinuk Kim, Marwa El Halabi, Wonpyo Park et al.
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Sean McLeish, John Kirchenbauer, David Miller et al.
Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models
Hao Ren, Yiming Zeng, Zetong Bi et al.
Diffusion State-Guided Projected Gradient for Inverse Problems
Rayhan Zirvi, Bahareh Tolooshams, Anima Anandkumar
Towards Federated RLHF with Aggregated Client Preference for LLMs
Feijie Wu, Xiaoze Liu, Haoyu Wang et al.
FormalAlign: Automated Alignment Evaluation for Autoformalization
Jianqiao Lu, Yingjia Wan, Yinya Huang et al.
ID-Patch: Robust ID Association for Group Photo Personalization
Yimeng Zhang, Tiancheng Zhi, Jing Liu et al.
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Zun Wang, Jialu Li, Yicong Hong et al.
M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images
Hongyi Wang, Xiuju Du, Jing Liu et al.
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
Shawn Tan, Songlin Yang, Aaron Courville et al.
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
Arthur Jacot, Peter Súkeník, Zihan Wang et al.
RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance
Chengrui Wang, Pengfei Liu, Min Zhou et al.
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
Saúl Santos, António Farinhas, Daniel McNamee et al.
GENMO: A GENeralist Model for Human MOtion
Jiefeng Li, Jinkun Cao, Haotian Zhang et al.
Selective Prompt Anchoring for Code Generation
Yuan Tian, Tianyi Zhang
Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?
Jonathan Roberts, Kai Han, Samuel Albanie
Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation
Changshuo Wang, Shuting He, Xiang Fang et al.
ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks
Feiyang Wang, Xingquan Zuo, Hai Huang et al.
Lorentz Local Canonicalization: How to make any Network Lorentz-Equivariant
Jonas Spinner, Luigi Favaro, Peter Lippmann et al.
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
Yuxuan Wang, Xuanyu Yi, Haohan Weng et al.
Interpreting the Repeated Token Phenomenon in Large Language Models
Itay Yona, Ilia Shumailov, Jamie Hayes et al.
ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
Zhihao Sun, Haoran Jiang, Haoran Chen et al.
A Closer Look at Multimodal Representation Collapse
Abhra Chaudhuri, Anjan Dutta, Tu Bui et al.
Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness
Eli Chien, Pan Li
3D Gaussian Inpainting with Depth-Guided Cross-View Consistency
Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
Kang Liao, Zongsheng Yue, Zhouxia Wang et al.
Node-Time Conditional Prompt Learning in Dynamic Graphs
Xingtong Yu, Zhenghao Liu, Xinming Zhang et al.
R.I.P.: Better Models by Survival of the Fittest Prompts
Ping Yu, Weizhe Yuan, Olga Golovneva et al.
ThermalGaussian: Thermal 3D Gaussian Splatting
Rongfeng Lu, Hangyu Chen, Zunjie Zhu et al.
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar, Vikrant Varma, David Lindner et al.
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
Payman Behnam, Yaosheng Fu, Ritchie Zhao et al.
TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
Zhiying Song, Lei Yang, Fuxi Wen et al.
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration
Yiming Zuo, Willow Yang, Zeyu Ma et al.
HyperGS: Hyperspectral 3D Gaussian Splatting
Christopher Thirgood, Oscar Mendez, Erin Chao Ling et al.
Synthesizing Software Engineering Data in a Test-Driven Manner
Lei Zhang, Jiaxi Yang, Min Yang et al.
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
Suyu Ge, Xihui Lin, Yunan Zhang et al.
Efficient Randomized Experiments Using Foundation Models
Piersilvio De Bartolomeis, Javier Abad, Guanbo Wang et al.
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
Minghan Chen, Guikun Chen, Wenguan Wang et al.
Continuous Diffusion for Mixed-Type Tabular Data
Markus Mueller, Kathrin Gruber, Dennis Fok
Self-Improving Embodied Foundation Models
Seyed Kamyar Seyed Ghasemipour, Ayzaan Wahid, Jonathan Tompson et al.
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
Yang Zhou, Hao Shao, Letian Wang et al.
When Do LLMs Help With Node Classification? A Comprehensive Analysis
Xixi Wu, Yifei Shen, Fangzhou Ge et al.
Fundamental limits of learning in sequence multi-index models and deep attention networks: high-dimensional asymptotics and sharp thresholds
Emanuele Troiani, Hugo Cui, Yatin Dandi et al.
HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation
Yiming Liang, Tianhan Xu, Yuta Kikuchi
Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution
Karam Park, Jae Woong Soh, Nam Ik Cho
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
How Much Can Transfer? BRIDGE: Bounded Multi-Domain Graph Foundation Model with Generalization Guarantees
Haonan Yuan, Qingyun Sun, Junhua Shi et al.
Deep Linear Probe Generators for Weight Space Learning
Jonathan Kahana, Eliahu Horwitz, Imri Shuval et al.
Efficient Model Editing with Task-Localized Sparse Fine-tuning
Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang, Yutong Liu, Yangguang Li et al.
Data-Driven Performance Guarantees for Classical and Learned Optimizers
Rajiv Sambharya, Bartolomeo Stellato
DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval
Yating Liu, Zimo Liu, Xiangyuan Lan et al.
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
Hui En Pang, Shuai Liu, Zhongang Cai et al.
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Pei Wang, Yanan Wu, Zekun Wang et al.
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
Improving Transformer World Models for Data-Efficient RL
Antoine Dedieu, Joseph Ortiz, Xinghua Lou et al.
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
Peiye Zhuang, Songfang Han, Chaoyang Wang et al.
Large Convolutional Model Tuning via Filter Subspace
Wei Chen, Zichen Miao, Qiang Qiu
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
Jian-Qiao Zhu, Haijiang Yan, Thomas L. Griffiths
On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
Wei Shen, Ruida Zhou, Jing Yang et al.
Gaussian Eigen Models for Human Heads
Wojciech Zielonka, Timo Bolkart, Thabo Beeler et al.
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
RoMo: Robust Motion Segmentation Improves Structure from Motion
Lily Goli, Sara Sabour, Mark Matthews et al.
Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence
Yinbin Han, Meisam Razaviyayn, Renyuan Xu
IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking
Shubham Dipak Ugare, Rohan Gumaste, Tarun Suresh et al.
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
Zihao Zhang, Haoran Chen, Haoyu Zhao et al.
FilterTS: Comprehensive Frequency Filtering for Multivariate Time Series Forecasting
Yulong Wang, Yushuo Liu, Xiaoyi Duan et al.
Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
Mihaela Stoian, Eleonora Giunchiglia
Atlas Gaussians Diffusion for 3D Generation
Haitao Yang, Yuan Dong, Hanwen Jiang et al.
R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization
Xudong Jiang, Fangjinhua Wang, Silvano Galliani et al.
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
Yingying Fan, Quanwei Yang, Kaisiyuan Wang et al.
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao, Xinggang Wang, Lianghui Zhu et al.
Intrinsic User-Centric Interpretability through Global Mixture of Experts
Vinitra Swamy, Syrielle Montariol, Julian Blackwell et al.
Multi-Session Budget Optimization for Forward Auction-based Federated Learning
Xiaoli Tang, Han Yu, Zengxiang Li et al.
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker, Frederick Altrock, Benjamin Risse
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xu Wang, Yan Hu, Wenyu Du et al.
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
Jingyang Lin, Jialian Wu, Ximeng Sun et al.
Identifying Macro Conditional Independencies and Macro Total Effects in Summary Causal Graphs with Latent Confounding
Simon Ferreira, Charles K. Assaad
LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors
Han Zhou, Wei Dong, Jun Chen
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu, Zeyu Zhang, Zhexin Li et al.
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
Zijie Wu, Chaohui Yu, Fan Wang et al.
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Liyan Tang, Grace Kim, Xinyu Zhao et al.
BodyGen: Advancing Towards Efficient Embodiment Co-Design
Haofei Lu, Zhe Wu, Junliang Xing et al.
Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces
Anjiang Wei, Allen Nie, Thiago Teixeira et al.
Tool Unlearning for Tool-Augmented LLMs
Jiali Cheng, Hadi Amiri
Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation
Wenhui Tan, Boyuan Li, Chuhao Jin et al.
Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition
Wen Yin, Yong Wang, Guiduo Duan et al.
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
Binghui Li, Yuanzhi Li
Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks
Kairong Yu, Chengting Yu, Tianqing Zhang et al.
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao et al.
Consistency Checks for Language Model Forecasters
Daniel Paleka, Abhimanyu Pallavi Sudhir, Alejandro Alvarez et al.
Objective drives the consistency of representational similarity across datasets
Laure Ciernik, Lorenz Linhardt, Marco Morik et al.
V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection
Xun Huang, Jinlong Wang, Qiming Xia et al.
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Yumeng Li, William H Beluch, Margret Keuper et al.
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
Ruijie Lu, Yixin Chen, Yu Liu et al.
Towards Hierarchical Rectified Flow
Yichi Zhang, Yici Yan, Alex Schwing et al.
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
Koichi Saito, Dongjun Kim, Takashi Shibuya et al.
Improved Off-policy Reinforcement Learning in Biological Sequence Design
Hyeonah Kim, Minsu Kim, Taeyoung Yun et al.
Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization
Zeyuan Ma, Jiacheng Chen, Hongshu Guo et al.
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex, Sara Atito, Armin Mustafa et al.
Antidistillation Sampling
Yash Savani, Asher Trockman, Zhili Feng et al.
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
Yuntao Du, Kailin Jiang, Zhi Gao et al.
PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors
Guangshun Wei, Yuan Feng, Long Ma et al.
Boosting Latent Diffusion with Perceptual Objectives
Tariq Berrada, Pietro Astolfi, Melissa Hall et al.
RankCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Roberto Castro, Andrei Panferov, Rush Tabesh et al.
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
Kaichen Zhang, Yifei Shen, Bo Li et al.
Disentangled Motion Modeling for Video Frame Interpolation
Jaihyun Lew, Jooyoung Choi, Chaehun Shin et al.
On Measuring Long-Range Interactions in Graph Neural Networks
Jacob Bamberger, Benjamin Gutteridge, Scott le Roux et al.
ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder
Jungho Kim, Changwon Kang, Dongyoung Lee et al.
A Recipe for Generating 3D Worlds from a Single Image
Katja Schwarz, Denis Rozumny, Samuel Rota Bulò et al.
SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
Jaeseong Lee, Taewoong Kang, Marcel Buehler et al.
Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search
Shuyu Yang, Yaxiong Wang, Li Zhu et al.
MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection
Hou-I Liu, Christine Wu, Jen-Hao Cheng et al.
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Yibo Li, Miao Xiong, Jiaying Wu et al.
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Lukas Aichberger, Alasdair Paren, Guohao Li et al.
MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
Ruida Wang, Rui Pan, Yuxin Li et al.
VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction
Zijian He, Yuwei Ning, Yipeng Qin et al.
Sensor-Invariant Tactile Representation
Harsh Gupta, Yuchen Mo, Shengmiao Jin et al.
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
Hermann Kumbong, Xian Liu, Tsung-Yi Lin et al.