Most Cited 2025 "3d all-atom models" Papers
VORTA: Efficient Video Diffusion via Routing Sparse Attention
Wenhao Sun, Rong-Cheng Tu, Yifu Ding et al.
Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
Aaditya Singh, Ted Moskovitz, Sara Dragutinović et al.
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul et al.
GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting
Yangming Zhang, Wenqi Jia, Wei Niu et al.
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding
Alexey Skrynnik, Anton Andreychuk, Anatolii Borzilov et al.
Learning-Order Autoregressive Models with Application to Molecular Graph Generation
Zhe Wang, Jiaxin Shi, Nicolas Heess et al.
From Commands to Prompts: LLM-based Semantic File System for AIOS
Zeru Shi, Kai Mei, Mingyu Jin et al.
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
Zhixuan Liang, Yao Mu, Yixiao Wang et al.
Diffusion Models for Attribution
Xiongren Chen, Jiuyong Li, Jixue Liu et al.
Efficient 3D Recognition with Event-driven Spike Sparse Convolution
Xuerui Qiu, Man Yao, Jieyuan Zhang et al.
Proxy Denoising for Source-Free Domain Adaptation
Song Tang, Wenxin Su, Yan Gan et al.
Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos et al.
Advancing Spiking Neural Networks Towards Multiscale Spatiotemporal Interaction Learning
Yimeng Shan, Malu Zhang, Rui-jie Zhu et al.
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
Jinho Jeong, Sangmin Han, Jinwoo Kim et al.
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Max Klabunde, Tassilo Wald, Tobias Schumacher et al.
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
Kai Wang, Zekai Li, Zhi-Qi Cheng et al.
Conformal Thresholded Intervals for Efficient Regression
Rui Luo, Zhixin Zhou
TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics
Lu Yi, Jie Peng, Yanping Zheng et al.
Revisiting Tampered Scene Text Detection in the Era of Generative AI
Chenfan Qu, Yiwu Zhong, Fengjun Guo et al.
Mr. DETR: Instructive Multi-Route Training for Detection Transformers
Chang-Bin Zhang, Yujie Zhong, Kai Han
LinPrim: Linear Primitives for Differentiable Volumetric Rendering
Nicolas von Lützow, Matthias Niessner
QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training
David Dai, Peilin Chen, Chanakya Ekbote et al.
Test-Time Backdoor Detection for Object Detection Models
Hangtao Zhang, Yichen Wang, Shihui Yan et al.
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl H. Johansson et al.
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang, Jie Wang, Jilai Pan et al.
Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection
Xiaoyu Huang, Weidong Chen, Bo Hu et al.
CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models
David Dai, Peilin Chen, Malinda Lu et al.
Potemkin Understanding in Large Language Models
Marina Mancoridis, Bec Weeks, Keyon Vafa et al.
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting
Shaofei Cai, Zihao Wang, Kewei Lian et al.
Law of the Weakest Link: Cross Capabilities of Large Language Models
Ming Zhong, Aston Zhang, Xuewei Wang et al.
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang, Zilong Xie, Yicheng Feng et al.
Exploring Vacant Classes in Label-Skewed Federated Learning
Kuangpu Guo, Yuhe Ding, Jian Liang et al.
VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
Kangsan Kim, Geon Park, Youngwan Lee et al.
A Meta-Learning Approach to Bayesian Causal Discovery
Anish Dhir, Matthew Ashman, James Requeima et al.
Skill Expansion and Composition in Parameter Space
Tenglong Liu, Jianxiong Li, Yinan Zheng et al.
Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data
Phillip Si, Peng Chen
Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks
Yu Zhou, Dian Zheng, Qijie Mo et al.
Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
Yik Siu Chan, Narutatsu Ri, Yuxin Xiao et al.
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye, Haotian Zhang, Erik Daxberger et al.
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Fengshuo Bai, Runze Liu, Yali Du et al.
OmniCount: Multi-label Object Counting with Semantic-Geometric Priors
Anindya Mondal, Sauradip Nag, Xiatian Zhu et al.
Certified Unlearning for Neural Networks
Anastasiia Koloskova, Youssef Allouah, Animesh Jha et al.
No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets
Corinna Coupette, Jeremy Wayland, Emily Simons et al.
SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels
Xiangyu Dong, Xingyi Zhang, Lei Chen et al.
Efficient Rectification of Neuro-Symbolic Reasoning Inconsistencies by Abductive Reflection
Wen-Chao Hu, Wang-Zhou Dai, Yuan Jiang et al.
Debiased Multimodal Understanding for Human Language Sequences
Zhi Xu, Dingkang Yang, Mingcheng Li et al.
Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
Yu Qi, Yuanchen Ju, Tianming Wei et al.
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
Kangrui Wang, Pingyue Zhang, Zihan Wang et al.
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha, Yapeng Tian
LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
Yujun Shi, Jun Hao Liew, Hanshu Yan et al.
Learning Personalized Decision Support Policies
Umang Bhatt, Valerie Chen, Katherine M. Collins et al.
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Fengxiang Wang, Mingshuo Chen, Yueying Li et al.
Test-Time Scaling of Diffusion Models via Noise Trajectory Search
Vignav Ramesh, Morteza Mardani
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Enis Simsar, Thomas Hofmann, Federico Tombari et al.
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li, Xiaolu Hou, Ziyang Liu et al.
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan, Zhenyi Lu, Sichen Liu et al.
Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
Zhe Li, Bicheng Ying, Zidong Liu et al.
Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency
Jiangrong Shen, Qi Xu, Gang Pan et al.
LeanAgent: Lifelong Learning for Formal Theorem Proving
Adarsh Kumarappan, Mohit Tiwari, Peiyang Song et al.
MaestroMotif: Skill Design from Artificial Intelligence Feedback
Martin Klissarov, Mikael Henaff, Roberta Raileanu et al.
Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?
Maxime Méloux, Silviu Maniu, François Portet et al.
SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation
Duc-Hai Pham, Tung Do, Phong Nguyen et al.
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst, Tim Woydt, Lukas Helff et al.
Manifold Learning by Mixture Models of VAEs for Inverse Problems
Giovanni S. Alberti, Johannes Hertrich, Matteo Santacesaria et al.
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation
Huaize Liu, WenZhang Sun, Donglin Di et al.
In-context Time Series Predictor
Jiecheng Lu, Yan Sun, Shihao Yang
Generative Classifiers Avoid Shortcut Solutions
Alexander Li, Ananya Kumar, Deepak Pathak
Flow-Based Policy for Online Reinforcement Learning
Lei Lv, Yunfei Li, Yu Luo et al.
P-SpikeSSM: Harnessing Probabilistic Spiking State Space Models for Long-Range Dependency Tasks
Malyaban Bal, Abhronil Sengupta
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
Qihao Liu, Xi Yin, Alan L. Yuille et al.
SpiritSight Agent: Advanced GUI Agent with One Look
Zhiyuan Huang, Ziming Cheng, Junting Pan et al.
HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder
Yingqi Tang, Zhuoran Xu, Zhaotie Meng et al.
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen et al.
Learning Adversarial MDPs with Stochastic Hard Constraints
Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi et al.
Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective
Bo Ni, Yu Wang, Lu Cheng et al.
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
Jixuan Fan, Wanhua Li, Yifei Han et al.
AKiRa: Augmentation Kit on Rays for Optical Video Generation
Xi Wang, Robin Courant, Marc Christie et al.
Solving New Tasks by Adapting Internet Video Knowledge
Calvin Luo, Zilai Zeng, Yilun Du et al.
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
Yiping Wang, Xuehai He, Kuan Wang et al.
QP-SNN: Quantized and Pruned Spiking Neural Networks
Wenjie Wei, Malu Zhang, Zijian Zhou et al.
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
Sicheng Zhu, Brandon Amos, Yuandong Tian et al.
STD-PLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with PLM
Yiheng Huang, Xiaowei Mao, Shengnan Guo et al.
Exploring the limits of strong membership inference attacks on large language models
Jamie Hayes, I Shumailov, Christopher A. Choquette-Choo et al.
An OpenMind for 3D Medical Vision Self-supervised Learning
Tassilo Wald, Constantin Ulrich, Jonathan Suprijadi et al.
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
Zui Chen, Tianqiao Liu, Tongqing et al.
Edge Prompt Tuning for Graph Neural Networks
Xingbo Fu, Yinhan He, Jundong Li
DiffPuter: Empowering Diffusion Models for Missing Data Imputation
Hengrui Zhang, Liancheng Fang, Qitian Wu et al.
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing
Peng Li, Wangguandong Zheng, Yuan Liu et al.
Bag of Tricks for Inference-time Computation of LLM Reasoning
Fan Liu, Wen-Shuo Chao, Naiqiang Tan et al.
InsightEdit: Towards Better Instruction Following for Image Editing
Yingjing Xu, Jie Kong, Jiazhi Wang et al.
Black-Box Adversarial Attacks on LLM-Based Code Completion
Slobodan Jenko, Niels Mündler, Jingxuan He et al.
Detecting Backdoor Samples in Contrastive Language Image Pretraining
Hanxun Huang, Sarah Erfani, Yige Li et al.
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
Chinedu Innocent Nwoye, Kareem Elgohary, Anvita A. Srinivas et al.
Jailbreaking as a Reward Misspecification Problem
Zhihui Xie, Jiahui Gao, Lei Li et al.
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Fan Liu, Zherui Yang, Cancheng Liu et al.
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
Suraj Anand, Michael Lepori, Jack Merullo et al.
The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning
Youssef Allouah, Joshua Kazdan, Rachid Guerraoui et al.
Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction
Wenke Xia, Ruoxuan Feng, Dong Wang et al.
Understanding Virtual Nodes: Oversquashing and Node Heterogeneity
Joshua Southern, Francesco Di Giovanni, Michael Bronstein et al.
Memory Mosaics
Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan et al.
LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization
Alessio Spagnoletti, Jean Prost, Andres Almansa et al.
TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction
Yunfei Liu, Lei Zhu, Lijian Lin et al.
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
Rui Min, Tianyu Pang, Chao Du et al.
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin et al.
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning
Vahid Balazadeh, Hamidreza Kamkari, Valentin Thomas et al.
EMOE: Modality-Specific Enhanced Dynamic Emotion Experts
Yiyang Fang, Wenke Huang, Guancheng Wan et al.
Differentiable Optimization of Similarity Scores Between Models and Brains
Nathan Cloos, Moufan Li, Markus Siegel et al.
Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
Michele Mazzamuto, Antonino Furnari, Yoichi Sato et al.
MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output
Yanyuan Chen, Dexuan Xu, Yu Huang et al.
Understanding Model Calibration - A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)
Maja Pavlovic
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Zhenyu Zhang, Zechun Liu, Yuandong Tian et al.
GraphGPT: Generative Pre-trained Graph Eulerian Transformer
Qifang Zhao, Weidong Ren, Tianyu Li et al.
Improving the Scaling Laws of Synthetic Data with Deliberate Practice
Reyhane Askari Hemmat, Mohammad Pezeshki, Elvis Dohmatob et al.
Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization
Peiyan Zhang, Haibo Jin, Leyang Hu et al.
Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search
Jonathan Light, Min Cai, Weiqin Chen et al.
SMamba: Sparse Mamba for Event-based Object Detection
Nan Yang, Yang Wang, Zhanwen Liu et al.
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang, Xiang Liu, Qian Wang et al.
Hierarchical Equivariant Policy via Frame Transfer
Haibo Zhao, Dian Wang, Yizhe Zhu et al.
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Renshan Zhang, Rui Shao, Gongwei Chen et al.
DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers
Hanling Zhang, Rundong Su, Zhihang Yuan et al.
GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors
An Li, Zhe Zhu, Mingqiang Wei
Geometry Field Splatting with Gaussian Surfels
Kaiwen Jiang, Venkataram Sivaram, Cheng Peng et al.
CellFlux: Simulating Cellular Morphology Changes via Flow Matching
Yuhui Zhang, Yuchang Su, Chenyu Wang et al.
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai, Yijie Xu, Jinhui Ye et al.
NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction
Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou et al.
MatAnyone: Stable Video Matting with Consistent Memory Propagation
Peiqing Yang, Shangchen Zhou, Jixin Zhao et al.
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Hongkang Li, Songtao Lu, Pin-Yu Chen et al.
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Zhenxing Mi, Kuan-Chieh Wang, Guocheng Qian et al.
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
Zheng Chen, Zichen Zou, Kewei Zhang et al.
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
Yiyou Sun, Yu Gai, Lijie Chen et al.
Efficiently Parameterized Neural Metriplectic Systems
Anthony Gruber, Kookjin Lee, Haksoo Lim et al.
On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding
Dehong Xu, Ruiqi Gao, Wenhao Zhang et al.
Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning
Hai-Ming Xu, Qi Chen, Lei Wang et al.
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng, Tongjia Chen, Shoubin Yu et al.
GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting
Shujuan Li, Yu-Shen Liu, Zhizhong Han
Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models
Chenrui Cao, Liangcheng Song, Zenan Li et al.
DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers
Xuanlei Zhao, Shenggan Cheng, Chang Chen et al.
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Xinyan Chen, Jianfei Yang
From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization
Chao Yuan, Guiwei Zhang, Changxiao Ma et al.
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Mohan Xu, Kai Li, Guo Chen et al.
Probing the Latent Hierarchical Structure of Data via Diffusion Models
Antonio Sclocchi, Alessandro Favero, Noam Levi et al.
Scaling Trends in Language Model Robustness
Nikolaus Howe, Ian McKenzie, Oskar Hollinsworth et al.
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization
Feize Wu, Yun Pang, Junyi Zhang et al.
Can Transformers Reason Logically? A Study in SAT Solving
Leyan Pan, Vijay Ganesh, Jacob Abernethy et al.
Rethinking Invariance in In-context Learning
Lizhe Fang, Yifei Wang, Khashayar Gatmiry et al.
PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems
Bocheng Zeng, Qi Wang, Mengtao Yan et al.
Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference
Zongyue Qin, Ziniu Hu, Zifan He et al.
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang, Chenghui Lv, Xian Li et al.
When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning
Yang Liu, Qianqian Xu, Peisong Wen et al.
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
Jing He, Haodong Li, Yongzhe Hu et al.
LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces
Rashid Mushkani, Perampalli Shravan Nayak, Hugo Berard et al.
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment
Cheryl Li, Tianyuan Xu, Yiwen Guo
TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks
Mathilde Papillon, Guillermo Bernardez, Claudio Battiloro et al.
Glad: A Streaming Scene Generator for Autonomous Driving
Bin Xie, Yingfei Liu, Tiancai Wang et al.
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Fating Hong, Zunnan Xu, Zixiang Zhou et al.
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
Tianyu Zhang, Suyuchen Wang, Lu Li et al.
Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG
Wenbin Wang, Yongcheng Jing, Liang Ding et al.
3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model
Wenbo Hu, Yining Hong, Yanjun Wang et al.
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
Bin Tan, Rui Yu, Yujun Shen et al.
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Dimitris Oikonomou, Nicolas Loizou
Synthetic Video Enhances Physical Fidelity in Video Synthesis
Qi Zhao, Xingyu Ni, Ziyu Wang et al.
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
Songlong Xing, Zhengyu Zhao, Nicu Sebe
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis
Sheng Miao, Jiaxin Huang, Dongfeng Bai et al.
Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods
Akira Ito, Masanori Yamada, Atsutoshi Kumagai
How do Transformers Learn Implicit Reasoning?
Jiaran Ye, Zijun Yao, Zhidian Huang et al.
Adaptive Self-improvement LLM Agentic System for ML Library Development
Genghan Zhang, Weixin Liang, Olivia Hsu et al.
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
Hyeonho Jeong, Suhyeon Lee, Jong Ye
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation
Yulu Pan, Ce Zhang, Gedas Bertasius
Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry
Jannis Chemseddine, Christian Wald, Richard Duong et al.
How Expressive are Knowledge Graph Foundation Models?
Xingyue Huang, Pablo Barcelo, Michael Bronstein et al.
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Alexander Nikulin, Ilya Zisman, Alexey Zemtsov et al.
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
Yongchao Chen, Yilun Hao, Yueying Liu et al.
X-Dancer: Expressive Music to Human Dance Video Generation
Zeyuan Chen, Hongyi Xu, Guoxian Song et al.
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training
Zhanpeng Zhou, Mingze Wang, Yuchen Mao et al.
The Computational Complexity of Circuit Discovery for Inner Interpretability
Federico Adolfi, Martina G. Vilas, Todd Wareham
6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering
Zhongpai Gao, Benjamin Planche, Meng Zheng et al.
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
Haoji Zhang, Yiqin Wang, Yansong Tang et al.
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
Ruihan Yang, Fanghua Ye, Jian Li et al.
MAPLE: Many-Shot Adaptive Pseudo-Labeling for In-Context Learning
Zihan Chen, Song Wang, Zhen Tan et al.
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied, Thomas Adler, Vihang Patil et al.
Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation
Jingxi Chen, Brandon Y. Feng, Haoming Cai et al.
ViSAGe: Video-to-Spatial Audio Generation
Jaeyeon Kim, Heeseung Yun, Gunhee Kim
One Node One Model: Featuring the Missing-Half for Graph Clustering
Xuanting Xie, Bingheng Li, Erlin Pan et al.
Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages
Matteo Farina, Massimiliano Mancini, Giovanni Iacca et al.
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction
Zhimin Liao, Ping Wei, Shuaijia Chen et al.
Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation
Harold Haodong Chen, Haojian Huang, Qifeng Chen et al.
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Shuo Wang, Yongcai Wang, Wanting Li et al.
Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models
Sumeet Singh, Vikas Sindhwani, Stephen Tu
MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios
Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.
Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts
Yun Wang, Longguang Wang, Chenghao Zhang et al.
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
Chen Wang, Chuhao Chen, Yiming Huang et al.
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu, Dongyang Dai, Zhiyong Wu
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Gihyun Kwon, Jong Chul Ye
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise
Brayan Monroy, Jorge Bacca, Julián Tachella