Most Cited 2025 "linear measurements" Papers
22,274 papers found • Page 12 of 112
Conference
Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks
Mario Lino, Tobias Pfaff, Nils Thuerey
Argumentative Large Language Models for Explainable and Contestable Claim Verification
Gabriel Freedman, Adam Dejl, Deniz Gorur et al.
REFINE: Inversion-Free Backdoor Defense via Model Reprogramming
Yukun Chen, Shuo Shao, Enhao Huang et al.
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li, pengyu Liu, Dan Guo et al.
Model Merging in Pre-training of Large Language Models
Yunshui Li, Yiyuan Ma, Shen Yan et al.
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
Yujie Zhou, Jiazi Bu, Pengyang Ling et al.
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu, Zhixing Zhang, Yanyu Li et al.
3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering
Qingyuan Zhou, Weidong Yang, Ben Fei et al.
Generative Trajectory Stitching through Diffusion Composition
Yunhao Luo, Utkarsh Mishra, Yilun Du et al.
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
jiajun cao, Yuan Zhang, Tao Huang et al.
DataGen: Unified Synthetic Dataset Generation via Large Language Models
Yue Huang, Siyuan Wu, Chujie Gao et al.
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
Tianyi Yan, Dongming Wu, Wencheng Han et al.
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.
TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting
Peiyuan Liu, Beiliang Wu, Yifan Hu et al.
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
Yating Wang, Haoyi Zhu, Mingyu Liu et al.
Mixture of In-Context Prompters for Tabular PFNs
Derek Xu, Olcay Cirit, Reza Asadi et al.
OccMamba: Semantic Occupancy Prediction with State Space Models
Heng Li, Yuenan Hou, Xiaohan Xing et al.
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin et al.
Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation
Chengyang Ye, Yunzhi Zhuge, Pingping Zhang
TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes
Minghao Guo, Bohan Wang, Kaiming He et al.
Generative Image Layer Decomposition with Visual Effects
Jinrui Yang, Qing Liu, Yijun Li et al.
V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction
Zewei Zhou, Hao Xiang, Zhaoliang Zheng et al.
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen, Mohamed Elhoseiny
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.
Improving LLM Safety Alignment with Dual-Objective Optimization
Xuandong Zhao, Will Cai, Tianneng Shi et al.
Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning
Kai Jiang, Zhengyan Shi, Dell Zhang et al.
SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
Zhaorun Chen, Francesco Pinto, Minzhou Pan et al.
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Michael Matthews, Michael Beukman, Chris Lu et al.
LSNet: See Large, Focus Small
Ao Wang, Hui Chen, Zijia Lin et al.
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
Rickard Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj et al.
OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation
Ding Zhong, Xu Zheng, Chenfei Liao et al.
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou et al.
Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents
Shayan Kiyani, George Pappas, Aaron Roth et al.
Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
Ma Teng, Xiaojun Jia, Ranjie Duan et al.
Do Language Models Use Their Depth Efficiently?
Róbert Csordás, Christopher D Manning, Chris Potts
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei, Jiacong Wang, Haochen Wang et al.
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Wei Cheng, Juncheng Mu, Xianfang Zeng et al.
Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks
Shengbin Yue, Siyuan Wang, Wei Chen et al.
In Search of Forgotten Domain Generalization
Prasanna Mayilvahanan, Roland Zimmermann, Thaddäus Wiedemer et al.
Reflective Gaussian Splatting
Yuxuan Yao, Zixuan Zeng, Chun Gu et al.
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
Mark Endo, Xiaohan Wang, Serena Yeung-Levy
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang, Haifeng Huang, Yuzhang Shang et al.
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
On the Benefits of Memory for Modeling Time-Dependent PDEs
Ricardo Buitrago Ruiz, Tanya Marwah, Albert Gu et al.
Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?
Ben Yao, Yazhou Zhang, Qiuchi Li et al.
ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance
Shuwei Shi, Wenbo Li, Yuechen Zhang et al.
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
Ce Zhang, Zifu Wan, Zhehan Kan et al.
Self-Challenging Language Model Agents
Yifei Zhou, Sergey Levine, Jason Weston et al.
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Tonghe Zhang, Chao Yu, Sichang Su et al.
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
Hyojun Go, byeongjun park, Jiho Jang et al.
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang, Yuwei An, Hongyi Liu et al.
EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition
Issar Tzachor, Boaz Lerner, Matan Levy et al.
Training on the Benchmark Is Not All You Need
Shiwen Ni, Xiangtao Kong, Chengming Li et al.
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
CHUANQI CHENG, Jian Guan, Wei Wu et al.
MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents
Kaijie Zhu, Xianjun Yang, Jindong Wang et al.
OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
Junjielong Xu, Qinan Zhang, Zhiqing Zhong et al.
BaxBench: Can LLMs Generate Correct and Secure Backends?
Mark Vero, Niels Mündler, Viktor Chibotaru et al.
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.
How do language models learn facts? Dynamics, curricula and hallucinations
Nicolas Zucchet, Jorg Bornschein, Stephanie C.Y. Chan et al.
Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning
Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga et al.
ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification
Hyunseok Lee, Seunghyuk Oh, Jaehyung Kim et al.
Monte Carlo Tree Diffusion for System 2 Planning
Jaesik Yoon, Hyeonseo Cho, Doojin Baek et al.
Personality Alignment of Large Language Models
Minjun Zhu, Yixuan Weng, Linyi Yang et al.
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Xudong LU, Yinghao Chen, chencheng Chen et al.
Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning
Zeyu Gan, Yun Liao, Yong Liu
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian et al.
Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
Tian Jin, Ellie Cheng, Zachary Ankner et al.
Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction
Dongxu Wei, Zhiqi Li, Peidong Liu
Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization
Zhitong Xu, Haitao Wang, Jeff Phillips et al.
The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis
El Mehdi Achour, Francois Malgouyres, Sebastien Gerchinovitz
Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation
Yuanbo Yang, Jiahao Shao, Xinyang Li et al.
Benchmarking Agentic Workflow Generation
Shuofei Qiao, Runnan Fang, Zhisong Qiu et al.
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
Xue zhucun, Jiangning Zhang, Teng Hu et al.
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.
Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective
Yingyu Liang, Zhizhou Sha, Zhenmei Shi et al.
InfAlign: Inference-aware language model alignment
Ananth Balashankar, Ziteng Sun, Jonathan Berant et al.
Reducing Tool Hallucination via Reliability Alignment
Hongshen Xu, Zichen Zhu, Lei Pan et al.
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
Ajay Jaiswal, Yifan Wang, Lu Yin et al.
AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement
Pranjal Aggarwal, Bryan Parno, Sean Welleck
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites
Div Garg, Diego Caples, Andis Draguns et al.
CRANE: Reasoning with constrained LLM generation
Debangshu Banerjee, Tarun Suresh, Shubham Ugare et al.
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu, Anette Frank
What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph
Yutao Jiang, Qiong Wu, Wenhao Lin et al.
Rethinking Aleatoric and Epistemic Uncertainty
Freddie Bickford Smith, Jannik Kossen, Eleanor Trollope et al.
P(all-atom) Is Unlocking New Path For Protein Design
Wei Qu, Jiawei Guan, Rui Ma et al.
Position: Uncertainty Quantification Needs Reassessment for Large Language Model Agents
Michael Kirchhof, Gjergji Kasneci, Enkelejda Kasneci
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
Shuaijie Shen, Chao Wang, Renzhuo Huang et al.
Streaming DiLoCo with overlapping communication
Arthur Douillard, Yani Donchev, J Keith Rush et al.
Revelio: Interpreting and leveraging semantic information in diffusion models
Dahye Kim, Xavier Thomas, Deepti Ghadiyaram
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs – No Silver Bullet for LC or RAG Routing
Kuan Li, Liwen Zhang, Yong Jiang et al.
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko, Tianyi Chen, Sungnyun Kim et al.
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
Quanhao Li, Zhen Xing, Rui Wang et al.
Influence Functions for Scalable Data Attribution in Diffusion Models
Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae et al.
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking
Tarun Suresh, Revanth Gangi Reddy, Yifei Xu et al.
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
Yihong Luo, Xiaolong Chen, Xinghua Qu et al.
GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution
Jintong Hu, Bin Xia, Bin Chen et al.
Wavelet-Assisted Multi-Frequency Attention Network for Pansharpening
Jie Huang, Rui Huang, Jinghao Xu et al.
Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk
VMBench: A Benchmark for Perception-Aligned Video Motion Generation
Xinran Ling, Chen Zhu, Meiqi Wu et al.
On Oversquashing in Graph Neural Networks Through the Lens of Dynamical Systems
Alessio Gravina, Moshe Eliasof, Claudio Gallicchio et al.
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
Siyu Jiao, Gengwei Zhang, Yinlong Qian et al.
LEAPS: A discrete neural sampler via locally equivariant networks
Peter Holderrieth, Michael Albergo, Tommi Jaakkola
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
Xin Zhang, Robby T. Tan
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Songhua Liu, Zhenxiong Tan, Xinchao Wang
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?
Seth Aycock, David Stap, Di Wu et al.
The Illusion of Empathy: How AI Chatbots Shape Conversation Perception
Tingting Liu, Salvatore Giorgi, Ankit Aich et al.
Wyckoff Transformer: Generation of Symmetric Crystals
Nikita Kazeev, Wei Nong, Ignat Romanov et al.
Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Jiarui Jin, Haoyu Wang, Hongyan Li et al.
Great Models Think Alike and this Undermines AI Oversight
Shashwat Goel, Joschka Strüber, Ilze Amanda Auzina et al.
Rope to Nope and Back Again: A New Hybrid Attention Strategy
Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.
Video Motion Transfer with Diffusion Transformers
Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model
Yue Zhang, Zhiyang Xu, Ying Shen et al.
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
Feng Wang, Yaodong Yu, Wei Shao et al.
OSDFace: One-Step Diffusion Model for Face Restoration
Jingkai Wang, Jue Gong, Lin Zhang et al.
OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling
Hongliang Lu, Zhonglin Xie, Yaoyu Wu et al.
Encryption-Friendly LLM Architecture
Donghwan Rho, Taeseong Kim, Minje Park et al.
Cut Your Losses in Large-Vocabulary Language Models
Erik Wijmans, Brody Huval, Alexander Hertzberg et al.
Towards a Unified Copernicus Foundation Model for Earth Vision
Yi Wang, Zhitong Xiong, Chenying Liu et al.
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao, Xiang Zheng, Lin Luo et al.
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
Xinnan Dai, Haohao QU, Yifei Shen et al.
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot et al.
Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration
Ziheng Zhou, Jinxing Zhou, Wei Qian et al.
Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior
Chen Guo, Junxuan Li, Yash Kant et al.
InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction
Yuhui WU, Liyi Chen, Ruibin Li et al.
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
Zhuofan Zong, Dongzhi Jiang, Bingqi Ma et al.
Euler Characteristic Tools for Topological Data Analysis
Olympio Hacquard, Vadim Lebovici
SketchAgent: Language-Driven Sequential Sketch Generation
Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.
Training-Free Diffusion Model Alignment with Sampling Demons
Po-Hung Yeh, Kuang-Huei Lee, Jun-Cheng Chen
SynCity: Training-Free Generation of 3D Cities
Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Wenkai Yang, Shiqi Shen, Guangyao Shen et al.
Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting
Jinbo Yan, Rui Peng, Zhiyan Wang et al.
Adaptive teachers for amortized samplers
Minsu Kim, Sanghyeok Choi, Taeyoung Yun et al.
Autoregressive Pretraining with Mamba in Vision
Sucheng Ren, Xianhang Li, Haoqin Tu et al.
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi et al.
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu, Yiming Zhao, Zhicong Tang et al.
E(n) Equivariant Topological Neural Networks
Claudio Battiloro, Ege Karaismailoglu, Mauricio Tec et al.
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning
Fan Lu, Wei Wu, Kecheng Zheng et al.
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong, Qi Jiang, Jingyi Yu et al.
WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
Chaojun Ni, Xiaofeng Wang, Zheng Zhu et al.
SAFIRE: Segment Any Forged Image Region
Myung-Joon Kwon, Wonjun Lee, Seung-Hun Nam et al.
Learning Long Range Dependencies on Graphs via Random Walks
Dexiong Chen, Till Schulz, Karsten Borgwardt
Dynamic Negative Guidance of Diffusion Models
Felix Koulischer, Johannes Deleu, Gabriel Raya et al.
Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting
Marcel Kollovieh, Marten Lienen, David Lüdke et al.
Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models
Zeyu Yang, Zijie Pan, Chun Gu et al.
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan, Wang Lin, Zhongqi Yue et al.
VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
Thomas Zeng, Shuibai Zhang, Shutong Wu et al.
Video-T1: Test-time Scaling for Video Generation
Fangfu Liu, Hanyang Wang, Yimo Cai et al.
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Eunice Yiu, Maan Qraitem, Anisa Majhi et al.
{$\tau$}-bench: A Benchmark for \underline{T}ool-\underline{A}gent-\underline{U}ser Interaction in Real-World Domains
Shunyu Yao, Noah Shinn, Pedram Razavi et al.
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
Du CHEN, Tianhe Wu, Kede Ma et al.
Inference-Time Alignment of Diffusion Models with Direct Noise Optimization
Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang et al.
Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition
Hongda Liu, Yunfan Liu, Min Ren et al.
Sekai: A Video Dataset towards World Exploration
Zhen Li, Chuanhao Li, Xiaofeng Mao et al.
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
Jianyang Gu, Sam Stevens, Elizabeth Campolongo et al.
KAN-AD: Time Series Anomaly Detection with Kolmogorov–Arnold Networks
Quan Zhou, Changhua Pei, Fei Sun et al.
D^3: Scaling Up Deepfake Detection by Learning from Discrepancy
Yongqi Yang, Zhihao Qian, Ye Zhu et al.
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
Ronghuan Wu, Wanchao Su, Jing Liao
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu, Xucheng Wang, Xiangyang Yang et al.
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
XiaoChen Zhao, Hongyi Xu, Guoxian Song et al.
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Minheng Ni, YuTao Fan, Lei Zhang et al.
Gradient-Free Generation for Hard-Constrained Systems
Chaoran Cheng, Boran Han, Danielle Maddix et al.
EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
Dong In Lee, Hyeongcheol Park, Jiyoung Seo et al.
DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization
Zhenglin Zhou, Xiaobo Xia, Fan Ma et al.
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun, Da-Wei Zhou, Yang Li et al.
CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes
Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim et al.
Rethinking Light Decoder-based Solvers for Vehicle Routing Problems
Ziwei Huang, Jianan Zhou, Zhiguang Cao et al.
Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution
Zeyu Xiao, Zhuoyuan Li, Wei Jia
MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation
Yuxiang Fu, Qi Yan, Ke Li et al.
BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning
Xuechen Zhang, Zijian Huang, Yingcong Li et al.
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Yuseung Lee, Jihyeon Je, Chanho Park et al.
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
Nikolai Kalischek, Michael Oechsle, Fabian Manhardt et al.
PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
Fatemeh Ghezloo, Saygin Seyfioglu, Rustin Soraki et al.
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Barrett Tang, Zile Huang, Chengzhi Liu et al.
AdaDiff: Adaptive Step Selection for Fast Diffusion Models
Hui Zhang, Zuxuan Wu, Zhen Xing et al.
Grokking at the Edge of Numerical Stability
Lucas Prieto, Melih Barsbey, Pedro Mediano et al.
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
Yifan Sun, Jingyan Shen, Yibin Wang et al.
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Anthony Zhou, Zijie Li, Michael Schneier et al.
RocketEval: Efficient automated LLM evaluation via grading checklist
Tianjun Wei, Wei Wen, Ruizhi Qiao et al.
Learning General-purpose Biomedical Volume Representations using Randomized Synthesis
Neel Dey, Benjamin Billot, Hallee Wong et al.
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng et al.
Controlling Large Language Models Through Concept Activation Vectors
Hanyu Zhang, Xiting Wang, Chengao Li et al.
CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.
Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns
Yufeng Zhang, Xuepeng Wang, Lingxiang Wu et al.
Sort-free Gaussian Splatting via Weighted Sum Rendering
Qiqi Hou, Randall Rauwendaal, Zifeng Li et al.
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao, Wenzhe Cai, Likun Tang et al.
Perturbation-Restrained Sequential Model Editing
Jun-Yu Ma, Hong Wang, Hao-Xiang Xu et al.
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Fiona Ryan, Ajay Bati, Sangmin Lee et al.
EmoEdit: Evoking Emotions through Image Manipulation
Jingyuan Yang, Jiawei Feng, Weibin Luo et al.
Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties
wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.
Modeling Complex System Dynamics with Flow Matching Across Time and Conditions
Martin Rohbeck, Edward De Brouwer, Charlotte Bunne et al.
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu, Laura Ruis, Tim Rocktäschel et al.
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.
In Search of Adam’s Secret Sauce
Antonio Orvieto, Robert Gower