Most Cited 2025 "confidence interval prediction" Papers
22,274 papers found • Page 11 of 112
Conference
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
Runjia Li, Philip Torr, Andrea Vedaldi et al.
PromptHMR: Promptable Human Mesh Recovery
Yufu Wang, Yu Sun, Priyanka Patel et al.
Population Transformer: Learning Population-level Representations of Neural Activity
Geeling Chau, Christopher Wang, Sabera Talukder et al.
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation
Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.
ThoughtTerminator: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
Xiao Pu, Michael Saxon, Wenyue Hua et al.
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods
Qizhou Wang, Bo Han, Puning Yang et al.
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
Guy Ohayon, Tomer Michaeli, Michael Elad
Unifying Causal Representation Learning with the Invariance Principle
Dingling Yao, Dario Rancati, Riccardo Cadei et al.
HELMET: How to Evaluate Long-context Models Effectively and Thoroughly
Howard Yen, Tianyu Gao, Minmin Hou et al.
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade et al.
Language Models are Advanced Anonymizers
Robin Staab, Mark Vero, Mislav Balunovic et al.
Numerical Pruning for Efficient Autoregressive Models
Xuan Shen, Zhao Song, Yufa Zhou et al.
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
Ke Fan, Shunlin Lu, Minyue Dai et al.
From Language Models over Tokens to Language Models over Characters
Tim Vieira, Benjamin LeBrun, Mario Giulianelli et al.
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
YONGWEI CHEN, Yushi Lan, Shangchen Zhou et al.
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
Qiang Liu, Mengyu Chu, Nils Thuerey
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
Wufei Ma, Yu-Cheng Chou, Qihao Liu et al.
Is Your Multimodal Language Model Oversensitive to Safe Queries?
Xirui Li, Hengguang Zhou, Ruochen Wang et al.
AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction
Lingteng Qiu, Shenhao Zhu, Qi Zuo et al.
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis
Wonjoon Jin, Qi Dai, Chong Luo et al.
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang, Yuchang Su, Yiming Liu et al.
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection
Xun Huang, Ziyu Xu, Hai Wu et al.
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
Junteng Liu, Yuanxiang Fan, Jiang Zhuo et al.
Transformers are Universal In-context Learners
Takashi Furuya, Maarten V de Hoop, Gabriel Peyré
2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification
Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.
TinyFusion: Diffusion Transformers Learned Shallow
Gongfan Fang, Kunjun Li, Xinyin Ma et al.
Effective Diffusion Transformer Architecture for Image Super-Resolution
Kun Cheng, Lei Yu, Zhijun Tu et al.
UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge
Hanna Wallach, Meera Desai, A. Feder Cooper et al.
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
Puning Yang, Qizhou Wang, Zhuo Huang et al.
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
Angela Castillo, Jonas Kohler, Juan C. Pérez et al.
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Jiaxiang Cheng, Pan Xie, Xin Xia et al.
Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
Michael Scholkemper, Xinyi Wu, Ali Jadbabaie et al.
Erasing Conceptual Knowledge from Language Models
Rohit Gandikota, Sheridan Feucht, Samuel Marks et al.
Heavy-Tailed Diffusion Models
Kushagra Pandey, Jaideep Pathak, Yilun Xu et al.
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.
Improving Data Efficiency via Curating LLM-Driven Rating Systems
Jinlong Pang, Jiaheng Wei, Ankit Parag Shah et al.
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
Zhendong Wang, Jianmin Bao, Shuyang Gu et al.
Mastering Board Games by External and Internal Planning with Language Models
John Schultz, Jakub Adamek, Matej Jusup et al.
SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering
Zouying Cao, Yifei Yang, Hai Zhao
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen, Bang Zhang, Ruotian Ma et al.
Oscillatory State-Space Models
T. Konstantin Rusch, Daniela Rus
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat et al.
Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images
Tianhao Wu, Chuanxia Zheng, Frank Guan et al.
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
Han Wang, Gang Wang, Huan Zhang
On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality
Jerry Yao-Chieh Hu, Weimin Wu, Yi-Chen Lee et al.
A Simple Model of Inference Scaling Laws
Noam Levi
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
Yudi Shi, Shangzhe Di, Qirui Chen et al.
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Jiaming Han, Hao Chen, Yang Zhao et al.
Automated Proof Generation for Rust Code via Self-Evolution
Tianyu Chen, Shuai Lu, Shan Lu et al.
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Hao Zhao, Maksym Andriushchenko, francesco croce et al.
Is Noise Conditioning Necessary for Denoising Generative Models?
Qiao Sun, Zhicheng Jiang, Hanhong Zhao et al.
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren, Wentao Ma, Huan Yang et al.
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful
Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis, Kaifeng Chen, Soham Ghosh et al.
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang, Mengzhen Liu, Lichen Li et al.
Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning
Bozhou Zhang, Nan Song, Xin Jin et al.
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
Yuanzhao Zhai, Tingkai Yang, Kele Xu et al.
Self-Improving Robust Preference Optimization
Eugene Choi, Arash Ahmadian, Matthieu Geist et al.
Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention
Tianyun Yang, Ziniu Li, Juan Cao et al.
Linear Representations of Political Perspective Emerge in Large Language Models
Junsol Kim, James Evans, Aaron Schein
MV-VTON: Multi-View Virtual Try-On with Diffusion Models
Haoyu Wang, Zhilu Zhang, Donglin Di et al.
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Yongsen Mao, Junhao Zhong, Chuan Fang et al.
Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
Ermo Hua, Che Jiang, Xingtai Lv et al.
Fixing the Double Penalty in Data-Driven Weather Forecasting Through a Modified Spherical Harmonic Loss Function
Christopher Subich, Syed Husain, Leo Separovic et al.
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Jianhao Huang, Zixuan Wang, Jason Lee
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding, Wu Shenxi, Xiangyu Zhao et al.
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Zaijing Li, Yuquan Xie, Rui Shao et al.
Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
Pritam Sarkar, Sayna Ebrahimi, Ali Etemad et al.
MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.
FonTS: Text Rendering With Typography and Style Controls
Wenda SHI, Yiren Song, Dengming Zhang et al.
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid
Mingxin Huang, Yuliang Liu, Dingkang Liang et al.
EmbedLLM: Learning Compact Representations of Large Language Models
Richard Zhuang, Tianhao Wu, Zhaojin Wen et al.
Do LLMs ``know'' internally when they follow instructions?
Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents
Yifu Guo, Jiaye Lin, Huacan Wang et al.
RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction
Tanqiu Jiang, Zian Wang, Jiacheng Liang et al.
OneForecast: A Universal Framework for Global and Regional Weather Forecasting
Yuan Gao, Hao Wu, Ruiqi Shu et al.
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
Jiaqi Liao, Zhengyuan Yang, Linjie Li et al.
TimeXL: Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop
Yushan Jiang, Wenchao Yu, Geon Lee et al.
Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins
Aadyot Bhatnagar, Sarthak Jain, Joel Beazer et al.
GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation
LINHAO LUO, Zicheng Zhao, Reza Haffari et al.
Controlling Language and Diffusion Models by Transporting Activations
Pau Rodriguez, Arno Blaas, Michal Klein et al.
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang, Jiluan Fan, Anli Yan et al.
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
Soham Deshmukh, Shuo Han, Hazim Bukhari et al.
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee, Jack Cai, Avi Schwarzschild et al.
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu, Sheng Jin, Wenwei Zhang et al.
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
CHEN CHEN, Yuchen Hu, Siyin Wang et al.
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Lang Lin, Xueyang Yu, Ziqi Pang et al.
Unhackable Temporal Reward for Scalable Video MLLMs
En Yu, Kangheng Lin, Liang Zhao et al.
BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
Xuanpu Zhang, Dan Song, pengxin zhan et al.
Meta CLIP 2: A Worldwide Scaling Recipe
Yung-Sung Chuang, Yang Li, Dong Wang et al.
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
Haohan Chi, Huan-ang Gao, Ziming Liu et al.
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das et al.
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
Huaisheng Zhu, Teng Xiao, Vasant Honavar
PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration
Yuxuan Sun, Yunlong Zhang, Yixuan Si et al.
Inducing Programmatic Skills for Agentic Tasks
Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig et al.
Establishing Task Scaling Laws via Compute-Efficient Model Ladders
Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
Sungmin Cha, Sungjun Cho, Dasol Hwang et al.
Game4Loc: A UAV Geo-Localization Benchmark from Game Data
Yuxiang Ji, Boyong He, Zhuoyue Tan et al.
MotionFollower: Editing Video Motion via Score-Guided Diffusion
Shuyuan Tu, Qi Dai, Zihao Zhang et al.
Diversity-Aware Policy Optimization for Large Language Model Reasoning
Jian Yao, Ran Cheng, Xingyu Wu et al.
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
Gang Liu, Michael Sun, Wojciech Matusik et al.
$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Zhongwei Wan, Xinjian Wu, Yu Zhang et al.
B2Opt: Learning to Optimize Black-box Optimization with Little Budget
Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.
Truthful Aggregation of LLMs with an Application to Online Advertising
Ermis Soumalias, Michael Curry, Sven Seuken
Scaling Optimal LR Across Token Horizons
Johan Bjorck, Alon Benhaim, Vishrav Chaudhary et al.
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning
Wanjia Zhao, Mert Yuksekgonul, Shirley Wu et al.
Flow: Modularized Agentic Workflow Automation
Boye Niu, Yiliao Song, Kai Lian et al.
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clementine Domine, Nicolas Anguita, Alexandra M Proca et al.
Multi-Agent Systems Execute Arbitrary Malicious Code
Harold Triedman, Rishi Dev Jha, Vitaly Shmatikov
Reinforced Lifelong Editing for Language Models
Zherui Li, Houcheng Jiang, Hao Chen et al.
GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering
Hongze CHEN, Zehong Lin, Jun Zhang
Material Anything: Generating Materials for Any 3D Object via Diffusion
Xin Huang, Tengfei Wang, Ziwei Liu et al.
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
Marius Memmel, Jacob Berg, Bingqing Chen et al.
YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus
Garrett Tanzer, Biao Zhang
Position: AI Evaluation Should Learn from How We Test Humans
Yan Zhuang, Qi Liu, Zachary Pardos et al.
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.
Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars
Tobias Kirschstein, Javier Romero, Artem Sevastopolsky et al.
Radiant Foam: Real-Time Differentiable Ray Tracing
Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi et al.
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.
Towards a General Time Series Forecasting Model with Unified Representation and Adaptive Transfer
Yihang Wang, Yuying Qiu, Peng Chen et al.
Harnessing Webpage UIs for Text-Rich Visual Understanding
Junpeng Liu, Tianyue Ou, Yifan Song et al.
Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng, Stuart Russell, Jacob Steinhardt
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting
Songtao Huang, Zhen Zhao, Can Li et al.
UniGEM: A Unified Approach to Generation and Property Prediction for Molecules
Shikun Feng, Yuyan Ni, Lu yan et al.
DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo
Junzhe Zhu, Yuanchen Ju, Junyi Zhang et al.
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks
Junying Wang, Hongyuan Zhang, Yuan Yuan
IRASim: A Fine-Grained World Model for Robot Manipulation
Fangqi Zhu, Hongtao Wu, Song Guo et al.
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion
Chunyang Cheng, Tianyang Xu, Zhenhua Feng et al.
DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
Jiazhe Guo, Yikang Ding, Xiwu Chen et al.
Privacy Auditing of Large Language Models
Ashwinee Panda, Xinyu Tang, Christopher Choquette-Choo et al.
Matrix3D: Large Photogrammetry Model All-in-One
Yuanxun Lu, Jingyang Zhang, Tian Fang et al.
Scaling Law with Learning Rate Annealing
Howe Tissue, Venus Wang, Lu Wang
TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation
Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
Yi Chen, Yuying Ge, Weiliang Tang et al.
CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs
Yihan Cao, Jiazhao Zhang, Zhinan Yu et al.
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan, Yibo Miao, Jialian Li et al.
Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search
Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo et al.
FoldToken: Learning Protein Language via Vector Quantization and Beyond
Zhangyang Gao, Cheng Tan, Jue Wang et al.
Mind the Time: Temporally-Controlled Multi-Event Video Generation
Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors
Jeongsoo Park, Andrew Owens
LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement
Zhifan Ye, Kejing Xia, Yonggan Fu et al.
Detecting Out-of-Distribution Through the Lens of Neural Collapse
Litian Liu, Yao Qin
Compute or Load KV Cache? Why Not Both?
Shuowei Jin, Xueshen Liu, Qingzhao Zhang et al.
Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs
Youhe Jiang, Fangcheng Fu, Xiaozhe Yao et al.
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
Any-Resolution AI-Generated Image Detection by Spectral Learning
Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen, Zhengrong Yue, Siran Chen et al.
Generative Video Propagation
Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.
MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
Yuechen Zhang, YaoYang Liu, Bin Xia et al.
Nonparametric Modern Hopfield Models
Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu et al.
ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
Jeonghoon Shim, Gyuhyeon Seo, Cheongsu Lim et al.
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge
Aparna Elangovan, Lei Xu, Jongwoo Ko et al.
Understanding Optimization in Deep Learning with Central Flows
Jeremy Cohen, Alex Damian, Ameet Talwalkar et al.
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
Pengcheng Xu, Boyuan Jiang, Xiaobin Hu et al.
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang, Haifeng Huang, Yuzhang Shang et al.
Self-Challenging Language Model Agents
Yifei Zhou, Sergey Levine, Jason Weston et al.
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
Jaden Fiotto-Kaufman, Alexander Loftus, Eric Todd et al.
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed el amine Boudjoghra, Angela Dai, Jean Lahoud et al.
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
Yi Liu, Changran Xu, Yunhao Zhou et al.
TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting
Peiyuan Liu, Beiliang Wu, Yifan Hu et al.
Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization
Zhitong Xu, Haitao Wang, Jeff Phillips et al.
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
Rickard Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj et al.
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
Hyojun Go, byeongjun park, Jiho Jang et al.
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Tonghe Zhang, Chao Yu, Sichang Su et al.
Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks
Shengbin Yue, Siyuan Wang, Wei Chen et al.
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang, Yuwei An, Hongyi Liu et al.
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
Kaiyu Li, Ruixun Liu, Xiangyong Cao et al.
REFINE: Inversion-Free Backdoor Defense via Model Reprogramming
Yukun Chen, Shuo Shao, Enhao Huang et al.
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.
Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks
Mario Lino, Tobias Pfaff, Nils Thuerey
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Feng Liang, Akio Kodaira, Chenfeng Xu et al.
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu, Ming Ma, Xiaomin Yu et al.
Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?
Ben Yao, Yazhou Zhang, Qiuchi Li et al.
ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance
Shuwei Shi, Wenbo Li, Yuechen Zhang et al.
Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents
Shayan Kiyani, George Pappas, Aaron Roth et al.
HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment
YOUHE JIANG, Ran Yan, Binhang Yuan
Task-driven Image Fusion with Learnable Fusion Loss
Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.
Video Depth without Video Models
Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.
Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias
Adaptive Rectangular Convolution for Remote Sensing Pansharpening
Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
Ce Zhang, Zifu Wan, Zhehan Kan et al.
SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
Zhaorun Chen, Francesco Pinto, Minzhou Pan et al.
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
Xue zhucun, Jiangning Zhang, Teng Hu et al.
Training on the Benchmark Is Not All You Need
Shiwen Ni, Xiangtao Kong, Chengming Li et al.
Textured Gaussians for Enhanced 3D Scene Appearance Modeling
Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.
MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents
Kaijie Zhu, Xianjun Yang, Jindong Wang et al.
QMambaBSR: Burst Image Super-Resolution with Query State Space Model
Xin Di, Long Peng, Peizhe Xia et al.
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man, De-An Huang, Guilin Liu et al.
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu, Zhixing Zhang, Yanyu Li et al.
Structure Language Models for Protein Conformation Generation
Jiarui Lu, Xiaoyin Chen, Stephen Lu et al.