Most Cited 2025 "markov chain estimation" Papers
22,274 papers found • Page 12 of 112
Conference
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
Xiao Liang, Zhong-Zhi Li, Yeyun Gong et al.
Block-Attention for Efficient Prefilling
Dongyang Ma, Yan Wang, Tian Lan
Multi-Turn Jailbreaking Large Language Models via Attention Shifting
Xiaohu Du, Fan Mo, Ming Wen et al.
LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
Lukas Helff, Felix Friedrich, Manuel Brack et al.
A Second-Order Perspective on Model Compositionality and Incremental Learning
Angelo Porrello, Lorenzo Bonicelli, Pietro Buzzega et al.
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
Tiehan Fan, Kepan Nan, Rui Xie et al.
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai, Zhao Yunfei, Zibo Zhao et al.
NEST: A Neuromodulated Small-world Hypergraph Trajectory Prediction Model for Autonomous Driving
Chengyue Wang, Haicheng Liao, Bonan Wang et al.
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
Yinhuai Wang, Qihan Zhao, Runyi Yu et al.
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
Junsung Park, Jungbeom Lee, Jongyoon Song et al.
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
Zhihang Liu, Chen-Wei Xie, Pandeng Li et al.
DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation
Yunbei Zhang, Akshay Mehra, Shuaicheng Niu et al.
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
Yanqi Dai, Huanran Hu, Lei Wang et al.
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks
Zekang Yang, Wang Zeng, Sheng Jin et al.
UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
Yuping Wang, Xiangyu Huang, Xiaokang Sun et al.
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks
Hongyuan Tao, Ying Zhang, Zhenhao Tang et al.
RelGNN: Composite Message Passing for Relational Deep Learning
Tianlang Chen, Charilaos Kanatsoulis, Jure Leskovec
Joint Velocity-Growth Flow Matching for Single-Cell Dynamics Modeling
Dongyi Wang, Yuanwei Jiang, Zhenyi Zhang et al.
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
Haijin Zeng, Xiangming Wang, Yongyong Chen et al.
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Chiara Plizzari, Alessio Tonioni, Yongqin Xian et al.
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
David Yifan Yao, Albert J. Zhai, Shenlong Wang
Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
Weilong Yan, Ming Li, Li Haipeng et al.
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues
Youngjoon Jang, Haran Raajesh, Liliane Momeni et al.
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini, Shikhar Murty, Christopher Manning et al.
AdaGrad under Anisotropic Smoothness
Yuxing Liu, Rui Pan, Tong Zhang
Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels
Ruitao Pu, Yuan Sun, Yang Qin et al.
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Jiatao Gu, Tianrong Chen, David Berthelot et al.
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
Zhi Cen, Huaijin Pi, Sida Peng et al.
CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI
Siyuan Cheng, Lingjuan Lyu, Zhenting Wang et al.
Sum of Squares Circuits
Lorenzo Loconte, Stefan Mengel, Antonio Vergari
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
XiangCheng Zhang, Fang Kong, Baoxiang Wang et al.
SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration
Jipeng Cen, Jiaxin Liu, Zhixu Li et al.
Provably Accurate Shapley Value Estimation via Leverage Score Sampling
Christopher Musco, R. Teal Witter
Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning
Jian Lang, Zhangtao Cheng, Ting Zhong et al.
Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis
M. Hamza Mughal, Rishabh Dabral, Merel CJ Scholman et al.
Large Language Model Meets Graph Neural Network in Knowledge Distillation
Shengxiang Hu, Guobing Zou, Song Yang et al.
CofCA: A STEP-WISE Counterfactual Multi-hop QA benchmark
Jian Wu, Linyi Yang, Zhen Wang et al.
Geolocation Representation from Large Language Models Are Generic Enhancers for Spatio-Temporal Learning
Junlin He, Tong Nie, Wei Ma
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
XIANGYU PENG, Congying Xia, Xinyi Yang et al.
SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models
Zilan Wang, Junfeng Guo, Jiacheng Zhu et al.
Scaling Inference Time Compute for Diffusion Models
Nanye Ma, Shangyuan Tong, Haolin Jia et al.
Toward Understanding In-context vs. In-weight Learning
Bryan Chan, Xinyi Chen, Andras Gyorgy et al.
Probabilistic Language-Image Pre-Training
Sanghyuk Chun, Wonjae Kim, Song Park et al.
BotSim: LLM-Powered Malicious Social Botnet Simulation
Boyu Qiao, Kun Li, Wei Zhou et al.
Towards Universal Soccer Video Understanding
Jiayuan Rao, Haoning Wu, Hao Jiang et al.
Mechanistic Permutability: Match Features Across Layers
Nikita Balagansky, Ian Maksimov, Daniil Gavrilov
DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection
Li Li, Huixian Gong, Hao Dong et al.
Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization
Guanghan Li, Xun Zhang, Yufei Zhang et al.
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang, Chen-Wei Xie, Haiyang Wang et al.
Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack, Ge Zhu, Jonah Casebeer et al.
Assessing Judging Bias in Large Reasoning Models: An Empirical Study
Qian Wang, Zhanzhi Lou, Zhenheng Tang et al.
JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
Xiaoyong Lu, Songlin Du
REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints
Di Wu, Liu Liu, Zhou Linli et al.
Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment
Jun Liu, Zhenglun Kong, Pu Zhao et al.
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon et al.
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
Guy Yariv, Yuval Kirstain, Amit Zohar et al.
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao, Yifan Hao, Hanning Zhang et al.
HRAvatar: High-Quality and Relightable Gaussian Head Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
Local Conditional Controlling for Text-to-Image Diffusion Models
Yibo Zhao, Liang Peng, Yang Yang et al.
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee, Haebin Seong, Dong Bok Lee et al.
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
Luke Rowe, Roger Girgis, Anthony Gosselin et al.
Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective
Duowang Zhu, Xiaohu Huang, Haiyan Huang et al.
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Ziyin Zhou, Yunpeng Luo, Yuanchen Wu et al.
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
Xiaoyang Liu, Kangjie Bao, Jiashuo Zhang et al.
DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
Zhenlong Yuan, Jinguo Luo, Fei Shen et al.
CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer
Yang Liu, Zinan Zheng, Jiashun Cheng et al.
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang, Xinpeng Ding, Chunwei Wang et al.
VisionArena: 230k Real World User-VLM Conversations with Preference Labels
Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.
Temporal Query Network for Efficient Multivariate Time Series Forecasting
Shengsheng Lin, Haojun Chen, Haijie Wu et al.
Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
Runlong Zhou, Maryam Fazel, Simon Shaolei Du
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
Abdelrahman Eldesokey, Peter Wonka
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang et al.
Contextual Bandits for Unbounded Context Distributions
Puning Zhao, Rongfei Fan, Shaowei Wang et al.
Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion
Eunji Kim, Siwon Kim, Minjun Park et al.
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Yulai Zhao, Masatoshi Uehara, Gabriele Scalia et al.
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
Michael Aerni, Javier Rando, Edoardo Debenedetti et al.
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Kusha Sareen, Morgane M Moss, Alessandro Sordoni et al.
Efficient Inference for Large Language Model-based Generative Recommendation
Xinyu Lin, Chaoqun Yang, Wenjie Wang et al.
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Makoto Shing, Kou Misaki, Han Bao et al.
What Makes a Maze Look Like a Maze?
Joy Hsu, Jiayuan Mao, Joshua B Tenenbaum et al.
Consistent and Controllable Image Animation with Motion Diffusion Models
Xin Ma, Yaohui Wang, Gengyun Jia et al.
PILAF: Optimal Human Preference Sampling for Reward Modeling
Yunzhen Feng, Ariel Kwiatkowski, Kunhao Zheng et al.
Truncated Consistency Models
Sangyun Lee, Yilun Xu, Tomas Geffner et al.
Detecting High-Stakes Interactions with Activation Probes
Alex McKenzie, Urja Pawar, Phil Blandfort et al.
Event-based Video Super-Resolution via State Space Models
Zeyu Xiao, Xinchao Wang
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Jiyuan Shi, Xinzhe Liu, Dewei Wang et al.
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Hao Zhong, Muzhi Zhu, Zongze Du et al.
Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production
Shengeng Tang, Jiayi He, Dan Guo et al.
Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
Yu Yuan, Xijun Wang, Yichen Sheng et al.
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Yunlong Tang, Gen Zhan, Li Yang et al.
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
Fu-Yun Wang, Yunhao Shui, Jingtan Piao et al.
RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning
Kunming Su, Qiuxia Wu, Panpan Cai et al.
Bayesian scaling laws for in-context learning
Aryaman Arora, Dan Jurafsky, Christopher Potts et al.
AWRaCLe: All-Weather Image Restoration Using Visual In-Context Learning
Sudarshan Rajagopalan, Vishal M. Patel
Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts
Miao Rang, Zhenni Bi, Chuanjian Liu et al.
Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation
Yingjie Chen, Yifang Men, Yuan Yao et al.
GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting
Junzhe Jiang, Chun Gu, Yurui Chen et al.
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
Yuncong Yang, Jiageng Liu, Zheyuan Zhang et al.
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
Daouda Sow, Herbert Woisetschläger, Saikiran Bulusu et al.
An Engorgio Prompt Makes Large Language Model Babble on
Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang et al.
Referring to Any Person
Qing Jiang, Lin Wu, Zhaoyang Zeng et al.
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
Guiyu Zhang, Huan-ang Gao, Zijian Jiang et al.
Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing
Jihyun Janice Ahn, Wenpeng Yin
MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants
Zeyu Zhang, Quanyu Dai, Luyu Chen et al.
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Hwiwon Lee, Ziqi Zhang, Hanxiao Lu et al.
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
Jingwei Xu, Junyu Lai, Yunpeng Huang
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Yingzi Ma, Jiongxiao Wang, Fei Wang et al.
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu, Yuheng Ding, Bingxuan Li et al.
CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance
Jinming Li, Yichen Zhu, Zhibin Tang et al.
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
Ozgur Kara, Krishna Kumar Singh, Feng Liu et al.
Ward: Provable RAG Dataset Inference via LLM Watermarks
Nikola Jovanović, Robin Staab, Maximilian Baader et al.
Causal Composition Diffusion Model for Closed-loop Traffic Generation
Haohong Lin, Xin Huang, Tung Phan-Minh et al.
UFM: A Simple Path towards Unified Dense Correspondence with Flow
Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.
Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs
Soonbin Lee, Fangwen Shu, Yago Sanchez de la Fuente et al.
SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing
Seokhyeon Hong, Chaelin Kim, Serin Yoon et al.
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
Kongcheng Zhang, QI YAO, Shunyu Liu et al.
UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection
Shun Wei, Jielin Jiang, Xiaolong Xu
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
Yaming Yang, Dilxat Muhtar, Yelong Shen et al.
On the Relationship Between Monotone and Squared Probabilistic Circuits
Benjie Wang, Guy Van den Broeck
HEROS-GAN: Honed-Energy Regularized and Optimal Supervised GAN for Enhancing Accuracy and Range of Low-Cost Accelerometers
Yifeng Wang, Yi Zhao
Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code
Augusto B. Corrêa, André G. Pereira, Jendrik Seipp
How Contaminated Is Your Benchmark? Measuring Dataset Leakage in Large Language Models with Kernel Divergence
Hyeong Kyu Choi, Maxim Khanov, Hongxin Wei et al.
BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
Xueyang Zhou, Guiyao Tie, Guowen Zhang et al.
Exploring More from Multiple Gait Modalities for Human Identification
Dongyang Jin, Chao Fan, Weihua Chen et al.
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
Song Wang, Peng Wang, Tong Zhou et al.
Establishing Best Practices in Building Rigorous Agentic Benchmarks
Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun et al.
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Gaurav Sahu, Abhay Puri, Juan A. Rodriguez et al.
Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
Thao Nguyen, Yang Li, Olga Golovneva et al.
Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups
Yuchen Zhu, Tianrong Chen, Lingkai Kong et al.
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary Charles, Gabriel Teston, Lucio Dery et al.
Standardizing Structural Causal Models
Weronika Ormaniec, Scott Sussex, Lars Lorch et al.
MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models
Jiachun Li, Pengfei Cao, Zhuoran Jin et al.
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
Bin Wang, Fan Wu, Linke Ouyang et al.
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras et al.
EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code
Yuhao Qing, Boyu Zhu, Mingzhe Du et al.
On a Connection Between Imitation Learning and RLHF
Teng Xiao, Yige Yuan, Mingxiao Li et al.
Detect Anything 3D in the Wild
Hanxue Zhang, Haoran Jiang, Qingsong Yao et al.
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai, Jian Li, Jiedong Zhuang et al.
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
Benchmarking LLMs' Judgments with No Gold Standard
Shengwei Xu, Yuxuan Lu, Grant Schoenebeck et al.
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan, Yinan He, Xinhao Li et al.
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Zhenfang Chen, Delin Chen, Rui Sun et al.
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
Chin-Yang Lin, Cheng Sun, Fu-En Yang et al.
MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow
Hanzhuo Huang, Yuan Liu, Ge Zheng et al.
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
Ji-An Li, Huadong Xiong, Robert Wilson et al.
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful
Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.
Let LRMs Break Free from Overthinking via Self-Braking Tuning
Haoran Zhao, Yuchen Yan, Yongliang Shen et al.
Force Prompting: Video Generation Models Can Learn And Generalize Physics-based Control Signals
Nate Gillman, Charles Herrmann, Michael Freeman et al.
Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization
Zhanfeng Mo, Long-Kai Huang, Sinno Jialin Pan
FSTA-SNN:Frequency-Based Spatial-Temporal Attention Module for Spiking Neural Networks
Kairong Yu, Tianqing Zhang, Hongwei Wang et al.
Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
Jinluan Yang, Dingnan Jin, Anke Tang et al.
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping
Ziye Huang, Haoqi Yuan, Yuhui Fu et al.
Grounding Continuous Representations in Geometry: Equivariant Neural Fields
David Wessels, David Knigge, Riccardo Valperga et al.
Hyperbolic Fine-Tuning for Large Language Models
Menglin Yang, Ram Samarth B B, Aosong Feng et al.
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
Fengyuan Liu, Nikhil Kandpal, Colin Raffel
Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter
Zhengyi Zhong, Weidong Bao, Ji Wang et al.
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi, Minjing Dong, Chang Xu
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Hojae Han, seung-won hwang, Rajhans Samdani et al.
ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression
Wei Jiang, Junru Li, Kai Zhang et al.
TANGO: Training-free Embodied AI Agents for Open-world Tasks
Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
Muhammed Ildiz, Halil Gozeten, Ege Taga et al.
ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics
Junchao Zhu, Ruining Deng, Tianyuan Yao et al.
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
Tao Zhang, Cheng Da, Kun Ding et al.
Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation
Lokesh Veeramacheneni, Moritz Wolter, Hilde Kuehne et al.
InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting
Chenxin Li, Hengyu Liu, Zhiwen Fan et al.
Neuroplastic Expansion in Deep Reinforcement Learning
Jiashun Liu, Johan S Obando Ceron, Aaron Courville et al.
Are Large Vision Language Models Good Game Players?
Xinyu Wang, Bohan Zhuang, Qi Wu
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
Shuwei Shi, Biao Gong, Xi Chen et al.
Human Motion Instruction Tuning
Lei Li, Sen Jia, Jianhao Wang et al.
Puppeteer: Rig and Animate Your 3D Models
Chaoyue Song, Xiu Li, Fan Yang et al.
A Periodic Bayesian Flow for Material Generation
Hanlin Wu, Yuxuan Song, Jingjing Gong et al.
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation
Kun Liu, Qi Liu, Xinchen Liu et al.
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
Shiyao Li, Yingchun Hu, Xuefei Ning et al.
Do LLMs estimate uncertainty well in instruction-following?
Juyeon Heo, Miao Xiong, Christina Heinze-Deml et al.
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
Jiale Cheng, Ruiliang Lyu, Xiaotao Gu et al.
Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis
Zikun Zhang, Zixiang Chen, Quanquan Gu
Learning Transformer-based World Models with Contrastive Predictive Coding
Maxime Burchi, Radu Timofte
MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert
Dapeng Zhang, Dayu Chen, Peng Zhi et al.
Conformal Prediction for Causal Effects of Continuous Treatments
Maresa Schröder, Dennis Frauen, Jonas Schweisthal et al.
Surprising Effectiveness of pretraining Ternary Language Model at Scale
Ayush Kaushal, Tejas Vaidhya, Arnab Mondal et al.
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan, Chen Wu, Charles Ding et al.
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
William Chen, Jinchuan Tian, Yifan Peng et al.
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu, Siyuan Meng, Yanting Gao et al.
A Unifying Framework for Representation Learning
Shaden Alshammari, John Hershey, Axel Feldmann et al.
C-CLIP: Multimodal Continual Learning for Vision-Language Model
Wenzhuo Liu, Fei Zhu, Longhui Wei et al.
AdaWM: Adaptive World Model based Planning for Autonomous Driving
Hang Wang, Xin Ye, Feng Tao et al.
TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster
Kanghui Ning, Zijie Pan, Yu Liu et al.
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
Bin Wu, Wuxuan Shi, Jinqiao Wang et al.
KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy
Qianxiong Xu, Cheng Long, Ziyue Li et al.
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
Hao Yin, Guangzong Si, Zilei Wang
RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation
Feng yan, Fanfan Liu, Yiyang Huang et al.
PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference
Jiarui Fang, Jinzhe Pan, Aoyu Li et al.
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification
Huy Nguyen, Kien Nguyen Thanh, Akila Pemasiri et al.
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
Andrew Estornell, Jean-Francois Ton, Yuanshun Yao et al.
Faster Algorithms for Structured Linear and Kernel Support Vector Machines
Yuzhou Gu, Zhao Song, Lichen Zhang
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.
Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding
Yixiong Fang, Ziran Yang, Zhaorun Chen et al.
Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting
Jingru Fei, Kun Yi, Wei Fan et al.