Most Cited 2025 "on-device training" Papers
22,274 papers found • Page 7 of 112
Conference
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu, Honghui Yang, Yating Wang et al.
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.
AnimateAnything: Consistent and Controllable Animation for Video Generation
guojun lei, Chi Wang, Rong Zhang et al.
Reward Guided Latent Consistency Distillation
William Wang, Jiachen Li, Weixi Feng et al.
Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing
Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.
Instant Policy: In-Context Imitation Learning via Graph Diffusion
Vitalis Vosylius, Edward Johns
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content
Zicheng Zhang, Tengchuan Kou, Chunyi Li et al.
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
Xiaoqiang Wang, Bang Liu
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
Logan Cross, Violet Xiang, Agam Bhatia et al.
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Cunxiang Wang, Ruoxi Ning, Boqi Pan et al.
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu, Changsheng Zhao, Hanxian Huang et al.
Addressing Misspecification in Simulation-based Inference through Data-driven Calibration
Antoine Wehenkel, Juan L. Gamella, Ozan Sener et al.
Efficient Visual State Space Model for Image Deblurring
Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
George Wang, Jesse Hoogland, Stan van Wingerden et al.
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An, Guolei Sun, Yun Liu et al.
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
David Robinson, Marius Miron, Masato Hagiwara et al.
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Jiangjie Chen, Qianyu He, Siyu Yuan et al.
Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods
Daniil Vankov, Anton Rodomanov, Angelia Nedich et al.
M-Prometheus: A Suite of Open Multilingual LLM Judges
José Pombal, Dongkeun Yoon, Patrick Fernandes et al.
JetFormer: An autoregressive generative model of raw images and text
Michael Tschannen, André Susano Pinto, Alexander Kolesnikov
Robust Tracking via Mamba-based Context-aware Token Learning
Jinxia Xie, Bineng Zhong, Qihua Liang et al.
Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
Peidong Li, Dixiao Cui
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Weihao Zeng, Yuzhen Huang, Lulu Zhao et al.
Text-to-Image Rectified Flow as Plug-and-Play Priors
Xiaofeng Yang, Cheng Chen, xulei yang et al.
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.
CleanDIFT: Diffusion Features without Noise
Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
Rui Xie, Yinhong Liu, Penghao Zhou et al.
Language Representations Can be What Recommenders Need: Findings and Potentials
Leheng Sheng, An Zhang, Yi Zhang et al.
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
Hao Wen, Zehuan Huang, Yaohui Wang et al.
Teaching Language Models to Critique via Reinforcement Learning
Zhihui Xie, Jie chen, Liyu Chen et al.
Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis
Guangchen (Eric) Lan, Dong-Jun Han, Abolfazl Hashemi et al.
DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
Julien Siems, Timur Carstensen, Arber Zela et al.
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
Ke Wang, Nikos Dimitriadis, Alessandro Favero et al.
Language-Guided Image Tokenization for Generation
Kaiwen Zha, Lijun Yu, Alireza Fathi et al.
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Haoyu Chen, Xiaojie Xu, Wenbo Li et al.
RadGPT: Constructing 3D Image-Text Tumor Datasets
Pedro Bassi, Mehmet Yavuz, Ibrahim Ethem Hamamci et al.
Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
Zijian Liu, Zhengyuan Zhou
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
Thomas Kuntz, Agatha Duzan, Hao Zhao et al.
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models
Tianyu Fu, Tengxuan Liu, Qinghao Han et al.
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
Weixuan Wang, JINGYUAN YANG, Wei Peng
Language Imbalance Driven Rewarding for Multilingual Self-improving
Wen Yang, Junhong Wu, Chen Wang et al.
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
Mingyang Chen, sunhaoze, Tianpeng Li et al.
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods
Qizhou Wang, Bo Han, Puning Yang et al.
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
Hongbang Yuan, Zhuoran Jin, Pengfei Cao et al.
GOAL: A Generalist Combinatorial Optimization Agent Learner
Darko Drakulić, Sofia Michel, Jean-Marc Andreoli
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang, Yue Liao, Jianhui Liu et al.
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang, Yuchang Su, Yiming Liu et al.
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong, Jun Hao Liew, Zilong Huang et al.
A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert Dick et al.
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
Jian Wu, Linyi Yang, Dongyuan Li et al.
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li, Weijian Ma, Xueyang Li et al.
UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Jinyoung Park, Jeehye Na, Jinyoung Kim et al.
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Luxi He, Yangsibo Huang, Weijia Shi et al.
Self-Consistency Preference Optimization
Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang et al.
HELMET: How to Evaluate Long-context Models Effectively and Thoroughly
Howard Yen, Tianyu Gao, Minmin Hou et al.
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
Yun Qu, Yuhang Jiang, Boyuan Wang et al.
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
Guangqi Jiang, Yifei Sun, Tao Huang et al.
Concept Bottleneck Large Language Models
Chung-En Sun, Tuomas Oikarinen, Berk Ustun et al.
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
Zhefei Gong, Pengxiang Ding, Shangke Lyu et al.
MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
Zixuan Gong, Qi Zhang, Guangyin Bao et al.
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
Andrew Szot, Bogdan Mazoure, Omar Attia et al.
Towards a Mechanistic Explanation of Diffusion Model Generalization
Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy et al.
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Zhengyu Chen et al.
Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins
Aadyot Bhatnagar, Sarthak Jain, Joel Beazer et al.
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
Wufei Ma, Yu-Cheng Chou, Qihao Liu et al.
Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning
Bozhou Zhang, Nan Song, Xin Jin et al.
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
Yudi Shi, Shangzhe Di, Qirui Chen et al.
DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo
Junzhe Zhu, Yuanchen Ju, Junyi Zhang et al.
Towards Foundation Models for Mixed Integer Linear Programming
Sirui Li, Janardhan Kulkarni, Ishai Menache et al.
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
João Loula, Benjamin LeBrun, Li Du et al.
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan, Yibo Miao, Jialian Li et al.
Training a Scientific Reasoning Model for Chemistry
Siddharth Narayanan, James Braza, Ryan-Rhys Griffiths et al.
Truthful Aggregation of LLMs with an Application to Online Advertising
Ermis Soumalias, Michael Curry, Sven Seuken
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
Junteng Liu, Yuanxiang Fan, Jiang Zhuo et al.
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset
Yifei Liu, Li Lyna Zhang, Yi Zhu et al.
Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning
Jinlong Pang, Na Di, Zhaowei Zhu et al.
Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention
Tianyun Yang, Ziniu Li, Juan Cao et al.
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning
Wanjia Zhao, Mert Yuksekgonul, Shirley Wu et al.
BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
Xuanpu Zhang, Dan Song, pengxin zhan et al.
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu, Meng Lou, Yizhou Yu
ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation
Guosheng Zhao, Xiaofeng Wang, Chaojun Ni et al.
Do LLMs ``know'' internally when they follow instructions?
Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu, Sheng Jin, Wenwei Zhang et al.
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Jiaru Zou, Ling Yang, Jingwen Gu et al.
SONICS: Synthetic Or Not - Identifying Counterfeit Songs
Awsaf Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker et al.
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.
Towards General-Purpose Model-Free Reinforcement Learning
Scott Fujimoto, Pierluca D'Oro, Amy Zhang et al.
Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
Michael Scholkemper, Xinyi Wu, Ali Jadbabaie et al.
DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
Jiazhe Guo, Yikang Ding, Xiwu Chen et al.
SWE-bench Goes Live!
Linghao Zhang, Shilin He, Chaoyun Zhang et al.
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Jian Ma, Yonglin Deng, Chen Chen et al.
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Jingyu Zhang, Ahmed Elgohary Ghoneim, Ahmed Magooda et al.
NightHaze: Nighttime Image Dehazing via Self-Prior Learning
Beibei Lin, Yeying Jin, Yan Wending et al.
MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
Yuechen Zhang, YaoYang Liu, Bin Xia et al.
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
Huaisheng Zhu, Teng Xiao, Vasant Honavar
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
CHEN CHEN, Yuchen Hu, Siyin Wang et al.
MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes
XINJIE ZHANG, Zhening Liu, Yifan Zhang et al.
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
Luca Barsellotti, Lorenzo Bianchi, Nicola Messina et al.
Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images
Tianhao Wu, Chuanxia Zheng, Frank Guan et al.
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen, Zhengrong Yue, Siran Chen et al.
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
Xinyan Chen, Renrui Zhang, Dongzhi JIANG et al.
Material Anything: Generating Materials for Any 3D Object via Diffusion
Xin Huang, Tengfei Wang, Ziwei Liu et al.
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Alan Baade, Puyuan Peng, David Harwath
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer, Dan Valentine, Luke Bailey et al.
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien et al.
HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation
Haoran Luo, Haihong E, Guanting Chen et al.
Inducing Programmatic Skills for Agentic Tasks
Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig et al.
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid
Mingxin Huang, Yuliang Liu, Dingkang Liang et al.
LICO: Large Language Models for In-Context Molecular Optimization
Tung Nguyen, Aditya Grover
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
Xiao Cui, Mo Zhu, Yulei Qin et al.
$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Zhongwei Wan, Xinjian Wu, Yu Zhang et al.
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
Angela Castillo, Jonas Kohler, Juan C. Pérez et al.
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
Soham Deshmukh, Shuo Han, Hazim Bukhari et al.
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat et al.
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
Weixiang Yan, Haitian Liu, Tengxiao Wu et al.
MotionFollower: Editing Video Motion via Score-Guided Diffusion
Shuyuan Tu, Qi Dai, Zihao Zhang et al.
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Chen Qian, Dongrui Liu, Hao Wen et al.
Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge
Aparna Elangovan, Lei Xu, Jongwoo Ko et al.
Halton Scheduler for Masked Generative Image Transformer
Victor Besnier, Mickael Chen, David Hurych et al.
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang, Qirun Dai, Hao Peng
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clementine Domine, Nicolas Anguita, Alexandra M Proca et al.
UniGEM: A Unified Approach to Generation and Property Prediction for Molecules
Shikun Feng, Yuyan Ni, Lu yan et al.
Harnessing Webpage UIs for Text-Rich Visual Understanding
Junpeng Liu, Tianyue Ou, Yifan Song et al.
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
feilong tang, Chengzhi Liu, Zhongxing Xu et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
Hierarchical Classification Auxiliary Network for Time Series Forecasting
Yanru Sun, Zongxia Xie, Dongyue Chen et al.
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Peiwen Sun, Sitong Cheng, Xiangtai Li et al.
Numerical Pruning for Efficient Autoregressive Models
Xuan Shen, Zhao Song, Yufa Zhou et al.
Oscillatory State-Space Models
T. Konstantin Rusch, Daniela Rus
Heavy-Tailed Diffusion Models
Kushagra Pandey, Jaideep Pathak, Yilun Xu et al.
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Yongsen Mao, Junhao Zhong, Chuan Fang et al.
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.
G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems
Guibin Zhang, Muxin Fu, Kun Wang et al.
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents
Yifu Guo, Jiaye Lin, Huacan Wang et al.
Mind the Time: Temporally-Controlled Multi-Event Video Generation
Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.
WPMixer: Efficient Multi-Resolution Mixing for Long-Term Time Series Forecasting
Md Mahmuddun Nabi Murad, Mehmet Aktukmak, Yasin Yilmaz
B2Opt: Learning to Optimize Black-box Optimization with Little Budget
Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.
AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models
Mintong Kang, Chejian Xu, Bo Li
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
Qiang Liu, Mengyu Chu, Nils Thuerey
Multi-Agent Systems Execute Arbitrary Malicious Code
Harold Triedman, Rishi Dev Jha, Vitaly Shmatikov
Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng, Stuart Russell, Jacob Steinhardt
3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering
Qingyuan Zhou, Weidong Yang, Ben Fei et al.
Unlocking Dataset Distillation with Diffusion Models
Brian Moser, Federico Raue, Sebastian Palacio et al.
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Omer Moussa, Dietrich Klakow, Mariya Toneva
A Transfer Attack to Image Watermarks
Yuepeng Hu, Zhengyuan Jiang, Moyang Guo et al.
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Qichao Shentu, Beibu Li, Kai Zhao et al.
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Yue Li, Qi Ma, Runyi Yang et al.
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing, Boyan Gao, Zheng Liu et al.
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Christian Walder, Deep Tejas Karkhanis
DataGen: Unified Synthetic Dataset Generation via Large Language Models
Yue Huang, Siyuan Wu, Chujie Gao et al.
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Jingjing Chang, Yixiao Fang, Peng Xing et al.
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Zaijing Li, Yuquan Xie, Rui Shao et al.
Variational Diffusion Posterior Sampling with Midpoint Guidance
Badr MOUFAD, Yazid Janati el idrissi, Lisa Bedin et al.
IRASim: A Fine-Grained World Model for Robot Manipulation
Fangqi Zhu, Hongtao Wu, Song Guo et al.
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Lang Lin, Xueyang Yu, Ziqi Pang et al.
InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction
Yuhui WU, Liyi Chen, Ruibin Li et al.
Generative Trajectory Stitching through Diffusion Composition
Yunhao Luo, Utkarsh Mishra, Yilun Du et al.
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
Ce Zhang, Zifu Wan, Zhehan Kan et al.
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Hao Zhao, Maksym Andriushchenko, francesco croce et al.
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed el amine Boudjoghra, Angela Dai, Jean Lahoud et al.
The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis
El Mehdi Achour, Francois Malgouyres, Sebastien Gerchinovitz
PromptHMR: Promptable Human Mesh Recovery
Yufu Wang, Yu Sun, Priyanka Patel et al.
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
Biao Yi, Tiansheng Huang, Sishuo Chen et al.
ElasticTok: Adaptive Tokenization for Image and Video
Wilson Yan, Volodymyr Mnih, Aleksandra Faust et al.
Diverse Preference Learning for Capabilities and Alignment
Stewart Slocum, Asher Parker-Sartori, Dylan Hadfield-Menell
Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction
Edgar Sucar, Zihang Lai, Eldar Insafutdinov et al.
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
Yuanzhao Zhai, Tingkai Yang, Kele Xu et al.
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
YONGWEI CHEN, Yushi Lan, Shangchen Zhou et al.
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Zihan Liu, Shuangrui Ding, Zhixiong Zhang et al.
GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion
Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.
Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks
Mario Lino, Tobias Pfaff, Nils Thuerey
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
Jiaqi Huang, Zunnan Xu, Ting Liu et al.
Manifolds, Random Matrices and Spectral Gaps: The geometric phases of generative diffusion
Enrico Ventura, Beatrice Achilli, Gianluigi Silvestri et al.
TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems
Si-Yang Liu, Han-Jia Ye
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen, Bang Zhang, Ruotian Ma et al.
MC^2: Multi-concept Guidance for Customized Multi-concept Generation
Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.
Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?
Ben Yao, Yazhou Zhang, Qiuchi Li et al.
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Jianhao Huang, Zixuan Wang, Jason Lee
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.
Understanding Optimization in Deep Learning with Central Flows
Jeremy Cohen, Alex Damian, Ameet Talwalkar et al.
Training on the Benchmark Is Not All You Need
Shiwen Ni, Xiangtao Kong, Chengming Li et al.
GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering
Hongze CHEN, Zehong Lin, Jun Zhang
FonTS: Text Rendering With Typography and Style Controls
Wenda SHI, Yiren Song, Dengming Zhang et al.
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
Ruoxuan Feng, Jiangyu Hu, Wenke Xia et al.
Reinforced Lifelong Editing for Language Models
Zherui Li, Houcheng Jiang, Hao Chen et al.
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.
Agent-Oriented Planning in Multi-Agent Systems
Ao LI, Yuexiang Xie, Songze Li et al.
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Jiaxiang Cheng, Pan Xie, Xin Xia et al.
Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias
Flow: Modularized Agentic Workflow Automation
Boye Niu, Yiliao Song, Kai Lian et al.
Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning
Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga et al.