Most Cited 2025 "disparity completion" Papers
22,274 papers found • Page 7 of 112
Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images
Sichen Zhu, Yuchen Zhu, Molei Tao et al.
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong
ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification
Xiao Li, Wenxuan Sun, Huanran Chen et al.
FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping
Yuzhou Ji, He Zhu, Junshu Tang et al.
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.
Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models
Jingyang Zhang, Jingwei Sun, Eric Yeats et al.
What is the Visual Cognition Gap between Humans and Multimodal LLMs?
Xu Cao, Yifan Shen, Bolin Lai et al.
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing, Qi Dai, Zejia Weng et al.
Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing
Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu, Kun Yuan, Yaling Shen et al.
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng, Tianyu Pang, Chao Du et al.
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang, Ziwei Zheng, Boxu Chen et al.
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent, Kyle Hsu, Justin Johnson et al.
FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model
Chongkai Gao, Haozhuo Zhang, Zhixuan Xu et al.
miniCTX: Neural Theorem Proving with (Long-)Contexts
Jiewen Hu, Thomas Zhu, Sean Welleck
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
Jianhui Chen, Xiaozhi Wang, Zijun Yao et al.
Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
Jiarui Wang, Huiyu Duan, Guangtao Zhai et al.
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Duo Zheng, Shijia Huang, Yanyang Li et al.
Results of the Big ANN: NeurIPS’23 competition
Harsha Vardhan Simhadri, Martin Aumüller, Matthijs Douze et al.
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang, Mingjia Shi, Yukun Zhou et al.
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi, Mingjia Li, Minjing Dong et al.
SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks
Meng Lou, Yunxiang Fu, Yizhou Yu
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Rongyao Fang, Chengqi Duan, Kun Wang et al.
Self-Adapting Language Models
Adam Zweiger, Jyo Pari, Han Guo et al.
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier et al.
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
Shunlin Lu, Jingbo Wang, Zeyu Lu et al.
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Florian Eddie Dorner, Vivian Nastl, Moritz Hardt
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu, Honghui Yang, Yating Wang et al.
Artificial Kuramoto Oscillatory Neurons
Takeru Miyato, Sindy Löwe, Andreas Geiger et al.
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
Yun Qu, Yuhang Jiang, Boyuan Wang et al.
AnimateAnything: Consistent and Controllable Animation for Video Generation
Guojun Lei, Chi Wang, Rong Zhang et al.
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi, Hritik Bansal, Arian Hosseini et al.
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
Logan Cross, Violet Xiang, Agam Bhatia et al.
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Cunxiang Wang, Ruoxi Ning, Boqi Pan et al.
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content
Zicheng Zhang, Tengchuan Kou, Chunyi Li et al.
Addressing Misspecification in Simulation-based Inference through Data-driven Calibration
Antoine Wehenkel, Juan L. Gamella, Ozan Sener et al.
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
George Wang, Jesse Hoogland, Stan van Wingerden et al.
Instant Policy: In-Context Imitation Learning via Graph Diffusion
Vitalis Vosylius, Edward Johns
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
Xiaoqiang Wang, Bang Liu
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu, Changsheng Zhao, Hanxian Huang et al.
Efficient Visual State Space Model for Image Deblurring
Lingshun Kong, Jiangxin Dong, Jinhui Tang et al.
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Jiangjie Chen, Qianyu He, Siyu Yuan et al.
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
David Robinson, Marius Miron, Masato Hagiwara et al.
M-Prometheus: A Suite of Open Multilingual LLM Judges
José Pombal, Dongkeun Yoon, Patrick Fernandes et al.
Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods
Daniil Vankov, Anton Rodomanov, Angelia Nedich et al.
Text-to-Image Rectified Flow as Plug-and-Play Priors
Xiaofeng Yang, Cheng Chen, Xulei Yang et al.
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Weihao Zeng, Yuzhen Huang, Lulu Zhao et al.
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.
Teaching Language Models to Critique via Reinforcement Learning
Zhihui Xie, Jie Chen, Liyu Chen et al.
CleanDIFT: Diffusion Features without Noise
Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
Rui Xie, Yinhong Liu, Penghao Zhou et al.
JetFormer: An autoregressive generative model of raw images and text
Michael Tschannen, André Susano Pinto, Alexander Kolesnikov
Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
Peidong Li, Dixiao Cui
Robust Tracking via Mamba-based Context-aware Token Learning
Jinxia Xie, Bineng Zhong, Qihua Liang et al.
Language Representations Can be What Recommenders Need: Findings and Potentials
Leheng Sheng, An Zhang, Yi Zhang et al.
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
Hao Wen, Zehuan Huang, Yaohui Wang et al.
Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis
Guangchen (Eric) Lan, Dong-Jun Han, Abolfazl Hashemi et al.
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
Ke Wang, Nikos Dimitriadis, Alessandro Favero et al.
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
Weixuan Wang, Jingyuan Yang, Wei Peng
DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
Julien Siems, Timur Carstensen, Arber Zela et al.
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
Hongbang Yuan, Zhuoran Jin, Pengfei Cao et al.
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Jian Ma, Yonglin Deng, Chen Chen et al.
Language-Guided Image Tokenization for Generation
Kaiwen Zha, Lijun Yu, Alireza Fathi et al.
Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
Zijian Liu, Zhengyuan Zhou
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Haoyu Chen, Xiaojie Xu, Wenbo Li et al.
RadGPT: Constructing 3D Image-Text Tumor Datasets
Pedro Bassi, Mehmet Yavuz, Ibrahim Ethem Hamamci et al.
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
Thomas Kuntz, Agatha Duzan, Hao Zhao et al.
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models
Tianyu Fu, Tengxuan Liu, Qinghao Han et al.
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
Mingyang Chen, Haoze Sun, Tianpeng Li et al.
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods
Qizhou Wang, Bo Han, Puning Yang et al.
Language Imbalance Driven Rewarding for Multilingual Self-improving
Wen Yang, Junhong Wu, Chen Wang et al.
GOAL: A Generalist Combinatorial Optimization Agent Learner
Darko Drakulić, Sofia Michel, Jean-Marc Andreoli
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang et al.
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang, Yue Liao, Jianhui Liu et al.
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang, Yuchang Su, Yiming Liu et al.
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang et al.
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong, Jun Hao Liew, Zilong Huang et al.
A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert Dick et al.
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
Jian Wu, Linyi Yang, Dongyuan Li et al.
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li, Weijian Ma, Xueyang Li et al.
UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
Towards a Mechanistic Explanation of Diffusion Model Generalization
Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy et al.
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Jinyoung Park, Jeehye Na, Jinyoung Kim et al.
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Luxi He, Yangsibo Huang, Weijia Shi et al.
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade et al.
HELMET: How to Evaluate Long-context Models Effectively and Thoroughly
Howard Yen, Tianyu Gao, Minmin Hou et al.
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
Guangqi Jiang, Yifei Sun, Tao Huang et al.
Self-Consistency Preference Optimization
Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang et al.
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
Zhefei Gong, Pengxiang Ding, Shangke Lyu et al.
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
Andrew Szot, Bogdan Mazoure, Omar Attia et al.
Concept Bottleneck Large Language Models
Chung-En Sun, Tuomas Oikarinen, Berk Ustun et al.
ThoughtTerminator: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models
Xiao Pu, Michael Saxon, Wenyue Hua et al.
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Zhengyu Chen et al.
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An, Guolei Sun, Yun Liu et al.
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
João Loula, Benjamin LeBrun, Li Du et al.
Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins
Aadyot Bhatnagar, Sarthak Jain, Joel Beazer et al.
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
Wufei Ma, Yu-Cheng Chou, Qihao Liu et al.
Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning
Bozhou Zhang, Nan Song, Xin Jin et al.
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
Yudi Shi, Shangzhe Di, Qirui Chen et al.
SONICS: Synthetic Or Not - Identifying Counterfeit Songs
Awsaf Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker et al.
Towards Foundation Models for Mixed Integer Linear Programming
Sirui Li, Janardhan Kulkarni, Ishai Menache et al.
DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo
Junzhe Zhu, Yuanchen Ju, Junyi Zhang et al.
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan, Yibo Miao, Jialian Li et al.
Training a Scientific Reasoning Model for Chemistry
Siddharth Narayanan, James Braza, Ryan-Rhys Griffiths et al.
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
Junteng Liu, Yuanxiang Fan, Jiang Zhuo et al.
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset
Yifei Liu, Li Lyna Zhang, Yi Zhu et al.
Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention
Tianyun Yang, Ziniu Li, Juan Cao et al.
Towards General-Purpose Model-Free Reinforcement Learning
Scott Fujimoto, Pierluca D'Oro, Amy Zhang et al.
BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
Xuanpu Zhang, Dan Song, Pengxin Zhan et al.
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning
Wanjia Zhao, Mert Yuksekgonul, Shirley Wu et al.
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Lang Lin, Xueyang Yu, Ziqi Pang et al.
Do LLMs "know" internally when they follow instructions?
Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das et al.
ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation
Guosheng Zhao, Xiaofeng Wang, Chaojun Ni et al.
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu, Meng Lou, Yizhou Yu
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Jiaru Zou, Ling Yang, Jingwen Gu et al.
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu, Sheng Jin, Wenwei Zhang et al.
DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
Jiazhe Guo, Yikang Ding, Xiwu Chen et al.
AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models
Mintong Kang, Chejian Xu, Bo Li
Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning
Jinlong Pang, Na Di, Zhaowei Zhu et al.
SWE-bench Goes Live!
Linghao Zhang, Shilin He, Chaoyun Zhang et al.
Truthful Aggregation of LLMs with an Application to Online Advertising
Ermis Soumalias, Michael Curry, Sven Seuken
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Jingyu Zhang, Ahmed Elgohary Ghoneim, Ahmed Magooda et al.
Unhackable Temporal Reward for Scalable Video MLLMs
En Yu, Kangheng Lin, Liang Zhao et al.
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
Chen Chen, Yuchen Hu, Siyin Wang et al.
MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes
Xinjie Zhang, Zhening Liu, Yifan Zhang et al.
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
Luca Barsellotti, Lorenzo Bianchi, Nicola Messina et al.
MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
Yuechen Zhang, YaoYang Liu, Bin Xia et al.
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Alan Baade, Puyuan Peng, David Harwath
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen, Zhengrong Yue, Siran Chen et al.
Material Anything: Generating Materials for Any 3D Object via Diffusion
Xin Huang, Tengfei Wang, Ziwei Liu et al.
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer, Dan Valentine, Luke Bailey et al.
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
Xinyan Chen, Renrui Zhang, Dongzhi Jiang et al.
Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images
Tianhao Wu, Chuanxia Zheng, Frank Guan et al.
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
Qiang Liu, Mengyu Chu, Nils Thuerey
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
Huaisheng Zhu, Teng Xiao, Vasant Honavar
HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation
Haoran Luo, Haihong E, Guanting Chen et al.
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
Soham Deshmukh, Shuo Han, Hazim Bukhari et al.
Inducing Programmatic Skills for Agentic Tasks
Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig et al.
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien et al.
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid
Mingxin Huang, Yuliang Liu, Dingkang Liang et al.
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
Xiao Cui, Mo Zhu, Yulei Qin et al.
$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Zhongwei Wan, Xinjian Wu, Yu Zhang et al.
MotionFollower: Editing Video Motion via Score-Guided Diffusion
Shuyuan Tu, Qi Dai, Zihao Zhang et al.
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
Angela Castillo, Jonas Kohler, Juan C. Pérez et al.
LICO: Large Language Models for In-Context Molecular Optimization
Tung Nguyen, Aditya Grover
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
Weixiang Yan, Haitian Liu, Tengxiao Wu et al.
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Peiwen Sun, Sitong Cheng, Xiangtai Li et al.
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Chen Qian, Dongrui Liu, Hao Wen et al.
UniGEM: A Unified Approach to Generation and Property Prediction for Molecules
Shikun Feng, Yuyan Ni, Lu Yan et al.
Hierarchical Classification Auxiliary Network for Time Series Forecasting
Yanru Sun, Zongxia Xie, Dongyue Chen et al.
Harnessing Webpage UIs for Text-Rich Visual Understanding
Junpeng Liu, Tianyue Ou, Yifan Song et al.
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang, Qirun Dai, Hao Peng
Halton Scheduler for Masked Generative Image Transformer
Victor Besnier, Mickael Chen, David Hurych et al.
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
Feilong Tang, Chengzhi Liu, Zhongxing Xu et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
Numerical Pruning for Efficient Autoregressive Models
Xuan Shen, Zhao Song, Yufa Zhou et al.
Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge
Aparna Elangovan, Lei Xu, Jongwoo Ko et al.
B2Opt: Learning to Optimize Black-box Optimization with Little Budget
Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Yongsen Mao, Junhao Zhong, Chuan Fang et al.
WPMixer: Efficient Multi-Resolution Mixing for Long-Term Time Series Forecasting
Md Mahmuddun Nabi Murad, Mehmet Aktukmak, Yasin Yilmaz
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.
Oscillatory State-Space Models
T. Konstantin Rusch, Daniela Rus
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents
Yifu Guo, Jiaye Lin, Huacan Wang et al.
Mind the Time: Temporally-Controlled Multi-Event Video Generation
Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.
Multi-Agent Systems Execute Arbitrary Malicious Code
Harold Triedman, Rishi Dev Jha, Vitaly Shmatikov
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Clementine Domine, Nicolas Anguita, Alexandra M Proca et al.
Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
Michael Scholkemper, Xinyi Wu, Ali Jadbabaie et al.
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.
Heavy-Tailed Diffusion Models
Kushagra Pandey, Jaideep Pathak, Yilun Xu et al.
G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems
Guibin Zhang, Muxin Fu, Kun Wang et al.
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat et al.
Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng, Stuart Russell, Jacob Steinhardt
Unlocking Dataset Distillation with Diffusion Models
Brian Moser, Federico Raue, Sebastian Palacio et al.
A Transfer Attack to Image Watermarks
Yuepeng Hu, Zhengyuan Jiang, Moyang Guo et al.
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Omer Moussa, Dietrich Klakow, Mariya Toneva
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Christian Walder, Deep Tejas Karkhanis
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Qichao Shentu, Beibu Li, Kai Zhao et al.
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Yue Li, Qi Ma, Runyi Yang et al.
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Jingjing Chang, Yixiao Fang, Peng Xing et al.
IRASim: A Fine-Grained World Model for Robot Manipulation
Fangqi Zhu, Hongtao Wu, Song Guo et al.
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Zaijing Li, Yuquan Xie, Rui Shao et al.
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
Ce Zhang, Zifu Wan, Zhehan Kan et al.
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud et al.
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing, Boyan Gao, Zheng Liu et al.
ElasticTok: Adaptive Tokenization for Image and Video
Wilson Yan, Volodymyr Mnih, Aleksandra Faust et al.
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
Biao Yi, Tiansheng Huang, Sishuo Chen et al.
InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction
Yuhui Wu, Liyi Chen, Ruibin Li et al.
Generative Trajectory Stitching through Diffusion Composition
Yunhao Luo, Utkarsh Mishra, Yilun Du et al.
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.