Most Cited 2025 "noise prediction module" Papers
22,274 papers found • Page 10 of 112
Conference
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
Ruoxuan Feng, Jiangyu Hu, Wenke Xia et al.
Self-Adapting Language Models
Adam Zweiger, Jyo Pari, Han Guo et al.
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Gyeongjin Kang, Jisang Yoo, Jihyeon Park et al.
OmniSVG: A Unified Scalable Vector Graphics Generation Model
Yiying Yang, Wei Cheng, Sijin Chen et al.
OrcaLoca: An LLM Agent Framework for Software Issue Localization
Zhongming Yu, Hejia Zhang, Yujie Zhao et al.
Towards a Mechanistic Explanation of Diffusion Model Generalization
Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy et al.
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li, Yihua Zhang, shuai ZHANG et al.
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Zizheng Pan, Bohan Zhuang, De-An Huang et al.
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
Yang Zhou, Xu Gao, Zichong Chen et al.
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu, Meng Lou, Yizhou Yu
LICO: Large Language Models for In-Context Molecular Optimization
Tung Nguyen, Aditya Grover
STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes
Jiawei Yang, Jiahui Huang, Boris Ivanovic et al.
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae, Adam Fisch, Hrayr Harutyunyan et al.
Modifying Large Language Model Post-Training for Diverse Creative Writing
John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele et al.
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang, Mingjia Shi, YuKun Zhou et al.
Self-Consistency Preference Optimization
Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang et al.
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu, Kun yuan, Yaling Shen et al.
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Felix Taubner, Ruihang Zhang, Mathieu Tuli et al.
3D Vision-Language Gaussian Splatting
Qucheng Peng, Benjamin Planche, Zhongpai Gao et al.
Generalizable Human Gaussians from Single-View Image
Jinnan Chen, Chen Li, Jianfeng Zhang et al.
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Yunlong Tang, Daiki Shimada, Jing Bi et al.
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
Hongbang Yuan, Zhuoran Jin, Pengfei Cao et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.
Fast Exact Unlearning for In-Context Learning Data for LLMs
Andrei Muresanu, Anvith Thudi, Michael Zhang et al.
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
Lue Fan, Hao ZHANG, Qitai Wang et al.
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Cassidy Laidlaw, Shivam Singhal, Anca Dragan
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou, Austin Xu, PeiFeng Wang et al.
Core Knowledge Deficits in Multi-Modal Language Models
Yijiang Li, Qingying Gao, Tianwei Zhao et al.
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Wei Pang, Kevin Qinghong Lin, Xiangru Jian et al.
BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion
Huafeng Li, Dayong Su, Qing Cai et al.
Regularization by Texts for Latent Diffusion Inverse Solvers
Jeongsol Kim, Geon Yeong Park, Hyungjin Chung et al.
Instant Policy: In-Context Imitation Learning via Graph Diffusion
Vitalis Vosylius, Edward Johns
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models Via Visual Information Steering
Zhuowei Li, Haizhou Shi, Yunhe Gao et al.
SWE-bench Goes Live!
Linghao Zhang, Shilin He, Chaoyun Zhang et al.
Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel
Shivam Gupta, Linda Cai, Sitan Chen
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset
Yifei Liu, Li Lyna Zhang, Yi Zhu et al.
Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka et al.
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu, Chen Jia, Fan Shi et al.
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li, Weijian Ma, Xueyang Li et al.
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
Wang Jiarui, Huiyu Duan, Guangtao Zhai et al.
Weight ensembling improves reasoning in language models
Xingyu Dang, Christina Baek, Kaiyue Wen et al.
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo, Xiaodong Gu
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine
Xiaoshuang Huang, Lingdong Shen, Jia Liu et al.
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou et al.
PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset
Jiazhen Liu, Yuhan Fu, Ruobing Xie et al.
PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop
Chenyu Li, Oscar Michel, Xichen Pan et al.
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
Alec Helbling, Tuna Han Salih Meral, Benjamin Hoover et al.
JetFormer: An autoregressive generative model of raw images and text
Michael Tschannen, André Susano Pinto, Alexander Kolesnikov
Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning
Anja Šurina, Amin Mansouri, Lars C.P.M. Quaedvlieg et al.
FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model
Chongkai Gao, Haozhuo Zhang, Zhixuan Xu et al.
Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation
Ying Jin, Jinlong Peng, Qingdong He et al.
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.
Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
chengqian gao, Haonan Li, Liu Liu et al.
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng, Tianyu Pang, Chao Du et al.
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
feilong tang, Chengzhi Liu, Zhongxing Xu et al.
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang, Bin Shan, Wei Shi et al.
UFT: Unifying Supervised and Reinforcement Fine-Tuning
Mingyang Liu, Gabriele Farina, Asuman Ozdaglar
Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models
Jingyang Zhang, Jingwei Sun, Eric Yeats et al.
Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images
Sichen Zhu, Yuchen Zhu, Molei Tao et al.
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Jingyu Zhang, Ahmed Elgohary Ghoneim, Ahmed Magooda et al.
M-Prometheus: A Suite of Open Multilingual LLM Judges
José Pombal, Dongkeun Yoon, Patrick Fernandes et al.
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Peiwen Sun, Sitong Cheng, Xiangtai Li et al.
Leveraging Large Language Models for Node Generation in Few-Shot Learning on Text-Attributed Graphs
Jianxiang Yu, Yuxiang Ren, Chenghua Gong et al.
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
Backtracking Improves Generation Safety
Yiming Zhang, Jianfeng Chi, Hailey Nguyen et al.
miniCTX: Neural Theorem Proving with (Long-)Contexts
Jiewen Hu, Thomas Zhu, Sean Welleck
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang
Universal Image Restoration Pre-training via Degradation Classification
Jiakui Hu, Lujia Jin, Zhengjian Yao et al.
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
Xinyan Chen, Renrui Zhang, Dongzhi JIANG et al.
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang, Yue Liao, Jianhui Liu et al.
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang, Yang Sui, Jinqi Xiao et al.
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
JUNSEONG KIM, GeonU Kim, Kim Yu-Ji et al.
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer, Dan Valentine, Luke Bailey et al.
Agent-Oriented Planning in Multi-Agent Systems
Ao LI, Yuexiang Xie, Songze Li et al.
MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes
XINJIE ZHANG, Zhening Liu, Yifan Zhang et al.
GOAL: A Generalist Combinatorial Optimization Agent Learner
Darko Drakulić, Sofia Michel, Jean-Marc Andreoli
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma, Luoxin Ye, Nessa McWeeney et al.
Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking
Yaozong Zheng, Bineng Zhong, Qihua Liang et al.
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu et al.
GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion
Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Chen Qian, Dongrui Liu, Hao Wen et al.
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging
Shiqi Chen, Jinghan Zhang, Tongyao Zhu et al.
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim, Taehoon Yoon, Jisung Hwang et al.
A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert Dick et al.
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models
Tianyu Fu, Tengxuan Liu, Qinghao Han et al.
Halton Scheduler for Masked Generative Image Transformer
Victor Besnier, Mickael Chen, David Hurych et al.
Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction
Edgar Sucar, Zihang Lai, Eldar Insafutdinov et al.
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng, Gang Xiong, Ruixi Qiao et al.
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu, Honghui Yang, Yating Wang et al.
Discrepancy Minimization in Input-Sparsity Time
Yichuan Deng, Xiaoyu Li, Zhao Song et al.
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
Weixiang Yan, Haitian Liu, Tengxiao Wu et al.
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
Kaifeng Gao, Jiaxin Shi, Hanwang Zhang et al.
ARB-LLM: Alternating Refined Binarizations for Large Language Models
Zhiteng Li, Xianglong Yan, Tianao Zhang et al.
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
Logan Cross, Violet Xiang, Agam Bhatia et al.
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing, Qi Dai, Zejia Weng et al.
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Qichao Shentu, Beibu Li, Kai Zhao et al.
MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt
Yuhao Wang, Xuehu Liu, Tianyu Yan et al.
CleanDIFT: Diffusion Features without Noise
Nick Stracke, Stefan Andreas Baumann, Kolja Bauer et al.
Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering
Yutao Feng, Xiang Feng, Yintong Shang et al.
On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
Quentin Bertrand, Anne Gagneux, Mathurin Massias et al.
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering
Yifan Gao, Zihang Lin, Chuanbin Liu et al.
Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model
Yue-Hua Han, Tai-Ming Huang, Kailung Hua et al.
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Bhavya, Stelian Coros, Andreas Krause et al.
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
Jianhui Chen, Xiaozhi Wang, Zijun Yao et al.
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Cunxiang Wang, Ruoxi Ning, Boqi Pan et al.
GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images
Xiang Lan, Feng Wu, Kai He et al.
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
Hao Wen, Zehuan Huang, Yaohui Wang et al.
Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini, Sachin Goyal, Dylan Sam et al.
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search
Yuichi Inoue, Kou Misaki, Yuki Imajuku et al.
Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods
Daniil Vankov, Anton Rodomanov, Angelia Nedich et al.
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Luxi He, Yangsibo Huang, Weijia Shi et al.
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
Andrew Szot, Bogdan Mazoure, Omar Attia et al.
Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
Peidong Li, Dixiao Cui
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
Kaicheng Yang, Tiancheng Gu, Xiang An et al.
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu, Linchao Zhu, Yi Yang
NightHaze: Nighttime Image Dehazing via Self-Prior Learning
Beibei Lin, Yeying Jin, Yan Wending et al.
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo, Florian Eddie Dorner, Moritz Hardt
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Haoyu Chen, Xiaojie Xu, Wenbo Li et al.
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
Xiaoqiang Wang, Bang Liu
Towards Foundation Models for Mixed Integer Linear Programming
Sirui Li, Janardhan Kulkarni, Ishai Menache et al.
Specialized Foundation Models Struggle to Beat Supervised Baselines
Zongzhe Xu, Ritvik Gupta, Wenduo Cheng et al.
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
Xianfu Cheng, Wei Zhang, Shiwei Zhang et al.
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
Belinda Li, Been Kim, Zi Wang
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Florian Eddie Dorner, Vivian Nastl, Moritz Hardt
What is the Visual Cognition Gap between Humans and Multimodal LLMs?
Xu Cao, Yifan Shen, Bolin Lai et al.
Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Shizhe Diao, Yu Yang, Yonggan Fu et al.
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
George Wang, Jesse Hoogland, Stan van Wingerden et al.
Exploiting Diffusion Prior for Real-World Image Dehazing with Unpaired Training
Yunwei Lan, Zhigao Cui, Chang Liu et al.
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong, Zhuoming Liu, Yin Li et al.
RouteLLM: Learning to Route LLMs from Preference Data
Isaac Ong, Amjad Almahairi, Vincent Wu et al.
Towards General-Purpose Model-Free Reinforcement Learning
Scott Fujimoto, Pierluca D'Oro, Amy Zhang et al.
Teaching Language Models to Critique via Reinforcement Learning
Zhihui Xie, Jie chen, Liyu Chen et al.
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Rongyao Fang, Chengqi Duan, Kun Wang et al.
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
Kaiwen Zha, Zhengqi Gao, Maohao Shen et al.
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
Xiao Cui, Mo Zhu, Yulei Qin et al.
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Taekyung Ki, Dongchan Min, Gyeongsu Chae
FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping
Yuzhou Ji, He Zhu, Junshu Tang et al.
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
Zhefei Gong, Pengxiang Ding, Shangke Lyu et al.
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.
EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding
Muye Huang, Han Lai, Xinyu Zhang et al.
MC^2: Multi-concept Guidance for Customized Multi-concept Generation
Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.
GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration
Yuhang Li, Ruokai Yin, Donghyun Lee et al.
I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength
Wanquan Feng, Jiawei Liu, Pengqi Tu et al.
Emergence of meta-stable clustering in mean-field transformer models
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
WPMixer: Efficient Multi-Resolution Mixing for Long-Term Time Series Forecasting
Md Mahmuddun Nabi Murad, Mehmet Aktukmak, Yasin Yilmaz
AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models
Mintong Kang, Chejian Xu, Bo Li
OpenVIS: Open-vocabulary Video Instance Segmentation
Pinxue Guo, Hao Huang, Peiyang He et al.
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
Yun Qu, Yuhang Jiang, Boyuan Wang et al.
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.
ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
Hongyin Zhang, Zifeng Zhuang, Han Zhao et al.
Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models
Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy et al.
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods
Oussama Zekri, Nicolas Boulle
AnimateAnything: Consistent and Controllable Animation for Video Generation
guojun lei, Chi Wang, Rong Zhang et al.
Diverse Preference Learning for Capabilities and Alignment
Stewart Slocum, Asher Parker-Sartori, Dylan Hadfield-Menell
Non-myopic Generation of Language Models for Reasoning and Planning
Chang Ma, Haiteng Zhao, Junlei Zhang et al.
Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search
Boyan Li, Jiayi Zhang, Ju Fan et al.
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi, Hritik Bansal, Arian Hosseini et al.
From Language Models over Tokens to Language Models over Characters
Tim Vieira, Benjamin LeBrun, Mario Giulianelli et al.
Manifolds, Random Matrices and Spectral Gaps: The geometric phases of generative diffusion
Enrico Ventura, Beatrice Achilli, Gianluigi Silvestri et al.
Population Transformer: Learning Population-level Representations of Neural Activity
Geeling Chau, Christopher Wang, Sabera Talukder et al.
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
Jingtao Li, Yingyi Liu, XINYU WANG et al.
MeshArt: Generating Articulated Meshes with Structure-Guided Transformers
Daoyi Gao, Mohd Yawar Nihal Siddiqui, Lei Li et al.
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat et al.
Explore Theory of Mind: program-guided adversarial data generation for theory of mind reasoning
Melanie Sclar, Jane Dwivedi-Yu, Maryam Fazel-Zarandi et al.
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Alan Baade, Puyuan Peng, David Harwath
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
Angela Castillo, Jonas Kohler, Juan C. Pérez et al.
Language Imbalance Driven Rewarding for Multilingual Self-improving
Wen Yang, Junhong Wu, Chen Wang et al.
A Simple Model of Inference Scaling Laws
Noam Levi
PhysAnimator: Physics-Guided Generative Cartoon Animation
Tianyi Xie, Yiwei Zhao, Ying Jiang et al.
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao, Sanghwan Kim, Iuliana Georgescu et al.
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
Jian Wu, Linyi Yang, Dongyuan Li et al.
Erasing Conceptual Knowledge from Language Models
Rohit Gandikota, Sheridan Feucht, Samuel Marks et al.
Language Models are Advanced Anonymizers
Robin Staab, Mark Vero, Mislav Balunovic et al.
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.
Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge
Hanna Wallach, Meera Desai, A. Feder Cooper et al.
Oscillatory State-Space Models
T. Konstantin Rusch, Daniela Rus
Heavy-Tailed Diffusion Models
Kushagra Pandey, Jaideep Pathak, Yilun Xu et al.
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Jiaxiang Cheng, Pan Xie, Xin Xia et al.
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content
Zicheng Zhang, Tengchuan Kou, Chunyi Li et al.
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
Qiang Liu, Mengyu Chu, Nils Thuerey
Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images
Tianhao Wu, Chuanxia Zheng, Frank Guan et al.
On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality
Jerry Yao-Chieh Hu, Weimin Wu, Yi-Chen Lee et al.
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen, Bang Zhang, Ruotian Ma et al.
Transformers are Universal In-context Learners
Takashi Furuya, Maarten V de Hoop, Gabriel Peyré
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Gleb Rodionov, Roman Garipov, Alina Shutova et al.
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation
Zhenyu Wu, Yuheng Zhou, Xiuwei Xu et al.
Grounding Video Models to Actions through Goal Conditioned Exploration
Yunhao Luo, Yilun Du
HELMET: How to Evaluate Long-context Models Effectively and Thoroughly
Howard Yen, Tianyu Gao, Minmin Hou et al.
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
Luca Barsellotti, Lorenzo Bianchi, Nicola Messina et al.
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
Puning Yang, Qizhou Wang, Zhuo Huang et al.
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Zihan Liu, Shuangrui Ding, Zhixiong Zhang et al.
Mastering Board Games by External and Internal Planning with Language Models
John Schultz, Jakub Adamek, Matej Jusup et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Yongkun Du, Zhineng Chen, Hongtao Xie et al.