Most Cited 2025 "neural representations" Papers
22,274 papers found • Page 6 of 112
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Zhongwei Ren, Yunchao Wei, Xun Guo et al.
Exploring Enhanced Contextual Information for Video-Level Object Tracking
Ben Kang, Xin Chen, Simiao Lai et al.
Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu, Weiyang Liu, Haiwen Feng et al.
Diffusion-based Neural Network Weights Generation
Bedionita Soro, Bruno Andreis, Hayeon Lee et al.
Improving Uncertainty Estimation through Semantically Diverse Language Generation
Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi et al.
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
Keon Lee, Dong Won Kim, Jaehyeon Kim et al.
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Yu Ying Chiu, Liwei Jiang, Yejin Choi
Evaluating the Diversity and Quality of LLM Generated Content
Alexander Shypula, Shuo Li, Botong Zhang et al.
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Benjamin Feuer, Micah Goldblum, Teresa Datta et al.
LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations
Anian Ruoss, Fabio Pardo, Harris Chan et al.
When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline
Ming Li, Yongchun Gu, Yi Wang et al.
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
Siyu Xu, Yunke Wang, Chenghao Xia et al.
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
Daniel Marczak, Simone Magistri, Sebastian Cygert et al.
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Sicong Leng, Yun Xing, Zesen Cheng et al.
VistaDream: Sampling multiview consistent images for single-view scene reconstruction
Haiping Wang, Yuan Liu, Ziwei Liu et al.
A Comprehensive Overhaul of Multimodal Assistant with Small Language Models
Minjie Zhu, Yichen Zhu, Ning Liu et al.
Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding
Zhongyi Shui, Jianpeng Zhang, Weiwei Cao et al.
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
Hao Sun, Yunyi Shen, Jean-Francois Ton
Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
Ge Wu, Shen Zhang, Ruijing Shi et al.
Bolt3D: Generating 3D Scenes in Seconds
Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan et al.
PersonalLLM: Tailoring LLMs to Individual Preferences
Thomas Zollo, Andrew Siah, Naimeng Ye et al.
Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free
Ziyue Li, Tianyi Zhou
DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting
Hyunwoo Park, Gun Ryu, Wonjun Kim
Light3R-SfM: Towards Feed-forward Structure-from-Motion
Sven Elflein, Qunjie Zhou, Laura Leal-Taixe
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning
Yiqun Chen, Lingyong Yan, Weiwei Sun et al.
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Zhongxing Xu, Chengzhi Liu, Qingyue Wei et al.
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Justin Deschenaux, Caglar Gulcehre
DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
Yuhao Wang, Yang Liu, Aihua Zheng et al.
Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning
Hao Chen, Jiaming Liu, Chenyang Gu et al.
Fast Feedforward 3D Gaussian Splatting Compression
Yihang Chen, Qianyi Wu, Mengyao Li et al.
Perception-Guided Jailbreak Against Text-to-Image Models
Yihao Huang, Le Liang, Tianlin Li et al.
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion
Mingzhen Sun, Weining Wang, Li et al.
Erasing Undesirable Influence in Diffusion Models
Jing Wu, Trung Le, Munawar Hayat et al.
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models
Peiyan Li, Yixiang Chen, Hongtao Wu et al.
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
Ying Chen, Guoan Wang, Yuanfeng Ji et al.
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Zhenyu Pan, Haozheng Luo, Manling Li et al.
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
Zimu Lu, Aojun Zhou, Ke Wang et al.
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models
Yongliang Wu, Zonghui Li, Xinting Hu et al.
Estimating Body and Hand Motion in an Ego-sensed World
Brent Yi, Vickie Ye, Maya Zheng et al.
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Shi Qiu, Shaoyang Guo, Zhuo-Yang Song et al.
Towards Understanding Camera Motions in Any Video
Zhiqiu Lin, Siyuan Cen, Daniel Jiang et al.
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenda Xu, Rujun Han, Zifeng Wang et al.
Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity
Eduard Gorbunov, Nazarii Tupitsa, Sayantan Choudhury et al.
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
Zhicheng Yang, Yiwei Wang, Yinya Huang et al.
Theoretical Benefit and Limitation of Diffusion Language Model
Guhao Feng, Yihan Geng, Jian Guan et al.
Chain-of-Retrieval Augmented Generation
Liang Wang, Haonan Chen, Nan Yang et al.
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song, Muxi Diao, Guanting Dong et al.
Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering
Cheng Sun, Jaesung Choe, Charles Loop et al.
How to build a consistency model: Learning flow maps via self-distillation
Nicholas Boffi, Michael Albergo, Eric Vanden-Eijnden
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
Junlong Cheng, Bin Fu, Jin Ye et al.
Language-Image Models with 3D Understanding
Jang Hyun Cho, Boris Ivanovic, Yulong Cao et al.
Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection
Lichen Bai, Shitong Shao, Zikai Zhou et al.
InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling
Muhammad Gohar Javed, Chuan Guo, Li Cheng et al.
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur, Ekdeep S Lubana, Thomas Fel et al.
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
Yiran Xu, Taesung Park, Richard Zhang et al.
MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors
Qingming Liu, Yuan Liu, Jiepeng Wang et al.
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Laura Ruis, Maximilian Mozes, Juhan Bae et al.
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Hongyan Zhi, Peihao Chen, Junyan Li et al.
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang, Yinheng Li, Dan Zhao et al.
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang, Qiuyu Huang, Junjie Liu et al.
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond
Qizhou Wang, Jin Zhou, (Andrew) Zhanke Zhou et al.
What Makes a Good Diffusion Planner for Decision Making?
Haofei Lu, Dongqi Han, Yifei Shen et al.
LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application
Jian Jia, Yipei Wang, Yan Li et al.
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
Ziyang Wu, Tianjiao Ding, Yifu Lu et al.
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
Hongxiang Li, Yaowei Li, Yuhang Yang et al.
DeFoG: Discrete Flow Matching for Graph Generation
Yiming Qin, Manuel Madeira, Dorina Thanou et al.
Steering Large Language Models between Code Execution and Textual Reasoning
Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma et al.
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
Baichuan Zhou, Haote Yang, Dairong Chen et al.
CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression
Yu-Ting Zhan, Cheng-Yuan Ho, He-Bi Yang et al.
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Hongzhi Huang, Defa Zhu, Banggu Wu et al.
Frequency Dynamic Convolution for Dense Image Prediction
Linwei Chen, Lin Gu, Liang Li et al.
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
Cong Lu, Shengran Hu, Jeff Clune
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.
PhysGen3D: Crafting a Miniature Interactive World from a Single Image
Boyuan Chen, Hanxiao Jiang, Shaowei Liu et al.
DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance
Younghyun Kim, Geunmin Hwang, Junyu Zhang et al.
The Superposition of Diffusion Models Using the Itô Density Estimator
Marta Skreta, Lazar Atanackovic, Joey Bose et al.
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini, Pierre Ablin, David Grangier
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
Haiyang Shen, Yue Li, Desong Meng et al.
InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences
Hongkai Zheng, Wenda Chu, Bingliang Zhang et al.
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin, Xinyu Wei, Renrui Zhang et al.
Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement
Dominik Grimm, Jonathan Pirnay
Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
Yuqi Wu, Wenzhao Zheng, Jie Zhou et al.
KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems
Jusheng Zhang, Zimeng Huang, Yijia Fan et al.
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Guanyao Wu, Haoyu Liu, Hongming Fu et al.
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang, Bairu Hou, Wei Wei et al.
CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs
Siyu Wang, Cailian Chen, Xinyi Le et al.
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
Yiren Song, Danze Chen, Mike Zheng Shou
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.
ICLR: In-Context Learning of Representations
Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana et al.
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
Bojia Zi, Penghui Ruan, Marco Chen et al.
Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning
Anja Šurina, Amin Mansouri, Lars C.P.M. Quaedvlieg et al.
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
Qirui Chen, Shangzhe Di, Weidi Xie
FineVQ: Fine-Grained User Generated Content Video Quality Assessment
Huiyu Duan, Qiang Hu, Jiarui Wang et al.
Weight ensembling improves reasoning in language models
Xingyu Dang, Christina Baek, Kaiyue Wen et al.
Calibrated Multi-Preference Optimization for Aligning Diffusion Models
Kyungmin Lee, Xiaohang Li, Qifei Wang et al.
Epona: Autoregressive Diffusion World Model for Autonomous Driving
Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu et al.
Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
Lucio La Cava, Andrea Tagarelli
Generating CAD Code with Vision-Language Models for 3D Designs
Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi et al.
Multi-Agent Collaboration via Evolving Orchestration
Yufan Dang, Chen Qian, Xueheng Luo et al.
Moral Alignment for LLM Agents
Elizaveta Tennant, Stephen Hailes, Mirco Musolesi
Understanding Factual Recall in Transformers via Associative Memories
Eshaan Nichani, Jason Lee, Alberto Bietti
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
Cong Chen, Mingyu Liu, Chenchen Jing et al.
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction
Jarrid Rector-Brooks, Mohsin Hasan, Zhangzhi Peng et al.
Adversarial Search Engine Optimization for Large Language Models
Fredrik Nestaas, Edoardo Debenedetti, Florian Tramer
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
Zewei Zhang, Huan Liu, Jun Chen et al.
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu, Jintong Li, Yicheng Jiang et al.
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning
Hyun Ryu, Gyeongman Kim, Hyemin S. Lee et al.
Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact case
Iskander Azangulov, Andrei Smolensky, Alexander Terenin et al.
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
Dongping Chen, Yue Huang, Siyuan Wu et al.
Interleaved-Modal Chain-of-Thought
Jun Gao, Yongqi Li, Ziqiang Cao et al.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
Hanlin Wang, Hao Ouyang, Qiuyu Wang et al.
ResearchTown: Simulator of Human Research Community
Haofei Yu, Zhaochen Hong, Zirui Cheng et al.
MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding
Rongchang Xie, Chen Du, Ping Song et al.
Grounded Reinforcement Learning for Visual Reasoning
Gabriel Sarch, Snigdha Saha, Naitik Khandelwal et al.
A Formal Framework for Understanding Length Generalization in Transformers
Xinting Huang, Andy Yang, Satwik Bhattamishra et al.
CityNav: A Large-Scale Dataset for Real-World Aerial Navigation
Jungdae Lee, Taiki Miyanishi, Shuhei Kurita et al.
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
Ziyang Xie, Zhizheng Liu, Zhenghao Peng et al.
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Yuxuan Cai, Jiangning Zhang, Haoyang He et al.
AutoPresent: Designing Structured Visuals from Scratch
Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou et al.
TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation
Haiyang Liu, Xingchao Yang, Tomoya Akiyama et al.
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Zizheng Pan, Bohan Zhuang, De-An Huang et al.
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
Bowen Chen, Brynn Zhao, Haomiao Sun et al.
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li, Lu Yin, Shiwei Liu
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
Lue Fan, Hao Zhang, Qitai Wang et al.
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Yunlong Tang, Daiki Shimada, Jing Bi et al.
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Wei Pang, Kevin Qinghong Lin, Xiangru Jian et al.
MagicQuill: An Intelligent Interactive Image Editing System
Zichen Liu, Yue Yu, Hao Ouyang et al.
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich et al.
Can LLMs Solve Longer Math Word Problems Better?
Xin Xu, Tong Xiao, Zitong Chao et al.
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu, Qiyun Xu, Tong Xiao et al.
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Zihan Zheng, Zerui Cheng, Zeyu Shen et al.
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Lazar Atanackovic, Xi (Nicole) Zhang, Brandon Amos et al.
STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes
Jiawei Yang, Jiahui Huang, Boris Ivanovic et al.
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
Shenghai Yuan, Xianyi He, Yufan Deng et al.
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi, Shunfan Zheng, Linlin Wang et al.
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li, Zhelun Shi, Xuhao Hu et al.
KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
Belinda Mo, Kyssen Yu, Joshua Kazdan et al.
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation
Xiaofeng Wang, Kang Zhao, Feng Liu et al.
CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
Yun Liu, Chengwen Zhang, Ruofan Xing et al.
AffordDP: Generalizable Diffusion Policy with Transferable Affordance
Shijie Wu, Yihang Zhu, Yunao Huang et al.
Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift
Siyuan Liang, Jiawei Liang, Tianyu Pang et al.
Diffusion Beats Autoregressive in Data-Constrained Settings
Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Xiao Guo, Xiufeng Song, Yue Zhang et al.
Hyper-Connections
Defa Zhu, Hongzhi Huang, Zihao Huang et al.
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation
Chengwen Qi, Ren Ma, Bowen Li et al.
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang, Chengzhi (Martin) Hu, Paul Röttger et al.
Adversarial Diffusion Compression for Real-World Image Super-Resolution
Bin Chen, Gehui Li, Rongyuan Wu et al.
An Intelligent Agentic System for Complex Image Restoration Problems
Kaiwen Zhu, Jinjin Gu, Zhiyuan You et al.
SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks
Meng Lou, Yunxiang Fu, Yizhou Yu
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu, Chen Jia, Fan Shi et al.
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang, Hongzhen Wang, Zonghao Guo et al.
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
Zhen Qu, Xian Tao, Xinyi Gong et al.
miniCTX: Neural Theorem Proving with (Long-)Contexts
Jiewen Hu, Thomas Zhu, Sean Welleck
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
Wenhao Zheng, Yixiao Chen, Weitong Zhang et al.
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Duo Zheng, Shijia Huang, Yanyang Li et al.
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier et al.
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
Yang Zhou, Xu Gao, Zichong Chen et al.
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
Xin Liu, Jie Liu, Jie Tang et al.
Artificial Kuramoto Oscillatory Neurons
Takeru Miyato, Sindy Löwe, Andreas Geiger et al.
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng, Tianyu Pang, Chao Du et al.
Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images
Sichen Zhu, Yuchen Zhu, Molei Tao et al.
Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking
Yaozong Zheng, Bineng Zhong, Qihua Liang et al.
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang, Mingjia Shi, Yukun Zhou et al.
Results of the Big ANN: NeurIPS’23 competition
Harsha Vardhan Simhadri, Martin Aumüller, Matthijs Douze et al.
Energy-Weighted Flow Matching for Offline Reinforcement Learning
Shiyuan Zhang, Weitong Zhang, Quanquan Gu
Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel
Shivam Gupta, Linda Cai, Sitan Chen
ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification
Xiao Li, Wenxuan Sun, Huanran Chen et al.
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing, Qi Dai, Zejia Weng et al.
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
Jianhui Chen, Xiaozhi Wang, Zijun Yao et al.
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu, Honghui Yang, Yating Wang et al.
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu, Kun Yuan, Yaling Shen et al.
Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models
Jingyang Zhang, Jingwei Sun, Eric Yeats et al.
Self-Adapting Language Models
Adam Zweiger, Jyo Pari, Han Guo et al.
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent, Kyle Hsu, Justin Johnson et al.
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi, Mingjia Li, Minjing Dong et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
Specialized Foundation Models Struggle to Beat Supervised Baselines
Zongzhe Xu, Ritvik Gupta, Wenduo Cheng et al.
Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong
RouteLLM: Learning to Route LLMs from Preference Data
Isaac Ong, Amjad Almahairi, Vincent Wu et al.
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Cassidy Laidlaw, Shivam Singhal, Anca Dragan
Reward Guided Latent Consistency Distillation
William Wang, Jiachen Li, Weixi Feng et al.
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Rongyao Fang, Chengqi Duan, Kun Wang et al.
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
Shunlin Lu, Jingbo Wang, Zeyu Lu et al.
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang, Ziwei Zheng, Boxu Chen et al.
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
Ziyu Zhu, Xilin Wang, Yixuan Li et al.
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov et al.
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Florian Eddie Dorner, Vivian Nastl, Moritz Hardt