Most Cited 2025 "ultra-low bit-widths" Papers

22,274 papers found • Page 5 of 112

#801

Dynamic Diffusion Transformer

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

ICLR 2025posterarXiv:2410.03456
34
citations
#802

YOLOE: Real-Time Seeing Anything

Ao Wang, Lihao Liu, Hui Chen et al.

ICCV 2025posterarXiv:2503.07465
34
citations
#803

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo et al.

ICLR 2025posterarXiv:2406.03136
34
citations
#804

Which Attention Heads Matter for In-Context Learning?

Kayo Yin, Jacob Steinhardt

ICML 2025posterarXiv:2502.14010
34
citations
#805

Compositional Entailment Learning for Hyperbolic Vision-Language Models

Avik Pal, Max van Spengler, Guido D'Amely di Melendugno et al.

ICLR 2025posterarXiv:2410.06912
34
citations
#806

LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs

Yuhao Wu, Ming Shan Hee, Zhiqiang Hu et al.

ICLR 2025posterarXiv:2409.02076
34
citations
#807

Improving the Diffusability of Autoencoders

Ivan Skorokhodov, Sharath Girish, Benran Hu et al.

ICML 2025posterarXiv:2502.14831
34
citations
#808

Think while You Generate: Discrete Diffusion with Planned Denoising

Sulin Liu, Juno Nam, Andrew Campbell et al.

ICLR 2025posterarXiv:2410.06264
34
citations
#809

Persistent Pre-training Poisoning of LLMs

Yiming Zhang, Javier Rando, Ivan Evtimov et al.

ICLR 2025posterarXiv:2410.13722
34
citations
#810

WISA: World simulator assistant for physics-aware text-to-video generation

Jing Wang, Ao Ma, Ke Cao et al.

NEURIPS 2025spotlightarXiv:2503.08153
34
citations
#811

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

Shengji Tang, Weicai Ye, Peng Ye et al.

ICLR 2025posterarXiv:2410.06245
34
citations
#812

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Mushui Liu, Yuhang Ma, Zhen Yang et al.

AAAI 2025paperarXiv:2407.00737
33
citations
#813

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

Ji Qi, Ming Ding, Weihan Wang et al.

ICLR 2025posterarXiv:2402.04236
33
citations
#814

The Diffusion Duality

Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan et al.

ICML 2025posterarXiv:2506.10892
33
citations
#815

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.

ICLR 2025posterarXiv:2404.10775
33
citations
#816

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

Hui Zhang, Dexiang Hong, Yitong Wang et al.

ICCV 2025posterarXiv:2412.03859
33
citations
#817

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda et al.

ICML 2025posterarXiv:2406.04391
33
citations
#818

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki et al.

COLM 2025paperarXiv:2503.00177
33
citations
#819

On the Relation between Trainability and Dequantization of Variational Quantum Learning Models

Elies Gil-Fuster, Casper Gyurik, Adrian Perez-Salinas et al.

ICLR 2025posterarXiv:2406.07072
33
citations
#820

Reasoning Models Better Express Their Confidence

Dongkeun Yoon, Seungone Kim, Sohee Yang et al.

NEURIPS 2025posterarXiv:2505.14489
33
citations
#821

What to align in multimodal contrastive learning?

Benoit Dufumier, Javiera Castillo Navarro, Devis Tuia et al.

ICLR 2025posterarXiv:2409.07402
33
citations
#822

Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment

Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola et al.

ICLR 2025posterarXiv:2501.19309
33
citations
#823

Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models

Lingzhi Wang, Xingshan Zeng, Jinsong Guo et al.

AAAI 2025paperarXiv:2402.05813
33
citations
#824

SCALM: Detecting Bad Practices in Smart Contracts Through LLMs

Zongwei Li, Xiaoqi Li, Wenkai Li et al.

AAAI 2025paperarXiv:2502.04347
33
citations
#825

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Zekun Qi, Wenyao Zhang, Yufei Ding et al.

NEURIPS 2025spotlightarXiv:2502.13143
33
citations
#826

MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL

Arian Askari, Christian Poelitz, Xinye Tang

AAAI 2025paperarXiv:2406.12692
33
citations
#827

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Wei Li, Bing Hu, Rui Shao et al.

CVPR 2025posterarXiv:2503.03663
33
citations
#828

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Zixuan Huang, Mark Boss, Aaryaman Vasishta et al.

CVPR 2025posterarXiv:2501.04689
33
citations
#829

PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models

Minghao Chen, Roman Shapovalov, Iro Laina et al.

CVPR 2025highlightarXiv:2412.18608
33
citations
#830

Scaling Wearable Foundation Models

Girish Narayanswamy, Xin Liu, Kumar Ayush et al.

ICLR 2025posterarXiv:2410.13638
33
citations
#831

Tensor Product Attention Is All You Need

Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.

NEURIPS 2025spotlightarXiv:2501.06425
33
citations
#832

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025posterarXiv:2503.02199
33
citations
#833

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Yongxin Guo, Zhenglin Cheng, Xiaoying Tang et al.

ICLR 2025posterarXiv:2405.14297
33
citations
#834

Stable-Hair: Real-World Hair Transfer via Diffusion Model

Yuxuan Zhang, Qing Zhang, Yiren Song et al.

AAAI 2025paperarXiv:2407.14078
33
citations
#835

WorldModelBench: Judging Video Generation Models As World Models

Dacheng Li, Yunhao Fang, Yukang Chen et al.

NEURIPS 2025posterarXiv:2502.20694
33
citations
#836

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Yilun Hao, Yang Zhang, Chuchu Fan

ICLR 2025posterarXiv:2410.12112
33
citations
#837

LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning

Zhe Li, Weihao Yuan, Yisheng He et al.

ICLR 2025posterarXiv:2410.07093
33
citations
#838

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang, Yanrui Yu, Ye Yuan et al.

NEURIPS 2025oralarXiv:2505.12434
33
citations
#839

On the Emergence of Position Bias in Transformers

Xinyi Wu, Yifei Wang, Stefanie Jegelka et al.

ICML 2025posterarXiv:2502.01951
33
citations
#840

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Xiangyu Zhao, Peiyuan Zhang, Kexian Tang et al.

NEURIPS 2025oralarXiv:2504.02826
32
citations
#841

MonSter: Marry Monodepth to Stereo Unleashes Power

JunDa Cheng, Longliang Liu, Gangwei Xu et al.

CVPR 2025highlight
32
citations
#842

TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning

Andreas Auer, Patrick Podest, Daniel Klotz et al.

NEURIPS 2025posterarXiv:2505.23719
32
citations
#843

CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

Yang Liu, Chuanchen Luo, Zhongkai Mao et al.

ICLR 2025posterarXiv:2411.00771
32
citations
#844

Retrieval-Augmented Generation with Conflicting Evidence

Han Wang, Archiki Prasad, Elias Stengel-Eskin et al.

COLM 2025paperarXiv:2504.13079
32
citations
#845

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Yucheng Li, Huiqiang Jiang, Qianhui Wu et al.

ICLR 2025posterarXiv:2412.10319
32
citations
#846

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Ximing Lu, Melanie Sclar, Skyler Hallinan et al.

ICLR 2025posterarXiv:2410.04265
32
citations
#847

PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion

Sophia Tang, Yinuo Zhang, Pranam Chatterjee, PhD

ICML 2025posterarXiv:2412.17780
32
citations
#848

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Yiming Wang, Pei Zhang, Baosong Yang et al.

ICLR 2025posterarXiv:2410.13640
32
citations
#849

Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models

Ruofan Liang, Žan Gojčič, Huan Ling et al.

CVPR 2025poster
32
citations
#850

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

yifei xia, Suhan Ling, Fangcheng Fu et al.

ICCV 2025posterarXiv:2502.21079
32
citations
#851

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Hao Chen, Ze Wang, Xiang Li et al.

CVPR 2025posterarXiv:2412.10958
32
citations
#852

Generative Gaussian Splatting for Unbounded 3D City Generation

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.

CVPR 2025posterarXiv:2406.06526
32
citations
#853

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Peng Jin, Bo Zhu, Yuan Li et al.

ICLR 2025posterarXiv:2410.07348
32
citations
#854

3D-HGS: 3D Half-Gaussian Splatting

Haolin Li, Jinyang Liu, Mario Sznaier et al.

CVPR 2025posterarXiv:2406.02720
32
citations
#855

Can LLMs Understand Time Series Anomalies?

Zihao Zhou, Rose Yu

ICLR 2025posterarXiv:2410.05440
32
citations
#856

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Tiantian Geng, Jinrui Zhang, Qingni Wang et al.

CVPR 2025posterarXiv:2411.19772
32
citations
#857

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

Luca Bartolomei, Fabio Tosi, Matteo Poggi et al.

CVPR 2025posterarXiv:2412.04472
32
citations
#858

LEGION: Learning to Ground and Explain for Synthetic Image Detection

Hengrui Kang, Siwei Wen, Zichen Wen et al.

ICCV 2025highlightarXiv:2503.15264
32
citations
#859

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Lital Binyamin, Yoad Tewel, Hilit Segev et al.

CVPR 2025posterarXiv:2406.10210
32
citations
#860

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025posterarXiv:2406.04321
32
citations
#861

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?

Kunhao Zheng, Juliette Decugis, Jonas Gehring et al.

ICLR 2025posterarXiv:2410.08105
32
citations
#862

Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries

HUAKUN LUO, Haixu Wu, Hang Zhou et al.

ICML 2025posterarXiv:2502.02414
32
citations
#863

Emergence of a High-Dimensional Abstraction Phase in Language Transformers

Emily Cheng, Diego Doimo, Corentin Kervadec et al.

ICLR 2025posterarXiv:2405.15471
32
citations
#864

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Zhilin Wang, Jiaqi Zeng, Olivier Delalleau et al.

NEURIPS 2025posterarXiv:2505.11475
32
citations
#865

Spike No More: Stabilizing the Pre-training of Large Language Models

Sho Takase, Shun Kiyono, Sosuke Kobayashi et al.

COLM 2025paperarXiv:2312.16903
31
citations
#866

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.

NEURIPS 2025spotlightarXiv:2502.08640
31
citations
#867

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Derong Xu, Xinhang Li, Ziheng Zhang et al.

AAAI 2025paperarXiv:2412.18537
31
citations
#868

System 1.x: Learning to Balance Fast and Slow Planning with Language Models

Swarnadeep Saha, Archiki Prasad, Justin Chen et al.

ICLR 2025posterarXiv:2407.14414
31
citations
#869

Values in the Wild: Discovering and Mapping Values in Real-World Language Model Interactions

Saffron Huang, Esin DURMUS, Kunal Handa et al.

COLM 2025paper
31
citations
#870

Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers

Shalev Lifshitz, Sheila A. McIlraith, Yilun Du

COLM 2025paperarXiv:2502.20379
31
citations
#871

Complexity Experts are Task-Discriminative Learners for Any Image Restoration

Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.

CVPR 2025posterarXiv:2411.18466
31
citations
#872

MAT-Agent: Adaptive Multi-Agent Training Optimization

jusheng zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025posterarXiv:2510.17845
31
citations
#873

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.

AAAI 2025paperarXiv:2409.01227
31
citations
#874

McEval: Massively Multilingual Code Evaluation

Linzheng Chai, Shukai Liu, Jian Yang et al.

ICLR 2025posterarXiv:2406.07436
31
citations
#875

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

seil kang, Jinyeong Kim, Junhyeok Kim et al.

CVPR 2025highlightarXiv:2503.06287
31
citations
#876

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.

CVPR 2025posterarXiv:2412.06016
31
citations
#877

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Qingdong He, Jiangning Zhang, Jinlong Peng et al.

AAAI 2025paperarXiv:2405.15214
31
citations
#878

From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs

Alireza Rezazadeh, Zichao Li, Wei Wei et al.

ICLR 2025posterarXiv:2410.14052
31
citations
#879

PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance

Haohan Weng, Yikai Wang, Tong Zhang et al.

ICLR 2025posterarXiv:2405.16890
31
citations
#880

Guided Real Image Dehazing Using YCbCr Color Space

Wenxuan Fang, Junkai Fan, Yu Zheng et al.

AAAI 2025paperarXiv:2412.17496
31
citations
#881

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.

CVPR 2025posterarXiv:2501.03218
31
citations
#882

Informed Correctors for Discrete Diffusion Models

Yixiu Zhao, Jiaxin Shi, Feng Chen et al.

NEURIPS 2025posterarXiv:2407.21243
31
citations
#883

VCA: Video Curious Agent for Long Video Understanding

Zeyuan Yang, Delin Chen, Xueyang Yu et al.

ICCV 2025posterarXiv:2412.10471
31
citations
#884

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Will Merrill, Ashish Sabharwal

NEURIPS 2025posterarXiv:2503.03961
31
citations
#885

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Peijie Dong, Lujun Li, Yuedong Zhong et al.

ICLR 2025posterarXiv:2408.01803
31
citations
#886

Gramian Multimodal Representation Learning and Alignment

Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo et al.

ICLR 2025posterarXiv:2412.11959
31
citations
#887

REEF: Representation Encoding Fingerprints for Large Language Models

Jie Zhang, Dongrui Liu, Chen Qian et al.

ICLR 2025posterarXiv:2410.14273
31
citations
#888

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

Xinze Li, Sen Mei, Zhenghao Liu et al.

ICLR 2025posterarXiv:2410.13509
31
citations
#889

VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing

Xiangpeng Yang, Linchao Zhu, Hehe Fan et al.

ICLR 2025posterarXiv:2502.17258
31
citations
#890

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

Xubing Ye, Yukang Gan, Yixiao Ge et al.

CVPR 2025posterarXiv:2412.00447
31
citations
#891

Number Cookbook: Number Understanding of Language Models and How to Improve It

Haotong Yang, Yi Hu, Shijia Kang et al.

ICLR 2025posterarXiv:2411.03766
31
citations
#892

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step

Mingyuan Zhou, Huangjie Zheng, Yi Gu et al.

ICLR 2025posterarXiv:2410.14919
31
citations
#893

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Yuheng Zhang, Dian Yu, Baolin Peng et al.

ICLR 2025posterarXiv:2407.00617
31
citations
#894

Spurious Forgetting in Continual Learning of Language Models

Junhao Zheng, Xidi Cai, Shengjie Qiu et al.

ICLR 2025posterarXiv:2501.13453
30
citations
#895

From Tokens to Words: On the Inner Lexicon of LLMs

Guy Kaplan, Matanel Oren, Yuval Reif et al.

ICLR 2025posterarXiv:2410.05864
30
citations
#896

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye et al.

AAAI 2025paperarXiv:2405.20535
30
citations
#897

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

Tsung-Han Wu, Giscard Biamby, Jerome Quenum et al.

ICLR 2025posterarXiv:2407.13766
30
citations
#898

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

Hailong Guo, Bohan Zeng, Yiren Song et al.

ICCV 2025posterarXiv:2501.15891
30
citations
#899

Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data

Asad Aali, Giannis Daras, Brett Levac et al.

ICLR 2025posterarXiv:2403.08728
30
citations
#900

GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

Haoyi Jiang, Liu Liu, Tianheng Cheng et al.

CVPR 2025posterarXiv:2412.13193
30
citations
#901

Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer

Yanjun Zhao, Sizhe Dang, Haishan Ye et al.

ICLR 2025poster
30
citations
#902

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NEURIPS 2025spotlightarXiv:2505.14460
30
citations
#903

VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Shehan Munasinghe, Hanan Gani, Wenqi Zhu et al.

CVPR 2025posterarXiv:2411.04923
30
citations
#904

Self-Boosting Large Language Models with Synthetic Preference Data

Qingxiu Dong, Li Dong, Xingxing Zhang et al.

ICLR 2025posterarXiv:2410.06961
30
citations
#905

Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality

Ying Jin, Zhimei Ren, Zhuoran Yang et al.

NEURIPS 2025posterarXiv:2212.09900
30
citations
#906

ConDSeg: A General Medical Image Segmentation Framework via Contrast-Driven Feature Enhancement

Mengqi Lei, Haochen Wu, Xinhua Lv et al.

AAAI 2025paperarXiv:2412.08345
30
citations
#907

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025posterarXiv:2411.18203
30
citations
#908

Empowering LLMs to Understand and Generate Complex Vector Graphics

XiMing Xing, Juncheng Hu, Guotao Liang et al.

CVPR 2025posterarXiv:2412.11102
30
citations
#909

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.

NEURIPS 2025spotlightarXiv:2507.18624
30
citations
#910

Evolutionary Large Language Model for Automated Feature Transformation

Nanxu Gong, Chandan K Reddy, Wangyang Ying et al.

AAAI 2025paperarXiv:2405.16203
30
citations
#911

A New Perspective on Shampoo's Preconditioner

Depen Morwani, Itai Shapira, Nikhil Vyas et al.

ICLR 2025posterarXiv:2406.17748
30
citations
#912

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Yuchen Lin, Chenguo Lin, Panwang Pan et al.

NEURIPS 2025posterarXiv:2506.05573
30
citations
#913

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Jiacheng Chen, Tianhao Liang, Sherman Siu et al.

ICLR 2025posterarXiv:2410.10563
30
citations
#914

Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue et al.

CVPR 2025posterarXiv:2502.06787
30
citations
#915

Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

Qi Wu, Yubo Zhao, Yifan Wang et al.

ICLR 2025posterarXiv:2405.17013
30
citations
#916

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

Zihui Cheng, Qiguang Chen, Jin Zhang et al.

AAAI 2025paperarXiv:2412.12932
30
citations
#917

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Tao Liu, Kai Wang, Senmao Li et al.

ICLR 2025posterarXiv:2501.13554
30
citations
#918

Cascade Reward Sampling for Efficient Decoding-Time Alignment

Bolian Li, Yifan Wang, Anamika Lochab et al.

COLM 2025paperarXiv:2406.16306
30
citations
#919

StarVector: Generating Scalable Vector Graphics Code from Images and Text

Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.

CVPR 2025posterarXiv:2312.11556
30
citations
#920

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Zhe Kong, Feng Gao, Yong Zhang et al.

NEURIPS 2025posterarXiv:2505.22647
30
citations
#921

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.

ICLR 2025posterarXiv:2412.13337
30
citations
#922

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.

CVPR 2025posterarXiv:2406.05132
30
citations
#923

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding

Doohyuk Jang, Sihwan Park, June Yong Yang et al.

ICLR 2025posterarXiv:2410.03355
30
citations
#924

Herald: A Natural Language Annotated Lean 4 Dataset

Guoxiong Gao, Yutong Wang, Jiedong Jiang et al.

ICLR 2025posterarXiv:2410.10878
30
citations
#925

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Le Zhuo, Liangbing Zhao, Sayak Paul et al.

ICCV 2025posterarXiv:2504.16080
30
citations
#926

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

Ziteng Wang, Jun Zhu, Jianfei Chen

ICLR 2025posterarXiv:2412.14711
29
citations
#927

Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron

Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.

ICLR 2025poster
29
citations
#928

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Xiu Yuan, Tongzhou Mu, Stone Tao et al.

ICLR 2025posterarXiv:2412.13630
29
citations
#929

OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation

Yuchen Lin, Chenguo Lin, Jianjin Xu et al.

ICLR 2025posterarXiv:2501.18982
29
citations
#930

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

Chensheng Peng, Chengwei Zhang, Yixiao Wang et al.

CVPR 2025posterarXiv:2411.11921
29
citations
#931

The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling

Andre Cornman, Jacob West-Roberts, Antonio Camargo et al.

ICLR 2025poster
29
citations
#932

AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions

Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri et al.

NEURIPS 2025posterarXiv:2506.09038
29
citations
#933

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.

ICLR 2025posterarXiv:2409.11242
29
citations
#934

STAMP: Scalable Task- And Model-agnostic Collaborative Perception

Xiangbo Gao, Runsheng Xu, Jiachen Li et al.

ICLR 2025posterarXiv:2501.18616
29
citations
#935

Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms

Yinuo Ren, Haoxuan Chen, Yuchen Zhu et al.

NEURIPS 2025posterarXiv:2502.00234
29
citations
#936

On Large Language Model Continual Unlearning

Chongyang Gao, Lixu Wang, Kaize Ding et al.

ICLR 2025posterarXiv:2407.10223
29
citations
#937

WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments

Jianhao Zheng, Zihan Zhu, Valentin Bieri et al.

CVPR 2025posterarXiv:2504.03886
29
citations
#938

Longhorn: State Space Models are Amortized Online Learners

Bo Liu, Rui Wang, Lemeng Wu et al.

ICLR 2025posterarXiv:2407.14207
29
citations
#939

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Xiaojuan Wang, Boyang Zhou, Brian Curless et al.

ICLR 2025posterarXiv:2408.15239
29
citations
#940

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

Siwei Wen, junyan ye, Peilin Feng et al.

NEURIPS 2025posterarXiv:2503.14905
29
citations
#941

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NEURIPS 2025posterarXiv:2506.01300
29
citations
#942

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Navve Wasserman, Noam Rotstein, Roy Ganz et al.

CVPR 2025posterarXiv:2404.18212
29
citations
#943

VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation

Hanzhi Chen, Boyang Sun, Anran Zhang et al.

CVPR 2025posterarXiv:2503.07135
29
citations
#944

MoDeGPT: Modular Decomposition for Large Language Model Compression

Chi-Heng Lin, Shangqian Gao, James Smith et al.

ICLR 2025posterarXiv:2408.09632
29
citations
#945

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling

Jixun Yao, Hexin Liu, CHEN CHEN et al.

ICLR 2025posterarXiv:2502.02942
29
citations
#946

InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences

Chenyang Zhu, Kai Li, Yue Ma et al.

ICLR 2025posterarXiv:2412.01197
29
citations
#947

Image Watermarks are Removable using Controllable Regeneration from Clean Noise

Yepeng Liu, Yiren Song, Hai Ci et al.

ICLR 2025posterarXiv:2410.05470
29
citations
#948

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Zhihe Yang, Xufang Luo, Dongqi Han et al.

CVPR 2025posterarXiv:2501.09695
29
citations
#949

Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection

Wei Luo, Yunkang Cao, Haiming Yao et al.

CVPR 2025posterarXiv:2503.02424
29
citations
#950

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Sheng Zhou, Junbin Xiao, Qingyun Li et al.

CVPR 2025posterarXiv:2502.07411
29
citations
#951

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Weifeng Lin, Xinyu Wei, Ruichuan An et al.

NEURIPS 2025posterarXiv:2506.05302
29
citations
#952

FlatQuant: Flatness Matters for LLM Quantization

Yuxuan Sun, Ruikang Liu, Haoli Bai et al.

ICML 2025posterarXiv:2410.09426
29
citations
#953

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou et al.

ICLR 2025posterarXiv:2410.08182
29
citations
#954

Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting

Siru Zhong, Weilin Ruan, Ming Jin et al.

ICML 2025oralarXiv:2502.04395
29
citations
#955

LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior

Hanyu Wang, Saksham Suri, Yixuan Ren et al.

ICLR 2025posterarXiv:2410.21264
29
citations
#956

The dark side of the forces: assessing non-conservative force models for atomistic machine learning

Filippo Bigi, Marcel Langer, Michele Ceriotti

ICML 2025oralarXiv:2412.11569
29
citations
#957

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

Vishwesh Nath, Wenqi Li, Dong Yang et al.

CVPR 2025highlightarXiv:2411.12915
29
citations
#958

Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering

Sheng Liu, Haotian Ye, James Y Zou

ICLR 2025poster
29
citations
#959

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NEURIPS 2025posterarXiv:2501.04686
29
citations
#960

Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View

Xuan Liu, Jie ZHANG, HaoYang Shang et al.

ICLR 2025posterarXiv:2405.14744
29
citations
#961

Machine Unlearning Fails to Remove Data Poisoning Attacks

Martin Pawelczyk, Jimmy Di, Yiwei Lu et al.

ICLR 2025posterarXiv:2406.17216
29
citations
#962

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Hyesu Lim, Jinho Choi, Jaegul Choo et al.

ICLR 2025posterarXiv:2412.05276
29
citations
#963

Can Knowledge Editing Really Correct Hallucinations?

Baixiang Huang, Canyu Chen, Xiongxiao Xu et al.

ICLR 2025posterarXiv:2410.16251
29
citations
#964

Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems

Weibo Gao, Qi Liu, Linan Yue et al.

AAAI 2025paperarXiv:2501.10332
29
citations
#965

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Liming Jiang, Qing Yan, Yumin Jia et al.

ICCV 2025highlightarXiv:2503.16418
29
citations
#966

Competing Large Language Models in Multi-Agent Gaming Environments

Jen-Tse Huang, Eric John Li, Man Ho LAM et al.

ICLR 2025poster
28
citations
#967

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.

ICML 2025posterarXiv:2502.12892
28
citations
#968

From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models

Etowah Adams, Liam Bai, Minji Lee et al.

ICML 2025spotlight
28
citations
#969

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification

Yunzhen Feng, Elvis Dohmatob, Pu Yang et al.

ICLR 2025posterarXiv:2406.07515
28
citations
#970

Denoising Autoregressive Transformers for Scalable Text-to-Image Generation

Jiatao Gu, Yuyang Wang, Yizhe Zhang et al.

ICLR 2025posterarXiv:2410.08159
28
citations
#971

Distilling Multi-modal Large Language Models for Autonomous Driving

Deepti Hegde, Rajeev Yasarla, Hong Cai et al.

CVPR 2025posterarXiv:2501.09757
28
citations
#972

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu et al.

CVPR 2025posterarXiv:2503.21483
28
citations
#973

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Junyan Ye, Baichuan Zhou, Zilong Huang et al.

ICLR 2025posterarXiv:2410.09732
28
citations
#974

Long-Context State-Space Video World Models

Ryan Po, Yotam Nitzan, Richard Zhang et al.

ICCV 2025posterarXiv:2505.20171
28
citations
#975

Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport

Zhenyi Zhang, Tiejun Li, Peijie Zhou

ICLR 2025posterarXiv:2410.00844
28
citations
#976

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Zongyi Li, Shujie HU, Shujie LIU et al.

ICLR 2025oralarXiv:2410.20502
28
citations
#977

DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion

Jinyuan Liu, Bowei Zhang, Qingyun Mei et al.

CVPR 2025posterarXiv:2503.17673
28
citations
#978

Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

Haochen Wang, Yucheng Zhao, Tiancai Wang et al.

ICCV 2025posterarXiv:2504.01901
28
citations
#979

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

Haomiao Xiong, Zongxin Yang, Jiazuo Yu et al.

ICLR 2025posterarXiv:2501.13468
28
citations
#980

Exploring Enhanced Contextual Information for Video-Level Object Tracking

Ben Kang, Xin Chen, Simiao Lai et al.

AAAI 2025paperarXiv:2412.11023
28
citations
#981

CViT: Continuous Vision Transformer for Operator Learning

Sifan Wang, Jacob Seidman, Shyam Sankaran et al.

ICLR 2025oralarXiv:2405.13998
28
citations
#982

Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series

Yuxiao Hu, Qian Li, Dongxiao Zhang et al.

ICLR 2025posterarXiv:2501.03747
28
citations
#983

Simple ReFlow: Improved Techniques for Fast Flow Models

Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.

ICLR 2025posterarXiv:2410.07815
28
citations
#984

Graphic Design with Large Multimodal Model

Yutao Cheng, Zhao Zhang, Maoke Yang et al.

AAAI 2025paperarXiv:2404.14368
28
citations
#985

Diffusion-based Neural Network Weights Generation

Bedionita Soro, Bruno Andreis, Hayeon Lee et al.

ICLR 2025posterarXiv:2402.18153
28
citations
#986

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.

ICCV 2025posterarXiv:2501.04931
28
citations
#987

Can Large Language Models Understand Symbolic Graphics Programs?

Zeju Qiu, Weiyang Liu, Haiwen Feng et al.

ICLR 2025posterarXiv:2408.08313
28
citations
#988

Overtrained Language Models Are Harder to Fine-Tune

Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen et al.

ICML 2025posterarXiv:2503.19206
28
citations
#989

RUN: Reversible Unfolding Network for Concealed Object Segmentation

Chunming He, Rihan Zhang, Fengyang Xiao et al.

ICML 2025posterarXiv:2501.18783
28
citations
#990

GOFA: A Generative One-For-All Model for Joint Graph Language Modeling

Lecheng Kong, Jiarui Feng, Hao Liu et al.

ICLR 2025posterarXiv:2407.09709
28
citations
#991

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Zemin Huang, Zhiyang Chen, Zijun Wang et al.

NEURIPS 2025posterarXiv:2505.10446
28
citations
#992

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhiyuan Zhou, Andy Peng, Qiyang Li et al.

ICLR 2025posterarXiv:2412.07762
28
citations
#993

Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions of Visual-Audio Content

Sheng Wu, Dongxiao He, Xiaobao Wang et al.

AAAI 2025paperarXiv:2412.10460
28
citations
#994

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Zehan Wang, Ziang Zhang, Tianyu Pang et al.

ICML 2025posterarXiv:2412.18605
28
citations
#995

Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning

Hanxun Yu, Wentong Li, Song Wang et al.

CVPR 2025highlightarXiv:2503.00513
28
citations
#996

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Zhongwei Ren, Yunchao Wei, Xun Guo et al.

CVPR 2025posterarXiv:2501.09781
28
citations
#997

DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding

Geng Li, Jinglin Xu, Yunzhen Zhao et al.

CVPR 2025highlightarXiv:2504.14920
28
citations
#998

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.

CVPR 2025highlightarXiv:2502.20653
28
citations
#999

ASGO: Adaptive Structured Gradient Optimization

Kang An, Yuxing Liu, Rui Pan et al.

NEURIPS 2025posterarXiv:2503.20762
28
citations
#1000

When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

Xiaomin Li, Zhou Yu, Zhiwei Zhang et al.

NEURIPS 2025spotlightarXiv:2505.11423
28
citations