Most Cited 2025 "broker modality" Papers

21,856 papers found • Page 4 of 110

#601

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities

Zhaofeng Wu, Xinyan Yu, Dani Yogatama et al.

ICLR 2025 • poster • arXiv:2411.04986
39 citations
#602

Diffusion Feedback Helps CLIP See Better

Wenxuan Wang, Quan Sun, Fan Zhang et al.

ICLR 2025 • poster • arXiv:2407.20171
39 citations
#603

RATT: A Thought Structure for Coherent and Correct LLM Reasoning

Jinghan Zhang, Xiting Wang, Weijieying Ren et al.

AAAI 2025 • paper • arXiv:2406.02746
39 citations
#604

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh

AAAI 2025 • paper • arXiv:2412.14995
39 citations
#605

An Architecture Search Framework for Inference-Time Techniques

Jon Saad-Falcon, Adrian Lafuente, Shlok Natarajan et al.

ICML 2025 • poster • arXiv:2409.15254
39 citations
#606

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.

CVPR 2025 • poster • arXiv:2501.09466
39 citations
#607

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Kexun Zhang, Weiran Yao, Zuxin Liu et al.

ICLR 2025 • poster • arXiv:2408.07060
39 citations
#608

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.

ICLR 2025 • poster • arXiv:2410.18252
39 citations
#609

EG4D: Explicit Generation of 4D Object without Score Distillation

Qi Sun, Zhiyang Guo, Ziyu Wan et al.

ICLR 2025 • oral • arXiv:2405.18132
39 citations
#610

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025 • poster • arXiv:2410.06916
39 citations
#611

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

Jiajun Deng, Tianyu He, Li Jiang et al.

CVPR 2025 • poster • arXiv:2501.01163
39 citations
#612

Scaling Language-Free Visual Representation Learning

David Fan, Shengbang Tong, Jiachen Zhu et al.

ICCV 2025 • highlight • arXiv:2504.01017
39 citations
#613

5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

Dongshuo Yin, Leiyi Hu, Bin Li et al.

CVPR 2025 • poster • arXiv:2408.08345
38 citations
#614

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

Lianghui Zhu, Zilong Huang, Bencheng Liao et al.

CVPR 2025 • poster • arXiv:2405.18428
38 citations
#615

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Yufan He, Pengfei Guo, Yucheng Tang et al.

CVPR 2025 • poster • arXiv:2406.05285
38 citations
#616

EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis

Alexander Mai, Peter Hedman, George Kopanas et al.

ICCV 2025 • poster • arXiv:2410.01804
38 citations
#617

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

Yiming Wang, Pei Zhang, Siyuan Huang et al.

NEURIPS 2025 • spotlight • arXiv:2503.01422
38 citations
#618

Combining Induction and Transduction for Abstract Reasoning

Wen-Ding Li, Keya Hu, Carter Larsen et al.

ICLR 2025 • poster • arXiv:2411.02272
38 citations
#619

DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

Chenguo Lin, Panwang Pan, Bangbang Yang et al.

ICLR 2025 • poster • arXiv:2501.16764
38 citations
#620

Agents' Room: Narrative Generation through Multi-step Collaboration

Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki et al.

ICLR 2025 • poster • arXiv:2410.02603
38 citations
#621

Strong Model Collapse

Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian et al.

ICLR 2025 • poster • arXiv:2410.04840
38 citations
#622

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

Xiantao Hu, Ying Tai, Xu Zhao et al.

AAAI 2025 • paper • arXiv:2412.15691
38 citations
#623

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Bojia Zi, Shihao Zhao, Xianbiao Qi et al.

AAAI 2025 • paper • arXiv:2403.12035
38 citations
#624

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Clément Chadebec, Onur Tasar, Eyal Benaroche et al.

AAAI 2025 • paper • arXiv:2406.02347
38 citations
#625

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Zehuan Huang, Yuanchen Guo, Xingqiao An et al.

CVPR 2025 • poster • arXiv:2412.03558
38 citations
#626

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Wenxuan Huang, Zijie Zhai, Yunhang Shen et al.

ICLR 2025 • poster • arXiv:2412.00876
38 citations
#627

Uni-Sign: Toward Unified Sign Language Understanding at Scale

Zecheng Li, Wengang Zhou, Weichao Zhao et al.

ICLR 2025 • poster • arXiv:2501.15187
38 citations
#628

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

Koichi Namekata, Sherwin Bahmani, Ziyi Wu et al.

ICLR 2025 • poster • arXiv:2411.04989
38 citations
#629

Scaling RL to Long Videos

Yukang Chen, Wei Huang, Baifeng Shi et al.

NEURIPS 2025 • poster • arXiv:2507.07966
38 citations
#630

Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving

Xinji Mai, Haotian Xu, Xing W et al.

NEURIPS 2025 • poster
38 citations
#631

Watermark Anything With Localized Messages

Tom Sander, Pierre Fernandez, Alain Oliviero Durmus et al.

ICLR 2025 • poster • arXiv:2411.07231
38 citations
#632

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting

Yu Liu, Baoxiong Jia, Ruijie Lu et al.

ICLR 2025 • poster • arXiv:2502.19459
38 citations
#633

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.

CVPR 2025 • poster • arXiv:2502.04144
38 citations
#634

Video-Guided Foley Sound Generation with Multimodal Controls

Ziyang Chen, Prem Seetharaman, Bryan Russell et al.

CVPR 2025 • poster • arXiv:2411.17698
38 citations
#635

PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang et al.

ICCV 2025 • poster • arXiv:2503.17973
38 citations
#636

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

Li Hu, Wang Yuan, Zhen Shen et al.

ICCV 2025 • poster • arXiv:2502.06145
38 citations
#637

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Junbo Niu, Yifei Li, Ziyang Miao et al.

CVPR 2025 • poster • arXiv:2501.05510
37 citations
#638

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

Linwei Dong, Qingnan Fan, Yihong Guo et al.

CVPR 2025 • poster • arXiv:2411.18263
37 citations
#639

SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

Georg Hess, Carl Lindström, Maryam Fatemi et al.

CVPR 2025 • poster • arXiv:2411.16816
37 citations
#640

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Wanshui Gan, Fang Liu, Hongbin Xu et al.

ICCV 2025 • poster • arXiv:2408.11447
37 citations
#641

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

Zhen Zhang, Xuehai He, Weixiang Yan et al.

NEURIPS 2025 • poster • arXiv:2505.15778
37 citations
#642

Real-Time Execution of Action Chunking Flow Policies

Kevin Black, Manuel Galliker, Sergey Levine

NEURIPS 2025 • oral • arXiv:2506.07339
37 citations
#643

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Shanchuan Lin, Ceyuan Yang, Hao He et al.

NEURIPS 2025 • poster • arXiv:2506.09350
37 citations
#644

PaPaGei: Open Foundation Models for Optical Physiological Signals

Arvind Pillai, Dimitris Spathis, Fahim Kawsar et al.

ICLR 2025 • poster • arXiv:2410.20542
37 citations
#645

No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models

Seyedmorteza Sadat, Manuel Kansy, Otmar Hilliges et al.

ICLR 2025 • poster • arXiv:2407.02687
37 citations
#646

On Scaling Up 3D Gaussian Splatting Training

Hexu Zhao, Haoyang Weng, Daohan Lu et al.

ICLR 2025 • poster • arXiv:2406.18533
37 citations
#647

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Zhi Gao, Bofei Zhang, Pengxiang Li et al.

ICLR 2025 • poster • arXiv:2412.15606
37 citations
#648

SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models

Daniel Levy, Siba Smarak Panigrahi, Sékou-Oumar Kaba et al.

ICLR 2025 • poster • arXiv:2502.03638
37 citations
#649

Synthetic continued pretraining

Zitong Yang, Neil Band, Shuangping Li et al.

ICLR 2025 • poster • arXiv:2409.07431
37 citations
#650

SUTrack: Towards Simple and Unified Single Object Tracking

Xin Chen, Ben Kang, Wanting Geng et al.

AAAI 2025 • paper • arXiv:2412.19138
37 citations
#651

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Han Shu, Wenshuo Li, Yehui Tang et al.

AAAI 2025 • paper • arXiv:2312.13789
37 citations
#652

MoH: Multi-Head Attention as Mixture-of-Head Attention

Peng Jin, Bo Zhu, Li Yuan et al.

ICML 2025 • poster • arXiv:2410.11842
37 citations
#653

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

Zhaorun Chen, Mintong Kang, Bo Li

ICML 2025 • poster • arXiv:2503.22738
37 citations
#654

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

Qizhe Zhang, Aosong Cheng, Ming Lu et al.

ICCV 2025 • poster • arXiv:2412.01818
37 citations
#655

Variational Best-of-N Alignment

Afra Amini, Tim Vieira, Elliott Ash et al.

ICLR 2025 • poster • arXiv:2407.06057
37 citations
#656

Large Language Models Assume People are More Rational than We Really are

Ryan Liu, Jiayi Geng, Joshua Peterson et al.

ICLR 2025 • poster • arXiv:2406.17055
37 citations
#657

How Does Critical Batch Size Scale in Pre-training?

Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.

ICLR 2025 • poster • arXiv:2410.21676
37 citations
#658

VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation

Wei Zhao, Pengxiang Ding, Zhang Min et al.

ICLR 2025 • poster • arXiv:2502.13508
37 citations
#659

Training-Free Activation Sparsity in Large Language Models

James Liu, Pragaash Ponnusamy, Tianle Cai et al.

ICLR 2025 • poster • arXiv:2408.14690
37 citations
#660

Sparse Autoencoders Do Not Find Canonical Units of Analysis

Patrick Leask, Bart Bussmann, Michael Pearce et al.

ICLR 2025 • poster • arXiv:2502.04878
37 citations
#661

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Junsong Chen, Shuchen Xue, Yuyang Zhao et al.

ICCV 2025 • highlight • arXiv:2503.09641
37 citations
#662

FastVLM: Efficient Vision Encoding for Vision Language Models

Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li et al.

CVPR 2025 • poster • arXiv:2412.13303
36 citations
#663

Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye, Zihan Wang, Haosen Sun et al.

CVPR 2025 • poster • arXiv:2504.02259
36 citations
#664

Human-Object Interaction from Human-Level Instructions

Zhen Wu, Jiaman Li, Pei Xu et al.

ICCV 2025 • poster • arXiv:2406.17840
36 citations
#665

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Yun Li, Yiming Zhang, Tao Lin et al.

ICCV 2025 • poster • arXiv:2503.23765
36 citations
#666

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Qianhui Wu, Kanzhi Cheng, Rui Yang et al.

NEURIPS 2025 • poster • arXiv:2506.03143
36 citations
#667

Generalizing Verifiable Instruction Following

Valentina Pyatkin, Saumya Malik, Victoria Graf et al.

NEURIPS 2025 • poster • arXiv:2507.02833
36 citations
#668

DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control

Kaifeng Zhao, Gen Li, Siyu Tang

ICLR 2025 • poster • arXiv:2410.05260
36 citations
#669

Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions

Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.

AAAI 2025 • paper • arXiv:2408.08781
36 citations
#670

Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment

Congzhi Zhang, Linhai Zhang, Jialong Wu et al.

AAAI 2025 • paper • arXiv:2403.02738
36 citations
#671

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

Dongya Jia, Zhuo Chen, Jiawei Chen et al.

ICML 2025 • poster • arXiv:2502.03930
36 citations
#672

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

Yiyu Zhuang, Jiaxi Lv, Hao Wen et al.

CVPR 2025 • poster • arXiv:2412.14963
36 citations
#673

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.

CVPR 2025 • poster • arXiv:2411.15466
36 citations
#674

Vision-Language Models Do Not Understand Negation

Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.

CVPR 2025 • poster • arXiv:2501.09425
36 citations
#675

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

Yong Liu, Zirui Zhu, Chaoyu Gong et al.

NEURIPS 2025 • poster • arXiv:2402.15751
36 citations
#676

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NEURIPS 2025 • poster • arXiv:2503.09501
36 citations
#677

DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal et al.

ICLR 2025 • poster • arXiv:2407.01725
36 citations
#678

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Siyuan Huang, Liliang Chen, Pengfei Zhou et al.

NEURIPS 2025 • poster • arXiv:2501.01895
36 citations
#679

Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations

Yuhao Yang, Zhi Ji, Zhaopeng Li et al.

NEURIPS 2025 • poster • arXiv:2503.02453
36 citations
#680

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

Weikang Meng, Yadan Luo, Xin Li et al.

ICLR 2025 • poster • arXiv:2501.15061
36 citations
#681

Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression

Zichong Meng, Yiming Xie, Xiaogang Peng et al.

CVPR 2025 • poster • arXiv:2411.16575
36 citations
#682

GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng et al.

CVPR 2025 • poster • arXiv:2412.04384
36 citations
#683

Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting

Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.

ICCV 2025 • poster • arXiv:2507.02939
36 citations
#684

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Zhi Hou, Tianyi Zhang, Yuwen Xiong et al.

ICCV 2025 • poster • arXiv:2503.19757
36 citations
#685

TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

Mark Yu, Wenbo Hu, Jinbo Xing et al.

ICCV 2025 • poster • arXiv:2503.05638
35 citations
#686

PAD: Personalized Alignment of LLMs at Decoding-time

Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.

ICLR 2025 • poster • arXiv:2410.04070
35 citations
#687

Sequential Controlled Langevin Diffusions

Junhua Chen, Lorenz Richter, Julius Berner et al.

ICLR 2025 • poster • arXiv:2412.07081
35 citations
#688

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Ziyao Shangguan, Chuhan Li, Yuxuan Ding et al.

ICLR 2025 • oral • arXiv:2410.23266
35 citations
#689

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.

AAAI 2025 • paper • arXiv:2407.19548
35 citations
#690

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Yunzhi Yan, Zhen Xu, Haotong Lin et al.

CVPR 2025 • poster • arXiv:2412.13188
35 citations
#691

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Hao Li, Changyao Tian, Jie Shao et al.

CVPR 2025 • poster • arXiv:2412.09604
35 citations
#692

MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots

Tianchen Deng, Guole Shen, Chen Xun et al.

CVPR 2025 • poster
35 citations
#693

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.

NEURIPS 2025 • poster • arXiv:2506.14965
35 citations
#694

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

Rui Pan, Yinwei Dai, Zhihao Zhang et al.

NEURIPS 2025 • poster • arXiv:2504.07891
35 citations
#695

FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs

Zhiting Fan, Ruizhe Chen, Tianxiang Hu et al.

ICLR 2025 • poster • arXiv:2410.19317
35 citations
#696

ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning

Xiao Yu, Baolin Peng, Vineeth Vajipey et al.

ICLR 2025 • poster • arXiv:2410.02052
35 citations
#697

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.

ICLR 2025 • poster • arXiv:2408.11811
35 citations
#698

Think Only When You Need with Large Hybrid-Reasoning Models

Lingjie Jiang, Xun Wu, Shaohan Huang et al.

NEURIPS 2025 • poster • arXiv:2505.14631
35 citations
#699

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

Yiding Jiang, Allan Zhou, Zhili Feng et al.

ICLR 2025 • poster • arXiv:2410.11820
35 citations
#700

Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Robert Hönig, Javier Rando, Nicholas Carlini et al.

ICLR 2025 • poster • arXiv:2406.12027
35 citations
#701

Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective

Neta Shaul, Itai Gat, Marton Havasi et al.

ICLR 2025 • poster • arXiv:2412.03487
35 citations
#702

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.

ICCV 2025 • poster • arXiv:2503.15265
35 citations
#703

One Diffusion to Generate Them All

Duong H. Le, Tuan Pham, Sangho Lee et al.

CVPR 2025 • poster • arXiv:2411.16318
34 citations
#704

Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives

Alex Hanson, Allen Tu, Geng Lin et al.

CVPR 2025 • poster • arXiv:2412.00578
34 citations
#705

YOLOE: Real-Time Seeing Anything

Ao Wang, Lihao Liu, Hui Chen et al.

ICCV 2025 • poster • arXiv:2503.07465
34 citations
#706

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Daniel Israel, Guy Van den Broeck, Aditya Grover

NEURIPS 2025 • spotlight • arXiv:2506.00413
34 citations
#707

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo et al.

ICLR 2025 • poster • arXiv:2406.03136
34 citations
#708

Efficient Evolutionary Search Over Chemical Space with Large Language Models

Haorui Wang, Marta Skreta, Cher-Tian Ser et al.

ICLR 2025 • poster • arXiv:2406.16976
34 citations
#709

FreeVS: Generative View Synthesis on Free Driving Trajectory

Qitai Wang, Lue Fan, Yuqi Wang et al.

ICLR 2025 • poster • arXiv:2410.18079
34 citations
#710

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

Yaxi Lu, Shenzhi Yang, Cheng Qian et al.

ICLR 2025 • poster • arXiv:2410.12361
34 citations
#711

Think while You Generate: Discrete Diffusion with Planned Denoising

Sulin Liu, Juno Nam, Andrew Campbell et al.

ICLR 2025 • poster • arXiv:2410.06264
34 citations
#712

Persistent Pre-training Poisoning of LLMs

Yiming Zhang, Javier Rando, Ivan Evtimov et al.

ICLR 2025 • poster • arXiv:2410.13722
34 citations
#713

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

Shengji Tang, Weicai Ye, Peng Ye et al.

ICLR 2025 • poster • arXiv:2410.06245
34 citations
#714

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

Qijian Tian, Xin Tan, Yuan Xie et al.

AAAI 2025 • paper • arXiv:2409.12753
34 citations
#715

Multi-Objective Evolution of Heuristic Using Large Language Model

Shunyu Yao, Fei Liu, Xi Lin et al.

AAAI 2025 • paper • arXiv:2409.16867
34 citations
#716

Robust Autonomy Emerges from Self-Play

Marco Cusumano-Towner, David Hafner, Alexander Hertzberg et al.

ICML 2025 • poster • arXiv:2502.03349
34 citations
#717

Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts

Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian et al.

ICML 2025 • spotlight • arXiv:2503.02819
34 citations
#718

Improving the Diffusability of Autoencoders

Ivan Skorokhodov, Sharath Girish, Benran Hu et al.

ICML 2025 • poster • arXiv:2502.14831
34 citations
#719

Which Attention Heads Matter for In-Context Learning?

Kayo Yin, Jacob Steinhardt

ICML 2025 • poster • arXiv:2502.14010
34 citations
#720

ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions

Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.

CVPR 2025 • poster • arXiv:2401.10232
34 citations
#721

AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities

Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.

CVPR 2025 • highlight • arXiv:2412.14123
34 citations
#722

From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu et al.

ICCV 2025 • poster • arXiv:2503.06923
34 citations
#723

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Core Francisco Park, Ekdeep Singh Lubana, Hidenori Tanaka

ICLR 2025 • poster • arXiv:2412.01003
34 citations
#724

Reconstructive Visual Instruction Tuning

Haochen Wang, Anlin Zheng, Yucheng Zhao et al.

ICLR 2025 • poster • arXiv:2410.09575
34 citations
#725

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Yoad Tewel, Rinon Gal, Dvir Samuel et al.

ICLR 2025 • poster • arXiv:2411.07232
34 citations
#726

Dynamic Diffusion Transformer

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

ICLR 2025 • poster • arXiv:2410.03456
34 citations
#727

ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning

Ruchika Chavhan, Da Li, Timothy Hospedales

ICLR 2025 • poster • arXiv:2405.19237
34 citations
#728

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

Zhihao Li, Yufei Wang, Heliang Zheng et al.

NEURIPS 2025 • poster • arXiv:2505.14521
34 citations
#729

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

Shuang Wu, Youtian Lin, Feihu Zhang et al.

NEURIPS 2025 • poster • arXiv:2505.17412
34 citations
#730

WISA: World simulator assistant for physics-aware text-to-video generation

Jing Wang, Ao Ma, Ke Cao et al.

NEURIPS 2025 • spotlight • arXiv:2503.08153
34 citations
#731

$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

Mintong Kang, Bo Li

ICLR 2025 • poster • arXiv:2407.05557
34 citations
#732

Text4Seg: Reimagining Image Segmentation as Text Generation

Mengcheng Lan, Chaofeng Chen, Yue Zhou et al.

ICLR 2025 • poster • arXiv:2410.09855
34 citations
#733

Compositional Entailment Learning for Hyperbolic Vision-Language Models

Avik Pal, Max van Spengler, Guido D'Amely di Melendugno et al.

ICLR 2025 • poster • arXiv:2410.06912
34 citations
#734

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.

ICLR 2025 • poster • arXiv:2409.04431
34 citations
#735

Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions

Michael Zhang, W. Bradley Knox, Eunsol Choi

ICLR 2025 • poster • arXiv:2410.13788
34 citations
#736

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Xuehai He, Weixi Feng, Kaizhi Zheng et al.

ICLR 2025 • poster • arXiv:2406.08407
34 citations
#737

LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs

Yuhao Wu, Ming Shan Hee, Zhiqiang Hu et al.

ICLR 2025 • poster • arXiv:2409.02076
34 citations
#738

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025 • poster • arXiv:2408.15881
34 citations
#739

Towards General Visual-Linguistic Face Forgery Detection

Ke Sun, Shen Chen, Taiping Yao et al.

CVPR 2025 • poster • arXiv:2307.16545
34 citations
#740

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Shenghao Fu, Qize Yang, Qijie Mo et al.

CVPR 2025 • highlight • arXiv:2501.18954
33 citations
#741

AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP

Wenxin Ma, Xu Zhang, Qingsong Yao et al.

CVPR 2025 • poster • arXiv:2503.06661
33 citations
#742

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Wei Li, Bing Hu, Rui Shao et al.

CVPR 2025 • poster • arXiv:2503.03663
33 citations
#743

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Size Wu, Wenwei Zhang, Lumin Xu et al.

ICCV 2025 • poster • arXiv:2503.21979
33 citations
#744

Preference Optimization for Reasoning with Pseudo Feedback

Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.

ICLR 2025 • poster • arXiv:2411.16345
33 citations
#745

Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment

Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola et al.

ICLR 2025 • poster • arXiv:2501.19309
33 citations
#746

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.

ICLR 2025 • poster • arXiv:2404.10775
33 citations
#747

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Yilun Hao, Yang Zhang, Chuchu Fan

ICLR 2025 • poster • arXiv:2410.12112
33 citations
#748

SCALM: Detecting Bad Practices in Smart Contracts Through LLMs

Zongwei Li, Xiaoqi Li, Wenkai Li et al.

AAAI 2025 • paper • arXiv:2502.04347
33 citations
#749

Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models

Lingzhi Wang, Xingshan Zeng, Jinsong Guo et al.

AAAI 2025 • paper • arXiv:2402.05813
33 citations
#750

Stable-Hair: Real-World Hair Transfer via Diffusion Model

Yuxuan Zhang, Qing Zhang, Yiren Song et al.

AAAI 2025 • paper • arXiv:2407.14078
33 citations
#751

MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL

Arian Askari, Christian Poelitz, Xinye Tang

AAAI 2025 • paper • arXiv:2406.12692
33 citations
#752

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda et al.

ICML 2025 • poster • arXiv:2406.04391
33 citations
#753

The Diffusion Duality

Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan et al.

ICML 2025 • poster • arXiv:2506.10892
33 citations
#754

On the Emergence of Position Bias in Transformers

Xinyi Wu, Yifei Wang, Stefanie Jegelka et al.

ICML 2025 • poster • arXiv:2502.01951
33 citations
#755

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025 • poster • arXiv:2503.02199
33 citations
#756

Tensor Product Attention Is All You Need

Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.

NEURIPS 2025 • spotlight • arXiv:2501.06425
33 citations
#757

What to align in multimodal contrastive learning?

Benoit Dufumier, Javiera Castillo Navarro, Devis Tuia et al.

ICLR 2025 • poster • arXiv:2409.07402
33 citations
#758

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Yongxin Guo, Zhenglin Cheng, Xiaoying Tang et al.

ICLR 2025 • poster • arXiv:2405.14297
33 citations
#759

Looped Transformers for Length Generalization

Ying Fan, Yilun Du, Kannan Ramchandran et al.

ICLR 2025 • poster • arXiv:2409.15647
33 citations
#760

LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning

Zhe Li, Weihao Yuan, Yisheng He et al.

ICLR 2025 • poster • arXiv:2410.07093
33 citations
#761

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

Ruichuan An, Sihan Yang, Renrui Zhang et al.

NEURIPS 2025 • poster • arXiv:2505.14671
33 citations
#762

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Zekun Qi, Wenyao Zhang, Yufei Ding et al.

NEURIPS 2025 • spotlight • arXiv:2502.13143
33 citations
#763

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

Ji Qi, Ming Ding, Weihan Wang et al.

ICLR 2025 • poster • arXiv:2402.04236
33 citations
#764

Scaling Wearable Foundation Models

Girish Narayanswamy, Xin Liu, Kumar Ayush et al.

ICLR 2025 • poster • arXiv:2410.13638
33 citations
#765

On the Relation between Trainability and Dequantization of Variational Quantum Learning Models

Elies Gil-Fuster, Casper Gyurik, Adrian Perez-Salinas et al.

ICLR 2025 • poster • arXiv:2406.07072
33 citations
#766

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

Ziyu Zhao, Tao Shen, Didi Zhu et al.

ICLR 2025 • poster • arXiv:2409.16167
33 citations
#767

Preserving Diversity in Supervised Fine-Tuning of Large Language Models

Ziniu Li, Congliang Chen, Tian Xu et al.

ICLR 2025 • poster • arXiv:2408.16673
33 citations
#768

PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models

Minghao Chen, Roman Shapovalov, Iro Laina et al.

CVPR 2025 • highlight • arXiv:2412.18608
33 citations
#769

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025 • poster • arXiv:2503.17316
33 citations
#770

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

Hui Zhang, Dexiang Hong, Yitong Wang et al.

ICCV 2025 • poster • arXiv:2412.03859
33 citations
#771

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

Luca Bartolomei, Fabio Tosi, Matteo Poggi et al.

CVPR 2025 • poster • arXiv:2412.04472
32 citations
#772

Generative Gaussian Splatting for Unbounded 3D City Generation

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.

CVPR 2025 • poster • arXiv:2406.06526
32 citations
#773

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Hao Chen, Ze Wang, Xiang Li et al.

CVPR 2025 • poster • arXiv:2412.10958
32 citations
#774

LEGION: Learning to Ground and Explain for Synthetic Image Detection

Hengrui Kang, Siwei Wen, Zichen Wen et al.

ICCV 2025 • highlight • arXiv:2503.15264
32 citations
#775

Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective

Sifan Wang, Ananyae Bhartari, Bowen Li et al.

NEURIPS 2025 • poster • arXiv:2502.00604
32 citations
#776

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Xiangyu Zhao, Peiyuan Zhang, Kexian Tang et al.

NEURIPS 2025 • oral • arXiv:2504.02826
32 citations
#777

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Yucheng Li, Huiqiang Jiang, Qianhui Wu et al.

ICLR 2025 • poster • arXiv:2412.10319
32 citations
#778

Emergence of a High-Dimensional Abstraction Phase in Language Transformers

Emily Cheng, Diego Doimo, Corentin Kervadec et al.

ICLR 2025 • poster • arXiv:2405.15471
32 citations
#779

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Yiming Wang, Pei Zhang, Baosong Yang et al.

ICLR 2025 • poster • arXiv:2410.13640
32 citations
#780

Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance

Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.

AAAI 2025 • paper • arXiv:2412.12974
32 citations
#781

PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion

Sophia Tang, Yinuo Zhang, Pranam Chatterjee, PhD

ICML 2025 • poster • arXiv:2412.17780
32 citations
#782

Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries

Huakun Luo, Haixu Wu, Hang Zhou et al.

ICML 2025 • poster • arXiv:2502.02414
32 citations
#783

Reasoning Models Better Express Their Confidence

Dongkeun Yoon, Seungone Kim, Sohee Yang et al.

NEURIPS 2025 • poster • arXiv:2505.14489
32 citations
#784

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Ximing Lu, Melanie Sclar, Skyler Hallinan et al.

ICLR 2025 • poster • arXiv:2410.04265
32 citations
#785

Align Your Flow: Scaling Continuous-Time Flow Map Distillation

Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

NEURIPS 2025 • poster • arXiv:2506.14603
32 citations
#786

Can LLMs Understand Time Series Anomalies?

Zihao Zhou, Rose Yu

ICLR 2025 • poster • arXiv:2410.05440
32 citations
#787

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Tiantian Geng, Jinrui Zhang, Qingni Wang et al.

CVPR 2025 • poster • arXiv:2411.19772
32 citations
#788

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Lital Binyamin, Yoad Tewel, Hilit Segev et al.

CVPR 2025 • poster • arXiv:2406.10210
32 citations
#789

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Zixuan Huang, Mark Boss, Aaryaman Vasishta et al.

CVPR 2025 • poster • arXiv:2501.04689
32 citations
#790

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025 • poster • arXiv:2406.04321
31 citations
#791

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Zhilin Wang, Jiaqi Zeng, Olivier Delalleau et al.

NEURIPS 2025 • poster • arXiv:2505.11475
31 citations
#792

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.

NEURIPS 2025 • spotlight • arXiv:2502.08640
31 citations
#793

WorldModelBench: Judging Video Generation Models As World Models

Dacheng Li, Yunhao Fang, Yukang Chen et al.

NEURIPS 2025 • poster • arXiv:2502.20694
31 citations
#794

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

Xinze Li, Sen Mei, Zhenghao Liu et al.

ICLR 2025 • poster • arXiv:2410.13509
31 citations
#795

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Peng Jin, Bo Zhu, Yuan Li et al.

ICLR 2025 • poster • arXiv:2410.07348
31 citations
#796

System 1.x: Learning to Balance Fast and Slow Planning with Language Models

Swarnadeep Saha, Archiki Prasad, Justin Chen et al.

ICLR 2025 • poster • arXiv:2407.14414
31 citations
#797

REEF: Representation Encoding Fingerprints for Large Language Models

Jie Zhang, Dongrui Liu, Chen Qian et al.

ICLR 2025 • poster • arXiv:2410.14273
31 citations
#798

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step

Mingyuan Zhou, Huangjie Zheng, Yi Gu et al.

ICLR 2025 • poster • arXiv:2410.14919
31 citations
#799

Guided Real Image Dehazing Using YCbCr Color Space

Wenxuan Fang, Junkai Fan, Yu Zheng et al.

AAAI 2025 • paper • arXiv:2412.17496
31 citations
#800

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Mushui Liu, Yuhang Ma, Zhen Yang et al.

AAAI 2025 • paper • arXiv:2407.00737
31 citations