Most Cited 2024 "orlicz space" Papers

12,324 papers found • Page 2 of 62

#201

Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Bingxin Ke, Anton Obukhov, Shengyu Huang et al.

CVPR 2024arXiv:2312.02145
332
citations
#202

Human Motion Diffusion as a Generative Prior

Yonatan Shafir, Guy Tevet, Roy Kapon et al.

ICLR 2024arXiv:2303.01418
331
citations
#203

Directly Fine-Tuning Diffusion Models on Differentiable Rewards

Kevin Clark, Paul Vicol, Kevin Swersky et al.

ICLR 2024arXiv:2309.17400
330
citations
#204

Gated Linear Attention Transformers with Hardware-Efficient Training

Songlin Yang, Bailin Wang, Yikang Shen et al.

ICML 2024arXiv:2312.06635
329
citations
#205

Statistical Rejection Sampling Improves Preference Optimization

Tianqi Liu, Yao Zhao, Rishabh Joshi et al.

ICLR 2024arXiv:2309.06657
329
citations
#206

Splatter Image: Ultra-Fast Single-View 3D Reconstruction

Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi

CVPR 2024arXiv:2312.13150
328
citations
#207

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Tianyi Xie, Zeshun Zong, Yuxing Qiu et al.

CVPR 2024highlightarXiv:2311.12198
328
citations
#208

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew et al.

CVPR 2024arXiv:2311.16498
327
citations
#209

Detecting Pretraining Data from Large Language Models

Weijia Shi, Anirudh Ajith, Mengzhou Xia et al.

ICLR 2024arXiv:2310.16789
327
citations
#210

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Zhen Li, Mingdeng Cao, Xintao Wang et al.

CVPR 2024arXiv:2312.04461
327
citations
#211

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

Izzeddin Gur, Hiroki Furuta, Austin Huang et al.

ICLR 2024arXiv:2307.12856
325
citations
#212

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Vikram Voleti, Chun-Han Yao, Mark Boss et al.

ECCV 2024arXiv:2403.12008
323
citations
#213

InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

Xingchao Liu, Xiwen Zhang, Jianzhu Ma et al.

ICLR 2024arXiv:2309.06380
323
citations
#214

Vision-Language Foundation Models as Effective Robot Imitators

Xinghang Li, Minghuan Liu, Hanbo Zhang et al.

ICLR 2024spotlightarXiv:2311.01378
320
citations
#215

TravelPlanner: A Benchmark for Real-World Planning with Language Agents

Jian Xie, Kai Zhang, Jiangjie Chen et al.

ICML 2024spotlightarXiv:2402.01622
319
citations
#216

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer et al.

CVPR 2024arXiv:2311.15826
319
citations
#217

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

Yujun Shi, Chuhui Xue, Jun Hao Liew et al.

CVPR 2024highlightarXiv:2306.14435
314
citations
#218

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

Ori Yoran, Tomer Wolfson, Ori Ram et al.

ICLR 2024arXiv:2310.01558
314
citations
#219

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

Guan Wang, Sijie Cheng, Xianyuan Zhan et al.

ICLR 2024arXiv:2309.11235
313
citations
#220

UniDepth: Universal Monocular Metric Depth Estimation

Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis et al.

CVPR 2024highlightarXiv:2403.18913
312
citations
#221

Video-P2P: Video Editing with Cross-attention Control

Shaoteng Liu, Yuechen Zhang, Wenbo Li et al.

CVPR 2024arXiv:2303.04761
312
citations
#222

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint

Wei Xiong, Hanze Dong, Chenlu Ye et al.

ICML 2024arXiv:2312.11456
312
citations
#223

SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

Yihua Huang, Yangtian Sun, Ziyi Yang et al.

CVPR 2024arXiv:2312.14937
311
citations
#224

Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis

Zhan Li, Zhang Chen, Zhong Li et al.

CVPR 2024arXiv:2312.16812
309
citations
#225

TD-MPC2: Scalable, Robust World Models for Continuous Control

Nicklas Hansen, Hao Su, Xiaolong Wang

ICLR 2024spotlightarXiv:2310.16828
308
citations
#226

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Saleh Ashkboos, Maximilian Croci, Marcelo Gennari do Nascimento et al.

ICLR 2024arXiv:2401.15024
307
citations
#227

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection

Qihang Zhou, Guansong Pang, Yu Tian et al.

ICLR 2024arXiv:2310.18961
306
citations
#228

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Zeqian Ju, Yuancheng Wang, Kai Shen et al.

ICML 2024arXiv:2403.03100
306
citations
#229

An Embodied Generalist Agent in 3D World

Jiangyong Huang, Silong Yong, Xiaojian Ma et al.

ICML 2024arXiv:2311.12871
305
citations
#230

AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training

Ziyu Wan, Xidong Feng, Muning Wen et al.

ICML 2024arXiv:2309.17179
304
citations
#231

Personalize Segment Anything Model with One Shot

Renrui Zhang, Zhengkai Jiang, Ziyu Guo et al.

ICLR 2024arXiv:2305.03048
301
citations
#232

MoMask: Generative Masked Modeling of 3D Human Motions

chuan guo, Yuxuan Mu, Muhammad Gohar Javed et al.

CVPR 2024arXiv:2312.00063
300
citations
#233

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Zehao Zhu, Zhiwen Fan, Yifan Jiang et al.

ECCV 2024arXiv:2312.00451
297
citations
#234

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Yung-Sung Chuang, Yujia Xie, Hongyin Luo et al.

ICLR 2024arXiv:2309.03883
296
citations
#235

PointLLM: Empowering Large Language Models to Understand Point Clouds

Runsen Xu, Xiaolong Wang, Tai Wang et al.

ECCV 2024arXiv:2308.16911
295
citations
#236

Rethinking FID: Towards a Better Evaluation Metric for Image Generation

Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit et al.

CVPR 2024highlightarXiv:2401.09603
294
citations
#237

ReconFusion: 3D Reconstruction with Diffusion Priors

Rundi Wu, Ben Mildenhall, Philipp Henzler et al.

CVPR 2024arXiv:2312.02981
293
citations
#238

MemoryBank: Enhancing Large Language Models with Long-Term Memory

Wanjun Zhong, Lianghong Guo, Qiqi Gao et al.

AAAI 2024paperarXiv:2305.10250
290
citations
#239

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Minghua Liu, Ruoxi Shi, Linghao Chen et al.

CVPR 2024arXiv:2311.07885
288
citations
#240

Long-CLIP: Unlocking the Long-Text Capability of CLIP

Beichen Zhang, Pan Zhang, Xiaoyi Dong et al.

ECCV 2024arXiv:2403.15378
287
citations
#241

DreamLLM: Synergistic Multimodal Comprehension and Creation

Runpei Dong, chunrui han, Yuang Peng et al.

ICLR 2024spotlightarXiv:2309.11499
287
citations
#242

Understanding the Effects of RLHF on LLM Generalisation and Diversity

Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis et al.

ICLR 2024arXiv:2310.06452
287
citations
#243

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos

Yue Ma, Yingqing HE, Xiaodong Cun et al.

AAAI 2024paperarXiv:2304.01186
284
citations
#244

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

Chongyu Fan, Jiancheng Liu, Yihua Zhang et al.

ICLR 2024spotlightarXiv:2310.12508
284
citations
#245

DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior

Xinqi Lin, Jingwen He, Ziyan Chen et al.

ECCV 2024arXiv:2308.15070
283
citations
#246

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

Gengze Zhou, Yicong Hong, Qi Wu

AAAI 2024paperarXiv:2305.16986
283
citations
#247

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

Yixun Liang, Xin Yang, Jiantao Lin et al.

CVPR 2024highlightarXiv:2311.11284
282
citations
#248

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

Dongping Chen, Ruoxi Chen, Shilin Zhang et al.

ICML 2024arXiv:2402.04788
281
citations
#249

InceptionNeXt: When Inception Meets ConvNeXt

Weihao Yu, Pan Zhou, Shuicheng Yan et al.

CVPR 2024arXiv:2303.16900
280
citations
#250

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

Jiasen Lu, Christopher Clark, Sangho Lee et al.

CVPR 2024highlightarXiv:2312.17172
280
citations
#251

DeepCache: Accelerating Diffusion Models for Free

Xinyin Ma, Gongfan Fang, Xinchao Wang

CVPR 2024arXiv:2312.00858
279
citations
#252

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Dai Shi

CVPR 2024arXiv:2311.17132
279
citations
#253

Provable Robust Watermarking for AI-Generated Text

Xuandong Zhao, Prabhanjan Ananth, Lei Li et al.

ICLR 2024arXiv:2306.17439
279
citations
#254

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Yiran Ding, Li Lyna Zhang, Chengruidong Zhang et al.

ICML 2024arXiv:2402.13753
278
citations
#255

RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems

Tianyang Liu, Canwen Xu, Julian McAuley

ICLR 2024arXiv:2306.03091
278
citations
#256

Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo et al.

CVPR 2024arXiv:2312.09147
278
citations
#257

Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn et al.

ECCV 2024arXiv:2312.06662
278
citations
#258

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Lu Ling, Yichen Sheng, Zhi Tu et al.

CVPR 2024arXiv:2312.16256
277
citations
#259

VeRA: Vector-based Random Matrix Adaptation

Dawid Kopiczko, Tijmen Blankevoort, Yuki Asano

ICLR 2024arXiv:2310.11454
276
citations
#260

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Yiyang Zhou, Chenhang Cui, Jaehong Yoon et al.

ICLR 2024arXiv:2310.00754
275
citations
#261

The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu et al.

ICLR 2024arXiv:2312.01552
274
citations
#262

MedSegDiff-V2: Diffusion-based Medical Image Segmentation with Transformer

Junde Wu, Wei Ji, Huazhu Fu et al.

AAAI 2024paperarXiv:2301.11798
274
citations
#263

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

Rongyuan Wu, Tao Yang, Lingchen Sun et al.

CVPR 2024arXiv:2311.16518
274
citations
#264

Building Cooperative Embodied Agents Modularly with Large Language Models

Hongxin Zhang, Weihua Du, Jiaming Shan et al.

ICLR 2024arXiv:2307.02485
273
citations
#265

Evaluating Large Language Models at Evaluating Instruction Following

Zhiyuan Zeng, Jiatong Yu, Tianyu Gao et al.

ICLR 2024arXiv:2310.07641
273
citations
#266

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

Zhibin Gou, Zhihong Shao, Yeyun Gong et al.

ICLR 2024arXiv:2309.17452
272
citations
#267

SqueezeLLM: Dense-and-Sparse Quantization

Sehoon Kim, Coleman Hooper, Amir Gholaminejad et al.

ICML 2024arXiv:2306.07629
272
citations
#268

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving

Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.

AAAI 2024paperarXiv:2305.14836
271
citations
#269

Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature

Guangsheng Bao, Yanbin Zhao, Zhiyang Teng et al.

ICLR 2024arXiv:2310.05130
269
citations
#270

Factorizing Text-to-Video Generation by Explicit Image Conditioning

Rohit Girdhar, Mannat Singh, Andrew Brown et al.

ECCV 2024arXiv:2311.10709
266
citations
#271

Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal, Jihan Yin, Erhan Bas

AAAI 2024paperarXiv:2308.06394
264
citations
#272

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

Yinghao Xu, Zifan Shi, Wang Yifan et al.

ECCV 2024arXiv:2403.14621
264
citations
#273

EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations

Yi-Lun Liao, Brandon Wood, Abhishek Das et al.

ICLR 2024arXiv:2306.12059
263
citations
#274

Large Language Models as Tool Makers

Tianle Cai, Xuezhi Wang, Tengyu Ma et al.

ICLR 2024arXiv:2305.17126
263
citations
#275

Stochastic Controlled Averaging for Federated Learning with Communication Compression

Xinmeng Huang, Ping Li, Xiaoyun Li

ICLR 2024spotlightarXiv:2308.08165
262
citations
#276

Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts

Jian Xie, Kai Zhang, Jiangjie Chen et al.

ICLR 2024spotlightarXiv:2305.13300
261
citations
#277

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback

Xingyao Wang, Zihan Wang, Jiateng Liu et al.

ICLR 2024arXiv:2309.10691
260
citations
#278

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna et al.

ICML 2024arXiv:2402.07865
258
citations
#279

The Alignment Problem from a Deep Learning Perspective

Richard Ngo, Lawrence Chan, Sören Mindermann

ICLR 2024arXiv:2209.00626
258
citations
#280

Reconstructing Hands in 3D with Transformers

Georgios Pavlakos, Dandan Shan, Ilija Radosavovic et al.

CVPR 2024arXiv:2312.05251
258
citations
#281

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Yichao Fu, Peter Bailis, Ion Stoica et al.

ICML 2024arXiv:2402.02057
257
citations
#282

VTimeLLM: Empower LLM to Grasp Video Moments

Bin Huang, Xin Wang, Hong Chen et al.

CVPR 2024highlightarXiv:2311.18445
257
citations
#283

On Scaling Up a Multilingual Vision and Language Model

Xi Chen, Josip Djolonga, Piotr Padlewski et al.

CVPR 2024arXiv:2305.18565
256
citations
#284

Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving

Yuqi Wang, Jiawei He, Lue Fan et al.

CVPR 2024arXiv:2311.17918
255
citations
#285

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Shusheng Xu, Wei Fu, Jiaxuan Gao et al.

ICML 2024arXiv:2404.10719
253
citations
#286

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

AAAI 2024paperarXiv:2308.15366
252
citations
#287

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Hao Shao, Yuxuan Hu, Letian Wang et al.

CVPR 2024arXiv:2312.07488
251
citations
#288

Language Models Represent Space and Time

Wes Gurnee, Max Tegmark

ICLR 2024oralarXiv:2310.02207
251
citations
#289

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Kai Zhang, Sai Bi, Hao Tan et al.

ECCV 2024arXiv:2404.19702
250
citations
#290

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

Shelly Sheynin, Adam Polyak, Uriel Singer et al.

CVPR 2024highlightarXiv:2311.10089
250
citations
#291

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier et al.

ECCV 2024arXiv:2403.09611
250
citations
#292

Fine-Tuning Language Models for Factuality

Katherine Tian, Eric Mitchell, Huaxiu Yao et al.

ICLR 2024arXiv:2311.08401
249
citations
#293

Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization

Tao Yang, Rongyuan Wu, Peiran Ren et al.

ECCV 2024arXiv:2308.14469
249
citations
#294

QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Jiaming Tang, Yilong Zhao, Kan Zhu et al.

ICML 2024arXiv:2406.10774
248
citations
#295

Can LLM-Generated Misinformation Be Detected?

Canyu Chen, Kai Shu

ICLR 2024arXiv:2309.13788
248
citations
#296

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Yaofang Liu, Xiaodong Cun, Xuebo Liu et al.

CVPR 2024arXiv:2310.11440
248
citations
#297

RoMa: Robust Dense Feature Matching

Johan Edstedt, Qiyu Sun, Georg Bökman et al.

CVPR 2024arXiv:2305.15404
248
citations
#298

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

Hongtao Wu, Ya Jing, Chilam Cheang et al.

ICLR 2024arXiv:2312.13139
248
citations
#299

An Edit Friendly DDPM Noise Space: Inversion and Manipulations

Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli

CVPR 2024arXiv:2304.06140
247
citations
#300

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

Taoran Yi, Jiemin Fang, Junjie Wang et al.

CVPR 2024arXiv:2310.08529
246
citations
#301

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Yunyang Xiong, Balakrishnan Varadarajan, Lemeng Wu et al.

CVPR 2024highlightarXiv:2312.00863
246
citations
#302

SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery

Xin Guo, Jiangwei Lao, Bo Dang et al.

CVPR 2024arXiv:2312.10115
244
citations
#303

Physics of Language Models: Part 3.1, Knowledge Storage and Extraction

Zeyuan Allen-Zhu, Yuanzhi Li

ICML 2024spotlightarXiv:2309.14316
244
citations
#304

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition

Xiaohan Ding, Yiyuan Zhang, Yixiao Ge et al.

CVPR 2024arXiv:2311.15599
243
citations
#305

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

Jiahe Li, Jiawei Zhang, Xiao Bai et al.

CVPR 2024arXiv:2403.06912
242
citations
#306

Self-Consuming Generative Models Go MAD

Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi et al.

ICLR 2024arXiv:2307.01850
241
citations
#307

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

Hong Liu, Zhiyuan Li, David Hall et al.

ICLR 2024arXiv:2305.14342
241
citations
#308

QuIP$\#$: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks

Albert Tseng, Jerry Chee, Qingyao Sun et al.

ICML 2024arXiv:2402.04396
241
citations
#309

Knowledge Graph Prompting for Multi-Document Question Answering

Yu Wang, Nedim Lipka, Ryan A. Rossi et al.

AAAI 2024paperarXiv:2308.11730
240
citations
#310

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

Xuhui Zhou, Hao Zhu, Leena Mathur et al.

ICLR 2024spotlightarXiv:2310.11667
239
citations
#311

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

Xiaofeng Wang, Zheng Zhu, Guan Huang et al.

ECCV 2024arXiv:2309.09777
239
citations
#312

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Shenhao Zhu, Junming Chen, Zuozhuo Dai et al.

ECCV 2024arXiv:2403.14781
239
citations
#313

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision

Haoning Wu, Zicheng Zhang, Erli Zhang et al.

ICLR 2024spotlightarXiv:2309.14181
239
citations
#314

GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians

Shenhan Qian, Tobias Kirschstein, Liam Schoneveld et al.

CVPR 2024highlightarXiv:2312.02069
238
citations
#315

When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Biao Zhang, Zhongtao Liu, Colin Cherry et al.

ICLR 2024arXiv:2402.17193
238
citations
#316

SaProt: Protein Language Modeling with Structure-aware Vocabulary

Jin Su, Chenchen Han, Yuyang Zhou et al.

ICLR 2024spotlight
237
citations
#317

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Fanghua Yu, Jinjin Gu, Zheyuan Li et al.

CVPR 2024arXiv:2401.13627
237
citations
#318

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Yizhi Li, Ruibin Yuan, Ge Zhang et al.

ICLR 2024arXiv:2306.00107
237
citations
#319

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding

Yi Wang, Kunchang Li, Xinhao Li et al.

ECCV 2024arXiv:2403.15377
236
citations
#320

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Yukang Chen, Shengju Qian, Haotian Tang et al.

ICLR 2024arXiv:2309.12307
235
citations
#321

Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models

Erfan Shayegani, Yue Dong, Nael Abu-Ghazaleh

ICLR 2024spotlightarXiv:2307.14539
235
citations
#322

Sequential Modeling Enables Scalable Learning for Large Vision Models

Yutong Bai, Xinyang Geng, Karttikeya Mangalam et al.

CVPR 2024arXiv:2312.00785
235
citations
#323

Omni-Kernel Network for Image Restoration

Yuning Cui, Wenqi Ren, Alois Knoll

AAAI 2024paper
235
citations
#324

Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design

Andrew Campbell, Jason Yim, Regina Barzilay et al.

ICML 2024arXiv:2402.04997
234
citations
#325

3D-VLA: A 3D Vision-Language-Action Generative World Model

Haoyu Zhen, Xiaowen Qiu, Peihao Chen et al.

ICML 2024arXiv:2403.09631
233
citations
#326

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

Nataniel Ruiz, Yuanzhen Li, Varun Jampani et al.

CVPR 2024arXiv:2307.06949
232
citations
#327

Better & Faster Large Language Models via Multi-token Prediction

Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Roziere et al.

ICML 2024arXiv:2404.19737
232
citations
#328

GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces

Yingwenqi Jiang, Jiadong Tu, Yuan Liu et al.

CVPR 2024arXiv:2311.17977
232
citations
#329

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Xiaohan Wang, Yuhui Zhang, Orr Zohar et al.

ECCV 2024arXiv:2403.10517
231
citations
#330

Segment and Recognize Anything at Any Granularity

Feng Li, Hao Zhang, Peize Sun et al.

ECCV 2024arXiv:2307.04767
230
citations
#331

GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

Xiao Fu, Wei Yin, Mu Hu et al.

ECCV 2024arXiv:2403.12013
230
citations
#332

OpenEQA: Embodied Question Answering in the Era of Foundation Models

Arjun Majumdar, Anurag Ajay, Xiaohan Zhang et al.

CVPR 2024
230
citations
#333

Style Aligned Image Generation via Shared Attention

Amir Hertz, Andrey Voynov, Shlomi Fruchter et al.

CVPR 2024arXiv:2312.02133
230
citations
#334

DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model

Yinghao Xu, Hao Tan, Fujun Luan et al.

ICLR 2024spotlightarXiv:2311.09217
227
citations
#335

FreeU: Free Lunch in Diffusion U-Net

Chenyang Si, Ziqi Huang, Yuming Jiang et al.

CVPR 2024arXiv:2309.11497
227
citations
#336

Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI

Wei-Bang Jiang, Liming Zhao, Bao-liang Lu

ICLR 2024spotlightarXiv:2405.18765
226
citations
#337

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

Yufei Wang, Wenhan Yang, Xinyuan Chen et al.

CVPR 2024arXiv:2311.14760
226
citations
#338

MACE: Mass Concept Erasure in Diffusion Models

Shilin Lu, Zilan Wang, Leyang Li et al.

CVPR 2024arXiv:2403.06135
226
citations
#339

TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting

Defu Cao, Furong Jia, Sercan Arik et al.

ICLR 2024oralarXiv:2310.04948
225
citations
#340

Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova et al.

ICML 2024arXiv:2401.12070
225
citations
#341

Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

Jiwoo Chung, Sangeek Hyun, Jae-Pil Heo

CVPR 2024highlightarXiv:2312.09008
225
citations
#342

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Zhiyuan Li, Hong Liu, Denny Zhou et al.

ICLR 2024arXiv:2402.12875
224
citations
#343

Listen, Think, and Understand

Yuan Gong, Hongyin Luo, Alexander Liu et al.

ICLR 2024arXiv:2305.10790
224
citations
#344

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Alex Gu, Baptiste Roziere, Hugh Leather et al.

ICML 2024arXiv:2401.03065
224
citations
#345

INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection

Chao Chen, Kai Liu, Ze Chen et al.

ICLR 2024arXiv:2402.03744
224
citations
#346

In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering

Sheng Liu, Haotian Ye, Lei Xing et al.

ICML 2024arXiv:2311.06668
224
citations
#347

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Linrui Tian, Qi Wang, Bang Zhang et al.

ECCV 2024arXiv:2402.17485
223
citations
#348

PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Junsong Chen, Chongjian GE, Enze Xie et al.

ECCV 2024
223
citations
#349

Data Filtering Networks

Alex Fang, Albin Madappally Jose, Amit Jain et al.

ICLR 2024arXiv:2309.17425
222
citations
#350

Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis

Simon Niedermayr, Josef Stumpfegger, rüdiger westermann

CVPR 2024arXiv:2401.02436
222
citations
#351

RECOMP: Improving Retrieval-Augmented LMs with Context Compression and Selective Augmentation

Fangyuan Xu, Weijia Shi, Eunsol Choi

ICLR 2024
222
citations
#352

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

Licheng Wen, DAOCHENG FU, Xin Li et al.

ICLR 2024arXiv:2309.16292
222
citations
#353

One For All: Towards Training One Graph Model For All Classification Tasks

Hao Liu, Jiarui Feng, Lecheng Kong et al.

ICLR 2024spotlightarXiv:2310.00149
221
citations
#354

EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation

Md Mostafijur Rahman, Mustafa Munir, Radu Marculescu

CVPR 2024arXiv:2405.06880
221
citations
#355

VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction

Jiaqi Lin, Zhihao Li, Xiao Tang et al.

CVPR 2024arXiv:2402.17427
219
citations
#356

CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

Zhengyi Wang, Yikai Wang, Yifei Chen et al.

ECCV 2024arXiv:2403.05034
219
citations
#357

On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes

Rishabh Agarwal, Nino Vieillard, Yongchao Zhou et al.

ICLR 2024arXiv:2306.13649
218
citations
#358

MagicDrive: Street View Generation with Diverse 3D Geometry Control

Ruiyuan Gao, Kai Chen, Enze Xie et al.

ICLR 2024arXiv:2310.02601
218
citations
#359

Demystifying CLIP Data

Hu Xu, Saining Xie, Xiaoqing Tan et al.

ICLR 2024spotlightarXiv:2309.16671
216
citations
#360

GaussianPro: 3D Gaussian Splatting with Progressive Propagation

Kai Cheng, Xiaoxiao Long, Kaizhi Yang et al.

ICML 2024arXiv:2402.14650
215
citations
#361

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

Yawar Siddiqui, Antonio Alliegro, Alexey Artemov et al.

CVPR 2024highlightarXiv:2311.15475
214
citations
#362

Habitat 3.0: A Co-Habitat for Humans, Avatars, and Robots

Xavier Puig, Eric Undersander, Andrew Szot et al.

ICLR 2024arXiv:2310.13724
214
citations
#363

ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback

Ganqu Cui, Lifan Yuan, Ning Ding et al.

ICML 2024arXiv:2310.01377
214
citations
#364

A Variational Perspective on Solving Inverse Problems with Diffusion Models

Morteza Mardani, Jiaming Song, Jan Kautz et al.

ICLR 2024arXiv:2305.04391
213
citations
#365

Agent Attention: On the Integration of Softmax and Linear Attention

Dongchen Han, Tianzhu Ye, Yizeng Han et al.

ECCV 2024arXiv:2312.08874
212
citations
#366

FITS: Modeling Time Series with $10k$ Parameters

Zhijian Xu, Ailing Zeng, Qiang Xu

ICLR 2024spotlightarXiv:2307.03756
212
citations
#367

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan, John Hughes, Dan Valentine et al.

ICML 2024arXiv:2402.06782
212
citations
#368

RA-DIT: Retrieval-Augmented Dual Instruction Tuning

Victoria Lin, Xilun Chen, Mingda Chen et al.

ICLR 2024arXiv:2310.01352
210
citations
#369

Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue

Songhua Yang, Hanjie Zhao, Senbin Zhu et al.

AAAI 2024paperarXiv:2308.03549
210
citations
#370

Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Zilong Wang, Hao Zhang, Chun-Liang Li et al.

ICLR 2024arXiv:2401.04398
209
citations
#371

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Kai Yang, Jian Tao, Jiafei Lyu et al.

CVPR 2024arXiv:2311.13231
209
citations
#372

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models

Chong Mou, Xintao Wang, Jiechong Song et al.

ICLR 2024spotlightarXiv:2307.02421
209
citations
#373

From Sparse to Soft Mixtures of Experts

Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa et al.

ICLR 2024spotlightarXiv:2308.00951
208
citations
#374

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

Yangjun Ruan, Honghua Dong, Andrew Wang et al.

ICLR 2024spotlightarXiv:2309.15817
208
citations
#375

Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing

Dujian Ding, Ankur Mallick, Chi Wang et al.

ICLR 2024arXiv:2404.14618
208
citations
#376

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Xinyuan Chen, Yaohui Wang, Lingjun Zhang et al.

ICLR 2024oralarXiv:2310.20700
208
citations
#377

Magicoder: Empowering Code Generation with OSS-Instruct

Yuxiang Wei, Zhe Wang, Jiawei Liu et al.

ICML 2024arXiv:2312.02120
208
citations
#378

Honeybee: Locality-enhanced Projector for Multimodal LLM

Junbum Cha, Woo-Young Kang, Jonghwan Mun et al.

CVPR 2024highlightarXiv:2312.06742
208
citations
#379

3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic et al.

CVPR 2024arXiv:2312.09228
207
citations
#380

Proving Test Set Contamination in Black-Box Language Models

Yonatan Oren, Nicole Meister, Niladri Chatterji et al.

ICLR 2024arXiv:2310.17623
203
citations
#381

EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

Jiawei Yang, Boris Ivanovic, Or Litany et al.

ICLR 2024oralarXiv:2311.02077
203
citations
#382

Conformal Risk Control

Anastasios Angelopoulos, Stephen Bates, Adam Fisch et al.

ICLR 2024spotlightarXiv:2208.02814
203
citations
#383

PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

Xinyuan Wang, Chenxi Li, Zhen Wang et al.

ICLR 2024arXiv:2310.16427
202
citations
#384

LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models

Yixiao Li, Yifan Yu, Chen Liang et al.

ICLR 2024arXiv:2310.08659
202
citations
#385

OneLLM: One Framework to Align All Modalities with Language

Jiaming Han, Kaixiong Gong, Yiyuan Zhang et al.

CVPR 2024arXiv:2312.03700
201
citations
#386

Multilingual Jailbreak Challenges in Large Language Models

Yue Deng, Wenxuan Zhang, Sinno Pan et al.

ICLR 2024arXiv:2310.06474
201
citations
#387

COLMAP-Free 3D Gaussian Splatting

Yang Fu, Sifei Liu, Amey Kulkarni et al.

CVPR 2024highlightarXiv:2312.07504
201
citations
#388

TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series

Chenxi Sun, Hongyan Li, Yaliang Li et al.

ICLR 2024arXiv:2308.08241
200
citations
#389

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Shilong Liu, Hao Cheng, Haotian Liu et al.

ECCV 2024arXiv:2311.05437
200
citations
#390

Think before you speak: Training Language Models With Pause Tokens

Sachin Goyal, Ziwei Ji, Ankit Singh Rawat et al.

ICLR 2024arXiv:2310.02226
200
citations
#391

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Ling Yang, Zhaochen Yu, Chenlin Meng et al.

ICML 2024arXiv:2401.11708
200
citations
#392

Fast Timing-Conditioned Latent Audio Diffusion

Zach Evans, CJ Carr, Josiah Taylor et al.

ICML 2024arXiv:2402.04825
199
citations
#393

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Xin Liu, Yichen Zhu, Jindong Gu et al.

ECCV 2024arXiv:2311.17600
199
citations
#394

Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space

Hengrui Zhang, Jiani Zhang, Zhengyuan Shen et al.

ICLR 2024arXiv:2310.09656
199
citations
#395

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

Aojun Zhou, Ke Wang, Zimu Lu et al.

ICLR 2024arXiv:2308.07921
198
citations
#396

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

Le Xue, Ning Yu, Shu Zhang et al.

CVPR 2024arXiv:2305.08275
198
citations
#397

Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph

Jiashuo Sun, Chengjin Xu, Lumingyuan Tang et al.

ICLR 2024arXiv:2307.07697
198
citations
#398

PixelLM: Pixel Reasoning with Large Multimodal Model

Zhongwei Ren, Zhicheng Huang, Yunchao Wei et al.

CVPR 2024arXiv:2312.02228
197
citations
#399

Function Vectors in Large Language Models

Eric Todd, Millicent Li, Arnab Sen Sharma et al.

ICLR 2024arXiv:2310.15213
197
citations
#400

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang et al.

CVPR 2024arXiv:2312.02134
197
citations