Most Cited 2024 "self-supervised acoustic probing" Papers

12,324 papers found • Page 3 of 62

#401

Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model

Fei Liu, Xialiang Tong, Mingxuan Yuan et al.

ICML 2024 • arXiv:2401.02051
196 citations
#402

AdaMerging: Adaptive Model Merging for Multi-Task Learning

Enneng Yang, Zhenyi Wang, Li Shen et al.

ICLR 2024 • arXiv:2310.02575
196 citations
#403

Nash Learning from Human Feedback

Remi Munos, Michal Valko, Daniele Calandriello et al.

ICML 2024 • spotlight • arXiv:2312.00886
195 citations
#404

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Zechun Liu, Changsheng Zhao, Forrest Iandola et al.

ICML 2024 • arXiv:2402.14905
195 citations
#405

AlphaFold Meets Flow Matching for Generating Protein Ensembles

Bowen Jing, Bonnie Berger, Tommi Jaakkola

ICML 2024 • arXiv:2402.04845
195 citations
#406

Nougat: Neural Optical Understanding for Academic Documents

Lukas Blecher, Guillem Cucurull Preixens, Thomas Scialom et al.

ICLR 2024 • arXiv:2308.13418
194 citations
#407

OmniControl: Control Any Joint at Any Time for Human Motion Generation

Yiming Xie, Varun Jampani, Lei Zhong et al.

ICLR 2024 • arXiv:2310.08580
194 citations
#408

OctoPack: Instruction Tuning Code Large Language Models

Niklas Muennighoff, Qian Liu, Armel Zebaze et al.

ICLR 2024 • spotlight • arXiv:2308.07124
194 citations
#409

CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

Seokju Cho, Heeseong Shin, Sunghwan Hong et al.

CVPR 2024 • highlight • arXiv:2303.11797
193 citations
#410

Solving Inverse Problems with Latent Diffusion Models via Hard Data Consistency

Bowen Song, Soo Min Kwon, Zecheng Zhang et al.

ICLR 2024 • spotlight • arXiv:2307.08123
193 citations
#411

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Hubert Siuzdak

ICLR 2024 • arXiv:2306.00814
192 citations
#412

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Wenbo Hu, Yifan Xu, Yi Li et al.

AAAI 2024 • paper • arXiv:2308.09936
192 citations
#413

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Yue Yang, Fan-Yun Sun, Luca Weihs et al.

CVPR 2024 • arXiv:2312.09067
192 citations
#414

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

Haozhe Zhao, Zefan Cai, Shuzheng Si et al.

ICLR 2024 • arXiv:2309.07915
191 citations
#415

GS-IR: 3D Gaussian Splatting for Inverse Rendering

Zhihao Liang, Qi Zhang, Ying Feng et al.

CVPR 2024 • arXiv:2311.16473
191 citations
#416

Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen et al.

ICLR 2024 • arXiv:2310.06117
190 citations
#417

Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters

Jiazuo Yu, Yunzhi Zhuge, Lu Zhang et al.

CVPR 2024 • arXiv:2403.11549
190 citations
#418

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

Youtian Lin, Zuozhuo Dai, Siyu Zhu et al.

CVPR 2024 • highlight • arXiv:2312.03431
190 citations
#419

Making LLaMA SEE and Draw with SEED Tokenizer

Yuying Ge, Sijie Zhao, Ziyun Zeng et al.

ICLR 2024 • arXiv:2310.01218
190 citations
#420

HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression

Yihang Chen, Qianyi Wu, Weiyao Lin et al.

ECCV 2024 • arXiv:2403.14530
188 citations
#421

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Bo He, Hengduo Li, Young Kyun Jang et al.

CVPR 2024 • arXiv:2404.05726
188 citations
#422

Reward Model Ensembles Help Mitigate Overoptimization

Thomas Coste, Usman Anwar, Robert Kirk et al.

ICLR 2024 • arXiv:2310.02743
188 citations
#423

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

Soroush Nasiriany, Fei Xia, Wenhao Yu et al.

ICML 2024 • arXiv:2402.07872
188 citations
#424

Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic Localization

Joe Benton, Valentin De Bortoli, Arnaud Doucet et al.

ICLR 2024 • spotlight • arXiv:2308.03686
188 citations
#425

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

Yufei Wang, Zhou Xian, Feng Chen et al.

ICML 2024 • arXiv:2311.01455
188 citations
#426

ZigMa: A DiT-style Zigzag Mamba Diffusion Model

Tao Hu, Stefan Andreas Baumann, Ming Gui et al.

ECCV 2024 • arXiv:2403.13802
188 citations
#427

ODTrack: Online Dense Temporal Token Learning for Visual Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2024 • paper • arXiv:2401.01686
188 citations
#428

PMET: Precise Model Editing in a Transformer

Xiaopeng Li, Shasha Li, Shezheng Song et al.

AAAI 2024 • paper • arXiv:2308.08742
187 citations
#429

ReLoRA: High-Rank Training Through Low-Rank Updates

Vladislav Lialin, Sherin Muckatira, Namrata Shivagunde et al.

ICLR 2024 • arXiv:2307.05695
187 citations
#430

Data Engineering for Scaling Language Models to 128K Context

Yao Fu, Rameswar Panda, Xinyao Niu et al.

ICML 2024 • arXiv:2402.10171
186 citations
#431

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

Yang Liu, Chuanchen Luo, Lue Fan et al.

ECCV 2024 • arXiv:2404.01133
186 citations
#432

Putting the Object Back into Video Object Segmentation

Ho Kei Cheng, Seoung Wug Oh, Brian Price et al.

CVPR 2024 • highlight • arXiv:2310.12982
185 citations
#433

On the Reliability of Watermarks for Large Language Models

John Kirchenbauer, Jonas Geiping, Yuxin Wen et al.

ICLR 2024 • arXiv:2306.04634
185 citations
#434

MSGNet: Learning Multi-Scale Inter-series Correlations for Multivariate Time Series Forecasting

Wanlin Cai, Yuxuan Liang, Xianggen Liu et al.

AAAI 2024 • paper • arXiv:2401.00423
185 citations
#435

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods

Fred Zhang, Neel Nanda

ICLR 2024 • arXiv:2309.16042
185 citations
#436

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Boyi Wei, Kaixuan Huang, Yangsibo Huang et al.

ICML 2024 • arXiv:2402.05162
184 citations
#437

RMT: Retentive Networks Meet Vision Transformers

Qihang Fan, Huaibo Huang, Mingrui Chen et al.

CVPR 2024 • arXiv:2309.11523
184 citations
#438

EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS

Sharath Girish, Kamal Gupta, Abhinav Shrivastava

ECCV 2024 • arXiv:2312.04564
184 citations
#439

Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande et al.

ICLR 2024 • oral • arXiv:2311.04892
184 citations
#440

Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians

Guangchi Fang, Bing Wang

ECCV 2024 • arXiv:2403.14166
183 citations
#441

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

Weixin Liang, Zachary Izzo, Yaohui Zhang et al.

ICML 2024 • arXiv:2403.07183
183 citations
#442

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation

Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.

CVPR 2024 • arXiv:2312.16217
182 citations
#443

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

Xiaoxuan Wang, Ziniu Hu, Pan Lu et al.

ICML 2024 • arXiv:2307.10635
181 citations
#444

Transolver: A Fast Transformer Solver for PDEs on General Geometries

Haixu Wu, Huakun Luo, Haowen Wang et al.

ICML 2024 • spotlight • arXiv:2402.02366
181 citations
#445

InstanceDiffusion: Instance-level Control for Image Generation

XuDong Wang, Trevor Darrell, Sai Saketh Rambhatla et al.

CVPR 2024 • arXiv:2402.03290
180 citations
#446

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Fahim Tajwar, Anikait Singh, Archit Sharma et al.

ICML 2024 • arXiv:2404.14367
179 citations
#447

Sapiens: Foundation for Human Vision Models

Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez et al.

ECCV 2024 • arXiv:2408.12569
179 citations
#448

SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

Ning Miao, Yee Whye Teh, Tom Rainforth

ICLR 2024 • arXiv:2308.00436
179 citations
#449

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Jeongho Kim, Gyojung Gu, Minho Park et al.

CVPR 2024 • arXiv:2312.01725
179 citations
#450

Fundamental Limitations of Alignment in Large Language Models

Yotam Wolf, Noam Wies, Oshri Avnery et al.

ICML 2024 • arXiv:2304.11082
178 citations
#451

tinyBenchmarks: evaluating LLMs with fewer examples

Felipe Maia Polo, Lucas Weber, Leshem Choshen et al.

ICML 2024 • arXiv:2402.14992
178 citations
#452

Generalized Planning in PDDL Domains with Pretrained Large Language Models

Tom Silver, Soham Dan, Kavitha Srinivas et al.

AAAI 2024 • paper • arXiv:2305.11014
178 citations
#453

Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding

Jin-Chuan Shi, Miao Wang, Haobin Duan et al.

CVPR 2024 • arXiv:2311.18482
177 citations
#454

To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now

Yimeng Zhang, Jinghan Jia, Xin Chen et al.

ECCV 2024 • arXiv:2310.11868
176 citations
#455

Fast Machine Unlearning without Retraining through Selective Synaptic Dampening

Jack Foster, Stefan Schoepf, Alexandra Brintrup

AAAI 2024 • paper • arXiv:2308.07707
176 citations
#456

BioCLIP: A Vision Foundation Model for the Tree of Life

Samuel Stevens, Jiaman Wu, Matthew Thompson et al.

CVPR 2024 • arXiv:2311.18803
176 citations
#457

RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection

Ximiao Zhang, Min Xu, Xiuzhuang Zhou

CVPR 2024 • arXiv:2403.05897
176 citations
#458

Uni3D: Exploring Unified 3D Representation at Scale

Junsheng Zhou, Jinsheng Wang, Baorui Ma et al.

ICLR 2024 • spotlight • arXiv:2310.06773
175 citations
#459

Beyond Memorization: Violating Privacy via Inference with Large Language Models

Robin Staab, Mark Vero, Mislav Balunovic et al.

ICLR 2024 • spotlight • arXiv:2310.07298
175 citations
#460

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Viraj Shah, Nataniel Ruiz, Forrester Cole et al.

ECCV 2024 • arXiv:2311.13600
175 citations
#461

LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images

Zonghao Guo, Ruyi Xu, Yuan Yao et al.

ECCV 2024 • arXiv:2403.11703
174 citations
#462

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

Zhijing Shao, Zhaolong Wang, Zhuang Li et al.

CVPR 2024 • arXiv:2403.05087
174 citations
#463

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

Huan Ling, Seung Wook Kim, Antonio Torralba et al.

CVPR 2024 • highlight • arXiv:2312.13763
174 citations
#464

Talk like a Graph: Encoding Graphs for Large Language Models

Bahare Fatemi, Jonathan Halcrow, Bryan Perozzi

ICLR 2024 • arXiv:2310.04560
174 citations
#465

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Yukang Cao, Yan-Pei Cao, Kai Han et al.

CVPR 2024 • arXiv:2304.00916
174 citations
#466

4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

Sherwin Bahmani, Ivan Skorokhodov, Victor Rong et al.

CVPR 2024 • arXiv:2311.17984
174 citations
#467

Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?

Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie et al.

ICLR 2024 • arXiv:2310.10012
173 citations
#468

Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras

Huajian Huang, Longwei Li, Hui Cheng et al.

CVPR 2024 • arXiv:2311.16728
173 citations
#469

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

Lingteng Qiu, Guanying Chen, Xiaodong Gu et al.

CVPR 2024 • highlight • arXiv:2311.16918
173 citations
#470

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models

Asma Ghandeharioun, Avi Caciularu, Adam Pearce et al.

ICML 2024 • arXiv:2401.06102
173 citations
#471

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

Yuwei Guo, Ceyuan Yang, Anyi Rao et al.

ECCV 2024 • arXiv:2311.16933
173 citations
#472

Logit Standardization in Knowledge Distillation

Shangquan Sun, Wenqi Ren, Jingzhi Li et al.

CVPR 2024 • highlight • arXiv:2403.01427
172 citations
#473

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Zhiqi Li, Zhiding Yu, Shiyi Lan et al.

CVPR 2024 • arXiv:2312.03031
172 citations
#474

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Wenzhao Zheng, Weiliang Chen, Yuanhui Huang et al.

ECCV 2024 • arXiv:2311.16038
172 citations
#475

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Jingxiang Sun, Bo Zhang, Ruizhi Shao et al.

ICLR 2024 • arXiv:2310.16818
172 citations
#476

Compositional Chain-of-Thought Prompting for Large Multimodal Models

Chancharik Mitra, Brandon Huang, Trevor Darrell et al.

CVPR 2024 • arXiv:2311.17076
171 citations
#477

Can Large Language Models Infer Causation from Correlation?

Zhijing Jin, Jiarui Liu, Zhiheng Lyu et al.

ICLR 2024 • arXiv:2306.05836
171 citations
#478

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Zeyi Sun, Ye Fang, Tong Wu et al.

CVPR 2024 • arXiv:2312.03818
170 citations
#479

What Algorithms can Transformers Learn? A Study in Length Generalization

Hattie Zhou, Arwen Bradley, Etai Littwin et al.

ICLR 2024 • arXiv:2310.16028
170 citations
#480

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Yanwu Xu, Yang Zhao, Zhisheng Xiao et al.

CVPR 2024 • highlight • arXiv:2311.09257
170 citations
#481

Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

Yair Schiff, Chia Hsiang Kao, Aaron Gokaslan et al.

ICML 2024 • arXiv:2403.03234
170 citations
#482

GauHuman: Articulated Gaussian Splatting from Monocular Human Videos

Shoukang Hu, Tao Hu, Ziwei Liu

CVPR 2024 • arXiv:2312.02973
170 citations
#483

Inversion by Direct Iteration: An Alternative to Denoising Diffusion for Image Restoration

Peyman Milanfar, Mauricio Delbracio

ICLR 2024 • arXiv:2303.11435
169 citations
#484

WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion

Soyong Shin, Juyong Kim, Eni Halilaj et al.

CVPR 2024 • arXiv:2312.07531
169 citations
#485

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

Ke Wang, Houxing Ren, Aojun Zhou et al.

ICLR 2024 • arXiv:2310.03731
169 citations
#486

MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation

Qian Huang, Jian Vora, Percy Liang et al.

ICML 2024 • arXiv:2310.03302
168 citations
#487

Is Self-Repair a Silver Bullet for Code Generation?

Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang et al.

ICLR 2024 • arXiv:2306.09896
168 citations
#488

GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions

Junjie Wang, Jiemin Fang, Xiaopeng Zhang et al.

CVPR 2024 • arXiv:2311.16037
168 citations
#489

HIVE: Harnessing Human Feedback for Instructional Visual Editing

Shu Zhang, Xinyi Yang, Yihao Feng et al.

CVPR 2024 • arXiv:2303.09618
168 citations
#490

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Zhifeng Kong, Arushi Goel, Rohan Badlani et al.

ICML 2024 • arXiv:2402.01831
168 citations
#491

ZipIt! Merging Models from Different Tasks without Training

George Stoica, Daniel Bolya, Jakob Bjorner et al.

ICLR 2024 • arXiv:2305.03053
167 citations
#492

Self-Alignment with Instruction Backtranslation

Xian Li, Ping Yu, Chunting Zhou et al.

ICLR 2024 • arXiv:2308.06259
167 citations
#493

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

Rui Zhao, Yuchao Gu, Jay Zhangjie Wu et al.

ECCV 2024 • arXiv:2310.08465
167 citations
#494

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou et al.

ICLR 2024 • spotlight • arXiv:2310.17884
166 citations
#495

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Shunyuan Zheng, Boyao Zhou, Ruizhi Shao et al.

CVPR 2024 • highlight • arXiv:2312.02155
166 citations
#496

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

Andrew Lee, Xiaoyan Bai, Itamar Pres et al.

ICML 2024 • arXiv:2401.01967
165 citations
#497

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

Xuan Ju, Xian Liu, Xintao Wang et al.

ECCV 2024 • arXiv:2403.06976
165 citations
#498

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

Yapei Chang, Kyle Lo, Tanya Goyal et al.

ICLR 2024 • arXiv:2310.00785
164 citations
#499

Equivariant Multi-Modality Image Fusion

Zixiang Zhao, Haowen Bai, Jiangshe Zhang et al.

CVPR 2024 • arXiv:2305.11443
164 citations
#500

Diffusion-TS: Interpretable Diffusion for General Time Series Generation

Xinyu Yuan, Yan Qiao

ICLR 2024 • oral • arXiv:2403.01742
164 citations
#501

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

AAAI 2024 • paper • arXiv:2308.11681
163 citations
#502

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Kaining Ying, Fanqing Meng, Jin Wang et al.

ICML 2024 • arXiv:2404.16006
163 citations
#503

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

Zigang Geng, Binxin Yang, Tiankai Hang et al.

CVPR 2024 • arXiv:2309.03895
162 citations
#504

Generative End-to-End Autonomous Driving

Wenzhao Zheng, Ruiqi Song, Xianda Guo et al.

ECCV 2024 • arXiv:2402.11502
162 citations
#505

Grounded Text-to-Image Synthesis with Attention Refocusing

Quynh Phung, Songwei Ge, Jia-Bin Huang

CVPR 2024 • arXiv:2306.05427
162 citations
#506

Repeat After Me: Transformers are Better than State Space Models at Copying

Samy Jelassi, David Brandfonbrener, Sham Kakade et al.

ICML 2024 • arXiv:2402.01032
162 citations
#507

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Yue Fan, Xiaojian Ma, Rujie Wu et al.

ECCV 2024 • arXiv:2403.11481
161 citations
#508

RAIN: Your Language Models Can Align Themselves without Finetuning

Yuhui Li, Fangyun Wei, Jinjing Zhao et al.

ICLR 2024 • arXiv:2309.07124
161 citations
#509

Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Niels Mündler, Jingxuan He, Slobodan Jenko et al.

ICLR 2024 • arXiv:2305.15852
161 citations
#510

Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.

ICML 2024 • arXiv:2401.06118
160 citations
#511

Learning to Act from Actionless Videos through Dense Correspondences

Po-Chen Ko, Jiayuan Mao, Yilun Du et al.

ICLR 2024 • spotlight • arXiv:2310.08576
160 citations
#512

Multi-Modal Hallucination Control by Visual Information Grounding

Alessandro Favero, Luca Zancato, Matthew Trager et al.

CVPR 2024 • arXiv:2403.14003
160 citations
#513

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Fuzhao Xue, Zian Zheng, Yao Fu et al.

ICML 2024 • arXiv:2402.01739
160 citations
#514

Flow Matching on General Geometries

Ricky T. Q. Chen, Yaron Lipman

ICLR 2024 • arXiv:2302.03660
159 citations
#515

Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering

Zhiwen Yan, Weng Fei Low, Yu Chen et al.

CVPR 2024 • arXiv:2311.17089
159 citations
#516

Chain of Hindsight aligns Language Models with Feedback

Hao Liu, Carmelo Sferrazza, Pieter Abbeel

ICLR 2024 • arXiv:2302.02676
158 citations
#517

Emu: Generative Pretraining in Multimodality

Quan Sun, Qiying Yu, Yufeng Cui et al.

ICLR 2024 • arXiv:2307.05222
158 citations
#518

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Haoning Wu, Zicheng Zhang, Erli Zhang et al.

CVPR 2024 • arXiv:2311.06783
158 citations
#519

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Yujie Wei, Shiwei Zhang, Zhiwu Qing et al.

CVPR 2024 • arXiv:2312.04433
158 citations
#520

Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting

Yunzhi Yan, Haotong Lin, Chenxu Zhou et al.

ECCV 2024 • arXiv:2401.01339
157 citations
#521

MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use

Yue Huang, Jiawen Shi, Yuan Li et al.

ICLR 2024 • arXiv:2310.03128
157 citations
#522

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Keen You, Haotian Zhang, Eldon Schoop et al.

ECCV 2024 • arXiv:2404.05719
157 citations
#523

Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints

Chaoqi Wang, Yibo Jiang, Chenghao Yang et al.

ICLR 2024 • spotlight • arXiv:2309.16240
157 citations
#524

COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability

Xingang Guo, Fangxu Yu, Huan Zhang et al.

ICML 2024 • arXiv:2402.08679
156 citations
#525

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Yuhui Xu, Lingxi Xie, Xiaotao Gu et al.

ICLR 2024 • arXiv:2309.14717
156 citations
#526

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

Junyoung Seo, Wooseok Jang, Min-Seop Kwak et al.

ICLR 2024 • arXiv:2303.07937
155 citations
#527

Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting

Peng Chen, Yingying Zhang, Yunyao Cheng et al.

ICLR 2024 • oral • arXiv:2402.05956
155 citations
#528

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Mu Cai, Haotian Liu, Siva Mustikovela et al.

CVPR 2024 • arXiv:2312.00784
155 citations
#529

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

Peng Wang, Hao Tan, Sai Bi et al.

ICLR 2024 • spotlight • arXiv:2311.12024
155 citations
#530

Global Structure-from-Motion Revisited

Linfei Pan, Daniel Barath, Marc Pollefeys et al.

ECCV 2024 • arXiv:2407.20219
155 citations
#531

Generative Judge for Evaluating Alignment

Junlong Li, Shichao Sun, Weizhe Yuan et al.

ICLR 2024 • arXiv:2310.05470
155 citations
#532

Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

Mingyuan Zhou, Huangjie Zheng, Zhendong Wang et al.

ICML 2024 • arXiv:2404.04057
154 citations
#533

Interpreting CLIP's Image Representation via Text-Based Decomposition

Yossi Gandelsman, Alexei Efros, Jacob Steinhardt

ICLR 2024 • arXiv:2310.05916
154 citations
#534

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks

Vaidehi Ramesh Patil, Peter Hase, Mohit Bansal

ICLR 2024 • spotlight • arXiv:2309.17410
154 citations
#535

Hypothesis Search: Inductive Reasoning with Language Models

Ruocheng Wang, Eric Zelikman, Gabriel Poesia et al.

ICLR 2024 • arXiv:2309.05660
154 citations
#536

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

Junhao Zhuang, Yanhong Zeng, Wenran Liu et al.

ECCV 2024 • arXiv:2312.03594
153 citations
#537

Rotary Position Embedding for Vision Transformer

Byeongho Heo, Song Park, Dongyoon Han et al.

ECCV 2024 • arXiv:2403.13298
153 citations
#538

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Ming Li, Taojiannan Yang, Huafeng Kuang et al.

ECCV 2024 • arXiv:2404.07987
153 citations
#539

SweetDreamer: Aligning Geometric Priors in 2D diffusion for Consistent Text-to-3D

Weiyu Li, Rui Chen, Xuelin Chen et al.

ICLR 2024 • arXiv:2310.02596
153 citations
#540

HUGS: Human Gaussian Splats

Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel et al.

CVPR 2024 • arXiv:2311.17910
153 citations
#541

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

Haonan Qiu, Menghan Xia, Yong Zhang et al.

ICLR 2024 • oral • arXiv:2310.15169
152 citations
#542

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

Lu Yin, You Wu, Zhenyu Zhang et al.

ICML 2024 • arXiv:2310.05175
152 citations
#543

Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

Xingxuan Li, Ruochen Zhao, Yew Ken Chia et al.

ICLR 2024 • arXiv:2305.13269
151 citations
#544

Time Travel in LLMs: Tracing Data Contamination in Large Language Models

Shahriar Golchin, Mihai Surdeanu

ICLR 2024 • spotlight • arXiv:2308.08493
151 citations
#545

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Yunhao Tang, Zhaohan Guo, Zeyu Zheng et al.

ICML 2024 • arXiv:2402.05749
150 citations
#546

MMA-Diffusion: MultiModal Attack on Diffusion Models

Yijun Yang, Ruiyuan Gao, Xiaosen Wang et al.

CVPR 2024 • arXiv:2311.17516
150 citations
#547

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

Seonghyeon Ye, Doyoung Kim, Sungdong Kim et al.

ICLR 2024 • spotlight • arXiv:2307.10928
150 citations
#548

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention

Arvind Mahankali, Tatsunori Hashimoto, Tengyu Ma

ICLR 2024 • arXiv:2307.03576
150 citations
#549

Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

Jiaqi Zhai, Yunxing Liao, Xing Liu et al.

ICML 2024 • arXiv:2402.17152
150 citations
#550

Infrared Small Target Detection with Scale and Location Sensitivity

Qiankun Liu, Rui Liu, Bolun Zheng et al.

CVPR 2024 • arXiv:2403.19366
149 citations
#551

Guiding Instruction-based Image Editing via Multimodal Large Language Models

Tsu-Jui Fu, Wenze Hu, Xianzhi Du et al.

ICLR 2024 • spotlight • arXiv:2309.17102
149 citations
#552

Osprey: Pixel Understanding with Visual Instruction Tuning

Yuqian Yuan, Wentong Li, Jian Liu et al.

CVPR 2024 • arXiv:2312.10032
149 citations
#553

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

Xianjun Yang, Wei Cheng, Yue Wu et al.

ICLR 2024 • arXiv:2305.17359
149 citations
#554

Make Pixels Dance: High-Dynamic Video Generation

Yan Zeng, Guoqiang Wei, Jiani Zheng et al.

CVPR 2024 • arXiv:2311.10982
149 citations
#555

Timer: Generative Pre-trained Transformers Are Large Time Series Models

Yong Liu, Haoran Zhang, Chenyu Li et al.

ICML 2024 • arXiv:2402.02368
148 citations
#556

LLaGA: Large Language and Graph Assistant

Runjin Chen, Tong Zhao, Ajay Jaiswal et al.

ICML 2024 • arXiv:2402.08170
148 citations
#557

AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection

Yunkang Cao, Jiangning Zhang, Luca Frittoli et al.

ECCV 2024 • arXiv:2407.15795
148 citations
#558

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

Yuwen Xiong, Zhiqi Li, Yuntao Chen et al.

CVPR 2024 • highlight • arXiv:2401.06197
148 citations
#559

Generalization in diffusion models arises from geometry-adaptive harmonic representations

Zahra Kadkhodaie, Florentin Guth, Eero Simoncelli et al.

ICLR 2024 • arXiv:2310.02557
147 citations
#560

Video Language Planning

Yilun Du, Sherry Yang, Pete Florence et al.

ICLR 2024 • arXiv:2310.10625
147 citations
#561

Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration

Chen Zhao, Weiling Cai, Chenyu Dong et al.

CVPR 2024 • arXiv:2311.16845
147 citations
#562

SE(3)-Stochastic Flow Matching for Protein Backbone Generation

Joey Bose, Tara Akhound-Sadegh, Guillaume Huguet et al.

ICLR 2024 • spotlight • arXiv:2310.02391
147 citations
#563

Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians

Yuelang Xu, Benwang Chen, Zhe Li et al.

CVPR 2024
147 citations
#564

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

Ziniu Li, Tian Xu, Yushun Zhang et al.

ICML 2024 • arXiv:2310.10505
147 citations
#565

GRiT: A Generative Region-to-text Transformer for Object Understanding

Jialian Wu, Jianfeng Wang, Zhengyuan Yang et al.

ECCV 2024 • arXiv:2212.00280
147 citations
#566

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

Yuren Cong, Mengmeng Xu, Christian Simon et al.

ICLR 2024 • oral • arXiv:2310.05922
147 citations
#567

Optimal Transport Aggregation for Visual Place Recognition

Sergio Izquierdo, Javier Civera

CVPR 2024 • arXiv:2311.15937
146 citations
#568

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Tong Wu, Guandao Yang, Zhibing Li et al.

CVPR 2024 • arXiv:2401.04092
146 citations
#569

MathScale: Scaling Instruction Tuning for Mathematical Reasoning

Zhengyang Tang, Xingxing Zhang, Benyou Wang et al.

ICML 2024 • arXiv:2403.02884
146 citations
#570

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

Yifan Wang, Xingyi He, Sida Peng et al.

CVPR 2024 • highlight • arXiv:2403.04765
146 citations
#571

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

Hao Fei, Shengqiong Wu, Wei Ji et al.

ICML 2024 • oral • arXiv:2501.03230
146 citations
#572

Stealing part of a production language model

Nicholas Carlini, Daniel Paleka, Krishnamurthy Dvijotham et al.

ICML 2024 • arXiv:2403.06634
145 citations
#573

XFeat: Accelerated Features for Lightweight Image Matching

Guilherme Potje, Felipe Cadar, André Araujo et al.

CVPR 2024 • arXiv:2404.19174
145 citations
#574

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction

Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri et al.

ICLR 2024 • arXiv:2310.03668
145 citations
#575

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

Yutao Hu, Tianbin, Quanfeng Lu et al.

CVPR 2024 • arXiv:2402.09181
144 citations
#576

Small-scale proxies for large-scale Transformer training instabilities

Mitchell Wortsman, Peter Liu, Lechao Xiao et al.

ICLR 2024 • arXiv:2309.14322
144 citations
#577

Multimodal Web Navigation with Instruction-Finetuned Foundation Models

Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum et al.

ICLR 2024 • oral • arXiv:2305.11854
144 citations
#578

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

Zayne Sprague, Xi Ye, Kaj Bostrom et al.

ICLR 2024 • spotlight • arXiv:2310.16049
144 citations
#579

AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model

Teng Hu, Jiangning Zhang, Ran Yi et al.

AAAI 2024 • paper • arXiv:2312.05767
144 citations
#580

Compact 3D Scene Representation via Self-Organizing Gaussian Grids

Wieland Morgenstern, Florian Barthel, Anna Hilsmann et al.

ECCV 2024 • arXiv:2312.13299
143 citations
#581

Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking

Mingzhan Yang, Guangxin Han, Bin Yan et al.

AAAI 2024 • paper • arXiv:2308.00783
143 citations
#582

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

Yuzhou Huang, Liangbin Xie, Xintao Wang et al.

CVPR 2024 • highlight • arXiv:2312.06739
143 citations
#583

3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos

Jiakai Sun, Han Jiao, Guangyuan Li et al.

CVPR 2024 • highlight • arXiv:2403.01444
143 citations
#584

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

Ted Zadouri, Ahmet Üstün, Arash Ahmadian et al.

ICLR 2024 • arXiv:2309.05444
143 citations
#585

SpatialTracker: Tracking Any 2D Pixels in 3D Space

Yuxi Xiao, Qianqian Wang, Shangzhan Zhang et al.

CVPR 2024 • highlight • arXiv:2404.04319
143 citations
#586

Linearity of Relation Decoding in Transformer Language Models

Evan Hernandez, Arnab Sen Sharma, Tal Haklay et al.

ICLR 2024 • spotlight • arXiv:2308.09124
143 citations
#587

Neural Video Compression with Feature Modulation

Jiahao Li, Bin Li, Yan Lu

CVPR 2024 • arXiv:2402.17414
143 citations
#588

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Wei Huang, Yangdong Liu, Haotong Qin et al.

ICML 2024 • arXiv:2402.04291
142 citations
#589

Physics-Based Interaction with 3D Objects via Video Generation

Tianyuan Zhang, Hong-Xing Yu, Rundi Wu et al.

ECCV 2024 • arXiv:2404.13026
142 citations
#590

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Luke Bailey, Euan Ong, Stuart Russell et al.

ICML 2024 • arXiv:2309.00236
142 citations
#591

HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

Zhaorun Chen, Zhuokai Zhao, Hongyin Luo et al.

ICML 2024 • arXiv:2403.00425
142 citations
#592

Does Writing with Language Models Reduce Content Diversity?

Vishakh Padmakumar, He He

ICLR 2024 • arXiv:2309.05196
142 citations
#593

Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection

Huan Liu, Zichang Tan, Chuangchuang Tan et al.

CVPR 2024 • arXiv:2312.16649
141 citations
#594

Improving LoRA in Privacy-preserving Federated Learning

Youbang Sun, Zitao Li, Yaliang Li et al.

ICLR 2024 • arXiv:2403.12313
141 citations
#595

WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?

Alexandre Drouin, Maxime Gasse, Massimo Caccia et al.

ICML 2024 • arXiv:2403.07718
141 citations
#596

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Dongyang Liu, Renrui Zhang, Longtian Qiu et al.

ICML 2024 • arXiv:2402.05935
141 citations
#597

BetterV: Controlled Verilog Generation with Discriminative Guidance

Zehua Pei, Huiling Zhen, Mingxuan Yuan et al.

ICML 2024 • arXiv:2402.03375
141 citations
#598

Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

Zhiyuan Yan, Yuhao Luo, Siwei Lyu et al.

CVPR 2024 • arXiv:2311.11278
140 citations
#599

Simple linear attention language models balance the recall-throughput tradeoff

Simran Arora, Sabri Eyuboglu, Michael Zhang et al.

ICML 2024 • spotlight • arXiv:2402.18668
140 citations
#600

SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing

Zhecheng Wang, Rajanie Prabha, Tianyuan Huang et al.

AAAI 2024 • paper • arXiv:2312.12856
140 citations