Most Cited 2024 "informative perturbations" Papers

12,324 papers found • Page 4 of 62

#601

SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing

Zhecheng Wang, Rajanie Prabha, Tianyuan Huang et al.

AAAI 2024paperarXiv:2312.12856
140
citations
#602

HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

Hongyu Zhou, Jiahao Shao, Lu Xu et al.

CVPR 2024arXiv:2403.12722
139
citations
#603

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

Gokul Swamy, Christoph Dann, Rahul Kidambi et al.

ICML 2024arXiv:2401.04056
139
citations
#604

The Pitfalls of Next-Token Prediction

Gregor Bachmann, Vaishnavh Nagarajan

ICML 2024arXiv:2403.06963
139
citations
#605

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

Cong Wei, Yang Chen, Haonan Chen et al.

ECCV 2024arXiv:2311.17136
139
citations
#606

DisCo: Disentangled Control for Realistic Human Dance Generation

Tan Wang, Linjie Li, Kevin Lin et al.

CVPR 2024arXiv:2307.00040
139
citations
#607

Probing the 3D Awareness of Visual Foundation Models

Mohamed El Banani, Amit Raj, Kevis-kokitsi Maninis et al.

CVPR 2024arXiv:2404.08636
138
citations
#608

Zipformer: A faster and better encoder for automatic speech recognition

Zengwei Yao, Liyong Guo, Xiaoyu Yang et al.

ICLR 2024arXiv:2310.11230
138
citations
#609

PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation

Haibo Jin, Haoxuan Che, Yi Lin et al.

AAAI 2024paperarXiv:2308.12604
137
citations
#610

Large Language Models as Analogical Reasoners

Michihiro Yasunaga, Xinyun Chen, Yujia Li et al.

ICLR 2024arXiv:2310.01714
137
citations
#611

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Juan Rocamonde, Victoriano Montesinos, Elvis Nava et al.

ICLR 2024arXiv:2310.12921
137
citations
#612

Mixture of LoRA Experts

xun wu, Shaohan Huang, Furu Wei

ICLR 2024arXiv:2404.13628
136
citations
#613

DiffusionSat: A Generative Foundation Model for Satellite Imagery

Samar Khanna, Patrick Liu, Linqi Zhou et al.

ICLR 2024oralarXiv:2312.03606
136
citations
#614

Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game

Zelai Xu, Chao Yu, Fei Fang et al.

ICML 2024arXiv:2310.18940
136
citations
#615

SparseTSF: Modeling Long-term Time Series Forecasting with *1k* Parameters

Shengsheng Lin, Weiwei Lin, Wentai Wu et al.

ICML 2024oralarXiv:2405.00946
136
citations
#616

SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM

Mingrui Li, Shuhong Liu, Heng Zhou et al.

ECCV 2024arXiv:2402.03246
136
citations
#617

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Chengshu Li, Jacky Liang, Andy Zeng et al.

ICML 2024arXiv:2312.04474
135
citations
#618

Denoising Diffusion Bridge Models

Linqi Zhou, Aaron Lou, Samar Khanna et al.

ICLR 2024arXiv:2309.16948
135
citations
#619

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Yifei Zhou, Andrea Zanette, Jiayi Pan et al.

ICML 2024oralarXiv:2402.19446
135
citations
#620

Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation

Mukul Khanna, Yongsen Mao, Hanxiao Jiang et al.

CVPR 2024arXiv:2306.11290
134
citations
#621

Rich Human Feedback for Text-to-Image Generation

Youwei Liang, Junfeng He, Gang Li et al.

CVPR 2024arXiv:2312.10240
134
citations
#622

Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs

Shi Liu, Kecheng Zheng, Wei Chen

ECCV 2024arXiv:2407.21771
134
citations
#623

LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model

Dilxat Muhtar, Zhenshi Li, Feng Gu et al.

ECCV 2024arXiv:2402.02544
133
citations
#624

Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning

Xiaoxin He, Xavier Bresson, Thomas Laurent et al.

ICLR 2024arXiv:2305.19523
133
citations
#625

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

Chunlong Xia, Xinliang Wang, Feng Lv et al.

CVPR 2024highlightarXiv:2403.07392
133
citations
#626

VLP: Vision Language Planning for Autonomous Driving

Chenbin Pan, Burhan Yaman, Tommaso Nesti et al.

CVPR 2024arXiv:2401.05577
132
citations
#627

Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Zhi-Yi Chin, Chieh Ming Jiang, Ching-Chun Huang et al.

ICML 2024arXiv:2309.06135
132
citations
#628

Task Contamination: Language Models May Not Be Few-Shot Anymore

Changmao Li, Jeffrey Flanigan

AAAI 2024paperarXiv:2312.16337
132
citations
#629

XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies

Xuanchi Ren, Jiahui Huang, Xiaohui Zeng et al.

CVPR 2024highlightarXiv:2312.03806
132
citations
#630

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

Liangtai Sun, Yang Han, Zihan Zhao et al.

AAAI 2024paperarXiv:2308.13149
132
citations
#631

In-context Autoencoder for Context Compression in a Large Language Model

Tao Ge, Hu Jing, Lei Wang et al.

ICLR 2024arXiv:2307.06945
132
citations
#632

Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

Yang Liu, Muzhi Zhu, Hengtao Li et al.

ICLR 2024arXiv:2305.13310
131
citations
#633

An LLM can Fool Itself: A Prompt-Based Adversarial Attack

Xilie Xu, Keyi Kong, Ning Liu et al.

ICLR 2024arXiv:2310.13345
131
citations
#634

LEDITS++: Limitless Image Editing using Text-to-Image Models

Manuel Brack, Felix Friedrich, Katharina Kornmeier et al.

CVPR 2024arXiv:2311.16711
131
citations
#635

MogaNet: Multi-order Gated Aggregation Network

Siyuan Li, Zedong Wang, Zicheng Liu et al.

ICLR 2024arXiv:2211.03295
131
citations
#636

GART: Gaussian Articulated Template Models

Jiahui Lei, Yufu Wang, Georgios Pavlakos et al.

CVPR 2024highlightarXiv:2311.16099
131
citations
#637

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

Baoxiong Jia, Yixin Chen, Huangyue Yu et al.

ECCV 2024arXiv:2401.09340
131
citations
#638

LongVLM: Efficient Long Video Understanding via Large Language Models

Yuetian Weng, Mingfei Han, Haoyu He et al.

ECCV 2024arXiv:2404.03384
131
citations
#639

OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

Keiran Paster, Marco Dos Santos, Zhangir Azerbayev et al.

ICLR 2024arXiv:2310.06786
130
citations
#640

SCTNet: Single Branch CNN with Transformer Semantic Information for Real-Time Segmentation

Authors: Zhengze Xu, Dongyue Wu, Changqian Yu et al.

AAAI 2024paperarXiv:2312.17071
130
citations
#641

Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips

Man Yao, Jiakui Hu, Tianxiang Hu et al.

ICLR 2024arXiv:2404.03663
130
citations
#642

Learning Performance-Improving Code Edits

Alexander Shypula, Aman Madaan, Yimeng Zeng et al.

ICLR 2024spotlightarXiv:2302.07867
130
citations
#643

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat et al.

ICLR 2024arXiv:2310.08461
130
citations
#644

Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

Xunpeng Yi, Han Xu, HAO ZHANG et al.

CVPR 2024arXiv:2403.16387
130
citations
#645

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Tai Wang, Xiaohan Mao, Chenming Zhu et al.

CVPR 2024arXiv:2312.16170
130
citations
#646

NeuRAD: Neural Rendering for Autonomous Driving

Adam Tonderski, Carl Lindström, Georg Hess et al.

CVPR 2024highlightarXiv:2311.15260
130
citations
#647

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

Jaemin Cho, Yushi Hu, Jason Baldridge et al.

ICLR 2024arXiv:2310.18235
130
citations
#648

GSVA: Generalized Segmentation via Multimodal Large Language Models

Zhuofan Xia, Dongchen Han, Yizeng Han et al.

CVPR 2024arXiv:2312.10103
130
citations
#649

Relightable Gaussian Codec Avatars

Shunsuke Saito, Gabriel Schwartz, Tomas Simon et al.

CVPR 2024arXiv:2312.03704
129
citations
#650

ST-LLM: Large Language Models Are Effective Temporal Learners

Ruyang Liu, Chen Li, Haoran Tang et al.

ECCV 2024arXiv:2404.00308
129
citations
#651

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

Qifan Yu, Juncheng Li, Longhui Wei et al.

CVPR 2024arXiv:2311.13614
129
citations
#652

Manifold Preserving Guided Diffusion

Yutong He, Naoki Murata, Chieh-Hsin Lai et al.

ICLR 2024arXiv:2311.16424
129
citations
#653

SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation

Jiehong Lin, lihua liu, Dekun Lu et al.

CVPR 2024arXiv:2311.15707
129
citations
#654

Behavior Generation with Latent Actions

Seungjae Lee, Yibin Wang, Haritheja Etukuru et al.

ICML 2024spotlightarXiv:2403.03181
129
citations
#655

Large Language Model Cascades with Mixture of Thought Representations for Cost-Efficient Reasoning

Murong Yue, Jie Zhao, Min Zhang et al.

ICLR 2024arXiv:2310.03094
129
citations
#656

Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing

Bingyan Liu, Chengyu Wang, Tingfeng Cao et al.

CVPR 2024arXiv:2403.03431
128
citations
#657

Generalized Predictive Model for Autonomous Driving

Jiazhi Yang, Shenyuan Gao, Yihang Qiu et al.

CVPR 2024highlightarXiv:2403.09630
128
citations
#658

QuRating: Selecting High-Quality Data for Training Language Models

Alexander Wettig, Aatmik Gupta, Saumya Malik et al.

ICML 2024spotlightarXiv:2402.09739
128
citations
#659

The Illusion of State in State-Space Models

William Merrill, Jackson Petty, Ashish Sabharwal

ICML 2024arXiv:2404.08819
128
citations
#660

AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One

Mike Ranzinger, Greg Heinrich, Jan Kautz et al.

CVPR 2024arXiv:2312.06709
128
citations
#661

Dolphins: Multimodal Language Model for Driving

Yingzi Ma, Yulong Cao, Jiachen Sun et al.

ECCV 2024arXiv:2312.00438
128
citations
#662

Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection

Chengjie Wang, wenbing zhu, Bin-Bin Gao et al.

CVPR 2024arXiv:2403.12580
127
citations
#663

TSLANet: Rethinking Transformers for Time Series Representation Learning

Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen et al.

ICML 2024oralarXiv:2404.08472
127
citations
#664

SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference

Feng Wang, Jieru Mei, Alan Yuille

ECCV 2024arXiv:2312.01597
127
citations
#665

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Felix Wimbauer, Bichen Wu, Edgar Schoenfeld et al.

CVPR 2024arXiv:2312.03209
126
citations
#666

What's In My Big Data?

Yanai Elazar, Akshita Bhagia, Ian Magnusson et al.

ICLR 2024spotlightarXiv:2310.20707
126
citations
#667

Adapting Large Language Models via Reading Comprehension

Daixuan Cheng, Shaohan Huang, Furu Wei

ICLR 2024
126
citations
#668

InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning

Yan-Shuo Liang, Wu-Jun Li

CVPR 2024arXiv:2404.00228
126
citations
#669

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Rui Yang, Xiaoman Pan, Feng Luo et al.

ICML 2024arXiv:2402.10207
125
citations
#670

FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

Zhenyu Li, Sunqi Fan, Yu Gu et al.

AAAI 2024paperarXiv:2308.12060
125
citations
#671

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

Yuxuan Zhang, Yiren Song, Jiaming Liu et al.

CVPR 2024arXiv:2312.16272
125
citations
#672

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Rohit Gandikota, Joanna Materzynska, Tingrui Zhou et al.

ECCV 2024arXiv:2311.12092
125
citations
#673

AnyText: Multilingual Visual Text Generation and Editing

Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He et al.

ICLR 2024spotlightarXiv:2311.03054
125
citations
#674

Large Language Models to Enhance Bayesian Optimization

Tennison Liu, Nicolás Astorga, Nabeel Seedat et al.

ICLR 2024arXiv:2402.03921
125
citations
#675

WonderJourney: Going from Anywhere to Everywhere

Hong-Xing Yu, Haoyi Duan, Junhwa Hur et al.

CVPR 2024arXiv:2312.03884
124
citations
#676

An LLM Compiler for Parallel Function Calling

Sehoon Kim, Suhong Moon, Ryan Tabrizi et al.

ICML 2024arXiv:2312.04511
124
citations
#677

VideoBooth: Diffusion-based Video Generation with Image Prompts

Yuming Jiang, Tianxing Wu, Shuai Yang et al.

CVPR 2024arXiv:2312.00777
123
citations
#678

Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models

Yongshuo Zong, Ondrej Bohdal, Tingyang Yu et al.

ICML 2024arXiv:2402.02207
123
citations
#679

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

Yuanhui Huang, Wenzhao Zheng, Borui Zhang et al.

CVPR 2024arXiv:2311.12754
123
citations
#680

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

Nikhil Sardana, Jacob Portes, Alexandre (Sasha) Doubov et al.

ICML 2024arXiv:2401.00448
123
citations
#681

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

Lingyi Hong, Shilin Yan, Renrui Zhang et al.

CVPR 2024highlightarXiv:2403.09634
123
citations
#682

Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

Yin Fang, Xiaozhuan Liang, Ningyu Zhang et al.

ICLR 2024arXiv:2306.08018
123
citations
#683

Zoology: Measuring and Improving Recall in Efficient Language Models

Simran Arora, Sabri Eyuboglu, Aman Timalsina et al.

ICLR 2024arXiv:2312.04927
123
citations
#684

DiffiT: Diffusion Vision Transformers for Image Generation

Ali Hatamizadeh, Jiaming Song, Guilin Liu et al.

ECCV 2024arXiv:2312.02139
122
citations
#685

Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems

Hyungjin Chung, Suhyeon Lee, Jong Chul YE

ICLR 2024arXiv:2303.05754
121
citations
#686

Scaling Laws of RoPE-based Extrapolation

Xiaoran Liu, Hang Yan, Chenxin An et al.

ICLR 2024arXiv:2310.05209
121
citations
#687

GenSim: Generating Robotic Simulation Tasks via Large Language Models

Lirui Wang, Yiyang Ling, Zhecheng Yuan et al.

ICLR 2024spotlightarXiv:2310.01361
121
citations
#688

MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

Wenxun Dai, Ling-Hao Chen, Jingbo Wang et al.

ECCV 2024arXiv:2404.19759
121
citations
#689

Drag Anything: Motion Control for Anything using Entity Representation

Weijia Wu, Zhuang Li, Yuchao Gu et al.

ECCV 2024
121
citations
#690

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Xing Han Lù, Zdeněk Kasner, Siva Reddy

ICML 2024spotlightarXiv:2402.05930
121
citations
#691

Text-to-3D with Classifier Score Distillation

Xin Yu, Yuan-Chen Guo, Yangguang Li et al.

ICLR 2024arXiv:2310.19415
121
citations
#692

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Chaoya Jiang, Haiyang Xu, Mengfan Dong et al.

CVPR 2024arXiv:2312.06968
121
citations
#693

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

Yuxi Wei, Zi Wang, Yifan Lu et al.

CVPR 2024highlightarXiv:2402.05746
121
citations
#694

Teaching Arithmetic to Small Transformers

Nayoung Lee, Kartik Sreenivasan, Jason Lee et al.

ICLR 2024arXiv:2307.03381
121
citations
#695

Token-level Direct Preference Optimization

Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.

ICML 2024arXiv:2404.11999
120
citations
#696

Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation

Shih-Ying Yeh, Yu-Guan Hsieh, Zhidong Gao et al.

ICLR 2024arXiv:2309.14859
120
citations
#697

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Zekun Qi, Runpei Dong, Shaochen Zhang et al.

ECCV 2024arXiv:2402.17766
120
citations
#698

Scaling Laws for Fine-Grained Mixture of Experts

Jan Ludziejewski, Jakub Krajewski, Kamil Adamczewski et al.

ICML 2024arXiv:2402.07871
120
citations
#699

LGMRec: Local and Global Graph Learning for Multimodal Recommendation

Zhiqiang Guo, Jianjun Li, Guohui Li et al.

AAAI 2024paperarXiv:2312.16400
120
citations
#700

ProAgent: Building Proactive Cooperative Agents with Large Language Models

Ceyao Zhang, Kaijie Yang, Siyi Hu et al.

AAAI 2024paperarXiv:2308.11339
120
citations
#701

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Hao Ouyang, Qiuyu Wang, Yuxi Xiao et al.

CVPR 2024highlightarXiv:2308.07926
120
citations
#702

LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models

Gunho Park, baeseong park, Minsub Kim et al.

ICLR 2024arXiv:2206.09557
119
citations
#703

RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

Yufei Wang, Zhanyi Sun, Jesse Zhang et al.

ICML 2024arXiv:2402.03681
119
citations
#704

EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection

Xuanyu Zhang, Runyi Li, Jiwen Yu et al.

CVPR 2024arXiv:2312.08883
119
citations
#705

One-dimensional Adapter to Rule Them All: Concepts Diffusion Models and Erasing Applications

Mengyao Lyu, Yuhong Yang, Haiwen Hong et al.

CVPR 2024highlightarXiv:2312.16145
119
citations
#706

RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches

Jiayuan Gu, Sean Kirmani, Paul Wohlhart et al.

ICLR 2024spotlightarXiv:2311.01977
119
citations
#707

Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction

Guillaume Jaume, Anurag Vaidya, Richard J. Chen et al.

CVPR 2024arXiv:2304.06819
118
citations
#708

Controlled Decoding from Language Models

Sidharth Mudgal, Jong Lee, Harish Ganapathy et al.

ICML 2024arXiv:2310.17022
118
citations
#709

Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models

Fei Shen, Hu Ye, Jun Zhang et al.

ICLR 2024arXiv:2310.06313
118
citations
#710

Towards Learning a Generalist Model for Embodied Navigation

Duo Zheng, Shijia Huang, Lin Zhao et al.

CVPR 2024highlightarXiv:2312.02010
118
citations
#711

Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers

Jinxia Xie, Bineng Zhong, Zhiyi Mo et al.

CVPR 2024
118
citations
#712

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Weiyun Wang, Min Shi, Qingyun Li et al.

ICLR 2024arXiv:2308.01907
118
citations
#713

The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

Pratyusha Sharma, Jordan Ash, Dipendra Kumar Misra

ICLR 2024arXiv:2312.13558
118
citations
#714

InstructIR: High-Quality Image Restoration Following Human Instructions

Marcos Conde, Gregor Geigle, Radu Timofte

ECCV 2024arXiv:2401.16468
118
citations
#715

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

Yuqi Wang, Yuntao Chen, Xingyu Liao et al.

CVPR 2024arXiv:2306.10013
118
citations
#716

SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow

Yihan Wang, Lahav Lipson, Jia Deng

ECCV 2024arXiv:2405.14793
117
citations
#717

IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

Mingjin Zhang, Yuchun Wang, Jie Guo et al.

ECCV 2024arXiv:2407.07520
117
citations
#718

Implicit Style-Content Separation using B-LoRA

Yarden Frenkel, Yael Vinker, Ariel Shamir et al.

ECCV 2024arXiv:2403.14572
117
citations
#719

Cameras as Rays: Pose Estimation via Ray Diffusion

Jason Zhang, Amy Lin, Moneish Kumar et al.

ICLR 2024arXiv:2402.14817
117
citations
#720

Efficient Test-Time Adaptation of Vision-Language Models

Adilbek Karmanov, Dayan Guan, Shijian Lu et al.

CVPR 2024arXiv:2403.18293
116
citations
#721

VideoLLM-online: Online Video Large Language Model for Streaming Video

Joya Chen, Zhaoyang Lv, Shiwei Wu et al.

CVPR 2024arXiv:2406.11816
116
citations
#722

Human Gaussian Splatting: Real-time Rendering of Animatable Avatars

Arthur Moreau, Jifei Song, Helisa Dhamo et al.

CVPR 2024arXiv:2311.17113
116
citations
#723

Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning

Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye et al.

CVPR 2024arXiv:2403.12030
116
citations
#724

Challenges in Training PINNs: A Loss Landscape Perspective

Pratik Rathore, Weimu Lei, Zachary Frangella et al.

ICML 2024arXiv:2402.01868
116
citations
#725

Retrieval meets Long Context Large Language Models

Peng Xu, Wei Ping, Xianchao Wu et al.

ICLR 2024arXiv:2310.03025
116
citations
#726

CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization

K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.

ECCV 2024arXiv:2311.18159
115
citations
#727

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie, Renyuan Peng, Chunwei Wang et al.

ECCV 2024arXiv:2312.03661
115
citations
#728

Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models

Zijin Yang, Kai Zeng, Kejiang Chen et al.

CVPR 2024arXiv:2404.04956
115
citations
#729

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

Dewei Zhou, You Li, Fan Ma et al.

CVPR 2024highlightarXiv:2402.05408
115
citations
#730

LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models

Ahmad Faiz, Sotaro Kaneda, Ruhan Wang et al.

ICLR 2024arXiv:2309.14393
115
citations
#731

Universal Jailbreak Backdoors from Poisoned Human Feedback

Javier Rando, Florian Tramer

ICLR 2024arXiv:2311.14455
114
citations
#732

DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting

Angelos Kratimenos, Jiahui Lei, Kostas Daniilidis

ECCV 2024arXiv:2312.00112
114
citations
#733

Understanding Catastrophic Forgetting in Language Models via Implicit Inference

Suhas Kotha, Jacob Springer, Aditi Raghunathan

ICLR 2024arXiv:2309.10105
114
citations
#734

Incomplete Contrastive Multi-View Clustering with High-Confidence Guiding

AAAI 2024paperarXiv:2312.08697
114
citations
#735

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Hao Zhang, Hongyang Li, Feng Li et al.

ECCV 2024arXiv:2312.02949
114
citations
#736

Motion Mamba: Efficient and Long Sequence Motion Generation

Zeyu Zhang, Akide Liu, Ian Reid et al.

ECCV 2024arXiv:2403.07487
114
citations
#737

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

Wenxi Yue, Jing Zhang, Kun Hu et al.

AAAI 2024paperarXiv:2308.08746
114
citations
#738

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Xianqi Wang, Gangwei Xu, Hao Jia et al.

CVPR 2024highlightarXiv:2403.00486
114
citations
#739

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Xiaofan Li, Zhizhong Zhang, Xin Tan et al.

CVPR 2024arXiv:2404.05231
114
citations
#740

CG-HOI: Contact-Guided 3D Human-Object Interaction Generation

Christian Diller, Angela Dai

CVPR 2024arXiv:2311.16097
114
citations
#741

Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching

Ziyao Guo, Kai Wang, George Cazenavette et al.

ICLR 2024arXiv:2310.05773
114
citations
#742

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Xianfang Zeng, Xin Chen, Zhongqi Qi et al.

CVPR 2024arXiv:2312.13913
113
citations
#743

FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization

Jiahui Zhang, Fangneng Zhan, MUYU XU et al.

CVPR 2024arXiv:2403.06908
113
citations
#744

OneFormer3D: One Transformer for Unified Point Cloud Segmentation

Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin et al.

CVPR 2024arXiv:2311.14405
113
citations
#745

Can I Trust Your Answer? Visually Grounded Video Question Answering

Junbin Xiao, Angela Yao, Yicong Li et al.

CVPR 2024highlightarXiv:2309.01327
113
citations
#746

NEFTune: Noisy Embeddings Improve Instruction Finetuning

Neel Jain, Ping-yeh Chiang, Yuxin Wen et al.

ICLR 2024arXiv:2310.05914
113
citations
#747

Prodigy: An Expeditiously Adaptive Parameter-Free Learner

Konstantin Mishchenko, Aaron Defazio

ICML 2024arXiv:2306.06101
113
citations
#748

RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection

Zhiwei Lin, Zhe Liu, Zhongyu Xia et al.

CVPR 2024arXiv:2403.16440
113
citations
#749

MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

Di Chang, Yichun Shi, Quankai Gao et al.

ICML 2024arXiv:2311.12052
113
citations
#750

Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing

Jian Gao, chun gu, Youtian Lin et al.

ECCV 2024arXiv:2311.16043
112
citations
#751

FasterViT: Fast Vision Transformers with Hierarchical Attention

Ali Hatamizadeh, Greg Heinrich, Hongxu Yin et al.

ICLR 2024arXiv:2306.06189
112
citations
#752

Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

Fengyu Yang, Chao Feng, Ziyang Chen et al.

CVPR 2024arXiv:2401.18084
112
citations
#753

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

Raghav Kapoor, Yash Parag Butala, Melisa A Russak et al.

ECCV 2024arXiv:2402.17553
112
citations
#754

Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting

Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.

AAAI 2024paperarXiv:2311.13314
112
citations
#755

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation

Guanxing Lu, Shiyi Zhang, Ziwei Wang et al.

ECCV 2024arXiv:2403.08321
112
citations
#756

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Yazhou Xing, Yingqing He, Zeyue Tian et al.

CVPR 2024arXiv:2402.17723
112
citations
#757

LEGO-Prover: Neural Theorem Proving with Growing Libraries

Haiming Wang, Huajian Xin, Chuanyang Zheng et al.

ICLR 2024arXiv:2310.00656
112
citations
#758

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

Yingqing He, Shaoshu Yang, Haoxin Chen et al.

ICLR 2024spotlightarXiv:2310.07702
111
citations
#759

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

Shangchen Zhou, Peiqing Yang, Jianyi Wang et al.

CVPR 2024highlightarXiv:2312.06640
111
citations
#760

HumanTOMATO: Text-aligned Whole-body Motion Generation

Shunlin Lu, Ling-Hao Chen, Ailing Zeng et al.

ICML 2024arXiv:2310.12978
111
citations
#761

SmartPlay : A Benchmark for LLMs as Intelligent Agents

Yue Wu, Xuan Tang, Tom Mitchell et al.

ICLR 2024arXiv:2310.01557
111
citations
#762

Text2Reward: Reward Shaping with Language Models for Reinforcement Learning

Tianbao Xie, Siheng Zhao, Chen Henry Wu et al.

ICLR 2024spotlightarXiv:2309.11489
111
citations
#763

Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections

Dongbin Zhang, Chuming Wang, Weitao Wang et al.

ECCV 2024arXiv:2403.15704
111
citations
#764

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

Linlu Qiu, Liwei Jiang, Ximing Lu et al.

ICLR 2024arXiv:2310.08559
110
citations
#765

Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control

Longtao Zheng, Rundong Wang, Xinrun Wang et al.

ICLR 2024arXiv:2306.07863
110
citations
#766

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Size Wu, Wenwei Zhang, Lumin Xu et al.

ICLR 2024spotlightarXiv:2310.01403
110
citations
#767

Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects

Chunming He, Kai Li, Yachao Zhang et al.

ICLR 2024arXiv:2308.03166
110
citations
#768

ODIN: Disentangled Reward Mitigates Hacking in RLHF

Lichang Chen, Chen Zhu, Jiuhai Chen et al.

ICML 2024arXiv:2402.07319
110
citations
#769

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.

ICML 2024arXiv:2402.07043
110
citations
#770

Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation

Xiao Ma, Sumit Patidar, Iain Haughton et al.

CVPR 2024arXiv:2403.03890
110
citations
#771

AI Control: Improving Safety Despite Intentional Subversion

Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan et al.

ICML 2024arXiv:2312.06942
110
citations
#772

Unpaired Image-to-Image Translation via Neural Schrödinger Bridge

Beomsu Kim, Gihyun Kwon, Kwanyoung Kim et al.

ICLR 2024arXiv:2305.15086
109
citations
#773

Zero-Reference Low-Light Enhancement via Physical Quadruple Priors

Wenjing Wang, Huan Yang, Jianlong Fu et al.

CVPR 2024arXiv:2403.12933
109
citations
#774

#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models

Keming Lu, Hongyi Yuan, Zheng Yuan et al.

ICLR 2024arXiv:2308.07074
109
citations
#775

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Chuwei Luo, Yufan Shen, Zhaoqing Zhu et al.

CVPR 2024arXiv:2404.05225
109
citations
#776

LITA: Language Instructed Temporal-Localization Assistant

De-An Huang, Shijia Liao, Subhashree Radhakrishnan et al.

ECCV 2024arXiv:2403.19046
108
citations
#777

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Haoran Xu, Young Jin Kim, Amr Mohamed Nabil Aly Aly Sharaf et al.

ICLR 2024arXiv:2309.11674
108
citations
#778

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Yunhao Gou, Kai Chen, Zhili LIU et al.

ECCV 2024arXiv:2403.09572
108
citations
#779

OMG-Seg: Is One Model Good Enough For All Segmentation?

Xiangtai Li, Haobo Yuan, Wei Li et al.

CVPR 2024arXiv:2401.10229
108
citations
#780

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Weiran Yao, Shelby Heinecke, Juan Carlos Niebles et al.

ICLR 2024spotlightarXiv:2308.02151
108
citations
#781

ReNoise: Real Image Inversion Through Iterative Noising

Daniel Garibi, Or Patashnik, Andrey Voynov et al.

ECCV 2024arXiv:2403.14602
108
citations
#782

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

Raphael Schumann, Wanrong Zhu, Weixi Feng et al.

AAAI 2024paperarXiv:2307.06082
108
citations
#783

Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations

Likang Wu, Zhaopeng Qiu, Zhi Zheng et al.

AAAI 2024paperarXiv:2307.05722
108
citations
#784

Scaling Laws of Synthetic Images for Model Training ... for Now

Lijie Fan, Kaifeng Chen, Dilip Krishnan et al.

CVPR 2024arXiv:2312.04567
108
citations
#785

GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann et al.

CVPR 2024arXiv:2311.14155
108
citations
#786

PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Shaowei Liu, Zhongzheng Ren, Saurabh Gupta et al.

ECCV 2024arXiv:2409.18964
107
citations
#787

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

Yihua Zhang, Pingzhi Li, Junyuan Hong et al.

ICML 2024arXiv:2402.11592
107
citations
#788

Scaling Up Dynamic Human-Scene Interaction Modeling

Nan Jiang, Zhiyuan Zhang, Hongjie Li et al.

CVPR 2024highlightarXiv:2403.08629
107
citations
#789

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Chuofan Ma, Yi Jiang, Jiannan Wu et al.

ECCV 2024arXiv:2404.13013
107
citations
#790

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

Nate Gruver, Anuroop Sriram, Andrea Madotto et al.

ICLR 2024arXiv:2402.04379
107
citations
#791

VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning

Ziyang Luo, Nian Liu, Wangbo Zhao et al.

CVPR 2024arXiv:2311.15011
107
citations
#792

Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks

Jong Ho Park, Jaden Park, Zheyang Xiong et al.

ICML 2024arXiv:2402.04248
107
citations
#793

Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho et al.

ICLR 2024spotlightarXiv:2309.07311
107
citations
#794

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother, Jordi Orbay, Quan Vuong et al.

ICML 2024arXiv:2403.03950
107
citations
#795

RoboDreamer: Learning Compositional World Models for Robot Imagination

Siyuan Zhou, Yilun Du, Jiaben Chen et al.

ICML 2024arXiv:2404.12377
107
citations
#796

SimDA: Simple Diffusion Adapter for Efficient Video Generation

Zhen Xing, Qi Dai, Han Hu et al.

CVPR 2024arXiv:2308.09710
107
citations
#797

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Sam Toyer, Olivia Watkins, Ethan Mendes et al.

ICLR 2024spotlightarXiv:2311.01011
106
citations
#798

Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video

Yanqin Jiang, Li Zhang, Jin Gao et al.

ICLR 2024oralarXiv:2311.02848
106
citations
#799

latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

Christopher Wewer, Kevin Raj, Eddy Ilg et al.

ECCV 2024arXiv:2403.16292
106
citations
#800

On Prompt-Driven Safeguarding for Large Language Models

Chujie Zheng, Fan Yin, Hao Zhou et al.

ICML 2024arXiv:2401.18018
106
citations