Most Cited 2024 "large-scale graph dataset" Papers

12,324 papers found • Page 2 of 62

#201

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Keen You, Haotian Zhang, Eldon Schoop et al.

ECCV 2024 | poster | arXiv:2404.05719
154
citations
#202

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Mu Cai, Haotian Liu, Siva Mustikovela et al.

CVPR 2024 | poster | arXiv:2312.00784
153
citations
#203

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

Junhao Zhuang, Yanhong Zeng, Wenran Liu et al.

ECCV 2024 | poster | arXiv:2312.03594
152
citations
#204

SweetDreamer: Aligning Geometric Priors in 2D diffusion for Consistent Text-to-3D

Weiyu Li, Rui Chen, Xuelin Chen et al.

ICLR 2024 | poster | arXiv:2310.02596
151
citations
#205

Hypothesis Search: Inductive Reasoning with Language Models

Ruocheng Wang, Eric Zelikman, Gabriel Poesia et al.

ICLR 2024 | poster | arXiv:2309.05660
151
citations
#206

Generative End-to-End Autonomous Driving

Wenzhao Zheng, Ruiqi Song, Xianda Guo et al.

ECCV 2024 | poster | arXiv:2402.11502
150
citations
#207

Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting

Yunzhi Yan, Haotong Lin, Chenxu Zhou et al.

ECCV 2024 | poster | arXiv:2401.01339
149
citations
#208

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Ming Li, Taojiannan Yang, Huafeng Kuang et al.

ECCV 2024 | poster | arXiv:2404.07987
148
citations
#209

Osprey: Pixel Understanding with Visual Instruction Tuning

Yuqian Yuan, Wentong Li, Jian Liu et al.

CVPR 2024 | poster | arXiv:2312.10032
147
citations
#210

Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians

Yuelang Xu, Benwang Chen, Zhe Li et al.

CVPR 2024 | poster
147
citations
#211

Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

Xingxuan Li, Ruochen Zhao, Yew Ken Chia et al.

ICLR 2024 | poster | arXiv:2305.13269
145
citations
#212

Video Language Planning

Yilun Du, Sherry Yang, Pete Florence et al.

ICLR 2024 | poster | arXiv:2310.10625
144
citations
#213

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

Yifan Wang, Xingyi He, Sida Peng et al.

CVPR 2024 | highlight | arXiv:2403.04765
142
citations
#214

MMA-Diffusion: MultiModal Attack on Diffusion Models

Yijun Yang, Ruiyuan Gao, Xiaosen Wang et al.

CVPR 2024 | poster | arXiv:2311.17516
141
citations
#215

Multimodal Web Navigation with Instruction-Finetuned Foundation Models

Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum et al.

ICLR 2024 | oral | arXiv:2305.11854
141
citations
#216

Linearity of Relation Decoding in Transformer Language Models

Evan Hernandez, Arnab Sen Sharma, Tal Haklay et al.

ICLR 2024 | spotlight | arXiv:2308.09124
140
citations
#217

ResDiff: Combining CNN and Diffusion Model for Image Super-resolution

Shuyao Shang, Zhengyang Shan, Guangxing Liu et al.

AAAI 2024 | paper | arXiv:2303.08714
139
citations
#218

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

Yuzhou Huang, Liangbin Xie, Xintao Wang et al.

CVPR 2024 | highlight | arXiv:2312.06739
139
citations
#219

Optimal Transport Aggregation for Visual Place Recognition

Sergio Izquierdo, Javier Civera

CVPR 2024 | poster | arXiv:2311.15937
138
citations
#220

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

Ted Zadouri, Ahmet Üstün, Arash Ahmadian et al.

ICLR 2024 | poster | arXiv:2309.05444
138
citations
#221

Physics-Based Interaction with 3D Objects via Video Generation

Tianyuan Zhang, Hong-Xing Yu, Rundi Wu et al.

ECCV 2024 | poster | arXiv:2404.13026
137
citations
#222

Rotary Position Embedding for Vision Transformer

Byeongho Heo, Song Park, Dongyoon Han et al.

ECCV 2024 | poster | arXiv:2403.13298
135
citations
#223

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Juan Rocamonde, Victoriano Montesinos, Elvis Nava et al.

ICLR 2024 | poster | arXiv:2310.12921
133
citations
#224

Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

Zhiyuan Yan, Yuhao Luo, Siwei Lyu et al.

CVPR 2024 | poster | arXiv:2311.11278
133
citations
#225

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

Chunlong Xia, Xinliang Wang, Feng Lv et al.

CVPR 2024 | highlight | arXiv:2403.07392
131
citations
#226

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

Zayne Sprague, Xi Ye, Kaj Bostrom et al.

ICLR 2024 | spotlight | arXiv:2310.16049
131
citations
#227

Large Language Models as Analogical Reasoners

Michihiro Yasunaga, Xinyun Chen, Yujia Li et al.

ICLR 2024 | poster | arXiv:2310.01714
131
citations
#228

SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM

Mingrui Li, Shuhong Liu, Heng Zhou et al.

ECCV 2024 | poster | arXiv:2402.03246
131
citations
#229

Task Contamination: Language Models May Not Be Few-Shot Anymore

Changmao Li, Jeffrey Flanigan

AAAI 2024 | paper | arXiv:2312.16337
130
citations
#230

Probing the 3D Awareness of Visual Foundation Models

Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis et al.

CVPR 2024 | poster | arXiv:2404.08636
130
citations
#231

GART: Gaussian Articulated Template Models

Jiahui Lei, Yufu Wang, Georgios Pavlakos et al.

CVPR 2024 | highlight | arXiv:2311.16099
129
citations
#232

LongVLM: Efficient Long Video Understanding via Large Language Models

Yuetian Weng, Mingfei Han, Haoyu He et al.

ECCV 2024 | poster | arXiv:2404.03384
128
citations
#233

LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model

Dilxat Muhtar, Zhenshi Li, Feng Gu et al.

ECCV 2024 | poster | arXiv:2402.02544
127
citations
#234

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

Cong Wei, Yang Chen, Haonan Chen et al.

ECCV 2024 | poster | arXiv:2311.17136
127
citations
#235

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

Liangtai Sun, Yang Han, Zihan Zhao et al.

AAAI 2024 | paper | arXiv:2308.13149
127
citations
#236

XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies

Xuanchi Ren, Jiahui Huang, Xiaohui Zeng et al.

CVPR 2024 | highlight | arXiv:2312.03806
127
citations
#237

VLP: Vision Language Planning for Autonomous Driving

Chenbin Pan, Burhan Yaman, Tommaso Nesti et al.

CVPR 2024 | poster | arXiv:2401.05577
127
citations
#238

Relightable Gaussian Codec Avatars

Shunsuke Saito, Gabriel Schwartz, Tomas Simon et al.

CVPR 2024 | poster | arXiv:2312.03704
127
citations
#239

GSVA: Generalized Segmentation via Multimodal Large Language Models

Zhuofan Xia, Dongchen Han, Yizeng Han et al.

CVPR 2024 | poster | arXiv:2312.10103
127
citations
#240

SCTNet: Single Branch CNN with Transformer Semantic Information for Real-Time Segmentation

Zhengze Xu, Dongyue Wu, Changqian Yu et al.

AAAI 2024 | paper | arXiv:2312.17071
126
citations
#241

NeuRAD: Neural Rendering for Autonomous Driving

Adam Tonderski, Carl Lindström, Georg Hess et al.

CVPR 2024 | highlight | arXiv:2311.15260
126
citations
#242

Adapting Large Language Models via Reading Comprehension

Daixuan Cheng, Shaohan Huang, Furu Wei

ICLR 2024 | poster
126
citations
#243

Dolphins: Multimodal Language Model for Driving

Yingzi Ma, Yulong Cao, Jiachen Sun et al.

ECCV 2024 | poster | arXiv:2312.00438
126
citations
#244

MogaNet: Multi-order Gated Aggregation Network

Siyuan Li, Zedong Wang, Zicheng Liu et al.

ICLR 2024 | poster | arXiv:2211.03295
125
citations
#245

ST-LLM: Large Language Models Are Effective Temporal Learners

Ruyang Liu, Chen Li, Haoran Tang et al.

ECCV 2024 | poster | arXiv:2404.00308
125
citations
#246

Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

Xunpeng Yi, Han Xu, Hao Zhang et al.

CVPR 2024 | poster | arXiv:2403.16387
123
citations
#247

FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

Zhenyu Li, Sunqi Fan, Yu Gu et al.

AAAI 2024 | paper | arXiv:2308.12060
122
citations
#248

Generalized Predictive Model for Autonomous Driving

Jiazhi Yang, Shenyuan Gao, Yihang Qiu et al.

CVPR 2024 | highlight | arXiv:2403.09630
122
citations
#249

Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs

Shi Liu, Kecheng Zheng, Wei Chen

ECCV 2024 | poster | arXiv:2407.21771
121
citations
#250

AnyText: Multilingual Visual Text Generation and Editing

Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He et al.

ICLR 2024 | spotlight | arXiv:2311.03054
121
citations
#251

SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference

Feng Wang, Jieru Mei, Alan Yuille

ECCV 2024 | poster | arXiv:2312.01597
120
citations
#252

Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection

Chengjie Wang, Wenbing Zhu, Bin-Bin Gao et al.

CVPR 2024 | poster | arXiv:2403.12580
120
citations
#253

GenSim: Generating Robotic Simulation Tasks via Large Language Models

Lirui Wang, Yiyang Ling, Zhecheng Yuan et al.

ICLR 2024 | spotlight | arXiv:2310.01361
120
citations
#254

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Rohit Gandikota, Joanna Materzynska, Tingrui Zhou et al.

ECCV 2024 | poster | arXiv:2311.12092
120
citations
#255

Drag Anything: Motion Control for Anything using Entity Representation

Weijia Wu, Zhuang Li, Yuchao Gu et al.

ECCV 2024 | poster
120
citations
#256

DiffiT: Diffusion Vision Transformers for Image Generation

Ali Hatamizadeh, Jiaming Song, Guilin Liu et al.

ECCV 2024 | poster | arXiv:2312.02139
119
citations
#257

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

Lingyi Hong, Shilin Yan, Renrui Zhang et al.

CVPR 2024 | highlight | arXiv:2403.09634
118
citations
#258

EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection

Xuanyu Zhang, Runyi Li, Jiwen Yu et al.

CVPR 2024 | poster | arXiv:2312.08883
118
citations
#259

Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers

Jinxia Xie, Bineng Zhong, Zhiyi Mo et al.

CVPR 2024 | poster
118
citations
#260

VideoBooth: Diffusion-based Video Generation with Image Prompts

Yuming Jiang, Tianxing Wu, Shuai Yang et al.

CVPR 2024 | poster | arXiv:2312.00777
118
citations
#261

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Weiyun Wang, Min Shi, Qingyun Li et al.

ICLR 2024 | poster | arXiv:2308.01907
118
citations
#262

MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

Wenxun Dai, Ling-Hao Chen, Jingbo Wang et al.

ECCV 2024 | poster | arXiv:2404.19759
117
citations
#263

Towards Learning a Generalist Model for Embodied Navigation

Duo Zheng, Shijia Huang, Lin Zhao et al.

CVPR 2024 | highlight | arXiv:2312.02010
117
citations
#264

InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning

Yan-Shuo Liang, Wu-Jun Li

CVPR 2024 | poster | arXiv:2404.00228
117
citations
#265

Teaching Arithmetic to Small Transformers

Nayoung Lee, Kartik Sreenivasan, Jason Lee et al.

ICLR 2024 | poster | arXiv:2307.03381
117
citations
#266

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Chaoya Jiang, Haiyang Xu, Mengfan Dong et al.

CVPR 2024 | poster | arXiv:2312.06968
116
citations
#267

Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems

Hyungjin Chung, Suhyeon Lee, Jong Chul Ye

ICLR 2024 | poster | arXiv:2303.05754
116
citations
#268

InstructIR: High-Quality Image Restoration Following Human Instructions

Marcos Conde, Gregor Geigle, Radu Timofte

ECCV 2024 | poster | arXiv:2401.16468
114
citations
#269

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Hao Zhang, Hongyang Li, Feng Li et al.

ECCV 2024 | poster | arXiv:2312.02949
114
citations
#270

SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow

Yihan Wang, Lahav Lipson, Jia Deng

ECCV 2024 | poster | arXiv:2405.14793
113
citations
#271

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Zekun Qi, Runpei Dong, Shaochen Zhang et al.

ECCV 2024 | poster | arXiv:2402.17766
113
citations
#272

Human Gaussian Splatting: Real-time Rendering of Animatable Avatars

Arthur Moreau, Jifei Song, Helisa Dhamo et al.

CVPR 2024 | poster | arXiv:2311.17113
113
citations
#273

DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting

Angelos Kratimenos, Jiahui Lei, Kostas Daniilidis

ECCV 2024 | poster | arXiv:2312.00112
113
citations
#274

Implicit Style-Content Separation using B-LoRA

Yarden Frenkel, Yael Vinker, Ariel Shamir et al.

ECCV 2024 | poster | arXiv:2403.14572
113
citations
#275

One-dimensional Adapter to Rule Them All: Concepts Diffusion Models and Erasing Applications

Mengyao Lyu, Yuhong Yang, Haiwen Hong et al.

CVPR 2024 | highlight | arXiv:2312.16145
112
citations
#276

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie, Renyuan Peng, Chunwei Wang et al.

ECCV 2024 | poster | arXiv:2312.03661
112
citations
#277

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

Wenxi Yue, Jing Zhang, Kun Hu et al.

AAAI 2024 | paper | arXiv:2308.08746
110
citations
#278

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

Yingqing He, Shaoshu Yang, Haoxin Chen et al.

ICLR 2024 | spotlight | arXiv:2310.07702
110
citations
#279

LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models

Ahmad Faiz, Sotaro Kaneda, Ruhan Wang et al.

ICLR 2024 | poster | arXiv:2309.14393
110
citations
#280

IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

Mingjin Zhang, Yuchun Wang, Jie Guo et al.

ECCV 2024 | poster | arXiv:2407.07520
110
citations
#281

Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

Fengyu Yang, Chao Feng, Ziyang Chen et al.

CVPR 2024 | poster | arXiv:2401.18084
109
citations
#282

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Yazhou Xing, Yingqing He, Zeyue Tian et al.

CVPR 2024 | poster | arXiv:2402.17723
109
citations
#283

VideoLLM-online: Online Video Large Language Model for Streaming Video

Joya Chen, Zhaoyang Lv, Shiwei Wu et al.

CVPR 2024 | poster | arXiv:2406.11816
109
citations
#284

Can I Trust Your Answer? Visually Grounded Video Question Answering

Junbin Xiao, Angela Yao, Yicong Li et al.

CVPR 2024 | highlight | arXiv:2309.01327
109
citations
#285

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

Dewei Zhou, You Li, Fan Ma et al.

CVPR 2024 | highlight | arXiv:2402.05408
109
citations
#286

Efficient Test-Time Adaptation of Vision-Language Models

Adilbek Karmanov, Dayan Guan, Shijian Lu et al.

CVPR 2024 | poster | arXiv:2403.18293
109
citations
#287

Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections

Dongbin Zhang, Chuming Wang, Weitao Wang et al.

ECCV 2024 | poster | arXiv:2403.15704
109
citations
#288

Motion Mamba: Efficient and Long Sequence Motion Generation

Zeyu Zhang, Akide Liu, Ian Reid et al.

ECCV 2024 | poster | arXiv:2403.07487
108
citations
#289

Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting

Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.

AAAI 2024 | paper | arXiv:2311.13314
108
citations
#290

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Xianfang Zeng, Xin Chen, Zhongqi Qi et al.

CVPR 2024 | poster | arXiv:2312.13913
108
citations
#291

Universal Jailbreak Backdoors from Poisoned Human Feedback

Javier Rando, Florian Tramer

ICLR 2024 | poster | arXiv:2311.14455
108
citations
#292

Unpaired Image-to-Image Translation via Neural Schrödinger Bridge

Beomsu Kim, Gihyun Kwon, Kwanyoung Kim et al.

ICLR 2024 | poster | arXiv:2305.15086
107
citations
#293

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Chuofan Ma, Yi Jiang, Jiannan Wu et al.

ECCV 2024 | poster | arXiv:2404.13013
107
citations
#294

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation

Guanxing Lu, Shiyi Zhang, Ziwei Wang et al.

ECCV 2024 | poster | arXiv:2403.08321
106
citations
#295

FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization

Jiahui Zhang, Fangneng Zhan, Muyu Xu et al.

CVPR 2024 | poster | arXiv:2403.06908
106
citations
#296

SimDA: Simple Diffusion Adapter for Efficient Video Generation

Zhen Xing, Qi Dai, Han Hu et al.

CVPR 2024 | poster | arXiv:2308.09710
106
citations
#297

OMG-Seg: Is One Model Good Enough For All Segmentation?

Xiangtai Li, Haobo Yuan, Wei Li et al.

CVPR 2024 | poster | arXiv:2401.10229
106
citations
#298

CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization

K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.

ECCV 2024 | poster | arXiv:2311.18159
106
citations
#299

Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations

Likang Wu, Zhaopeng Qiu, Zhi Zheng et al.

AAAI 2024 | paper | arXiv:2307.05722
105
citations
#300

GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann et al.

CVPR 2024 | poster | arXiv:2311.14155
105
citations
#301

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Haoran Xu, Young Jin Kim, Amr Mohamed Nabil Aly Aly Sharaf et al.

ICLR 2024 | poster | arXiv:2309.11674
105
citations
#302

ReNoise: Real Image Inversion Through Iterative Noising

Daniel Garibi, Or Patashnik, Andrey Voynov et al.

ECCV 2024 | poster | arXiv:2403.14602
105
citations
#303

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Xiaofan Li, Zhizhong Zhang, Xin Tan et al.

CVPR 2024 | poster | arXiv:2404.05231
104
citations
#304

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Size Wu, Wenwei Zhang, Lumin Xu et al.

ICLR 2024 | spotlight | arXiv:2310.01403
104
citations
#305

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Weiran Yao, Shelby Heinecke, Juan Carlos Niebles et al.

ICLR 2024 | spotlight | arXiv:2308.02151
104
citations
#306

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

Jingye Chen, Yupan Huang, Tengchao Lv et al.

ECCV 2024 | poster | arXiv:2311.16465
104
citations
#307

PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Shaowei Liu, Zhongzheng Ren, Saurabh Gupta et al.

ECCV 2024 | poster | arXiv:2409.18964
104
citations
#308

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Haoqin Tu, Chenhang Cui, Zijun Wang et al.

ECCV 2024 | poster | arXiv:2311.16101
103
citations
#309

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

Jihan Yang, Runyu Ding, Weipeng Deng et al.

CVPR 2024 | poster | arXiv:2304.00962
103
citations
#310

Understanding Catastrophic Forgetting in Language Models via Implicit Inference

Suhas Kotha, Jacob Springer, Aditi Raghunathan

ICLR 2024 | poster | arXiv:2309.10105
103
citations
#311

Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

Shufan Li, Aditya Grover, Harkanwar Singh

ECCV 2024 | poster | arXiv:2402.05892
103
citations
#312

Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control

Longtao Zheng, Rundong Wang, Xinrun Wang et al.

ICLR 2024 | poster | arXiv:2306.07863
103
citations
#313

TimesURL: Self-Supervised Contrastive Learning for Universal Time Series Representation Learning

Jiexi Liu, Songcan Chen

AAAI 2024 | paper | arXiv:2312.15709
102
citations
#314

latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

Christopher Wewer, Kevin Raj, Eddy Ilg et al.

ECCV 2024 | poster | arXiv:2403.16292
102
citations
#315

PLGSLAM: Progressive Neural Scene Representation with Local to Global Bundle Adjustment

Tianchen Deng, Guole Shen, Tong Qin et al.

CVPR 2024 | poster | arXiv:2312.09866
101
citations
#316

Zero-Reference Low-Light Enhancement via Physical Quadruple Priors

Wenjing Wang, Huan Yang, Jianlong Fu et al.

CVPR 2024 | poster | arXiv:2403.12933
101
citations
#317

MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

Shitao Tang, Jiacheng Chen, Dilin Wang et al.

ECCV 2024 | poster
100
citations
#318

Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data

Yucheng Wang, Yuecong Xu, Jianfei Yang et al.

AAAI 2024 | paper | arXiv:2309.05305
100
citations
#319

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models

Changhun Lee, Jungyu Jin, Taesu Kim et al.

AAAI 2024 | paper | arXiv:2306.02272
100
citations
#320

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Zhixiang Wei, Lin Chen, Xiaoxiao Ma et al.

CVPR 2024 | poster | arXiv:2312.04265
100
citations
#321

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

Zeqi Xiao, Tai Wang, Jingbo Wang et al.

ICLR 2024 | spotlight | arXiv:2309.07918
100
citations
#322

An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention

Yehjin Shin, Jeongwhan Choi, Hyowon Wi et al.

AAAI 2024 | paper | arXiv:2312.10325
99
citations
#323

3D Geometry-Aware Deformable Gaussian Splatting for Dynamic View Synthesis

Zhicheng Lu, Xiang Guo, Le Hui et al.

CVPR 2024 | poster | arXiv:2404.06270
99
citations
#324

Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

MohammadReza Davari, Eugene Belilovsky

ECCV 2024 | poster | arXiv:2312.06795
98
citations
#325

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Yiqun Duan, Xianda Guo, Zheng Zhu

ECCV 2024 | poster | arXiv:2303.05021
98
citations
#326

Rolling-Unet: Revitalizing MLP’s Ability to Efficiently Extract Long-Distance Dependencies for Medical Image Segmentation

Yutong Liu, Haijiang Zhu, Mengting Liu et al.

AAAI 2024 | paper
98
citations
#327

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Chuwei Luo, Yufan Shen, Zhaoqing Zhu et al.

CVPR 2024 | poster | arXiv:2404.05225
98
citations
#328

UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition

Wenxuan Zhou, Sheng Zhang, Yu Gu et al.

ICLR 2024 | poster | arXiv:2308.03279
98
citations
#329

DOCCI: Descriptions of Connected and Contrasting Images

Yasumasa Onoe, Sunayana Rane, Zachary E Berger et al.

ECCV 2024 | poster | arXiv:2404.19753
98
citations
#330

LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models

Hai Jiang, Ao Luo, Xiaohong Liu et al.

ECCV 2024 | poster | arXiv:2407.08939
98
citations
#331

UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation

Kefu Yi, Kai Luo, Xiaolei Luo et al.

AAAI 2024 | paper | arXiv:2312.08952
97
citations
#332

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design

Seokju Yun, Youngmin Ro

CVPR 2024 | poster | arXiv:2401.16456
97
citations
#333

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

Nikhil Prakash, Tamar Shaham, Tal Haklay et al.

ICLR 2024 | poster | arXiv:2402.14811
97
citations
#334

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Zheng Zhang, Wenbo Hu, Yixing Lao et al.

ECCV 2024 | poster | arXiv:2403.15530
96
citations
#335

Fluctuation-Based Adaptive Structured Pruning for Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

AAAI 2024 | paper | arXiv:2312.11983
96
citations
#336

Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer

Rafail Fridman, Danah Yatim, Omer Bar-Tal et al.

CVPR 2024 | poster | arXiv:2311.17009
96
citations
#337

HIPTrack: Visual Tracking with Historical Prompts

Wenrui Cai, Qingjie Liu, Yunhong Wang

CVPR 2024 | poster | arXiv:2311.02072
96
citations
#338

Single-Model and Any-Modality for Video Object Tracking

Zongwei Wu, Jilai Zheng, Xiangxuan Ren et al.

CVPR 2024 | poster | arXiv:2311.15851
96
citations
#339

GARField: Group Anything with Radiance Fields

Chung Min Kim, Mingxuan Wu, Justin Kerr et al.

CVPR 2024 | poster | arXiv:2401.09419
96
citations
#340

DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

Bowen Yin, Xuying Zhang, Zhong-Yu Li et al.

ICLR 2024 | poster | arXiv:2309.09668
96
citations
#341

Rethinking Model Ensemble in Transfer-based Adversarial Attacks

Huanran Chen, Yichi Zhang, Yinpeng Dong et al.

ICLR 2024 | poster | arXiv:2303.09105
96
citations
#342

VISA: Reasoning Video Object Segmentation via Large Language Model

Cilin Yan, Haochen Wang, Shilin Yan et al.

ECCV 2024 | poster | arXiv:2407.11325
95
citations
#343

Self-correcting LLM-controlled Diffusion Models

Tsung-Han Wu, Long Lian, Joseph Gonzalez et al.

CVPR 2024 | poster | arXiv:2311.16090
95
citations
#344

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang et al.

ECCV 2024 | poster | arXiv:2405.17429
95
citations
#345

Revising Densification in Gaussian Splatting

Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder

ECCV 2024 | poster | arXiv:2404.06109
95
citations
#346

Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

Yifan Li, Hangyu Guo, Kun Zhou et al.

ECCV 2024 | poster | arXiv:2403.09792
95
citations
#347

An Empirical Study of CLIP for Text-Based Person Search

Cao Min, Yang Bai, Ziyin Zeng et al.

AAAI 2024 | paper | arXiv:2308.10045
94
citations
#348

HyperAttention: Long-context Attention in Near-Linear Time

Insu Han, Rajesh Jayaram, Amin Karbasi et al.

ICLR 2024 | poster | arXiv:2310.05869
94
citations
#349

Generative Image Dynamics

Zhengqi Li, Richard Tucker, Noah Snavely et al.

CVPR 2024 | poster | arXiv:2309.07906
93
citations
#350

Noise-free Score Distillation

Oren Katzir, Or Patashnik, Daniel Cohen-Or et al.

ICLR 2024 | poster | arXiv:2310.17590
93
citations
#351

Towards Open-ended Visual Quality Comparison

Haoning Wu, Hanwei Zhu, Zicheng Zhang et al.

ECCV 2024 | poster | arXiv:2402.16641
93
citations
#352

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

Yifei Zeng, Yanqin Jiang, Siyu Zhu et al.

ECCV 2024 | poster | arXiv:2403.14939
92
citations
#353

PointAttN: You Only Need Attention for Point Cloud Completion

Jun Wang, Ying Cui, Dongyan Guo et al.

AAAI 2024 | paper
92
citations
#354

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models

Ruichen Wang, Zekang Chen, Chen Chen et al.

AAAI 2024 | paper | arXiv:2305.13921
92
citations
#355

Decoding Natural Images from EEG for Object Recognition

Yonghao Song, Bingchuan Liu, Xiang Li et al.

ICLR 2024 | oral | arXiv:2308.13234
92
citations
#356

Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion

Lunjun Zhang, Yuwen Xiong, Ze Yang et al.

ICLR 2024 | poster | arXiv:2311.01017
92
citations
#357

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

Liting Lin, Heng Fan, Zhipeng Zhang et al.

ECCV 2024 | poster | arXiv:2403.05231
92
citations
#358

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

Zhiyuan You, Zheyuan Li, Jinjin Gu et al.

ECCV 2024 | poster | arXiv:2312.08962
92
citations
#359

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

Taylor Sorensen, Liwei Jiang, Jena Hwang et al.

AAAI 2024 | paper | arXiv:2309.00779
91
citations
#360

Consistency-guided Prompt Learning for Vision-Language Models

Shuvendu Roy, Ali Etemad

ICLR 2024 | poster | arXiv:2306.01195
91
citations
#361

Decoupled Contrastive Multi-View Clustering with High-Order Random Walks

Yiding Lu, Yijie Lin, Mouxing Yang et al.

AAAI 2024 | paper | arXiv:2308.11164
90
citations
#362

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Bin Xie, Jiale Cao, Jin Xie et al.

CVPR 2024 | poster | arXiv:2311.15537
90
citations
#363

At Which Training Stage Does Code Data Help LLMs Reasoning?

Ma Yingwei, Yue Liu, Yue Yu et al.

ICLR 2024 | spotlight | arXiv:2309.16298
90
citations
#364

Brain decoding: toward real-time reconstruction of visual perception

Yohann Benchetrit, Hubert Banville, Jean-Remi King

ICLR 2024 | oral | arXiv:2310.19812
90
citations
#365

CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization

Jiawei Zhang, Jiahe Li, Xiaohan Yu et al.

ECCV 2024 | poster | arXiv:2405.12110
90
citations
#366

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Haoran Wei, Lingyu Kong, Jinyue Chen et al.

ECCV 2024 | poster | arXiv:2312.06109
89
citations
#367

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

Chong Mou, Xintao Wang, Jiechong Song et al.

CVPR 2024 | poster | arXiv:2402.02583
89
citations
#368

VidToMe: Video Token Merging for Zero-Shot Video Editing

Xirui Li, Chao Ma, Xiaokang Yang et al.

CVPR 2024 | poster | arXiv:2312.10656
89
citations
#369

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation

Sihan Liu, Yiwei Ma, Xiaoqing Zhang et al.

CVPR 2024 | poster | arXiv:2312.12470
89
citations
#370

Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields

Leili Goli, Cody Reading, Silvia Sellán et al.

CVPR 2024 | highlight | arXiv:2309.03185
89
citations
#371

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

Samyak Jain, Robert Kirk, Ekdeep Singh Lubana et al.

ICLR 2024 | poster | arXiv:2311.12786
89
citations
#372

Reliable Conflictive Multi-View Learning

Cai Xu, Jiajun Si, Ziyu Guan et al.

AAAI 2024 | paper | arXiv:2402.16897
88
citations
#373

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

Abdullah J Hamdi, Luke Melas-Kyriazi, Jinjie Mai et al.

CVPR 2024 | poster | arXiv:2402.10128
88
citations
#374

Boosting Adversarial Transferability by Block Shuffle and Rotation

Kunyu Wang, He Xuanran, Wenxuan Wang et al.

CVPR 2024 | poster | arXiv:2308.10299
88
citations
#375

Improved sampling via learned diffusions

Lorenz Richter, Julius Berner

ICLR 2024 | poster | arXiv:2307.01198
88
citations
#376

Training Socially Aligned Language Models on Simulated Social Interactions

Ruibo Liu, Ruixin Yang, Chenyan Jia et al.

ICLR 2024 | poster | arXiv:2305.16960
88
citations
#377

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

Sewon Min, Suchin Gururangan, Eric Wallace et al.

ICLR 2024 | spotlight | arXiv:2308.04430
87
citations
#378

FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning

Haokun Chen, Yao Zhang, Denis Krompass et al.

AAAI 2024 | paper | arXiv:2308.12305
86
citations
#379

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Weiyun Wang, Yiming Ren, Haowen Luo et al.

ECCV 2024 | poster | arXiv:2402.19474
86
citations
#380

AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error

Jonas Ricker, Denis Lukovnikov, Asja Fischer

CVPR 2024 | poster | arXiv:2401.17879
85
citations
#381

DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood et al.

CVPR 2024 | poster | arXiv:2405.14881
85
citations
#382

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

Kyle Sargent, Zizhang Li, Tanmay Shah et al.

CVPR 2024 | poster | arXiv:2310.17994
85
citations
#383

KoLA: Carefully Benchmarking World Knowledge of Large Language Models

Jifan Yu, Xiaozhi Wang, Shangqing Tu et al.

ICLR 2024 | poster | arXiv:2306.09296
85
citations
#384

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

Jingfeng Wu, Difan Zou, Zixiang Chen et al.

ICLR 2024 | spotlight | arXiv:2310.08391
85
citations
#385

Finetuning Text-to-Image Diffusion Models for Fairness

Xudong Shen, Chao Du, Tianyu Pang et al.

ICLR 2024 | poster | arXiv:2311.07604
85
citations
#386

VIGC: Visual Instruction Generation and Correction

Théo Delemazure, Jérôme Lang, Grzegorz Pierczyński

AAAI 2024 | paper | arXiv:2308.12714
84
citations
#387

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri et al.

CVPR 2024 | poster | arXiv:2311.17049
84
citations
#388

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Yifei Huang, Guo Chen, Jilan Xu et al.

CVPR 2024 | poster | arXiv:2403.16182
84
citations
#389

Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

Alexander Raistrick, Lingjie Mei, Karhan Kayan et al.

CVPR 2024 | poster | arXiv:2406.11824
84
citations
#390

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

Michael Zhang, Kush Bhatia, Hermann Kumbong et al.

ICLR 2024 | poster | arXiv:2402.04347
84
citations
#391

AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion

Yitong Jiang, Zhaoyang Zhang, Tianfan Xue et al.

ECCV 2024 | poster | arXiv:2310.10123
83
citations
#392

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

Xiang Chen, Jinshan Pan, Jiangxin Dong

CVPR 2024 | poster | arXiv:2404.01547
83
citations
#393

TUMTraf V2X Cooperative Perception Dataset

Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan et al.

CVPR 2024 | poster | arXiv:2403.01316
83
citations
#394

Human Feedback is not Gold Standard

Tom Hosking, Phil Blunsom, Max Bartolo

ICLR 2024 | poster | arXiv:2309.16349
83
citations
#395

Detecting, Explaining, and Mitigating Memorization in Diffusion Models

Yuxin Wen, Yuchen Liu, Chen Chen et al.

ICLR 2024 | poster | arXiv:2407.21720
83
citations
#396

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Qing Jiang, Feng Li, Zhaoyang Zeng et al.

ECCV 2024 | poster | arXiv:2403.14610
83
citations
#397

VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Yi Xin, Junlong Du, Qiang Wang et al.

AAAI 2024 | paper | arXiv:2312.08733
82
citations
#398

Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation

Shuanghao Bai, Min Zhang, Wanqi Zhou et al.

AAAI 2024 | paper | arXiv:2312.09553
82
citations
#399

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline

Xiao Wang, Shiao Wang, Chuanming Tang et al.

CVPR 2024 | poster | arXiv:2309.14611
82
citations
#400

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.

CVPR 2024 | poster | arXiv:2402.13250
82
citations