Most Cited 2024 "footprint recognition" Papers

12,324 papers found • Page 5 of 62

#801

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

Phuc Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis et al.

CVPR 2024 • arXiv:2312.10671
106 citations
#802

latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

Christopher Wewer, Kevin Raj, Eddy Ilg et al.

ECCV 2024 • arXiv:2403.16292
106 citations
#803

Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

MohammadReza Davari, Eugene Belilovsky

ECCV 2024 • arXiv:2312.06795
106 citations
#804

On Prompt-Driven Safeguarding for Large Language Models

Chujie Zheng, Fan Yin, Hao Zhou et al.

ICML 2024 • arXiv:2401.18018
106 citations
#805

HIPTrack: Visual Tracking with Historical Prompts

Wenrui Cai, Qingjie Liu, Yunhong Wang

CVPR 2024 • arXiv:2311.02072
106 citations
#806

Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video

Yanqin Jiang, Li Zhang, Jin Gao et al.

ICLR 2024 oral • arXiv:2311.02848
106 citations
#807

Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

Shufan Li, Aditya Grover, Harkanwar Singh

ECCV 2024 • arXiv:2402.05892
106 citations
#808

SNI-SLAM: Semantic Neural Implicit SLAM

Siting Zhu, Guangming Wang, Hermann Blum et al.

CVPR 2024 • arXiv:2311.11016
105 citations
#809

HIFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance

Junzhe Zhu, Peiye Zhuang, Sanmi Koyejo

ICLR 2024 • arXiv:2305.18766
105 citations
#810

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Haoqin Tu, Chenhang Cui, Zijun Wang et al.

ECCV 2024 • arXiv:2311.16101
105 citations
#811

ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search

Yuchen Zhuang, Xiang Chen, Tong Yu et al.

ICLR 2024 • arXiv:2310.13227
105 citations
#812

CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets

Lifan Yuan, Yangyi Chen, Xingyao Wang et al.

ICLR 2024 • arXiv:2309.17428
105 citations
#813

TimesURL: Self-Supervised Contrastive Learning for Universal Time Series Representation Learning

Jiexi Liu, Songcan Chen

AAAI 2024 paper • arXiv:2312.15709
105 citations
#814

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

Xian Liu, Xiaohang Zhan, Jiaxiang Tang et al.

CVPR 2024 highlight • arXiv:2311.17061
105 citations
#815

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting

Yu Wang, Xiaogeng Liu, Yu Li et al.

ECCV 2024 • arXiv:2403.09513
105 citations
#816

Conformal Language Modeling

Victor Quach, Adam Fisch, Tal Schuster et al.

ICLR 2024 • arXiv:2306.10193
105 citations
#817

PLGSLAM: Progressive Neural Scene Represenation with Local to Global Bundle Adjustment

Tianchen Deng, Guole Shen, Tong Qin et al.

CVPR 2024 • arXiv:2312.09866
105 citations
#818

Large Language Models as Generalizable Policies for Embodied Tasks

Andrew Szot, Max Schwarzer, Harsh Agrawal et al.

ICLR 2024 • arXiv:2310.17722
105 citations
#819

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models

Changhun Lee, Jungyu Jin, Taesu Kim et al.

AAAI 2024 paper • arXiv:2306.02272
105 citations
#820

DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.

CVPR 2024 • arXiv:2303.14207
104 citations
#821

ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering

Haokai Pang, Heming Zhu, Adam Kortylewski et al.

CVPR 2024 • arXiv:2312.05941
104 citations
#822

ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models

Iman Mirzadeh, Keivan Alizadeh-Vahid, Sachin Mehta et al.

ICLR 2024 • arXiv:2310.04564
104 citations
#823

Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis

Caoyun Fan, Jindou Chen, Yaohui Jin et al.

AAAI 2024 paper • arXiv:2312.05488
104 citations
#824

Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data

Yucheng Wang, Yuecong Xu, Jianfei Yang et al.

AAAI 2024 paper • arXiv:2309.05305
104 citations
#825

An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention

Yehjin Shin, Jeongwhan Choi, Hyowon Wi et al.

AAAI 2024 paper • arXiv:2312.10325
104 citations
#826

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

Long Qian, Juncheng Li, Yu Wu et al.

ICML 2024 oral • arXiv:2402.11435
104 citations
#827

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding

Jihan Yang, Runyu Ding, Weipeng Deng et al.

CVPR 2024 • arXiv:2304.00962
104 citations
#828

Knowledge Fusion of Large Language Models

Fanqi Wan, Xinting Huang, Deng Cai et al.

ICLR 2024 • arXiv:2401.10491
104 citations
#829

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Muyang Li, Tianle Cai, Jiaxin Cao et al.

CVPR 2024 highlight • arXiv:2402.19481
104 citations
#830

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov et al.

CVPR 2024 highlight • arXiv:2402.14797
103 citations
#831

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

Dawei Zhu, Nan Yang, Liang Wang et al.

ICLR 2024 • arXiv:2309.10400
103 citations
#832

MMM: Generative Masked Motion Model

Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee et al.

CVPR 2024 highlight • arXiv:2312.03596
103 citations
#833

Provably Robust DPO: Aligning Language Models with Noisy Feedback

Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan

ICML 2024 • arXiv:2403.00409
103 citations
#834

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Zhixiang Wei, Lin Chen, Xiaoxiao Ma et al.

CVPR 2024 • arXiv:2312.04265
103 citations
#835

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

Xiangming Gu, Xiaosen Zheng, Tianyu Pang et al.

ICML 2024 • arXiv:2402.08567
103 citations
#836

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

Yangyi Chen, Karan Sikka, Michael Cogswell et al.

CVPR 2024 • arXiv:2311.10081
103 citations
#837

Restoring Images in Adverse Weather Conditions via Histogram Transformer

Shangquan Sun, Wenqi Ren, Xinwei Gao et al.

ECCV 2024 • arXiv:2407.10172
103 citations
#838

Rethinking Inductive Biases for Surface Normal Estimation

Gwangbin Bae, Andrew J. Davison

CVPR 2024 • arXiv:2403.00712
103 citations
#839

Universal Humanoid Motion Representations for Physics-Based Control

Zhengyi Luo, Jinkun Cao, Josh Merel et al.

ICLR 2024 spotlight • arXiv:2310.04582
102 citations
#840

GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

Alexander Havrilla, Sharath Chandra Raparthy, Christoforos Nalmpantis et al.

ICML 2024 • arXiv:2402.10963
102 citations
#841

Noisy-Correspondence Learning for Text-to-Image Person Re-identification

Yang Qin, Yingke Chen, Dezhong Peng et al.

CVPR 2024 • arXiv:2308.09911
102 citations
#842

Vision-by-Language for Training-Free Compositional Image Retrieval

Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini et al.

ICLR 2024 • arXiv:2310.09291
102 citations
#843

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design

Seokju Yun, Youngmin Ro

CVPR 2024 • arXiv:2401.16456
102 citations
#844

UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition

Wenxuan Zhou, Sheng Zhang, Yu Gu et al.

ICLR 2024 • arXiv:2308.03279
102 citations
#845

VDT: General-purpose Video Diffusion Transformers via Mask Modeling

Haoyu Lu, Guoxing Yang, Nanyi Fei et al.

ICLR 2024 oral • arXiv:2305.13311
102 citations
#846

How Transformers Learn Causal Structure with Gradient Descent

Eshaan Nichani, Alex Damian, Jason Lee

ICML 2024 • arXiv:2402.14735
102 citations
#847

BadEdit: Backdooring Large Language Models by Model Editing

Yanzhou Li, Tianlin Li, Kangjie Chen et al.

ICLR 2024 • arXiv:2403.13355
102 citations
#848

3D Geometry-Aware Deformable Gaussian Splatting for Dynamic View Synthesis

Zhicheng Lu, Xiang Guo, Le Hui et al.

CVPR 2024 • arXiv:2404.06270
102 citations
#849

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Sicheng Mo, Fangzhou Mu, Kuan Heng Lin et al.

CVPR 2024 • arXiv:2312.07536
102 citations
#850

LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models

Hai Jiang, Ao Luo, Xiaohong Liu et al.

ECCV 2024 • arXiv:2407.08939
102 citations
#851

The Expressive Power of Low-Rank Adaptation

Yuchen Zeng, Kangwook Lee

ICLR 2024 • arXiv:2310.17513
101 citations
#852

Tag2Text: Guiding Vision-Language Model via Image Tagging

Xinyu Huang, Youcai Zhang, Jinyu Ma et al.

ICLR 2024 • arXiv:2303.05657
101 citations
#853

HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation

Xin Huang, Ruizhi Shao, Qi Zhang et al.

CVPR 2024 • arXiv:2310.01406
101 citations
#854

Self-correcting LLM-controlled Diffusion Models

Tsung-Han Wu, Long Lian, Joseph Gonzalez et al.

CVPR 2024 • arXiv:2311.16090
101 citations
#855

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

Zeqi Xiao, Tai Wang, Jingbo Wang et al.

ICLR 2024 spotlight • arXiv:2309.07918
101 citations
#856

Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling

Bairu Hou, Yujian Liu, Kaizhi Qian et al.

ICML 2024 • arXiv:2311.08718
101 citations
#857

Language Model Cascades: Token-Level Uncertainty And Beyond

Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum et al.

ICLR 2024 • arXiv:2404.10136
101 citations
#858

UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation

Kefu Yi, Kai Luo, Xiaolei Luo et al.

AAAI 2024 paper • arXiv:2312.08952
101 citations
#859

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

Yifei Zeng, Yanqin Jiang, Siyu Zhu et al.

ECCV 2024 • arXiv:2403.14939
101 citations
#860

Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

Yifan Li, Hangyu Guo, Kun Zhou et al.

ECCV 2024 • arXiv:2403.09792
101 citations
#861

Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer

Rafail Fridman, Danah Yatim, Omer Bar-Tal et al.

CVPR 2024 • arXiv:2311.17009
100 citations
#862

Proactive Detection of Voice Cloning with Localized Watermarking

Robin San Roman, Pierre Fernandez, Hady Elsahar et al.

ICML 2024 • arXiv:2401.17264
100 citations
#863

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang, Li Chen, Yanan Sun et al.

CVPR 2024 highlight • arXiv:2312.17655
100 citations
#864

Single-Model and Any-Modality for Video Object Tracking

Zongwei Wu, Jilai Zheng, Xiangxuan Ren et al.

CVPR 2024 • arXiv:2311.15851
100 citations
#865

MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

Shitao Tang, Jiacheng Chen, Dilin Wang et al.

ECCV 2024
100 citations
#866

ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe

Yifan Bai, Zeyang Zhao, Yihong Gong et al.

CVPR 2024 • arXiv:2312.17133
100 citations
#867

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

Ziqiao Peng, Wentao Hu, Yue Shi et al.

CVPR 2024 • arXiv:2311.17590
100 citations
#868

4K4D: Real-Time 4D View Synthesis at 4K Resolution

Zhen Xu, Sida Peng, Haotong Lin et al.

CVPR 2024 • arXiv:2310.11448
100 citations
#869

DOCCI: Descriptions of Connected and Contrasting Images

Yasumasa Onoe, Sunayana Rane, Zachary E Berger et al.

ECCV 2024 • arXiv:2404.19753
100 citations
#870

Circuit Component Reuse Across Tasks in Transformer Language Models

Jack Merullo, Carsten Eickhoff, Ellie Pavlick

ICLR 2024 spotlight • arXiv:2310.08744
99 citations
#871

Towards image compression with perfect realism at ultra-low bitrates

Marlene Careil, Matthew J Muckley, Jakob Verbeek et al.

ICLR 2024 • arXiv:2310.10325
99 citations
#872

VISA: Reasoning Video Object Segmentation via Large Language Model

Cilin Yan, Haochen Wang, Shilin Yan et al.

ECCV 2024 • arXiv:2407.11325
99 citations
#873

Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models

Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar et al.

ICML 2024 • arXiv:2308.10379
99 citations
#874

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Yiqun Duan, Xianda Guo, Zheng Zhu

ECCV 2024 • arXiv:2303.05021
99 citations
#875

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang et al.

ECCV 2024 • arXiv:2405.17429
99 citations
#876

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

Nikhil Prakash, Tamar Shaham, Tal Haklay et al.

ICLR 2024 • arXiv:2402.14811
99 citations
#877

Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell

ICLR 2024 • arXiv:2312.08358
99 citations
#878

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

Haoyi Jiang, Tianheng Cheng, Naiyu Gao et al.

CVPR 2024 • arXiv:2306.15670
99 citations
#879

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Zheng Zhang, Wenbo Hu, Yixing Lao et al.

ECCV 2024 • arXiv:2403.15530
99 citations
#880

Decoding Natural Images from EEG for Object Recognition

Yonghao Song, Bingchuan Liu, Xiang Li et al.

ICLR 2024 oral • arXiv:2308.13234
99 citations
#881

DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models

Yongchan Kwon, Eric Wu, Kevin Wu et al.

ICLR 2024 • arXiv:2310.00902
99 citations
#882

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

Weiyang Liu, Zeju Qiu, Yao Feng et al.

ICLR 2024 • arXiv:2311.06243
98 citations
#883

ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles

Jiawei Zhang, Chejian Xu, Bo Li

CVPR 2024 • arXiv:2405.14062
98 citations
#884

DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

Bowen Yin, Xuying Zhang, Zhong-Yu Li et al.

ICLR 2024 • arXiv:2309.09668
98 citations
#885

BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning

Siyuan Liang, Mingli Zhu, Aishan Liu et al.

CVPR 2024 highlight • arXiv:2311.12075
98 citations
#886

InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks

Xueyu Hu, Ziyu Zhao, Shuang Wei et al.

ICML 2024 • arXiv:2401.05507
98 citations
#887

GARField: Group Anything with Radiance Fields

Chung Min Kim, Mingxuan Wu, Justin Kerr et al.

CVPR 2024 • arXiv:2401.09419
98 citations
#888

Rethinking Model Ensemble in Transfer-based Adversarial Attacks

Huanran Chen, Yichi Zhang, Yinpeng Dong et al.

ICLR 2024 • arXiv:2303.09105
98 citations
#889

FoundPose: Unseen Object Pose Estimation with Foundation Features

Evin Pınar Örnek, Yann Labbé, Bugra Tekin et al.

ECCV 2024 • arXiv:2311.18809
98 citations
#890

An Empirical Study of CLIP for Text-Based Person Search

Cao Min, Yang Bai, Ziyin Zeng et al.

AAAI 2024 paper • arXiv:2308.10045
98 citations
#891

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

Tara Akhound-Sadegh, Jarrid Rector-Brooks, Joey Bose et al.

ICML 2024 • arXiv:2402.06121
98 citations
#892

HyperAttention: Long-context Attention in Near-Linear Time

Insu Han, Rajesh Jayaram, Amin Karbasi et al.

ICLR 2024 • arXiv:2310.05869
98 citations
#893

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Xichen Pan, Li Dong, Shaohan Huang et al.

ICLR 2024 • arXiv:2310.02992
98 citations
#894

Revising Densification in Gaussian Splatting

Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder

ECCV 2024 • arXiv:2404.06109
98 citations
#895

Rolling-Unet: Revitalizing MLP’s Ability to Efficiently Extract Long-Distance Dependencies for Medical Image Segmentation

Yutong Liu, Haijiang Zhu, Mengting Liu et al.

AAAI 2024 paper
98 citations
#896

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

Liting Lin, Heng Fan, Zhipeng Zhang et al.

ECCV 2024 • arXiv:2403.05231
97 citations
#897

LDMVFI: Video Frame Interpolation with Latent Diffusion Models

Duolikun Danier, Fan Zhang, David Bull

AAAI 2024 paper • arXiv:2303.09508
97 citations
#898

MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data

Paul Scotti, Mihir Tripathy, Cesar Kadir Torrico Villanueva et al.

ICML 2024 • arXiv:2403.11207
97 citations
#899

Boosting Adversarial Transferability by Block Shuffle and Rotation

Kunyu Wang, Xuanran He, Wenxuan Wang et al.

CVPR 2024 • arXiv:2308.10299
97 citations
#900

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

Samyak Jain, Robert Kirk, Ekdeep Singh Lubana et al.

ICLR 2024 • arXiv:2311.12786
97 citations
#901

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Bin Xie, Jiale Cao, Jin Xie et al.

CVPR 2024 • arXiv:2311.15537
97 citations
#902

CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling

Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley et al.

ICLR 2024 spotlight • arXiv:2310.17347
96 citations
#903

Reliable Conflictive Multi-View Learning

Cai Xu, Jiajun Si, Ziyu Guan et al.

AAAI 2024 paper • arXiv:2402.16897
96 citations
#904

Generative Image Dynamics

Zhengqi Li, Richard Tucker, Noah Snavely et al.

CVPR 2024 • arXiv:2309.07906
96 citations
#905

A Semantic Invariant Robust Watermark for Large Language Models

Aiwei Liu, Leyi Pan, Xuming Hu et al.

ICLR 2024 • arXiv:2310.06356
96 citations
#906

Explicit Visual Prompts for Visual Object Tracking

Liangtao Shi, Bineng Zhong, Qihua Liang et al.

AAAI 2024 paper • arXiv:2401.03142
96 citations
#907

On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

Peng Sun, Bei Shi, Daiwei Yu et al.

CVPR 2024 • arXiv:2312.03526
96 citations
#908

Flora: Low-Rank Adapters Are Secretly Gradient Compressors

Yongchang Hao, Yanshuai Cao, Lili Mou

ICML 2024 • arXiv:2402.03293
96 citations
#909

Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers

Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai et al.

ECCV 2024 • arXiv:2311.17717
96 citations
#910

ARGS: Alignment as Reward-Guided Search

Maxim Khanov, Jirayu Burapacheep, Yixuan Li

ICLR 2024 • arXiv:2402.01694
96 citations
#911

GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong et al.

ICML 2024 • arXiv:2402.07207
96 citations
#912

Preserving Fairness Generalization in Deepfake Detection

Li Lin, Xinan He, Yan Ju et al.

CVPR 2024 • arXiv:2402.17229
96 citations
#913

Noise-free Score Distillation

Oren Katzir, Or Patashnik, Daniel Cohen-Or et al.

ICLR 2024 • arXiv:2310.17590
96 citations
#914

Label-free Node Classification on Graphs with Large Language Models (LLMs)

Zhikai Chen, Haitao Mao, Hongzhi Wen et al.

ICLR 2024 • arXiv:2310.04668
95 citations
#915

GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing

Jing Wu, Jiawang Bian, Xinghui Li et al.

ECCV 2024 • arXiv:2403.08733
95 citations
#916

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

Zhiyuan You, Zheyuan Li, Jinjin Gu et al.

ECCV 2024 • arXiv:2312.08962
95 citations
#917

Diffusion Language Models Are Versatile Protein Learners

Xinyou Wang, Zaixiang Zheng, Fei Ye et al.

ICML 2024 • arXiv:2402.18567
95 citations
#918

MiniLLM: Knowledge Distillation of Large Language Models

Yuxian Gu, Li Dong, Furu Wei et al.

ICLR 2024 • arXiv:2306.08543
95 citations
#919

Towards Open-ended Visual Quality Comparison

Haoning Wu, Hanwei Zhu, Zicheng Zhang et al.

ECCV 2024 • arXiv:2402.16641
95 citations
#920

At Which Training Stage Does Code Data Help LLMs Reasoning?

Yingwei Ma, Yue Liu, Yue Yu et al.

ICLR 2024 spotlight • arXiv:2309.16298
95 citations
#921

SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

Xiaojun Hou, Jiazheng Xing, Yijie Qian et al.

CVPR 2024 • arXiv:2403.16002
95 citations
#922

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference

Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski et al.

ICML 2024 • arXiv:2403.09636
94 citations
#923

Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion

Lunjun Zhang, Yuwen Xiong, Ze Yang et al.

ICLR 2024 • arXiv:2311.01017
94 citations
#924

Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation

Xinyu Tang, Richard Shin, Huseyin Inan et al.

ICLR 2024 • arXiv:2309.11765
94 citations
#925

DsDm: Model-Aware Dataset Selection with Datamodels

Logan Engstrom

ICML 2024 spotlight • arXiv:2401.12926
94 citations
#926

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

Taylor Sorensen, Liwei Jiang, Jena Hwang et al.

AAAI 2024 paper • arXiv:2309.00779
93 citations
#927

SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation

Thuan Nguyen, Anh Tran

CVPR 2024 • arXiv:2312.05239
93 citations
#928

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models

Ruichen Wang, Zekang Chen, Chen Chen et al.

AAAI 2024 paper • arXiv:2305.13921
93 citations
#929

Brain decoding: toward real-time reconstruction of visual perception

Yohann Benchetrit, Hubert Banville, Jean-Remi King

ICLR 2024 oral • arXiv:2310.19812
93 citations
#930

FeatUp: A Model-Agnostic Framework for Features at Any Resolution

Stephanie Fu, Mark Hamilton, Laura E. Brandt et al.

ICLR 2024 • arXiv:2403.10516
93 citations
#931

VidToMe: Video Token Merging for Zero-Shot Video Editing

Xirui Li, Chao Ma, Xiaokang Yang et al.

CVPR 2024 • arXiv:2312.10656
93 citations
#932

Improved sampling via learned diffusions

Lorenz Richter, Julius Berner

ICLR 2024 • arXiv:2307.01198
93 citations
#933

Consistency-guided Prompt Learning for Vision-Language Models

Shuvendu Roy, Ali Etemad

ICLR 2024 • arXiv:2306.01195
93 citations
#934

Residual Denoising Diffusion Models

Jiawei Liu, Qiang Wang, Huijie Fan et al.

CVPR 2024 • arXiv:2308.13712
93 citations
#935

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

Yandan Yang, Baoxiong Jia, Peiyuan Zhi et al.

CVPR 2024 highlight • arXiv:2404.09465
93 citations
#936

Large Language Models are Geographically Biased

Rohin Manvi, Samar Khanna, Marshall Burke et al.

ICML 2024 oral • arXiv:2402.02680
93 citations
#937

Position: Levels of AGI for Operationalizing Progress on the Path to AGI

Meredith Morris, Jascha Sohl-Dickstein, Noah Fiedel et al.

ICML 2024 spotlight • arXiv:2311.02462
93 citations
#938

Unbiased Watermark for Large Language Models

Zhengmian Hu, Lichang Chen, Xidong Wu et al.

ICLR 2024 spotlight • arXiv:2310.10669
93 citations
#939

Diffusion Models Without Attention

Jing Nathan Yan, Jiatao Gu, Alexander Rush

CVPR 2024 • arXiv:2311.18257
93 citations
#940

An Extensible Framework for Open Heterogeneous Collaborative Perception

Yifan Lu, Yue Hu, Yiqi Zhong et al.

ICLR 2024 • arXiv:2401.13964
92 citations
#941

DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

Zhongkai Hao, Chang Su, Songming Liu et al.

ICML 2024 • arXiv:2403.03542
92 citations
#942

Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields

Leili Goli, Cody Reading, Silvia Sellán et al.

CVPR 2024 highlight • arXiv:2309.03185
92 citations
#943

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

Abdullah J Hamdi, Luke Melas-Kyriazi, Jinjie Mai et al.

CVPR 2024 • arXiv:2402.10128
92 citations
#944

Decoupled Contrastive Multi-View Clustering with High-Order Random Walks

Yiding Lu, Yijie Lin, Mouxing Yang et al.

AAAI 2024 paper • arXiv:2308.11164
92 citations
#945

Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts

Jiawen Zhu, Guansong Pang

CVPR 2024 • arXiv:2403.06495
92 citations
#946

PointAttN: You Only Need Attention for Point Cloud Completion

Jun Wang, Ying Cui, Dongyan Guo et al.

AAAI 2024 paper
92 citations
#947

AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers

Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer et al.

ICML 2024 • arXiv:2402.05602
92 citations
#948

Localizing Task Information for Improved Model Merging and Compression

Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez et al.

ICML 2024 • arXiv:2405.07813
92 citations
#949

Bayesian Low-rank Adaptation for Large Language Models

Adam Yang, Maxime Robeyns, Xi Wang et al.

ICLR 2024 • arXiv:2308.13111
92 citations
#950

Online Speculative Decoding

Xiaoxuan Liu, Lanxiang Hu, Peter Bailis et al.

ICML 2024 • arXiv:2310.07177
92 citations
#951

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation

Sihan Liu, Yiwei Ma, Xiaoqing Zhang et al.

CVPR 2024 • arXiv:2312.12470
92 citations
#952

SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

Peng Qi, Zehong Yan, Wynne Hsu et al.

CVPR 2024 • arXiv:2403.03170
91 citations
#953

RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe et al.

CVPR 2024 highlight • arXiv:2312.04524
91 citations
#954

LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection

Dat Nguyen, Nesryne Mejri, Inder Pal Singh et al.

CVPR 2024 • arXiv:2401.13856
91 citations
#955

CapsFusion: Rethinking Image-Text Data at Scale

Qiying Yu, Quan Sun, Xiaosong Zhang et al.

CVPR 2024 • arXiv:2310.20550
91 citations
#956

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

Michael Zhang, Kush Bhatia, Hermann Kumbong et al.

ICLR 2024 • arXiv:2402.04347
91 citations
#957

CrossKD: Cross-Head Knowledge Distillation for Object Detection

Jiabao Wang, Yuming Chen, Zhaohui Zheng et al.

CVPR 2024 • arXiv:2306.11369
91 citations
#958

Frequency-Adaptive Dilated Convolution for Semantic Segmentation

Linwei Chen, Lin Gu, Dezhi Zheng et al.

CVPR 2024 highlight • arXiv:2403.05369
91 citations
#959

CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity

Aditya Bhatt, Daniel Palenicek, Boris Belousov et al.

ICLR 2024 spotlight • arXiv:1902.05605
91 citations
#960

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

Chong Mou, Xintao Wang, Jiechong Song et al.

CVPR 2024 • arXiv:2402.02583
91 citations
#961

GeoLLM: Extracting Geospatial Knowledge from Large Language Models

Rohin Manvi, Samar Khanna, Gengchen Mai et al.

ICLR 2024 • arXiv:2310.06213
91 citations
#962

Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings

Ilyass Hammouamri, Ismail Khalfaoui Hassani, Timothée Masquelier

ICLR 2024 oral • arXiv:2306.17670
91 citations
#963

Motif: Intrinsic Motivation from Artificial Intelligence Feedback

Martin Klissarov, Pierluca D'Oro, Shagun Sodhani et al.

ICLR 2024 • arXiv:2310.00166
91 citations
#964

Training Socially Aligned Language Models on Simulated Social Interactions

Ruibo Liu, Ruixin Yang, Chenyan Jia et al.

ICLR 2024 • arXiv:2305.16960
91 citations
#965

CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization

Jiawei Zhang, Jiahe Li, Xiaohan Yu et al.

ECCV 2024 • arXiv:2405.12110
91 citations
#966

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Jack Urbanek, Florian Bordes, Pietro Astolfi et al.

CVPR 2024 • arXiv:2312.08578
91 citations
#967

Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

Haobo Yuan, Xiangtai Li, Chong Zhou et al.

ECCV 2024 • arXiv:2401.02955
91 citations
#968

Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning

Yiwei Li, Peiwen Yuan, Shaoxiong Feng et al.

ICLR 2024 • arXiv:2401.10480
90 citations
#969

Turning large language models into cognitive models

Marcel Binz, Eric Schulz

ICLR 2024 • arXiv:2306.03917
90 citations
#970

SparQ Attention: Bandwidth-Efficient LLM Inference

Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley et al.

ICML 2024 • arXiv:2312.04985
90 citations
#971

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis

Zhenhui Ye, Tianyun Zhong, Yi Ren et al.

ICLR 2024 spotlight • arXiv:2401.08503
90 citations
#972

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Tianhao Qi, Shancheng Fang, Yanze Wu et al.

CVPR 2024 highlight • arXiv:2403.06951
90 citations
#973

DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood et al.

CVPR 2024 • arXiv:2405.14881
90 citations
#974

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

Juncheng Li, Kaihang Pan, Zhiqi Ge et al.

ICLR 2024 spotlight • arXiv:2308.04152
90 citations
#975

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri et al.

CVPR 2024 • arXiv:2311.17049
90 citations
#976

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos

Mehmet Saygin Seyfioglu, Wisdom Ikezogwo, Fatemeh Ghezloo et al.

CVPR 2024 • arXiv:2312.04746
89 citations
#977

Controllable Human-Object Interaction Synthesis

Jiaman Li, Alexander Clegg, Roozbeh Mottaghi et al.

ECCV 2024 • arXiv:2312.03913
89 citations
#978

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye

CVPR 2024 • arXiv:2312.00845
89 citations
#979

DemoFusion: Democratising High-Resolution Image Generation With No $$$

Ruoyi Du, Dongliang Chang, Timothy Hospedales et al.

CVPR 2024 • arXiv:2311.16973
89 citations
#980

AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error

Jonas Ricker, Denis Lukovnikov, Asja Fischer

CVPR 2024 • arXiv:2401.17879
89 citations
#981

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Haoran Wei, Lingyu Kong, Jinyue Chen et al.

ECCV 2024 • arXiv:2312.06109
89 citations
#982

TUMTraf V2X Cooperative Perception Dataset

Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan et al.

CVPR 2024 • arXiv:2403.01316
89 citations
#983

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.

CVPR 2024 • arXiv:2312.03777
89 citations
#984

Lemur: Harmonizing Natural Language and Code for Language Agents

Yiheng Xu, Hongjin Su, Chen Xing et al.

ICLR 2024 spotlight • arXiv:2310.06830
89 citations
#985

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

Jingfeng Wu, Difan Zou, Zixiang Chen et al.

ICLR 2024 spotlight • arXiv:2310.08391
89 citations
#986

Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

Chaoqin Huang, Aofan Jiang, Jinghao Feng et al.

CVPR 2024 highlight • arXiv:2403.12570
89 citations
#987

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Weiyun Wang, Yiming Ren, Haowen Luo et al.

ECCV 2024 • arXiv:2402.19474
89 citations
#988

Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

Alexander Raistrick, Lingjie Mei, Karhan Kayan et al.

CVPR 2024 • arXiv:2406.11824
89 citations
#989

VIGC: Visual Instruction Generation and Correction

Théo Delemazure, Jérôme Lang, Grzegorz Pierczyński

AAAI 2024 paper • arXiv:2308.12714
88 citations
#990

Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM

Eliya Nachmani, Alon Levkovitch, Roy Hirsch et al.

ICLR 2024 • arXiv:2305.15255
88 citations
#991

Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models

Christian Schlarmann, Naman Singh, Francesco Croce et al.

ICML 2024 • arXiv:2402.12336
88 citations
#992

AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion

Yitong Jiang, Zhaoyang Zhang, Tianfan Xue et al.

ECCV 2024 • arXiv:2310.10123
88 citations
#993

KoLA: Carefully Benchmarking World Knowledge of Large Language Models

Jifan Yu, Xiaozhi Wang, Shangqing Tu et al.

ICLR 2024 • arXiv:2306.09296
88 citations
#994

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Kai Zhang, Yi Luan, Hexiang Hu et al.

ICML 2024 • arXiv:2403.19651
88 citations
#995

Human Alignment of Large Language Models through Online Preference Optimisation

Daniele Calandriello, Zhaohan Guo, Rémi Munos et al.

ICML 2024 • arXiv:2403.08635
88 citations
#996

MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning

Yi Xin, Junlong Du, Qiang Wang et al.

AAAI 2024 paper • arXiv:2312.08636
88 citations
#997

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.

ICML 2024 • arXiv:2402.04833
88 citations
#998

MaxMin-RLHF: Alignment with Diverse Human Preferences

Souradip Chakraborty, Jiahao Qiu, Hui Yuan et al.

ICML 2024 • arXiv:2402.08925
88 citations
#999

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

Sewon Min, Suchin Gururangan, Eric Wallace et al.

ICLR 2024 spotlight • arXiv:2308.04430
88 citations
#1000

VRP-SAM: SAM with Visual Reference Prompt

Yanpeng Sun, Jiahui Chen, Shan Zhang et al.

CVPR 2024 • arXiv:2402.17726
87 citations