Most Cited 2024 "ucb-style exploration" Papers

12,324 papers found • Page 6 of 62

#1001

FlowMM: Generating Materials with Riemannian Flow Matching

Benjamin Kurt Miller, Ricky T. Q. Chen, Anuroop Sriram et al.

ICML 2024arXiv:2406.04713
87
citations
#1002

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

Kyle Sargent, Zizhang Li, Tanmay Shah et al.

CVPR 2024arXiv:2310.17994
87
citations
#1003

Finetuning Text-to-Image Diffusion Models for Fairness

Xudong Shen, Chao Du, Tianyu Pang et al.

ICLR 2024arXiv:2311.07604
87
citations
#1004

IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht et al.

ICML 2024arXiv:2402.08682
87
citations
#1005

VRP-SAM: SAM with Visual Reference Prompt

Yanpeng Sun, Jiahui Chen, Shan Zhang et al.

CVPR 2024arXiv:2402.17726
87
citations
#1006

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Yifei Huang, Guo Chen, Jilan Xu et al.

CVPR 2024arXiv:2403.16182
87
citations
#1007

Detecting, Explaining, and Mitigating Memorization in Diffusion Models

Yuxin Wen, Yuchen Liu, Chen Chen et al.

ICLR 2024arXiv:2407.21720
87
citations
#1008

Representation Surgery for Multi-Task Model Merging

Enneng Yang, Li Shen, Zhenyi Wang et al.

ICML 2024arXiv:2402.02705
87
citations
#1009

Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering

Han Zhou, Xingchen Wan, Lev Proleev et al.

ICLR 2024arXiv:2309.17249
87
citations
#1010

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

Gege Gao, Weiyang Liu, Anpei Chen et al.

CVPR 2024arXiv:2312.00093
87
citations
#1011

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline

Xiao Wang, Shiao Wang, Chuanming Tang et al.

CVPR 2024arXiv:2309.14611
86
citations
#1012

Neural Common Neighbor with Completion for Link Prediction

Xiyuan Wang, Haotong Yang, Muhan Zhang

ICLR 2024arXiv:2302.00890
86
citations
#1013

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024arXiv:2307.12732
86
citations
#1014

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

Xinpeng Ding, Jianhua Han, Hang Xu et al.

CVPR 2024arXiv:2401.00988
86
citations
#1015

FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning

Haokun Chen, Yao Zhang, Denis Krompass et al.

AAAI 2024paperarXiv:2308.12305
86
citations
#1016

Amortizing intractable inference in large language models

Edward Hu, Moksh Jain, Eric Elmoznino et al.

ICLR 2024arXiv:2310.04363
86
citations
#1017

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Qing Jiang, Feng Li, Zhaoyang Zeng et al.

ECCV 2024arXiv:2403.14610
86
citations
#1018

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

Zheng Li, Xiang Li, xinyi fu et al.

CVPR 2024arXiv:2403.02781
85
citations
#1019

VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Yi Xin, Junlong Du, Qiang Wang et al.

AAAI 2024paperarXiv:2312.08733
85
citations
#1020

Language Models with Conformal Factuality Guarantees

Christopher Mohri, Tatsunori Hashimoto

ICML 2024arXiv:2402.10978
85
citations
#1021

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

Xiefan Guo, Jinlin Liu, Miaomiao Cui et al.

CVPR 2024arXiv:2404.04650
85
citations
#1022

AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

YU DU, Fangyun Wei, Hongyang Zhang

ICML 2024arXiv:2402.04253
85
citations
#1023

LQ-LoRA: Low-rank plus Quantized Matrix Decomposition for Efficient Language Model Finetuning

Han Guo, Philip Greengard, Eric Xing et al.

ICLR 2024arXiv:2311.12023
85
citations
#1024

FocalDreamer: Text-Driven 3D Editing via Focal-Fusion Assembly

Yuhan Li, Yishun Dou, Yue Shi et al.

AAAI 2024paperarXiv:2308.10608
85
citations
#1025

GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time

Haoran Ye, Jiarui Wang, Helan Liang et al.

AAAI 2024paperarXiv:2312.08224
85
citations
#1026

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta et al.

ICML 2024arXiv:2402.09727
85
citations
#1027

Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation

Shuanghao Bai, Min Zhang, Wanqi Zhou et al.

AAAI 2024paperarXiv:2312.09553
85
citations
#1028

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.

CVPR 2024arXiv:2402.13250
85
citations
#1029

Large-scale Training of Foundation Models for Wearable Biosignals

Salar Abbaspourazad, Oussama Elachqar, Andrew Miller et al.

ICLR 2024arXiv:2312.05409
85
citations
#1030

Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis

Jiawen Li, Yuxuan Chen, Hongbo Chu et al.

CVPR 2024arXiv:2403.07719
85
citations
#1031

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Jingkang Yang, Yuhao Dong, Shuai Liu et al.

ECCV 2024arXiv:2310.08588
85
citations
#1032

Robust Classification via a Single Diffusion Model

Huanran Chen, Yinpeng Dong, Zhengyi Wang et al.

ICML 2024arXiv:2305.15241
84
citations
#1033

PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization

Xu Peng, Junwei Zhu, Boyuan Jiang et al.

CVPR 2024arXiv:2312.06354
84
citations
#1034

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Katherine Crowson, Stefan Baumann, Alex Birch et al.

ICML 2024arXiv:2401.11605
84
citations
#1035

Learning Multi-Dimensional Human Preference for Text-to-Image Generation

Sixian Zhang, Bohan Wang, Junqiang Wu et al.

CVPR 2024arXiv:2405.14705
84
citations
#1036

SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs

Jaehyung Kim, Jaehyun Nam, Sangwoo Mo et al.

ICLR 2024arXiv:2404.13081
84
citations
#1037

Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

Anke Tang, Li Shen, Yong Luo et al.

ICML 2024arXiv:2402.00433
84
citations
#1038

Controlling Vision-Language Models for Multi-Task Image Restoration

Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao et al.

ICLR 2024arXiv:2310.01018
84
citations
#1039

Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation

Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.

ECCV 2024arXiv:2401.07487
84
citations
#1040

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

Zhen Xiang, Fengqing Jiang, Zidi Xiong et al.

ICLR 2024arXiv:2401.12242
84
citations
#1041

Human Feedback is not Gold Standard

Tom Hosking, Phil Blunsom, Max Bartolo

ICLR 2024arXiv:2309.16349
84
citations
#1042

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

Qi Zhao, Shijie Wang, Ce Zhang et al.

ICLR 2024oralarXiv:2307.16368
84
citations
#1043

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

Xiang Chen, Jinshan Pan, Jiangxin Dong

CVPR 2024arXiv:2404.01547
84
citations
#1044

LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

Gongwei Chen, Leyang Shen, Rui Shao et al.

CVPR 2024arXiv:2311.11860
84
citations
#1045

Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation

Xuefei Ning, Zinan Lin, Zixuan Zhou et al.

ICLR 2024arXiv:2307.15337
83
citations
#1046

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

Mubashir Noman, Muzammal Naseer, Hisham Cholakkal et al.

CVPR 2024arXiv:2403.05419
83
citations
#1047

A Closer Look at the Limitations of Instruction Tuning

Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.

ICML 2024arXiv:2402.05119
83
citations
#1048

In-Context Language Learning: Architectures and Algorithms

Ekin Akyürek, Bailin Wang, Yoon Kim et al.

ICML 2024arXiv:2401.12973
83
citations
#1049

InstructVideo: Instructing Video Diffusion Models with Human Feedback

Hangjie Yuan, Shiwei Zhang, Xiang Wang et al.

CVPR 2024arXiv:2312.12490
83
citations
#1050

CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting

xue wang, Tian Zhou, Qingsong Wen et al.

ICLR 2024oralarXiv:2305.12095
83
citations
#1051

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

Junbo Yin, Wenguan Wang, Runnan Chen et al.

CVPR 2024highlightarXiv:2403.15241
83
citations
#1052

FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding

Jun Xiang, Xuan Gao, Yudong Guo et al.

CVPR 2024arXiv:2312.02214
83
citations
#1053

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024arXiv:2309.00616
83
citations
#1054

PSALM: Pixelwise Segmentation with Large Multi-modal Model

Zheng Zhang, YeYao Ma, Enming Zhang et al.

ECCV 2024arXiv:2403.14598
83
citations
#1055

Evaluating Quantized Large Language Models

Shiyao Li, Xuefei Ning, Luning Wang et al.

ICML 2024arXiv:2402.18158
83
citations
#1056

Directed Diffusion: Direct Control of Object Placement through Attention Guidance

Wan-Duo Ma, Avisek Lahiri, J. P. Lewis et al.

AAAI 2024paperarXiv:2302.13153
83
citations
#1057

SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code

ziniu hu, Ahmet Iscen, Aashi Jain et al.

ICML 2024arXiv:2403.01248
83
citations
#1058

Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.

ICLR 2024arXiv:2310.04562
82
citations
#1059

Boximator: Generating Rich and Controllable Motions for Video Synthesis

Jiawei Wang, Yuchen Zhang, Jiaxin Zou et al.

ICML 2024arXiv:2402.01566
82
citations
#1060

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

Yang Jin, Zhicheng Sun, Kun Xu et al.

ICML 2024oralarXiv:2402.03161
82
citations
#1061

Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

Luca Beurer-Kellner, Marc Fischer, Martin Vechev

ICML 2024arXiv:2403.06988
82
citations
#1062

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.

CVPR 2024arXiv:2403.16131
82
citations
#1063

PB-LLM: Partially Binarized Large Language Models

Zhihang Yuan, Yuzhang Shang, Zhen Dong

ICLR 2024arXiv:2310.00034
82
citations
#1064

ClimODE: Climate and Weather Forecasting with Physics-informed Neural ODEs

Yogesh Verma, Markus Heinonen, Vikas Garg

ICLR 2024oralarXiv:2404.10024
82
citations
#1065

General Object Foundation Model for Images and Videos at Scale

Junfeng Wu, Yi Jiang, Qihao Liu et al.

CVPR 2024highlightarXiv:2312.09158
82
citations
#1066

Exposing the Deception: Uncovering More Forgery Clues for Deepfake Detection

Zhongjie Ba, Qingyu Liu, Zhenguang Liu et al.

AAAI 2024paperarXiv:2403.01786
82
citations
#1067

Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models

Xingqian Xu, Jiayi Guo, Zhangyang Wang et al.

CVPR 2024arXiv:2305.16223
82
citations
#1068

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.

AAAI 2024paperarXiv:2401.12863
82
citations
#1069

EscherNet: A Generative Model for Scalable View Synthesis

Xin Kong, Shikun Liu, Xiaoyang Lyu et al.

CVPR 2024arXiv:2402.03908
82
citations
#1070

AVSegFormer: Audio-Visual Segmentation with Transformer

Shengyi Gao, Zhe Chen, Guo Chen et al.

AAAI 2024paperarXiv:2307.01146
82
citations
#1071

NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

Nilesh Kulkarni, Davis Rempe, Kyle Genova et al.

CVPR 2024arXiv:2307.07511
81
citations
#1072

SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

Pin Tang, Zhongdao Wang, Guoqing Wang et al.

CVPR 2024arXiv:2404.09502
81
citations
#1073

Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

Zihan Zhong, Zhiqiang Tang, Tong He et al.

ICLR 2024arXiv:2401.17868
81
citations
#1074

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Tianxing Wu, Chenyang Si, Yuming Jiang et al.

ECCV 2024arXiv:2312.07537
81
citations
#1075

HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Yuheng Jiang, Zhehao Shen, Penghao Wang et al.

CVPR 2024arXiv:2312.03461
81
citations
#1076

Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification

Hai Ci, Pei Yang, Yiren Song et al.

ECCV 2024arXiv:2404.14055
81
citations
#1077

Arc2Face: A Foundation Model for ID-Consistent Human Faces

Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou et al.

ECCV 2024arXiv:2403.11641
81
citations
#1078

RGBD GS-ICP SLAM

Seongbo Ha, Jiung Yeon, Hyeonwoo Yu

ECCV 2024arXiv:2403.12550
81
citations
#1079

DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

Yunfan Ye, Yuhang Huang, Renjiao Yi et al.

AAAI 2024paperarXiv:2401.02032
81
citations
#1080

Language Model Self-improvement by Reinforcement Learning Contemplation

Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li et al.

ICLR 2024arXiv:2305.14483
81
citations
#1081

DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks

Kaijie Zhu, Jiaao Chen, Jindong Wang et al.

ICLR 2024spotlightarXiv:2309.17167
81
citations
#1082

A Benchmark for Learning to Translate a New Language from One Grammar Book

Garrett Tanzer, Mirac Suzgun, Eline Visser et al.

ICLR 2024spotlightarXiv:2309.16575
81
citations
#1083

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Weijia Shi, Sewon Min, Maria Lomeli et al.

ICLR 2024spotlightarXiv:2310.10638
81
citations
#1084

DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

Siyuan Guo, Cheng Deng, Ying Wen et al.

ICML 2024arXiv:2402.17453
81
citations
#1085

Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Zan Wang, Yixin Chen, Baoxiong Jia et al.

CVPR 2024highlightarXiv:2403.18036
81
citations
#1086

Fast ODE-based Sampling for Diffusion Models in Around 5 Steps

Zhenyu Zhou, Defang Chen, Can Wang et al.

CVPR 2024highlightarXiv:2312.00094
80
citations
#1087

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning

Ziheng Qin, Kai Wang, Zangwei Zheng et al.

ICLR 2024arXiv:2303.04947
80
citations
#1088

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Feng Lu, Xiangyuan Lan, Lijun Zhang et al.

CVPR 2024arXiv:2402.19231
80
citations
#1089

DeepZero: Scaling Up Zeroth-Order Optimization for Deep Model Training

AOCHUAN CHEN, Yimeng Zhang, Jinghan Jia et al.

ICLR 2024arXiv:2310.02025
80
citations
#1090

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

Jiarui Hu, Xianhao Chen, Boyin Feng et al.

ECCV 2024arXiv:2403.16095
80
citations
#1091

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

Junyi Zhang, Charles Herrmann, Junhwa Hur et al.

CVPR 2024arXiv:2311.17034
80
citations
#1092

Multimodal Representation Learning by Alternating Unimodal Adaptation

Xiaohui Zhang, Jaehong Yoon, Mohit Bansal et al.

CVPR 2024arXiv:2311.10707
80
citations
#1093

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

Yutong Feng, Biao Gong, Di Chen et al.

CVPR 2024arXiv:2311.17002
80
citations
#1094

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Haiyang Liu, Zihao Zhu, Giorgio Becherini et al.

CVPR 2024arXiv:2401.00374
80
citations
#1095

MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA

Lang Yu, Qin Chen, Jie Zhou et al.

AAAI 2024paperarXiv:2312.11795
80
citations
#1096

Low-Cost High-Power Membership Inference Attacks

Sajjad Zarifzadeh, Philippe Liu, Reza Shokri

ICML 2024arXiv:2312.03262
80
citations
#1097

Structure-Aware Sparse-View X-ray 3D Reconstruction

Yuanhao Cai, Jiahao Wang, Alan L. Yuille et al.

CVPR 2024arXiv:2311.10959
80
citations
#1098

CCEdit: Creative and Controllable Video Editing via Diffusion Models

Ruoyu Feng, Wenming Weng, Yanhui Wang et al.

CVPR 2024arXiv:2309.16496
80
citations
#1099

Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology

Andrew Song, Richard J. Chen, Tong Ding et al.

CVPR 2024arXiv:2405.11643
79
citations
#1100

Curiosity-driven Red-teaming for Large Language Models

Zhang-Wei Hong, Idan Shenfeld, Johnson (Tsun-Hsuan) Wang et al.

ICLR 2024arXiv:2402.19464
79
citations
#1101

SODA: Bottleneck Diffusion Models for Representation Learning

Drew Hudson, Daniel Zoran, Mateusz Malinowski et al.

CVPR 2024arXiv:2311.17901
79
citations
#1102

EcomGPT: Instruction-Tuning Large Language Models with Chain-of-Task Tasks for E-commerce

Li Yangning, Shirong Ma, Xiaobin Wang et al.

AAAI 2024paperarXiv:2308.06966
79
citations
#1103

PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology

Yuxuan Sun, Chenglu Zhu, Sunyi Zheng et al.

AAAI 2024paperarXiv:2305.15072
79
citations
#1104

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models

Chao Gong, Kai Chen, Zhipeng Wei et al.

ECCV 2024arXiv:2407.12383
79
citations
#1105

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

Yang Jin, Kun Xu, Kun Xu et al.

ICLR 2024arXiv:2309.04669
79
citations
#1106

OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition

Jianqiang Wan, Sibo Song, Wenwen Yu et al.

CVPR 2024arXiv:2403.19128
79
citations
#1107

On the Stability of Iterative Retraining of Generative Models on their own Data

Quentin Bertrand, Joey Bose, Alexandre Duplessis et al.

ICLR 2024spotlightarXiv:2310.00429
79
citations
#1108

Towards 3D Molecule-Text Interpretation in Language Models

Sihang Li, Zhiyuan Liu, Yanchen Luo et al.

ICLR 2024arXiv:2401.13923
79
citations
#1109

Linear attention is (maybe) all you need (to understand Transformer optimization)

Kwangjun Ahn, Xiang Cheng, Minhak Song et al.

ICLR 2024arXiv:2310.01082
79
citations
#1110

Position: Graph Foundation Models Are Already Here

Haitao Mao, Zhikai Chen, Wenzhuo Tang et al.

ICML 2024spotlightarXiv:2402.02216
79
citations
#1111

FairCLIP: Harnessing Fairness in Vision-Language Learning

Yan Luo, MIN SHI, Muhammad Osama Khan et al.

CVPR 2024arXiv:2403.19949
79
citations
#1112

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

Baoquan Zhang, Chuyao Luo, Demin Yu et al.

AAAI 2024paperarXiv:2307.16424
79
citations
#1113

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Lorenzo Pacchiardi, Alex Chan, Sören Mindermann et al.

ICLR 2024arXiv:2309.15840
79
citations
#1114

Deblurring 3D Gaussian Splatting

Byeonghyeon Lee, Howoong Lee, Xiangyu Sun et al.

ECCV 2024arXiv:2401.00834
79
citations
#1115

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

Harry Dong, Xinyu Yang, Zhenyu Zhang et al.

ICML 2024arXiv:2402.09398
79
citations
#1116

SAI3D: Segment Any Instance in 3D Scenes

Yingda Yin, Yuzheng Liu, Yang Xiao et al.

CVPR 2024arXiv:2312.11557
79
citations
#1117

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Yanda Chen, Ruiqi Zhong, Narutatsu Ri et al.

ICML 2024spotlightarXiv:2307.08678
79
citations
#1118

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

Renjie Pi, Tianyang Han, Wei Xiong et al.

ECCV 2024arXiv:2403.08730
78
citations
#1119

SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking

Wang Yu Hsiang, Jun-Wei Hsieh, Ping-Yang Chen et al.

AAAI 2024paperarXiv:2211.08824
78
citations
#1120

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Gengze Zhou, Yicong Hong, Zun Wang et al.

ECCV 2024arXiv:2407.12366
78
citations
#1121

COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction

Qihang Ma, Xin Tan, Yanyun Qu et al.

CVPR 2024arXiv:2312.01919
78
citations
#1122

Multiscale Positive-Unlabeled Detection of AI-Generated Texts

Yuchuan Tian, Hanting Chen, Xutao Wang et al.

ICLR 2024spotlightarXiv:2305.18149
78
citations
#1123

Learning to Act without Actions

Dominik Schmidt, Minqi Jiang

ICLR 2024oralarXiv:2312.10812
78
citations
#1124

Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

CVPR 2024arXiv:2312.11360
78
citations
#1125

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.

AAAI 2024paperarXiv:2305.15685
78
citations
#1126

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Buyun Zhang, Liang Luo, Yuxin Chen et al.

ICML 2024arXiv:2403.02545
78
citations
#1127

Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment

Utkarsh Kumar Mall, Cheng Perng Phoo, Meilin Liu et al.

ICLR 2024arXiv:2312.06960
78
citations
#1128

Inter-X: Towards Versatile Human-Human Interaction Analysis

Liang Xu, Xintao Lv, Yichao Yan et al.

CVPR 2024arXiv:2312.16051
78
citations
#1129

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation

Yukun Huang, Jianan Wang, Yukai Shi et al.

ICLR 2024arXiv:2306.12422
78
citations
#1130

NExT-Chat: An LMM for Chat, Detection and Segmentation

Ao Zhang, Yuan Yao, Wei Ji et al.

ICML 2024arXiv:2311.04498
78
citations
#1131

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

Xiaoyang Wu, Zhuotao Tian, Xin Wen et al.

CVPR 2024arXiv:2308.09718
78
citations
#1132

LLaFS: When Large Language Models Meet Few-Shot Segmentation

Lanyun Zhu, Tianrun Chen, Deyi Ji et al.

CVPR 2024arXiv:2311.16926
78
citations
#1133

LLM-grounded Video Diffusion Models

Long Lian, Baifeng Shi, Adam Yala et al.

ICLR 2024oralarXiv:2309.17444
77
citations
#1134

Instruct-Imagen: Image Generation with Multi-modal Instruction

Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su et al.

CVPR 2024arXiv:2401.01952
77
citations
#1135

V2X-Real: a Largs-Scale Dataset for Vehicle-to-Everything Cooperative Perception

Hao Xiang, Xin Xia, Zhaoliang Zheng et al.

ECCV 2024arXiv:2403.16034
77
citations
#1136

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.

ECCV 2024arXiv:2407.12442
77
citations
#1137

Large-scale Reinforcement Learning for Diffusion Models

Yinan Zhang, Eric Tzeng, Yilun Du et al.

ECCV 2024arXiv:2401.12244
77
citations
#1138

Graph Neural Prompting with Large Language Models

Yijun Tian, Huan Song, Zichen Wang et al.

AAAI 2024paperarXiv:2309.15427
77
citations
#1139

Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye et al.

AAAI 2024paperarXiv:2306.05783
77
citations
#1140

Single Motion Diffusion

Sigal Raab, Inbal Leibovitch, Guy Tevet et al.

ICLR 2024oralarXiv:2302.05905
77
citations
#1141

Model Stock: All we need is just a few fine-tuned models

Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han

ECCV 2024arXiv:2403.19522
77
citations
#1142

A Dynamical Model of Neural Scaling Laws

Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

ICML 2024arXiv:2402.01092
77
citations
#1143

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Yiran Qin, Enshen Zhou, Qichang Liu et al.

CVPR 2024arXiv:2312.07472
77
citations
#1144

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

Jinyi Hu, Yuan Yao, Chongyi Wang et al.

ICLR 2024spotlightarXiv:2308.12038
77
citations
#1145

Model Merging by Uncertainty-Based Gradient Matching

Nico Daheim, Thomas Möllenhoff, Edoardo M. Ponti et al.

ICLR 2024arXiv:2310.12808
77
citations
#1146

Distilling Diffusion Models into Conditional GANs

Minguk Kang, Richard Zhang, Connelly Barnes et al.

ECCV 2024arXiv:2405.05967
77
citations
#1147

FiT: Flexible Vision Transformer for Diffusion Model

Zeyu Lu, ZiDong Wang, Di Huang et al.

ICML 2024spotlightarXiv:2402.12376
77
citations
#1148

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

Dian Zheng, Xiao-Ming Wu, Shuzhou Yang et al.

CVPR 2024arXiv:2403.11157
77
citations
#1149

Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation

Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta et al.

ECCV 2024arXiv:2405.01527
77
citations
#1150

Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks

Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei et al.

ICLR 2024arXiv:2310.00076
77
citations
#1151

EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation

Wenyang Zhou, Zhiyang Dou, Zeyu Cao et al.

ECCV 2024
77
citations
#1152

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

Tianyu Guo, Wei Hu, Song Mei et al.

ICLR 2024arXiv:2310.10616
77
citations
#1153

Elucidating the Exposure Bias in Diffusion Models

Mang Ning, Mingxiao Li, Jianlin Su et al.

ICLR 2024arXiv:2308.15321
77
citations
#1154

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

Pingzhi Li, Zhenyu Zhang, Prateek Yadav et al.

ICLR 2024spotlightarXiv:2310.01334
77
citations
#1155

TLControl: Trajectory and Language Control for Human Motion Synthesis

WEILIN WAN, Zhiyang Dou, Taku Komura et al.

ECCV 2024arXiv:2311.17135
77
citations
#1156

Learning Vision from Models Rivals Learning Vision from Data

Yonglong Tian, Lijie Fan, Kaifeng Chen et al.

CVPR 2024arXiv:2312.17742
77
citations
#1157

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

Zechuan Zhang, Zongxin Yang, Yi Yang

CVPR 2024highlightarXiv:2312.06704
77
citations
#1158

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.

CVPR 2024arXiv:2312.03052
76
citations
#1159

Streaming Dense Video Captioning

Xingyi Zhou, Anurag Arnab, Shyamal Buch et al.

CVPR 2024arXiv:2404.01297
76
citations
#1160

A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization

Qiyu Chen, Huiyuan Luo, Chengkan Lv et al.

ECCV 2024arXiv:2407.09359
76
citations
#1161

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Walid Bousselham, Felix Petersen, Vittorio Ferrari et al.

CVPR 2024arXiv:2312.00878
76
citations
#1162

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

Yichi Zhang, Ziqiao Ma, Xiaofeng Gao et al.

CVPR 2024arXiv:2402.16846
76
citations
#1163

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Linshan Wu, Jia-Xin Zhuang, Hao Chen

CVPR 2024arXiv:2402.17300
76
citations
#1164

Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

Murtaza Dalal, Tarun Chiruvolu, Devendra Chaplot et al.

ICLR 2024arXiv:2405.01534
76
citations
#1165

Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

Leheng Zhang, Yawei Li, Xingyu Zhou et al.

CVPR 2024arXiv:2401.08209
76
citations
#1166

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Guangzhi Sun, Wenyi Yu, Changli Tang et al.

ICML 2024oralarXiv:2406.15704
76
citations
#1167

How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis

Federico Bianchi, Patrick John Chia, Mert Yuksekgonul et al.

ICML 2024oralarXiv:2402.05863
76
citations
#1168

MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

Tianqi Liu, Guangcong Wang, Shoukang Hu et al.

ECCV 2024arXiv:2405.12218
76
citations
#1169

Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Yijun Yang, Tianyi Zhou, kanxue Li et al.

CVPR 2024arXiv:2311.16714
76
citations
#1170

WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation

Jiachen Lu, Ze Huang, Zeyu Yang et al.

ECCV 2024arXiv:2312.02934
76
citations
#1171

Rolling Diffusion Models

David Ruhe, Jonathan Heek, Tim Salimans et al.

ICML 2024oralarXiv:2402.09470
76
citations
#1172

DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven Text-to-Image Generation

Hong Chen, Yipeng Zhang, Simin Wu et al.

ICLR 2024arXiv:2305.03374
75
citations
#1173

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

Yiwen Chen, Chi Zhang, Xiaofeng Yang et al.

AAAI 2024paperarXiv:2308.11473
75
citations
#1174

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

Heng Wang, Jianbo Ma, Santiago Pascual et al.

AAAI 2024paperarXiv:2308.09300
75
citations
#1175

Confronting Reward Model Overoptimization with Constrained RLHF

Ted Moskovitz, Aaditya Singh, DJ Strouse et al.

ICLR 2024spotlightarXiv:2310.04373
75
citations
#1176

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Yushi Lan, Fangzhou Hong, Shuai Yang et al.

ECCV 2024arXiv:2403.12019
75
citations
#1177

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

Zhiwei Yang, Jing Liu, Peng Wu

CVPR 2024arXiv:2404.08531
75
citations
#1178

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

Xian Liu, Jian Ren, Aliaksandr Siarohin et al.

ICLR 2024arXiv:2310.08579
75
citations
#1179

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Zhenhua Yang, Dezhi Peng, Yuxin Kong et al.

AAAI 2024paperarXiv:2312.12142
75
citations
#1180

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

Thien-Minh Nguyen, Shenghai Yuan, Thien Nguyen et al.

CVPR 2024highlightarXiv:2403.11496
75
citations
#1181

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.

CVPR 2024arXiv:2309.00610
75
citations
#1182

Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

Jeongmin Bae, Seoha Kim, Youngsik Yun et al.

ECCV 2024arXiv:2404.03613
75
citations
#1183

Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions

Satwik Bhattamishra, Arkil Patel, Phil Blunsom et al.

ICLR 2024arXiv:2310.03016
75
citations
#1184

A Unified Approach for Text- and Image-guided 4D Scene Generation

Yufeng Zheng, Xueting Li, Koki Nagano et al.

CVPR 2024arXiv:2311.16854
75
citations
#1185

FedASMU: Efficient Asynchronous Federated Learning with Dynamic Staleness-Aware Model Update

Ji Liu, Juncheng Jia, Tianshi Che et al.

AAAI 2024paperarXiv:2312.05770
75
citations
#1186

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Choi Yisol, Sangkyung Kwak, Kyungmin Lee et al.

ECCV 2024arXiv:2403.05139
75
citations
#1187

CoSeR: Bridging Image and Language for Cognitive Super-Resolution

Haoze Sun, Wenbo Li, Jianzhuang Liu et al.

CVPR 2024arXiv:2311.16512
75
citations
#1188

Expressive Whole-Body 3D Gaussian Avatar

Gyeongsik Moon, Takaaki Shiratori, Shunsuke Saito

ECCV 2024arXiv:2407.21686
75
citations
#1189

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

Seohong Park, Oleh Rybkin, Sergey Levine

ICLR 2024oralarXiv:2310.08887
75
citations
#1190

Temporal Adaptive RGBT Tracking with Modality Prompt

Hongyu Wang, Xiaotao Liu, Yifan Li et al.

AAAI 2024paperarXiv:2401.01244
75
citations
#1191

D-Flow: Differentiating through Flows for Controlled Generation

Heli Ben-Hamu, Omri Puny, Itai Gat et al.

ICML 2024arXiv:2402.14017
75
citations
#1192

LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation

Suhyeon Lee, Won Jun Kim, Jinho Chang et al.

ICLR 2024arXiv:2305.11490
75
citations
#1193

Watermark Stealing in Large Language Models

Nikola Jovanović, Robin Staab, Martin Vechev

ICML 2024arXiv:2402.19361
75
citations
#1194

Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians

Licheng Zhong, Hong-Xing Yu, Jiajun Wu et al.

ECCV 2024arXiv:2403.09434
75
citations
#1195

TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

Sai Kumar Dwivedi, Yu Sun, Priyanka Patel et al.

CVPR 2024arXiv:2404.16752
75
citations
#1196

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

Ziyue Jiang, Jinglin Liu, Yi Ren et al.

ICLR 2024arXiv:2307.07218
74
citations
#1197

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Hsuan-I Ho, Jie Song, Otmar Hilliges

CVPR 2024arXiv:2311.15855
74
citations
#1198

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick et al.

ICML 2024arXiv:2401.12179
74
citations
#1199

AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection

Trevine Oorloff, Surya Koppisetti, Nicolo Bonettini et al.

CVPR 2024arXiv:2406.02951
74
citations
#1200

BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting

Lingzhe Zhao, Peng Wang, Peidong Liu

ECCV 2024arXiv:2403.11831
74
citations