Most Cited 2024 "ucb-style exploration" Papers

12,324 papers found • Page 6 of 62

Filters: Most Cited 2024, ucb-style exploration

Conference

AAAI 2025 (3,028), COLM 2025 (418), CVPR 2025 (2,873), ICCV 2025 (2,701), ICLR 2025 (3,827), ICML 2025 (3,340), ISMAR 2025 (229), NEURIPS 2025 (5,858), AAAI 2024 (2,289), CVPR 2024 (2,716), ECCV 2024 (2,387), ICLR 2024 (2,297), ICML 2024 (2,635)

Paper Type

poster (24,624), paper (8,558), oral (1,594), spotlight (1,421), highlight (975)

#1001

FlowMM: Generating Materials with Riemannian Flow Matching

Benjamin Kurt Miller, Ricky T. Q. Chen, Anuroop Sriram et al.

ICML 2024 • arXiv:2406.04713

#1002

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

Kyle Sargent, Zizhang Li, Tanmay Shah et al.

CVPR 2024 • arXiv:2310.17994

#1003

Finetuning Text-to-Image Diffusion Models for Fairness

Xudong Shen, Chao Du, Tianyu Pang et al.

ICLR 2024 • arXiv:2311.07604

#1004

IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht et al.

ICML 2024 • arXiv:2402.08682

#1005

VRP-SAM: SAM with Visual Reference Prompt

Yanpeng Sun, Jiahui Chen, Shan Zhang et al.

CVPR 2024 • arXiv:2402.17726

#1006

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Yifei Huang, Guo Chen, Jilan Xu et al.

CVPR 2024 • arXiv:2403.16182

#1007

Detecting, Explaining, and Mitigating Memorization in Diffusion Models

Yuxin Wen, Yuchen Liu, Chen Chen et al.

ICLR 2024 • arXiv:2407.21720

#1008

Representation Surgery for Multi-Task Model Merging

Enneng Yang, Li Shen, Zhenyi Wang et al.

ICML 2024 • arXiv:2402.02705

#1009

Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering

Han Zhou, Xingchen Wan, Lev Proleev et al.

ICLR 2024 • arXiv:2309.17249

#1010

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

Gege Gao, Weiyang Liu, Anpei Chen et al.

CVPR 2024 • arXiv:2312.00093

#1011

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline

Xiao Wang, Shiao Wang, Chuanming Tang et al.

CVPR 2024 • arXiv:2309.14611

#1012

Neural Common Neighbor with Completion for Link Prediction

Xiyuan Wang, Haotong Yang, Muhan Zhang

ICLR 2024 • arXiv:2302.00890

#1013

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024 • arXiv:2307.12732

#1014

Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models

Xinpeng Ding, Jianhua Han, Hang Xu et al.

CVPR 2024 • arXiv:2401.00988

#1015

FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning

Haokun Chen, Yao Zhang, Denis Krompass et al.

AAAI 2024 • paper • arXiv:2308.12305

#1016

Amortizing intractable inference in large language models

Edward Hu, Moksh Jain, Eric Elmoznino et al.

ICLR 2024 • arXiv:2310.04363

#1017

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Qing Jiang, Feng Li, Zhaoyang Zeng et al.

ECCV 2024 • arXiv:2403.14610

#1018

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

Zheng Li, Xiang Li, xinyi fu et al.

CVPR 2024 • arXiv:2403.02781

#1019

VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Yi Xin, Junlong Du, Qiang Wang et al.

AAAI 2024 • paper • arXiv:2312.08733

#1020

Language Models with Conformal Factuality Guarantees

Christopher Mohri, Tatsunori Hashimoto

ICML 2024 • arXiv:2402.10978

#1021

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

Xiefan Guo, Jinlin Liu, Miaomiao Cui et al.

CVPR 2024 • arXiv:2404.04650

#1022

AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

YU DU, Fangyun Wei, Hongyang Zhang

ICML 2024 • arXiv:2402.04253

#1023

LQ-LoRA: Low-rank plus Quantized Matrix Decomposition for Efficient Language Model Finetuning

Han Guo, Philip Greengard, Eric Xing et al.

ICLR 2024 • arXiv:2311.12023

#1024

FocalDreamer: Text-Driven 3D Editing via Focal-Fusion Assembly

Yuhan Li, Yishun Dou, Yue Shi et al.

AAAI 2024 • paper • arXiv:2308.10608

#1025

GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time

Haoran Ye, Jiarui Wang, Helan Liang et al.

AAAI 2024 • paper • arXiv:2312.08224

#1026

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta et al.

ICML 2024 • arXiv:2402.09727

#1027

Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation

Shuanghao Bai, Min Zhang, Wanqi Zhou et al.

AAAI 2024 • paper • arXiv:2312.09553

#1028

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.

CVPR 2024 • arXiv:2402.13250

#1029

Large-scale Training of Foundation Models for Wearable Biosignals

Salar Abbaspourazad, Oussama Elachqar, Andrew Miller et al.

ICLR 2024 • arXiv:2312.05409

#1030

Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis

Jiawen Li, Yuxuan Chen, Hongbo Chu et al.

CVPR 2024 • arXiv:2403.07719

#1031

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Jingkang Yang, Yuhao Dong, Shuai Liu et al.

ECCV 2024 • arXiv:2310.08588

#1032

Robust Classification via a Single Diffusion Model

Huanran Chen, Yinpeng Dong, Zhengyi Wang et al.

ICML 2024 • arXiv:2305.15241

#1033

PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization

Xu Peng, Junwei Zhu, Boyuan Jiang et al.

CVPR 2024 • arXiv:2312.06354

#1034

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Katherine Crowson, Stefan Baumann, Alex Birch et al.

ICML 2024 • arXiv:2401.11605

#1035

Learning Multi-Dimensional Human Preference for Text-to-Image Generation

Sixian Zhang, Bohan Wang, Junqiang Wu et al.

CVPR 2024 • arXiv:2405.14705

#1036

SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs

Jaehyung Kim, Jaehyun Nam, Sangwoo Mo et al.

ICLR 2024 • arXiv:2404.13081

#1037

Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

Anke Tang, Li Shen, Yong Luo et al.

ICML 2024 • arXiv:2402.00433

#1038

Controlling Vision-Language Models for Multi-Task Image Restoration

Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao et al.

ICLR 2024 • arXiv:2310.01018

#1039

Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation

Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.

ECCV 2024 • arXiv:2401.07487

#1040

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

Zhen Xiang, Fengqing Jiang, Zidi Xiong et al.

ICLR 2024 • arXiv:2401.12242

#1041

Human Feedback is not Gold Standard

Tom Hosking, Phil Blunsom, Max Bartolo

ICLR 2024 • arXiv:2309.16349

#1042

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

Qi Zhao, Shijie Wang, Ce Zhang et al.

ICLR 2024 • oral • arXiv:2307.16368

#1043

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

Xiang Chen, Jinshan Pan, Jiangxin Dong

CVPR 2024 • arXiv:2404.01547

#1044

LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

Gongwei Chen, Leyang Shen, Rui Shao et al.

CVPR 2024 • arXiv:2311.11860

#1045

Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation

Xuefei Ning, Zinan Lin, Zixuan Zhou et al.

ICLR 2024 • arXiv:2307.15337

#1046

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

Mubashir Noman, Muzammal Naseer, Hisham Cholakkal et al.

CVPR 2024 • arXiv:2403.05419

#1047

A Closer Look at the Limitations of Instruction Tuning

Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.

ICML 2024 • arXiv:2402.05119

#1048

In-Context Language Learning: Architectures and Algorithms

Ekin Akyürek, Bailin Wang, Yoon Kim et al.

ICML 2024 • arXiv:2401.12973

#1049

InstructVideo: Instructing Video Diffusion Models with Human Feedback

Hangjie Yuan, Shiwei Zhang, Xiang Wang et al.

CVPR 2024 • arXiv:2312.12490

#1050

CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting

xue wang, Tian Zhou, Qingsong Wen et al.

ICLR 2024 • oral • arXiv:2305.12095

#1051

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

Junbo Yin, Wenguan Wang, Runnan Chen et al.

CVPR 2024 • highlight • arXiv:2403.15241

#1052

FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding

Jun Xiang, Xuan Gao, Yudong Guo et al.

CVPR 2024 • arXiv:2312.02214

#1053

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024 • arXiv:2309.00616

#1054

PSALM: Pixelwise Segmentation with Large Multi-modal Model

Zheng Zhang, YeYao Ma, Enming Zhang et al.

ECCV 2024 • arXiv:2403.14598

#1055

Evaluating Quantized Large Language Models

Shiyao Li, Xuefei Ning, Luning Wang et al.

ICML 2024 • arXiv:2402.18158

#1056

Directed Diffusion: Direct Control of Object Placement through Attention Guidance

Wan-Duo Ma, Avisek Lahiri, J. P. Lewis et al.

AAAI 2024 • paper • arXiv:2302.13153

#1057

SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code

ziniu hu, Ahmet Iscen, Aashi Jain et al.

ICML 2024 • arXiv:2403.01248

#1058

Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.

ICLR 2024 • arXiv:2310.04562

#1059

Boximator: Generating Rich and Controllable Motions for Video Synthesis

Jiawei Wang, Yuchen Zhang, Jiaxin Zou et al.

ICML 2024 • arXiv:2402.01566

#1060

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

Yang Jin, Zhicheng Sun, Kun Xu et al.

ICML 2024 • oral • arXiv:2402.03161

#1061

Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

Luca Beurer-Kellner, Marc Fischer, Martin Vechev

ICML 2024 • arXiv:2403.06988

#1062

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.

CVPR 2024 • arXiv:2403.16131

#1063

PB-LLM: Partially Binarized Large Language Models

Zhihang Yuan, Yuzhang Shang, Zhen Dong

ICLR 2024 • arXiv:2310.00034

#1064

ClimODE: Climate and Weather Forecasting with Physics-informed Neural ODEs

Yogesh Verma, Markus Heinonen, Vikas Garg

ICLR 2024 • oral • arXiv:2404.10024

#1065

General Object Foundation Model for Images and Videos at Scale

Junfeng Wu, Yi Jiang, Qihao Liu et al.

CVPR 2024 • highlight • arXiv:2312.09158

#1066

Exposing the Deception: Uncovering More Forgery Clues for Deepfake Detection

Zhongjie Ba, Qingyu Liu, Zhenguang Liu et al.

AAAI 2024 • paper • arXiv:2403.01786

#1067

Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models

Xingqian Xu, Jiayi Guo, Zhangyang Wang et al.

CVPR 2024 • arXiv:2305.16223

#1068

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.

AAAI 2024 • paper • arXiv:2401.12863

#1069

EscherNet: A Generative Model for Scalable View Synthesis

Xin Kong, Shikun Liu, Xiaoyang Lyu et al.

CVPR 2024 • arXiv:2402.03908

#1070

AVSegFormer: Audio-Visual Segmentation with Transformer

Shengyi Gao, Zhe Chen, Guo Chen et al.

AAAI 2024 • paper • arXiv:2307.01146

#1071

NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

Nilesh Kulkarni, Davis Rempe, Kyle Genova et al.

CVPR 2024 • arXiv:2307.07511

#1072

SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

Pin Tang, Zhongdao Wang, Guoqing Wang et al.

CVPR 2024 • arXiv:2404.09502

#1073

Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

Zihan Zhong, Zhiqiang Tang, Tong He et al.

ICLR 2024 • arXiv:2401.17868

#1074

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Tianxing Wu, Chenyang Si, Yuming Jiang et al.

ECCV 2024 • arXiv:2312.07537

#1075

HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Yuheng Jiang, Zhehao Shen, Penghao Wang et al.

CVPR 2024 • arXiv:2312.03461

#1076

Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification

Hai Ci, Pei Yang, Yiren Song et al.

ECCV 2024 • arXiv:2404.14055

#1077

Arc2Face: A Foundation Model for ID-Consistent Human Faces

Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou et al.

ECCV 2024 • arXiv:2403.11641

#1078

RGBD GS-ICP SLAM

Seongbo Ha, Jiung Yeon, Hyeonwoo Yu

ECCV 2024 • arXiv:2403.12550

#1079

DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

Yunfan Ye, Yuhang Huang, Renjiao Yi et al.

AAAI 2024 • paper • arXiv:2401.02032

#1080

Language Model Self-improvement by Reinforcement Learning Contemplation

Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li et al.

ICLR 2024 • arXiv:2305.14483

#1081

DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks

Kaijie Zhu, Jiaao Chen, Jindong Wang et al.

ICLR 2024 • spotlight • arXiv:2309.17167

#1082

A Benchmark for Learning to Translate a New Language from One Grammar Book

Garrett Tanzer, Mirac Suzgun, Eline Visser et al.

ICLR 2024 • spotlight • arXiv:2309.16575

#1083

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Weijia Shi, Sewon Min, Maria Lomeli et al.

ICLR 2024 • spotlight • arXiv:2310.10638

#1084

DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

Siyuan Guo, Cheng Deng, Ying Wen et al.

ICML 2024 • arXiv:2402.17453

#1085

Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Zan Wang, Yixin Chen, Baoxiong Jia et al.

CVPR 2024 • highlight • arXiv:2403.18036

#1086

Fast ODE-based Sampling for Diffusion Models in Around 5 Steps

Zhenyu Zhou, Defang Chen, Can Wang et al.

CVPR 2024 • highlight • arXiv:2312.00094

#1087

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning

Ziheng Qin, Kai Wang, Zangwei Zheng et al.

ICLR 2024 • arXiv:2303.04947

#1088

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Feng Lu, Xiangyuan Lan, Lijun Zhang et al.

CVPR 2024 • arXiv:2402.19231

#1089

DeepZero: Scaling Up Zeroth-Order Optimization for Deep Model Training

AOCHUAN CHEN, Yimeng Zhang, Jinghan Jia et al.

ICLR 2024 • arXiv:2310.02025

#1090

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

Jiarui Hu, Xianhao Chen, Boyin Feng et al.

ECCV 2024 • arXiv:2403.16095

#1091

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

Junyi Zhang, Charles Herrmann, Junhwa Hur et al.

CVPR 2024 • arXiv:2311.17034

#1092

Multimodal Representation Learning by Alternating Unimodal Adaptation

Xiaohui Zhang, Jaehong Yoon, Mohit Bansal et al.

CVPR 2024 • arXiv:2311.10707

#1093

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

Yutong Feng, Biao Gong, Di Chen et al.

CVPR 2024 • arXiv:2311.17002

#1094

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Haiyang Liu, Zihao Zhu, Giorgio Becherini et al.

CVPR 2024 • arXiv:2401.00374

#1095

MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA

Lang Yu, Qin Chen, Jie Zhou et al.

AAAI 2024 • paper • arXiv:2312.11795

#1096

Low-Cost High-Power Membership Inference Attacks

Sajjad Zarifzadeh, Philippe Liu, Reza Shokri

ICML 2024 • arXiv:2312.03262

#1097

Structure-Aware Sparse-View X-ray 3D Reconstruction

Yuanhao Cai, Jiahao Wang, Alan L. Yuille et al.

CVPR 2024 • arXiv:2311.10959

#1098

CCEdit: Creative and Controllable Video Editing via Diffusion Models

Ruoyu Feng, Wenming Weng, Yanhui Wang et al.

CVPR 2024 • arXiv:2309.16496

#1099

Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology

Andrew Song, Richard J. Chen, Tong Ding et al.

CVPR 2024 • arXiv:2405.11643

#1100

Curiosity-driven Red-teaming for Large Language Models

Zhang-Wei Hong, Idan Shenfeld, Johnson (Tsun-Hsuan) Wang et al.

ICLR 2024 • arXiv:2402.19464

#1101

SODA: Bottleneck Diffusion Models for Representation Learning

Drew Hudson, Daniel Zoran, Mateusz Malinowski et al.

CVPR 2024 • arXiv:2311.17901

#1102

EcomGPT: Instruction-Tuning Large Language Models with Chain-of-Task Tasks for E-commerce

Li Yangning, Shirong Ma, Xiaobin Wang et al.

AAAI 2024 • paper • arXiv:2308.06966

#1103

PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology

Yuxuan Sun, Chenglu Zhu, Sunyi Zheng et al.

AAAI 2024 • paper • arXiv:2305.15072

#1104

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models

Chao Gong, Kai Chen, Zhipeng Wei et al.

ECCV 2024 • arXiv:2407.12383

#1105

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

Yang Jin, Kun Xu, Kun Xu et al.

ICLR 2024 • arXiv:2309.04669

#1106

OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition

Jianqiang Wan, Sibo Song, Wenwen Yu et al.

CVPR 2024 • arXiv:2403.19128

#1107

On the Stability of Iterative Retraining of Generative Models on their own Data

Quentin Bertrand, Joey Bose, Alexandre Duplessis et al.

ICLR 2024 • spotlight • arXiv:2310.00429

#1108

Towards 3D Molecule-Text Interpretation in Language Models

Sihang Li, Zhiyuan Liu, Yanchen Luo et al.

ICLR 2024 • arXiv:2401.13923

#1109

Linear attention is (maybe) all you need (to understand Transformer optimization)

Kwangjun Ahn, Xiang Cheng, Minhak Song et al.

ICLR 2024 • arXiv:2310.01082

#1110

Position: Graph Foundation Models Are Already Here

Haitao Mao, Zhikai Chen, Wenzhuo Tang et al.

ICML 2024 • spotlight • arXiv:2402.02216

#1111

FairCLIP: Harnessing Fairness in Vision-Language Learning

Yan Luo, MIN SHI, Muhammad Osama Khan et al.

CVPR 2024 • arXiv:2403.19949

#1112

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

Baoquan Zhang, Chuyao Luo, Demin Yu et al.

AAAI 2024 • paper • arXiv:2307.16424

#1113

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Lorenzo Pacchiardi, Alex Chan, Sören Mindermann et al.

ICLR 2024 • arXiv:2309.15840

#1114

Deblurring 3D Gaussian Splatting

Byeonghyeon Lee, Howoong Lee, Xiangyu Sun et al.

ECCV 2024 • arXiv:2401.00834

#1115

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

Harry Dong, Xinyu Yang, Zhenyu Zhang et al.

ICML 2024 • arXiv:2402.09398

#1116

SAI3D: Segment Any Instance in 3D Scenes

Yingda Yin, Yuzheng Liu, Yang Xiao et al.

CVPR 2024 • arXiv:2312.11557

#1117

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Yanda Chen, Ruiqi Zhong, Narutatsu Ri et al.

ICML 2024 • spotlight • arXiv:2307.08678

#1118

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

Renjie Pi, Tianyang Han, Wei Xiong et al.

ECCV 2024 • arXiv:2403.08730

#1119

SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking

Wang Yu Hsiang, Jun-Wei Hsieh, Ping-Yang Chen et al.

AAAI 2024 • paper • arXiv:2211.08824

#1120

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Gengze Zhou, Yicong Hong, Zun Wang et al.

ECCV 2024 • arXiv:2407.12366

#1121

COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction

Qihang Ma, Xin Tan, Yanyun Qu et al.

CVPR 2024 • arXiv:2312.01919

#1122

Multiscale Positive-Unlabeled Detection of AI-Generated Texts

Yuchuan Tian, Hanting Chen, Xutao Wang et al.

ICLR 2024 • spotlight • arXiv:2305.18149

#1123

Learning to Act without Actions

Dominik Schmidt, Minqi Jiang

ICLR 2024 • oral • arXiv:2312.10812

#1124

Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

CVPR 2024 • arXiv:2312.11360

#1125

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.

AAAI 2024 • paper • arXiv:2305.15685

#1126

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Buyun Zhang, Liang Luo, Yuxin Chen et al.

ICML 2024 • arXiv:2403.02545

#1127

Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment

Utkarsh Kumar Mall, Cheng Perng Phoo, Meilin Liu et al.

ICLR 2024 • arXiv:2312.06960

#1128

Inter-X: Towards Versatile Human-Human Interaction Analysis

Liang Xu, Xintao Lv, Yichao Yan et al.

CVPR 2024 • arXiv:2312.16051

#1129

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation

Yukun Huang, Jianan Wang, Yukai Shi et al.

ICLR 2024 • arXiv:2306.12422

#1130

NExT-Chat: An LMM for Chat, Detection and Segmentation

Ao Zhang, Yuan Yao, Wei Ji et al.

ICML 2024 • arXiv:2311.04498

#1131

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

Xiaoyang Wu, Zhuotao Tian, Xin Wen et al.

CVPR 2024 • arXiv:2308.09718

#1132

LLaFS: When Large Language Models Meet Few-Shot Segmentation

Lanyun Zhu, Tianrun Chen, Deyi Ji et al.

CVPR 2024 • arXiv:2311.16926

#1133

LLM-grounded Video Diffusion Models

Long Lian, Baifeng Shi, Adam Yala et al.

ICLR 2024 • oral • arXiv:2309.17444

#1134

Instruct-Imagen: Image Generation with Multi-modal Instruction

Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su et al.

CVPR 2024 • arXiv:2401.01952

#1135

V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception

Hao Xiang, Xin Xia, Zhaoliang Zheng et al.

ECCV 2024 • arXiv:2403.16034

#1136

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.

ECCV 2024 • arXiv:2407.12442

#1137

Large-scale Reinforcement Learning for Diffusion Models

Yinan Zhang, Eric Tzeng, Yilun Du et al.

ECCV 2024 • arXiv:2401.12244

#1138

Graph Neural Prompting with Large Language Models

Yijun Tian, Huan Song, Zichen Wang et al.

AAAI 2024 • paper • arXiv:2309.15427

#1139

Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye et al.

AAAI 2024 • paper • arXiv:2306.05783

#1140

Single Motion Diffusion

Sigal Raab, Inbal Leibovitch, Guy Tevet et al.

ICLR 2024 • oral • arXiv:2302.05905

#1141

Model Stock: All we need is just a few fine-tuned models

Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han

ECCV 2024 • arXiv:2403.19522

#1142

A Dynamical Model of Neural Scaling Laws

Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

ICML 2024 • arXiv:2402.01092

#1143

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Yiran Qin, Enshen Zhou, Qichang Liu et al.

CVPR 2024 • arXiv:2312.07472

#1144

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

Jinyi Hu, Yuan Yao, Chongyi Wang et al.

ICLR 2024 • spotlight • arXiv:2308.12038

#1145

Model Merging by Uncertainty-Based Gradient Matching

Nico Daheim, Thomas Möllenhoff, Edoardo M. Ponti et al.

ICLR 2024 • arXiv:2310.12808

#1146

Distilling Diffusion Models into Conditional GANs

Minguk Kang, Richard Zhang, Connelly Barnes et al.

ECCV 2024 • arXiv:2405.05967

#1147

FiT: Flexible Vision Transformer for Diffusion Model

Zeyu Lu, ZiDong Wang, Di Huang et al.

ICML 2024 • spotlight • arXiv:2402.12376

#1148

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

Dian Zheng, Xiao-Ming Wu, Shuzhou Yang et al.

CVPR 2024 • arXiv:2403.11157

#1149

Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation

Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta et al.

ECCV 2024 • arXiv:2405.01527

#1150

Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks

Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei et al.

ICLR 2024 • arXiv:2310.00076

#1151

EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation

Wenyang Zhou, Zhiyang Dou, Zeyu Cao et al.

ECCV 2024

#1152

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

Tianyu Guo, Wei Hu, Song Mei et al.

ICLR 2024 • arXiv:2310.10616

#1153

Elucidating the Exposure Bias in Diffusion Models

Mang Ning, Mingxiao Li, Jianlin Su et al.

ICLR 2024 • arXiv:2308.15321

#1154

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

Pingzhi Li, Zhenyu Zhang, Prateek Yadav et al.

ICLR 2024 • spotlight • arXiv:2310.01334

#1155

TLControl: Trajectory and Language Control for Human Motion Synthesis

WEILIN WAN, Zhiyang Dou, Taku Komura et al.

ECCV 2024 • arXiv:2311.17135

#1156

Learning Vision from Models Rivals Learning Vision from Data

Yonglong Tian, Lijie Fan, Kaifeng Chen et al.

CVPR 2024 • arXiv:2312.17742

#1157

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

Zechuan Zhang, Zongxin Yang, Yi Yang

CVPR 2024 • highlight • arXiv:2312.06704

#1158

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.

CVPR 2024 • arXiv:2312.03052

#1159

Streaming Dense Video Captioning

Xingyi Zhou, Anurag Arnab, Shyamal Buch et al.

CVPR 2024 • arXiv:2404.01297

#1160

A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization

Qiyu Chen, Huiyuan Luo, Chengkan Lv et al.

ECCV 2024 • arXiv:2407.09359

#1161

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Walid Bousselham, Felix Petersen, Vittorio Ferrari et al.

CVPR 2024 • arXiv:2312.00878

#1162

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

Yichi Zhang, Ziqiao Ma, Xiaofeng Gao et al.

CVPR 2024 • arXiv:2402.16846

#1163

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Linshan Wu, Jia-Xin Zhuang, Hao Chen

CVPR 2024 • arXiv:2402.17300

#1164

Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

Murtaza Dalal, Tarun Chiruvolu, Devendra Chaplot et al.

ICLR 2024 • arXiv:2405.01534

#1165

Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

Leheng Zhang, Yawei Li, Xingyu Zhou et al.

CVPR 2024 • arXiv:2401.08209

#1166

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Guangzhi Sun, Wenyi Yu, Changli Tang et al.

ICML 2024 • oral • arXiv:2406.15704

#1167

How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis

Federico Bianchi, Patrick John Chia, Mert Yuksekgonul et al.

ICML 2024 • oral • arXiv:2402.05863

#1168

MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

Tianqi Liu, Guangcong Wang, Shoukang Hu et al.

ECCV 2024 • arXiv:2405.12218

#1169

Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Yijun Yang, Tianyi Zhou, kanxue Li et al.

CVPR 2024 • arXiv:2311.16714

#1170

WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation

Jiachen Lu, Ze Huang, Zeyu Yang et al.

ECCV 2024 • arXiv:2312.02934

#1171

Rolling Diffusion Models

David Ruhe, Jonathan Heek, Tim Salimans et al.

ICML 2024 • oral • arXiv:2402.09470

#1172

DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven Text-to-Image Generation

Hong Chen, Yipeng Zhang, Simin Wu et al.

ICLR 2024 • arXiv:2305.03374

#1173

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

Yiwen Chen, Chi Zhang, Xiaofeng Yang et al.

AAAI 2024 • paper • arXiv:2308.11473

#1174

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

Heng Wang, Jianbo Ma, Santiago Pascual et al.

AAAI 2024 • paper • arXiv:2308.09300

#1175

Confronting Reward Model Overoptimization with Constrained RLHF

Ted Moskovitz, Aaditya Singh, DJ Strouse et al.

ICLR 2024 • spotlight • arXiv:2310.04373

#1176

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Yushi Lan, Fangzhou Hong, Shuai Yang et al.

ECCV 2024 • arXiv:2403.12019

#1177

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

Zhiwei Yang, Jing Liu, Peng Wu

CVPR 2024 • arXiv:2404.08531

#1178

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

Xian Liu, Jian Ren, Aliaksandr Siarohin et al.

ICLR 2024 • arXiv:2310.08579

#1179

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Zhenhua Yang, Dezhi Peng, Yuxin Kong et al.

AAAI 2024 • paper • arXiv:2312.12142

#1180

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

Thien-Minh Nguyen, Shenghai Yuan, Thien Nguyen et al.

CVPR 2024 • highlight • arXiv:2403.11496

#1181

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.

CVPR 2024 • arXiv:2309.00610

#1182

Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

Jeongmin Bae, Seoha Kim, Youngsik Yun et al.

ECCV 2024 • arXiv:2404.03613

#1183

Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions

Satwik Bhattamishra, Arkil Patel, Phil Blunsom et al.

ICLR 2024 • arXiv:2310.03016

#1184

A Unified Approach for Text- and Image-guided 4D Scene Generation

Yufeng Zheng, Xueting Li, Koki Nagano et al.

CVPR 2024 • arXiv:2311.16854

#1185

FedASMU: Efficient Asynchronous Federated Learning with Dynamic Staleness-Aware Model Update

Ji Liu, Juncheng Jia, Tianshi Che et al.

AAAI 2024 • paper • arXiv:2312.05770

#1186

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Choi Yisol, Sangkyung Kwak, Kyungmin Lee et al.

ECCV 2024 • arXiv:2403.05139

#1187

CoSeR: Bridging Image and Language for Cognitive Super-Resolution

Haoze Sun, Wenbo Li, Jianzhuang Liu et al.

CVPR 2024 • arXiv:2311.16512

#1188

Expressive Whole-Body 3D Gaussian Avatar

Gyeongsik Moon, Takaaki Shiratori, Shunsuke Saito

ECCV 2024 • arXiv:2407.21686

#1189

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

Seohong Park, Oleh Rybkin, Sergey Levine

ICLR 2024 • oral • arXiv:2310.08887

#1190

Temporal Adaptive RGBT Tracking with Modality Prompt

Hongyu Wang, Xiaotao Liu, Yifan Li et al.

AAAI 2024 • paper • arXiv:2401.01244

#1191

D-Flow: Differentiating through Flows for Controlled Generation

Heli Ben-Hamu, Omri Puny, Itai Gat et al.

ICML 2024 • arXiv:2402.14017

#1192

LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation

Suhyeon Lee, Won Jun Kim, Jinho Chang et al.

ICLR 2024 • arXiv:2305.11490

#1193

Watermark Stealing in Large Language Models

Nikola Jovanović, Robin Staab, Martin Vechev

ICML 2024 • arXiv:2402.19361

#1194

Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians

Licheng Zhong, Hong-Xing Yu, Jiajun Wu et al.

ECCV 2024 • arXiv:2403.09434

#1195

TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

Sai Kumar Dwivedi, Yu Sun, Priyanka Patel et al.

CVPR 2024 • arXiv:2404.16752

#1196

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

Ziyue Jiang, Jinglin Liu, Yi Ren et al.

ICLR 2024 • arXiv:2307.07218

#1197

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Hsuan-I Ho, Jie Song, Otmar Hilliges

CVPR 2024 • arXiv:2311.15855

#1198

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick et al.

ICML 2024 • arXiv:2401.12179

#1199

AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection

Trevine Oorloff, Surya Koppisetti, Nicolo Bonettini et al.

CVPR 2024 • arXiv:2406.02951

#1200

BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting

Lingzhe Zhao, Peng Wang, Peidong Liu

ECCV 2024 • arXiv:2403.11831