Most Cited 2024 "gradient similarity" Papers

12,324 papers found • Page 7 of 62

#1201

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Hsuan-I Ho, Jie Song, Otmar Hilliges

CVPR 2024arXiv:2311.15855

citations

#1202

ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation

Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.

ECCV 2024arXiv:2408.04883

citations

#1203

Polynormer: Polynomial-Expressive Graph Transformer in Linear Time

Chenhui Deng, Zichao Yue, Zhiru Zhang

ICLR 2024arXiv:2403.01232

citations

#1204

GenTron: Diffusion Transformers for Image and Video Generation

Shoufa Chen, Mengmeng Xu, Jiawei Ren et al.

CVPR 2024arXiv:2312.04557

citations

#1205

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

Yuxin Zhang, Lirui Zhao, Mingbao Lin et al.

ICLR 2024arXiv:2310.08915

citations

#1206

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

Ziyue Jiang, Jinglin Liu, Yi Ren et al.

ICLR 2024arXiv:2307.07218

citations

#1207

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Donghoon Ahn, Hyoungwon Cho, Jaewon Min et al.

ECCV 2024arXiv:2403.17377

citations

#1208

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick et al.

ICML 2024arXiv:2401.12179

citations

#1209

Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection

Xinhao Luo, Man Yao, Yuhong Chou et al.

ECCV 2024arXiv:2407.20708

citations

#1210

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

Muyao Niu, Xiaodong Cun, Xintao Wang et al.

ECCV 2024arXiv:2405.20222

citations

#1211

RegionGPT: Towards Region Understanding Vision Language Model

Qiushan Guo, Shalini De Mello, Danny Yin et al.

CVPR 2024arXiv:2403.02330

citations

#1212

Stay on Topic with Classifier-Free Guidance

Guillaume Sanchez, Alexander Spangher, Honglu Fan et al.

ICML 2024spotlightarXiv:2306.17806

citations

#1213

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

YEFEI HE, Jing Liu, Weijia Wu et al.

ICLR 2024oralarXiv:2310.03270

citations

#1214

Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing

Yafei Zhang, Shen Zhou, Huafeng Li

CVPR 2024arXiv:2403.01105

citations

#1215

LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection

Yunpeng Luo, Junlong Du, Ke Yan et al.

CVPR 2024arXiv:2403.17465

citations

#1216

DE-COP: Detecting Copyrighted Content in Language Models Training Data

André Duarte, Xuandong Zhao, Arlindo Oliveira et al.

ICML 2024arXiv:2402.09910

citations

#1217

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Jiwon Song, Kyungseok Oh, Taesu Kim et al.

ICML 2024arXiv:2402.09025

citations

#1218

Generating Human Interaction Motions in Scenes with Text Control

Hongwei Yi, Justus Thies, Michael J. Black et al.

ECCV 2024arXiv:2404.10685

citations

#1219

DistiLLM: Towards Streamlined Distillation for Large Language Models

Jongwoo Ko, Sungnyun Kim, Tianyi Chen et al.

ICML 2024arXiv:2402.03898

citations

#1220

ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

Jiayu Yang, Ziang Cheng, Yunfei Duan et al.

CVPR 2024arXiv:2310.10343

citations

#1221

Relay Diffusion: Unifying diffusion process across resolutions for image synthesis

Jiayan Teng, Wendi Zheng, Ming Ding et al.

ICLR 2024spotlightarXiv:2309.03350

citations

#1222

Guidance with Spherical Gaussian Constraint for Conditional Diffusion

Lingxiao Yang, Shutong Ding, Yifan Cai et al.

ICML 2024arXiv:2402.03201

citations

#1223

SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation

Malyaban Bal, Abhronil Sengupta

AAAI 2024paperarXiv:2308.10873

citations

#1224

VideoPrism: A Foundational Visual Encoder for Video Understanding

Long Zhao, Nitesh Bharadwaj Gundavarapu, Liangzhe Yuan et al.

ICML 2024arXiv:2402.13217

citations

#1225

VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

Fan Ma, Xiaojie Jin, Heng Wang et al.

CVPR 2024arXiv:2312.08870

citations

#1226

PromptTTS 2: Describing and Generating Voices with Text Prompt

Yichong Leng, Zhifang Guo, Kai Shen et al.

ICLR 2024arXiv:2309.02285

citations

#1227

Teaching Large Language Models to Translate with Comparison

Jiali Zeng, Fandong Meng, Yongjing Yin et al.

AAAI 2024paperarXiv:2307.04408

citations

#1228

Overthinking the Truth: Understanding how Language Models Process False Demonstrations

Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt

ICLR 2024spotlightarXiv:2307.09476

citations

#1229

DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Haoran Li, Haolin Shi, Wenli Zhang et al.

ECCV 2024arXiv:2404.03575

citations

#1230

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

Ziyang Chen, Wei Long, He Yao et al.

CVPR 2024arXiv:2404.06842

citations

#1231

LLMs are Good Sign Language Translators

Jia Gong, Lin Geng Foo, Yixuan He et al.

CVPR 2024arXiv:2404.00925

citations

#1232

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

Zehuan Huang, Hao Wen, Junting Dong et al.

CVPR 2024arXiv:2312.06725

citations

#1233

SkeletonGait: Gait Recognition Using Skeleton Maps

Chao Fan, Jingzhe Ma, Dongyang Jin et al.

AAAI 2024paperarXiv:2311.13444

citations

#1234

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

Guy Yariv, Itai Gat, Sagie Benaim et al.

AAAI 2024paperarXiv:2309.16429

citations

#1235

Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum

Zhengliang Shi, Shen Gao, Minghang Zhu et al.

AAAI 2024paperarXiv:2308.14034

citations

#1236

Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology

Wenhao Tang, Fengtao ZHOU, Sheng Huang et al.

CVPR 2024arXiv:2402.17228

citations

#1237

SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution

mingjun zheng, Long Sun, Jiangxin Dong et al.

ECCV 2024

citations

#1238

Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

Zhenliang Ni, Xinghao Chen, Yingjie Zhai et al.

ECCV 2024arXiv:2405.06228

citations

#1239

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani et al.

CVPR 2024arXiv:2404.06609

citations

#1240

ChatPose: Chatting about 3D Human Pose

Yao Feng, Jing Lin, Sai Kumar Dwivedi et al.

CVPR 2024arXiv:2311.18836

citations

#1241

WAVES: Benchmarking the Robustness of Image Watermarks

Bang An, Mucong Ding, Tahseen Rabbani et al.

ICML 2024arXiv:2401.08573

citations

#1242

Vanilla Bayesian Optimization Performs Great in High Dimensions

Carl Hvarfner, Erik Hellsten, Luigi Nardi

ICML 2024arXiv:2402.02229

citations

#1243

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Moreno D'Incà, Elia Peruzzo et al.

CVPR 2024highlightarXiv:2404.07990

citations

#1244

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Honghui Yang, Sha Zhang, Di Huang et al.

CVPR 2024arXiv:2310.08370

citations

#1245

Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images

Kuofeng Gao, Yang Bai, Jindong Gu et al.

ICLR 2024oralarXiv:2401.11170

citations

#1246

Fast Adversarial Attacks on Language Models In One GPU Minute

Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan et al.

ICML 2024arXiv:2402.15570

citations

#1247

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

Xiao Wang, Zongzhen Wu, Bo Jiang et al.

AAAI 2024paperarXiv:2211.09648

citations

#1248

GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

XINJIE ZHANG, Xingtong Ge, Tongda Xu et al.

ECCV 2024arXiv:2403.08551

citations

#1249

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Hanwen Jiang, Arjun Karpur, Bingyi Cao et al.

CVPR 2024arXiv:2405.12979

citations

#1250

Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks

Yingpeng Du, Di Luo, Rui Yan et al.

AAAI 2024paperarXiv:2307.10747

citations

#1251

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Evonne Ng, Javier Romero, Timur Bagautdinov et al.

CVPR 2024arXiv:2401.01885

citations

#1252

In-Context Learning through the Bayesian Prism

Madhur Panwar, Kabir Ahuja, Navin Goyal

ICLR 2024arXiv:2306.04891

citations

#1253

Learning to Model the World With Language

Jessy Lin, Yuqing Du, Olivia Watkins et al.

ICML 2024arXiv:2308.01399

citations

#1254

Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

Jongwook Choi, Taehoon Kim, Yonghyun Jeong et al.

CVPR 2024arXiv:2403.06592

citations

#1255

Frequency-Aware Transformer for Learned Image Compression

Han Li, Shaohui Li, Wenrui Dai et al.

ICLR 2024arXiv:2310.16387

citations

#1256

TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection

Hao Sun, Mingyao Zhou, Wenjing Chen et al.

AAAI 2024paperarXiv:2401.02309

citations

#1257

Plug-In Diffusion Model for Sequential Recommendation

Haokai Ma, Ruobing Xie, Lei Meng et al.

AAAI 2024paperarXiv:2401.02913

citations

#1258

TorchRL: A data-driven decision-making library for PyTorch

Albert Bou, Matteo Bettini, Sebastian Dittert et al.

ICLR 2024spotlightarXiv:2306.00577

citations

#1259

HDMixer: Hierarchical Dependency with Extendable Patch for Multivariate Time Series Forecasting

Qihe Huang, Lei Shen, Ruixin Zhang et al.

AAAI 2024paper

citations

#1260

Towards Generalizable Tumor Synthesis

Qi Chen, Xiaoxi Chen, Haorui Song et al.

CVPR 2024arXiv:2402.19470

citations

#1261

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Zhewei Yao, Xiaoxia Wu, Cheng Li et al.

AAAI 2024paperarXiv:2303.08302

citations

#1262

Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

Yanguang Sun, Chunyan Xu, Jian Yang et al.

ECCV 2024arXiv:2409.01686

citations

#1263

Scalable Diffusion for Materials Generation

Sherry Yang, Kwanghwan Cho, Amil Merchant et al.

ICLR 2024arXiv:2311.09235

citations

#1264

End-to-End Rate-Distortion Optimized 3D Gaussian Representation

Henan Wang, Hanxin Zhu, Tianyu He et al.

ECCV 2024arXiv:2406.01597

citations

#1265

OneRestore: A Universal Restoration Framework for Composite Degradation

Yu Guo, Yuan Gao, Yuxu Lu et al.

ECCV 2024arXiv:2407.04621

citations

#1266

DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving

Chen Min, Dawei Zhao, Liang Xiao et al.

CVPR 2024arXiv:2405.04390

citations

#1267

DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing

Minghao Chen, Iro Laina, Andrea Vedaldi

ECCV 2024arXiv:2404.18929

citations

#1268

When Do We Not Need Larger Vision Models?

Baifeng Shi, Ziyang Wu, Maolin Mao et al.

ECCV 2024arXiv:2403.13043

citations

#1269

Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion

Litu Rout, Yujia Chen, Abhishek Kumar et al.

CVPR 2024arXiv:2312.00852

citations

#1270

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang et al.

CVPR 2024highlightarXiv:2312.11461

citations

#1271

Simple Hierarchical Planning with Diffusion

Chang Chen, Fei Deng, Kenji Kawaguchi et al.

ICLR 2024oralarXiv:2401.02644

citations

#1272

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

Boxin Wang, Wei Ping, Lawrence McAfee et al.

ICML 2024arXiv:2310.07713

citations

#1273

MindBridge: A Cross-Subject Brain Decoding Framework

Shizun Wang, Songhua Liu, Zhenxiong Tan et al.

CVPR 2024highlightarXiv:2404.07850

citations

#1274

Accelerating Convergence of Score-Based Diffusion Models, Provably

Gen Li, Yu Huang, Timofey Efimov et al.

ICML 2024arXiv:2403.03852

citations

#1275

Learning a Diffusion Model Policy from Rewards via Q-Score Matching

Michael Psenka, Alejandro Escontrela, Pieter Abbeel et al.

ICML 2024arXiv:2312.11752

citations

#1276

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization

Yiyang Chen, Zhedong Zheng, Wei Ji et al.

ICLR 2024arXiv:2211.07394

citations

#1277

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

Licong Lin, Yu Bai, Song Mei

ICLR 2024arXiv:2310.08566

citations

#1278

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.

CVPR 2024arXiv:2311.15383

citations

#1279

InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior

Chenguo Lin, Yadong MU

ICLR 2024spotlightarXiv:2402.04717

citations

#1280

Harnessing Large Language Models for Training-free Video Anomaly Detection

Luca Zanella, Willi Menapace, Massimiliano Mancini et al.

CVPR 2024arXiv:2404.01014

citations

#1281

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Shijie Zhou, Zhiwen Fan, Dejia Xu et al.

ECCV 2024arXiv:2404.06903

citations

#1282

DiffDA: a Diffusion model for weather-scale Data Assimilation

Langwen Huang, Lukas Gianinazzi, Yuejiang Yu et al.

ICML 2024arXiv:2401.05932

citations

#1283

On the Learnability of Watermarks for Language Models

Chenchen Gu, XIANG LI, Percy Liang et al.

ICLR 2024arXiv:2312.04469

citations

#1284

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Wendi Zheng, Jiayan Teng, Zhuoyi Yang et al.

ECCV 2024arXiv:2403.05121

citations

#1285

Circumventing Concept Erasure Methods For Text-To-Image Generative Models

Minh Pham, Kelly Marshall, Niv Cohen et al.

ICLR 2024arXiv:2308.01508

citations

#1286

Detector-Free Structure from Motion

Xingyi He, Jiaming Sun, Yifan Wang et al.

CVPR 2024arXiv:2306.15669

citations

#1287

ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

Lukas Höllein, Aljaž Božič, Norman Müller et al.

CVPR 2024arXiv:2403.01807

citations

#1288

NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields

Junge Zhang, Feihu Zhang, Shaochen Kuang et al.

AAAI 2024paperarXiv:2304.14811

citations

#1289

DOGE: Domain Reweighting with Generalization Estimation

Simin Fan, Matteo Pagliardini, Martin Jaggi

ICML 2024arXiv:2310.15393

citations

#1290

FINER: Flexible Spectral-bias Tuning in Implicit NEural Representation by Variable-periodic Activation Functions

Zhen Liu, Hao Zhu, Qi Zhang et al.

CVPR 2024arXiv:2312.02434

citations

#1291

Learning to Rank in Generative Retrieval

Yongqi Li, Nan Yang, Liang Wang et al.

AAAI 2024paperarXiv:2306.15222

citations

#1292

How do Language Models Bind Entities in Context?

Jiahai Feng, Jacob Steinhardt

ICLR 2024arXiv:2310.17191

citations

#1293

Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis

Basile Van Hoorick, Rundi Wu, Ege Ozguroglu et al.

ECCV 2024arXiv:2405.14868

citations

#1294

LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs

Yunsheng Ma, Can Cui, Xu Cao et al.

CVPR 2024arXiv:2312.04372

citations

#1295

Deep Confident Steps to New Pockets: Strategies for Docking Generalization

Gabriele Corso, Arthur Deng, Nicholas Polizzi et al.

ICLR 2024arXiv:2402.18396

citations

#1296

Learning Content-Enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation

Qi Bi, Shaodi You, Theo Gevers

AAAI 2024paperarXiv:2307.00371

citations

#1297

VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

Pengying Wu, Yao Mu, Bingxian Wu et al.

ICML 2024arXiv:2401.02695

citations

#1298

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Jiamian Wang, Guohao Sun, Pichao Wang et al.

CVPR 2024highlightarXiv:2403.17998

citations

#1299

DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

Kaiwen Zhang, Yifan Zhou, Xudong XU et al.

CVPR 2024arXiv:2312.07409

citations

#1300

Generative-Based Fusion Mechanism for Multi-Modal Tracking

Zhangyong Tang, Tianyang Xu, Xiaojun Wu et al.

AAAI 2024paperarXiv:2309.01728

citations

#1301

Free3D: Consistent Novel View Synthesis without 3D Representation

Chuanxia Zheng, Andrea Vedaldi

CVPR 2024arXiv:2312.04551

citations

#1302

Generating Images of Rare Concepts Using Pre-trained Diffusion Models

Dvir Samuel, Rami Ben-Ari, Simon Raviv et al.

AAAI 2024paperarXiv:2304.14530

citations

#1303

AVID: Any-Length Video Inpainting with Diffusion Model

Zhixing Zhang, Bichen Wu, Xiaoyan Wang et al.

CVPR 2024arXiv:2312.03816

citations

#1304

Guiding Masked Representation Learning to Capture Spatio-Temporal Relationship of Electrocardiogram

Yeongyeon Na, Minje Park, Yunwon Tae et al.

ICLR 2024oralarXiv:2402.09450

citations

#1305

SolidGen: An Autoregressive Model for Direct B-rep Synthesis

Karl Willis, Joseph Lambourne, Nigel Morris et al.

ICLR 2024

citations

#1306

Task-Customized Mixture of Adapters for General Image Fusion

Pengfei Zhu, Yang Sun, Bing Cao et al.

CVPR 2024arXiv:2403.12494

citations

#1307

Open-Vocabulary Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

CVPR 2024arXiv:2311.07042

citations

#1308

Aligning and Prompting Everything All at Once for Universal Visual Perception

Yunhang Shen, Chaoyou Fu, Peixian Chen et al.

CVPR 2024arXiv:2312.02153

citations

#1309

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

Shuai Tan, Bin Ji, Mengxiao Bi et al.

ECCV 2024arXiv:2404.01647

citations

#1310

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

Kong Zhe, Yong Zhang, Tianyu Yang et al.

ECCV 2024arXiv:2403.10983

citations

#1311

Learning to Unlearn: Instance-Wise Unlearning for Pre-trained Classifiers

Sungmin Cha, Sungjun Cho, Dasol Hwang et al.

AAAI 2024paperarXiv:2301.11578

citations

#1312

Asymmetry in Low-Rank Adapters of Foundation Models

Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi et al.

ICML 2024arXiv:2402.16842

citations

#1313

Tensor Programs VI: Feature Learning in Infinite Depth Neural Networks

Greg Yang, Dingli Yu, Chen Zhu et al.

ICLR 2024arXiv:2310.02244

citations

#1314

Scaling Laws for Data Filtering— Data Curation cannot be Compute Agnostic

Sachin Goyal, Pratyush Maini, Zachary Lipton et al.

CVPR 2024arXiv:2404.07177

citations

#1315

Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan et al.

CVPR 2024arXiv:2312.11994

citations

#1316

Recursive Generalization Transformer for Image Super-Resolution

Zheng Chen, Yulun Zhang, Jinjin Gu et al.

ICLR 2024arXiv:2303.06373

citations

#1317

Stochastic Interpolants with Data-Dependent Couplings

Michael Albergo, Mark Goldstein, Nicholas Boffi et al.

ICML 2024spotlightarXiv:2310.03725

citations

#1318

Towards Realistic Scene Generation with LiDAR Diffusion Models

Haoxi Ran, Vitor Guizilini, Yue Wang

CVPR 2024arXiv:2404.00815

citations

#1319

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

Haiyang Ying, Yixuan Yin, Jinzhi Zhang et al.

CVPR 2024arXiv:2311.11666

citations

#1320

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones et al.

ICLR 2024arXiv:2309.15098

citations

#1321

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

Jing Liu, Ruihao Gong, Xiuying Wei et al.

ICLR 2024arXiv:2310.08041

citations

#1322

Grokking as the transition from lazy to rich training dynamics

Tanishq Kumar, Blake Bordelon, Samuel Gershman et al.

ICLR 2024arXiv:2310.06110

citations

#1323

Multi-Source Diffusion Models for Simultaneous Music Generation and Separation

Giorgio Mariani, Irene Tallini, Emilian Postolache et al.

ICLR 2024arXiv:2302.02257

citations

#1324

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

Jiawang Bai, Kuofeng Gao, Shaobo Min et al.

CVPR 2024arXiv:2311.16194

citations

#1325

Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement

Kai Xu, Rongyu Chen, Gianni Franchi et al.

ICLR 2024arXiv:2310.00227

citations

#1326

MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke et al.

ECCV 2024arXiv:2405.02771

citations

#1327

PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution

Honghao Chen, Xiangxiang Chu, Renyongjian et al.

CVPR 2024arXiv:2403.07589

citations

#1328

BAT: Behavior-Aware Human-Like Trajectory Prediction for Autonomous Driving

Haicheng Liao, Zhenning Li, Huanming Shen et al.

AAAI 2024paperarXiv:2312.06371

citations

#1329

TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Yufu Wang, Ziyun Wang, Lingjie Liu et al.

ECCV 2024arXiv:2403.17346

citations

#1330

Deep Temporal Graph Clustering

Meng Liu, Yue Liu, KE LIANG et al.

ICLR 2024oralarXiv:2305.10738

citations

#1331

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Juhong Min, Shyamal Buch, Arsha Nagrani et al.

CVPR 2024arXiv:2404.06511

citations

#1332

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

Chang Liu, Haoning Wu, Yujie Zhong et al.

CVPR 2024arXiv:2306.00973

citations

#1333

Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

Sukrut Rao, Sweta Mahajan, Moritz Böhle et al.

ECCV 2024arXiv:2407.14499

citations

#1334

Gaussian Shell Maps for Efficient 3D Human Generation

Rameen Abdal, Wang Yifan, Zifan Shi et al.

CVPR 2024arXiv:2311.17857

citations

#1335

An Aggregation-Free Federated Learning for Tackling Data Heterogeneity

Yuan Wang, Huazhu Fu, Renuga Kanagavelu et al.

CVPR 2024arXiv:2404.18962

citations

#1336

CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation

Bo-Yuan Sun, Yuqi Yang, Le Zhang et al.

CVPR 2024arXiv:2306.04300

citations

#1337

LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery

Pingchuan Ma, Johnson Tsun-Hsuan Wang, Minghao Guo et al.

ICML 2024arXiv:2405.09783

citations

#1338

Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

Feng Lu, Lijun Zhang, Xiangyuan Lan et al.

ICLR 2024arXiv:2402.14505

citations

#1339

SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

Xinyu Shi, Zecheng Hao, Zhaofei Yu

CVPR 2024arXiv:2403.14302

citations

#1340

Matryoshka Diffusion Models

Jiatao Gu, Shuangfei Zhai, Yizhe Zhang et al.

ICLR 2024arXiv:2310.15111

citations

#1341

Image Fusion via Vision-Language Model

Zixiang Zhao, Lilun Deng, Haowen Bai et al.

ICML 2024arXiv:2402.02235

citations

#1342

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Zhixuan Liang, Yao Mu, Hengbo Ma et al.

CVPR 2024arXiv:2312.11598

citations

#1343

Successor Heads: Recurring, Interpretable Attention Heads In The Wild

Rhys Gould, Euan Ong, George Ogden et al.

ICLR 2024arXiv:2312.09230

citations

#1344

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

Feng Liang, Bichen Wu, Jialiang Wang et al.

CVPR 2024highlightarXiv:2312.17681

citations

#1345

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Zhuowen Yuan, Zidi Xiong, Yi Zeng et al.

ICML 2024arXiv:2403.13031

citations

#1346

Looped Transformers are Better at Learning Learning Algorithms

Liu Yang, Kangwook Lee, Robert Nowak et al.

ICLR 2024arXiv:2311.12424

citations

#1347

GIVT: Generative Infinite-Vocabulary Transformers

Michael Tschannen, Cian Eastwood, Fabian Mentzer

ECCV 2024arXiv:2312.02116

citations

#1348

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

Shanshan Zhong, Zhongzhan Huang, Shanghua Gao et al.

CVPR 2024arXiv:2312.02439

citations

#1349

Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer

Eric Brachmann, Jamie Wynn, Shuai Chen et al.

ECCV 2024arXiv:2404.14351

citations

#1350

Space Group Constrained Crystal Generation

Rui Jiao, Wenbing Huang, Yu Liu et al.

ICLR 2024arXiv:2402.03992

citations

#1351

Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Yunlong Zhang, Honglin Li, YUXUAN SUN et al.

ECCV 2024arXiv:2311.07125

citations

#1352

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

Haipeng Liu, Yang Wang, Biao Qian et al.

CVPR 2024arXiv:2403.19898

citations

#1353

Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts

Jiang-Xin Shi, Tong Wei, Zhi Zhou et al.

ICML 2024arXiv:2309.10019

citations

#1354

NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Weining Ren, Zihan Zhu, Boyang Sun et al.

CVPR 2024arXiv:2405.18715

citations

#1355

DiffusionTrack: Diffusion Model for Multi-Object Tracking

Run Luo, Zikai Song, Lintao Ma et al.

AAAI 2024paperarXiv:2308.09905

citations

#1356

An Emulator for Fine-tuning Large Language Models using Small Language Models

Eric Mitchell, Rafael Rafailov, Archit Sharma et al.

ICLR 2024oralarXiv:2310.12962

citations

#1357

Alleviating Exposure Bias in Diffusion Models through Sampling with Shifted Time Steps

Mingxiao Li, Tingyu Qu, Ruicong Yao et al.

ICLR 2024arXiv:2305.15583

citations

#1358

Vlogger: Make Your Dream A Vlog

Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.

CVPR 2024arXiv:2401.09414

citations

#1359

TC4D: Trajectory-Conditioned Text-to-4D Generation

Sherwin Bahmani, Xian Liu, Wang Yifan et al.

ECCV 2024arXiv:2403.17920

citations

#1360

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

Fabien Baradel, Thomas Lucas, Matthieu Armando et al.

ECCV 2024arXiv:2402.14654

citations

#1361

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

Yushi Huang, Ruihao Gong, Jing Liu et al.

CVPR 2024highlightarXiv:2311.16503

citations

#1362

Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving

JINLONG LI, Baolu Li, Zhengzhong Tu et al.

CVPR 2024arXiv:2404.04804

citations

#1363

Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

Jiuding Sun, Chantal Shaib, Byron Wallace

ICLR 2024spotlightarXiv:2306.11270

citations

#1364

MonoCD: Monocular 3D Object Detection with Complementary Depths

Longfei Yan, Pei Yan, Shengzhou Xiong et al.

CVPR 2024arXiv:2404.03181

citations

#1365

Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning

Sungwon Han, Jinsung Yoon, Sercan Arik et al.

ICML 2024arXiv:2404.09491

citations

#1366

Make RepVGG Greater Again: A Quantization-Aware Approach

Xuesong Nie, Yunfeng Yan, Siyuan Li et al.

AAAI 2024paperarXiv:2212.01593

citations

#1367

Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation

Hang Li, Chengzhi Shen, Philip H.S. Torr et al.

CVPR 2024arXiv:2311.17216

citations

#1368

Video Interpolation with Diffusion Models

Siddhant Jain, Daniel Watson, Aleksander Holynski et al.

CVPR 2024arXiv:2404.01203

citations

#1369

C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion

Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee et al.

ICLR 2024arXiv:2403.14119

citations

#1370

GIM: Learning Generalizable Image Matcher From Internet Videos

Xuelun Shen, zhipeng cai, Wei Yin et al.

ICLR 2024spotlightarXiv:2402.11095

citations

#1371

On the Foundations of Shortcut Learning

Katherine Hermann, Hossein Mobahi, Thomas FEL et al.

ICLR 2024spotlightarXiv:2310.16228

citations

#1372

EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars

Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas et al.

CVPR 2024arXiv:2404.19110

citations

#1373

TabR: Tabular Deep Learning Meets Nearest Neighbors

Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev et al.

ICLR 2024arXiv:2307.14338

citations

#1374

GraphCare: Enhancing Healthcare Predictions with Personalized Knowledge Graphs

Pengcheng Jiang, Cao Xiao, Adam Cross et al.

ICLR 2024arXiv:2305.12788

citations

#1375

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

Danny Halawi, Alexander Wei, Eric Wallace et al.

ICML 2024arXiv:2406.20053

citations

#1376

NExT: Teaching Large Language Models to Reason about Code Execution

Ansong Ni, Miltiadis Allamanis, Arman Cohan et al.

ICML 2024arXiv:2404.14662

citations

#1377

Repoformer: Selective Retrieval for Repository-Level Code Completion

Di Wu, Wasi Ahmad, Dejiao Zhang et al.

ICML 2024arXiv:2403.10059

citations

#1378

Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships

Sebastian Koch, Narunas Vaskevicius, Mirco Colosi et al.

CVPR 2024arXiv:2402.12259

citations

#1379

GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

Cunxiao Du, Jing Jiang, Xu Yuanchen et al.

ICML 2024arXiv:2402.02082

citations

#1380

A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models

Julio Silva-Rodríguez, Sina Hajimiri, Ismail Ben Ayed et al.

CVPR 2024arXiv:2312.12730

citations

#1381

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

Oindrila Saha, Grant Horn, Subhransu Maji

CVPR 2024arXiv:2401.02460

citations

#1382

Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization

Jin Zhou, Charles Staats, Wenda Li et al.

ICLR 2024arXiv:2403.18120

citations

#1383

Dense Reward for Free in Reinforcement Learning from Human Feedback

Alexander Chan, Hao Sun, Samuel Holt et al.

ICML 2024arXiv:2402.00782

citations

#1384

Source-Free Domain Adaptation with Frozen Multimodal Foundation Model

Song Tang, Wenxin Su, Mao Ye et al.

CVPR 2024arXiv:2311.16510

citations

#1385

Language-Image Pre-training with Long Captions

Kecheng Zheng, Yifei Zhang, Wei Wu et al.

ECCV 2024arXiv:2403.17007

citations

#1386

Robust agents learn causal world models

Jonathan Richens, Tom Everitt

ICLR 2024arXiv:2402.10877

citations

#1387

Variational Bayesian Last Layers

James Harrison, John Willes, Jasper Snoek

ICLR 2024spotlightarXiv:2404.11599

citations

#1388

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Yake Wei, Di Hu

ICML 2024arXiv:2405.17730

citations

#1389

Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

AAAI 2024paperarXiv:2308.13198

citations

#1390

Koala: Key Frame-Conditioned Long Video-LLM

Reuben Tan, Ximeng Sun, Ping Hu et al.

CVPR 2024highlightarXiv:2404.04346

citations

#1391

Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.

ECCV 2024arXiv:2407.11699

citations

#1392

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Zhangyang Qi, Ye Fang, Zeyi Sun et al.

CVPR 2024highlightarXiv:2312.02980

citations

#1393

Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

Zhiheng Cheng, Qingyue Wei, Hongru Zhu et al.

CVPR 2024arXiv:2403.18271

citations

#1394

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

guo, Tianwei Lin

CVPR 2024arXiv:2312.10113

citations

#1395

Masked Audio Generation using a Single Non-Autoregressive Transformer

Alon Ziv, Itai Gat, Gael Le Lan et al.

ICLR 2024arXiv:2401.04577

citations

#1396

ImagenHub: Standardizing the evaluation of conditional image generation models

Max Ku, Tianle Li, Kai Zhang et al.

ICLR 2024arXiv:2310.01596

citations

#1397

CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech

Jaehyeon Kim, Keon Lee, Seungjun Chung et al.

ICLR 2024arXiv:2404.02781

citations

#1398

Prompt-tuning Latent Diffusion Models for Inverse Problems

Hyungjin Chung, Jong Chul YE, Peyman Milanfar et al.

ICML 2024arXiv:2310.01110

citations

#1399

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

Shuting He, Henghui Ding

CVPR 2024arXiv:2404.03645

citations

#1400

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation

Aaditya Singh, Ted Moskovitz, Felix Hill et al.

ICML 2024spotlightarXiv:2404.07129

citations
