Most Cited 2024 "time reversibility" Papers

12,324 papers found • Page 10 of 62

#1801

DAP: A Dynamic Adversarial Patch for Evading Person Detectors

Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.

CVPR 2024 • arXiv:2305.11618 • 50 citations
#1802

Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

Zihao Liu, Xiaoyu Zhang, Guangwei Liu et al.

ECCV 2024 • arXiv:2402.17430 • 50 citations
#1803

CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Qilang Ye, Zitong Yu, Rui Shao et al.

ECCV 2024 • arXiv:2403.04640 • 50 citations
#1804

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.

CVPR 2024 • highlight • arXiv:2311.17435 • 50 citations
#1805

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Yucheng Suo, Fan Ma, Linchao Zhu et al.

CVPR 2024 • arXiv:2403.16005 • 49 citations
#1806

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Jitesh Jain, Jianwei Yang, Humphrey Shi

CVPR 2024 • arXiv:2312.14233 • 49 citations
#1807

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

Chen Zhang, L. F. D’Haro, Yiming Chen et al.

AAAI 2024 • paper • arXiv:2312.15407 • 49 citations
#1808

How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

Josh Alman, Zhao Song

ICLR 2024 • spotlight • arXiv:2310.04064 • 49 citations
#1809

Feature Fusion from Head to Tail for Long-Tailed Visual Recognition

Mengke Li, Zhikai Hu, Yang Lu et al.

AAAI 2024 • paper • arXiv:2306.06963 • 49 citations
#1810

ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank

Zhanjie Zhang, Quanwei Zhang, Wei Xing et al.

AAAI 2024 • paper • arXiv:2312.06135 • 49 citations
#1811

What does the Knowledge Neuron Thesis Have to do with Knowledge?

Jingcheng Niu, Andrew Liu, Zining Zhu et al.

ICLR 2024 • spotlight • arXiv:2405.02421 • 49 citations
#1812

Generative Latent Coding for Ultra-Low Bitrate Image Compression

Zhaoyang Jia, Jiahao Li, Bin Li et al.

CVPR 2024 • arXiv:2512.20194 • 49 citations
#1813

Fully Sparse 3D Occupancy Prediction

Haisong Liu, Yang Chen, Haiguang Wang et al.

ECCV 2024 • arXiv:2312.17118 • 49 citations
#1814

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Pingping Zhang, Yuhao Wang, Yang Liu et al.

CVPR 2024 • arXiv:2403.10254 • 49 citations
#1815

FROSTER: Frozen CLIP is A Strong Teacher for Open-Vocabulary Action Recognition

Xiaohu Huang, Hao Zhou, Kun Yao et al.

ICLR 2024 • oral • arXiv:2402.03241 • 49 citations
#1816

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Blake Bordelon, Lorenzo Noci, Mufan Li et al.

ICLR 2024 • arXiv:2309.16620 • 49 citations
#1817

TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

Aditya Aravind Chinchure, Pushkar Shukla, Gaurav Bhatt et al.

ECCV 2024 • arXiv:2312.01261 • 49 citations
#1818

Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

Hui Lin, Zhiheng Ma, Xiaopeng Hong et al.

AAAI 2024 • paper • arXiv:2401.03870 • 49 citations
#1819

Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features

Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.

CVPR 2024 • arXiv:2403.07592 • 49 citations
#1820

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

Kevin Xie, Tianshi Cao, Jonathan P Lorraine et al.

ECCV 2024 • arXiv:2403.15385 • 49 citations
#1821

Communication-Efficient Collaborative Perception via Information Filling with Codebook

Yue Hu, Juntong Peng, Sifei Liu et al.

CVPR 2024 • arXiv:2405.04966 • 49 citations
#1822

Simplifying Transformer Blocks

Bobby He, Thomas Hofmann

ICLR 2024 • arXiv:2311.01906 • 49 citations
#1823

Why Larger Language Models Do In-context Learning Differently?

Zhenmei Shi, Junyi Wei, Zhuoyan Xu et al.

ICML 2024 • arXiv:2405.19592 • 49 citations
#1824

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Shicheng Li, Lei Li, Yi Liu et al.

ECCV 2024 • arXiv:2311.17404 • 49 citations
#1825

Group Preference Optimization: Few-Shot Alignment of Large Language Models

Siyan Zhao, John Dang, Aditya Grover

ICLR 2024 • arXiv:2310.11523 • 49 citations
#1826

Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

Xin Zhou, Dingkang Liang, Wei Xu et al.

CVPR 2024 • arXiv:2403.01439 • 49 citations
#1827

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation

Kai Chen, Enze Xie, Zhe Chen et al.

ICLR 2024 • arXiv:2306.04607 • 49 citations
#1828

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Yuru Jia, Lukas Hoyer, Shengyu Huang et al.

ECCV 2024 • arXiv:2312.03048 • 49 citations
#1829

MS-DETR: Efficient DETR Training with Mixed Supervision

Chuyang Zhao, Yifan Sun, Wenhao Wang et al.

CVPR 2024 • arXiv:2401.03989 • 49 citations
#1830

Watch Your Steps: Local Image and Scene Editing by Text Instructions

Ashkan Mirzaei, Tristan T Aumentado-Armstrong, Marcus A Brubaker et al.

ECCV 2024 • arXiv:2308.08947 • 49 citations
#1831

Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds

Sipeng Zheng, Jiazheng Liu, Yicheng Feng et al.

ICLR 2024 • arXiv:2310.13255 • 49 citations
#1832

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.

ICML 2024 • arXiv:2403.15447 • 49 citations
#1833

Causal Representation Learning from Multiple Distributions: A General Setting

Kun Zhang, Shaoan Xie, Ignavier Ng et al.

ICML 2024 • oral • arXiv:2402.05052 • 49 citations
#1834

Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing

Hyelin Nam, Gihyun Kwon, Geon Yeong Park et al.

CVPR 2024 • arXiv:2311.18608 • 49 citations
#1835

Improving Audio-Visual Segmentation with Bidirectional Generation

Dawei Hao, Yuxin Mao, Bowen He et al.

AAAI 2024 • paper • arXiv:2308.08288 • 49 citations
#1836

Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

Yu Deng, Duomin Wang, Baoyuan Wang

ECCV 2024 • arXiv:2403.13570 • 49 citations
#1837

Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

Clément Bonnet, Daniel Luo, Donal Byrne et al.

ICLR 2024 • arXiv:2306.09884 • 48 citations
#1838

Trajeglish: Traffic Modeling as Next-Token Prediction

Jonah Philion, Xue Bin Peng, Sanja Fidler

ICLR 2024 • arXiv:2312.04535 • 48 citations
#1839

Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation

Xianghe Pang, Shuo Tang, Rui Ye et al.

ICML 2024 • spotlight • arXiv:2402.05699 • 48 citations
#1840

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

Han Liang, Jiacheng Bao, Ruichi Zhang et al.

CVPR 2024 • arXiv:2312.08985 • 48 citations
#1841

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Yixuan Wu, Yizhou Wang, Shixiang Tang et al.

ECCV 2024 • arXiv:2403.12488 • 48 citations
#1842

Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners

Keon Hee Park, Kyungwoo Song, Gyeong-Moon Park

CVPR 2024 • arXiv:2404.02117 • 48 citations
#1843

Breathing Life Into Sketches Using Text-to-Video Priors

Rinon Gal, Yael Vinker, Yuval Alaluf et al.

CVPR 2024 • highlight • arXiv:2311.13608 • 48 citations
#1844

Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models

Didi Zhu, Zhongyi Sun, Zexi Li et al.

ICML 2024 • arXiv:2402.12048 • 48 citations
#1845

Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals

Yair Gat, Nitay Calderon, Amir Feder et al.

ICLR 2024 • arXiv:2310.00603 • 48 citations
#1846

Grounded Question-Answering in Long Egocentric Videos

Shangzhe Di, Weidi Xie

CVPR 2024 • arXiv:2312.06505 • 48 citations
#1847

When Fast Fourier Transform Meets Transformer for Image Restoration

Xingyu Jiang, Xiuhui Zhang, Ning Gao et al.

ECCV 2024 • 48 citations
#1848

Linguistic Calibration of Long-Form Generations

Neil Band, Xuechen Li, Tengyu Ma et al.

ICML 2024 • arXiv:2404.00474 • 48 citations
#1849

RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation

Mahdi Nikdan, Soroush Tabesh, Elvir Crnčević et al.

ICML 2024 • arXiv:2401.04679 • 48 citations
#1850

Learning with Mixture of Prototypes for Out-of-Distribution Detection

Haodong Lu, Dong Gong, Shuo Wang et al.

ICLR 2024 • arXiv:2402.02653 • 48 citations
#1851

Generating Human Motion in 3D Scenes from Text Descriptions

Zhi Cen, Huaijin Pi, Sida Peng et al.

CVPR 2024 • arXiv:2405.07784 • 48 citations
#1852

Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling

Jiarui Lu, Bozitao Zhong, Zuobai Zhang et al.

ICLR 2024 • arXiv:2306.03117 • 48 citations
#1853

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le et al.

CVPR 2024 • arXiv:2312.00825 • 48 citations
#1854

LightIt: Illumination Modeling and Control for Diffusion Models

Peter Kocsis, Kalyan Sunkavalli, Julien Philip et al.

CVPR 2024 • arXiv:2403.10615 • 48 citations
#1855

DeS3: Adaptive Attention-Driven Self and Soft Shadow Removal Using ViT Similarity

Yeying Jin, Wenhan Yang, W. Ye et al.

AAAI 2024 • paper • arXiv:2211.08089 • 48 citations
#1856

MeaCap: Memory-Augmented Zero-shot Image Captioning

Zequn Zeng, Yan Xie, Hao Zhang et al.

CVPR 2024 • arXiv:2403.03715 • 48 citations
#1857

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

Qiuhong Shen, Xingyi Yang, Xinchao Wang

ECCV 2024 • arXiv:2409.08270 • 48 citations
#1858

Improving Automatic VQA Evaluation Using Large Language Models

Oscar Mañas, Benno Krojer, Aishwarya Agrawal

AAAI 2024 • paper • arXiv:2310.02567 • 48 citations
#1859

Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation

Song Wang, Jiawei Yu, Wentong Li et al.

CVPR 2024 • arXiv:2404.11958 • 48 citations
#1860

Mosaic-SDF for 3D Generative Models

Lior Yariv, Omri Puny, Oran Gafni et al.

CVPR 2024 • arXiv:2312.09222 • 48 citations
#1861

Navigating the Design Space of Equivariant Diffusion-Based Generative Models for De Novo 3D Molecule Generation

Tuan Le, Julian Cremer, Frank Noe et al.

ICLR 2024 • arXiv:2309.17296 • 48 citations
#1862

Reinforced Adaptive Knowledge Learning for Multimodal Fake News Detection

Litian Zhang, Xiaoming Zhang, Chaozhuo Li et al.

AAAI 2024 • paper • 48 citations
#1863

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Lewei Yao, Renjie Pi, Jianhua Han et al.

CVPR 2024 • arXiv:2404.09216 • 48 citations
#1864

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Banghua Zhu, Michael Jordan, Jiantao Jiao

ICML 2024 • arXiv:2401.16335 • 48 citations
#1865

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Yiming Zhao, Zhouhui Lian

ECCV 2024 • arXiv:2312.04884 • 48 citations
#1866

JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention

Yuandong Tian, Yiping Wang, Zhenyu Zhang et al.

ICLR 2024 • arXiv:2310.00535 • 48 citations
#1867

Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification

Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.

CVPR 2024 • arXiv:2310.08255 • 48 citations
#1868

UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction

Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud et al.

ECCV 2024 • arXiv:2403.15098 • 48 citations
#1869

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

Yixuan Ren, Yang Zhou, Jimei Yang et al.

ECCV 2024 • arXiv:2402.14780 • 48 citations
#1870

Deep Networks Always Grok and Here is Why

Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

ICML 2024 • arXiv:2402.15555 • 47 citations
#1871

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Yuiga Wada, Kanta Kaneda, Daichi Saito et al.

CVPR 2024 • highlight • arXiv:2402.18091 • 47 citations
#1872

PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Bhat, Peter Wonka

CVPR 2024 • arXiv:2312.02284 • 47 citations
#1873

S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data

Xuyang Li, Danfeng Hong, Jocelyn Chanussot

CVPR 2024 • 47 citations
#1874

Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling

Weijia Xu, Andrzej Banburski-Fahey, Nebojsa Jojic

ICML 2024 • arXiv:2305.09993 • 47 citations
#1875

Unifying Visual and Vision-Language Tracking via Contrastive Learning

AAAI 2024 • paper • arXiv:2401.11228 • 47 citations
#1876

HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction

Yi Zhou, Hui Zhang, Jiaqian Yu et al.

CVPR 2024 • arXiv:2403.08639 • 47 citations
#1877

One-Prompt to Segment All Medical Images

Wu, Min Xu

CVPR 2024 • arXiv:2305.10300 • 47 citations
#1878

Variational Learning is Effective for Large Deep Networks

Yuesong Shen, Nico Daheim, Bai Cong et al.

ICML 2024 • spotlight • arXiv:2402.17641 • 47 citations
#1879

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Tong Shao, Zhuotao Tian, Hang Zhao et al.

ECCV 2024 • arXiv:2407.08268 • 47 citations
#1880

Digital Life Project: Autonomous 3D Characters with Social Intelligence

Zhongang Cai, Jianping Jiang, Zhongfei Qing et al.

CVPR 2024 • arXiv:2312.04547 • 47 citations
#1881

Online Vectorized HD Map Construction using Geometry

Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding et al.

ECCV 2024 • arXiv:2312.03341 • 47 citations
#1882

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

Jianjian Cao, Peng Ye, Shengze Li et al.

CVPR 2024 • arXiv:2403.02991 • 47 citations
#1883

GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval

Han Zhou, Wei Dong, Xiaohong Liu et al.

ECCV 2024 • arXiv:2407.12431 • 47 citations
#1884

Boosting Diffusion Models with Moving Average Sampling in Frequency Domain

Yurui Qian, Qi Cai, Yingwei Pan et al.

CVPR 2024 • arXiv:2403.17870 • 47 citations
#1885

MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

Yinya Huang, Xiaohan Lin, Zhengying Liu et al.

ICLR 2024 • spotlight • arXiv:2402.08957 • 47 citations
#1886

SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

Zhiqian Lan, Yuxuan Jiang, Yao Mu et al.

ICLR 2024 • oral • arXiv:2309.15289 • 47 citations
#1887

An Analysis of Linear Time Series Forecasting Models

William Toner, Luke Darlow

ICML 2024 • arXiv:2403.14587 • 47 citations
#1888

CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs

Yassine Ouali, Adrian Bulat, Brais Martinez et al.

ECCV 2024 • arXiv:2408.10433 • 47 citations
#1889

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models

Andrea Caraffa, Davide Boscaini, Amir Hamza et al.

ECCV 2024 • arXiv:2312.00947 • 47 citations
#1890

RangeLDM: Fast Realistic LiDAR Point Cloud Generation

Qianjiang Hu, Zhimin Zhang, Wei Hu

ECCV 2024 • arXiv:2403.10094 • 47 citations
#1891

Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Zhongzhi Yu, Zheng Wang, Yonggan Fu et al.

ICML 2024 • arXiv:2406.15765 • 47 citations
#1892

Towards Codable Watermarking for Injecting Multi-Bits Information to LLMs

Lean Wang, Wenkai Yang, Deli Chen et al.

ICLR 2024 • arXiv:2307.15992 • 47 citations
#1893

On the Embedding Collapse when Scaling up Recommendation Models

Xingzhuo Guo, Junwei Pan, Ximei Wang et al.

ICML 2024 • arXiv:2310.04400 • 47 citations
#1894

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

Zhi Gao, Yuntao Du, Xintong Zhang et al.

CVPR 2024 • arXiv:2312.10908 • 47 citations
#1895

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference

Shentao Yang, Tianqi Chen, Mingyuan Zhou

ICML 2024 • oral • arXiv:2402.08265 • 47 citations
#1896

Xformer: Hybrid X-Shaped Transformer for Image Denoising

Jiale Zhang, Yulun Zhang, Jinjin Gu et al.

ICLR 2024 • arXiv:2303.06440 • 47 citations
#1897

RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios

Wenhao Ding, Yulong Cao, Ding Zhao et al.

ECCV 2024 • arXiv:2312.13303 • 47 citations
#1898

SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning

Hongjun Wang, Sagar Vaze, Kai Han

ICLR 2024 • arXiv:2403.13684 • 47 citations
#1899

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

Weizhen He, Yiheng Deng, Shixiang Tang et al.

CVPR 2024 • arXiv:2306.07520 • 47 citations
#1900

AltDiffusion: A Multilingual Text-to-Image Diffusion Model

Fulong Ye, Guang Liu, Xinya Wu et al.

AAAI 2024 • paper • arXiv:2308.09991 • 47 citations
#1901

Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness

Sibo Wang, Jie Zhang, Zheng Yuan et al.

CVPR 2024 • arXiv:2401.04350 • 47 citations
#1902

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Yang Zheng, Qingqing Zhao, Guandao Yang et al.

ECCV 2024 • arXiv:2404.04421 • 46 citations
#1903

Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing

Jaroslaw Blasiok, Preetum Nakkiran

ICLR 2024 • 46 citations
#1904

LivePhoto: Real Image Animation with Text-guided Motion Control

Xi Chen, Zhiheng Liu, Mengting Chen et al.

ECCV 2024 • arXiv:2312.02928 • 46 citations
#1905

Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following

Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang et al.

AAAI 2024 • paper • arXiv:2302.14691 • 46 citations
#1906

Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping

Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.

CVPR 2024 • arXiv:2312.04521 • 46 citations
#1907

Localizing and Editing Knowledge In Text-to-Image Generative Models

Samyadeep Basu, Nanxuan Zhao, Vlad Morariu et al.

ICLR 2024 • arXiv:2310.13730 • 46 citations
#1908

CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling

Junchao Gong, Lei Bai, Peng Ye et al.

ICML 2024 • arXiv:2402.04290 • 46 citations
#1909

Divide and not forget: Ensemble of selectively trained experts in Continual Learning

Grzegorz Rypeść, Sebastian Cygert, Valeriya Khan et al.

ICLR 2024 • arXiv:2401.10191 • 46 citations
#1910

EasyTPP: Towards Open Benchmarking Temporal Point Processes

Siqiao Xue, Xiaoming Shi, Zhixuan Chu et al.

ICLR 2024 • oral • arXiv:2307.08097 • 46 citations
#1911

Global and Local Prompts Cooperation via Optimal Transport for Federated Learning

Hongxia Li, Wei Huang, Jingya Wang et al.

CVPR 2024 • arXiv:2403.00041 • 46 citations
#1912

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

Zhipeng Du, Miaojing Shi, Jiankang Deng

CVPR 2024 • arXiv:2312.01220 • 46 citations
#1913

Click-Gaussian: Interactive Segmentation to Any 3D Gaussians

Seokhun Choi, Hyeonseop Song, Jaechul Kim et al.

ECCV 2024 • arXiv:2407.11793 • 46 citations
#1914

MagMax: Leveraging Model Merging for Seamless Continual Learning

Daniel Marczak, Bartlomiej Twardowski, Tomasz Trzcinski et al.

ECCV 2024 • arXiv:2407.06322 • 46 citations
#1915

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

Hao Fei, Shengqiong Wu, Wei Ji et al.

CVPR 2024 • arXiv:2308.13812 • 46 citations
#1916

R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model

Changhoon Kim, Kyle Min, Yezhou Yang

ECCV 2024 • arXiv:2405.16341 • 46 citations
#1917

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

Xiaoshi Wu, Yiming Hao, Manyuan Zhang et al.

ECCV 2024 • arXiv:2405.00760 • 46 citations
#1918

Improving Image Restoration through Removing Degradations in Textual Representations

Jingbo Lin, Zhilu Zhang, Yuxiang Wei et al.

CVPR 2024 • arXiv:2312.17334 • 46 citations
#1919

MyVLM: Personalizing VLMs for User-Specific Queries

Yuval Alaluf, Elad Richardson, Sergey Tulyakov et al.

ECCV 2024 • arXiv:2403.14599 • 46 citations
#1920

On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

Jerry Yao-Chieh Hu, Thomas Lin, Zhao Song et al.

ICML 2024 • arXiv:2402.04520 • 46 citations
#1921

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

Yao Mu, Junting Chen, Qing-Long Zhang et al.

ICML 2024 • arXiv:2402.16117 • 46 citations
#1922

One-shot Empirical Privacy Estimation for Federated Learning

Galen Andrew, Peter Kairouz, Sewoong Oh et al.

ICLR 2024 • arXiv:2302.03098 • 46 citations
#1923

Scaling Laws for Sparsely-Connected Foundation Models

Elias Frantar, Carlos Riquelme Ruiz, Neil Houlsby et al.

ICLR 2024 • spotlight • arXiv:2309.08520 • 46 citations
#1924

LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry

Weirong Chen, Le Chen, Rui Wang et al.

CVPR 2024 • arXiv:2401.01887 • 46 citations
#1925

DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning

Huiping Zhuang, Run He, Kai Tong et al.

AAAI 2024 • paper • arXiv:2403.17503 • 46 citations
#1926

GAIA: Zero-shot Talking Avatar Generation

Tianyu He, Junliang Guo, Runyi Yu et al.

ICLR 2024 • arXiv:2311.15230 • 46 citations
#1927

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

Kai Chen, Chunwei Wang, Kuo Yang et al.

ICLR 2024 • arXiv:2310.10477 • 46 citations
#1928

SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction

Conghao Wong, Beihao Xia, Ziqian Zou et al.

CVPR 2024 • arXiv:2310.05370 • 46 citations
#1929

MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction

Xiaolu Liu, Song Wang, Wentong Li et al.

CVPR 2024 • arXiv:2404.00876 • 46 citations
#1930

Point, Segment and Count: A Generalized Framework for Object Counting

Zhizhong Huang, Mingliang Dai, Yi Zhang et al.

CVPR 2024 • arXiv:2311.12386 • 46 citations
#1931

Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models

Senmao Li, Joost van de Weijer, Taihang Hu et al.

ICLR 2024 • arXiv:2402.05375 • 46 citations
#1932

eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data

Peng, Xinyi Ling, Ziru Chen et al.

ICML 2024 • arXiv:2402.08831 • 46 citations
#1933

Dual Operating Modes of In-Context Learning

Ziqian Lin, Kangwook Lee

ICML 2024 • arXiv:2402.18819 • 46 citations
#1934

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

Sreyan Ghosh, Ashish Seth, Sonal Kumar et al.

ICLR 2024 • arXiv:2310.08753 • 46 citations
#1935

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

Hao Chen, Jindong Wang, Ankit Parag Shah et al.

ICLR 2024 • spotlight • arXiv:2309.17002 • 46 citations
#1936

DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models

Namhyuk Ahn, Junsoo Lee, Chunggi Lee et al.

AAAI 2024 • paper • arXiv:2309.06933 • 46 citations
#1937

Active Preference Learning for Large Language Models

William Muldrew, Peter Hayes, Mingtian Zhang et al.

ICML 2024 • arXiv:2402.08114 • 46 citations
#1938

DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting

Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.

CVPR 2024 • arXiv:2404.16622 • 46 citations
#1939

Improving fine-grained understanding in image-text pre-training

Ioana Bica, Anastasija Ilic, Matthias Bauer et al.

ICML 2024 • arXiv:2401.09865 • 46 citations
#1940

DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization

Guowei Xu, Ruijie Zheng, Yongyuan Liang et al.

ICLR 2024 • spotlight • arXiv:2310.19668 • 46 citations
#1941

LEAD: Learning Decomposition for Source-free Universal Domain Adaptation

Sanqing Qu, Tianpei Zou, Lianghua He et al.

CVPR 2024 • arXiv:2403.03421 • 46 citations
#1942

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Jianwu Fang, Lei-lei Li, Junfei Zhou et al.

CVPR 2024 • highlight • arXiv:2403.00436 • 46 citations
#1943

Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation

Haojie Zhang, Yongyi Su, Xun Xu et al.

CVPR 2024 • arXiv:2312.03502 • 46 citations
#1944

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

Geonho Bang, Kwangjin Choi, Jisong Kim et al.

CVPR 2024 • arXiv:2403.05061 • 46 citations
#1945

NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving

William Ljungbergh, Adam Tonderski, Joakim Johnander et al.

ECCV 2024 • arXiv:2404.07762 • 45 citations
#1946

SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model

Inhwan Bae, Young-Jae Park, Hae-Gon Jeon

CVPR 2024 • arXiv:2403.18452 • 45 citations
#1947

Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

Siyao Li, Tianpei Gu, Zhitao Yang et al.

ICLR 2024 • arXiv:2403.18811 • 45 citations
#1948

DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model

Li Xiaofan, Zhang Yifu, Xiaoqing Ye

ECCV 2024 • 45 citations
#1949

Learning Transferable Negative Prompts for Out-of-Distribution Detection

Tianqi Li, Guansong Pang, Wenjun Miao et al.

CVPR 2024 • arXiv:2404.03248 • 45 citations
#1950

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Yongwei Chen, Tengfei Wang, Tong Wu et al.

ECCV 2024 • arXiv:2403.12409 • 45 citations
#1951

Vision-and-Language Navigation via Causal Learning

Liuyi Wang, Zongtao He, Ronghao Dang et al.

CVPR 2024 • arXiv:2404.10241 • 45 citations
#1952

A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models

Yihan Wu, Zhengmian Hu, Junfeng Guo et al.

ICML 2024 • arXiv:2310.07710 • 45 citations
#1953

CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention

Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali et al.

CVPR 2024 • highlight • arXiv:2402.17678 • 45 citations
#1954

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

Yuang Ai, Huaibo Huang, Xiaoqiang Zhou et al.

CVPR 2024 • arXiv:2312.02918 • 45 citations
#1955

Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation

Ryan Wong, Necati Cihan Camgoz, Richard Bowden

ICLR 2024 • arXiv:2405.04164 • 45 citations
#1956

AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation

Haonan Wang, Qixiang Zhang, Yi Li et al.

CVPR 2024 • arXiv:2403.01818 • 45 citations
#1957

Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations

Xiaogang Jia, Denis Blessing, Xinkai Jiang et al.

ICLR 2024 • arXiv:2402.14606 • 45 citations
#1958

Exploiting Diffusion Prior for Generalizable Dense Prediction

Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee et al.

CVPR 2024 • arXiv:2311.18832 • 45 citations
#1959

Gradient-based Parameter Selection for Efficient Fine-Tuning

Zhi Zhang, Qizhe Zhang, Zijun Gao et al.

CVPR 2024 • arXiv:2312.10136 • 45 citations
#1960

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Rizhao Cai, Zirui Song, Dayan Guan et al.

ECCV 2024 • arXiv:2312.02896 • 45 citations
#1961

Amodal Ground Truth and Completion in the Wild

Guanqi Zhan, Chuanxia Zheng, Weidi Xie et al.

CVPR 2024 • arXiv:2312.17247 • 45 citations
#1962

Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text

Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.

CVPR 2024 • arXiv:2312.02702 • 45 citations
#1963

BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling

Cheng Peng, Yutao Tang, Yifan Zhou et al.

ECCV 2024 • arXiv:2403.04926 • 45 citations
#1964

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

Rongjie Li, Songyang Zhang, Dahua Lin et al.

CVPR 2024 • arXiv:2404.00906 • 45 citations
#1965

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto et al.

CVPR 2024 • arXiv:2311.15879 • 45 citations
#1966

Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

Rui Song, Chenwei Liang, Hu Cao et al.

CVPR 2024 • arXiv:2402.07635 • 45 citations
#1967

Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL

Hao Sun, Alihan Hüyük, Mihaela van der Schaar

ICLR 2024 • arXiv:2309.06553 • 45 citations
#1968

A Multimodal Automated Interpretability Agent

Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang et al.

ICML 2024 • arXiv:2404.14394 • 45 citations
#1969

LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents

Jae-Woo Choi, Youngwoo Yoon et al.

ICLR 2024 • arXiv:2402.08178 • 45 citations
#1970

Bridging Remote Sensors with Multisensor Geospatial Foundation Models

Boran Han, Shuai Zhang, Xingjian Shi et al.

CVPR 2024 • arXiv:2404.01260 • 45 citations
#1971

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Lin Li, Haoyan Guan, Jianing Qiu et al.

CVPR 2024 • arXiv:2403.01849 • 45 citations
#1972

Graph Metanetworks for Processing Diverse Neural Architectures

Derek Lim, Haggai Maron, Marc T Law et al.

ICLR 2024 • spotlight • arXiv:2312.04501 • 45 citations
#1973

Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification

Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.

AAAI 2024 • paper • 45 citations
#1974

Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring

Xin Gao, Tianheng Qiu, Xinyu Zhang et al.

CVPR 2024 • arXiv:2401.00027 • 45 citations
#1975

LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

Yulin Luo, Ruichuan An, Bocheng Zou et al.

ECCV 2024 • arXiv:2405.02363 • 45 citations
#1976

Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning

Alexey Skrynnik, Anton Andreychuk, Maria Nesterova et al.

AAAI 2024 • paper • arXiv:2310.01207 • 45 citations
#1977

ReGenNet: Towards Human Action-Reaction Synthesis

Liang Xu, Yizhou Zhou, Yichao Yan et al.

CVPR 2024 • arXiv:2403.11882 • 45 citations
#1978

Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges

Tongtong Yuan, Xuange Zhang, Kun Liu et al.

CVPR 2024 • arXiv:2309.13925 • 45 citations
#1979

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Jianhao Yuan, Jie Zhang, Shuyang Sun et al.

ICLR 2024 • arXiv:2310.10402 • 45 citations
#1980

TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation

Yuhao Wang, Xuehu Liu, Pingping Zhang et al.

AAAI 2024 • paper • arXiv:2312.09612 • 45 citations
#1981

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision

Yi Yu, Xue Yang, Qingyun Li et al.

CVPR 2024 • arXiv:2311.14758 • 45 citations
#1982

On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy

Letian Huang, Jiayang Bai, Jie Guo et al.

ECCV 2024 • arXiv:2402.00752 • 45 citations
#1983

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

Xiangpeng Yang, Linchao Zhu, Xiaohan Wang et al.

AAAI 2024 • paper • arXiv:2401.10588 • 45 citations
#1984

Posterior Distillation Sampling

Juil Koo, Chanho Park, Minhyuk Sung

CVPR 2024 • arXiv:2311.13831 • 44 citations
#1985

ParCo: Part-Coordinating Text-to-Motion Synthesis

Qiran Zou, Shangyuan Yuan, Shian Du et al.

ECCV 2024 • arXiv:2403.18512 • 44 citations
#1986

Bridging State and History Representations: Understanding Self-Predictive RL

Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi et al.

ICLR 2024 • arXiv:2401.08898 • 44 citations
#1987

Improved Probabilistic Image-Text Representations

Sanghyuk Chun

ICLR 2024 • arXiv:2305.18171 • 44 citations
#1988

Equivariant Graph Neural Operator for Modeling 3D Dynamics

Minkai Xu, Jiaqi Han, Aaron Lou et al.

ICML 2024 • oral • arXiv:2401.11037 • 44 citations
#1989

MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process

Xinyao Fan, Yueying Wu, Chang Xu et al.

ICLR 2024 • arXiv:2403.05751 • 44 citations
#1990

KVQ: Kwai Video Quality Assessment for Short-form Videos

Yiting Lu, Xin Li, Yajing Pei et al.

CVPR 2024 • arXiv:2402.07220 • 44 citations
#1991

Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs

Ilan Naiman, N. Benjamin Erichson, Pu Ren et al.

ICLR 2024 • arXiv:2310.02619 • 44 citations
#1992

UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All

Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou et al.

CVPR 2024 • arXiv:2403.12532 • 44 citations
#1993

Unsupervised Continual Anomaly Detection with Contrastively-Learned Prompt

Jiaqi Liu, Kai Wu, Qiang Nie et al.

AAAI 2024 • paper • arXiv:2401.01010 • 44 citations
#1994

Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Tao Huang, Guangqi Jiang, Yanjie Ze et al.

ECCV 2024 • arXiv:2312.14134 • 44 citations
#1995

A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation

Qucheng Peng, Ce Zheng, Chen Chen

CVPR 2024 • arXiv:2403.11310 • 44 citations
#1996

LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs

Yan Wang, Zhixuan Chu, Xin Ouyang et al.

AAAI 2024 • paper • 44 citations
#1997

Zero-Shot Detection of AI-Generated Images

Davide Cozzolino, Giovanni Poggi, Matthias Niessner et al.

ECCV 2024 • arXiv:2409.15875 • 44 citations
#1998

LLMs are Good Action Recognizers

Haoxuan Qu, Yujun Cai, Jun Liu

CVPR 2024 • arXiv:2404.00532 • 44 citations
#1999

Unveiling the Pitfalls of Knowledge Editing for Large Language Models

Zhoubo Li, Ningyu Zhang, Yunzhi Yao et al.

ICLR 2024 • arXiv:2310.02129 • 44 citations
#2000

LLM-Assisted Code Cleaning For Training Accurate Code Generators

Naman Jain, Tianjun Zhang, Wei-Lin Chiang et al.

ICLR 2024 • arXiv:2311.14904 • 44 citations