Most Cited 2024 "referring expression editing" Papers

12,324 papers found • Page 10 of 62

Filters:Most Cited 2024 referring expression editing Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#1801

DAP: A Dynamic Adversarial Patch for Evading Person Detectors

Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.

CVPR 2024arXiv:2305.11618

citations

#1802

Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

Zihao Liu, Xiaoyu Zhang, Guangwei Liu et al.

ECCV 2024arXiv:2402.17430

citations

#1803

CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Qilang Ye, Zitong Yu, Rui Shao et al.

ECCV 2024arXiv:2403.04640

citations

#1804

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.

CVPR 2024highlightarXiv:2311.17435

citations

#1805

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Yucheng Suo, Fan Ma, Linchao Zhu et al.

CVPR 2024arXiv:2403.16005

citations

#1806

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Jitesh Jain, Jianwei Yang, Humphrey Shi

CVPR 2024arXiv:2312.14233

citations

#1807

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

Chen Zhang, L. F. D’Haro, Yiming Chen et al.

AAAI 2024paperarXiv:2312.15407

citations

#1808

How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

Josh Alman, Zhao Song

ICLR 2024spotlightarXiv:2310.04064

citations

#1809

Feature Fusion from Head to Tail for Long-Tailed Visual Recognition

Mengke Li, Zhikai HU, Yang Lu et al.

AAAI 2024paperarXiv:2306.06963

citations

#1810

ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank

Zhanjie Zhang, Quanwei Zhang, Wei Xing et al.

AAAI 2024paperarXiv:2312.06135

citations

#1811

What does the Knowledge Neuron Thesis Have to do with Knowledge?

Jingcheng Niu, Andrew Liu, Zining Zhu et al.

ICLR 2024spotlightarXiv:2405.02421

citations

#1812

Generative Latent Coding for Ultra-Low Bitrate Image Compression

Zhaoyang Jia, Jiahao Li, Bin Li et al.

CVPR 2024arXiv:2512.20194

citations

#1813

Fully Sparse 3D Occupancy Prediction

Haisong Liu, Yang Chen, Haiguang Wang et al.

ECCV 2024arXiv:2312.17118

citations

#1814

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Pingping Zhang, Yuhao Wang, Yang Liu et al.

CVPR 2024arXiv:2403.10254

citations

#1815

FROSTER: Frozen CLIP is A Strong Teacher for Open-Vocabulary Action Recognition

Xiaohu Huang, Hao Zhou, Kun Yao et al.

ICLR 2024oralarXiv:2402.03241

citations

#1816

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Blake Bordelon, Lorenzo Noci, Mufan Li et al.

ICLR 2024arXiv:2309.16620

citations

#1817

TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

Aditya Aravind Chinchure, Pushkar Shukla, Gaurav Bhatt et al.

ECCV 2024arXiv:2312.01261

citations

#1818

Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

Hui LIN, Zhiheng Ma, Xiaopeng Hong et al.

AAAI 2024paperarXiv:2401.03870

citations

#1819

Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features

Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.

CVPR 2024arXiv:2403.07592

citations

#1820

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

Kevin Xie, Tianshi Cao, Jonathan P Lorraine et al.

ECCV 2024arXiv:2403.15385

citations

#1821

Communication-Efficient Collaborative Perception via Information Filling with Codebook

Yue Hu, Juntong Peng, Sifei Liu et al.

CVPR 2024arXiv:2405.04966

citations

#1822

Simplifying Transformer Blocks

Bobby He, Thomas Hofmann

ICLR 2024arXiv:2311.01906

citations

#1823

Why Larger Language Models Do In-context Learning Differently?

Zhenmei Shi, Junyi Wei, Zhuoyan Xu et al.

ICML 2024arXiv:2405.19592

citations

#1824

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Shicheng Li, Lei Li, Yi Liu et al.

ECCV 2024arXiv:2311.17404

citations

#1825

Group Preference Optimization: Few-Shot Alignment of Large Language Models

Siyan Zhao, John Dang, Aditya Grover

ICLR 2024arXiv:2310.11523

citations

#1826

Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

Xin Zhou, Dingkang Liang, Wei Xu et al.

CVPR 2024arXiv:2403.01439

citations

#1827

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation

Kai Chen, Enze Xie, Zhe Chen et al.

ICLR 2024arXiv:2306.04607

citations

#1828

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Yuru Jia, Lukas Hoyer, Shengyu Huang et al.

ECCV 2024arXiv:2312.03048

citations

#1829

MS-DETR: Efficient DETR Training with Mixed Supervision

Chuyang Zhao, Yifan Sun, Wenhao Wang et al.

CVPR 2024arXiv:2401.03989

citations

#1830

Watch Your Steps: Local Image and Scene Editing by Text Instructions

Ashkan Mirzaei, Tristan T Aumentado-Armstrong, Marcus A Brubaker et al.

ECCV 2024arXiv:2308.08947

citations

#1831

Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds

Sipeng Zheng, jiazheng liu, Yicheng Feng et al.

ICLR 2024arXiv:2310.13255

citations

#1832

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.

ICML 2024arXiv:2403.15447

citations

#1833

Causal Representation Learning from Multiple Distributions: A General Setting

Kun Zhang, Shaoan Xie, Ignavier Ng et al.

ICML 2024oralarXiv:2402.05052

citations

#1834

Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing

Hyelin Nam, Gihyun Kwon, Geon Yeong Park et al.

CVPR 2024arXiv:2311.18608

citations

#1835

Improving Audio-Visual Segmentation with Bidirectional Generation

Dawei Hao, Yuxin Mao, Bowen He et al.

AAAI 2024paperarXiv:2308.08288

citations

#1836

Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

Yu Deng, Duomin Wang, Baoyuan Wang

ECCV 2024arXiv:2403.13570

citations

#1837

Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

Clément Bonnet, Daniel Luo, Donal Byrne et al.

ICLR 2024arXiv:2306.09884

citations

#1838

Trajeglish: Traffic Modeling as Next-Token Prediction

Jonah Philion, Xue Bin Peng, Sanja Fidler

ICLR 2024arXiv:2312.04535

citations

#1839

Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation

Xianghe Pang, shuo tang, Rui Ye et al.

ICML 2024spotlightarXiv:2402.05699

citations

#1840

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

Han Liang, Jiacheng Bao, Ruichi Zhang et al.

CVPR 2024arXiv:2312.08985

citations

#1841

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Yixuan Wu, Yizhou Wang, Shixiang Tang et al.

ECCV 2024arXiv:2403.12488

citations

#1842

Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners

Keon Hee Park, Kyungwoo Song, Gyeong-Moon Park

CVPR 2024arXiv:2404.02117

citations

#1843

Breathing Life Into Sketches Using Text-to-Video Priors

Rinon Gal, Yael Vinker, Yuval Alaluf et al.

CVPR 2024highlightarXiv:2311.13608

citations

#1844

Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models

Didi Zhu, Zhongyi Sun, Zexi Li et al.

ICML 2024arXiv:2402.12048

citations

#1845

Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals

Yair Gat, Nitay Calderon, Amir Feder et al.

ICLR 2024arXiv:2310.00603

citations

#1846

Grounded Question-Answering in Long Egocentric Videos

Shangzhe Di, Weidi Xie

CVPR 2024arXiv:2312.06505

citations

#1847

When Fast Fourier Transform Meets Transformer for Image Restoration

xingyu jiang, Xiuhui Zhang, Ning Gao et al.

ECCV 2024

citations

#1848

Linguistic Calibration of Long-Form Generations

Neil Band, Xuechen Li, Tengyu Ma et al.

ICML 2024arXiv:2404.00474

citations

#1849

RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation

Mahdi Nikdan, Soroush Tabesh, Elvir Crnčević et al.

ICML 2024arXiv:2401.04679

citations

#1850

Learning with Mixture of Prototypes for Out-of-Distribution Detection

Haodong Lu, Dong Gong, Shuo Wang et al.

ICLR 2024arXiv:2402.02653

citations

#1851

Generating Human Motion in 3D Scenes from Text Descriptions

Zhi Cen, Huaijin Pi, Sida Peng et al.

CVPR 2024arXiv:2405.07784

citations

#1852

Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling

Jiarui Lu, Bozitao Zhong, Zuobai Zhang et al.

ICLR 2024arXiv:2306.03117

citations

#1853

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le et al.

CVPR 2024arXiv:2312.00825

citations

#1854

LightIt: Illumination Modeling and Control for Diffusion Models

Peter Kocsis, Kalyan Sunkavalli, Julien Philip et al.

CVPR 2024arXiv:2403.10615

citations

#1855

DeS3: Adaptive Attention-Driven Self and Soft Shadow Removal Using ViT Similarity

Yeying Jin, Wenhan Yang, W. Ye et al.

AAAI 2024paperarXiv:2211.08089

citations

#1856

MeaCap: Memory-Augmented Zero-shot Image Captioning

Zequn Zeng, Yan Xie, Hao Zhang et al.

CVPR 2024arXiv:2403.03715

citations

#1857

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

Qiuhong Shen, Xingyi Yang, Xinchao Wang

ECCV 2024arXiv:2409.08270

citations

#1858

Improving Automatic VQA Evaluation Using Large Language Models

Oscar Mañas, Benno Krojer, Aishwarya Agrawal

AAAI 2024paperarXiv:2310.02567

citations

#1859

Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation

Song Wang, Jiawei Yu, Wentong Li et al.

CVPR 2024arXiv:2404.11958

citations

#1860

Mosaic-SDF for 3D Generative Models

Lior Yariv, Omri Puny, Oran Gafni et al.

CVPR 2024arXiv:2312.09222

citations

#1861

Navigating the Design Space of Equivariant Diffusion-Based Generative Models for De Novo 3D Molecule Generation

Tuan Le, Julian Cremer, Frank Noe et al.

ICLR 2024arXiv:2309.17296

citations

#1862

Reinforced Adaptive Knowledge Learning for Multimodal Fake News Detection

Litian Zhang, Xiaoming Zhang, Chaozhuo Li et al.

AAAI 2024paper

citations

#1863

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Lewei Yao, Renjie Pi, Jianhua Han et al.

CVPR 2024arXiv:2404.09216

citations

#1864

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Banghua Zhu, Michael Jordan, Jiantao Jiao

ICML 2024arXiv:2401.16335

citations

#1865

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Yiming Zhao, Zhouhui Lian

ECCV 2024arXiv:2312.04884

citations

#1866

JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention

Yuandong Tian, Yiping Wang, Zhenyu Zhang et al.

ICLR 2024arXiv:2310.00535

citations

#1867

Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification

Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.

CVPR 2024arXiv:2310.08255

citations

#1868

UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction

Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud et al.

ECCV 2024arXiv:2403.15098

citations

#1869

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

Yixuan Ren, Yang Zhou, Jimei Yang et al.

ECCV 2024arXiv:2402.14780

citations

#1870

Deep Networks Always Grok and Here is Why

Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

ICML 2024arXiv:2402.15555

citations

#1871

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Yuiga Wada, Kanta Kaneda, Daichi Saito et al.

CVPR 2024highlightarXiv:2402.18091

citations

#1872

PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Bhat, Peter Wonka

CVPR 2024arXiv:2312.02284

citations

#1873

S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data

Xuyang Li, Danfeng Hong, Jocelyn Chanussot

CVPR 2024

citations

#1874

Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling

Weijia Xu, Andrzej Banburski-Fahey, Nebojsa Jojic

ICML 2024arXiv:2305.09993

citations

#1875

Unifying Visual and Vision-Language Tracking via Contrastive Learning

AAAI 2024paperarXiv:2401.11228

citations

#1876

HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction

Yi ZHOU, Hui Zhang, Jiaqian Yu et al.

CVPR 2024arXiv:2403.08639

citations

#1877

One-Prompt to Segment All Medical Images

Wu, Min Xu

CVPR 2024arXiv:2305.10300

citations

#1878

Variational Learning is Effective for Large Deep Networks

Yuesong Shen, Nico Daheim, Bai Cong et al.

ICML 2024spotlightarXiv:2402.17641

citations

#1879

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Tong Shao, Zhuotao Tian, Hang Zhao et al.

ECCV 2024arXiv:2407.08268

citations

#1880

Digital Life Project: Autonomous 3D Characters with Social Intelligence

Zhongang Cai, Jianping Jiang, Zhongfei Qing et al.

CVPR 2024arXiv:2312.04547

citations

#1881

Online Vectorized HD Map Construction using Geometry

Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding et al.

ECCV 2024arXiv:2312.03341

citations

#1882

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

Jianjian Cao, Peng Ye, Shengze Li et al.

CVPR 2024arXiv:2403.02991

citations

#1883

GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval

Han Zhou, Wei Dong, Xiaohong Liu et al.

ECCV 2024arXiv:2407.12431

citations

#1884

Boosting Diffusion Models with Moving Average Sampling in Frequency Domain

Yurui Qian, Qi Cai, Yingwei Pan et al.

CVPR 2024arXiv:2403.17870

citations

#1885

MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

Yinya Huang, Xiaohan Lin, Zhengying Liu et al.

ICLR 2024spotlightarXiv:2402.08957

citations

#1886

SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

Zhiqian Lan, Yuxuan Jiang, Yao Mu et al.

ICLR 2024oralarXiv:2309.15289

citations

#1887

An Analysis of Linear Time Series Forecasting Models

William Toner, Luke Darlow

ICML 2024arXiv:2403.14587

citations

#1888

CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs

Yassine Ouali, Adrian Bulat, Brais Martinez et al.

ECCV 2024arXiv:2408.10433

citations

#1889

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models

Andrea Caraffa, Davide Boscaini, Amir Hamza et al.

ECCV 2024arXiv:2312.00947

citations

#1890

RangeLDM: Fast Realistic LiDAR Point Cloud Generation

Qianjiang Hu, Zhimin Zhang, Wei Hu

ECCV 2024arXiv:2403.10094

citations

#1891

Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Zhongzhi Yu, Zheng Wang, Yonggan Fu et al.

ICML 2024arXiv:2406.15765

citations

#1892

Towards Codable Watermarking for Injecting Multi-Bits Information to LLMs

Lean Wang, Wenkai Yang, Deli Chen et al.

ICLR 2024arXiv:2307.15992

citations

#1893

On the Embedding Collapse when Scaling up Recommendation Models

Xingzhuo Guo, Junwei Pan, Ximei Wang et al.

ICML 2024arXiv:2310.04400

citations

#1894

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

Zhi Gao, Yuntao Du., Xintong Zhang et al.

CVPR 2024arXiv:2312.10908

citations

#1895

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference

Shentao Yang, Tianqi Chen, Mingyuan Zhou

ICML 2024oralarXiv:2402.08265

citations

#1896

Xformer: Hybrid X-Shaped Transformer for Image Denoising

Jiale Zhang, Yulun Zhang, Jinjin Gu et al.

ICLR 2024arXiv:2303.06440

citations

#1897

RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios

Wenhao Ding, Yulong Cao, DING ZHAO et al.

ECCV 2024arXiv:2312.13303

citations

#1898

SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning

Hongjun Wang, Sagar Vaze, Kai Han

ICLR 2024arXiv:2403.13684

citations

#1899

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

Weizhen He, Yiheng Deng, SHIXIANG TANG et al.

CVPR 2024arXiv:2306.07520

citations

#1900

AltDiffusion: A Multilingual Text-to-Image Diffusion Model

Fulong Ye, Guang Liu, Xinya Wu et al.

AAAI 2024paperarXiv:2308.09991

citations

#1901

Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness

Sibo Wang, Jie Zhang, Zheng Yuan et al.

CVPR 2024arXiv:2401.04350

citations

#1902

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Yang Zheng, Qingqing Zhao, Guandao Yang et al.

ECCV 2024arXiv:2404.04421

citations

#1903

Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing

Jaroslaw Blasiok, Preetum Nakkiran

ICLR 2024

citations

#1904

LivePhoto: Real Image Animation with Text-guided Motion Control

Xi Chen, Zhiheng Liu, Mengting Chen et al.

ECCV 2024arXiv:2312.02928

citations

#1905

Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following

Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang et al.

AAAI 2024paperarXiv:2302.14691

citations

#1906

Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping

Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.

CVPR 2024arXiv:2312.04521

citations

#1907

Localizing and Editing Knowledge In Text-to-Image Generative Models

Samyadeep Basu, Nanxuan Zhao, Vlad Morariu et al.

ICLR 2024arXiv:2310.13730

citations

#1908

CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling

JUNCHAO GONG, LEI BAI, Peng Ye et al.

ICML 2024arXiv:2402.04290

citations

#1909

Divide and not forget: Ensemble of selectively trained experts in Continual Learning

Grzegorz Rypeść, Sebastian Cygert, Valeriya Khan et al.

ICLR 2024arXiv:2401.10191

citations

#1910

EasyTPP: Towards Open Benchmarking Temporal Point Processes

Siqiao Xue, Xiaoming Shi, Zhixuan Chu et al.

ICLR 2024oralarXiv:2307.08097

citations

#1911

Global and Local Prompts Cooperation via Optimal Transport for Federated Learning

Hongxia Li, Wei Huang, Jingya Wang et al.

CVPR 2024arXiv:2403.00041

citations

#1912

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

Zhipeng Du, Miaojing Shi, Jiankang Deng

CVPR 2024arXiv:2312.01220

citations

#1913

Click-Gaussian: Interactive Segmentation to Any 3D Gaussians

Seokhun Choi, Hyeonseop Song, Jaechul Kim et al.

ECCV 2024arXiv:2407.11793

citations

#1914

MagMax: Leveraging Model Merging for Seamless Continual Learning

Daniel Marczak, Bartlomiej Twardowski, Tomasz Trzcinski et al.

ECCV 2024arXiv:2407.06322

citations

#1915

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

Hao Fei, Shengqiong Wu, Wei Ji et al.

CVPR 2024arXiv:2308.13812

citations

#1916

R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model

Changhoon Kim, Kyle Min, Yezhou Yang

ECCV 2024arXiv:2405.16341

citations

#1917

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

Xiaoshi Wu, Yiming Hao, Manyuan Zhang et al.

ECCV 2024arXiv:2405.00760

citations

#1918

Improving Image Restoration through Removing Degradations in Textual Representations

Jingbo Lin, Zhilu Zhang, Yuxiang Wei et al.

CVPR 2024arXiv:2312.17334

citations

#1919

MyVLM: Personalizing VLMs for User-Specific Queries

Yuval Alaluf, Elad Richardson, Sergey Tulyakov et al.

ECCV 2024arXiv:2403.14599

citations

#1920

On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

Jerry Yao-Chieh Hu, Thomas Lin, Zhao Song et al.

ICML 2024arXiv:2402.04520

citations

#1921

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

Yao Mu, Junting Chen, Qing-Long Zhang et al.

ICML 2024arXiv:2402.16117

citations

#1922

One-shot Empirical Privacy Estimation for Federated Learning

Galen Andrew, Peter Kairouz, Sewoong Oh et al.

ICLR 2024arXiv:2302.03098

citations

#1923

Scaling Laws for Sparsely-Connected Foundation Models

Elias Frantar, Carlos Riquelme Ruiz, Neil Houlsby et al.

ICLR 2024spotlightarXiv:2309.08520

citations

#1924

LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry

Weirong Chen, Le Chen, Rui Wang et al.

CVPR 2024arXiv:2401.01887

citations

#1925

DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning

Huiping Zhuang, Run He, Kai Tong et al.

AAAI 2024paperarXiv:2403.17503

citations

#1926

GAIA: Zero-shot Talking Avatar Generation

Tianyu He, Junliang Guo, Runyi Yu et al.

ICLR 2024arXiv:2311.15230

citations

#1927

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

Kai Chen, Chunwei Wang, Kuo Yang et al.

ICLR 2024arXiv:2310.10477

citations

#1928

SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction

Conghao Wong, Beihao Xia, Ziqian Zou et al.

CVPR 2024arXiv:2310.05370

citations

#1929

MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction

Xiaolu Liu, Song Wang, Wentong Li et al.

CVPR 2024arXiv:2404.00876

citations

#1930

Point Segment and Count: A Generalized Framework for Object Counting

Zhizhong Huang, Mingliang Dai, Yi Zhang et al.

CVPR 2024arXiv:2311.12386

citations

#1931

Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models

Senmao Li, Joost van de Weijer, taihang Hu et al.

ICLR 2024arXiv:2402.05375

citations

#1932

eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data

Peng, Xinyi Ling, Ziru Chen et al.

ICML 2024arXiv:2402.08831

citations

#1933

Dual Operating Modes of In-Context Learning

Ziqian Lin, Kangwook Lee

ICML 2024arXiv:2402.18819

citations

#1934

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

Sreyan Ghosh, Ashish Seth, Sonal Kumar et al.

ICLR 2024arXiv:2310.08753

citations

#1935

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

Hao Chen, Jindong Wang, Ankit Parag Shah et al.

ICLR 2024spotlightarXiv:2309.17002

citations

#1936

DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models

Namhyuk Ahn, Junsoo Lee, Chunggi Lee et al.

AAAI 2024paperarXiv:2309.06933

citations

#1937

Active Preference Learning for Large Language Models

William Muldrew, Peter Hayes, Mingtian Zhang et al.

ICML 2024arXiv:2402.08114

citations

#1938

DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting

Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.

CVPR 2024arXiv:2404.16622

citations

#1939

Improving fine-grained understanding in image-text pre-training

Ioana Bica, Anastasija Ilic, Matthias Bauer et al.

ICML 2024arXiv:2401.09865

citations

#1940

DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization

Guowei Xu, Ruijie Zheng, Yongyuan Liang et al.

ICLR 2024spotlightarXiv:2310.19668

citations

#1941

LEAD: Learning Decomposition for Source-free Universal Domain Adaptation

Sanqing Qu, Tianpei Zou, Lianghua He et al.

CVPR 2024arXiv:2403.03421

citations

#1942

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Jianwu Fang, Lei-lei Li, Junfei Zhou et al.

CVPR 2024highlightarXiv:2403.00436

citations

#1943

Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation

Haojie Zhang, Yongyi Su, Xun Xu et al.

CVPR 2024arXiv:2312.03502

citations

#1944

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

Geonho Bang, Kwangjin Choi, Jisong Kim et al.

CVPR 2024arXiv:2403.05061

citations

#1945

NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving

William Ljungbergh, Adam Tonderski, Joakim Johnander et al.

ECCV 2024arXiv:2404.07762

citations

#1946

SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model

Inhwan Bae, Young-Jae Park, Hae-Gon Jeon

CVPR 2024arXiv:2403.18452

citations

#1947

Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

Siyao Li, Tianpei Gu, Zhitao Yang et al.

ICLR 2024arXiv:2403.18811

citations

#1948

DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model

Li Xiaofan, Zhang Yifu, Xiaoqing Ye

ECCV 2024

citations

#1949

Learning Transferable Negative Prompts for Out-of-Distribution Detection

Tianqi Li, Guansong Pang, wenjun miao et al.

CVPR 2024arXiv:2404.03248

citations

#1950

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Yongwei Chen, Tengfei Wang, Tong Wu et al.

ECCV 2024arXiv:2403.12409

citations

#1951

Vision-and-Language Navigation via Causal Learning

Liuyi Wang, Zongtao He, Ronghao Dang et al.

CVPR 2024arXiv:2404.10241

citations

#1952

A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models

Yihan Wu, Zhengmian Hu, Junfeng Guo et al.

ICML 2024arXiv:2310.07710

citations

#1953

CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention

Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali et al.

CVPR 2024highlightarXiv:2402.17678

citations

#1954

Multimodal Prompt Perceiver: Empower Adaptiveness Generalizability and Fidelity for All-in-One Image Restoration

Yuang Ai, Huaibo Huang, Xiaoqiang Zhou et al.

CVPR 2024arXiv:2312.02918

citations

#1955

Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation

Ryan Wong, Necati Cihan Camgoz, Richard Bowden

ICLR 2024arXiv:2405.04164

citations

#1956

AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation

Haonan Wang, Qixiang ZHANG, Yi Li et al.

CVPR 2024arXiv:2403.01818

citations

#1957

Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations

Xiaogang Jia, Denis Blessing, Xinkai Jiang et al.

ICLR 2024arXiv:2402.14606

citations

#1958

Exploiting Diffusion Prior for Generalizable Dense Prediction

Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee et al.

CVPR 2024arXiv:2311.18832

citations

#1959

Gradient-based Parameter Selection for Efficient Fine-Tuning

Zhi Zhang, Qizhe Zhang, Zijun Gao et al.

CVPR 2024arXiv:2312.10136

citations

#1960

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Rizhao Cai, Zirui Song, DAYAN GUAN et al.

ECCV 2024arXiv:2312.02896

citations

#1961

Amodal Ground Truth and Completion in the Wild

Guanqi Zhan, Chuanxia Zheng, Weidi Xie et al.

CVPR 2024arXiv:2312.17247

citations

#1962

Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text

Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.

CVPR 2024arXiv:2312.02702

citations

#1963

BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling

Cheng Peng, Yutao Tang, Yifan Zhou et al.

ECCV 2024arXiv:2403.04926

citations

#1964

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

Rongjie Li, Songyang Zhang, Dahua Lin et al.

CVPR 2024arXiv:2404.00906

citations

#1965

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto et al.

CVPR 2024arXiv:2311.15879

citations

#1966

Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

Rui Song, Chenwei Liang, Hu Cao et al.

CVPR 2024arXiv:2402.07635

citations

#1967

Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL

Hao Sun, Alihan Hüyük, Mihaela van der Schaar

ICLR 2024arXiv:2309.06553

citations

#1968

A Multimodal Automated Interpretability Agent

Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang et al.

ICML 2024arXiv:2404.14394

citations

#1969

LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents

Jae-Woo Choi, Youngwoo Yoon, Youngwoo Yoon et al.

ICLR 2024arXiv:2402.08178

citations

#1970

Bridging Remote Sensors with Multisensor Geospatial Foundation Models

Boran Han, Shuai Zhang, Xingjian Shi et al.

CVPR 2024arXiv:2404.01260

citations

#1971

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Lin Li, Haoyan Guan, Jianing Qiu et al.

CVPR 2024arXiv:2403.01849

citations

#1972

Graph Metanetworks for Processing Diverse Neural Architectures

Derek Lim, Haggai Maron, Marc T Law et al.

ICLR 2024spotlightarXiv:2312.04501

citations

#1973

Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification

Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.

AAAI 2024paper

citations

#1974

Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring

Xin Gao, Tianheng Qiu, Xinyu Zhang et al.

CVPR 2024arXiv:2401.00027

citations

#1975

LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

Yulin Luo, Ruichuan An, Bocheng Zou et al.

ECCV 2024arXiv:2405.02363

citations

#1976

Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning

Alexey Skrynnik, Anton Andreychuk, Maria Nesterova et al.

AAAI 2024paperarXiv:2310.01207

citations

#1977

ReGenNet: Towards Human Action-Reaction Synthesis

Liang Xu, Yizhou Zhou, Yichao Yan et al.

CVPR 2024arXiv:2403.11882

citations

#1978

Towards Surveillance Video-and-Language Understanding: New Dataset Baselines and Challenges

Tongtong Yuan, Xuange Zhang, Kun Liu et al.

CVPR 2024arXiv:2309.13925

citations

#1979

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Jianhao Yuan, Jie Zhang, Shuyang Sun et al.

ICLR 2024arXiv:2310.10402

citations

#1980

TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation

Yuhao Wang, Xuehu Liu, Pingping Zhang et al.

AAAI 2024paperarXiv:2312.09612

citations

#1981

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision

Yi Yu, Xue Yang, Qingyun Li et al.

CVPR 2024arXiv:2311.14758

citations

#1982

On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy

Letian Huang, Jiayang Bai, Jie Guo et al.

ECCV 2024arXiv:2402.00752

citations

#1983

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

Xiangpeng Yang, Linchao Zhu, Xiaohan Wang et al.

AAAI 2024paperarXiv:2401.10588

citations

#1984

Posterior Distillation Sampling

Juil Koo, Chanho Park, Minhyuk Sung

CVPR 2024arXiv:2311.13831

citations

#1985

ParCo: Part-Coordinating Text-to-Motion Synthesis

Qiran Zou, Shangyuan Yuan, Shian Du et al.

ECCV 2024arXiv:2403.18512

citations

#1986

Bridging State and History Representations: Understanding Self-Predictive RL

Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi et al.

ICLR 2024arXiv:2401.08898

citations

#1987

Improved Probabilistic Image-Text Representations

Sanghyuk Chun

ICLR 2024arXiv:2305.18171

citations

#1988

Equivariant Graph Neural Operator for Modeling 3D Dynamics

Minkai Xu, Jiaqi Han, Aaron Lou et al.

ICML 2024oralarXiv:2401.11037

citations

#1989

MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process

Xinyao Fan, Yueying Wu, Chang XU et al.

ICLR 2024arXiv:2403.05751

citations

#1990

KVQ: Kwai Video Quality Assessment for Short-form Videos

Yiting Lu, Xin Li, Yajing Pei et al.

CVPR 2024arXiv:2402.07220

citations

#1991

Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs

Ilan Naiman, N. Benjamin Erichson, Pu Ren et al.

ICLR 2024arXiv:2310.02619

citations

#1992

UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All

Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou et al.

CVPR 2024arXiv:2403.12532

citations

#1993

Unsupervised Continual Anomaly Detection with Contrastively-Learned Prompt

Jiaqi Liu, Kai Wu, Qiang Nie et al.

AAAI 2024paperarXiv:2401.01010

citations

#1994

Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Tao Huang, Guangqi Jiang, Yanjie Ze et al.

ECCV 2024arXiv:2312.14134

citations

#1995

A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation

Qucheng Peng, Ce Zheng, Chen Chen

CVPR 2024arXiv:2403.11310

citations

#1996

LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs

Yan Wang, Zhixuan Chu, Xin Ouyang et al.

AAAI 2024paper

citations

#1997

Zero-Shot Detection of AI-Generated Images

Davide Cozzolino, GIovanni Poggi, Matthias Niessner et al.

ECCV 2024arXiv:2409.15875

citations

#1998

LLMs are Good Action Recognizers

Haoxuan Qu, Yujun Cai, Jun Liu

CVPR 2024arXiv:2404.00532

citations

#1999

Unveiling the Pitfalls of Knowledge Editing for Large Language Models

Zhoubo Li, Ningyu Zhang, Yunzhi Yao et al.

ICLR 2024arXiv:2310.02129

citations

#2000

LLM-Assisted Code Cleaning For Training Accurate Code Generators

Naman Jain, Tianjun Zhang, Wei-Lin Chiang et al.

ICLR 2024arXiv:2311.14904

citations

← Previous

1...8 9 10 11 12...62