Most Cited 2024 "contrastive factor discovery" Papers

12,324 papers found • Page 5 of 62

#801

Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features

Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.

CVPR 2024posterarXiv:2403.07592
44
citations
#802

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models

Andrea Caraffa, Davide Boscaini, Amir Hamza et al.

ECCV 2024posterarXiv:2312.00947
44
citations
#803

Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity

Yuhang Chen, Wenke Huang, Mang Ye

CVPR 2024posterarXiv:2405.16585
43
citations
#804

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision

Yi Yu, Xue Yang, Qingyun Li et al.

CVPR 2024posterarXiv:2311.14758
43
citations
#805

Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Tao Huang, Guangqi Jiang, Yanjie Ze et al.

ECCV 2024posterarXiv:2312.14134
43
citations
#806

Debiasing Multimodal Sarcasm Detection with Contrastive Learning

Mengzhao Jia, Can Xie, Liqiang Jing

AAAI 2024paperarXiv:2312.10493
43
citations
#807

Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)

Qifeng Li, Xiaosong Jia, Shaobo Wang et al.

ECCV 2024poster
43
citations
#808

LLM-Assisted Code Cleaning For Training Accurate Code Generators

Naman Jain, Tianjun Zhang, Wei-Lin Chiang et al.

ICLR 2024posterarXiv:2311.14904
43
citations
#809

PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine

Chenrui Zhang, Lin Liu, Chuyuan Wang et al.

AAAI 2024paperarXiv:2308.12033
43
citations
#810

Learning Transferable Negative Prompts for Out-of-Distribution Detection

Tianqi Li, Guansong Pang, wenjun miao et al.

CVPR 2024posterarXiv:2404.03248
43
citations
#811

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

Xinyi He, Mengyu Zhou, Xinrun Xu et al.

AAAI 2024paperarXiv:2312.13671
43
citations
#812

EgoLifter: Open-world 3D Segmentation for Egocentric Perception

Qiao Gu, Zhaoyang Lv, Duncan Frost et al.

ECCV 2024posterarXiv:2403.18118
43
citations
#813

Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring

Xin Gao, Tianheng Qiu, Xinyu Zhang et al.

CVPR 2024posterarXiv:2401.00027
43
citations
#814

AffineQuant: Affine Transformation Quantization for Large Language Models

Yuexiao Ma, Huixia Li, Xiawu Zheng et al.

ICLR 2024posterarXiv:2403.12544
43
citations
#815

ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image

Hallee E. Wong, Marianne Rakic, John Guttag et al.

ECCV 2024posterarXiv:2312.07381
43
citations
#816

Improved Probabilistic Image-Text Representations

Sanghyuk Chun

ICLR 2024posterarXiv:2305.18171
43
citations
#817

4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations

Wenbo Wang, Hsuan-I Ho, Chen Guo et al.

CVPR 2024highlightarXiv:2404.18630
43
citations
#818

LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

Yulin Luo, Ruichuan An, Bocheng Zou et al.

ECCV 2024posterarXiv:2405.02363
43
citations
#819

DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting

Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.

CVPR 2024posterarXiv:2404.16622
43
citations
#820

LEAD: Learning Decomposition for Source-free Universal Domain Adaptation

Sanqing Qu, Tianpei Zou, Lianghua He et al.

CVPR 2024posterarXiv:2403.03421
43
citations
#821

Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text

Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.

CVPR 2024posterarXiv:2312.02702
43
citations
#822

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

Zhipeng Du, Miaojing Shi, Jiankang Deng

CVPR 2024posterarXiv:2312.01220
43
citations
#823

Posterior Distillation Sampling

Juil Koo, Chanho Park, Minhyuk Sung

CVPR 2024posterarXiv:2311.13831
43
citations
#824

ParCo: Part-Coordinating Text-to-Motion Synthesis

Qiran Zou, Shangyuan Yuan, Shian Du et al.

ECCV 2024posterarXiv:2403.18512
43
citations
#825

Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style

Shuai Tan, Bin Ji, Ye Pan

AAAI 2024paperarXiv:2403.06365
43
citations
#826

Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion

Kiran Chhatre, Radek Danecek, Nikos Athanasiou et al.

CVPR 2024posterarXiv:2312.04466
42
citations
#827

Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations

Xiaogang Jia, Denis Blessing, Xinkai Jiang et al.

ICLR 2024posterarXiv:2402.14606
42
citations
#828

AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation

Haonan Wang, Qixiang ZHANG, Yi Li et al.

CVPR 2024posterarXiv:2403.01818
42
citations
#829

A Watermark-Conditioned Diffusion Model for IP Protection

Rui Min, Sen Li, Hongyang Chen et al.

ECCV 2024posterarXiv:2403.10893
42
citations
#830

UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All

Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou et al.

CVPR 2024posterarXiv:2403.12532
42
citations
#831

Learning the 3D Fauna of the Web

Zizhang Li, Dor Litvak, Ruining Li et al.

CVPR 2024posterarXiv:2401.02400
42
citations
#832

Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification

Jiangming Shi, Xiangbo Yin, Yeyun Chen et al.

ECCV 2024posterarXiv:2401.06825
42
citations
#833

BAMM: Bidirectional Autoregressive Motion Model

Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang et al.

ECCV 2024posterarXiv:2403.19435
42
citations
#834

SemCity: Semantic Scene Generation with Triplane Diffusion

Jumin Lee, Sebin Lee, Changho Jo et al.

CVPR 2024posterarXiv:2403.07773
42
citations
#835

HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios

HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp et al.

CVPR 2024highlightarXiv:2212.10428
42
citations
#836

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

Yiwen Ye, Yutong Xie, Jianpeng Zhang et al.

CVPR 2024highlightarXiv:2311.17597
42
citations
#837

Curriculum reinforcement learning for quantum architecture search under hardware errors

Yash J. Patel, Akash Kundu, Mateusz Ostaszewski et al.

ICLR 2024posterarXiv:2402.03500
42
citations
#838

CAGE: Controllable Articulation GEneration

Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi Amiri et al.

CVPR 2024posterarXiv:2312.09570
42
citations
#839

Two-stage LLM Fine-tuning with Less Specialization and More Generalization

Yihan Wang, Si Si, Daliang Li et al.

ICLR 2024posterarXiv:2211.00635
42
citations
#840

Exploiting Diffusion Prior for Generalizable Dense Prediction

Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee et al.

CVPR 2024posterarXiv:2311.18832
42
citations
#841

DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs

Donghyun Kim, Byeongho Heo, Dongyoon Han

ECCV 2024posterarXiv:2403.19588
42
citations
#842

Benchmarking and Improving Generator-Validator Consistency of Language Models

XIANG LI, Vaishnavi Shrivastava, Siyan Li et al.

ICLR 2024posterarXiv:2310.01846
41
citations
#843

Vision-and-Language Navigation via Causal Learning

Liuyi Wang, Zongtao He, Ronghao Dang et al.

CVPR 2024posterarXiv:2404.10241
41
citations
#844

Generative Proxemics: A Prior for 3D Social Interaction from Images

Vickie Ye, Vickie Ye, Georgios Pavlakos et al.

CVPR 2024posterarXiv:2306.09337
41
citations
#845

Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion

Linlan Huang, Xusheng Cao, Haori Lu et al.

ECCV 2024posterarXiv:2407.14143
41
citations
#846

Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios

Jie Xu, Yazhou Ren, Xiaolong Wang et al.

CVPR 2024posterarXiv:2303.17245
41
citations
#847

TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling

Shimin Zhang, Qu Yang, Chenxiang Ma et al.

AAAI 2024paperarXiv:2308.13250
41
citations
#848

Large Language Models Are Neurosymbolic Reasoners

Meng Fang, Shilong Deng, Yudi Zhang et al.

AAAI 2024paperarXiv:2401.09334
41
citations
#849

DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation

Yifei Li, Hsiaoyu Chen, Egor Larionov et al.

CVPR 2024posterarXiv:2311.12194
41
citations
#850

Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector

An Lao, Qi Zhang, Chongyang Shi et al.

AAAI 2024paperarXiv:2312.11023
41
citations
#851

On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy

Letian Huang, Jiayang Bai, Jie Guo et al.

ECCV 2024posterarXiv:2402.00752
41
citations
#852

DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

Zhengxiang Shi, Aldo Lipani

ICLR 2024posterarXiv:2309.05173
41
citations
#853

A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis

Kai Katsumata, Duc Minh Vo, Hideki Nakayama

ECCV 2024posterarXiv:2311.12897
41
citations
#854

T-MARS: Improving Visual Representations by Circumventing Text Feature Learning

Pratyush Maini, Sachin Goyal, Zachary Lipton et al.

ICLR 2024posterarXiv:2307.03132
41
citations
#855

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang et al.

AAAI 2024paperarXiv:2312.00050
41
citations
#856

Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach

Guoqiang Liang, Kanghao Chen, Hangyu Li et al.

CVPR 2024posterarXiv:2404.00834
41
citations
#857

MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding

Lirong Wu, Yijun Tian, Yufei Huang et al.

ICLR 2024spotlightarXiv:2402.14391
41
citations
#858

Attribute-Missing Graph Clustering Network

Wenxuan Tu, Renxiang Guan, Sihang Zhou et al.

AAAI 2024paper
41
citations
#859

FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization

Shuai Tan, Bin Ji, Ye Pan

CVPR 2024posterarXiv:2403.06375
41
citations
#860

Few-Shot Detection of Machine-Generated Text using Style Representations

Rafael Rivera Soto, Kailin Koch, Aleem Khan et al.

ICLR 2024posterarXiv:2401.06712
41
citations
#861

XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning

Pritam Sarkar, Ali Etemad

AAAI 2024paperarXiv:2211.13929
40
citations
#862

Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer with Adaptive Channel Expansion

Shenghong Luo, Xuhang Chen, Weiwen Chen et al.

AAAI 2024paperarXiv:2308.13739
40
citations
#863

TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

Bu Jin, Yupeng Zheng, Pengfei Li et al.

ECCV 2024posterarXiv:2403.19589
40
citations
#864

Texture-GS: Disentangle the Geometry and Texture for 3D Gaussian Splatting Editing

Tian-Xing Xu, WENBO HU, Yu-Kun Lai et al.

ECCV 2024posterarXiv:2403.10050
40
citations
#865

A Diffusion-Based Framework for Multi-Class Anomaly Detection

Haoyang He, Jiangning Zhang, Hongxu Chen et al.

AAAI 2024paperarXiv:2312.06607
40
citations
#866

ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection

Yichen Bai, Zongbo Han, Bing Cao et al.

CVPR 2024posterarXiv:2311.15243
40
citations
#867

Prompt Learning via Meta-Regularization

Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim

CVPR 2024posterarXiv:2404.00851
40
citations
#868

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee et al.

CVPR 2024highlightarXiv:2311.17261
40
citations
#869

Scene Adaptive Sparse Transformer for Event-based Object Detection

Yansong Peng, Li Hebei, Yueyi Zhang et al.

CVPR 2024posterarXiv:2404.01882
40
citations
#870

Does CLIP’s generalization performance mainly stem from high train-test similarity?

Prasanna Mayilvahanan, Thaddäus Wiedemer, Evgenia Rusak et al.

ICLR 2024posterarXiv:2310.09562
40
citations
#871

WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series

Irina Rish, Kartik Ahuja, Mohammad Javad Darvishi Bayazi et al.

ICLR 2024poster
40
citations
#872

NOPE: Novel Object Pose Estimation from a Single Image

Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin et al.

CVPR 2024posterarXiv:2303.13612
40
citations
#873

Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval

Zhihang Liu, Jun Li, Hongtao Xie et al.

AAAI 2024paperarXiv:2312.12155
40
citations
#874

MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation

Shuzhao Xie, Weixiang Zhang, Chen Tang et al.

ECCV 2024posterarXiv:2409.09756
40
citations
#875

A Vision Check-up for Language Models

Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad et al.

CVPR 2024highlightarXiv:2401.01862
40
citations
#876

Stream Query Denoising for Vectorized HD-Map Construction

Shuo Wang, Fan Jia, Weixin Mao et al.

ECCV 2024posterarXiv:2401.09112
40
citations
#877

UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes

David Rozenberszki, Or Litany, Angela Dai

CVPR 2024posterarXiv:2303.14541
40
citations
#878

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

Xu Yang, Changxing Ding, Zhibin Hong et al.

CVPR 2024posterarXiv:2404.01089
40
citations
#879

TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection

Matic Fučka, Vitjan Zavrtanik, Danijel Skocaj

ECCV 2024posterarXiv:2311.09999
40
citations
#880

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Yibo Wang, Ruiyuan Gao, Kai Chen et al.

CVPR 2024posterarXiv:2403.13304
39
citations
#881

StyleSinger: Style Transfer for Out

of-Domain Singing Voice Synthesis

AAAI 2024paperarXiv:2312.10741
39
citations
#882

ZeroShape: Regression-based Zero-shot Shape Reconstruction

Zixuan Huang, Stefan Stojanov, Anh Thai et al.

CVPR 2024posterarXiv:2312.14198
39
citations
#883

Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

Jiangpeng He

CVPR 2024posterarXiv:2402.18528
39
citations
#884

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.

CVPR 2024posterarXiv:2312.07395
39
citations
#885

NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models

Yusuf Dalva, Pinar Yanardag

CVPR 2024posterarXiv:2312.05390
39
citations
#886

Text-Guided Molecule Generation with Diffusion Language Model

Haisong Gong, Qiang Liu, Shu Wu et al.

AAAI 2024paperarXiv:2402.13040
39
citations
#887

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning

Harshit Sikchi, Qinqing Zheng, Amy Zhang et al.

ICLR 2024spotlightarXiv:2302.08560
39
citations
#888

Towards Continual Knowledge Graph Embedding via Incremental Distillation

Jiajun Liu, Ke Wenjun, Peng Wang et al.

AAAI 2024paperarXiv:2405.04453
39
citations
#889

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

Qingping SUN, Yanjun Wang, Ailing Zeng et al.

CVPR 2024posterarXiv:2403.17934
39
citations
#890

Learning Diffusion Texture Priors for Image Restoration

Tian Ye, Sixiang Chen, Wenhao Chai et al.

CVPR 2024highlight
39
citations
#891

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

Chao Xu, Ang Li, Linghao Chen et al.

ECCV 2024posterarXiv:2408.10195
39
citations
#892

PAD: Patch-Agnostic Defense against Adversarial Patch Attacks

Lihua Jing, Rui Wang, Wenqi Ren et al.

CVPR 2024posterarXiv:2404.16452
39
citations
#893

Controllable Mind Visual Diffusion Model

Bohan Zeng, Shanglin Li, Xuhui Liu et al.

AAAI 2024paperarXiv:2305.10135
39
citations
#894

GalLop: Learning global and local prompts for vision-language models

Marc Lafon, Elias Ramzi, Clément Rambour et al.

ECCV 2024posterarXiv:2407.01400
39
citations
#895

GLACE: Global Local Accelerated Coordinate Encoding

Fangjinhua Wang, Xudong Jiang, Silvano Galliani et al.

CVPR 2024posterarXiv:2406.04340
39
citations
#896

Multi-Architecture Multi-Expert Diffusion Models

Yunsung Lee, Jin-Young Kim, Hyojun Go et al.

AAAI 2024paperarXiv:2306.04990
39
citations
#897

Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data

Yu Deng, Duomin Wang, Xiaohang Ren et al.

CVPR 2024posterarXiv:2311.18729
39
citations
#898

Provable Offline Preference-Based Reinforcement Learning

Wenhao Zhan, Masatoshi Uehara, Nathan Kallus et al.

ICLR 2024spotlightarXiv:2305.14816
39
citations
#899

PolyGCL: GRAPH CONTRASTIVE LEARNING via Learnable Spectral Polynomial Filters

Jingyu Chen, Runlin Lei, Zhewei Wei

ICLR 2024spotlight
39
citations
#900

STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

Yu-Hsuan Wu, Jerry Hu, Weijian Li et al.

ICLR 2024oralarXiv:2312.17346
39
citations
#901

No Prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation

Nimesh Agrawal, Anuj Sirohi, Sandeep Kumar et al.

AAAI 2024paperarXiv:2312.10080
39
citations
#902

Rethinking Reverse Distillation for Multi-Modal Anomaly Detection

Zhihao Gu, Jiangning Zhang, Liang Liu et al.

AAAI 2024paper
38
citations
#903

AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction

Kethmi Hirushini Hettige, Jiahao Ji, Shili Xiang et al.

ICLR 2024oralarXiv:2402.03784
38
citations
#904

Quality-Diversity through AI Feedback

Herbie Bradley, Andrew Dai, Hannah Teufel et al.

ICLR 2024posterarXiv:2310.13032
38
citations
#905

Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer

Yaoting Wang, Liu Weisong, Guangyao Li et al.

AAAI 2024paperarXiv:2309.07929
38
citations
#906

TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts

Hyunwook Lee, Sungahn Ko

ICLR 2024oralarXiv:2403.02600
38
citations
#907

Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment

Mengting Chen, Xi Chen, Zhonghua Zhai et al.

ECCV 2024posterarXiv:2403.12965
38
citations
#908

Multi-view Aggregation Network for Dichotomous Image Segmentation

Qian Yu, Xiaoqi Zhao, Youwei Pang et al.

CVPR 2024highlightarXiv:2404.07445
38
citations
#909

Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection

Soopil Kim, Sion An, Philip Chikontwe et al.

AAAI 2024paperarXiv:2312.13783
38
citations
#910

Test-Time Domain Generalization for Face Anti-Spoofing

Qianyu Zhou, Ke-Yue Zhang, Taiping Yao et al.

CVPR 2024posterarXiv:2403.19334
38
citations
#911

PINNACLE: PINN Adaptive ColLocation and Experimental points selection

Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng et al.

ICLR 2024spotlightarXiv:2404.07662
38
citations
#912

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Mengqi Huang, Zhendong Mao, Mingcong Liu et al.

CVPR 2024posterarXiv:2403.00483
38
citations
#913

Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data

YongKyung Oh, Dongyoung Lim, Sungil Kim

ICLR 2024spotlightarXiv:2402.14989
38
citations
#914

R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection

Zheyuan Zhou, Wang Le, Naiyu Fang et al.

ECCV 2024posterarXiv:2407.10862
38
citations
#915

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Yang Zhou, Hao Shao, Letian Wang et al.

CVPR 2024posterarXiv:2403.11492
38
citations
#916

Better Call SAL: Towards Learning to Segment Anything in Lidar

Aljoša Ošep, Tim Meinhardt, Francesco Ferroni et al.

ECCV 2024posterarXiv:2403.13129
38
citations
#917

When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations

Aleksandar Petrov, Philip Torr, Adel Bibi

ICLR 2024posterarXiv:2310.19698
38
citations
#918

Latent Space Editing in Transformer-Based Flow Matching

Vincent Tao Hu, Wei Zhang, Meng Tang et al.

AAAI 2024paperarXiv:2312.10825
38
citations
#919

EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng et al.

CVPR 2024posterarXiv:2404.16670
38
citations
#920

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

Xin Li, Yunfei Wu, Xinghua Jiang et al.

CVPR 2024posterarXiv:2402.19014
37
citations
#921

Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection

Yajing Liu, Shijun Zhou, Xiyao Liu et al.

CVPR 2024highlightarXiv:2405.15225
37
citations
#922

HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models

Li Pang, Xiangyu Rui, Long Cui et al.

CVPR 2024posterarXiv:2402.15865
37
citations
#923

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

Zhao Tianchen, Xuefei Ning, Tongcheng Fang et al.

ECCV 2024posterarXiv:2405.17873
37
citations
#924

6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

Matteo Bortolon, Theodoros Tsesmelis, Stuart James et al.

ECCV 2024posterarXiv:2407.15484
37
citations
#925

MathAttack: Attacking Large Language Models towards Math Solving Ability

Zihao Zhou, Qiufeng Wang, Mingyu Jin et al.

AAAI 2024paperarXiv:2309.01686
37
citations
#926

A Unified and General Framework for Continual Learning

Zhenyi Wang, Yan Li, Li Shen et al.

ICLR 2024posterarXiv:2403.13249
37
citations
#927

SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

Lukas Hoyer, David Tan, Muhammad Ferjad Naeem et al.

ECCV 2024posterarXiv:2311.16241
37
citations
#928

SMooDi: Stylized Motion Diffusion Model

Lei Zhong, Yiming Xie, Varun Jampani et al.

ECCV 2024posterarXiv:2407.12783
37
citations
#929

FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models

Jinglin Xu, Yijie Guo, Yuxin Peng

CVPR 2024highlightarXiv:2405.05216
37
citations
#930

SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models

Feifei Wang, Zhentao Tan, Tianyi Wei et al.

CVPR 2024posterarXiv:2312.07865
37
citations
#931

Deep Variational Incomplete Multi-View Clustering: Exploring Shared Clustering Structures

Gehui Xu, Jie Wen, Chengliang Liu et al.

AAAI 2024paper
37
citations
#932

Disentangled Prompt Representation for Domain Generalization

De Cheng, Zhipeng Xu, XINYANG JIANG et al.

CVPR 2024poster
37
citations
#933

SegPoint: Segment Any Point Cloud via Large Language Model

Shuting He, Henghui Ding, Xudong Jiang et al.

ECCV 2024posterarXiv:2407.13761
37
citations
#934

Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion

Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch et al.

CVPR 2024posterarXiv:2403.13470
37
citations
#935

SlowTrack: Increasing the Latency of Camera-Based Perception in Autonomous Driving Using Adversarial Examples

Chen Ma, Ningfei Wang, Qi Alfred Chen et al.

AAAI 2024paperarXiv:2312.09520
37
citations
#936

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

Zhicai Wang, Longhui Wei, Tan Wang et al.

CVPR 2024posterarXiv:2403.19600
37
citations
#937

Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

JunDa Cheng, Wei Yin, Kaixuan Wang et al.

CVPR 2024posterarXiv:2403.07535
37
citations
#938

ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation

Moayed Haji Ali, Guha Balakrishnan, Vicente Ordonez

CVPR 2024posterarXiv:2311.18822
37
citations
#939

PEM: Prototype-based Efficient MaskFormer for Image Segmentation

Niccolò Cavagnero, Gabriele Rosi, Claudia Cuttano et al.

CVPR 2024posterarXiv:2402.19422
37
citations
#940

Question Aware Vision Transformer for Multimodal Reasoning

Roy Ganz, Yair Kittenplon, Aviad Aberdam et al.

CVPR 2024highlightarXiv:2402.05472
37
citations
#941

Making RL with Preference-based Feedback Efficient via Randomization

Runzhe Wu, Wen Sun

ICLR 2024posterarXiv:2310.14554
37
citations
#942

Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning

Shangchao Su, Mingzhao Yang, Bin Li et al.

AAAI 2024paperarXiv:2211.07864
37
citations
#943

Video Question Answering with Procedural Programs

Rohan Choudhury, Koichiro Niinuma, Kris Kitani et al.

ECCV 2024posterarXiv:2312.00937
37
citations
#944

Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

Jiyao Zhang, Weiyao Huang, Bo Peng et al.

ECCV 2024posterarXiv:2406.04316
37
citations
#945

MEVG : Multi-event Video Generation with Text-to-Video Models

Gyeongrok Oh, Jaehwan Jeong, Sieun Kim et al.

ECCV 2024posterarXiv:2312.04086
37
citations
#946

DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement

Jiuming Liu, Guangming Wang, Weicai Ye et al.

CVPR 2024poster
37
citations
#947

Revisiting Single Image Reflection Removal In the Wild

Yurui Zhu, Bo Li, Xueyang Fu et al.

CVPR 2024posterarXiv:2311.17320
37
citations
#948

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Zhijian Huang, Tao Tang, Shaoxiang Chen et al.

ECCV 2024posterarXiv:2408.13890
37
citations
#949

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Fangfu Liu, Diankun Wu, Yi Wei et al.

CVPR 2024posterarXiv:2312.06655
37
citations
#950

STEM: Unleashing the Power of Embeddings for Multi-Task Recommendation

Liangcai Su, Junwei Pan, Ximei Wang et al.

AAAI 2024paperarXiv:2308.13537
37
citations
#951

DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection

Yuhao Sun, Lingyun Yu, Hongtao Xie et al.

CVPR 2024posterarXiv:2405.09882
36
citations
#952

Towards Efficient Replay in Federated Incremental Learning

Yichen Li, Qunwei Li, Haozhao Wang et al.

CVPR 2024posterarXiv:2403.05890
36
citations
#953

GenZI: Zero-Shot 3D Human-Scene Interaction Generation

Lei Li, Angela Dai

CVPR 2024posterarXiv:2311.17737
36
citations
#954

Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation

Siyu Jiao, hongguang Zhu, Yunchao Wei et al.

ECCV 2024posterarXiv:2408.00744
36
citations
#955

Score-Guided Diffusion for 3D Human Recovery

Anastasis Stathopoulos, Ligong Han, Dimitris N. Metaxas

CVPR 2024posterarXiv:2403.09623
36
citations
#956

DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion

Junjie Guo, Chenqiang Gao, Fangcen liu et al.

ECCV 2024posterarXiv:2403.00326
36
citations
#957

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

XuDong Wang, Ishan Misra, Ziyun Zeng et al.

CVPR 2024posterarXiv:2308.14710
36
citations
#958

Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

Liqi He, Zuchao Li, Xiantao Cai et al.

AAAI 2024paperarXiv:2312.08762
36
citations
#959

Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning

Mohamed Elsayed, A. Rupam Mahmood

ICLR 2024posterarXiv:2404.00781
36
citations
#960

Amodal Completion via Progressive Mixed Context Diffusion

Katherine Xu, Lingzhi Zhang, Jianbo Shi

CVPR 2024highlightarXiv:2312.15540
36
citations
#961

VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding

Guibiao Liao, Jiankun Li, Xiaoqing Ye

AAAI 2024paper
36
citations
#962

DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception

Jiayu Zou, Kun Tian, Zheng Zhu et al.

AAAI 2024paperarXiv:2303.08333
36
citations
#963

Diagnosing and Re-learning for Balanced Multimodal Learning

Yake Wei, Siwei Li, Ruoxuan Feng et al.

ECCV 2024posterarXiv:2407.09705
36
citations
#964

FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning

Rishub Tamirisa, Chulin Xie, Wenxuan Bao et al.

CVPR 2024posterarXiv:2404.02478
36
citations
#965

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

Ziyang Chen, Israel D. Gebru, Christian Richardt et al.

CVPR 2024highlightarXiv:2403.18821
36
citations
#966

How to Configure Good In-Context Sequence for Visual Question Answering

Li Li, Jiawei Peng, huiyi chen et al.

CVPR 2024posterarXiv:2312.01571
36
citations
#967

PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

YUXUAN SUN, Hao Wu, Chenglu Zhu et al.

ECCV 2024posterarXiv:2401.16355
36
citations
#968

OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

Kezhi Kong, Jiani Zhang, Zhengyuan Shen et al.

ICLR 2024posterarXiv:2402.14361
36
citations
#969

Producing and Leveraging Online Map Uncertainty in Trajectory Prediction

Xunjiang Gu, Guanyu Song, Igor Gilitschenski et al.

CVPR 2024posterarXiv:2403.16439
36
citations
#970

DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment

Jiuming Liu, Dong Zhuo, Zhiheng Feng et al.

ECCV 2024posterarXiv:2403.18274
36
citations
#971

Pyramid Diffusion for Fine 3D Large Scene Generation

Yuheng Liu, Xinke Li, Xueting Li et al.

ECCV 2024posterarXiv:2311.12085
36
citations
#972

Mitigating Motion Blur in Neural Radiance Fields with Events and Frames

Marco Cannici, Davide Scaramuzza

CVPR 2024posterarXiv:2403.19780
36
citations
#973

Communication-Efficient Federated Learning with Accelerated Client Gradient

Geeho Kim, Jinkyu Kim, Bohyung Han

CVPR 2024posterarXiv:2201.03172
36
citations
#974

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

Yuchen Hu, CHEN CHEN, Chao-Han Huck Yang et al.

ICLR 2024spotlightarXiv:2401.10446
36
citations
#975

DragVideo: Interactive Drag-style Video Editing

Yufan Deng, Ruida Wang, Yuhao ZHANG et al.

ECCV 2024posterarXiv:2312.02216
36
citations
#976

Distilling Semantic Priors from SAM to Efficient Image Restoration Models

Quan Zhang, Xiaoyu Liu, Wei Li et al.

CVPR 2024posterarXiv:2403.16368
36
citations
#977

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

Jiatong Shi, Hirofumi Inaguma, Xutai Ma et al.

ICLR 2024spotlightarXiv:2310.02720
36
citations
#978

Control4D: Efficient 4D Portrait Editing with Text

Ruizhi Shao, Jingxiang Sun, Cheng Peng et al.

CVPR 2024posterarXiv:2305.20082
36
citations
#979

U-mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting

Xiang Ma, Xuemei Li, Lexin Fang et al.

AAAI 2024paperarXiv:2401.02236
36
citations
#980

Goldfish: Vision-Language Understanding of Arbitrarily Long Videos

Kirolos Ataallah, Xiaoqian Shen, Eslam mohamed abdelrahman et al.

ECCV 2024posterarXiv:2407.12679
36
citations
#981

VicTR: Video-conditioned Text Representations for Activity Recognition

Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.

CVPR 2024posterarXiv:2304.02560
36
citations
#982

MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods

Mara Finkelstein, Markus Freitag

ICLR 2024posterarXiv:2309.10966
36
citations
#983

Vamos: Versatile Action Models for Video Understanding

Shijie Wang, Qi Zhao, Minh Quan et al.

ECCV 2024posterarXiv:2311.13627
36
citations
#984

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

Dazhong Shen, Guanglu Song, Zeyue Xue et al.

CVPR 2024posterarXiv:2404.05384
35
citations
#985

Alchemist: Parametric Control of Material Properties with Diffusion Models

Prafull Sharma, Varun Jampani, Yuanzhen Li et al.

CVPR 2024posterarXiv:2312.02970
35
citations
#986

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi et al.

CVPR 2024posterarXiv:2312.12416
35
citations
#987

Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models

Shuang Li, Jiangjie Chen, Siyu Yuan et al.

AAAI 2024paperarXiv:2308.13961
35
citations
#988

FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis

Ke Fan, Junshu Tang, Weijian Cao et al.

ECCV 2024posterarXiv:2405.15763
35
citations
#989

LION: Implicit Vision Prompt Tuning

Haixin Wang, Jianlong Chang, Yihang Zhai et al.

AAAI 2024paperarXiv:2303.09992
35
citations
#990

6385 Efficient Spiking Neural Networks with Sparse Selective Activation for Continual Learning

Jiangrong Shen, Wenyao Ni, Qi Xu et al.

AAAI 2024paper
35
citations
#991

Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture

Fei Wang, Dan Guo, Kun Li et al.

CVPR 2024posterarXiv:2403.07347
35
citations
#992

Mono3DVG: 3D Visual Grounding in Monocular Images

Yangfan Zhan, Yuan Yuan, Zhitong Xiong

AAAI 2024paperarXiv:2312.08022
35
citations
#993

GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding

Changshuo Wang, Meiqing Wu, Siew-Kei Lam et al.

ECCV 2024posterarXiv:2407.13519
35
citations
#994

Tokenize Anything via Prompting

Ting Pan, Lulu Tang, Xinlong Wang et al.

ECCV 2024posterarXiv:2312.09128
35
citations
#995

Exploiting Label Skews in Federated Learning with Model Concatenation

Yiqun Diao, Qinbin Li, Bingsheng He

AAAI 2024paperarXiv:2312.06290
35
citations
#996

MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation

Hanzhe Hu, Zhizhuo Zhou, Varun Jampani et al.

CVPR 2024posterarXiv:2404.03656
35
citations
#997

UGG: Unified Generative Grasping

Jiaxin Lu, Hao Kang, Haoxiang Li et al.

ECCV 2024posterarXiv:2311.16917
35
citations
#998

RobustSAM: Segment Anything Robustly on Degraded Images

Wei-Ting Chen, Yu Jiet Vong, Sy-Yen Kuo et al.

CVPR 2024highlightarXiv:2406.09627
35
citations
#999

How to Fine-Tune Vision Models with SGD

Ananya Kumar, Ruoqi Shen, Sebastien Bubeck et al.

ICLR 2024posterarXiv:2211.09359
35
citations
#1000

ICP-Flow: LiDAR Scene Flow Estimation with ICP

Yancong Lin, Holger Caesar

CVPR 2024posterarXiv:2402.17351
35
citations