Most Cited 2024 "fourier embedding" Papers

12,324 papers found • Page 4 of 62

#601

FedAS: Bridging Inconsistency in Personalized Federated Learning

Xiyuan Yang, Wenke Huang, Mang Ye

CVPR 2024 • poster
57
citations
#602

Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

Taeyoon Kwon, Kai Ong, Dongjin Kang et al.

AAAI 2024 • paper • arXiv:2312.07399
57
citations
#603

Editing Language Model-Based Knowledge Graph Embeddings

AAAI 2024 • paper • arXiv:2305.14908
57
citations
#604

Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction

Devikalyan Das, Christopher Wewer, Raza Yunus et al.

CVPR 2024 • poster • arXiv:2312.01196
57
citations
#605

MASTER: Market-Guided Stock Transformer for Stock Price Forecasting

Tong Li, Zhaoyang Liu, Yanyan Shen et al.

AAAI 2024 • paper • arXiv:2312.15235
57
citations
#606

GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

Hang Yao, Ming Liu, Zhicun Yin et al.

ECCV 2024 • poster • arXiv:2406.07487
57
citations
#607

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai et al.

ECCV 2024 • poster • arXiv:2404.03507
56
citations
#608

Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

Bencheng Liao, Shaoyu Chen, Bo Jiang et al.

ECCV 2024 • poster • arXiv:2303.08815
56
citations
#609

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion

Daniel Winter, Matan Cohen, Shlomi Fruchter et al.

ECCV 2024 • poster • arXiv:2403.18818
56
citations
#610

SECap: Speech Emotion Captioning with Large Language Model

Yaoxun Xu, Hangting Chen, Jianwei Yu et al.

AAAI 2024 • paper • arXiv:2312.10381
56
citations
#611

BEND: Benchmarking DNA Language Models on Biologically Meaningful Tasks

Frederikke Marin, Felix Teufel, Marc Horlacher et al.

ICLR 2024 • poster • arXiv:2311.12570
56
citations
#612

SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency

Feiyu Zhu, Reid Simmons

AAAI 2024 • paper • arXiv:2303.07033
56
citations
#613

MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images

Xurui Li, Ziming Huang, Feng Xue et al.

ICLR 2024 • poster • arXiv:2401.16753
55
citations
#614

DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency

Wenfang Yao, Kejing Yin, William Cheung et al.

AAAI 2024 • paper • arXiv:2403.06197
55
citations
#615

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Zhiwu Qing, Shiwei Zhang, Jiayu Wang et al.

CVPR 2024 • poster • arXiv:2312.04483
55
citations
#616

Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID

Wentao Tan, Changxing Ding, Jiayu Jiang et al.

CVPR 2024 • poster • arXiv:2405.04940
55
citations
#617

Delving into Multimodal Prompting for Fine-Grained Visual Classification

Xin Jiang, Hao Tang, Junyao Gao et al.

AAAI 2024 • paper • arXiv:2309.08912
55
citations
#618

Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

Ronghui Li, Yuxiang Zhang, Yachao Zhang et al.

CVPR 2024 • poster • arXiv:2403.10518
55
citations
#619

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

Nicolae Ristea, Florinel Croitoru, Radu Tudor Ionescu et al.

CVPR 2024 • poster • arXiv:2306.12041
55
citations
#620

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Yuanwen Yue, Anurag Das, Francis Engelmann et al.

ECCV 2024 • poster • arXiv:2407.20229
55
citations
#621

OmniSat: Self-Supervised Modality Fusion for Earth Observation

Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.

ECCV 2024 • poster • arXiv:2404.08351
55
citations
#622

VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation

Zhen Qu, Xian Tao, Mukesh Prasad et al.

ECCV 2024 • poster • arXiv:2407.12276
55
citations
#623

MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping

Jiacheng Chen, Yuefan Wu, Tan Jiaqi et al.

ECCV 2024 • poster • arXiv:2403.15951
54
citations
#624

Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance

Tomer Garber, Tom Tirer

CVPR 2024 • poster • arXiv:2312.16519
54
citations
#625

TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning

Siheng Xiong, Yuan Yang, Ali Payani et al.

AAAI 2024 • paper • arXiv:2312.15816
54
citations
#626

Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy

Yu Fu, Deyi Xiong, Yue Dong

AAAI 2024 • paper • arXiv:2307.13808
54
citations
#627

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Yue Han, Junwei Zhu, Keke He et al.

ECCV 2024 • poster • arXiv:2405.12970
54
citations
#628

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

Haoyu Lu, Yuqi Huo, Guoxing Yang et al.

ICLR 2024 • poster • arXiv:2302.06605
54
citations
#629

Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching

Shitong Shao, Zeyuan Yin, Muxin Zhou et al.

CVPR 2024 • highlight • arXiv:2311.17950
54
citations
#630

VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting

Seunggu Kang, WonJun Moon, Euiyeon Kim et al.

AAAI 2024 • paper • arXiv:2312.16580
54
citations
#631

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

Xinhua Cheng, Tianyu Yang, Jianan Wang et al.

ICLR 2024 • poster • arXiv:2310.11784
54
citations
#632

MemFlow: Optical Flow Estimation and Prediction with Memory

Qiaole Dong, Yanwei Fu

CVPR 2024 • poster • arXiv:2404.04808
54
citations
#633

Text2Loc: 3D Point Cloud Localization from Natural Language

Yan Xia, Letian Shi, Zifeng Ding et al.

CVPR 2024 • poster • arXiv:2311.15977
54
citations
#634

FlashTex: Fast Relightable Mesh Texturing with LightControlNet

Kangle Deng, Timothy Omernick, Alexander B Weiss et al.

ECCV 2024 • poster • arXiv:2402.13251
54
citations
#635

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

Xi Chen, Sida Peng, Dongchen Yang et al.

ECCV 2024 • poster • arXiv:2404.11593
54
citations
#636

Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution

Guangyuan Li, Chen Rao, Juncheng Mo et al.

CVPR 2024 • poster • arXiv:2404.04785
54
citations
#637

Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction

Inhwan Bae, Junoh Lee, Hae-Gon Jeon

CVPR 2024 • poster • arXiv:2403.18447
54
citations
#638

A Comparative Study of Image Restoration Networks for General Backbone Network Design

Xiangyu Chen, Zheyuan Li, Yuandong Pu et al.

ECCV 2024 • poster • arXiv:2310.11881
53
citations
#639

GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

Jing Wen, Xiaoming Zhao, Jason Ren et al.

CVPR 2024 • poster • arXiv:2404.07991
53
citations
#640

GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking

Shu Yin, Peican Zhu, Lianwei Wu et al.

AAAI 2024 • paper • arXiv:2312.05739
53
citations
#641

Latent Guard: a Safety Framework for Text-to-image Generation

Runtao Liu, Ashkan Khakzar, Jindong Gu et al.

ECCV 2024 • poster • arXiv:2404.08031
53
citations
#642

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.

CVPR 2024 • poster • arXiv:2312.15770
53
citations
#643

FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion

George Cazenavette, Avneesh Sud, Thomas Leung et al.

CVPR 2024 • poster • arXiv:2406.08603
53
citations
#644

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Zeyu Liu, Weicong Liang, Zhanhao Liang et al.

ECCV 2024 • poster • arXiv:2403.09622
53
citations
#645

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Peng Lu, Tao Jiang, Yining Li et al.

CVPR 2024 • poster • arXiv:2312.07526
53
citations
#646

In-Context Learning Learns Label Relationships but Is Not Conventional Learning

Jannik Kossen, Yarin Gal, Tom Rainforth

ICLR 2024 • poster • arXiv:2307.12375
53
citations
#647

Text-Image Alignment for Diffusion-Based Perception

Neehar Kondapaneni, Markus Marks, Manuel Knott et al.

CVPR 2024 • poster • arXiv:2310.00031
53
citations
#648

LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction

Yucheng Li, Frank Guerin, Chenghua Lin

AAAI 2024 • paper • arXiv:2312.12343
53
citations
#649

SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

Yamei Chen, Yan Di, Guangyao Zhai et al.

CVPR 2024 • poster • arXiv:2311.11125
53
citations
#650

VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

Junlin Han, Filippos Kokkinos, Philip Torr

ECCV 2024 • poster • arXiv:2403.12034
52
citations
#651

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

Kiana Ehsani, Tanmay Gupta, Rose Hendrix et al.

CVPR 2024 • poster • arXiv:2312.02976
52
citations
#652

Visual In-Context Prompting

Feng Li, Qing Jiang, Hao Zhang et al.

CVPR 2024 • poster • arXiv:2311.13601
52
citations
#653

A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

Xiaofeng Cong, Jie Gui, Jing Zhang et al.

CVPR 2024 • poster • arXiv:2403.18548
52
citations
#654

SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

Dong Wu, Mingmin Chi, Xuan Zang et al.

AAAI 2024 • paper • arXiv:2309.00526
52
citations
#655

LaRa: Efficient Large-Baseline Radiance Fields

Anpei Chen, Haofei Xu, Stefano Esposito et al.

ECCV 2024 • poster • arXiv:2407.04699
52
citations
#656

HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution

Xiang Zhang, Yulun Zhang, Fisher Yu

ECCV 2024 • poster • arXiv:2407.05878
52
citations
#657

FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

Shivangi Aneja, Justus Thies, Angela Dai et al.

CVPR 2024 • poster • arXiv:2312.08459
52
citations
#658

AvatarGPT: All-in-One Framework for Motion Understanding Planning Generation and Beyond

Zixiang Zhou, Yu Wan, Baoyuan Wang

CVPR 2024 • poster
52
citations
#659

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes

Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.

ECCV 2024 • poster • arXiv:2305.16037
52
citations
#660

GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

Yanyan Li, Chenyu Lyu, Yan Di et al.

ECCV 2024 • poster • arXiv:2403.11324
52
citations
#661

Intriguing Properties of Generative Classifiers

Priyank Jaini, Kevin Clark, Robert Geirhos

ICLR 2024 • spotlight • arXiv:2309.16779
51
citations
#662

MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

Zhangyang Xiong, Chenghong Li, Kenkun Liu et al.

CVPR 2024 • poster • arXiv:2312.02963
51
citations
#663

Graph Neural Networks for Learning Equivariant Representations of Neural Networks

Miltiadis (Miltos) Kofinas, Boris Knyazev, Yan Zhang et al.

ICLR 2024 • poster • arXiv:2403.12143
51
citations
#664

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

Shuming Liu, Chenlin Zhang, Chen Zhao et al.

CVPR 2024 • poster • arXiv:2311.17241
51
citations
#665

Accelerating Diffusion Sampling with Optimized Time Steps

Shuchen Xue, Zhaoqiang Liu, Fei Chen et al.

CVPR 2024 • poster • arXiv:2402.17376
51
citations
#666

Describing Differences in Image Sets with Natural Language

Lisa Dunlap, Yuhui Zhang, Xiaohan Wang et al.

CVPR 2024 • poster • arXiv:2312.02974
51
citations
#667

GVGEN: Text-to-3D Generation with Volumetric Representation

Xianglong He, Junyi Chen, Sida Peng et al.

ECCV 2024 • poster • arXiv:2403.12957
51
citations
#668

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Linjiang Huang, Rongyao Fang, Aiping Zhang et al.

ECCV 2024 • poster • arXiv:2403.12963
51
citations
#669

ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions

Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik et al.

ECCV 2024 • poster • arXiv:2311.17057
51
citations
#670

Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification

Kaijie Ren, Lei Zhang

CVPR 2024 • poster • arXiv:2403.11708
51
citations
#671

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark

Fangjun Li, David C. Hogg, Anthony G. Cohn

AAAI 2024 • paper • arXiv:2401.03991
51
citations
#672

PointOBB: Learning Oriented Object Detection via Single Point Supervision

Junwei Luo, Xue Yang, Yi Yu et al.

CVPR 2024 • poster • arXiv:2311.14757
51
citations
#673

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

Yining Hong, Zishuo Zheng, Peihao Chen et al.

CVPR 2024 • poster • arXiv:2401.08577
51
citations
#674

DiffusionLight: Light Probes for Free by Painting a Chrome Ball

Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet et al.

CVPR 2024 • poster • arXiv:2312.09168
51
citations
#675

CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing

Ajian Liu, Shuai Xue, Gan Jianwen et al.

CVPR 2024 • highlight • arXiv:2403.14333
51
citations
#676

Bilateral Propagation Network for Depth Completion

Jie Tang, Fei-Peng Tian, Boshi An et al.

CVPR 2024 • poster • arXiv:2403.11270
51
citations
#677

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Zike Wu, Pan Zhou, Yi Xuanyu et al.

CVPR 2024 • poster • arXiv:2401.09050
51
citations
#678

Enhancing Multimodal Cooperation via Sample-level Modality Valuation

Yake Wei, Ruoxuan Feng, Zihe Wang et al.

CVPR 2024 • poster • arXiv:2309.06255
51
citations
#679

HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting

Helisa Dhamo, Yinyu Nie, Arthur Moreau et al.

ECCV 2024 • poster • arXiv:2312.02902
51
citations
#680

Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning

Chongyu Fan, Jiancheng Liu, Alfred Hero et al.

ECCV 2024 • poster • arXiv:2403.07362
50
citations
#681

Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network

Wenqiao Li, Xiaohao Xu, Yao Gu et al.

CVPR 2024 • poster • arXiv:2311.14897
50
citations
#682

Few-Shot Object Detection with Foundation Models

Guangxing Han, Ser-Nam Lim

CVPR 2024 • poster
50
citations
#683

Discovering and Mitigating Visual Biases through Keyword Explanation

Younghyun Kim, Sangwoo Mo, Minkyu Kim et al.

CVPR 2024 • highlight • arXiv:2301.11104
50
citations
#684

Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training

David Wan, Jaemin Cho, Elias Stengel-Eskin et al.

ECCV 2024 • poster • arXiv:2403.02325
50
citations
#685

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Lanqing Guo, Yingqing He, Haoxin Chen et al.

ECCV 2024 • poster • arXiv:2402.10491
50
citations
#686

CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Qilang Ye, Zitong Yu, Rui Shao et al.

ECCV 2024 • poster • arXiv:2403.04640
50
citations
#687

Jack of All Tasks Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model

Shraman Pramanick, Guangxing Han, Rui Hou et al.

CVPR 2024 • highlight • arXiv:2312.12423
50
citations
#688

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations

Yufeng Huang, Jiji Tang, Zhuo Chen et al.

AAAI 2024 • paper • arXiv:2305.06152
49
citations
#689

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Pingping Zhang, Yuhao Wang, Yang Liu et al.

CVPR 2024 • poster • arXiv:2403.10254
49
citations
#690

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

Chen Zhang, L. F. D’Haro, Yiming Chen et al.

AAAI 2024 • paper • arXiv:2312.15407
49
citations
#691

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Shicheng Li, Lei Li, Yi Liu et al.

ECCV 2024 • poster • arXiv:2311.17404
49
citations
#692

On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?

Maxime Zanella, Ismail Ben Ayed

CVPR 2024 • poster • arXiv:2405.02266
49
citations
#693

Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

Yijia Weng, Bowen Wen, Jonathan Tremblay et al.

CVPR 2024 • poster • arXiv:2404.01440
49
citations
#694

Matching Anything by Segmenting Anything

Siyuan Li, Lei Ke, Martin Danelljan et al.

CVPR 2024 • highlight • arXiv:2406.04221
49
citations
#695

ReMamber: Referring Image Segmentation with Mamba Twister

Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.

ECCV 2024 • poster • arXiv:2403.17839
49
citations
#696

TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

Aditya Aravind Chinchure, Pushkar Shukla, Gaurav Bhatt et al.

ECCV 2024 • poster • arXiv:2312.01261
49
citations
#697

Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

Zihao Liu, Xiaoyu Zhang, Guangwei Liu et al.

ECCV 2024 • poster • arXiv:2402.17430
49
citations
#698

LCM-Lookahead for Encoder-based Text-to-Image Personalization

Rinon Gal, Or Lichter, Elad Richardson et al.

ECCV 2024 • poster • arXiv:2404.03620
49
citations
#699

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.

CVPR 2024 • highlight • arXiv:2311.17435
49
citations
#700

SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments

Shibo Zhao, Yuanjun Gao, Tianhao Wu et al.

CVPR 2024 • poster • arXiv:2307.07607
49
citations
#701

AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ

Jonas Belouadi, Anne Lauscher, Steffen Eger

ICLR 2024 • poster • arXiv:2310.00367
49
citations
#702

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

Zexiang Liu, Yangguang Li, Youtian Lin et al.

ECCV 2024 • poster • arXiv:2312.08754
49
citations
#703

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

Shuai Yang, Yifan Zhou, Ziwei Liu et al.

CVPR 2024 • poster • arXiv:2403.12962
49
citations
#704

From Zero to Turbulence: Generative Modeling for 3D Flow Simulation

Marten Lienen, David Lüdke, Jan Hansen-Palmus et al.

ICLR 2024 • poster • arXiv:2306.01776
49
citations
#705

Soft Contrastive Learning for Time Series

Seunghan Lee, Taeyoung Park, Kibok Lee

ICLR 2024 • oral • arXiv:2312.16424
48
citations
#706

GPAvatar: Generalizable and Precise Head Avatar from Image(s)

Xuangeng Chu, Yu Li, Ailing Zeng et al.

ICLR 2024 • poster • arXiv:2401.10215
48
citations
#707

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Jitesh Jain, Jianwei Yang, Humphrey Shi

CVPR 2024 • poster • arXiv:2312.14233
48
citations
#708

DAP: A Dynamic Adversarial Patch for Evading Person Detectors

Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.

CVPR 2024 • poster • arXiv:2305.11618
48
citations
#709

Local Search GFlowNets

Minsu Kim, Yun Taeyoung, Emmanuel Bengio et al.

ICLR 2024 • spotlight • arXiv:2310.02710
48
citations
#710

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Yuru Jia, Lukas Hoyer, Shengyu Huang et al.

ECCV 2024 • poster • arXiv:2312.03048
48
citations
#711

Neural Markov Random Field for Stereo Matching

Tongfan Guan, Chen Wang, Yun-Hui Liu

CVPR 2024 • poster • arXiv:2403.11193
48
citations
#712

CoMo: Controllable Motion Generation through Language Guided Pose Code Editing

Yiming Huang, Weilin Wan, Yue Yang et al.

ECCV 2024 • poster • arXiv:2403.13900
48
citations
#713

MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation

Mi Yan, Jiazhao Zhang, Yan Zhu et al.

CVPR 2024 • poster • arXiv:2401.07745
48
citations
#714

Frozen Transformers in Language Models Are Effective Visual Encoder Layers

Ziqi Pang, Ziyang Xie, Yunze Man et al.

ICLR 2024 • oral • arXiv:2310.12973
48
citations
#715

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le et al.

CVPR 2024 • poster • arXiv:2312.00825
48
citations
#716

Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

Yuqian Fu, Yu Wang, Yixuan Pan et al.

ECCV 2024 • poster • arXiv:2402.03094
48
citations
#717

Language-driven All-in-one Adverse Weather Removal

Hao Yang, Liyuan Pan, Yan Yang et al.

CVPR 2024 • poster • arXiv:2312.01381
48
citations
#718

Feature Fusion from Head to Tail for Long-Tailed Visual Recognition

Mengke Li, Zhikai Hu, Yang Lu et al.

AAAI 2024 • paper • arXiv:2306.06963
48
citations
#719

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Yixuan Wu, Yizhou Wang, Shixiang Tang et al.

ECCV 2024 • poster • arXiv:2403.12488
47
citations
#720

One-Prompt to Segment All Medical Images

Wu, Min Xu

CVPR 2024 • poster • arXiv:2305.10300
47
citations
#721

Mosaic-SDF for 3D Generative Models

Lior Yariv, Omri Puny, Oran Gafni et al.

CVPR 2024 • poster • arXiv:2312.09222
47
citations
#722

What does the Knowledge Neuron Thesis Have to do with Knowledge?

Jingcheng Niu, Andrew Liu, Zining Zhu et al.

ICLR 2024 • spotlight • arXiv:2405.02421
47
citations
#723

Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

Yu Deng, Duomin Wang, Baoyuan Wang

ECCV 2024 • poster • arXiv:2403.13570
47
citations
#724

Simplifying Transformer Blocks

Bobby He, Thomas Hofmann

ICLR 2024 • poster • arXiv:2311.01906
47
citations
#725

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

Hongcheng Guo, Jian Yang, Jiaheng Liu et al.

AAAI 2024 • paper • arXiv:2401.04749
47
citations
#726

SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

Junsu Kim, Hoseong Cho, Jihyeon Kim et al.

CVPR 2024 • highlight • arXiv:2402.17323
47
citations
#727

Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

Yuhao Liu, Zhanghan Ke, Fang Liu et al.

CVPR 2024 • poster • arXiv:2403.00644
47
citations
#728

MatFuse: Controllable Material Generation with Diffusion Models

Giuseppe Vecchio, Renato Sortino, Simone Palazzo et al.

CVPR 2024 • poster • arXiv:2308.11408
47
citations
#729

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

Yixuan Ren, Yang Zhou, Jimei Yang et al.

ECCV 2024 • poster • arXiv:2402.14780
47
citations
#730

JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

Yu Zeng, Vishal M. Patel, Haochen Wang et al.

CVPR 2024 • poster • arXiv:2407.06187
47
citations
#731

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Yiming Zhao, Zhouhui Lian

ECCV 2024 • poster • arXiv:2312.04884
47
citations
#732

UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction

Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud et al.

ECCV 2024 • poster • arXiv:2403.15098
47
citations
#733

EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

Junjue Wang, Zhuo Zheng, Zihang Chen et al.

AAAI 2024 • paper • arXiv:2312.12222
47
citations
#734

Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

Clément Bonnet, Daniel Luo, Donal Byrne et al.

ICLR 2024 • poster • arXiv:2306.09884
47
citations
#735

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

Han Liang, Jiacheng Bao, Ruichi Zhang et al.

CVPR 2024 • poster • arXiv:2312.08985
47
citations
#736

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Yang Zheng, Qingqing Zhao, Guandao Yang et al.

ECCV 2024 • poster • arXiv:2404.04421
46
citations
#737

Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing

Jaroslaw Blasiok, Preetum Nakkiran

ICLR 2024 • poster
46
citations
#738

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

Peng Xu, Wenqi Shao, Mengzhao Chen et al.

ICLR 2024 • poster • arXiv:2402.16880
46
citations
#739

Grounded Question-Answering in Long Egocentric Videos

Shangzhe Di, Weidi Xie

CVPR 2024 • poster • arXiv:2312.06505
46
citations
#740

Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling

Jiarui Lu, Bozitao Zhong, Zuobai Zhang et al.

ICLR 2024 • poster • arXiv:2306.03117
46
citations
#741

Group Preference Optimization: Few-Shot Alignment of Large Language Models

Siyan Zhao, John Dang, Aditya Grover

ICLR 2024 • poster • arXiv:2310.11523
46
citations
#742

When Fast Fourier Transform Meets Transformer for Image Restoration

Xingyu Jiang, Xiuhui Zhang, Ning Gao et al.

ECCV 2024 • poster
46
citations
#743

Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention

Jie Ren, Yaxin Li, Shenglai Zeng et al.

ECCV 2024 • poster • arXiv:2403.11052
46
citations
#744

ODEFormer: Symbolic Regression of Dynamical Systems with Transformers

Stéphane d'Ascoli, Sören Becker, Philippe Schwaller et al.

ICLR 2024 • spotlight • arXiv:2310.05573
46
citations
#745

S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention

Chiyu Zhang, Xiaogang Xu, Lei Wang et al.

AAAI 2024 • paper • arXiv:2210.12381
46
citations
#746

JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention

Yuandong Tian, Yiping Wang, Zhenyu Zhang et al.

ICLR 2024 • poster • arXiv:2310.00535
46
citations
#747

PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Bhat, Peter Wonka

CVPR 2024 • poster • arXiv:2312.02284
46
citations
#748

GAIA: Zero-shot Talking Avatar Generation

Tianyu He, Junliang Guo, Runyi Yu et al.

ICLR 2024 • poster • arXiv:2311.15230
46
citations
#749

Digital Life Project: Autonomous 3D Characters with Social Intelligence

Zhongang Cai, Jianping Jiang, Zhongfei Qing et al.

CVPR 2024 • poster • arXiv:2312.04547
46
citations
#750

Generating Human Motion in 3D Scenes from Text Descriptions

Zhi Cen, Huaijin Pi, Sida Peng et al.

CVPR 2024 • poster • arXiv:2405.07784
46
citations
#751

SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

Zhiqian Lan, Yuxuan Jiang, Yao Mu et al.

ICLR 2024 • oral • arXiv:2309.15289
45
citations
#752

Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification

Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.

AAAI 2024 • paper
45
citations
#753

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

Qiuhong Shen, Xingyi Yang, Xinchao Wang

ECCV 2024 • poster • arXiv:2409.08270
45
citations
#754

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Jianhao Yuan, Jie Zhang, Shuyang Sun et al.

ICLR 2024 • poster • arXiv:2310.10402
45
citations
#755

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Blake Bordelon, Lorenzo Noci, Mufan Li et al.

ICLR 2024 • poster • arXiv:2309.16620
45
citations
#756

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

Zhi Gao, Yuntao Du, Xintong Zhang et al.

CVPR 2024 • poster • arXiv:2312.10908
45
citations
#757

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Lewei Yao, Renjie Pi, Jianhua Han et al.

CVPR 2024 • poster • arXiv:2404.09216
45
citations
#758

TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation

Yuhao Wang, Xuehu Liu, Pingping Zhang et al.

AAAI 2024 • paper • arXiv:2312.09612
45
citations
#759

Point Segment and Count: A Generalized Framework for Object Counting

Zhizhong Huang, Mingliang Dai, Yi Zhang et al.

CVPR 2024 • poster • arXiv:2311.12386
45
citations
#760

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Yongwei Chen, Tengfei Wang, Tong Wu et al.

ECCV 2024 • poster • arXiv:2403.12409
45
citations
#761

SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction

Conghao Wong, Beihao Xia, Ziqian Zou et al.

CVPR 2024 • poster • arXiv:2310.05370
45
citations
#762

Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping

Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.

CVPR 2024 • poster • arXiv:2312.04521
45
citations
#763

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Yucheng Suo, Fan Ma, Linchao Zhu et al.

CVPR 2024 • poster • arXiv:2403.16005
45
citations
#764

Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

Siyao Li, Tianpei Gu, Zhitao Yang et al.

ICLR 2024 • poster • arXiv:2403.18811
45
citations
#765

Improving Image Restoration through Removing Degradations in Textual Representations

Jingbo Lin, Zhilu Zhang, Yuxiang Wei et al.

CVPR 2024 • poster • arXiv:2312.17334
45
citations
#766

Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners

Keon Hee Park, Kyungwoo Song, Gyeong-Moon Park

CVPR 2024 • poster • arXiv:2404.02117
45
citations
#767

Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

Rui Song, Chenwei Liang, Hu Cao et al.

CVPR 2024 • poster • arXiv:2402.07635
45
citations
#768

RangeLDM: Fast Realistic LiDAR Point Cloud Generation

Qianjiang Hu, Zhimin Zhang, Wei Hu

ECCV 2024 • poster • arXiv:2403.10094
44
citations
#769

Fine-Grained Prototypes Distillation for Few-Shot Object Detection

Zichen Wang, Bo Yang, Haonan Yue et al.

AAAI 2024 • paper • arXiv:2401.07629
44
citations
#770

LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry

Weirong Chen, Le Chen, Rui Wang et al.

CVPR 2024 • poster • arXiv:2401.01887
44
citations
#771

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Rizhao Cai, Zirui Song, Dayan Guan et al.

ECCV 2024 • poster • arXiv:2312.02896
44
citations
#772

DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model

Li Xiaofan, Zhang Yifu, Xiaoqing Ye

ECCV 2024 • poster
44
citations
#773

Bridging Remote Sensors with Multisensor Geospatial Foundation Models

Boran Han, Shuai Zhang, Xingjian Shi et al.

CVPR 2024 • poster • arXiv:2404.01260
44
citations
#774

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models

Andrea Caraffa, Davide Boscaini, Amir Hamza et al.

ECCV 2024 • poster • arXiv:2312.00947
44
citations
#775

S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data

Xuyang Li, Danfeng Hong, Jocelyn Chanussot

CVPR 2024 • poster
44
citations
#776

DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models

Namhyuk Ahn, Junsoo Lee, Chunggi Lee et al.

AAAI 2024 • paper • arXiv:2309.06933
44
citations
#777

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Tong Shao, Zhuotao Tian, Hang Zhao et al.

ECCV 2024 • poster • arXiv:2407.08268
44
citations
#778

Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features

Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.

CVPR 2024 • poster • arXiv:2403.07592
44
citations
#779

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

Jianjian Cao, Peng Ye, Shengze Li et al.

CVPR 2024 • poster • arXiv:2403.02991
44
citations
#780

Towards Surveillance Video-and-Language Understanding: New Dataset Baselines and Challenges

Tongtong Yuan, Xuange Zhang, Kun Liu et al.

CVPR 2024 • poster
44
citations
#781

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

Kai Chen, Chunwei Wang, Kuo Yang et al.

ICLR 2024 • poster • arXiv:2310.10477
44
citations
#782

NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving

William Ljungbergh, Adam Tonderski, Joakim Johnander et al.

ECCV 2024 • poster • arXiv:2404.07762
44
citations
#783

LightIt: Illumination Modeling and Control for Diffusion Models

Peter Kocsis, Kalyan Sunkavalli, Julien Philip et al.

CVPR 2024 • poster • arXiv:2403.10615
44
citations
#784

Diffusion Reward: Learning Rewards via Conditional Video Diffusion

Tao Huang, Guangqi Jiang, Yanjie Ze et al.

ECCV 2024 • poster • arXiv:2312.14134
43
citations
#785

4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations

Wenbo Wang, Hsuan-I Ho, Chen Guo et al.

CVPR 2024 • highlight • arXiv:2404.18630
43
citations
#786

Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)

Qifeng Li, Xiaosong Jia, Shaobo Wang et al.

ECCV 2024 • poster
43
citations
#787

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision

Yi Yu, Xue Yang, Qingyun Li et al.

CVPR 2024 • poster • arXiv:2311.14758
43
citations
#788

Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity

Yuhang Chen, Wenke Huang, Mang Ye

CVPR 2024 • poster • arXiv:2405.16585
43
citations
#789

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

Zhipeng Du, Miaojing Shi, Jiankang Deng

CVPR 2024 • poster • arXiv:2312.01220
43
citations
#790

Debiasing Multimodal Sarcasm Detection with Contrastive Learning

Mengzhao Jia, Can Xie, Liqiang Jing

AAAI 2024 • paper • arXiv:2312.10493
43
citations
#791

Improved Probabilistic Image-Text Representations

Sanghyuk Chun

ICLR 2024 • poster • arXiv:2305.18171
43
citations
#792

DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting

Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.

CVPR 2024 • poster • arXiv:2404.16622
43
citations
#793

LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

Yulin Luo, Ruichuan An, Bocheng Zou et al.

ECCV 2024 • poster • arXiv:2405.02363
43
citations
#794

Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style

Shuai Tan, Bin Ji, Ye Pan

AAAI 2024 • paper • arXiv:2403.06365
43
citations
#795

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

Xinyi He, Mengyu Zhou, Xinrun Xu et al.

AAAI 2024 • paper • arXiv:2312.13671
43
citations
#796

AffineQuant: Affine Transformation Quantization for Large Language Models

Yuexiao Ma, Huixia Li, Xiawu Zheng et al.

ICLR 2024 • poster • arXiv:2403.12544
43
citations
#797

LLM-Assisted Code Cleaning For Training Accurate Code Generators

Naman Jain, Tianjun Zhang, Wei-Lin Chiang et al.

ICLR 2024 • poster • arXiv:2311.14904
43
citations
#798

Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text

Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.

CVPR 2024 • poster • arXiv:2312.02702
43
citations
#799

Learning Transferable Negative Prompts for Out-of-Distribution Detection

Tianqi Li, Guansong Pang, Wenjun Miao et al.

CVPR 2024 • poster • arXiv:2404.03248
43
citations
#800

Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring

Xin Gao, Tianheng Qiu, Xinyu Zhang et al.

CVPR 2024 • poster • arXiv:2401.00027
43
citations