Most Cited 2024 "molecular assembly" Papers

12,324 papers found • Page 4 of 62

#601

DocFormerv2: Local Features for Document Understanding

Srikar Appalaraju, Peng Tang, Qi Dong et al.

AAAI 2024paperarXiv:2306.01733
58
citations
#602

Correlation Matching Transformation Transformers for UHD Image Restoration

Cong Wang, Jinshan Pan, Wei Wang et al.

AAAI 2024paperarXiv:2406.00629
58
citations
#603

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Shilin Yan, Renrui Zhang, Ziyu Guo et al.

AAAI 2024paperarXiv:2305.16318
58
citations
#604

Seamless Human Motion Composition with Blended Positional Encodings

German Barquero, Sergio Escalera, Cristina Palmero

CVPR 2024posterarXiv:2402.15509
58
citations
#605

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

Yi-Chia Chen, WeiHua Li, Cheng Sun et al.

ECCV 2024posterarXiv:2409.10542
58
citations
#606

Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

Junuk Cha, Jihyeon Kim, Jae Shin Yoon et al.

CVPR 2024posterarXiv:2404.00562
58
citations
#607

Magnushammer: A Transformer-Based Approach to Premise Selection

Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak et al.

ICLR 2024posterarXiv:2303.04488
57
citations
#608

GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

hang yao, Ming LIU, Zhicun Yin et al.

ECCV 2024posterarXiv:2406.07487
57
citations
#609

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion

Daniel Winter, Matan Cohen, Shlomi Fruchter et al.

ECCV 2024posterarXiv:2403.18818
57
citations
#610

Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

Taeyoon Kwon, Kai Ong, Dongjin Kang et al.

AAAI 2024paperarXiv:2312.07399
57
citations
#611

PC-Conv: Unifying Homophily and Heterophily with Two-Fold Filtering

Bingheng Li, Erlin Pan, Zhao Kang

AAAI 2024paperarXiv:2312.14438
57
citations
#612

FedAS: Bridging Inconsistency in Personalized Federated Learning

Xiyuan Yang, Wenke Huang, Mang Ye

CVPR 2024poster
57
citations
#613

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Yanzuo Lu, Manlin Zhang, Jinhua Ma et al.

CVPR 2024highlightarXiv:2402.18078
57
citations
#614

MASTER: Market-Guided Stock Transformer for Stock Price Forecasting

Tong Li, Zhaoyang Liu, Yanyan Shen et al.

AAAI 2024paperarXiv:2312.15235
57
citations
#615

Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction

Devikalyan Das, Christopher Wewer, Raza Yunus et al.

CVPR 2024posterarXiv:2312.01196
57
citations
#616

Editing Language Model

Based Knowledge Graph Embeddings

AAAI 2024paperarXiv:2305.14908
57
citations
#617

Language Model Inversion

John X. Morris, Wenting Zhao, Justin Chiu et al.

ICLR 2024posterarXiv:2311.13647
57
citations
#618

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao et al.

ECCV 2024posterarXiv:2310.17796
57
citations
#619

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

AAAI 2024paperarXiv:2309.16137
57
citations
#620

SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency

8137 Feiyu Zhu, Reid Simmons

AAAI 2024paperarXiv:2303.07033
56
citations
#621

DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency

Wenfang Yao, Kejing Yin, William Cheung et al.

AAAI 2024paperarXiv:2403.06197
56
citations
#622

Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

Bencheng Liao, Shaoyu Chen, Bo Jiang et al.

ECCV 2024posterarXiv:2303.08815
56
citations
#623

Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models

Yuqi Zhu, Jia Li, Ge Li et al.

AAAI 2024paperarXiv:2309.02772
56
citations
#624

VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation

Zhen Qu, Xian Tao, Mukesh Prasad et al.

ECCV 2024posterarXiv:2407.12276
56
citations
#625

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai et al.

ECCV 2024posterarXiv:2404.03507
56
citations
#626

BEND: Benchmarking DNA Language Models on Biologically Meaningful Tasks

Frederikke Marin, Felix Teufel, Marc Horlacher et al.

ICLR 2024posterarXiv:2311.12570
56
citations
#627

SECap: Speech Emotion Captioning with Large Language Model

Yaoxun Xu, Hangting Chen, Jianwei Yu et al.

AAAI 2024paperarXiv:2312.10381
56
citations
#628

MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images

Xurui Li, Ziming Huang, Feng Xue et al.

ICLR 2024posterarXiv:2401.16753
55
citations
#629

TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning

Siheng Xiong, Yuan Yang, Ali Payani et al.

AAAI 2024paperarXiv:2312.15816
55
citations
#630

Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy

Yu Fu, Deyi Xiong, Yue Dong

AAAI 2024paperarXiv:2307.13808
55
citations
#631

Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID

Wentao Tan, Changxing Ding, Jiayu Jiang et al.

CVPR 2024posterarXiv:2405.04940
55
citations
#632

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Yuanwen Yue, Anurag Das, Francis Engelmann et al.

ECCV 2024posterarXiv:2407.20229
55
citations
#633

Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

Ronghui Li, Yuxiang Zhang, Yachao Zhang et al.

CVPR 2024posterarXiv:2403.10518
55
citations
#634

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Zhiwu Qing, Shiwei Zhang, Jiayu Wang et al.

CVPR 2024posterarXiv:2312.04483
55
citations
#635

OmniSat: Self-Supervised Modality Fusion for Earth Observation

Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.

ECCV 2024posterarXiv:2404.08351
55
citations
#636

Delving into Multimodal Prompting for Fine-Grained Visual Classification

Xin Jiang, Hao Tang, Junyao Gao et al.

AAAI 2024paperarXiv:2309.08912
55
citations
#637

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

Nicolae Ristea, Florinel Croitoru, Radu Tudor Ionescu et al.

CVPR 2024posterarXiv:2306.12041
55
citations
#638

FlashTex: Fast Relightable Mesh Texturing with LightControlNet

Kangle Deng, Timothy Omernick, Alexander B Weiss et al.

ECCV 2024posterarXiv:2402.13251
54
citations
#639

Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance

Tomer Garber, Tom Tirer

CVPR 2024posterarXiv:2312.16519
54
citations
#640

MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping

Jiacheng Chen, Yuefan Wu, Tan Jiaqi et al.

ECCV 2024posterarXiv:2403.15951
54
citations
#641

Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution

Guangyuan Li, Chen Rao, Juncheng Mo et al.

CVPR 2024posterarXiv:2404.04785
54
citations
#642

MemFlow: Optical Flow Estimation and Prediction with Memory

Qiaole Dong, Yanwei Fu

CVPR 2024posterarXiv:2404.04808
54
citations
#643

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

Haoyu Lu, Yuqi Huo, Guoxing Yang et al.

ICLR 2024posterarXiv:2302.06605
54
citations
#644

SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians

Hiba Dahmani, Moussab Bennehar, Nathan Piasco et al.

ECCV 2024posterarXiv:2403.10427
54
citations
#645

VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting

Seunggu Kang, WonJun Moon, Euiyeon Kim et al.

AAAI 2024paperarXiv:2312.16580
54
citations
#646

SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

Yamei Chen, Yan Di, Guangyao Zhai et al.

CVPR 2024posterarXiv:2311.11125
54
citations
#647

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

Xinhua Cheng, Tianyu Yang, Jianan Wang et al.

ICLR 2024posterarXiv:2310.11784
54
citations
#648

Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching

Shitong Shao, Zeyuan Yin, Muxin Zhou et al.

CVPR 2024highlightarXiv:2311.17950
54
citations
#649

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Peng Lu, Tao Jiang, Yining Li et al.

CVPR 2024posterarXiv:2312.07526
54
citations
#650

Text2Loc: 3D Point Cloud Localization from Natural Language

Yan Xia, Letian Shi, Zifeng Ding et al.

CVPR 2024posterarXiv:2311.15977
54
citations
#651

Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction

Inhwan Bae, Junoh Lee, Hae-Gon Jeon

CVPR 2024posterarXiv:2403.18447
54
citations
#652

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Yue Han, Junwei Zhu, Keke He et al.

ECCV 2024posterarXiv:2405.12970
54
citations
#653

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

Xi Chen, Sida Peng, Dongchen Yang et al.

ECCV 2024posterarXiv:2404.11593
54
citations
#654

GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

Jing Wen, Xiaoming Zhao, Jason Ren et al.

CVPR 2024posterarXiv:2404.07991
53
citations
#655

In-Context Learning Learns Label Relationships but Is Not Conventional Learning

Jannik Kossen, Yarin Gal, Tom Rainforth

ICLR 2024posterarXiv:2307.12375
53
citations
#656

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.

CVPR 2024posterarXiv:2312.15770
53
citations
#657

A Comparative Study of Image Restoration Networks for General Backbone Network Design

Xiangyu Chen, Zheyuan Li, Yuandong Pu et al.

ECCV 2024posterarXiv:2310.11881
53
citations
#658

HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting

Helisa Dhamo, Yinyu Nie, Arthur Moreau et al.

ECCV 2024posterarXiv:2312.02902
53
citations
#659

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Zeyu Liu, Weicong Liang, Zhanhao Liang et al.

ECCV 2024posterarXiv:2403.09622
53
citations
#660

Latent Guard: a Safety Framework for Text-to-image Generation

Runtao Liu, Ashkan Khakzar, Jindong Gu et al.

ECCV 2024posterarXiv:2404.08031
53
citations
#661

LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time

Sensitive Test Construction - Yucheng Li, Frank Guerin, Chenghua Lin

AAAI 2024paperarXiv:2312.12343
53
citations
#662

FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion

George Cazenavette, Avneesh Sud, Thomas Leung et al.

CVPR 2024posterarXiv:2406.08603
53
citations
#663

GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking

Shu Yin, Peican Zhu, Lianwei Wu et al.

AAAI 2024paperarXiv:2312.05739
53
citations
#664

Text-Image Alignment for Diffusion-Based Perception

Neehar Kondapaneni, Markus Marks, Manuel Knott et al.

CVPR 2024posterarXiv:2310.00031
53
citations
#665

A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

Xiaofeng Cong, Jie Gui, Jing Zhang et al.

CVPR 2024posterarXiv:2403.18548
52
citations
#666

VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

Junlin Han, Filippos Kokkinos, Philip Torr

ECCV 2024posterarXiv:2403.12034
52
citations
#667

Visual In-Context Prompting

Feng Li, Qing Jiang, Hao Zhang et al.

CVPR 2024posterarXiv:2311.13601
52
citations
#668

GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

Yanyan Li, Chenyu Lyu, Yan Di et al.

ECCV 2024posterarXiv:2403.11324
52
citations
#669

HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution

Xiang Zhang, Yulun Zhang, Fisher Yu

ECCV 2024posterarXiv:2407.05878
52
citations
#670

LaRa: Efficient Large-Baseline Radiance Fields

Anpei Chen, Haofei Xu, Stefano Esposito et al.

ECCV 2024posterarXiv:2407.04699
52
citations
#671

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

Kiana Ehsani, Tanmay Gupta, Rose Hendrix et al.

CVPR 2024posterarXiv:2312.02976
52
citations
#672

SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

Dong Wu, Mingmin Chi, Xuan Zang et al.

AAAI 2024paperarXiv:2309.00526
52
citations
#673

FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

Shivangi Aneja, Justus Thies, Angela Dai et al.

CVPR 2024posterarXiv:2312.08459
52
citations
#674

AvatarGPT: All-in-One Framework for Motion Understanding Planning Generation and Beyond

Zixiang Zhou, Yu Wan, Baoyuan Wang

CVPR 2024poster
52
citations
#675

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes

Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.

ECCV 2024posterarXiv:2305.16037
52
citations
#676

PointOBB: Learning Oriented Object Detection via Single Point Supervision

Junwei Luo, Xue Yang, Yi Yu et al.

CVPR 2024posterarXiv:2311.14757
52
citations
#677

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark

Fangjun Li, David C. Hogg, Anthony G. Cohn

AAAI 2024paperarXiv:2401.03991
51
citations
#678

Bilateral Propagation Network for Depth Completion

Jie Tang, Fei-Peng Tian, Boshi An et al.

CVPR 2024posterarXiv:2403.11270
51
citations
#679

MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

Zhangyang Xiong, Chenghong Li, Kenkun Liu et al.

CVPR 2024posterarXiv:2312.02963
51
citations
#680

LCM-Lookahead for Encoder-based Text-to-Image Personalization

Rinon Gal, Or Lichter, Elad Richardson et al.

ECCV 2024posterarXiv:2404.03620
51
citations
#681

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Zike Wu, Pan Zhou, YI Xuanyu et al.

CVPR 2024posterarXiv:2401.09050
51
citations
#682

Enhancing Multimodal Cooperation via Sample-level Modality Valuation

Yake Wei, Ruoxuan Feng, Zihe Wang et al.

CVPR 2024posterarXiv:2309.06255
51
citations
#683

DiffusionLight: Light Probes for Free by Painting a Chrome Ball

Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet et al.

CVPR 2024posterarXiv:2312.09168
51
citations
#684

Graph Neural Networks for Learning Equivariant Representations of Neural Networks

Miltiadis (Miltos) Kofinas, Boris Knyazev, Yan Zhang et al.

ICLR 2024posterarXiv:2403.12143
51
citations
#685

Intriguing Properties of Generative Classifiers

Priyank Jaini, Kevin Clark, Robert Geirhos

ICLR 2024spotlightarXiv:2309.16779
51
citations
#686

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

Yining Hong, Zishuo Zheng, Peihao Chen et al.

CVPR 2024posterarXiv:2401.08577
51
citations
#687

ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions

Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik et al.

ECCV 2024posterarXiv:2311.17057
51
citations
#688

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

Shuming Liu, Chenlin Zhang, Chen Zhao et al.

CVPR 2024posterarXiv:2311.17241
51
citations
#689

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Linjiang Huang, Rongyao Fang, Aiping Zhang et al.

ECCV 2024posterarXiv:2403.12963
51
citations
#690

Accelerating Diffusion Sampling with Optimized Time Steps

Shuchen Xue, Zhaoqiang Liu, Fei Chen et al.

CVPR 2024posterarXiv:2402.17376
51
citations
#691

Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification

kaijie ren, Lei Zhang

CVPR 2024posterarXiv:2403.11708
51
citations
#692

GVGEN: Text-to-3D Generation with Volumetric Representation

Xianglong He, Junyi Chen, Sida Peng et al.

ECCV 2024posterarXiv:2403.12957
51
citations
#693

CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing

Ajian Liu, Shuai Xue, Gan Jianwen et al.

CVPR 2024highlightarXiv:2403.14333
51
citations
#694

Describing Differences in Image Sets with Natural Language

Lisa Dunlap, Yuhui Zhang, Xiaohan Wang et al.

CVPR 2024posterarXiv:2312.02974
51
citations
#695

Jack of All Tasks Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model

Shraman Pramanick, Guangxing Han, Rui Hou et al.

CVPR 2024highlightarXiv:2312.12423
50
citations
#696

Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network

wenqiao Li, Xiaohao Xu, Yao Gu et al.

CVPR 2024posterarXiv:2311.14897
50
citations
#697

Discovering and Mitigating Visual Biases through Keyword Explanation

Younghyun Kim, Sangwoo Mo, Minkyu Kim et al.

CVPR 2024highlightarXiv:2301.11104
50
citations
#698

Few-Shot Object Detection with Foundation Models

Guangxing Han, Ser-Nam Lim

CVPR 2024poster
50
citations
#699

EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

Junjue Wang, Zhuo Zheng, Zihang Chen et al.

AAAI 2024paperarXiv:2312.12222
50
citations
#700

CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Qilang Ye, Zitong Yu, Rui Shao et al.

ECCV 2024posterarXiv:2403.04640
50
citations
#701

Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

Zihao Liu, Xiaoyu Zhang, Guangwei Liu et al.

ECCV 2024posterarXiv:2402.17430
50
citations
#702

Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition

Jianyang Xie, Yanda Meng, Yitian Zhao et al.

AAAI 2024paper
50
citations
#703

Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training

David Wan, Jaemin Cho, Elias Stengel-Eskin et al.

ECCV 2024posterarXiv:2403.02325
50
citations
#704

Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning

Chongyu Fan, Jiancheng Liu, Alfred Hero et al.

ECCV 2024posterarXiv:2403.07362
50
citations
#705

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Lanqing Guo, Yingqing He, Haoxin Chen et al.

ECCV 2024posterarXiv:2402.10491
50
citations
#706

Matching Anything by Segmenting Anything

Siyuan Li, Lei Ke, Martin Danelljan et al.

CVPR 2024highlightarXiv:2406.04221
50
citations
#707

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Yucheng Suo, Fan Ma, Linchao Zhu et al.

CVPR 2024posterarXiv:2403.16005
49
citations
#708

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Shicheng Li, Lei Li, Yi Liu et al.

ECCV 2024posterarXiv:2311.17404
49
citations
#709

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.

CVPR 2024highlightarXiv:2311.17435
49
citations
#710

Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention

Jie Ren, Yaxin Li, Shenglai Zeng et al.

ECCV 2024posterarXiv:2403.11052
49
citations
#711

Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

Yuqian Fu, Yu Wang, Yixuan Pan et al.

ECCV 2024posterarXiv:2402.03094
49
citations
#712

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Pingping Zhang, Yuhao Wang, Yang Liu et al.

CVPR 2024posterarXiv:2403.10254
49
citations
#713

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

Zexiang Liu, Yangguang Li, Youtian Lin et al.

ECCV 2024posterarXiv:2312.08754
49
citations
#714

On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?

Maxime Zanella, Ismail Ben Ayed

CVPR 2024posterarXiv:2405.02266
49
citations
#715

Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

Yijia Weng, Bowen Wen, Jonathan Tremblay et al.

CVPR 2024posterarXiv:2404.01440
49
citations
#716

SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments

Shibo Zhao, Yuanjun Gao, Tianhao Wu et al.

CVPR 2024posterarXiv:2307.07607
49
citations
#717

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations

Yufeng Huang, Jiji Tang, Zhuo Chen et al.

AAAI 2024paperarXiv:2305.06152
49
citations
#718

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

Chen Zhang, L. F. D’Haro, Yiming Chen et al.

AAAI 2024paperarXiv:2312.15407
49
citations
#719

AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ

Jonas Belouadi, Anne Lauscher, Steffen Eger

ICLR 2024posterarXiv:2310.00367
49
citations
#720

TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

Aditya Aravind Chinchure, Pushkar Shukla, Gaurav Bhatt et al.

ECCV 2024posterarXiv:2312.01261
49
citations
#721

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

Shuai Yang, Yifan Zhou, Ziwei Liu et al.

CVPR 2024posterarXiv:2403.12962
49
citations
#722

ReMamber: Referring Image Segmentation with Mamba Twister

Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.

ECCV 2024posterarXiv:2403.17839
49
citations
#723

From Zero to Turbulence: Generative Modeling for 3D Flow Simulation

Marten Lienen, David Lüdke, Jan Hansen-Palmus et al.

ICLR 2024posterarXiv:2306.01776
49
citations
#724

Feature Fusion from Head to Tail for Long-Tailed Visual Recognition

Mengke Li, Zhikai HU, Yang Lu et al.

AAAI 2024paperarXiv:2306.06963
48
citations
#725

Language-driven All-in-one Adverse Weather Removal

Hao Yang, Liyuan Pan, Yan Yang et al.

CVPR 2024posterarXiv:2312.01381
48
citations
#726

Local Search GFlowNets

Minsu Kim, Yun Taeyoung, Emmanuel Bengio et al.

ICLR 2024spotlightarXiv:2310.02710
48
citations
#727

Frozen Transformers in Language Models Are Effective Visual Encoder Layers

Ziqi Pang, Ziyang Xie, Yunze Man et al.

ICLR 2024oralarXiv:2310.12973
48
citations
#728

Soft Contrastive Learning for Time Series

Seunghan Lee, Taeyoung Park, Kibok Lee

ICLR 2024oralarXiv:2312.16424
48
citations
#729

DAP: A Dynamic Adversarial Patch for Evading Person Detectors

Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.

CVPR 2024posterarXiv:2305.11618
48
citations
#730

When Fast Fourier Transform Meets Transformer for Image Restoration

xingyu jiang, Xiuhui Zhang, Ning Gao et al.

ECCV 2024poster
48
citations
#731

CoMo: Controllable Motion Generation through Language Guided Pose Code Editing

Yiming Huang, WEILIN WAN, Yue Yang et al.

ECCV 2024posterarXiv:2403.13900
48
citations
#732

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Yixuan Wu, Yizhou Wang, Shixiang Tang et al.

ECCV 2024posterarXiv:2403.12488
48
citations
#733

MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation

Mi Yan, Jiazhao Zhang, Yan Zhu et al.

CVPR 2024posterarXiv:2401.07745
48
citations
#734

Neural Markov Random Field for Stereo Matching

Tongfan Guan, Chen Wang, Yun-Hui Liu

CVPR 2024posterarXiv:2403.11193
48
citations
#735

GPAvatar: Generalizable and Precise Head Avatar from Image(s)

Xuangeng Chu, Yu Li, Ailing Zeng et al.

ICLR 2024posterarXiv:2401.10215
48
citations
#736

Improving Audio-Visual Segmentation with Bidirectional Generation

Dawei Hao, Yuxin Mao, Bowen He et al.

AAAI 2024paperarXiv:2308.08288
48
citations
#737

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Yuru Jia, Lukas Hoyer, Shengyu Huang et al.

ECCV 2024posterarXiv:2312.03048
48
citations
#738

Generating Human Motion in 3D Scenes from Text Descriptions

Zhi Cen, Huaijin Pi, Sida Peng et al.

CVPR 2024posterarXiv:2405.07784
48
citations
#739

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le et al.

CVPR 2024posterarXiv:2312.00825
48
citations
#740

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Yiming Zhao, Zhouhui Lian

ECCV 2024posterarXiv:2312.04884
48
citations
#741

Reinforced Adaptive Knowledge Learning for Multimodal Fake News Detection

Litian Zhang, Xiaoming Zhang, Chaozhuo Li et al.

AAAI 2024paper
48
citations
#742

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Jitesh Jain, Jianwei Yang, Humphrey Shi

CVPR 2024posterarXiv:2312.14233
48
citations
#743

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

Han Liang, Jiacheng Bao, Ruichi Zhang et al.

CVPR 2024posterarXiv:2312.08985
47
citations
#744

Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

Yuhao Liu, Zhanghan Ke, Fang Liu et al.

CVPR 2024posterarXiv:2403.00644
47
citations
#745

UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction

Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud et al.

ECCV 2024posterarXiv:2403.15098
47
citations
#746

Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

Clément Bonnet, Daniel Luo, Donal Byrne et al.

ICLR 2024posterarXiv:2306.09884
47
citations
#747

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

hongcheng Guo, Jian Yang, Jiaheng Liu et al.

AAAI 2024paperarXiv:2401.04749
47
citations
#748

One-Prompt to Segment All Medical Images

Wu, Min Xu

CVPR 2024posterarXiv:2305.10300
47
citations
#749

MatFuse: Controllable Material Generation with Diffusion Models

Giuseppe Vecchio, Renato Sortino, Simone Palazzo et al.

CVPR 2024posterarXiv:2308.11408
47
citations
#750

RangeLDM: Fast Realistic LiDAR Point Cloud Generation

Qianjiang Hu, Zhimin Zhang, Wei Hu

ECCV 2024posterarXiv:2403.10094
47
citations
#751

JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

Yu Zeng, Vishal M. Patel, Haochen Wang et al.

CVPR 2024posterarXiv:2407.06187
47
citations
#752

Simplifying Transformer Blocks

Bobby He, Thomas Hofmann

ICLR 2024posterarXiv:2311.01906
47
citations
#753

SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

JUNSU KIM, Hoseong Cho, Jihyeon Kim et al.

CVPR 2024highlightarXiv:2402.17323
47
citations
#754

What does the Knowledge Neuron Thesis Have to do with Knowledge?

Jingcheng Niu, Andrew Liu, Zining Zhu et al.

ICLR 2024spotlightarXiv:2405.02421
47
citations
#755

Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

Yu Deng, Duomin Wang, Baoyuan Wang

ECCV 2024posterarXiv:2403.13570
47
citations
#756

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

Yixuan Ren, Yang Zhou, Jimei Yang et al.

ECCV 2024posterarXiv:2402.14780
47
citations
#757

Mosaic-SDF for 3D Generative Models

Lior Yariv, Omri Puny, Oran Gafni et al.

CVPR 2024posterarXiv:2312.09222
47
citations
#758

GAIA: Zero-shot Talking Avatar Generation

Tianyu He, Junliang Guo, Runyi Yu et al.

ICLR 2024posterarXiv:2311.15230
46
citations
#759

PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Bhat, Peter Wonka

CVPR 2024posterarXiv:2312.02284
46
citations
#760

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

Qiuhong Shen, Xingyi Yang, Xinchao Wang

ECCV 2024posterarXiv:2409.08270
46
citations
#761

Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling

Jiarui Lu, Bozitao Zhong, Zuobai Zhang et al.

ICLR 2024posterarXiv:2306.03117
46
citations
#762

ODEFormer: Symbolic Regression of Dynamical Systems with Transformers

Stéphane d'Ascoli, Sören Becker, Philippe Schwaller et al.

ICLR 2024spotlightarXiv:2310.05573
46
citations
#763

Group Preference Optimization: Few-Shot Alignment of Large Language Models

Siyan Zhao, John Dang, Aditya Grover

ICLR 2024posterarXiv:2310.11523
46
citations
#764

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Yang Zheng, Qingqing Zhao, Guandao Yang et al.

ECCV 2024posterarXiv:2404.04421
46
citations
#765

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

Peng Xu, Wenqi Shao, Mengzhao Chen et al.

ICLR 2024posterarXiv:2402.16880
46
citations
#766

Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing

Jaroslaw Blasiok, Preetum Nakkiran

ICLR 2024poster
46
citations
#767

S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention

Chiyu Zhang, Xiaogang Xu, Lei Wang et al.

AAAI 2024paperarXiv:2210.12381
46
citations
#768

JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention

Yuandong Tian, Yiping Wang, Zhenyu Zhang et al.

ICLR 2024posterarXiv:2310.00535
46
citations
#769

Digital Life Project: Autonomous 3D Characters with Social Intelligence

Zhongang Cai, Jianping Jiang, Zhongfei Qing et al.

CVPR 2024posterarXiv:2312.04547
46
citations
#770

Grounded Question-Answering in Long Egocentric Videos

Shangzhe Di, Weidi Xie

CVPR 2024posterarXiv:2312.06505
46
citations
#771

Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification

Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.

AAAI 2024paper
45
citations
#772

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

Zhi Gao, Yuntao Du., Xintong Zhang et al.

CVPR 2024posterarXiv:2312.10908
45
citations
#773

Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners

Keon Hee Park, Kyungwoo Song, Gyeong-Moon Park

CVPR 2024posterarXiv:2404.02117
45
citations
#774

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Jianhao Yuan, Jie Zhang, Shuyang Sun et al.

ICLR 2024posterarXiv:2310.10402
45
citations
#775

SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction

Conghao Wong, Beihao Xia, Ziqian Zou et al.

CVPR 2024posterarXiv:2310.05370
45
citations
#776

DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model

Li Xiaofan, Zhang Yifu, Xiaoqing Ye

ECCV 2024poster
45
citations
#777

Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping

Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.

CVPR 2024posterarXiv:2312.04521
45
citations
#778

Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

Rui Song, Chenwei Liang, Hu Cao et al.

CVPR 2024posterarXiv:2402.07635
45
citations
#779

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Yongwei Chen, Tengfei Wang, Tong Wu et al.

ECCV 2024posterarXiv:2403.12409
45
citations
#780

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Lewei Yao, Renjie Pi, Jianhua Han et al.

CVPR 2024posterarXiv:2404.09216
45
citations
#781

LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry

Weirong Chen, Le Chen, Rui Wang et al.

CVPR 2024posterarXiv:2401.01887
45
citations
#782

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Blake Bordelon, Lorenzo Noci, Mufan Li et al.

ICLR 2024posterarXiv:2309.16620
45
citations
#783

SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

Zhiqian Lan, Yuxuan Jiang, Yao Mu et al.

ICLR 2024oralarXiv:2309.15289
45
citations
#784

Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

Siyao Li, Tianpei Gu, Zhitao Yang et al.

ICLR 2024posterarXiv:2403.18811
45
citations
#785

Improving Image Restoration through Removing Degradations in Textual Representations

Jingbo Lin, Zhilu Zhang, Yuxiang Wei et al.

CVPR 2024posterarXiv:2312.17334
45
citations
#786

Point Segment and Count: A Generalized Framework for Object Counting

Zhizhong Huang, Mingliang Dai, Yi Zhang et al.

CVPR 2024posterarXiv:2311.12386
45
citations
#787

TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation

Yuhao Wang, Xuehu Liu, Pingping Zhang et al.

AAAI 2024paperarXiv:2312.09612
45
citations
#788

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Tong Shao, Zhuotao Tian, Hang Zhao et al.

ECCV 2024posterarXiv:2407.08268
45
citations
#789

NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving

William Ljungbergh, Adam Tonderski, Joakim Johnander et al.

ECCV 2024posterarXiv:2404.07762
44
citations
#790

Bridging Remote Sensors with Multisensor Geospatial Foundation Models

Boran Han, Shuai Zhang, Xingjian Shi et al.

CVPR 2024posterarXiv:2404.01260
44
citations
#791

Fine-Grained Prototypes Distillation for Few-Shot Object Detection

Zichen Wang, Bo Yang, Haonan Yue et al.

AAAI 2024paperarXiv:2401.07629
44
citations
#792

Towards Surveillance Video-and-Language Understanding: New Dataset Baselines and Challenges

Tongtong Yuan, Xuange Zhang, Kun Liu et al.

CVPR 2024poster
44
citations
#793

FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models

Andrea Caraffa, Davide Boscaini, Amir Hamza et al.

ECCV 2024posterarXiv:2312.00947
44
citations
#794

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

Jianjian Cao, Peng Ye, Shengze Li et al.

CVPR 2024posterarXiv:2403.02991
44
citations
#795

Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features

Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.

CVPR 2024posterarXiv:2403.07592
44
citations
#796

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

Kai Chen, Chunwei Wang, Kuo Yang et al.

ICLR 2024posterarXiv:2310.10477
44
citations
#797

S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data

Xuyang Li, Danfeng Hong, Jocelyn Chanussot

CVPR 2024poster
44
citations
#798

LightIt: Illumination Modeling and Control for Diffusion Models

Peter Kocsis, Kalyan Sunkavalli, Julien Philip et al.

CVPR 2024posterarXiv:2403.10615
44
citations
#799

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Rizhao Cai, Zirui Song, DAYAN GUAN et al.

ECCV 2024posterarXiv:2312.02896
44
citations
#800

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Lin Li, Haoyan Guan, Jianing Qiu et al.

CVPR 2024posterarXiv:2403.01849
44
citations