Most Cited 2024 "pathology severity control" Papers

12,324 papers found • Page 3 of 62

#401

Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation

Shuanghao Bai, Min Zhang, Wanqi Zhou et al.

AAAI 2024 • paper • arXiv:2312.09553
82
citations
#402

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline

Xiao Wang, Shiao Wang, Chuanming Tang et al.

CVPR 2024 • poster • arXiv:2309.14611
82
citations
#403

Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation

Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.

ECCV 2024 • poster • arXiv:2401.07487
82
citations
#404

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

Junbo Yin, Wenguan Wang, Runnan Chen et al.

CVPR 2024 • highlight • arXiv:2403.15241
81
citations
#405

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

Qi Zhao, Shijie Wang, Ce Zhang et al.

ICLR 2024 • oral • arXiv:2307.16368
81
citations
#406

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

Xiefan Guo, Jinlin Liu, Miaomiao Cui et al.

CVPR 2024 • poster • arXiv:2404.04650
81
citations
#407

Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering

Han Zhou, Xingchen Wan, Lev Proleev et al.

ICLR 2024 • poster • arXiv:2309.17249
81
citations
#408

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Weijia Shi, Sewon Min, Maria Lomeli et al.

ICLR 2024 • spotlight • arXiv:2310.10638
81
citations
#409

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Jingkang Yang, Yuhao Dong, Shuai Liu et al.

ECCV 2024 • poster • arXiv:2310.08588
81
citations
#410

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

Zhen Xiang, Fengqing Jiang, Zidi Xiong et al.

ICLR 2024 • poster • arXiv:2401.12242
80
citations
#411

PB-LLM: Partially Binarized Large Language Models

Zhihang Yuan, Yuzhang Shang, Zhen Dong

ICLR 2024 • poster • arXiv:2310.00034
80
citations
#412

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.

CVPR 2024 • poster • arXiv:2312.03777
80
citations
#413

NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

Nilesh Kulkarni, Davis Rempe, Kyle Genova et al.

CVPR 2024 • poster • arXiv:2307.07511
80
citations
#414

InstructVideo: Instructing Video Diffusion Models with Human Feedback

Hangjie Yuan, Shiwei Zhang, Xiang Wang et al.

CVPR 2024 • poster • arXiv:2312.12490
80
citations
#415

HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Yuheng Jiang, Zhehao Shen, Penghao Wang et al.

CVPR 2024 • poster • arXiv:2312.03461
80
citations
#416

General Object Foundation Model for Images and Videos at Scale

Junfeng Wu, Yi Jiang, Qihao Liu et al.

CVPR 2024 • highlight • arXiv:2312.09158
79
citations
#417

Arc2Face: A Foundation Model for ID-Consistent Human Faces

Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou et al.

ECCV 2024 • poster • arXiv:2403.11641
79
citations
#418

A Benchmark for Learning to Translate a New Language from One Grammar Book

Garrett Tanzer, Mirac Suzgun, Eline Visser et al.

ICLR 2024 • spotlight • arXiv:2309.16575
79
citations
#419

DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

Yunfan Ye, Yuhang Huang, Renjiao Yi et al.

AAAI 2024 • paper • arXiv:2401.02032
79
citations
#420

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Haiyang Liu, Zihao Zhu, Giorgio Becherini et al.

CVPR 2024 • poster • arXiv:2401.00374
78
citations
#421

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024 • poster • arXiv:2307.12732
78
citations
#422

Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning

Yiwei Li, Peiwen Yuan, Shaoxiong Feng et al.

ICLR 2024 • poster • arXiv:2401.10480
78
citations
#423

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

Jiarui Hu, Xianhao Chen, Boyin Feng et al.

ECCV 2024 • poster • arXiv:2403.16095
78
citations
#424

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.

CVPR 2024 • poster • arXiv:2403.16131
78
citations
#425

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.

AAAI 2024 • paper • arXiv:2401.12863
78
citations
#426

Amortizing intractable inference in large language models

Edward Hu, Moksh Jain, Eric Elmoznino et al.

ICLR 2024 • poster • arXiv:2310.04363
78
citations
#427

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

Mubashir Noman, Muzammal Naseer, Hisham Cholakkal et al.

CVPR 2024 • poster • arXiv:2403.05419
78
citations
#428

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Zan Wang, Yixin Chen, Baoxiong Jia et al.

CVPR 2024 • highlight • arXiv:2403.18036
78
citations
#429

Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.

ICLR 2024 • poster • arXiv:2310.04562
78
citations
#430

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

Yutong Feng, Biao Gong, Di Chen et al.

CVPR 2024 • poster • arXiv:2311.17002
78
citations
#431

FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding

Jun Xiang, Xuan Gao, Yudong Guo et al.

CVPR 2024 • poster • arXiv:2312.02214
78
citations
#432

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation

Yukun Huang, Jianan Wang, Yukai Shi et al.

ICLR 2024 • poster • arXiv:2306.12422
78
citations
#433

CCEdit: Creative and Controllable Video Editing via Diffusion Models

Ruoyu Feng, Wenming Weng, Yanhui Wang et al.

CVPR 2024 • poster • arXiv:2309.16496
77
citations
#434

TLControl: Trajectory and Language Control for Human Motion Synthesis

Weilin Wan, Zhiyang Dou, Taku Komura et al.

ECCV 2024 • poster • arXiv:2311.17135
77
citations
#435

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Gengze Zhou, Yicong Hong, Zun Wang et al.

ECCV 2024 • poster • arXiv:2407.12366
77
citations
#436

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

Xiaoyang Wu, Zhuotao Tian, Xin Wen et al.

CVPR 2024 • poster • arXiv:2308.09718
77
citations
#437

Curiosity-driven Red-teaming for Large Language Models

Zhang-Wei Hong, Idan Shenfeld, Johnson (Tsun-Hsuan) Wang et al.

ICLR 2024 • poster • arXiv:2402.19464
77
citations
#438

Deblurring 3D Gaussian Splatting

Byeonghyeon Lee, Howoong Lee, Xiangyu Sun et al.

ECCV 2024 • poster • arXiv:2401.00834
77
citations
#439

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

Jinyi Hu, Yuan Yao, Chongyi Wang et al.

ICLR 2024 • spotlight • arXiv:2308.12038
77
citations
#440

EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation

Wenyang Zhou, Zhiyang Dou, Zeyu Cao et al.

ECCV 2024 • poster
77
citations
#441

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

Baoquan Zhang, Chuyao Luo, Demin Yu et al.

AAAI 2024 • paper • arXiv:2307.16424
76
citations
#442

Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

Leheng Zhang, Yawei Li, Xingyu Zhou et al.

CVPR 2024 • poster • arXiv:2401.08209
76
citations
#443

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.

AAAI 2024 • paper • arXiv:2305.15685
76
citations
#444

SAI3D: Segment Any Instance in 3D Scenes

Yingda Yin, Yuzheng Liu, Yang Xiao et al.

CVPR 2024 • poster • arXiv:2312.11557
76
citations
#445

Learning Multi-Dimensional Human Preference for Text-to-Image Generation

Sixian Zhang, Bohan Wang, Junqiang Wu et al.

CVPR 2024 • poster • arXiv:2405.14705
76
citations
#446

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Yiran Qin, Enshen Zhou, Qichang Liu et al.

CVPR 2024 • poster • arXiv:2312.07472
76
citations
#447

GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time

Haoran Ye, Jiarui Wang, Helan Liang et al.

AAAI 2024 • paper • arXiv:2312.08224
76
citations
#448

LLM-grounded Video Diffusion Models

Long Lian, Baifeng Shi, Adam Yala et al.

ICLR 2024 • oral • arXiv:2309.17444
76
citations
#449

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

Yichi Zhang, Ziqiao Ma, Xiaofeng Gao et al.

CVPR 2024 • poster • arXiv:2402.16846
75
citations
#450

Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis

Jiawen Li, Yuxuan Chen, Hongbo Chu et al.

CVPR 2024 • poster • arXiv:2403.07719
75
citations
#451

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Yushi Lan, Fangzhou Hong, Shuai Yang et al.

ECCV 2024 • poster • arXiv:2403.12019
75
citations
#452

Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

CVPR 2024 • poster • arXiv:2312.11360
75
citations
#453

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Feng Lu, Xiangyuan Lan, Lijun Zhang et al.

CVPR 2024 • poster • arXiv:2402.19231
75
citations
#454

Structure-Aware Sparse-View X-ray 3D Reconstruction

Yuanhao Cai, Jiahao Wang, Alan L. Yuille et al.

CVPR 2024 • poster • arXiv:2311.10959
75
citations
#455

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

Renjie Pi, Tianyang Han, Wei Xiong et al.

ECCV 2024 • poster • arXiv:2403.08730
75
citations
#456

Distilling Diffusion Models into Conditional GANs

Minguk Kang, Richard Zhang, Connelly Barnes et al.

ECCV 2024 • poster • arXiv:2405.05967
75
citations
#457

A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization

Qiyu Chen, Huiyuan Luo, Chengkan Lv et al.

ECCV 2024 • poster • arXiv:2407.09359
75
citations
#458

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Zhenhua Yang, Dezhi Peng, Yuxin Kong et al.

AAAI 2024 • paper • arXiv:2312.12142
74
citations
#459

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Walid Bousselham, Felix Petersen, Vittorio Ferrari et al.

CVPR 2024 • poster • arXiv:2312.00878
74
citations
#460

BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting

Lingzhe Zhao, Peng Wang, Peidong Liu

ECCV 2024 • poster • arXiv:2403.11831
74
citations
#461

Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks

Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei et al.

ICLR 2024 • poster • arXiv:2310.00076
74
citations
#462

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

Ziyue Jiang, Jinglin Liu, Yi Ren et al.

ICLR 2024 • poster • arXiv:2307.07218
74
citations
#463

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

Heng Wang, Jianbo Ma, Santiago Pascual et al.

AAAI 2024 • paper • arXiv:2308.09300
74
citations
#464

Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology

Andrew Song, Richard J. Chen, Tong Ding et al.

CVPR 2024 • poster • arXiv:2405.11643
74
citations
#465

Multiscale Positive-Unlabeled Detection of AI-Generated Texts

Yuchuan Tian, Hanting Chen, Xutao Wang et al.

ICLR 2024 • spotlight • arXiv:2305.18149
74
citations
#466

Graph Neural Prompting with Large Language Models

Yijun Tian, Huan Song, Zichen Wang et al.

AAAI 2024 • paper • arXiv:2309.15427
74
citations
#467

Confronting Reward Model Overoptimization with Constrained RLHF

Ted Moskovitz, Aaditya Singh, DJ Strouse et al.

ICLR 2024 • spotlight • arXiv:2310.04373
73
citations
#468

COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction

Qihang Ma, Xin Tan, Yanyun Qu et al.

CVPR 2024 • poster • arXiv:2312.01919
73
citations
#469

V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception

Hao Xiang, Xin Xia, Zhaoliang Zheng et al.

ECCV 2024 • poster • arXiv:2403.16034
73
citations
#470

Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

Jeongmin Bae, Seoha Kim, Youngsik Yun et al.

ECCV 2024 • poster • arXiv:2404.03613
73
citations
#471

Towards 3D Molecule-Text Interpretation in Language Models

Sihang Li, Zhiyuan Liu, Yanchen Luo et al.

ICLR 2024 • poster • arXiv:2401.13923
73
citations
#472

LLaFS: When Large Language Models Meet Few-Shot Segmentation

Lanyun Zhu, Tianrun Chen, Deyi Ji et al.

CVPR 2024 • poster • arXiv:2311.16926
73
citations
#473

Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks

Yingpeng Du, Di Luo, Rui Yan et al.

AAAI 2024 • paper • arXiv:2307.10747
72
citations
#474

Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation

Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta et al.

ECCV 2024 • poster • arXiv:2405.01527
72
citations
#475

Elucidating the Exposure Bias in Diffusion Models

Mang Ning, Mingxiao Li, Jianlin Su et al.

ICLR 2024 • poster • arXiv:2308.15321
72
citations
#476

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Choi Yisol, Sangkyung Kwak, Kyungmin Lee et al.

ECCV 2024 • poster • arXiv:2403.05139
72
citations
#477

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

Ziyang Chen, Wei Long, He Yao et al.

CVPR 2024 • poster • arXiv:2404.06842
72
citations
#478

ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

Jiayu Yang, Ziang Cheng, Yunfei Duan et al.

CVPR 2024 • poster • arXiv:2310.10343
72
citations
#479

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Evonne Ng, Javier Romero, Timur Bagautdinov et al.

CVPR 2024 • poster • arXiv:2401.01885
71
citations
#480

Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing

Yafei Zhang, Shen Zhou, Huafeng Li

CVPR 2024 • poster • arXiv:2403.01105
71
citations
#481

SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution

Mingjun Zheng, Long Sun, Jiangxin Dong et al.

ECCV 2024 • poster
71
citations
#482

Model Stock: All we need is just a few fine-tuned models

Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han

ECCV 2024 • poster • arXiv:2403.19522
71
citations
#483

Temporal Adaptive RGBT Tracking with Modality Prompt

Hongyu Wang, Xiaotao Liu, Yifan Li et al.

AAAI 2024 • paper • arXiv:2401.01244
71
citations
#484

FedASMU: Efficient Asynchronous Federated Learning with Dynamic Staleness-Aware Model Update

Ji Liu, Juncheng Jia, Tianshi Che et al.

AAAI 2024 • paper • arXiv:2312.05770
71
citations
#485

CoSeR: Bridging Image and Language for Cognitive Super-Resolution

Haoze Sun, Wenbo Li, Jianzhuang Liu et al.

CVPR 2024 • poster • arXiv:2311.16512
71
citations
#486

Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

Zhenliang Ni, Xinghao Chen, Yingjie Zhai et al.

ECCV 2024 • poster • arXiv:2405.06228
71
citations
#487

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Zhewei Yao, Xiaoxia Wu, Cheng Li et al.

AAAI 2024 • paper • arXiv:2303.08302
70
citations
#488

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

Zhiwei Yang, Jing Liu, Peng Wu

CVPR 2024 • poster • arXiv:2404.08531
70
citations
#489

SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation

Malyaban Bal, Abhronil Sengupta

AAAI 2024 • paper • arXiv:2308.10873
70
citations
#490

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

Thien-Minh Nguyen, Shenghai Yuan, Thien Nguyen et al.

CVPR 2024 • highlight • arXiv:2403.11496
70
citations
#491

PromptTTS 2: Describing and Generating Voices with Text Prompt

Yichong Leng, Zhifang Guo, Kai Shen et al.

ICLR 2024 • poster • arXiv:2309.02285
70
citations
#492

RGBD GS-ICP SLAM

Seongbo Ha, Jiung Yeon, Hyeonwoo Yu

ECCV 2024 • poster • arXiv:2403.12550
70
citations
#493

When Do We Not Need Larger Vision Models?

Baifeng Shi, Ziyang Wu, Maolin Mao et al.

ECCV 2024 • poster • arXiv:2403.13043
70
citations
#494

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Linshan Wu, Jia-Xin Zhuang, Hao Chen

CVPR 2024 • poster • arXiv:2402.17300
70
citations
#495

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Moreno D'Incà, Elia Peruzzo et al.

CVPR 2024 • highlight
69
citations
#496

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

Yefei He, Jing Liu, Weijia Wu et al.

ICLR 2024 • oral • arXiv:2310.03270
69
citations
#497

Large-scale Reinforcement Learning for Diffusion Models

Yinan Zhang, Eric Tzeng, Yilun Du et al.

ECCV 2024 • poster • arXiv:2401.12244
69
citations
#498

Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

Yanguang Sun, Chunyan Xu, Jian Yang et al.

ECCV 2024 • poster • arXiv:2409.01686
69
citations
#499

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang et al.

CVPR 2024 • highlight • arXiv:2312.11461
69
citations
#500

Plug-In Diffusion Model for Sequential Recommendation

Haokai Ma, Ruobing Xie, Lei Meng et al.

AAAI 2024 • paper • arXiv:2401.02913
69
citations
#501

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Hsuan-I Ho, Jie Song, Otmar Hilliges

CVPR 2024 • poster • arXiv:2311.15855
69
citations
#502

Learning to Act without Actions

Dominik Schmidt, Minqi Jiang

ICLR 2024 • oral • arXiv:2312.10812
69
citations
#503

SolidGen: An Autoregressive Model for Direct B-rep Synthesis

Karl Willis, Joseph Lambourne, Nigel Morris et al.

ICLR 2024 • poster
69
citations
#504

GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

Xinjie Zhang, Xingtong Ge, Tongda Xu et al.

ECCV 2024 • poster • arXiv:2403.08551
68
citations
#505

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Hanwen Jiang, Arjun Karpur, Bingyi Cao et al.

CVPR 2024 • poster • arXiv:2405.12979
68
citations
#506

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Shijie Zhou, Zhiwen Fan, Dejia Xu et al.

ECCV 2024 • poster • arXiv:2404.06903
68
citations
#507

Free3D: Consistent Novel View Synthesis without 3D Representation

Chuanxia Zheng, Andrea Vedaldi

CVPR 2024 • poster • arXiv:2312.04551
68
citations
#508

Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan et al.

CVPR 2024 • poster • arXiv:2312.11994
68
citations
#509

On the Learnability of Watermarks for Language Models

Chenchen Gu, Xiang Li, Percy Liang et al.

ICLR 2024 • poster • arXiv:2312.04469
68
citations
#510

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.

ECCV 2024 • poster • arXiv:2407.12442
68
citations
#511

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

Jiawang Bai, Kuofeng Gao, Shaobo Min et al.

CVPR 2024 • poster • arXiv:2311.16194
68
citations
#512

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

Seohong Park, Oleh Rybkin, Sergey Levine

ICLR 2024 • oral • arXiv:2310.08887
68
citations
#513

Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic

Sachin Goyal, Pratyush Maini, Zachary Lipton et al.

CVPR 2024 • poster • arXiv:2404.07177
67
citations
#514

Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer

Eric Brachmann, Jamie Wynn, Shuai Chen et al.

ECCV 2024 • poster • arXiv:2404.14351
67
citations
#515

Learning to Rank in Generative Retrieval

Yongqi Li, Nan Yang, Liang Wang et al.

AAAI 2024 • paper • arXiv:2306.15222
67
citations
#516

End-to-End Rate-Distortion Optimized 3D Gaussian Representation

Henan Wang, Hanxin Zhu, Tianyu He et al.

ECCV 2024 • poster • arXiv:2406.01597
67
citations
#517

TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Yufu Wang, Ziyun Wang, Lingjie Liu et al.

ECCV 2024 • poster • arXiv:2403.17346
66
citations
#518

OneRestore: A Universal Restoration Framework for Composite Degradation

Yu Guo, Yuan Gao, Yuxu Lu et al.

ECCV 2024 • poster • arXiv:2407.04621
66
citations
#519

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

Haiyang Ying, Yixuan Yin, Jinzhi Zhang et al.

CVPR 2024 • poster • arXiv:2311.11666
66
citations
#520

Deep Temporal Graph Clustering

Meng Liu, Yue Liu, Ke Liang et al.

ICLR 2024 • oral • arXiv:2305.10738
66
citations
#521

FINER: Flexible Spectral-bias Tuning in Implicit NEural Representation by Variable-periodic Activation Functions

Zhen Liu, Hao Zhu, Qi Zhang et al.

CVPR 2024 • poster • arXiv:2312.02434
66
citations
#522

DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

Kaiwen Zhang, Yifan Zhou, Xudong Xu et al.

CVPR 2024 • poster • arXiv:2312.07409
66
citations
#523

NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields

Junge Zhang, Feihu Zhang, Shaochen Kuang et al.

AAAI 2024 • paper • arXiv:2304.14811
66
citations
#524

DiffusionTrack: Diffusion Model for Multi-Object Tracking

Run Luo, Zikai Song, Lintao Ma et al.

AAAI 2024 • paper • arXiv:2308.09905
65
citations
#525

Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Yunlong Zhang, Honglin Li, Yuxuan Sun et al.

ECCV 2024 • poster • arXiv:2311.07125
65
citations
#526

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

Shanshan Zhong, Zhongzhan Huang, Shanghua Gao et al.

CVPR 2024 • poster • arXiv:2312.02439
65
citations
#527

Make RepVGG Greater Again: A Quantization-Aware Approach

Xuesong Nie, Yunfeng Yan, Siyuan Li et al.

AAAI 2024 • paper • arXiv:2212.01593
65
citations
#528

Multi-Source Diffusion Models for Simultaneous Music Generation and Separation

Giorgio Mariani, Irene Tallini, Emilian Postolache et al.

ICLR 2024 • poster • arXiv:2302.02257
65
citations
#529

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Zhixuan Liang, Yao Mu, Hengbo Ma et al.

CVPR 2024 • poster • arXiv:2312.11598
64
citations
#530

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

Shuting He, Henghui Ding

CVPR 2024 • poster • arXiv:2404.03645
64
citations
#531

NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Weining Ren, Zihan Zhu, Boyang Sun et al.

CVPR 2024 • poster • arXiv:2405.18715
64
citations
#532

Unifying 3D Vision-Language Understanding via Promptable Queries

Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma et al.

ECCV 2024 • poster • arXiv:2405.11442
64
citations
#533

Open-Vocabulary Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

CVPR 2024 • poster • arXiv:2311.07042
64
citations
#534

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

Oindrila Saha, Grant Horn, Subhransu Maji

CVPR 2024 • poster • arXiv:2401.02460
64
citations
#535

TC4D: Trajectory-Conditioned Text-to-4D Generation

Sherwin Bahmani, Xian Liu, Wang Yifan et al.

ECCV 2024 • poster • arXiv:2403.17920
64
citations
#536

MonoCD: Monocular 3D Object Detection with Complementary Depths

Longfei Yan, Pei Yan, Shengzhou Xiong et al.

CVPR 2024 • poster • arXiv:2404.03181
64
citations
#537

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

Xiao Wang, Zongzhen Wu, Bo Jiang et al.

AAAI 2024 • paper • arXiv:2211.09648
64
citations
#538

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Yuchao Gu, Yipin Zhou, Bichen Wu et al.

CVPR 2024 • poster • arXiv:2312.02087
63
citations
#539

GIVT: Generative Infinite-Vocabulary Transformers

Michael Tschannen, Cian Eastwood, Fabian Mentzer

ECCV 2024 • poster • arXiv:2312.02116
63
citations
#540

Video Interpolation with Diffusion Models

Siddhant Jain, Daniel Watson, Aleksander Holynski et al.

CVPR 2024 • poster • arXiv:2404.01203
63
citations
#541

Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

Jiuding Sun, Chantal Shaib, Byron Wallace

ICLR 2024 • spotlight • arXiv:2306.11270
63
citations
#542

Grokking as the transition from lazy to rich training dynamics

Tanishq Kumar, Blake Bordelon, Samuel Gershman et al.

ICLR 2024 • poster • arXiv:2310.06110
63
citations
#543

IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation

Yizhi Song, Zhifei Zhang, Zhe Lin et al.

CVPR 2024 • poster • arXiv:2403.10701
63
citations
#544

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

Fabien Baradel, Thomas Lucas, Matthieu Armando et al.

ECCV 2024 • poster • arXiv:2402.14654
63
citations
#545

Monte Carlo guided Denoising Diffusion models for Bayesian linear inverse problems

Gabriel Cardoso, Yazid Janati el idrissi, Sylvain Le Corff et al.

ICLR 2024 • poster
63
citations
#546

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

Qin Guo, Tianwei Lin

CVPR 2024 • poster • arXiv:2312.10113
63
citations
#547

Language-Image Pre-training with Long Captions

Kecheng Zheng, Yifei Zhang, Wei Wu et al.

ECCV 2024 • poster • arXiv:2403.17007
63
citations
#548

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Jiamian Wang, Guohao Sun, Pichao Wang et al.

CVPR 2024 • highlight • arXiv:2403.17998
63
citations
#549

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Zhangyang Qi, Ye Fang, Zeyi Sun et al.

CVPR 2024 • highlight • arXiv:2312.02980
62
citations
#550

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Jisu Nam, Heesu Kim, DongJae Lee et al.

CVPR 2024 • poster • arXiv:2402.09812
62
citations
#551

DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation

Junming Chen, Yunfei Liu, Jianan Wang et al.

CVPR 2024 • poster • arXiv:2401.04747
62
citations
#552

Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

Sukrut Rao, Sweta Mahajan, Moritz Böhle et al.

ECCV 2024 • poster • arXiv:2407.14499
62
citations
#553

Koala: Key Frame-Conditioned Long Video-LLM

Reuben Tan, Ximeng Sun, Ping Hu et al.

CVPR 2024 • highlight • arXiv:2404.04346
62
citations
#554

SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition

Jeonghyeok Do, Munchurl Kim

ECCV 2024 • poster • arXiv:2403.09508
62
citations
#555

Source-Free Domain Adaptation with Frozen Multimodal Foundation Model

Song Tang, Wenxin Su, Mao Ye et al.

CVPR 2024 • poster • arXiv:2311.16510
62
citations
#556

GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views

Yaniv Wolf, Amit Bracha, Ron Kimmel

ECCV 2024 • poster • arXiv:2404.01810
62
citations
#557

Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement

Kai Xu, Rongyu Chen, Gianni Franchi et al.

ICLR 2024 • poster • arXiv:2310.00227
61
citations
#558

Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships

Sebastian Koch, Narunas Vaskevicius, Mirco Colosi et al.

CVPR 2024 • poster • arXiv:2402.12259
61
citations
#559

Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders

Yaohua Zha, Huizhen Ji, Jinmin Li et al.

AAAI 2024 • paper • arXiv:2312.10726
61
citations
#560

Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.

ECCV 2024 • poster • arXiv:2407.11699
61
citations
#561

HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention

Xiaolong Tang, Meina Kan, Shiguang Shan et al.

CVPR 2024 • poster • arXiv:2404.06351
61
citations
#562

DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video

Narek Tumanyan, Assaf Singer, Shai Bagon et al.

ECCV 2024 • poster • arXiv:2403.14548
61
citations
#563

A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models

Julio Silva-Rodríguez, Sina Hajimiri, Ismail Ben Ayed et al.

CVPR 2024 • poster • arXiv:2312.12730
61
citations
#564

PEEKABOO: Interactive Video Generation via Masked-Diffusion

Yash Jain, Anshul Nasery, Vibhav Vineet et al.

CVPR 2024 • poster • arXiv:2312.07509
61
citations
#565

pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Ege Ozguroglu, Ruoshi Liu, Dídac Surís et al.

CVPR 2024 • highlight • arXiv:2401.14398
61
citations
#566

Large Motion Model for Unified Multi-Modal Motion Generation

Mingyuan Zhang, Daisheng Jin, Chenyang Gu et al.

ECCV 2024 • poster • arXiv:2404.01284
61
citations
#567

DePT: Decoupled Prompt Tuning

Ji Zhang, Shihan Wu, Lianli Gao et al.

CVPR 2024 • poster • arXiv:2309.07439
60
citations
#568

Space Group Constrained Crystal Generation

Rui Jiao, Wenbing Huang, Yu Liu et al.

ICLR 2024 • poster • arXiv:2402.03992
60
citations
#569

GraspXL: Generating Grasping Motions for Diverse Objects at Scale

Hui Zhang, Sammy Christen, Zicong Fan et al.

ECCV 2024 • poster • arXiv:2403.19649
60
citations
#570

Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models

Hyeonho Jeong, Jong Chul Ye

ICLR 2024 • oral • arXiv:2310.01107
60
citations
#571

LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

Tianyu Li, Peijin Jia, Bangjun Wang et al.

ICLR 2024 • poster • arXiv:2312.16108
60
citations
#572

Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

Zhiheng Cheng, Qingyue Wei, Hongru Zhu et al.

CVPR 2024 • poster • arXiv:2403.18271
60
citations
#573

Toward effective protection against diffusion-based mimicry through score distillation

Haotian Xue, Chumeng Liang, Xiaoyu Wu et al.

ICLR 2024 • poster • arXiv:2311.12832
60
citations
#574

Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

Micah Goldblum, Marc Finzi, Keefer Rowan et al.

ICML 2024 • spotlight
60
citations
#575

Driving Everywhere with Large Language Model Policy Adaptation

Boyi Li, Yue Wang, Jiageng Mao et al.

CVPR 2024 • poster • arXiv:2402.05932
59
citations
#576

Point Cloud Pre-training with Diffusion Models

Xiao Zheng, Xiaoshui Huang, Guofeng Mei et al.

CVPR 2024 • poster • arXiv:2311.14960
59
citations
#577

Diffusion Models for Open-Vocabulary Segmentation

Laurynas Karazija, Iro Laina, Andrea Vedaldi et al.

ECCV 2024 • poster • arXiv:2306.09316
59
citations
#578

Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

AAAI 2024 • paper • arXiv:2308.13198
59
citations
#579

HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning

Xingtong Yu, Yuan Fang, Zemin Liu et al.

AAAI 2024 • paper • arXiv:2312.01878
59
citations
#580

VeCLIP: Improving CLIP Training via Visual-enriched Captions

Zhengfeng Lai, Haotian Zhang, Bowen Zhang et al.

ECCV 2024 • poster • arXiv:2310.07699
59
citations
#581

DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction

Weiyi Lv, Yuhang Huang, Ning Zhang et al.

CVPR 2024 • poster • arXiv:2403.02075
59
citations
#582

The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing

Shen Nie, Hanzhong Guo, Cheng Lu et al.

ICLR 2024 • poster • arXiv:2311.01410
59
citations
#583

Geographic Location Encoding with Spherical Harmonics and Sinusoidal Representation Networks

Marc Rußwurm, Konstantin Klemmer, Esther Rolf et al.

ICLR 2024 • spotlight • arXiv:2310.06743
59
citations
#584

PerceptionGPT: Effectively Fusing Visual Perception into LLM

Renjie Pi, Lewei Yao, Jiahui Gao et al.

CVPR 2024 • highlight • arXiv:2311.06612
59
citations
#585

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers

Hongjie Wang, Bhishma Dedhia, Niraj Jha

CVPR 2024 • poster • arXiv:2305.17328
59
citations
#586

TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

Jiahe Li, Jiawei Zhang, Xiao Bai et al.

ECCV 2024 • poster • arXiv:2404.15264
59
citations
#587

Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

Junuk Cha, Jihyeon Kim, Jae Shin Yoon et al.

CVPR 2024 • poster • arXiv:2404.00562
58
citations
#588

Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks

Yufei Guo, Yuanpei Chen, Xiaode Liu et al.

AAAI 2024 • paper • arXiv:2312.06372
58
citations
#589

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Shilin Yan, Renrui Zhang, Ziyu Guo et al.

AAAI 2024 • paper • arXiv:2305.16318
58
citations
#590

Seamless Human Motion Composition with Blended Positional Encodings

German Barquero, Sergio Escalera, Cristina Palmero

CVPR 2024 • poster • arXiv:2402.15509
58
citations
#591

DocFormerv2: Local Features for Document Understanding

Srikar Appalaraju, Peng Tang, Qi Dong et al.

AAAI 2024 • paper • arXiv:2306.01733
58
citations
#592

SILC: Improving Vision Language Pretraining with Self-Distillation

Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai et al.

ECCV 2024 • poster • arXiv:2310.13355
58
citations
#593

PC-Conv: Unifying Homophily and Heterophily with Two-Fold Filtering

Bingheng Li, Erlin Pan, Zhao Kang

AAAI 2024 • paper • arXiv:2312.14438
57
citations
#594

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

Yi-Chia Chen, WeiHua Li, Cheng Sun et al.

ECCV 2024 • poster • arXiv:2409.10542
57
citations
#595

Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

Taeyoon Kwon, Kai Ong, Dongjin Kang et al.

AAAI 2024 • paper • arXiv:2312.07399
57
citations
#596

Language Model Inversion

John X. Morris, Wenting Zhao, Justin Chiu et al.

ICLR 2024 • poster • arXiv:2311.13647
57
citations
#597

FedAS: Bridging Inconsistency in Personalized Federated Learning

Xiyuan Yang, Wenke Huang, Mang Ye

CVPR 2024 • poster
57
citations
#598

Magnushammer: A Transformer-Based Approach to Premise Selection

Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak et al.

ICLR 2024 • poster • arXiv:2303.04488
57
citations
#599

Correlation Matching Transformation Transformers for UHD Image Restoration

Cong Wang, Jinshan Pan, Wei Wang et al.

AAAI 2024 • paper • arXiv:2406.00629
57
citations
#600

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

AAAI 2024 • paper • arXiv:2309.16137
57
citations