Most Cited 2024 "modal integration" Papers

12,324 papers found • Page 3 of 62

Filters:Most Cited 2024 modal integration Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#401

Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation

Shuanghao Bai, Min Zhang, Wanqi Zhou et al.

AAAI 2024paperarXiv:2312.09553

citations

#402

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline

Xiao Wang, Shiao Wang, Chuanming Tang et al.

CVPR 2024posterarXiv:2309.14611

citations

#403

Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation

Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.

ECCV 2024posterarXiv:2401.07487

citations

#404

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

Junbo Yin, Wenguan Wang, Runnan Chen et al.

CVPR 2024highlightarXiv:2403.15241

citations

#405

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

Qi Zhao, Shijie Wang, Ce Zhang et al.

ICLR 2024oralarXiv:2307.16368

citations

#406

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

Xiefan Guo, Jinlin Liu, Miaomiao Cui et al.

CVPR 2024posterarXiv:2404.04650

citations

#407

Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering

Han Zhou, Xingchen Wan, Lev Proleev et al.

ICLR 2024posterarXiv:2309.17249

citations

#408

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Weijia Shi, Sewon Min, Maria Lomeli et al.

ICLR 2024spotlightarXiv:2310.10638

citations

#409

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Jingkang Yang, Yuhao Dong, Shuai Liu et al.

ECCV 2024posterarXiv:2310.08588

citations

#410

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

Zhen Xiang, Fengqing Jiang, Zidi Xiong et al.

ICLR 2024posterarXiv:2401.12242

citations

#411

PB-LLM: Partially Binarized Large Language Models

Zhihang Yuan, Yuzhang Shang, Zhen Dong

ICLR 2024posterarXiv:2310.00034

citations

#412

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.

CVPR 2024posterarXiv:2312.03777

citations

#413

NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

Nilesh Kulkarni, Davis Rempe, Kyle Genova et al.

CVPR 2024posterarXiv:2307.07511

citations

#414

InstructVideo: Instructing Video Diffusion Models with Human Feedback

Hangjie Yuan, Shiwei Zhang, Xiang Wang et al.

CVPR 2024posterarXiv:2312.12490

citations

#415

HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Yuheng Jiang, Zhehao Shen, Penghao Wang et al.

CVPR 2024posterarXiv:2312.03461

citations

#416

General Object Foundation Model for Images and Videos at Scale

Junfeng Wu, Yi Jiang, Qihao Liu et al.

CVPR 2024highlightarXiv:2312.09158

citations

#417

Arc2Face: A Foundation Model for ID-Consistent Human Faces

Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou et al.

ECCV 2024posterarXiv:2403.11641

citations

#418

A Benchmark for Learning to Translate a New Language from One Grammar Book

Garrett Tanzer, Mirac Suzgun, Eline Visser et al.

ICLR 2024spotlightarXiv:2309.16575

citations

#419

DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

Yunfan Ye, Yuhang Huang, Renjiao Yi et al.

AAAI 2024paperarXiv:2401.02032

citations

#420

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Haiyang Liu, Zihao Zhu, Giorgio Becherini et al.

CVPR 2024posterarXiv:2401.00374

citations

#421

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024posterarXiv:2307.12732

citations

#422

Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning

Yiwei Li, Peiwen Yuan, Shaoxiong Feng et al.

ICLR 2024posterarXiv:2401.10480

citations

#423

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

Jiarui Hu, Xianhao Chen, Boyin Feng et al.

ECCV 2024posterarXiv:2403.16095

citations

#424

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.

CVPR 2024posterarXiv:2403.16131

citations

#425

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.

AAAI 2024paperarXiv:2401.12863

citations

#426

Amortizing intractable inference in large language models

Edward Hu, Moksh Jain, Eric Elmoznino et al.

ICLR 2024posterarXiv:2310.04363

citations

#427

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

Mubashir Noman, Muzammal Naseer, Hisham Cholakkal et al.

CVPR 2024posterarXiv:2403.05419

citations

#428

Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Zan Wang, Yixin Chen, Baoxiong Jia et al.

CVPR 2024highlightarXiv:2403.18036

citations

#429

Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.

ICLR 2024posterarXiv:2310.04562

citations

#430

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

Yutong Feng, Biao Gong, Di Chen et al.

CVPR 2024posterarXiv:2311.17002

citations

#431

FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding

Jun Xiang, Xuan Gao, Yudong Guo et al.

CVPR 2024posterarXiv:2312.02214

citations

#432

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation

Yukun Huang, Jianan Wang, Yukai Shi et al.

ICLR 2024posterarXiv:2306.12422

citations

#433

CCEdit: Creative and Controllable Video Editing via Diffusion Models

Ruoyu Feng, Wenming Weng, Yanhui Wang et al.

CVPR 2024posterarXiv:2309.16496

citations

#434

TLControl: Trajectory and Language Control for Human Motion Synthesis

WEILIN WAN, Zhiyang Dou, Taku Komura et al.

ECCV 2024posterarXiv:2311.17135

citations

#435

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Gengze Zhou, Yicong Hong, Zun Wang et al.

ECCV 2024posterarXiv:2407.12366

citations

#436

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

Xiaoyang Wu, Zhuotao Tian, Xin Wen et al.

CVPR 2024posterarXiv:2308.09718

citations

#437

Curiosity-driven Red-teaming for Large Language Models

Zhang-Wei Hong, Idan Shenfeld, Johnson (Tsun-Hsuan) Wang et al.

ICLR 2024posterarXiv:2402.19464

citations

#438

Deblurring 3D Gaussian Splatting

Byeonghyeon Lee, Howoong Lee, Xiangyu Sun et al.

ECCV 2024posterarXiv:2401.00834

citations

#439

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

Jinyi Hu, Yuan Yao, Chongyi Wang et al.

ICLR 2024spotlightarXiv:2308.12038

citations

#440

EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation

Wenyang Zhou, Zhiyang Dou, Zeyu Cao et al.

ECCV 2024poster

citations

#441

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

Baoquan Zhang, Chuyao Luo, Demin Yu et al.

AAAI 2024paperarXiv:2307.16424

citations

#442

Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

Leheng Zhang, Yawei Li, Xingyu Zhou et al.

CVPR 2024posterarXiv:2401.08209

citations

#443

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.

AAAI 2024paperarXiv:2305.15685

citations

#444

SAI3D: Segment Any Instance in 3D Scenes

Yingda Yin, Yuzheng Liu, Yang Xiao et al.

CVPR 2024posterarXiv:2312.11557

citations

#445

Learning Multi-Dimensional Human Preference for Text-to-Image Generation

Sixian Zhang, Bohan Wang, Junqiang Wu et al.

CVPR 2024posterarXiv:2405.14705

citations

#446

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Yiran Qin, Enshen Zhou, Qichang Liu et al.

CVPR 2024posterarXiv:2312.07472

citations

#447

GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time

Haoran Ye, Jiarui Wang, Helan Liang et al.

AAAI 2024paperarXiv:2312.08224

citations

#448

LLM-grounded Video Diffusion Models

Long Lian, Baifeng Shi, Adam Yala et al.

ICLR 2024oralarXiv:2309.17444

citations

#449

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

Yichi Zhang, Ziqiao Ma, Xiaofeng Gao et al.

CVPR 2024posterarXiv:2402.16846

citations

#450

Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis

Jiawen Li, Yuxuan Chen, Hongbo Chu et al.

CVPR 2024posterarXiv:2403.07719

citations

#451

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Yushi Lan, Fangzhou Hong, Shuai Yang et al.

ECCV 2024posterarXiv:2403.12019

citations

#452

Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

CVPR 2024posterarXiv:2312.11360

citations

#453

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Feng Lu, Xiangyuan Lan, Lijun Zhang et al.

CVPR 2024posterarXiv:2402.19231

citations

#454

Structure-Aware Sparse-View X-ray 3D Reconstruction

Yuanhao Cai, Jiahao Wang, Alan L. Yuille et al.

CVPR 2024posterarXiv:2311.10959

citations

#455

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

Renjie Pi, Tianyang Han, Wei Xiong et al.

ECCV 2024posterarXiv:2403.08730

citations

#456

Distilling Diffusion Models into Conditional GANs

Minguk Kang, Richard Zhang, Connelly Barnes et al.

ECCV 2024posterarXiv:2405.05967

citations

#457

A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization

Qiyu Chen, Huiyuan Luo, Chengkan Lv et al.

ECCV 2024posterarXiv:2407.09359

citations

#458

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Zhenhua Yang, Dezhi Peng, Yuxin Kong et al.

AAAI 2024paperarXiv:2312.12142

citations

#459

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Walid Bousselham, Felix Petersen, Vittorio Ferrari et al.

CVPR 2024posterarXiv:2312.00878

citations

#460

BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting

Lingzhe Zhao, Peng Wang, Peidong Liu

ECCV 2024posterarXiv:2403.11831

citations

#461

Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks

Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei et al.

ICLR 2024posterarXiv:2310.00076

citations

#462

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

Ziyue Jiang, Jinglin Liu, Yi Ren et al.

ICLR 2024posterarXiv:2307.07218

citations

#463

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

Heng Wang, Jianbo Ma, Santiago Pascual et al.

AAAI 2024paperarXiv:2308.09300

citations

#464

Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology

Andrew Song, Richard J. Chen, Tong Ding et al.

CVPR 2024posterarXiv:2405.11643

citations

#465

Multiscale Positive-Unlabeled Detection of AI-Generated Texts

Yuchuan Tian, Hanting Chen, Xutao Wang et al.

ICLR 2024spotlightarXiv:2305.18149

citations

#466

Graph Neural Prompting with Large Language Models

Yijun Tian, Huan Song, Zichen Wang et al.

AAAI 2024paperarXiv:2309.15427

citations

#467

Confronting Reward Model Overoptimization with Constrained RLHF

Ted Moskovitz, Aaditya Singh, DJ Strouse et al.

ICLR 2024spotlightarXiv:2310.04373

citations

#468

COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction

Qihang Ma, Xin Tan, Yanyun Qu et al.

CVPR 2024posterarXiv:2312.01919

citations

#469

V2X-Real: a Largs-Scale Dataset for Vehicle-to-Everything Cooperative Perception

Hao Xiang, Xin Xia, Zhaoliang Zheng et al.

ECCV 2024posterarXiv:2403.16034

citations

#470

Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

Jeongmin Bae, Seoha Kim, Youngsik Yun et al.

ECCV 2024posterarXiv:2404.03613

citations

#471

Towards 3D Molecule-Text Interpretation in Language Models

Sihang Li, Zhiyuan Liu, Yanchen Luo et al.

ICLR 2024posterarXiv:2401.13923

citations

#472

LLaFS: When Large Language Models Meet Few-Shot Segmentation

Lanyun Zhu, Tianrun Chen, Deyi Ji et al.

CVPR 2024posterarXiv:2311.16926

citations

#473

Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks

Yingpeng Du, Di Luo, Rui Yan et al.

AAAI 2024paperarXiv:2307.10747

citations

#474

Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation

Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta et al.

ECCV 2024posterarXiv:2405.01527

citations

#475

Elucidating the Exposure Bias in Diffusion Models

Mang Ning, Mingxiao Li, Jianlin Su et al.

ICLR 2024posterarXiv:2308.15321

citations

#476

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Choi Yisol, Sangkyung Kwak, Kyungmin Lee et al.

ECCV 2024posterarXiv:2403.05139

citations

#477

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

Ziyang Chen, Wei Long, He Yao et al.

CVPR 2024posterarXiv:2404.06842

citations

#478

ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

Jiayu Yang, Ziang Cheng, Yunfei Duan et al.

CVPR 2024posterarXiv:2310.10343

citations

#479

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Evonne Ng, Javier Romero, Timur Bagautdinov et al.

CVPR 2024posterarXiv:2401.01885

citations

#480

Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing

Yafei Zhang, Shen Zhou, Huafeng Li

CVPR 2024posterarXiv:2403.01105

citations

#481

SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution

mingjun zheng, Long Sun, Jiangxin Dong et al.

ECCV 2024poster

citations

#482

Model Stock: All we need is just a few fine-tuned models

Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han

ECCV 2024posterarXiv:2403.19522

citations

#483

Temporal Adaptive RGBT Tracking with Modality Prompt

Hongyu Wang, Xiaotao Liu, Yifan Li et al.

AAAI 2024paperarXiv:2401.01244

citations

#484

FedASMU: Efficient Asynchronous Federated Learning with Dynamic Staleness-Aware Model Update

Ji Liu, Juncheng Jia, Tianshi Che et al.

AAAI 2024paperarXiv:2312.05770

citations

#485

CoSeR: Bridging Image and Language for Cognitive Super-Resolution

Haoze Sun, Wenbo Li, Jianzhuang Liu et al.

CVPR 2024posterarXiv:2311.16512

citations

#486

Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

Zhenliang Ni, Xinghao Chen, Yingjie Zhai et al.

ECCV 2024posterarXiv:2405.06228

citations

#487

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Zhewei Yao, Xiaoxia Wu, Cheng Li et al.

AAAI 2024paperarXiv:2303.08302

citations

#488

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

Zhiwei Yang, Jing Liu, Peng Wu

CVPR 2024posterarXiv:2404.08531

citations

#489

SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation

Malyaban Bal, Abhronil Sengupta

AAAI 2024paperarXiv:2308.10873

citations

#490

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

Thien-Minh Nguyen, Shenghai Yuan, Thien Nguyen et al.

CVPR 2024highlightarXiv:2403.11496

citations

#491

PromptTTS 2: Describing and Generating Voices with Text Prompt

Yichong Leng, ZHifang Guo, Kai Shen et al.

ICLR 2024posterarXiv:2309.02285

citations

#492

RGBD GS-ICP SLAM

Seongbo Ha, Jiung Yeon, Hyeonwoo Yu

ECCV 2024posterarXiv:2403.12550

citations

#493

When Do We Not Need Larger Vision Models?

Baifeng Shi, Ziyang Wu, Maolin Mao et al.

ECCV 2024posterarXiv:2403.13043

citations

#494

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Linshan Wu, Jia-Xin Zhuang, Hao Chen

CVPR 2024posterarXiv:2402.17300

citations

#495

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Moreno D&#x27, Incà, Elia Peruzzo et al.

CVPR 2024highlight

citations

#496

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

YEFEI HE, Jing Liu, Weijia Wu et al.

ICLR 2024oralarXiv:2310.03270

citations

#497

Large-scale Reinforcement Learning for Diffusion Models

Yinan Zhang, Eric Tzeng, Yilun Du et al.

ECCV 2024posterarXiv:2401.12244

citations

#498

Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

Yanguang Sun, Chunyan Xu, Jian Yang et al.

ECCV 2024posterarXiv:2409.01686

citations

#499

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang et al.

CVPR 2024highlightarXiv:2312.11461

citations

#500

Plug-In Diffusion Model for Sequential Recommendation

Haokai Ma, Ruobing Xie, Lei Meng et al.

AAAI 2024paperarXiv:2401.02913

citations

#501

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Hsuan-I Ho, Jie Song, Otmar Hilliges

CVPR 2024posterarXiv:2311.15855

citations

#502

Learning to Act without Actions

Dominik Schmidt, Minqi Jiang

ICLR 2024oralarXiv:2312.10812

citations

#503

SolidGen: An Autoregressive Model for Direct B-rep Synthesis

Karl Willis, Joseph Lambourne, Nigel Morris et al.

ICLR 2024poster

citations

#504

GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

XINJIE ZHANG, Xingtong Ge, Tongda Xu et al.

ECCV 2024posterarXiv:2403.08551

citations

#505

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Hanwen Jiang, Arjun Karpur, Bingyi Cao et al.

CVPR 2024posterarXiv:2405.12979

citations

#506

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Shijie Zhou, Zhiwen Fan, Dejia Xu et al.

ECCV 2024posterarXiv:2404.06903

citations

#507

Free3D: Consistent Novel View Synthesis without 3D Representation

Chuanxia Zheng, Andrea Vedaldi

CVPR 2024posterarXiv:2312.04551

citations

#508

Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan et al.

CVPR 2024posterarXiv:2312.11994

citations

#509

On the Learnability of Watermarks for Language Models

Chenchen Gu, XIANG LI, Percy Liang et al.

ICLR 2024posterarXiv:2312.04469

citations

#510

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.

ECCV 2024posterarXiv:2407.12442

citations

#511

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

Jiawang Bai, Kuofeng Gao, Shaobo Min et al.

CVPR 2024posterarXiv:2311.16194

citations

#512

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

Seohong Park, Oleh Rybkin, Sergey Levine

ICLR 2024oralarXiv:2310.08887

citations

#513

Scaling Laws for Data Filtering— Data Curation cannot be Compute Agnostic

Sachin Goyal, Pratyush Maini, Zachary Lipton et al.

CVPR 2024posterarXiv:2404.07177

citations

#514

Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer

Eric Brachmann, Jamie Wynn, Shuai Chen et al.

ECCV 2024posterarXiv:2404.14351

citations

#515

Learning to Rank in Generative Retrieval

Yongqi Li, Nan Yang, Liang Wang et al.

AAAI 2024paperarXiv:2306.15222

citations

#516

End-to-End Rate-Distortion Optimized 3D Gaussian Representation

Henan Wang, Hanxin Zhu, Tianyu He et al.

ECCV 2024posterarXiv:2406.01597

citations

#517

TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Yufu Wang, Ziyun Wang, Lingjie Liu et al.

ECCV 2024posterarXiv:2403.17346

citations

#518

OneRestore: A Universal Restoration Framework for Composite Degradation

Yu Guo, Yuan Gao, Yuxu Lu et al.

ECCV 2024posterarXiv:2407.04621

citations

#519

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

Haiyang Ying, Yixuan Yin, Jinzhi Zhang et al.

CVPR 2024posterarXiv:2311.11666

citations

#520

Deep Temporal Graph Clustering

Meng Liu, Yue Liu, KE LIANG et al.

ICLR 2024oralarXiv:2305.10738

citations

#521

FINER: Flexible Spectral-bias Tuning in Implicit NEural Representation by Variable-periodic Activation Functions

Zhen Liu, Hao Zhu, Qi Zhang et al.

CVPR 2024posterarXiv:2312.02434

citations

#522

DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

Kaiwen Zhang, Yifan Zhou, Xudong XU et al.

CVPR 2024posterarXiv:2312.07409

citations

#523

NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields

Junge Zhang, Feihu Zhang, Shaochen Kuang et al.

AAAI 2024paperarXiv:2304.14811

citations

#524

DiffusionTrack: Diffusion Model for Multi-Object Tracking

Run Luo, Zikai Song, Lintao Ma et al.

AAAI 2024paperarXiv:2308.09905

citations

#525

Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Yunlong Zhang, Honglin Li, YUXUAN SUN et al.

ECCV 2024posterarXiv:2311.07125

citations

#526

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

Shanshan Zhong, Zhongzhan Huang, Shanghua Gao et al.

CVPR 2024posterarXiv:2312.02439

citations

#527

Make RepVGG Greater Again: A Quantization-Aware Approach

Xuesong Nie, Yunfeng Yan, Siyuan Li et al.

AAAI 2024paperarXiv:2212.01593

citations

#528

Multi-Source Diffusion Models for Simultaneous Music Generation and Separation

Giorgio Mariani, Irene Tallini, Emilian Postolache et al.

ICLR 2024posterarXiv:2302.02257

citations

#529

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Zhixuan Liang, Yao Mu, Hengbo Ma et al.

CVPR 2024posterarXiv:2312.11598

citations

#530

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

Shuting He, Henghui Ding

CVPR 2024posterarXiv:2404.03645

citations

#531

NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Weining Ren, Zihan Zhu, Boyang Sun et al.

CVPR 2024posterarXiv:2405.18715

citations

#532

Unifying 3D Vision-Language Understanding via Promptable Queries

ziyu zhu, Zhuofan Zhang, Xiaojian Ma et al.

ECCV 2024posterarXiv:2405.11442

citations

#533

Open-Vocabulary Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

CVPR 2024posterarXiv:2311.07042

citations

#534

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

Oindrila Saha, Grant Horn, Subhransu Maji

CVPR 2024posterarXiv:2401.02460

citations

#535

TC4D: Trajectory-Conditioned Text-to-4D Generation

Sherwin Bahmani, Xian Liu, Wang Yifan et al.

ECCV 2024posterarXiv:2403.17920

citations

#536

MonoCD: Monocular 3D Object Detection with Complementary Depths

Longfei Yan, Pei Yan, Shengzhou Xiong et al.

CVPR 2024posterarXiv:2404.03181

citations

#537

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

Xiao Wang, Zongzhen Wu, Bo Jiang et al.

AAAI 2024paperarXiv:2211.09648

citations

#538

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Yuchao Gu, Yipin Zhou, Bichen Wu et al.

CVPR 2024posterarXiv:2312.02087

citations

#539

GIVT: Generative Infinite-Vocabulary Transformers

Michael Tschannen, Cian Eastwood, Fabian Mentzer

ECCV 2024posterarXiv:2312.02116

citations

#540

Video Interpolation with Diffusion Models

Siddhant Jain, Daniel Watson, Aleksander Holynski et al.

CVPR 2024posterarXiv:2404.01203

citations

#541

Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

Jiuding Sun, Chantal Shaib, Byron Wallace

ICLR 2024spotlightarXiv:2306.11270

citations

#542

Grokking as the transition from lazy to rich training dynamics

Tanishq Kumar, Blake Bordelon, Samuel Gershman et al.

ICLR 2024posterarXiv:2310.06110

citations

#543

IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation

Yizhi Song, Zhifei Zhang, Zhe Lin et al.

CVPR 2024posterarXiv:2403.10701

citations

#544

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

Fabien Baradel, Thomas Lucas, Matthieu Armando et al.

ECCV 2024posterarXiv:2402.14654

citations

#545

Monte Carlo guided Denoising Diffusion models for Bayesian linear inverse problems.

Gabriel Cardoso, Yazid Janati el idrissi, Sylvain Le Corff et al.

ICLR 2024poster

citations

#546

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

guo, Tianwei Lin

CVPR 2024posterarXiv:2312.10113

citations

#547

Language-Image Pre-training with Long Captions

Kecheng Zheng, Yifei Zhang, Wei Wu et al.

ECCV 2024posterarXiv:2403.17007

citations

#548

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Jiamian Wang, Guohao Sun, Pichao Wang et al.

CVPR 2024highlightarXiv:2403.17998

citations

#549

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Zhangyang Qi, Ye Fang, Zeyi Sun et al.

CVPR 2024highlightarXiv:2312.02980

citations

#550

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Jisu Nam, Heesu Kim, DongJae Lee et al.

CVPR 2024posterarXiv:2402.09812

citations

#551

DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation

Junming Chen, Yunfei Liu, Jianan Wang et al.

CVPR 2024posterarXiv:2401.04747

citations

#552

Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

Sukrut Rao, Sweta Mahajan, Moritz Böhle et al.

ECCV 2024posterarXiv:2407.14499

citations

#553

Koala: Key Frame-Conditioned Long Video-LLM

Reuben Tan, Ximeng Sun, Ping Hu et al.

CVPR 2024highlightarXiv:2404.04346

citations

#554

SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition

Jeonghyeok Do, Munchurl Kim

ECCV 2024posterarXiv:2403.09508

citations

#555

Source-Free Domain Adaptation with Frozen Multimodal Foundation Model

Song Tang, Wenxin Su, Mao Ye et al.

CVPR 2024posterarXiv:2311.16510

citations

#556

GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views

Yaniv Wolf, Amit Bracha, Ron Kimmel

ECCV 2024posterarXiv:2404.01810

citations

#557

Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement

Kai Xu, Rongyu Chen, Gianni Franchi et al.

ICLR 2024posterarXiv:2310.00227

citations

#558

Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships

Sebastian Koch, Narunas Vaskevicius, Mirco Colosi et al.

CVPR 2024posterarXiv:2402.12259

citations

#559

Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders

Yaohua Zha, Huizhen Ji, Jinmin Li et al.

AAAI 2024paperarXiv:2312.10726

citations

#560

Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.

ECCV 2024posterarXiv:2407.11699

citations

#561

HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention

Xiaolong Tang, Meina Kan, Shiguang Shan et al.

CVPR 2024posterarXiv:2404.06351

citations

#562

DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video

Narek Tumanyan, Assaf Singer, Shai Bagon et al.

ECCV 2024posterarXiv:2403.14548

citations

#563

A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models

Julio Silva-Rodríguez, Sina Hajimiri, Ismail Ben Ayed et al.

CVPR 2024posterarXiv:2312.12730

citations

#564

PEEKABOO: Interactive Video Generation via Masked-Diffusion

Yash Jain, Anshul Nasery, Vibhav Vineet et al.

CVPR 2024posterarXiv:2312.07509

citations

#565

pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Ege Ozguroglu, Ruoshi Liu, Dídac Surís et al.

CVPR 2024highlightarXiv:2401.14398

citations

#566

Large Motion Model for Unified Multi-Modal Motion Generation

Mingyuan Zhang, Daisheng Jin, Chenyang Gu et al.

ECCV 2024posterarXiv:2404.01284

citations

#567

DePT: Decoupled Prompt Tuning

Ji Zhang, Shihan Wu, Lianli Gao et al.

CVPR 2024posterarXiv:2309.07439

citations

#568

Space Group Constrained Crystal Generation

Rui Jiao, Wenbing Huang, Yu Liu et al.

ICLR 2024posterarXiv:2402.03992

citations

#569

GraspXL: Generating Grasping Motions for Diverse Objects at Scale

Hui Zhang, Sammy Christen, Zicong Fan et al.

ECCV 2024posterarXiv:2403.19649

citations

#570

Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models

Hyeonho Jeong, Jong Chul YE

ICLR 2024oralarXiv:2310.01107

citations

#571

LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

Tianyu Li, Peijin Jia, Bangjun Wang et al.

ICLR 2024posterarXiv:2312.16108

citations

#572

Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

Zhiheng Cheng, Qingyue Wei, Hongru Zhu et al.

CVPR 2024posterarXiv:2403.18271

citations

#573

Toward effective protection against diffusion-based mimicry through score distillation

Haotian Xue, Chumeng Liang, Xiaoyu Wu et al.

ICLR 2024posterarXiv:2311.12832

citations

#574

Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

Micah Goldblum, Marc Finzi, Keefer Rowan et al.

ICML 2024spotlight

citations

#575

Driving Everywhere with Large Language Model Policy Adaptation

Boyi Li, Yue Wang, Jiageng Mao et al.

CVPR 2024posterarXiv:2402.05932

citations

#576

Point Cloud Pre-training with Diffusion Models

xiao zheng, Xiaoshui Huang, Guofeng Mei et al.

CVPR 2024posterarXiv:2311.14960

citations

#577

Diffusion Models for Open-Vocabulary Segmentation

Laurynas Karazija, Iro Laina, Andrea Vedaldi et al.

ECCV 2024posterarXiv:2306.09316

citations

#578

Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

AAAI 2024paperarXiv:2308.13198

citations

#579

HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning

Xingtong Yu, Yuan Fang, Zemin Liu et al.

AAAI 2024paperarXiv:2312.01878

citations

#580

VeCLIP: Improving CLIP Training via Visual-enriched Captions

Zhengfeng Lai, Haotian Zhang, Bowen Zhang et al.

ECCV 2024posterarXiv:2310.07699

citations

#581

DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction

Weiyi Lv, Yuhang Huang, NING Zhang et al.

CVPR 2024posterarXiv:2403.02075

citations

#582

The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing

Shen Nie, Hanzhong Guo, Cheng Lu et al.

ICLR 2024posterarXiv:2311.01410

citations

#583

Geographic Location Encoding with Spherical Harmonics and Sinusoidal Representation Networks

Marc Rußwurm, Konstantin Klemmer, Esther Rolf et al.

ICLR 2024spotlightarXiv:2310.06743

citations

#584

PerceptionGPT: Effectively Fusing Visual Perception into LLM

Renjie Pi, Lewei Yao, Jiahui Gao et al.

CVPR 2024highlightarXiv:2311.06612

citations

#585

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers

Hongjie Wang, Bhishma Dedhia, Niraj Jha

CVPR 2024posterarXiv:2305.17328

citations

#586

TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

Jiahe Li, Jiawei Zhang, Xiao Bai et al.

ECCV 2024posterarXiv:2404.15264

citations

#587

Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

Junuk Cha, Jihyeon Kim, Jae Shin Yoon et al.

CVPR 2024posterarXiv:2404.00562

citations

#588

Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks

Yufei Guo, Yuanpei Chen, Xiaode Liu et al.

AAAI 2024paperarXiv:2312.06372

citations

#589

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Shilin Yan, Renrui Zhang, Ziyu Guo et al.

AAAI 2024paperarXiv:2305.16318

citations

#590

Seamless Human Motion Composition with Blended Positional Encodings

German Barquero, Sergio Escalera, Cristina Palmero

CVPR 2024posterarXiv:2402.15509

citations

#591

DocFormerv2: Local Features for Document Understanding

Srikar Appalaraju, Peng Tang, Qi Dong et al.

AAAI 2024paperarXiv:2306.01733

citations

#592

SILC: Improving Vision Language Pretraining with Self-Distillation

Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai et al.

ECCV 2024posterarXiv:2310.13355

citations

#593

PC-Conv: Unifying Homophily and Heterophily with Two-Fold Filtering

Bingheng Li, Erlin Pan, Zhao Kang

AAAI 2024paperarXiv:2312.14438

citations

#594

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

Yi-Chia Chen, WeiHua Li, Cheng Sun et al.

ECCV 2024posterarXiv:2409.10542

citations

#595

Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

Taeyoon Kwon, Kai Ong, Dongjin Kang et al.

AAAI 2024paperarXiv:2312.07399

citations

#596

Language Model Inversion

John X. Morris, Wenting Zhao, Justin Chiu et al.

ICLR 2024posterarXiv:2311.13647

citations

#597

FedAS: Bridging Inconsistency in Personalized Federated Learning

Xiyuan Yang, Wenke Huang, Mang Ye

CVPR 2024poster

citations

#598

Magnushammer: A Transformer-Based Approach to Premise Selection

Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak et al.

ICLR 2024posterarXiv:2303.04488

citations

#599

Correlation Matching Transformation Transformers for UHD Image Restoration

Cong Wang, Jinshan Pan, Wei Wang et al.

AAAI 2024paperarXiv:2406.00629

citations

#600

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

AAAI 2024paperarXiv:2309.16137

citations

← Previous

1 2 3 4 5...62