Most Cited 2024 "video fidelity" Papers

12,324 papers found • Page 8 of 62

#1401

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

Shuting He, Henghui Ding

CVPR 2024arXiv:2404.03645
64
citations
#1402

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation

Aaditya Singh, Ted Moskovitz, Felix Hill et al.

ICML 2024spotlightarXiv:2404.07129
64
citations
#1403

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Yake Wei, Di Hu

ICML 2024arXiv:2405.17730
64
citations
#1404

Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs

Andries Smit, Nathan Grinsztajn, Paul Duckworth et al.

ICML 2024arXiv:2311.17371
64
citations
#1405

Koala: Key Frame-Conditioned Long Video-LLM

Reuben Tan, Ximeng Sun, Ping Hu et al.

CVPR 2024highlightarXiv:2404.04346
64
citations
#1406

Efficient Dataset Distillation via Minimax Diffusion

Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev et al.

CVPR 2024arXiv:2311.15529
64
citations
#1407

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

Guo, Tianwei Lin

CVPR 2024arXiv:2312.10113
64
citations
#1408

Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels

Rui Huang, Songyou Peng, Ayca Takmaz et al.

ECCV 2024arXiv:2312.17232
64
citations
#1409

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Zhangyang Qi, Ye Fang, Zeyi Sun et al.

CVPR 2024highlightarXiv:2312.02980
64
citations
#1410

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed et al.

CVPR 2024arXiv:2404.07449
63
citations
#1411

Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

Dinghuai Zhang, Ricky T. Q. Chen, Chenghao Liu et al.

ICLR 2024arXiv:2310.02679
63
citations
#1412

Sentence-level Prompts Benefit Composed Image Retrieval

Yang Bai, Xinxing Xu, Yong Liu et al.

ICLR 2024spotlightarXiv:2310.05473
63
citations
#1413

RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos

Hongchi Xia, Yang Fu, Sifei Liu et al.

CVPR 2024arXiv:2401.12592
63
citations
#1414

Position: Topological Deep Learning is the New Frontier for Relational Learning

Theodore Papamarkou, Tolga Birdal, Michael Bronstein et al.

ICML 2024arXiv:2402.08871
63
citations
#1415

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Jisu Nam, Heesu Kim, DongJae Lee et al.

CVPR 2024arXiv:2402.09812
63
citations
#1416

ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.

CVPR 2024highlightarXiv:2312.01305
63
citations
#1417

Toward effective protection against diffusion-based mimicry through score distillation

Haotian Xue, Chumeng Liang, Xiaoyu Wu et al.

ICLR 2024arXiv:2311.12832
63
citations
#1418

HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention

Xiaolong Tang, Meina Kan, Shiguang Shan et al.

CVPR 2024arXiv:2404.06351
63
citations
#1419

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation

Jiaming Liu, Senqiao Yang, Peidong Jia et al.

ICLR 2024arXiv:2306.04344
63
citations
#1420

Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning

Zihan Ding, Chi Jin

ICLR 2024arXiv:2309.16984
63
citations
#1421

Large Motion Model for Unified Multi-Modal Motion Generation

Mingyuan Zhang, Daisheng Jin, Chenyang Gu et al.

ECCV 2024arXiv:2404.01284
63
citations
#1422

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Chulin Xie, Zinan Lin, Arturs Backurs et al.

ICML 2024spotlightarXiv:2403.01749
63
citations
#1423

Generative Pre-training for Speech with Flow Matching

Alexander Liu, Matthew Le, Apoorv Vyas et al.

ICLR 2024arXiv:2310.16338
63
citations
#1424

Unifying 3D Vision-Language Understanding via Promptable Queries

Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma et al.

ECCV 2024arXiv:2405.11442
63
citations
#1425

SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition

Jeonghyeok Do, Munchurl Kim

ECCV 2024arXiv:2403.09508
63
citations
#1426

GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views

Yaniv Wolf, Amit Bracha, Ron Kimmel

ECCV 2024arXiv:2404.01810
63
citations
#1427

C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video

Hyunjik Kim, Matthias Bauer, Lucas Theis et al.

CVPR 2024arXiv:2312.02753
63
citations
#1428

Monte Carlo guided Denoising Diffusion models for Bayesian linear inverse problems

Gabriel Cardoso, Yazid Janati El Idrissi, Sylvain Le Corff et al.

ICLR 2024
63
citations
#1429

3D Facial Expressions through Analysis-by-Neural-Synthesis

George Retsinas, Panagiotis Filntisis, Radek Danecek et al.

CVPR 2024arXiv:2404.04104
63
citations
#1430

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

Gang Zhang, Chen Junnan, Guohuan Gao et al.

CVPR 2024arXiv:2403.05817
63
citations
#1431

GraspXL: Generating Grasping Motions for Diverse Objects at Scale

Hui Zhang, Sammy Christen, Zicong Fan et al.

ECCV 2024arXiv:2403.19649
63
citations
#1432

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

Xiang Cheng, Yuxin Chen, Suvrit Sra

ICML 2024arXiv:2312.06528
63
citations
#1433

CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians

Avinash Paliwal, Wei Ye, Jinhui Xiong et al.

ECCV 2024arXiv:2403.19495
62
citations
#1434

BoQ: A Place is Worth a Bag of Learnable Queries

Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère

CVPR 2024arXiv:2405.07364
62
citations
#1435

OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models

Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell

ICML 2024arXiv:2402.10172
62
citations
#1436

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Yuchao Gu, Yipin Zhou, Bichen Wu et al.

CVPR 2024arXiv:2312.02087
62
citations
#1437

DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video

Narek Tumanyan, Assaf Singer, Shai Bagon et al.

ECCV 2024arXiv:2403.14548
62
citations
#1438

Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

Taeyoon Kwon, Kai Ong, Dongjin Kang et al.

AAAI 2024paperarXiv:2312.07399
62
citations
#1439

SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

Hoon Kim, Minje Jang, Wonjun Yoon et al.

CVPR 2024highlightarXiv:2402.18848
62
citations
#1440

Getting the most out of your tokenizer for pre-training and domain adaptation

Gautier Dagan, Gabriel Synnaeve, Baptiste Roziere

ICML 2024arXiv:2402.01035
62
citations
#1441

PerceptionGPT: Effectively Fusing Visual Perception into LLM

Renjie Pi, Lewei Yao, Jiahui Gao et al.

CVPR 2024highlightarXiv:2311.06612
62
citations
#1442

All-in-one simulation-based inference

Manuel Gloeckler, Michael Deistler, Christian Weilbach et al.

ICML 2024arXiv:2404.09636
62
citations
#1443

SVGDreamer: Text Guided SVG Generation with Diffusion Model

XiMing Xing, Chuang Wang, Haitao Zhou et al.

CVPR 2024arXiv:2312.16476
62
citations
#1444

Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling

Guoqi Yu, Jing Zou, Xiaowei Hu et al.

ICML 2024arXiv:2402.12694
62
citations
#1445

Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

Michael Matthews, Michael Beukman, Benjamin Ellis et al.

ICML 2024spotlightarXiv:2402.16801
62
citations
#1446

Volumetric Environment Representation for Vision-Language Navigation

Liu, Wenguan Wang, Yi Yang

CVPR 2024highlightarXiv:2403.14158
62
citations
#1447

pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Ege Ozguroglu, Ruoshi Liu, Dídac Surís et al.

CVPR 2024highlightarXiv:2401.14398
62
citations
#1448

DePT: Decoupled Prompt Tuning

Ji Zhang, Shihan Wu, Lianli Gao et al.

CVPR 2024arXiv:2309.07439
62
citations
#1449

Local All-Pair Correspondence for Point Tracking

Seokju Cho, Jiahui Huang, Jisu Nam et al.

ECCV 2024arXiv:2407.15420
62
citations
#1450

DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation

Junming Chen, Yunfei Liu, Jianan Wang et al.

CVPR 2024arXiv:2401.04747
62
citations
#1451

Negative Label Guided OOD Detection with Pretrained Vision-Language Models

Xue Jiang, Feng Liu, Zhen Fang et al.

ICLR 2024spotlightarXiv:2403.20078
62
citations
#1452

SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model

Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins et al.

ECCV 2024arXiv:2403.13064
62
citations
#1453

DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer

Junyuan Hong, Jiachen (Tianhao) Wang, Chenhui Zhang et al.

ICLR 2024spotlightarXiv:2312.03724
62
citations
#1454

Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

Junuk Cha, Jihyeon Kim, Jae Shin Yoon et al.

CVPR 2024arXiv:2404.00562
62
citations
#1455

Gated Attention Coding for Training High-Performance and Efficient Spiking Neural Networks

Xuerui Qiu, Rui-Jie Zhu, Yuhong Chou et al.

AAAI 2024paperarXiv:2308.06582
62
citations
#1456

PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks

Zhiyuan Zhao, Xueying Ding, B. Aditya Prakash

ICLR 2024oralarXiv:2307.11833
62
citations
#1457

BEND: Benchmarking DNA Language Models on Biologically Meaningful Tasks

Frederikke Marin, Felix Teufel, Marc Horlacher et al.

ICLR 2024arXiv:2311.12570
61
citations
#1458

QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Pengxiang Ding, Han Zhao, Wenjie Zhang et al.

ECCV 2024arXiv:2312.14457
61
citations
#1459

MuRF: Multi-Baseline Radiance Fields

Haofei Xu, Anpei Chen, Yuedong Chen et al.

CVPR 2024arXiv:2312.04565
61
citations
#1460

DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction

Weiyi Lv, Yuhang Huang, Ning Zhang et al.

CVPR 2024arXiv:2403.02075
61
citations
#1461

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

ICML 2024arXiv:2404.14507
61
citations
#1462

Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders

Yaohua Zha, Huizhen Ji, Jinmin Li et al.

AAAI 2024paperarXiv:2312.10726
61
citations
#1463

PEEKABOO: Interactive Video Generation via Masked-Diffusion

Yash Jain, Anshul Nasery, Vibhav Vineet et al.

CVPR 2024arXiv:2312.07509
61
citations
#1464

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

Bohao Peng, Xiaoyang Wu, Li Jiang et al.

CVPR 2024arXiv:2403.14418
61
citations
#1465

Compressing LLMs: The Truth is Rarely Pure and Never Simple

Ajay Jaiswal, Zhe Gan, Xianzhi Du et al.

ICLR 2024arXiv:2310.01382
61
citations
#1466

HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning

Xingtong Yu, Yuan Fang, Zemin Liu et al.

AAAI 2024paperarXiv:2312.01878
61
citations
#1467

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

Yuchen Yang, Kwonjoon Lee, Behzad Dariush et al.

ECCV 2024arXiv:2407.10299
61
citations
#1468

ControlRoom3D: Room Generation using Semantic Proxy Rooms

Jonas Schult, Sam Tsai, Lukas Höllein et al.

CVPR 2024arXiv:2312.05208
61
citations
#1469

The Neglected Tails in Vision-Language Models

Shubham Parashar, Tian Liu, Zhiqiu Lin et al.

CVPR 2024arXiv:2401.12425
61
citations
#1470

SKILL-MIX: a Flexible and Expandable Family of Evaluations for AI Models

Dingli Yu, Simran Kaur, Arushi Gupta et al.

ICLR 2024arXiv:2310.17567
61
citations
#1471

TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

Jiahe Li, Jiawei Zhang, Xiao Bai et al.

ECCV 2024arXiv:2404.15264
61
citations
#1472

The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing

Shen Nie, Hanzhong Guo, Cheng Lu et al.

ICLR 2024arXiv:2311.01410
61
citations
#1473

Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology

Oren Kraus, Kian Kenyon-Dean, Saber Saberian et al.

CVPR 2024highlightarXiv:2404.10242
61
citations
#1474

Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation

Hyunwoo Ryu, Jiwoo Kim, Hyunseok An et al.

CVPR 2024highlightarXiv:2309.02685
61
citations
#1475

Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement

Che Liu, Zhongwei Wan, Cheng Ouyang et al.

ICML 2024arXiv:2403.06659
61
citations
#1476

Geographic Location Encoding with Spherical Harmonics and Sinusoidal Representation Networks

Marc Rußwurm, Konstantin Klemmer, Esther Rolf et al.

ICLR 2024spotlightarXiv:2310.06743
61
citations
#1477

Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

Zhen Qin, Daoyuan Chen, Bingchen Qian et al.

ICML 2024arXiv:2312.06353
61
citations
#1478

LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

Tianyu Li, Peijin Jia, Bangjun Wang et al.

ICLR 2024arXiv:2312.16108
61
citations
#1479

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers

Hongjie Wang, Bhishma Dedhia, Niraj Jha

CVPR 2024arXiv:2305.17328
61
citations
#1480

CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

Hung Le, Hailin Chen, Amrita Saha et al.

ICLR 2024arXiv:2310.08992
61
citations
#1481

VeCLIP: Improving CLIP Training via Visual-enriched Captions

Zhengfeng Lai, Haotian Zhang, Bowen Zhang et al.

ECCV 2024arXiv:2310.07699
61
citations
#1482

LEAP: Liberate Sparse-View 3D Modeling from Camera Poses

Hanwen Jiang, Zhenyu Jiang, Yue Zhao et al.

ICLR 2024arXiv:2310.01410
61
citations
#1483

Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

Theodore Papamarkou, Maria Skoularidou, Konstantina Palla et al.

ICML 2024arXiv:2402.00809
60
citations
#1484

MotionEditor: Editing Video Motion via Content-Aware Diffusion

Shuyuan Tu, Qi Dai, Zhi-Qi Cheng et al.

CVPR 2024arXiv:2311.18830
60
citations
#1485

Feedback Loops With Language Models Drive In-Context Reward Hacking

Alexander Pan, Erik Jones, Meena Jagadeesan et al.

ICML 2024arXiv:2402.06627
60
citations
#1486

Loopy-SLAM: Dense Neural SLAM with Loop Closures

Lorenzo Liso, Erik Sandström, Vladimir Yugay et al.

CVPR 2024arXiv:2402.09944
60
citations
#1487

KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation

Jihua Peng, Yanghong Zhou, Tracy P Y Mok

CVPR 2024arXiv:2404.00658
60
citations
#1488

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Yue Han, Junwei Zhu, Keke He et al.

ECCV 2024arXiv:2405.12970
60
citations
#1489

Test-Time Model Adaptation with Only Forward Passes

Shuaicheng Niu, Chunyan Miao, Guohao Chen et al.

ICML 2024arXiv:2404.01650
60
citations
#1490

Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models

Hyeonho Jeong, Jong Chul Ye

ICLR 2024oralarXiv:2310.01107
60
citations
#1491

Differentially Private Synthetic Data via Foundation Model APIs 1: Images

Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni et al.

ICLR 2024arXiv:2305.15560
60
citations
#1492

DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting

Demin Yu, Xutao Li, Yunming Ye et al.

CVPR 2024arXiv:2312.06734
60
citations
#1493

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

Yuqi Yang, Peng-Tao Jiang, Qibin Hou et al.

CVPR 2024arXiv:2403.17749
60
citations
#1494

EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

Yanxi Chen, Xuchen Pan, Yaliang Li et al.

ICML 2024arXiv:2312.04916
60
citations
#1495

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai et al.

ECCV 2024arXiv:2404.03507
60
citations
#1496

Seamless Human Motion Composition with Blended Positional Encodings

German Barquero, Sergio Escalera, Cristina Palmero

CVPR 2024arXiv:2402.15509
60
citations
#1497

Lemur: Integrating Large Language Models in Automated Program Verification

Haoze Wu, Clark Barrett, Nina Narodytska

ICLR 2024arXiv:2310.04870
60
citations
#1498

Language Model Inversion

John X. Morris, Wenting Zhao, Justin Chiu et al.

ICLR 2024arXiv:2311.13647
60
citations
#1499

Transcriptomics-guided Slide Representation Learning in Computational Pathology

Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya et al.

CVPR 2024arXiv:2405.11618
60
citations
#1500

Training-Free Long-Context Scaling of Large Language Models

Chenxin An, Fei Huang, Jun Zhang et al.

ICML 2024arXiv:2402.17463
60
citations
#1501

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

AAAI 2024paperarXiv:2309.16137
60
citations
#1502

Parameter-Efficient Fine-Tuning with Discrete Fourier Transform

Ziqi Gao, Qichao Wang, Aochuan Chen et al.

ICML 2024arXiv:2405.03003
60
citations
#1503

Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks

Yufei Guo, Yuanpei Chen, Xiaode Liu et al.

AAAI 2024paperarXiv:2312.06372
60
citations
#1504

Diffusion Models for Open-Vocabulary Segmentation

Laurynas Karazija, Iro Laina, Andrea Vedaldi et al.

ECCV 2024arXiv:2306.09316
60
citations
#1505

Transformers, parallel computation, and logarithmic depth

Clayton Sanford, Daniel Hsu, Matus Telgarsky

ICML 2024spotlightarXiv:2402.09268
60
citations
#1506

Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

Micah Goldblum, Marc Finzi, Keefer Rowan et al.

ICML 2024spotlight
60
citations
#1507

OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion

Xinyu Zhan, Lixin Yang, Yifei Zhao et al.

CVPR 2024arXiv:2403.19417
60
citations
#1508

Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction

Devikalyan Das, Christopher Wewer, Raza Yunus et al.

CVPR 2024arXiv:2312.01196
60
citations
#1509

Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention

Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim et al.

CVPR 2024arXiv:2405.06284
59
citations
#1510

SILC: Improving Vision Language Pretraining with Self-Distillation

Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai et al.

ECCV 2024arXiv:2310.13355
59
citations
#1511

Foundation Policies with Hilbert Representations

Seohong Park, Tobias Kreiman, Sergey Levine

ICML 2024oralarXiv:2402.15567
59
citations
#1512

MASTER: Market-Guided Stock Transformer for Stock Price Forecasting

Tong Li, Zhaoyang Liu, Yanyan Shen et al.

AAAI 2024paperarXiv:2312.15235
59
citations
#1513

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

Chenpeng Du, Yiwei Guo, Feiyu Shen et al.

AAAI 2024paperarXiv:2306.07547
59
citations
#1514

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

Nicolae Ristea, Florinel Croitoru, Radu Tudor Ionescu et al.

CVPR 2024arXiv:2306.12041
59
citations
#1515

CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

Shuyang Sun, Runjia Li, Philip H.S. Torr et al.

CVPR 2024arXiv:2312.07661
59
citations
#1516

Correlation Matching Transformation Transformers for UHD Image Restoration

Cong Wang, Jinshan Pan, Wei Wang et al.

AAAI 2024paperarXiv:2406.00629
59
citations
#1517

AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models

Xuelong Dai, Kaisheng Liang, Bin Xiao

ECCV 2024arXiv:2307.12499
59
citations
#1518

Point Cloud Pre-training with Diffusion Models

Xiao Zheng, Xiaoshui Huang, Guofeng Mei et al.

CVPR 2024arXiv:2311.14960
59
citations
#1519

VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting

Seunggu Kang, WonJun Moon, Euiyeon Kim et al.

AAAI 2024paperarXiv:2312.16580
59
citations
#1520

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion

Daniel Winter, Matan Cohen, Shlomi Fruchter et al.

ECCV 2024arXiv:2403.18818
59
citations
#1521

Driving Everywhere with Large Language Model Policy Adaptation

Boyi Li, Yue Wang, Jiageng Mao et al.

CVPR 2024arXiv:2402.05932
59
citations
#1522

Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction

Inhwan Bae, Junoh Lee, Hae-Gon Jeon

CVPR 2024arXiv:2403.18447
59
citations
#1523

NeRFiller: Completing Scenes via Generative 3D Inpainting

Ethan Weber, Aleksander Holynski, Varun Jampani et al.

CVPR 2024arXiv:2312.04560
59
citations
#1524

On Diffusion Modeling for Anomaly Detection

Victor Livernoche, Vineet Jain, Yashar Hezaveh et al.

ICLR 2024spotlightarXiv:2305.18593
59
citations
#1525

InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models

Lichang Chen, Jiuhai Chen, Tom Goldstein et al.

ICML 2024arXiv:2306.03082
59
citations
#1526

FFT-Based Dynamic Token Mixer for Vision

Yuki Tatsunami, Masato Taki

AAAI 2024paperarXiv:2303.03932
59
citations
#1527

Delving into Multimodal Prompting for Fine-Grained Visual Classification

Xin Jiang, Hao Tang, Junyao Gao et al.

AAAI 2024paperarXiv:2309.08912
59
citations
#1528

Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

Hila Manor, Tomer Michaeli

ICML 2024arXiv:2402.10009
59
citations
#1529

Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution

Guangyuan Li, Chen Rao, Juncheng Mo et al.

CVPR 2024arXiv:2404.04785
59
citations
#1530

Decoding-time Realignment of Language Models

Tianlin Liu, Shangmin Guo, Leonardo Martins Bianco et al.

ICML 2024spotlightarXiv:2402.02992
59
citations
#1531

Exploring Target Representations for Masked Autoencoders

Xingbin Liu, Jinghao Zhou, Tao Kong et al.

ICLR 2024arXiv:2209.03917
59
citations
#1532

Orthogonal Adaptation for Modular Customization of Diffusion Models

Ryan Po, Guandao Yang, Kfir Aberman et al.

CVPR 2024highlightarXiv:2312.02432
59
citations
#1533

COLLIE: Systematic Construction of Constrained Text Generation Tasks

Shunyu Yao, Howard Chen, Austin Hanjie et al.

ICLR 2024arXiv:2307.08689
59
citations
#1534

MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts

Jianan Zhou, Zhiguang Cao, Yaoxin Wu et al.

ICML 2024arXiv:2405.01029
59
citations
#1535

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

Yi-Chia Chen, WeiHua Li, Cheng Sun et al.

ECCV 2024arXiv:2409.10542
59
citations
#1536

Massive Editing for Large Language Models via Meta Learning

Chenmien Tan, Ge Zhang, Jie Fu

ICLR 2024arXiv:2311.04661
59
citations
#1537

DocFormerv2: Local Features for Document Understanding

Srikar Appalaraju, Peng Tang, Qi Dong et al.

AAAI 2024paperarXiv:2306.01733
58
citations
#1538

MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images

Xurui Li, Ziming Huang, Feng Xue et al.

ICLR 2024arXiv:2401.16753
58
citations
#1539

Magnushammer: A Transformer-Based Approach to Premise Selection

Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak et al.

ICLR 2024arXiv:2303.04488
58
citations
#1540

On the Origins of Linear Representations in Large Language Models

Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar et al.

ICML 2024arXiv:2403.03867
58
citations
#1541

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Hao Li, Xue Yang, Zhaokai Wang et al.

CVPR 2024arXiv:2312.09238
58
citations
#1542

Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance

Tomer Garber, Tom Tirer

CVPR 2024arXiv:2312.16519
58
citations
#1543

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Yanzuo Lu, Manlin Zhang, Jinhua Ma et al.

CVPR 2024highlightarXiv:2402.18078
58
citations
#1544

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

Zhiheng Xi, Wenxiang Chen, Boyang Hong et al.

ICML 2024arXiv:2402.05808
58
citations
#1545

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Shilin Yan, Renrui Zhang, Ziyu Guo et al.

AAAI 2024paperarXiv:2305.16318
58
citations
#1546

Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models

Yuqi Zhu, Jia Li, Ge Li et al.

AAAI 2024paperarXiv:2309.02772
58
citations
#1547

Revisiting Graph-Based Fraud Detection in Sight of Heterophily and Spectrum

Fan Xu, Nan Wang, Hao Wu et al.

AAAI 2024paperarXiv:2312.06441
58
citations
#1548

Multi-View Causal Representation Learning with Partial Observability

Dingling Yao, Danru Xu, Sébastien Lachapelle et al.

ICLR 2024spotlightarXiv:2311.04056
58
citations
#1549

Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation

Hanyang Chi, Jian Pang, Bingfeng Zhang et al.

CVPR 2024arXiv:2405.00378
58
citations
#1550

OWL: A Large Language Model for IT Operations

Hongcheng Guo, Jian Yang, Jiaheng Liu et al.

ICLR 2024arXiv:2309.09298
58
citations
#1551

Towards Modular LLMs by Building and Reusing a Library of LoRAs

Oleksiy Ostapenko, Zhan Su, Edoardo Ponti et al.

ICML 2024arXiv:2405.11157
58
citations
#1552

GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

Hang Yao, Ming Liu, Zhicun Yin et al.

ECCV 2024arXiv:2406.07487
58
citations
#1553

VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation

Zhen Qu, Xian Tao, Mukesh Prasad et al.

ECCV 2024arXiv:2407.12276
58
citations
#1554

Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID

Wentao Tan, Changxing Ding, Jiayu Jiang et al.

CVPR 2024arXiv:2405.04940
58
citations
#1555

BRAVE: Broadening the visual encoding of vision-language models

Oguzhan Fatih Kar, Alessio Tonioni, Petra Poklukar et al.

ECCV 2024arXiv:2404.07204
58
citations
#1556

Bilateral Propagation Network for Depth Completion

Jie Tang, Fei-Peng Tian, Boshi An et al.

CVPR 2024arXiv:2403.11270
58
citations
#1557

SECap: Speech Emotion Captioning with Large Language Model

Yaoxun Xu, Hangting Chen, Jianwei Yu et al.

AAAI 2024paperarXiv:2312.10381
58
citations
#1558

RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

Ruiyang Hao, Siqi Fan, Yingru Dai et al.

CVPR 2024arXiv:2403.10145
58
citations
#1559

CLLMs: Consistency Large Language Models

Siqi Kou, Lanxiang Hu, Zhezhi He et al.

ICML 2024arXiv:2403.00835
58
citations
#1560

PC-Conv: Unifying Homophily and Heterophily with Two-Fold Filtering

Bingheng Li, Erlin Pan, Zhao Kang

AAAI 2024paperarXiv:2312.14438
57
citations
#1561

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Omer Dahary, Or Patashnik, Kfir Aberman et al.

ECCV 2024arXiv:2403.16990
57
citations
#1562

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

Kaifeng Lyu, Jikai Jin, Zhiyuan Li et al.

ICLR 2024arXiv:2311.18817
57
citations
#1563

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

Xi Chen, Sida Peng, Dongchen Yang et al.

ECCV 2024arXiv:2404.11593
57
citations
#1564

An Unforgeable Publicly Verifiable Watermark for Large Language Models

Aiwei Liu, Leyi Pan, Xuming Hu et al.

ICLR 2024arXiv:2307.16230
57
citations
#1565

MemFlow: Optical Flow Estimation and Prediction with Memory

Qiaole Dong, Yanwei Fu

CVPR 2024arXiv:2404.04808
57
citations
#1566

SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency

Feiyu Zhu, Reid Simmons

AAAI 2024paperarXiv:2303.07033
57
citations
#1567

CogBench: a large language model walks into a psychology lab

Julian Coda-Forno, Marcel Binz, Jane Wang et al.

ICML 2024oralarXiv:2402.18225
57
citations
#1568

Learning to Route Among Specialized Experts for Zero-Shot Generalization

Mohammed Muqeeth, Haokun Liu, Yufan Liu et al.

ICML 2024arXiv:2402.05859
57
citations
#1569

Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching

Shitong Shao, Zeyuan Yin, Muxin Zhou et al.

CVPR 2024highlightarXiv:2311.17950
57
citations
#1570

Editing Language Model-based Knowledge Graph Embeddings

AAAI 2024paperarXiv:2305.14908
57
citations
#1571

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Zhiwu Qing, Shiwei Zhang, Jiayu Wang et al.

CVPR 2024arXiv:2312.04483
57
citations
#1572

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao et al.

ECCV 2024arXiv:2310.17796
57
citations
#1573

FedAS: Bridging Inconsistency in Personalized Federated Learning

Xiyuan Yang, Wenke Huang, Mang Ye

CVPR 2024
57
citations
#1574

Prototypical Information Bottlenecking and Disentangling for Multimodal Cancer Survival Prediction

Yilan Zhang, Yingxue Xu, Jianqi Chen et al.

ICLR 2024spotlightarXiv:2401.01646
56
citations
#1575

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

Hanan Gani, Shariq Bhat, Muzammal Naseer et al.

ICLR 2024arXiv:2310.10640
56
citations
#1576

SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov et al.

ICML 2024arXiv:2402.10198
56
citations
#1577

OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views

Francis Engelmann, Fabian Manhardt, Michael Niemeyer et al.

ICLR 2024arXiv:2404.03650
56
citations
#1578

Latent Guard: a Safety Framework for Text-to-image Generation

Runtao Liu, Ashkan Khakzar, Jindong Gu et al.

ECCV 2024arXiv:2404.08031
56
citations
#1579

Instruction Tuning for Secure Code Generation

Jingxuan He, Mark Vero, Gabriela Krasnopolska et al.

ICML 2024arXiv:2402.09497
56
citations
#1580

Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D

Karran Pandey, Paul Guerrero, Matheus Gadelha et al.

CVPR 2024highlightarXiv:2312.02190
56
citations
#1581

Text2Loc: 3D Point Cloud Localization from Natural Language

Yan Xia, Letian Shi, Zifeng Ding et al.

CVPR 2024arXiv:2311.15977
56
citations
#1582

Intrinsic Image Diffusion for Indoor Single-view Material Estimation

Peter Kocsis, Vincent Sitzmann, Matthias Nießner

CVPR 2024arXiv:2312.12274
56
citations
#1583

OmniSat: Self-Supervised Modality Fusion for Earth Observation

Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.

ECCV 2024arXiv:2404.08351
56
citations
#1584

TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding

Yun Liu, Haolin Yang, Xu Si et al.

CVPR 2024arXiv:2401.08399
56
citations
#1585

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

Xinhua Cheng, Tianyu Yang, Jianan Wang et al.

ICLR 2024arXiv:2310.11784
56
citations
#1586

Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

Ronghui Li, Yuxiang Zhang, Yachao Zhang et al.

CVPR 2024arXiv:2403.10518
56
citations
#1587

On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation

Jeongyeol Kwon, Dohyun Kwon, Stephen Wright et al.

ICLR 2024spotlightarXiv:2309.01753
56
citations
#1588

Position: What Can Large Language Models Tell Us about Time Series Analysis

Ming Jin, Yi-Fan Zhang, Wei Chen et al.

ICML 2024arXiv:2402.02713
56
citations
#1589

Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

Bencheng Liao, Shaoyu Chen, Bo Jiang et al.

ECCV 2024arXiv:2303.08815
56
citations
#1590

In-Context Learning Learns Label Relationships but Is Not Conventional Learning

Jannik Kossen, Yarin Gal, Tom Rainforth

ICLR 2024arXiv:2307.12375
56
citations
#1591

Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification

Kaijie Ren, Lei Zhang

CVPR 2024arXiv:2403.11708
56
citations
#1592

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Yuanwen Yue, Anurag Das, Francis Engelmann et al.

ECCV 2024arXiv:2407.20229
56
citations
#1593

From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction

Nima Shoghi, Adeesh Kolluru, John Kitchin et al.

ICLR 2024arXiv:2310.16802
56
citations
#1594

HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

Zicong Fan, Maria Parelli, Maria Kadoglou et al.

CVPR 2024highlightarXiv:2311.18448
56
citations
#1595

HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution

Xiang Zhang, Yulun Zhang, Fisher Yu

ECCV 2024arXiv:2407.05878
56
citations
#1596

Controlled Text Generation via Language Model Arithmetic

Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner et al.

ICLR 2024spotlightarXiv:2311.14479
56
citations
#1597

Neural Operators with Localized Integral and Differential Kernels

Miguel Liu-Schiaffini, Julius Berner, Boris Bonev et al.

ICML 2024arXiv:2402.16845
56
citations
#1598

View-Consistent 3D Editing with Gaussian Splatting

Yuxuan Wang, Xuanyu Yi, Zike Wu et al.

ECCV 2024arXiv:2403.11868
56
citations
#1599

Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

Yinan Zheng, Jianxiong Li, Dongjie Yu et al.

ICLR 2024arXiv:2401.10700
56
citations
#1600

CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

Jianhao Zeng, Dan Song, Weizhi Nie et al.

CVPR 2024arXiv:2311.18405
56
citations