Most Cited 2025 "text-image matching" Papers

22,274 papers found • Page 25 of 112

#4801

SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning

Jinpeng Chen, Runmin Cong, Yuzhi Zhao et al.

ICML 2025arXiv:2505.02486
10
citations
#4802

SketchVideo: Sketch-based Video Generation and Editing

Feng-Lin Liu, Hongbo Fu, Xintao Wang et al.

CVPR 2025arXiv:2503.23284
10
citations
#4803

Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers

Hang Zhou, Yuezhou Ma, Haixu Wu et al.

ICML 2025arXiv:2405.17527
10
citations
#4804

SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks

Yilun Zhao, Kaiyan Zhang, Tiansheng Hu et al.

NEURIPS 2025spotlightarXiv:2507.01001
10
citations
#4805

BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

Yuyang Peng, Shishi Xiao, Keming Wu et al.

CVPR 2025arXiv:2503.20672
10
citations
#4806

Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference

Qining Zhang, Lei Ying

ICLR 2025arXiv:2409.17401
10
citations
#4807

Node-Time Conditional Prompt Learning in Dynamic Graphs

Xingtong Yu, Zhenghao Liu, Xinming Zhang et al.

ICLR 2025oralarXiv:2405.13937
10
citations
#4808

InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts

Tianchi Xie, Minzhi Lin, Mengchen Liu et al.

NEURIPS 2025arXiv:2505.19028
10
citations
#4809

Safety Reasoning with Guidelines

Haoyu Wang, Zeyu Qin, Li Shen et al.

ICML 2025arXiv:2502.04040
10
citations
#4810

DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding

Wenhui Liao, Jiapeng Wang, Hongliang Li et al.

CVPR 2025arXiv:2408.15045
10
citations
#4811

LoRID: Low-Rank Iterative Diffusion for Adversarial Purification

Geigh Zollicoffer, Minh N. Vu, Ben Nebgen et al.

AAAI 2025paperarXiv:2409.08255
10
citations
#4812

MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations

Ziyang Zhang, Yang Yu, Yucheng Chen et al.

CVPR 2025arXiv:2503.01019
10
citations
#4813

Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data

Yunhao Tang, Sid Wang, Lovish Madaan et al.

NEURIPS 2025arXiv:2503.19618
10
citations
#4814

The 3D-PC: a benchmark for visual perspective taking in humans and machines

Drew Linsley, Peisen Zhou, Alekh Ashok et al.

ICLR 2025arXiv:2406.04138
10
citations
#4815

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

Zhihao Sun, Haoran Jiang, Haoran Chen et al.

NEURIPS 2025arXiv:2411.19466
10
citations
#4816

Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy

You Li, Fan Ma, Yi Yang

CVPR 2025arXiv:2411.16752
10
citations
#4817

Atlas Gaussians Diffusion for 3D Generation

Haitao Yang, Yuan Dong, Hanwen Jiang et al.

ICLR 2025arXiv:2408.13055
10
citations
#4818

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation

Zhekai Chen, Ruihang Chu, Yukang Chen et al.

NEURIPS 2025arXiv:2507.18537
9
citations
#4819

VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion

Meng Wang, Huilong Pi, Ruihui Li et al.

AAAI 2025paperarXiv:2503.06219
9
citations
#4820

Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs

Rui Dai, Sile Hu, Xu Shen et al.

ICLR 2025arXiv:2504.10902
9
citations
#4821

RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness

Fanhu Zeng, Haiyang Guo, Fei Zhu et al.

NEURIPS 2025spotlightarXiv:2502.17159
9
citations
#4822

Quaffure: Real-Time Quasi-Static Neural Hair Simulation

Tuur Stuyck, Gene Wei-Chin Lin, Egor Larionov et al.

CVPR 2025arXiv:2412.10061
9
citations
#4823

LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning

Xuan Liu, Xiaobin Chang

CVPR 2025arXiv:2503.18985
9
citations
#4824

Task Generalization with Autoregressive Compositional Structure: Can Learning from $D$ Tasks Generalize to $D^T$ Tasks?

Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen et al.

ICML 2025arXiv:2502.08991
9
citations
#4825

An Analysis for Reasoning Bias of Language Models with Small Initialization

Junjie Yao, zhongwang zhang, Zhi-Qin John Xu

ICML 2025spotlightarXiv:2502.04375
9
citations
#4826

Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback

Jiaming Ji, Xinyu Chen, Rui Pan et al.

NEURIPS 2025arXiv:2503.17682
9
citations
#4827

GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration

Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.

CVPR 2025arXiv:2411.17687
9
citations
#4828

Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction

Yuntao Shou, Xiangyong Cao, PeiqiangYan PeiqiangYan et al.

ICCV 2025arXiv:2411.14001
9
citations
#4829

ADIFF: Explaining audio difference using natural language

Soham Deshmukh, Shuo Han, Rita Singh et al.

ICLR 2025arXiv:2502.04476
9
citations
#4830

TabFlex: Scaling Tabular Learning to Millions with Linear Attention

Yuchen Zeng, Tuan Dinh, Wonjun Kang et al.

ICML 2025spotlightarXiv:2506.05584
9
citations
#4831

Fourier Sliced-Wasserstein Embedding for Multisets and Measures

Tal Amir, Nadav Dym

ICLR 2025arXiv:2405.16519
9
citations
#4832

Don't Just Chase “Highlighted Tokens” in MLLMs: Revisiting Visual Holistic Context Retention

Xin Zou, Di Lu, Yizhou Wang et al.

NEURIPS 2025arXiv:2510.02912
9
citations
#4833

Textual Unlearning Gives a False Sense of Unlearning

Jiacheng Du, Zhibo Wang, Jie Zhang et al.

ICML 2025arXiv:2406.13348
9
citations
#4834

Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs

Bowen Tan, Zheng Xu, Eric Xing et al.

ICML 2025arXiv:2503.12347
9
citations
#4835

Fine-Tuning Visual Autogressive Models for Subject-Driven Generation

Jiwoo Chung, Sangeek Hyun, Hyunjun Kim et al.

ICCV 2025
9
citations
#4836

RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models

Haoran Hao, Jiaming Han, Changsheng Li et al.

CVPR 2025arXiv:2410.13360
9
citations
#4837

QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation

Yehui Tang, Mabiao Long, Junchi Yan

ICLR 2025
9
citations
#4838

Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons

Shahaf Bassan, Ron Eliav, Shlomit Gur

ICLR 2025arXiv:2502.03391
9
citations
#4839

ACE: Anti-Editing Concept Erasure in Text-to-Image Models

Zihao Wang, Yuxiang Wei, Fan Li et al.

CVPR 2025arXiv:2501.01633
9
citations
#4840

Vertical Federated Learning with Missing Features During Training and Inference

Pedro Valdeira, Shiqiang Wang, Yuejie Chi

ICLR 2025arXiv:2410.22564
9
citations
#4841

Markov Persuasion Processes: Learning to Persuade From Scratch

Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni et al.

NEURIPS 2025arXiv:2402.03077
9
citations
#4842

Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty

Yeseul Cho, Baekrok Shin, Changmin Kang et al.

ICML 2025arXiv:2502.06905
9
citations
#4843

Aligning Text to Image in Diffusion Models is Easier Than You Think

Jaa-Yeon Lee, ByungHee Cha, Jeongsol Kim et al.

NEURIPS 2025arXiv:2503.08250
9
citations
#4844

Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis

Jingjing Ren, Wenbo Li, Zhongdao Wang et al.

ICCV 2025arXiv:2504.14470
9
citations
#4845

Relieving Universal Label Noise for Unsupervised Visible-Infrared Person Re-Identification by Inferring from Neighbors

Xiao Teng, Long Lan, Dingyao Chen et al.

AAAI 2025paperarXiv:2412.12220
9
citations
#4846

UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset

Chen Zhao, En Ci, Yunzhe Xu et al.

NEURIPS 2025arXiv:2510.20661
9
citations
#4847

MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval

Haoran Tang, Meng Cao, Jinfa Huang et al.

AAAI 2025paperarXiv:2408.10575
9
citations
#4848

DELIFT: Data Efficient Language model Instruction Fine-Tuning

Ishika Agarwal, Krishnateja Killamsetty, Lucian Popa et al.

ICLR 2025arXiv:2411.04425
9
citations
#4849

Neighbor Does Matter: Density-Aware Contrastive Learning for Medical Semi-supervised Segmentation

Feilong Tang, Zhongxing Xu, Ming Hu et al.

AAAI 2025paperarXiv:2412.19871
9
citations
#4850

Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation

Zhao Song, Mingquan Ye, Junze Yin et al.

ICLR 2025arXiv:2306.04169
9
citations
#4851

RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation

Songhao Han, Boxiang Qiu, Yue Liao et al.

NEURIPS 2025oralarXiv:2506.06677
9
citations
#4852

Compositional Risk Minimization

Divyat Mahajan, Mohammad Pezeshki, Charles Arnal et al.

ICML 2025arXiv:2410.06303
9
citations
#4853

TASAR: Transfer-based Attack on Skeletal Action Recognition

Yunfeng Diao, Baiqi Wu, Ruixuan Zhang et al.

ICLR 2025oralarXiv:2409.02483
9
citations
#4854

Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering

Yibo Zhang, Lihong Wang, Changqing Zou et al.

ICLR 2025arXiv:2405.15305
9
citations
#4855

Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study

Lili Zhao, Yang Wang, Qi Liu et al.

ICLR 2025
9
citations
#4856

Hierarchical Vector Quantization for Unsupervised Action Segmentation

Federico Spurio, Emad Bahrami, Gianpiero Francesca et al.

AAAI 2025paperarXiv:2412.17640
9
citations
#4857

BoA: Attention-aware Post-training Quantization without Backpropagation

Junhan Kim, Ho-young Kim, Eulrang Cho et al.

ICML 2025arXiv:2406.13474
9
citations
#4858

LLM-PySC2: Starcraft II learning environment for Large Language Models

Zongyuan Li, Yanan Ni, Runnan Qi et al.

NEURIPS 2025arXiv:2411.05348
9
citations
#4859

DiffSim: Taming Diffusion Models for Evaluating Visual Similarity

Yiren Song, Xiaokang Liu, Mike Zheng Shou

ICCV 2025arXiv:2412.14580
9
citations
#4860

BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting

Yiren Lu, Yunlai Zhou, Disheng Liu et al.

CVPR 2025arXiv:2503.15835
9
citations
#4861

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

Arun Reddy, Alexander Martin, Eugene Yang et al.

CVPR 2025arXiv:2503.19009
9
citations
#4862

Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks

Gavin McCracken, Gabriela Moisescu-Pareja, Vincent Létourneau et al.

NEURIPS 2025arXiv:2505.18266
9
citations
#4863

Amortized Sampling with Transferable Normalizing Flows

Charlie Tan, Majdi Hassan, Leon Klein et al.

NEURIPS 2025arXiv:2508.18175
9
citations
#4864

SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation

Wenyi Yu, Siyin Wang, Xiaoyu Yang et al.

NEURIPS 2025arXiv:2505.17060
9
citations
#4865

Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling

Guillem Capellera, Antonio Rubio, Luis Ferraz et al.

CVPR 2025arXiv:2503.18589
9
citations
#4866

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

Michal Nauman, Marek Cygan, Carmelo Sferrazza et al.

NEURIPS 2025oralarXiv:2505.23150
9
citations
#4867

LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Donald Shenaj, Ondrej Bohdal, Mete Ozay et al.

ICCV 2025arXiv:2412.05148
9
citations
#4868

From Debate to Equilibrium: Belief‑Driven Multi‑Agent LLM Reasoning via Bayesian Nash Equilibrium

Yi Xie, Zhanke Zhou, Chentao Cao et al.

ICML 2025arXiv:2506.08292
9
citations
#4869

SapiensID: Foundation for Human Recognition

Minchul Kim, Dingqiang Ye, Yiyang Su et al.

CVPR 2025arXiv:2504.04708
9
citations
#4870

Multi-Focus Image Fusion via Explicit Defocus Blur Modelling

Yuhui Quan, Xi Wan, Zitao Tang et al.

AAAI 2025paper
9
citations
#4871

DIVE: Taming DINO for Subject-Driven Video Editing

Yi Huang, Wei Xiong, He Zhang et al.

ICCV 2025arXiv:2412.03347
9
citations
#4872

SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement

Chenyu Yang, Shuai Wang, Hangting Chen et al.

NEURIPS 2025arXiv:2506.07634
9
citations
#4873

LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering

Jonas Kulhanek, Marie-Julie Rakotosaona, Fabian Manhardt et al.

NEURIPS 2025spotlightarXiv:2505.23158
9
citations
#4874

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Zhengfeng Lai, Vasileios Saveris, Chen Chen et al.

ICLR 2025arXiv:2410.02740
9
citations
#4875

Toward a Unified Theory of Gradient Descent under Generalized Smoothness

Alexander Tyurin

ICML 2025arXiv:2412.11773
9
citations
#4876

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Miran Heo, Min-Hung Chen, De-An Huang et al.

CVPR 2025arXiv:2501.08326
9
citations
#4877

Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity

Artavazd Maranjyan, Alexander Tyurin, Peter Richtarik

ICML 2025arXiv:2501.16168
9
citations
#4878

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything

Shilin Xu, Haobo Yuan, Qingyu Shi et al.

ICLR 2025arXiv:2401.10228
9
citations
#4879

NeuralSVG: An Implicit Representation for Text-to-Vector Generation

Sagi Polaczek, Yuval Alaluf, Elad Richardson et al.

ICCV 2025arXiv:2501.03992
9
citations
#4880

DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector

Jinghan Li, Yuan Gao, Jinda Lu et al.

ICLR 2025arXiv:2410.06549
9
citations
#4881

QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation

Junyi Wu, Zhiteng Li, Zheng Hui et al.

ICCV 2025arXiv:2503.06545
9
citations
#4882

iMoT: Inertial Motion Transformer for Inertial Navigation

Son Minh Nguyen, Duc Viet Le, Paul Havinga

AAAI 2025paperarXiv:2412.12190
9
citations
#4883

SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization

Yi Du, Zhipeng Zhao, Shaoshu Su et al.

CVPR 2025arXiv:2503.14558
9
citations
#4884

See It from My Perspective: How Language Affects Cultural Bias in Image Understanding

Amith Ananthram, Elias Stengel-Eskin, Mohit Bansal et al.

ICLR 2025arXiv:2406.11665
9
citations
#4885

EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification

Lin Zhang, Wenshuo Dong, Zhuoran Zhang et al.

NEURIPS 2025arXiv:2502.06852
9
citations
#4886

HELM: Hierarchical Encoding for mRNA Language Modeling

Mehdi Yazdani-Jahromi, Mangal Prakash, Tommaso Mansi et al.

ICLR 2025arXiv:2410.12459
9
citations
#4887

MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context

Shuai Lyu, Rongchen Zhang, Zeqi Ma et al.

AAAI 2025paperarXiv:2412.16897
9
citations
#4888

A Generalist Intracortical Motor Decoder

Joel Ye, Fabio Rizzoglio, Xuan Ma et al.

NEURIPS 2025
9
citations
#4889

Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models

Zhaoyi Liu, Huan Zhang

CVPR 2025arXiv:2502.18290
9
citations
#4890

Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations

Decheng Liu, Zongqi Wang, Chunlei Peng et al.

AAAI 2025paperarXiv:2407.14367
9
citations
#4891

DualOpt: A Dual Divide-and-Optimize Algorithm for the Large-scale Traveling Salesman Problem

Shipei Zhou, Yuandong Ding, Chi Zhang et al.

AAAI 2025paperarXiv:2501.08565
9
citations
#4892

Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?

Simon Park, Abhishek Panigrahi, Yun Cheng et al.

ICML 2025arXiv:2501.02669
9
citations
#4893

DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes

Yiyuan Liang, Zhiying Yan, Liqun Chen et al.

AAAI 2025paperarXiv:2412.19458
9
citations
#4894

Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning

Patrick Yin, Tyler Westenbroek, Ching-An Cheng et al.

ICLR 2025arXiv:2502.02705
9
citations
#4895

UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control

Yan Wu, Korrawe Karunratanakul, Zhengyi Luo et al.

ICCV 2025highlightarXiv:2504.12540
9
citations
#4896

Continuous Thought Machines

Luke Darlow, Ciaran Regan, Sebastian Risi et al.

NEURIPS 2025oralarXiv:2505.05522
9
citations
#4897

HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts

Neil He, Rishabh Anand, Hiren Madhu et al.

NEURIPS 2025arXiv:2505.24722
9
citations
#4898

DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

Qingxuan Wu, Zhiyang Dou, Sirui Xu et al.

ICLR 2025arXiv:2406.17988
9
citations
#4899

Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization

Daniel Palenicek, Florian Vogt, Joe Watson et al.

NEURIPS 2025arXiv:2502.07523
9
citations
#4900

VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Li Kang, Xiufeng Song, Heng Zhou et al.

NEURIPS 2025arXiv:2506.09049
9
citations
#4901

GAS: Generative Avatar Synthesis from a Single Image

Yixing Lu, Junting Dong, YoungJoong Kwon et al.

ICCV 2025arXiv:2502.06957
9
citations
#4902

MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking

Haolin Qin, Tingfa Xu, Tianhao Li et al.

CVPR 2025arXiv:2503.17699
9
citations
#4903

T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting

Yifei Qian, Zhongliang Guo, Bowen Deng et al.

CVPR 2025highlightarXiv:2502.20625
9
citations
#4904

One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models

Yutao Zhu, Zhaoheng Huang, Zhicheng Dou et al.

AAAI 2025paperarXiv:2405.19670
9
citations
#4905

Progressive Compositionality in Text-to-Image Generative Models

Xu Han, Linghao Jin, Xiaofeng Liu et al.

ICLR 2025arXiv:2410.16719
9
citations
#4906

Feature Denoising Diffusion Model for Blind Image Quality Assessment

Xudong Li, Yan Zhang, Yunhang Shen et al.

AAAI 2025paperarXiv:2401.11949
9
citations
#4907

Synthetic Prior for Few-Shot Drivable Head Avatar Inversion

Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas et al.

CVPR 2025arXiv:2501.06903
9
citations
#4908

Better NTK Conditioning: A Free Lunch from (ReLU) Nonlinear Activation in Wide Neural Networks

Chaoyue Liu, Han Bi, Like Hui et al.

NEURIPS 2025arXiv:2305.08813
9
citations
#4909

Offline Model-Based Optimization by Learning to Rank

Rong-Xi Tan, Ke Xue, Shen-Huan Lyu et al.

ICLR 2025arXiv:2410.11502
9
citations
#4910

Near-Optimal Sample Complexity for MDPs via Anchoring

Jongmin Lee, Mario Bravo, Roberto Cominetti

ICML 2025arXiv:2502.04477
9
citations
#4911

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

Zaid Khan, Elias Stengel-Eskin, Jaemin Cho et al.

ICLR 2025arXiv:2410.06215
9
citations
#4912

PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment

Daiwei Chen, Yi Chen, Aniket Rege et al.

ICLR 2025
9
citations
#4913

PreciseCam: Precise Camera Control for Text-to-Image Generation

Edurne Bernal-Berdun, Ana Serrano, Belen Masia et al.

CVPR 2025arXiv:2501.12910
9
citations
#4914

Multi-Granular Multimodal Clue Fusion for Meme Understanding

Li Zheng, Hao Fei, Ting Dai et al.

AAAI 2025paperarXiv:2503.12560
9
citations
#4915

DiffFNO: Diffusion Fourier Neural Operator

Xiaoyi Liu, Hao Tang

CVPR 2025arXiv:2411.09911
9
citations
#4916

Hypergraph Vision Transformers: Images are More than Nodes, More than Edges

Joshua Fixelle

CVPR 2025arXiv:2504.08710
9
citations
#4917

Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations

Brian Zheng, Alisa Liu, Orevaoghene Ahia et al.

NEURIPS 2025spotlightarXiv:2506.19004
9
citations
#4918

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

Haibo Wang, Bo Feng, Zhengfeng Lai et al.

NEURIPS 2025arXiv:2505.05467
9
citations
#4919

Hyperbolic Category Discovery

Yuanpei Liu, Zhenqi He, Kai Han

CVPR 2025arXiv:2504.06120
9
citations
#4920

Adversarial Generative Flow Network for Solving Vehicle Routing Problems

Ni Zhang, Jingfeng Yang, Zhiguang Cao et al.

ICLR 2025arXiv:2503.01931
9
citations
#4921

CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games

Peng Chen, Pi Bu, Yingyao Wang et al.

ICCV 2025arXiv:2503.09527
9
citations
#4922

HUMOTO: A 4D Dataset of Mocap Human Object Interactions

Jiaxin Lu, Chun-Hao Huang, Uttaran Bhattacharya et al.

ICCV 2025arXiv:2504.10414
9
citations
#4923

Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs

Hao Fang, Changle Zhou, Jiawei Kong et al.

NEURIPS 2025arXiv:2505.19678
9
citations
#4924

Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems

Junyi Ye, Jingyi Gu, Xinyun Zhao et al.

AAAI 2025paperarXiv:2410.18336
9
citations
#4925

h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform

Toan Nguyen, Kien Do, Duc Kieu et al.

CVPR 2025arXiv:2503.02187
9
citations
#4926

MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement

Jaehyun Nam, Jinsung Yoon, Jiefeng Chen et al.

NEURIPS 2025arXiv:2506.15692
9
citations
#4927

Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement

Hesong Li, Ziqi Wu, Ruiwen Shao et al.

CVPR 2025arXiv:2504.02555
9
citations
#4928

Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space

Hyunjee Lee, Youngsik Yun, Jeongmin Bae et al.

AAAI 2025paperarXiv:2408.07416
9
citations
#4929

HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation

Trong-Thuan Nguyen, Pha Nguyen, Jackson Cothren et al.

CVPR 2025arXiv:2411.18042
9
citations
#4930

MSE-Adapter: A Lightweight Plugin Endowing LLMs with the Capability to Perform Multimodal Sentiment Analysis and Emotion Recognition

Yang Yang, Xunde Dong, Yupeng Qiang

AAAI 2025paperarXiv:2502.12478
9
citations
#4931

MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

MATTHIEU CORD, Antonin Vobecky, Oriane Siméoni et al.

ICLR 2025arXiv:2307.09361
9
citations
#4932

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

Fali Wang, Hui Liu, Zhenwei Dai et al.

NEURIPS 2025arXiv:2508.00890
9
citations
#4933

Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen

Alessandro Palma, Till Richter, Hanyi Zhang et al.

ICLR 2025arXiv:2407.11734
9
citations
#4934

RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation

Boyuan Cao, Jiaxin Ye, Yujie Wei et al.

NEURIPS 2025spotlightarXiv:2410.06055
9
citations
#4935

DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data

Ruiqi Wu, Xinjie wang, Liu.Liu et al.

NEURIPS 2025arXiv:2505.20460
9
citations
#4936

Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach

Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li et al.

CVPR 2025highlightarXiv:2405.02700
9
citations
#4937

VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning

Ji Soo Lee, Jongha Kim, Jeehye Na et al.

AAAI 2025paperarXiv:2501.06761
9
citations
#4938

MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data

Yuqin Dai, Zhouheng Yao, Chunfeng Song et al.

ICML 2025arXiv:2502.05034
9
citations
#4939

DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation

Jisoo Kim, Jungbin Cho, Joonho Park et al.

AAAI 2025paperarXiv:2408.06010
9
citations
#4940

SegLLM: Multi-round Reasoning Segmentation with Large Language Models

Xudong Wang, Shaolun Zhang, Shufan Li et al.

ICLR 2025
9
citations
#4941

Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection

Fanhu Zeng, Zhen Cheng, Fei Zhu et al.

ICLR 2025arXiv:2409.04796
9
citations
#4942

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct

Yutong Wu, Di Huang, Wenxuan Shi et al.

AAAI 2025paperarXiv:2407.05700
9
citations
#4943

KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation

Antoni Bigata Casademunt, Michał Stypułkowski, Rodrigo Mira et al.

CVPR 2025arXiv:2503.01715
9
citations
#4944

Understanding and Improving Length Generalization in Recurrent Models

Ricardo Buitrago Ruiz, Albert Gu

ICML 2025arXiv:2507.02782
9
citations
#4945

Can Textual Gradient Work in Federated Learning?

Minghui Chen, Ruinan Jin, Wenlong Deng et al.

ICLR 2025arXiv:2502.19980
9
citations
#4946

Energy-based Backdoor Defense Against Federated Graph Learning

Guancheng Wan, Zitong Shi, Wenke Huang et al.

ICLR 2025
9
citations
#4947

Gumbel Counterfactual Generation From Language Models

Shauli Ravfogel, Anej Svete, Vésteinn Snæbjarnarson et al.

ICLR 2025arXiv:2411.07180
9
citations
#4948

Combining Cost Constrained Runtime Monitors for AI Safety

Tim Hua, James Baskerville, Henri Lemoine et al.

NEURIPS 2025arXiv:2507.15886
9
citations
#4949

From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes

Long Ma, Zhiyuan Yan, Jin Xu et al.

NEURIPS 2025arXiv:2504.04827
9
citations
#4950

Multimodal Tabular Reasoning with Privileged Structured Information

Jun-Peng Jiang, Yu Xia, Hai-Long Sun et al.

NEURIPS 2025arXiv:2506.04088
9
citations
#4951

NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors

Yanrui Bin, Wenbo Hu, Haoyuan Wang et al.

ICCV 2025arXiv:2504.11427
9
citations
#4952

Progressive distillation induces an implicit curriculum

Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi et al.

ICLR 2025arXiv:2410.05464
9
citations
#4953

TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings

Alexander Shabalin, Viacheslav Meshchaninov, Egor Chimbulatov et al.

AAAI 2025paperarXiv:2402.19097
9
citations
#4954

Decomposition Polyhedra of Piecewise Linear Functions

Marie-Charlotte Brandenburg, Moritz Grillo, Christoph Hertrich

ICLR 2025arXiv:2410.04907
9
citations
#4955

REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments

Kaustubh Sridhar, Souradeep Dutta, Dinesh Jayaraman et al.

ICLR 2025arXiv:2412.04759
9
citations
#4956

GENTEEL-NEGOTIATOR: LLM-Enhanced Mixture-of-Expert-Based Reinforcement Learning Approach for Polite Negotiation Dialogue

Priyanshu Priya, Rishikant Chigrupaatii, Mauajama Firdaus et al.

AAAI 2025paper
9
citations
#4957

PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations

Benjamin Holzschuh, Qiang Liu, Georg Kohl et al.

ICML 2025oralarXiv:2505.24717
9
citations
#4958

Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent

Sayan Banerjee, Krishna Balasubramanian, PROMIT GHOSAL

ICLR 2025arXiv:2409.08469
9
citations
#4959

NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative

Asmar Nadeem, Faegheh Sardari, Robert Dawes et al.

ICLR 2025oralarXiv:2406.06499
9
citations
#4960

Zero-Shot Styled Text Image Generation, but Make It Autoregressive

Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli et al.

CVPR 2025arXiv:2503.17074
9
citations
#4961

InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing

Jinlu Zhang, Yixin Chen, Zan Wang et al.

CVPR 2025highlightarXiv:2505.24315
9
citations
#4962

Value-Based Deep RL Scales Predictably

Oleh Rybkin, Michal Nauman, Preston Fu et al.

ICML 2025arXiv:2502.04327
9
citations
#4963

MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation

Zhaoning Yu, Hongyang Gao

ICLR 2025arXiv:2405.12519
9
citations
#4964

Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Zheyang Xiong, Jack Cai, John Cooper et al.

ICML 2025spotlightarXiv:2410.05603
9
citations
#4965

Breaking AR’s Sampling Bottleneck: Provable Acceleration via Diffusion Language Models

Gen Li, Changxiao Cai

NEURIPS 2025arXiv:2505.21400
9
citations
#4966

VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression

Qiang Hu, Houqiang Zhong, Zihan Zheng et al.

AAAI 2025paperarXiv:2412.11362
9
citations
#4967

Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph

Xujian Liang, Zhaoquan Gu

AAAI 2025paperarXiv:2501.14300
9
citations
#4968

Distilling Structural Representations into Protein Sequence Models

Jeffrey Ouyang-Zhang, Chengyue Gong, Yue Zhao et al.

ICLR 2025
9
citations
#4969

DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation

Xiankang He, Guangkai Xu, Bo Zhang et al.

AAAI 2025paperarXiv:2405.15619
9
citations
#4970

LLM+AL: Bridging Large Language Models and Action Languages for Complex Reasoning About Actions

Adam Ishay, Joohyung Lee

AAAI 2025paperarXiv:2501.00830
9
citations
#4971

An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks

Valentyn Boreiko, Alexander Panfilov, Václav Voráček et al.

ICML 2025arXiv:2410.16222
9
citations
#4972

DreamPRM: Domain-reweighted Process Reward Model for Multimodal Reasoning

Qi Cao, Ruiyi Wang, Ruiyi Zhang et al.

NEURIPS 2025arXiv:2505.20241
9
citations
#4973

Multimodal LLM Guided Exploration and Active Mapping using Fisher Information

Wen Jiang, BOSHU LEI, Katrina Ashton et al.

ICCV 2025arXiv:2410.17422
9
citations
#4974

Is Your Video Language Model a Reliable Judge?

Ming Liu, Wensheng Zhang

ICLR 2025arXiv:2503.05977
9
citations
#4975

Towards Effective and Sparse Adversarial Attack on Spiking Neural Networks via Breaking Invisible Surrogate Gradients

Li Lun, Kunyu Feng, Qinglong Ni et al.

CVPR 2025arXiv:2503.03272
9
citations
#4976

Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification

Jiaxiang Gou, Luping Ji, Pei Liu et al.

AAAI 2025paperarXiv:2410.10573
9
citations
#4977

ReAttention: Training-Free Infinite Context with Finite Attention Scope

Xiaoran Liu, Ruixiao Li, Zhigeng Liu et al.

ICLR 2025arXiv:2407.15176
9
citations
#4978

WildFake: A Large-Scale and Hierarchical Dataset for AI-Generated Images Detection

Yan Hong, Jianming Feng, Haoxing Chen et al.

AAAI 2025paper
9
citations
#4979

Prioritized Generative Replay

Ren Wang, Kevin Frans, Pieter Abbeel et al.

ICLR 2025arXiv:2410.18082
9
citations
#4980

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh

NEURIPS 2025arXiv:2505.05758
9
citations
#4981

Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens

Samuele Bortolotti, Emanuele Marconato, Paolo Morettin et al.

NEURIPS 2025arXiv:2502.11245
9
citations
#4982

NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Amandine Brunetto, Sascha Hornauer, Fabien Moutarde

ICLR 2025arXiv:2405.18213
9
citations
#4983

Superposition Yields Robust Neural Scaling

Yizhou Liu, Ziming Liu, Jeff Gore

NEURIPS 2025oralarXiv:2505.10465
9
citations
#4984

Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues

Tao He, Lizi Liao, Yixin Cao et al.

AAAI 2025paperarXiv:2412.14584
9
citations
#4985

Mitigating Object Hallucinations via Sentence-Level Early Intervention

Shangpin Peng, Senqiao Yang, Li Jiang et al.

ICCV 2025arXiv:2507.12455
9
citations
#4986

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Juan Rodriguez, Haotian Zhang, Abhay Puri et al.

NEURIPS 2025arXiv:2505.20793
9
citations
#4987

TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation

Yabiao Wang, Shuo Wang, Jiangning Zhang et al.

CVPR 2025arXiv:2408.17135
9
citations
#4988

Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video

Junkai Fan, Kun Wang, Zhiqiang Yan et al.

AAAI 2025paperarXiv:2412.11395
9
citations
#4989

Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging

Hongjin Qian, Zheng Liu

NEURIPS 2025spotlightarXiv:2505.09316
9
citations
#4990

InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation

Sirui Xu, Dongting Li, Yucheng Zhang et al.

CVPR 2025arXiv:2509.09555
9
citations
#4991

FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

zhengqiang ZHANG, Ruihuang Li, Lei Zhang

ICLR 2025arXiv:2410.18410
9
citations
#4992

LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba

Yubo Cui, Zhiheng Li, Jiaqiang Wang et al.

AAAI 2025paperarXiv:2412.08388
9
citations
#4993

X-Fusion: Introducing New Modality to Frozen Large Language Models

Sicheng Mo, Thao Nguyen, Xun Huang et al.

ICCV 2025arXiv:2504.20996
9
citations
#4994

Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks

Yiyi Chen, Russa Biswas, Heather Lent et al.

AAAI 2025paperarXiv:2408.11749
9
citations
#4995

Beyond Sequence: Impact of Geometric Context for RNA Property Prediction

Junjie Xu, Artem Moskalev, Tommaso Mansi et al.

ICLR 2025arXiv:2410.11933
9
citations
#4996

Secure On-Device Video OOD Detection Without Backpropagation

Li Li, Peilin Cai, Yuxiao Zhou et al.

ICCV 2025arXiv:2503.06166
9
citations
#4997

AudSemThinker: Enhancing Audio-Language Models Through Reasoning over Semantics of Sound

Gijs Wijngaard, Elia Formisano, Michele Esposito et al.

NEURIPS 2025arXiv:2505.14142
9
citations
#4998

M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving

Xuesong Chen, Shaoshuai Shi, Tao Ma et al.

AAAI 2025paperarXiv:2503.18100
9
citations
#4999

Near, far: Patch-ordering enhances vision foundation models' scene understanding

Valentinos Pariza, Mohammadreza Salehi, Gertjan J Burghouts et al.

ICLR 2025arXiv:2408.11054
9
citations
#5000

CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird’s Eye View Perception

Senkang Hu, Yihang Tao, Guowen Xu et al.

AAAI 2025paperarXiv:2412.12000
9
citations