Most Cited AAAI 2025 "multimodal lms" Papers

3,028 papers found • Page 1 of 16

#1

U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation

Chenxin Li, Xinyu Liu, Wuyang Li et al.

AAAI 2025 • arXiv:2406.02918
356 citations
#2

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

Yichen Gong, Delong Ran, Jinyuan Liu et al.

AAAI 2025 • arXiv:2311.05608
302 citations
#3

EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba

Xiaohuan Pei, Tao Huang, Chang Xu

AAAI 2025 • arXiv:2403.09977
192 citations
#4

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions

Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen et al.

AAAI 2025 • arXiv:2407.08136
171 citations
#5

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

Guosheng Zhao, Xiaofeng Wang, Zheng Zhu et al.

AAAI 2025 • arXiv:2403.06845
146 citations
#6

Segment Any 3D Gaussians

Jiazhong Cen, Jiemin Fang, Chen Yang et al.

AAAI 2025 • arXiv:2312.00860
145 citations
#7

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

Konstantin Klemmer, Esther Rolf, Caleb Robinson et al.

AAAI 2025 • arXiv:2311.17179
141 citations
#8

OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-On

Yuhao Xu, Tao Gu, Weifeng Chen et al.

AAAI 2025 • arXiv:2403.01779
138 citations
#9

Language Prompt for Autonomous Driving

Dongming Wu, Wencheng Han, Yingfei Liu et al.

AAAI 2025 • arXiv:2309.04379
138 citations
#10

C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness

Yu Kang, Xianghui Sun, Liangyu Chen et al.

AAAI 2025 • arXiv:2412.11664
136 citations
#11

Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection

Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.

AAAI 2025 • arXiv:2412.16986
129 citations
#12

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding

Senqiao Yang, Jiaming Liu, Renrui Zhang et al.

AAAI 2025 • arXiv:2312.14074
116 citations
#13

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Han Zhao, Min Zhang, Wei Zhao et al.

AAAI 2025 • arXiv:2403.14520
110 citations
#14

IMAGDressing-v1: Customizable Virtual Dressing

Fei Shen, Xin Jiang, Xin He et al.

AAAI 2025 • arXiv:2407.12705
107 citations
#15

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Xianjie Wu, Jian Yang, Linzheng Chai et al.

AAAI 2025 • arXiv:2408.09174
105 citations
#16

TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment

Chenxi Liu, Qianxiong Xu, Hao Miao et al.

AAAI 2025 • arXiv:2406.01638
100 citations
#17

CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning

Peiyuan Liu, Hang Guo, Tao Dai et al.

AAAI 2025 • arXiv:2403.07300
95 citations
#18

Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference

Zhihang Lin, Mingbao Lin, Luxi Lin et al.

AAAI 2025 • arXiv:2405.05803
90 citations
#19

AnalogCoder: Analog Circuit Design via Training-Free Code Generation

Yao Lai, Sungyoung Lee, Guojin Chen et al.

AAAI 2025 • arXiv:2405.14918
87 citations
#20

Point Cloud Mamba: Point Cloud Learning via State Space Model

Tao Zhang, Haobo Yuan, Lu Qi et al.

AAAI 2025 • arXiv:2403.00762
84 citations
#21

WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

Yao Zhang, Zijian Ma, Yunpu Ma et al.

AAAI 2025 • arXiv:2408.15978
83 citations
#22

Enhance Vision-Language Alignment with Noise

Sida Huang, Hongyuan Zhang, Xuelong Li

AAAI 2025 • arXiv:2412.10817
82 citations
#23

DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching

Ming Gui, Johannes Schusterbauer, Ulrich Prestel et al.

AAAI 2025
82 citations
#24

Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Wenbin Wang, Liang Ding, Minyan Zeng et al.

AAAI 2025 • arXiv:2408.15556
81 citations
#25

Mamba YOLO: A Simple Baseline for Object Detection with State Space Model

Zeyu Wang, Chen Li, Huiying Xu et al.

AAAI 2025 • arXiv:2406.05835
80 citations
#26

VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool

Chia-Tung Ho, Haoxing Ren, Brucek Khailany

AAAI 2025 • arXiv:2408.08927
78 citations
#27

ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data

Chengsen Wang, Qi Qi, Jingyu Wang et al.

AAAI 2025 • arXiv:2412.11376
78 citations
#28

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Zhen Ye, Peiwen Sun, Jiahe Lei et al.

AAAI 2025 • arXiv:2408.17175
75 citations
#29

DiT4Edit: Diffusion Transformer for Image Editing

Kunyu Feng, Yue Ma, Bingyuan Wang et al.

AAAI 2025 • arXiv:2411.03286
73 citations
#30

Augmenting Math Word Problems via Iterative Question Composing

Haoxiong Liu, Yifan Zhang, Yifan Luo et al.

AAAI 2025 • arXiv:2401.09003
69 citations
#31

XCOT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning

Linzheng Chai, Jian Yang, Tao Sun et al.

AAAI 2025 • arXiv:2401.07037
66 citations
#32

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering

Yakun Song, Zhuo Chen, Xiaofei Wang et al.

AAAI 2025 • arXiv:2401.07333
66 citations
#33

Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning

Yiming Huang, Xiao Liu, Yeyun Gong et al.

AAAI 2025 • arXiv:2403.02333
65 citations
#34

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models

Weihao Ye, Qiong Wu, Wenhao Lin et al.

AAAI 2025 • arXiv:2409.10197
64 citations
#35

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

Fei Shen, Hu Ye, Sibo Liu et al.

AAAI 2025 • arXiv:2407.02482
64 citations
#36

FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection

Yao Xiao, Tingfa Xu, Yu Xin et al.

AAAI 2025 • arXiv:2504.20670
62 citations
#37

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

Wenyi Xiao, Ziwei Huang, Leilei Gan et al.

AAAI 2025 • arXiv:2404.14233
61 citations
#38

C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection

Chuangchuang Tan, Renshuai Tao, Huan Liu et al.

AAAI 2025 • arXiv:2408.09647
61 citations
#39

Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning

Wenwen Zhuang, Xin Huang, Xiantao Zhang et al.

AAAI 2025 • arXiv:2408.08640
60 citations
#40

Unlocking the Power of LSTM for Long Term Time Series Forecasting

Yaxuan Kong, Zepu Wang, Yuqi Nie et al.

AAAI 2025 • arXiv:2408.10006
59 citations
#41

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

Junxian Li, Di Zhang, Xunzhi Wang et al.

AAAI 2025 • arXiv:2408.07246
58 citations
#42

VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

Yongxin Guo, Jingyu Liu, Mingda Li et al.

AAAI 2025 • arXiv:2405.13382
57 citations
#43

FlowPolicy: Enabling Fast and Robust 3D Flow-Based Policy via Consistency Flow Matching for Robot Manipulation

Qinglun Zhang, Zhen Liu, Haoqiang Fan et al.

AAAI 2025 • arXiv:2412.04987
56 citations
#44

DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving

Wencheng Han, Dongqian Guo, Cheng-Zhong Xu et al.

AAAI 2025 • arXiv:2401.03641
56 citations
#45

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

Tao Wu, Yong Zhang, Xintao Wang et al.

AAAI 2025 • arXiv:2408.13239
55 citations
#46

VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

Chao Pang, Xingxing Weng, Jiang Wu et al.

AAAI 2025 • arXiv:2403.20213
54 citations
#47

Calibrating Large Language Models with Sample Consistency

Qing Lyu, Kumar Shridhar, Chaitanya Malaviya et al.

AAAI 2025 • arXiv:2402.13904
52 citations
#48

Language Model Can Listen While Speaking

Ziyang Ma, Yakun Song, Chenpeng Du et al.

AAAI 2025 • arXiv:2408.02622
51 citations
#49

Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.

AAAI 2025 • arXiv:2405.15304
51 citations
#50

TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

Chuanrui Zhang, Yingshuang Zou, Zhuoling Li et al.

AAAI 2025 • arXiv:2408.13770
51 citations
#51

MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation

Jinfeng Xu, Zheyu Chen, Shuo Yang et al.

AAAI 2025 • arXiv:2402.19407
50 citations
#52

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Hang Hua, Yunlong Tang, Chenliang Xu et al.

AAAI 2025 • arXiv:2404.12353
50 citations
#53

Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation

Yuyang Ye, Zhi Zheng, Yishan Shen et al.

AAAI 2025 • arXiv:2408.09698
49 citations
#54

Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation

Jiaqi Chen, Bingqian Lin, Xinmin Liu et al.

AAAI 2025 • arXiv:2407.05890
48 citations
#55

DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis

Pan Wang, Qiang Zhou, Yawen Wu et al.

AAAI 2025 • arXiv:2412.12225
47 citations
#56

Image Conductor: Precision Control for Interactive Video Synthesis

Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.

AAAI 2025 • arXiv:2406.15339
46 citations
#57

HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection

Zican Shi, Jing Hu, Jie Ren et al.

AAAI 2025 • arXiv:2412.10116
46 citations
#58

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh

AAAI 2025 • arXiv:2412.14995
46 citations
#59

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Chenyang Zhu, Kai Li, Yue Ma et al.

AAAI 2025 • arXiv:2404.14239
46 citations
#60

End-to-End Autonomous Driving Through V2X Cooperation

Haibao Yu, Wenxian Yang, Jiaru Zhong et al.

AAAI 2025 • arXiv:2404.00717
46 citations
#61

Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition

Kun Li, Dan Guo, Guoliang Chen et al.

AAAI 2025 • arXiv:2412.14719
44 citations
#62

Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

Yu Yang, Jianbiao Mei, Yukai Ma et al.

AAAI 2025 • arXiv:2408.14197
43 citations
#63

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Wanggui He, Siming Fu, Mushui Liu et al.

AAAI 2025 • arXiv:2407.07614
43 citations
#64

ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval

Zixu Li, Zhiwei Chen, Haokun Wen et al.

AAAI 2025
42 citations
#65

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

Xiantao Hu, Ying Tai, Xu Zhao et al.

AAAI 2025 • arXiv:2412.15691
42 citations
#66

Transformer Layers as Painters

Qi Sun, Marc Pickett, Aakash Kumar Nain et al.

AAAI 2025 • arXiv:2407.09298
42 citations
#67

Learning to Prompt with Text Only Supervision for Vision-Language Models

Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.

AAAI 2025 • arXiv:2401.02418
42 citations
#68

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Han Shu, Wenshuo Li, Yehui Tang et al.

AAAI 2025 • arXiv:2312.13789
41 citations
#69

SUTrack: Towards Simple and Unified Single Object Tracking

Xin Chen, Ben Kang, Wanting Geng et al.

AAAI 2025 • arXiv:2412.19138
41 citations
#70

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Clément Chadebec, Onur Tasar, Eyal Benaroche et al.

AAAI 2025 • arXiv:2406.02347
40 citations
#71

RATT: A Thought Structure for Coherent and Correct LLM Reasoning

Jinghan Zhang, Xiting Wang, Weijieying Ren et al.

AAAI 2025 • arXiv:2406.02746
39 citations
#72

Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions

Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.

AAAI 2025 • arXiv:2408.08781
39 citations
#73

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Bojia Zi, Shihao Zhao, Xianbiao Qi et al.

AAAI 2025 • arXiv:2403.12035
39 citations
#74

Multi-Objective Evolution of Heuristic Using Large Language Model

Shunyu Yao, Fei Liu, Xi Lin et al.

AAAI 2025 • arXiv:2409.16867
39 citations
#75

GFlow: Recovering 4D World from Monocular Video

Shizun Wang, Xingyi Yang, Qiuhong Shen et al.

AAAI 2025 • arXiv:2405.18426
38 citations
#76

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025 • arXiv:2412.12444
38 citations
#77

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.

AAAI 2025 • arXiv:2407.19548
38 citations
#78

Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community

Jiancheng Pan, Yanxing Liu, Yuqian Fu et al.

AAAI 2025 • arXiv:2408.09110
37 citations
#79

Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance

Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.

AAAI 2025 • arXiv:2412.12974
36 citations
#80

Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models

Lingzhi Wang, Xingshan Zeng, Jinsong Guo et al.

AAAI 2025 • arXiv:2402.05813
36 citations
#81

MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL

Arian Askari, Christian Poelitz, Xinye Tang

AAAI 2025 • arXiv:2406.12692
36 citations
#82

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

Haojun Shi, Suyu Ye, Xinyu Fang et al.

AAAI 2025 • arXiv:2408.12574
36 citations
#83

Read, Watch and Scream! Sound Generation from Text and Video

Yujin Jeong, Yunji Kim, Sanghyuk Chun et al.

AAAI 2025 • arXiv:2407.05551
36 citations
#84

Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment

Congzhi Zhang, Linhai Zhang, Jialong Wu et al.

AAAI 2025 • arXiv:2403.02738
36 citations
#85

xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition

Artyom Stitsyuk, Jaesik Choi

AAAI 2025 • arXiv:2412.17323
36 citations
#86

SCALM: Detecting Bad Practices in Smart Contracts Through LLMs

Zongwei Li, Xiaoqi Li, Wenkai Li et al.

AAAI 2025 • arXiv:2502.04347
35 citations
#87

Improving Retrieval Augmented Language Model with Self-Reasoning

Yuan Xia, Jingbo Zhou, Zhenhui Shi et al.

AAAI 2025 • arXiv:2407.19813
35 citations
#88

Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting

Yifan Hu, Peiyuan Liu, Peng Zhu et al.

AAAI 2025 • arXiv:2406.03751
35 citations
#89

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

Qijian Tian, Xin Tan, Yuan Xie et al.

AAAI 2025 • arXiv:2409.12753
35 citations
#90

LLM-Powered User Simulator for Recommender System

Zijian Zhang, Shuchang Liu, Ziru Liu et al.

AAAI 2025 • arXiv:2412.16984
35 citations
#91

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Derong Xu, Xinhang Li, Ziheng Zhang et al.

AAAI 2025 • arXiv:2412.18537
34 citations
#92

Fair Text-to-Image Diffusion via Fair Mapping

Jia Li, Lijie Hu, Jingfeng Zhang et al.

AAAI 2025 • arXiv:2311.17695
33 citations
#93

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye et al.

AAAI 2025 • arXiv:2405.20535
33 citations
#94

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.

AAAI 2025 • arXiv:2409.01227
33 citations
#95

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Mushui Liu, Yuhang Ma, Zhen Yang et al.

AAAI 2025 • arXiv:2407.00737
33 citations
#96

Stable-Hair: Real-World Hair Transfer via Diffusion Model

Yuxuan Zhang, Qing Zhang, Yiren Song et al.

AAAI 2025 • arXiv:2407.14078
33 citations
#97

Guided Real Image Dehazing Using YCbCr Color Space

Wenxuan Fang, Junkai Fan, Yu Zheng et al.

AAAI 2025 • arXiv:2412.17496
33 citations
#98

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

Qihan Huang, Siming Fu, Jinlong Liu et al.

AAAI 2025 • arXiv:2409.17920
33 citations
#99

TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents

Geon Lee, Wenchao Yu, Kijung Shin et al.

AAAI 2025 • arXiv:2502.11418
33 citations
#100

Evolutionary Large Language Model for Automated Feature Transformation

Nanxu Gong, Chandan K Reddy, Wangyang Ying et al.

AAAI 2025 • arXiv:2405.16203
32 citations
#101

PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Qingdong He, Jiangning Zhang, Jinlong Peng et al.

AAAI 2025 • arXiv:2405.15214
32 citations
#102

ACPBench: Reasoning About Action, Change, and Planning

Harsha Kokel, Michael Katz, Kavitha Srinivas et al.

AAAI 2025 • arXiv:2410.05669
32 citations
#103

DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming

Jiaxin Zhang, Wentao Yang, Songxuan Lai et al.

AAAI 2025 • arXiv:2406.19101
32 citations
#104

Mesoscopic Insights: Orchestrating Multi-Scale & Hybrid Architecture for Image Manipulation Localization

Xuekang Zhu, Xiaochen Ma, Lei Su et al.

AAAI 2025 • arXiv:2412.13753
31 citations
#105

ConDSeg: A General Medical Image Segmentation Framework via Contrast-Driven Feature Enhancement

Mengqi Lei, Haochen Wu, Xinhua Lv et al.

AAAI 2025 • arXiv:2412.08345
31 citations
#106

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification

Yuchen Tian, Weixiang Yan, Qian Yang et al.

AAAI 2025 • arXiv:2405.00253
31 citations
#107

Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective

Can Jin, Tianjin Huang, Yihua Zhang et al.

AAAI 2025 • arXiv:2312.01397
30 citations
#108

LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

Jian Jia, Yipei Wang, Yan Li et al.

AAAI 2025 • arXiv:2405.03988
30 citations
#109

Exploring Enhanced Contextual Information for Video-Level Object Tracking

Ben Kang, Xin Chen, Simiao Lai et al.

AAAI 2025 • arXiv:2412.11023
30 citations
#110

CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models

Zihui Cheng, Qiguang Chen, Jin Zhang et al.

AAAI 2025 • arXiv:2412.12932
30 citations
#111

Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers

Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao et al.

AAAI 2025 • arXiv:2402.17564
30 citations
#112

Graphic Design with Large Multimodal Model

Yutao Cheng, Zhao Zhang, Maoke Yang et al.

AAAI 2025 • arXiv:2404.14368
29 citations
#113

Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions of Visual-Audio Content

Sheng Wu, Dongxiao He, Xiaobao Wang et al.

AAAI 2025 • arXiv:2412.10460
29 citations
#114

Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems

Weibo Gao, Qi Liu, Linan Yue et al.

AAAI 2025 • arXiv:2501.10332
29 citations
#115

SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks

Meng Lou, Yunxiang Fu, Yizhou Yu

AAAI 2025 • arXiv:2409.09649
28 citations
#116

TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Haitao Zhou, Chuang Wang, Rui Nie et al.

AAAI 2025 • arXiv:2408.11475
28 citations
#117

NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning

Xin Yi, Shunfan Zheng, Linlin Wang et al.

AAAI 2025 • arXiv:2412.12497
28 citations
#118

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

Yuhao Wang, Yang Liu, Aihua Zheng et al.

AAAI 2025 • arXiv:2412.10650
27 citations
#119

Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Spare-Coding Transformer

Lei Su, Xiaochen Ma, Xuekang Zhu et al.

AAAI 2025 • arXiv:2412.14598
27 citations
#120

A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

Minjie Zhu, Yichen Zhu, Ning Liu et al.

AAAI 2025 • arXiv:2403.06199
27 citations
#121

Perception-Guided Jailbreak Against Text-to-Image Models

Yihao Huang, Le Liang, Tianlin Li et al.

AAAI 2025 • arXiv:2408.10848
27 citations
#122

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

Qirui Chen, Shangzhe Di, Weidi Xie

AAAI 2025 • arXiv:2408.14469
27 citations
#123

CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

Siyu Wang, Cailian Chen, Xinyi Le et al.

AAAI 2025 • arXiv:2412.19663
27 citations
#124

When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline

Ming Li, Yongchun Gu, Yi Wang et al.

AAAI 2025
27 citations
#125

Why Does Dropping Edges Usually Outperform Adding Edges in Graph Contrastive Learning?

Yanchen Xu, Siqi Huang, Hongyuan Zhang et al.

AAAI 2025 • arXiv:2412.08128
27 citations
#126

Robust Tracking via Mamba-based Context-aware Token Learning

Jinxia Xie, Bineng Zhong, Qihua Liang et al.

AAAI 2025 • arXiv:2412.13611
26 citations
#127

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Baichuan Zhou, Haote Yang, Dairong Chen et al.

AAAI 2025 • arXiv:2408.17267
26 citations
#128

BLADE: Enhancing Black-Box Large Language Models with Small Domain-Specific Models

Haitao Li, Qingyao Ai, Jia Chen et al.

AAAI 2025 • arXiv:2403.18365
26 citations
#129

DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance

Younghyun Kim, Geunmin Hwang, Junyu Zhang et al.

AAAI 2025 • arXiv:2406.18459
26 citations
#130

Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing

Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.

AAAI 2025 • arXiv:2501.04376
26 citations
#131

DisCo: Graph-Based Disentangled Contrastive Learning for Cold-Start Cross-Domain Recommendation

Hourun Li, Yifan Wang, Zhiping Xiao et al.

AAAI 2025 • arXiv:2412.15005
26 citations
#132

Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models

Lucio La Cava, Andrea Tagarelli

AAAI 2025 • arXiv:2401.07115
26 citations
#133

Debate on Graph: A Flexible and Reliable Reasoning Framework for Large Language Models

Jie Ma, Zhitao Gao, Qi Chai et al.

AAAI 2025 • arXiv:2409.03155
26 citations
#134

BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion

Huafeng Li, Dayong Su, Qing Cai et al.

AAAI 2025 • arXiv:2412.08050
25 citations
#135

LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation

Qidong Liu, Xian Wu, Wanyu Wang et al.

AAAI 2025 • arXiv:2409.19925
25 citations
#136

Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine

Xiaoshuang Huang, Lingdong Shen, Jia Liu et al.

AAAI 2025 • arXiv:2412.09278
25 citations
#137

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding

Yunlong Tang, Daiki Shimada, Jing Bi et al.

AAAI 2025 • arXiv:2403.16276
25 citations
#138

HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models

Kazi Hasan Ibn Arif, JinYi Yoon, Dimitrios S. Nikolopoulos et al.

AAAI 2025 • arXiv:2408.10945
25 citations
#139

ParGo: Bridging Vision-Language with Partial and Global Views

An-Lan Wang, Bin Shan, Wei Shi et al.

AAAI 2025 • arXiv:2408.12928
25 citations
#140

MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

Zixuan Gong, Qi Zhang, Guangyin Bao et al.

AAAI 2025 • arXiv:2404.12630
25 citations
#141

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models

Hongbang Yuan, Zhuoran Jin, Pengfei Cao et al.

AAAI 2025 • arXiv:2408.10682
25 citations
#142

Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2025 • arXiv:2507.21606
24 citations
#143

Leveraging Large Language Models for Node Generation in Few-Shot Learning on Text-Attributed Graphs

Jianxiang Yu, Yuxiang Ren, Chenghua Gong et al.

AAAI 2025 • arXiv:2310.09872
24 citations
#144

WPMixer: Efficient Multi-Resolution Mixing for Long-Term Time Series Forecasting

Md Mahmuddun Nabi Murad, Mehmet Aktukmak, Yasin Yilmaz

AAAI 2025 • arXiv:2412.17176
24 citations
#145

EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Muye Huang, Han Lai, Xinyu Zhang et al.

AAAI 2025 • arXiv:2409.01577
24 citations
#146

Exploiting Diffusion Prior for Real-World Image Dehazing with Unpaired Training

Yunwei Lan, Zhigao Cui, Chang Liu et al.

AAAI 2025 • arXiv:2503.15017
24 citations
#147

CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Kaicheng Yang, Tiancheng Gu, Xiang An et al.

AAAI 2025 • arXiv:2408.09441
24 citations
#148

OpenVIS: Open-vocabulary Video Instance Segmentation

Pinxue Guo, Hao Huang, Peiyang He et al.

AAAI 2025 • arXiv:2305.16835
24 citations
#149

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

Yun Qu, Yuhang Jiang, Boyuan Wang et al.

AAAI 2025 • arXiv:2412.11120
24 citations
#150

FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping

Yuzhou Ji, He Zhu, Junshu Tang et al.

AAAI 2025 • arXiv:2406.01916
24 citations
#151

MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt

Yuhao Wang, Xuehu Liu, Tianyu Yan et al.

AAAI 2025 • arXiv:2412.10707
24 citations
#152

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models

Xiao Cui, Mo Zhu, Yulei Qin et al.

AAAI 2025 • arXiv:2412.14528
24 citations
#153

NightHaze: Nighttime Image Dehazing via Self-Prior Learning

Beibei Lin, Yeying Jin, Yan Wending et al.

AAAI 2025 • arXiv:2403.07408
24 citations
#154

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

Angela Castillo, Jonas Kohler, Juan C. Pérez et al.

AAAI 2025 • arXiv:2312.12487
23 citations
#155

Numerical Pruning for Efficient Autoregressive Models

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025 • arXiv:2412.12441
23 citations
#156

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

Jian Ma, Yonglin Deng, Chen Chen et al.

AAAI 2025 • arXiv:2407.02252
23 citations
#157

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Jiaxiang Cheng, Pan Xie, Xin Xia et al.

AAAI 2025 • arXiv:2403.02084
23 citations
#158

SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering

Zouying Cao, Yifei Yang, Hai Zhao

AAAI 2025 • arXiv:2408.11491
23 citations
#159

DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation

Qiming Zhu, Jialun Cao, Yaojie Lu et al.

AAAI 2025 • arXiv:2408.13204
23 citations
#160

Effective Diffusion Transformer Architecture for Image Super-Resolution

Kun Cheng, Lei Yu, Zhijun Tu et al.

AAAI 2025 • arXiv:2409.19589
23 citations
#161

L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

Xun Huang, Ziyu Xu, Hai Wu et al.

AAAI 2025 • arXiv:2408.03677
23 citations
#162

B2Opt: Learning to Optimize Black-box Optimization with Little Budget

Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.

AAAI 2025 • arXiv:2304.11787
22 citations
#163

Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

Soham Deshmukh, Shuo Han, Hazim Bukhari et al.

AAAI 2025 • arXiv:2407.18062
22 citations
#164

MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Haoyu Wang, Zhilu Zhang, Donglin Di et al.

AAAI 2025 • arXiv:2404.17364
22 citations
#165

FoldToken: Learning Protein Language via Vector Quantization and Beyond

Zhangyang Gao, Cheng Tan, Jue Wang et al.

AAAI 2025 • arXiv:2403.09673
22 citations
#166

Game4Loc: A UAV Geo-Localization Benchmark from Game Data

Yuxiang Ji, Boyong He, Zhuoyue Tan et al.

AAAI 2025 • arXiv:2409.16925
22 citations
#167

Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models

Yuanzhao Zhai, Tingkai Yang, Kele Xu et al.

AAAI 2025 • arXiv:2409.09345
22 citations
#168

Spatiotemporal-aware Trend-Seasonality Decomposition Network for Traffic Flow Forecasting

Lingxiao Cao, Bin Wang, Guiyuan Jiang et al.

AAAI 2025 • arXiv:2502.12213
21 citations
#169

SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control

Binyuan Huang, Yuqing Wen, Yucheng Zhao et al.

AAAI 2025 • arXiv:2403.19438
21 citations
#170

3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering

Qingyuan Zhou, Weidong Yang, Ben Fei et al.

AAAI 2025 • arXiv:2404.05522
21 citations
#171

Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?

Ben Yao, Yazhou Zhang, Qiuchi Li et al.

AAAI 2025 • arXiv:2407.12725
21 citations
#172

Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks

Shengbin Yue, Siyuan Wang, Wei Chen et al.

AAAI 2025 • arXiv:2407.09893
21 citations
#173

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

Weiyu Huang, Yuezhou Hu, Guohao Jian et al.

AAAI 2025 • arXiv:2407.20584
21 citations
#174

Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation

Chengyang Ye, Yunzhi Zhuge, Pingping Zhang

AAAI 2025 • arXiv:2412.19492
21 citations
#175

Training on the Benchmark Is Not All You Need

Shiwen Ni, Xiangtao Kong, Chengming Li et al.

AAAI 2025 • arXiv:2409.01790
21 citations
#176

Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification

Haojian Huang, Chuanyu Qin, Zhe Liu et al.

AAAI 2025 • arXiv:2409.00755
21 citations
#177

Argumentative Large Language Models for Explainable and Contestable Claim Verification

Gabriel Freedman, Adam Dejl, Deniz Gorur et al.

AAAI 2025 • arXiv:2405.02079
21 citations
#178

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Shuwei Shi, Wenbo Li, Yuechen Zhang et al.

AAAI 2025 • arXiv:2406.16476
21 citations
#179

Hierarchical Classification Auxiliary Network for Time Series Forecasting

Yanru Sun, Zongxia Xie, Dongyue Chen et al.

AAAI 2025 • arXiv:2405.18975
21 citations
#180

LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction

Er Jin, Qihui Feng, Yongli Mou et al.

AAAI 2025 • arXiv:2501.01767
21 citations
#181

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

Jiaqi Huang, Zunnan Xu, Ting Liu et al.

AAAI 2025 • arXiv:2501.08580
21 citations
#182

Controlling Large Language Models Through Concept Activation Vectors

Hanyu Zhang, Xiting Wang, Chengao Li et al.

AAAI 2025 • arXiv:2501.05764
20 citations
#183

The Illusion of Empathy: How AI Chatbots Shape Conversation Perception

Tingting Liu, Salvatore Giorgi, Ankit Aich et al.

AAAI 2025 • arXiv:2411.12877
20 citations
#184

What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph

Yutao Jiang, Qiong Wu, Wenhao Lin et al.

AAAI 2025 • arXiv:2501.02268
20 citations
#185

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching

Jixun Yao, Yang Yuguang, Yu Pan et al.

AAAI 2025 • arXiv:2412.04724
20 citations
#186

On Oversquashing in Graph Neural Networks Through the Lens of Dynamical Systems

Alessio Gravina, Moshe Eliasof, Claudio Gallicchio et al.

AAAI 2025 • arXiv:2405.01009
20 citations
#187

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Michael-Andrei Panaitescu-Liess, Zora Che, Bang An et al.

AAAI 2025 • arXiv:2407.17417
20 citations
#188

GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

Jintong Hu, Bin Xia, Bin Chen et al.

AAAI 2025 • arXiv:2407.18046
20 citations
#189

Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns

Yufeng Zhang, Xuepeng Wang, Lingxiang Wu et al.

AAAI 2025 • arXiv:2404.14812
20 citations
#190

Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution

Zeyu Xiao, Zhuoyuan Li, Wei Jia

AAAI 2025
20 citations
#191

SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

Shuaijie Shen, Chao Wang, Renzhuo Huang et al.

AAAI 2025 • arXiv:2408.14909
20 citations
#192

AdaDiff: Adaptive Step Selection for Fast Diffusion Models

Hui Zhang, Zuxuan Wu, Zhen Xing et al.

AAAI 2025 • arXiv:2311.14768
20 citations
#193

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

Yuchi Wang, Junliang Guo, Jianhong Bai et al.

AAAI 2025 • arXiv:2405.15758
20 citations
#194

Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration

Ziheng Zhou, Jinxing Zhou, Wei Qian et al.

AAAI 2025 • arXiv:2412.12628
20 citations
#195

Wavelet-Assisted Multi-Frequency Attention Network for Pansharpening

Jie Huang, Rui Huang, Jinghao Xu et al.

AAAI 2025 • arXiv:2502.04903
20 citations
#196

SAFIRE: Segment Any Forged Image Region

Myung-Joon Kwon, Wonjun Lee, Seung-Hun Nam et al.

AAAI 2025 • arXiv:2412.08197
20 citations
#197

Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising

Junyi Li, Zhilu Zhang, Wangmeng Zuo

AAAI 2025 • arXiv:2404.07846
19 citations
#198

Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

Kaiqing Lin, Yuzhen Lin, Weixiang Li et al.

AAAI 2025 • arXiv:2409.02664
19 citations
#199

Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

Shaofei Huang, Rui Ling, Hongyu Li et al.

AAAI 2025 • arXiv:2408.15876
19 citations
#200

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

Chao Zeng, Songwei Liu, Yusheng Xie et al.

AAAI 2025 • arXiv:2408.08554
19 citations