AAAI

5,317 papers tracked across 2 years

Top Papers in AAAI 2025

View all papers →
#1

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

Yichen Gong, Delong Ran, Jinyuan Liu et al.

AAAI 2025
283
citations
#2

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

Konstantin Klemmer, Esther Rolf, Caleb Robinson et al.

AAAI 2025
137
citations
#3

Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection

Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.

AAAI 2025
115
citations
#4

C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness

Yu Kang, Xianghui Sun, Liangyu Chen et al.

AAAI 2025
115
citations
#5

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Han Zhao, Min Zhang, Wei Zhao et al.

AAAI 2025
106
citations
#6

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Xianjie Wu, Jian Yang, Linzheng Chai et al.

AAAI 2025
99
citations
#7

DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching

Ming Gui, Johannes Schusterbauer, Ulrich Prestel et al.

AAAI 2025
82
citations
#8

Point Cloud Mamba: Point Cloud Learning via State Space Model

Tao Zhang, Haobo Yuan, Lu Qi et al.

AAAI 2025
81
citations
#9

AnalogCoder: Analog Circuit Design via Training-Free Code Generation

Yao Lai, Sungyoung Lee, Guojin Chen et al.

AAAI 2025
79
citations
#10

WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

Yao Zhang, Zijian Ma, Yunpu Ma et al.

AAAI 2025
74
citations
#11

Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Wenbin Wang, Liang Ding, Minyan Zeng et al.

AAAI 2025
73
citations
#12

VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool

Chia-Tung Ho, Haoxing Ren, Brucek Khailany

AAAI 2025
72
citations
#13

ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data

Chengsen Wang, Qi Qi, Jingyu Wang et al.

AAAI 2025
72
citations
#14

DiT4Edit: Diffusion Transformer for Image Editing

Kunyu Feng, Yue Ma, Bingyuan Wang et al.

AAAI 2025
69
citations
#15

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Zhen Ye, Peiwen Sun, Jiahe Lei et al.

AAAI 2025
65
citations
#16

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering

Yakun Song, Zhuo Chen, Xiaofei Wang et al.

AAAI 2025
64
citations
#17

Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning

Yiming Huang, Xiao Liu, Yeyun Gong et al.

AAAI 2025
63
citations
#18

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

Fei Shen, Hu Ye, Sibo Liu et al.

AAAI 2025
62
citations
#19

Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning

Wenwen Zhuang, Xin Huang, Xiantao Zhang et al.

AAAI 2025
58
citations
#20

FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection

Yao Xiao, Tingfa Xu, Yu Xin et al.

AAAI 2025
55
citations