Most Cited AAAI Highlight "long-term consistency" Papers

5,317 papers found • Page 1 of 27

#1

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion

Chong Mou, Xintao Wang, Liangbin Xie et al.

AAAI 2024paperarXiv:2302.08453
1423
citations
#2

Benchmarking Large Language Models in Retrieval-Augmented Generation

Jiawei Chen, Hongyu Lin, Xianpei Han et al.

AAAI 2024paperarXiv:2309.01431
469
citations
#3

Preference Ranking Optimization for Human Alignment

Feifan Song, Bowen Yu, Minghao Li et al.

AAAI 2024paperarXiv:2306.17492
335
citations
#4

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

Yichen Gong, Delong Ran, Jinyuan Liu et al.

AAAI 2025paperarXiv:2311.05608
283
citations
#5

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos

Yue Ma, Yingqing He, Xiaodong Cun et al.

AAAI 2024paperarXiv:2304.01186
276
citations
#6

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

Gengze Zhou, Yicong Hong, Qi Wu

AAAI 2024paperarXiv:2305.16986
276
citations
#7

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving

Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.

AAAI 2024paperarXiv:2305.14836
267
citations
#8

Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal, Jihan Yin, Erhan Bas

AAAI 2024paperarXiv:2308.06394
263
citations
#9

MedSegDiff-V2: Diffusion-based Medical Image Segmentation with Transformer

Junde Wu, Wei Ji, Huazhu Fu et al.

AAAI 2024paperarXiv:2301.11798
259
citations
#10

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

AAAI 2024paperarXiv:2308.15366
249
citations
#11

Omni-Kernel Network for Image Restoration

Yuning Cui, Wenqi Ren, Alois Knoll

AAAI 2024paper
235
citations
#12

Knowledge Graph Prompting for Multi-Document Question Answering

Yu Wang, Nedim Lipka, Ryan A. Rossi et al.

AAAI 2024paperarXiv:2308.11730
235
citations
#13

Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue

Songhua Yang, Hanjie Zhao, Senbin Zhu et al.

AAAI 2024paperarXiv:2308.03549
204
citations
#14

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Wenbo Hu, Yifan Xu, Yi Li et al.

AAAI 2024paperarXiv:2308.09936
190
citations
#15

MSGNet: Learning Multi-Scale Inter-series Correlations for Multivariate Time Series Forecasting

Wanlin Cai, Yuxuan Liang, Xianggen Liu et al.

AAAI 2024paperarXiv:2401.00423
177
citations
#16

Fast Machine Unlearning without Retraining through Selective Synaptic Dampening

Jack Foster, Stefan Schoepf, Alexandra Brintrup

AAAI 2024paperarXiv:2308.07707
175
citations
#17

ODTrack: Online Dense Temporal Token Learning for Visual Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2024paperarXiv:2401.01686
173
citations
#18

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

AAAI 2024paperarXiv:2308.11681
156
citations
#19

AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model

Teng Hu, Jiangning Zhang, Ran Yi et al.

AAAI 2024paperarXiv:2312.05767
140
citations
#20

ResDiff: Combining CNN and Diffusion Model for Image Super-resolution

Shuyao Shang, Zhengyang Shan, Guangxing Liu et al.

AAAI 2024paperarXiv:2303.08714
139
citations
#21

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

Konstantin Klemmer, Esther Rolf, Caleb Robinson et al.

AAAI 2025paperarXiv:2311.17179
137
citations
#22

Language Prompt for Autonomous Driving

Dongming Wu, Wencheng Han, Yingfei Liu et al.

AAAI 2025paperarXiv:2309.04379
133
citations
#23

Task Contamination: Language Models May Not Be Few-Shot Anymore

Changmao Li, Jeffrey Flanigan

AAAI 2024paperarXiv:2312.16337
130
citations
#24

SCTNet: Single Branch CNN with Transformer Semantic Information for Real-Time Segmentation

Zhengze Xu, Dongyue Wu, Changqian Yu et al.

AAAI 2024paperarXiv:2312.17071
129
citations
#25

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

Liangtai Sun, Yang Han, Zihan Zhao et al.

AAAI 2024paperarXiv:2308.13149
127
citations
#26

C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness

Yu Kang, Xianghui Sun, Liangyu Chen et al.

AAAI 2025paperarXiv:2412.11664
125
citations
#27

FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

Zhenyu Li, Sunqi Fan, Yu Gu et al.

AAAI 2024paperarXiv:2308.12060
122
citations
#28

Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection

Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.

AAAI 2025paperarXiv:2412.16986
115
citations
#29

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

Wenxi Yue, Jing Zhang, Kun Hu et al.

AAAI 2024paperarXiv:2308.08746
112
citations
#30

Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting

Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.

AAAI 2024paperarXiv:2311.13314
108
citations
#31

Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations

Likang Wu, Zhaopeng Qiu, Zhi Zheng et al.

AAAI 2024paperarXiv:2307.05722
107
citations
#32

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Han Zhao, Min Zhang, Wei Zhao et al.

AAAI 2025paperarXiv:2403.14520
106
citations
#33

TimesURL: Self-Supervised Contrastive Learning for Universal Time Series Representation Learning

Jiexi Liu, Songcan Chen

AAAI 2024paperarXiv:2312.15709
102
citations
#34

Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data

Yucheng Wang, Yuecong Xu, Jianfei Yang et al.

AAAI 2024paperarXiv:2309.05305
100
citations
#35

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models

Changhun Lee, Jungyu Jin, Taesu Kim et al.

AAAI 2024paperarXiv:2306.02272
100
citations
#36

An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention

Yehjin Shin, Jeongwhan Choi, Hyowon Wi et al.

AAAI 2024paperarXiv:2312.10325
99
citations
#37

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Xianjie Wu, Jian Yang, Linzheng Chai et al.

AAAI 2025paperarXiv:2408.09174
99
citations
#38

Rolling-Unet: Revitalizing MLP’s Ability to Efficiently Extract Long-Distance Dependencies for Medical Image Segmentation

Yutong Liu, Haijiang Zhu, Mengting Liu et al.

AAAI 2024paper
98
citations
#39

UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation

Kefu Yi, Kai Luo, Xiaolei Luo et al.

AAAI 2024paperarXiv:2312.08952
97
citations
#40

An Empirical Study of CLIP for Text-Based Person Search

Cao Min, Yang Bai, Ziyin Zeng et al.

AAAI 2024paperarXiv:2308.10045
96
citations
#41

Fluctuation-Based Adaptive Structured Pruning for Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

AAAI 2024paperarXiv:2312.11983
96
citations
#42

PointAttN: You Only Need Attention for Point Cloud Completion

Jun Wang, Ying Cui, Dongyan Guo et al.

AAAI 2024paper
92
citations
#43

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models

Ruichen Wang, Zekang Chen, Chen Chen et al.

AAAI 2024paperarXiv:2305.13921
92
citations
#44

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

Taylor Sorensen, Liwei Jiang, Jena Hwang et al.

AAAI 2024paperarXiv:2309.00779
91
citations
#45

Decoupled Contrastive Multi-View Clustering with High-Order Random Walks

Yiding Lu, Yijie Lin, Mouxing Yang et al.

AAAI 2024paperarXiv:2308.11164
90
citations
#46

Reliable Conflictive Multi-View Learning

Cai Xu, Jiajun Si, Ziyu Guan et al.

AAAI 2024paperarXiv:2402.16897
88
citations
#47

VIGC: Visual Instruction Generation and Correction

Bin Wang, Fan Wu, Xiao Han et al.

AAAI 2024paperarXiv:2308.12714
87
citations
#48

FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning

Haokun Chen, Yao Zhang, Denis Krompass et al.

AAAI 2024paperarXiv:2308.12305
86
citations
#49

VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Yi Xin, Junlong Du, Qiang Wang et al.

AAAI 2024paperarXiv:2312.08733
82
citations
#50

Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation

Shuanghao Bai, Min Zhang, Wanqi Zhou et al.

AAAI 2024paperarXiv:2312.09553
82
citations
#51

DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching

Ming Gui, Johannes Schusterbauer, Ulrich Prestel et al.

AAAI 2025paper
82
citations
#52

DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

Yunfan Ye, Yuhang Huang, Renjiao Yi et al.

AAAI 2024paperarXiv:2401.02032
81
citations
#53

Point Cloud Mamba: Point Cloud Learning via State Space Model

Tao Zhang, Haobo Yuan, Lu Qi et al.

AAAI 2025paperarXiv:2403.00762
81
citations
#54

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.

AAAI 2024paperarXiv:2401.12863
80
citations
#55

WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

Yao Zhang, Zijian Ma, Yunpu Ma et al.

AAAI 2025paperarXiv:2408.15978
79
citations
#56

AnalogCoder: Analog Circuit Design via Training-Free Code Generation

Yao Lai, Sungyoung Lee, Guojin Chen et al.

AAAI 2025paperarXiv:2405.14918
79
citations
#57

GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time

Haoran Ye, Jiarui Wang, Helan Liang et al.

AAAI 2024paperarXiv:2312.08224
76
citations
#58

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.

AAAI 2024paperarXiv:2305.15685
76
citations
#59

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

Baoquan Zhang, Chuyao Luo, Demin Yu et al.

AAAI 2024paperarXiv:2307.16424
76
citations
#60

VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool

Chia-Tung Ho, Haoxing Ren, Brucek Khailany

AAAI 2025paperarXiv:2408.08927
76
citations
#61

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

Heng Wang, Jianbo Ma, Santiago Pascual et al.

AAAI 2024paperarXiv:2308.09300
75
citations
#62

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Zhenhua Yang, Dezhi Peng, Yuxin Kong et al.

AAAI 2024paperarXiv:2312.12142
74
citations
#63

Graph Neural Prompting with Large Language Models

Yijun Tian, Huan Song, Zichen Wang et al.

AAAI 2024paperarXiv:2309.15427
74
citations
#64

ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data

Chengsen Wang, Qi Qi, Jingyu Wang et al.

AAAI 2025paperarXiv:2412.11376
74
citations
#65

Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Wenbin Wang, Liang Ding, Minyan Zeng et al.

AAAI 2025paperarXiv:2408.15556
73
citations
#66

FedASMU: Efficient Asynchronous Federated Learning with Dynamic Staleness-Aware Model Update

Ji Liu, Juncheng Jia, Tianshi Che et al.

AAAI 2024paperarXiv:2312.05770
72
citations
#67

Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks

Yingpeng Du, Di Luo, Rui Yan et al.

AAAI 2024paperarXiv:2307.10747
72
citations
#68

Temporal Adaptive RGBT Tracking with Modality Prompt

Hongyu Wang, Xiaotao Liu, Yifan Li et al.

AAAI 2024paperarXiv:2401.01244
71
citations
#69

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Zhewei Yao, Xiaoxia Wu, Cheng Li et al.

AAAI 2024paperarXiv:2303.08302
71
citations
#70

SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation

Malyaban Bal, Abhronil Sengupta

AAAI 2024paperarXiv:2308.10873
70
citations
#71

SkeletonGait: Gait Recognition Using Skeleton Maps

Chao Fan, Jingzhe Ma, Dongyang Jin et al.

AAAI 2024paperarXiv:2311.13444
70
citations
#72

Plug-In Diffusion Model for Sequential Recommendation

Haokai Ma, Ruobing Xie, Lei Meng et al.

AAAI 2024paperarXiv:2401.02913
69
citations
#73

DiT4Edit: Diffusion Transformer for Image Editing

Kunyu Feng, Yue Ma, Bingyuan Wang et al.

AAAI 2025paperarXiv:2411.03286
69
citations
#74

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

Xiao Wang, Zongzhen Wu, Bo Jiang et al.

AAAI 2024paperarXiv:2211.09648
69
citations
#75

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Zhen Ye, Peiwen Sun, Jiahe Lei et al.

AAAI 2025paperarXiv:2408.17175
68
citations
#76

Learning to Rank in Generative Retrieval

Yongqi Li, Nan Yang, Liang Wang et al.

AAAI 2024paperarXiv:2306.15222
67
citations
#77

NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields

Junge Zhang, Feihu Zhang, Shaochen Kuang et al.

AAAI 2024paperarXiv:2304.14811
66
citations
#78

Augmenting Math Word Problems via Iterative Question Composing

Haoxiong Liu, Yifan Zhang, Yifan Luo et al.

AAAI 2025paperarXiv:2401.09003
66
citations
#79

DiffusionTrack: Diffusion Model for Multi-Object Tracking

Run Luo, Zikai Song, Lintao Ma et al.

AAAI 2024paperarXiv:2308.09905
66
citations
#80

Make RepVGG Greater Again: A Quantization-Aware Approach

Xuesong Nie, Yunfeng Yan, Siyuan Li et al.

AAAI 2024paperarXiv:2212.01593
65
citations
#81

Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning

Yiming Huang, Xiao Liu, Yeyun Gong et al.

AAAI 2025paperarXiv:2403.02333
65
citations
#82

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering

Yakun Song, Zhuo Chen, Xiaofei Wang et al.

AAAI 2025paperarXiv:2401.07333
64
citations
#83

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

Fei Shen, Hu Ye, Sibo Liu et al.

AAAI 2025paperarXiv:2407.02482
62
citations
#84

Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders

Yaohua Zha, Huizhen Ji, Jinmin Li et al.

AAAI 2024paperarXiv:2312.10726
61
citations
#85

Gated Attention Coding for Training High-Performance and Efficient Spiking Neural Networks

Xuerui Qiu, Rui-Jie Zhu, Yuhong Chou et al.

AAAI 2024paperarXiv:2308.06582
61
citations
#86

Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks

Yufei Guo, Yuanpei Chen, Xiaode Liu et al.

AAAI 2024paperarXiv:2312.06372
60
citations
#87

FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection

Yao Xiao, Tingfa Xu, Yu Xin et al.

AAAI 2025paperarXiv:2504.20670
59
citations
#88

HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning

Xingtong Yu, Yuan Fang, Zemin Liu et al.

AAAI 2024paperarXiv:2312.01878
59
citations
#89

Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

AAAI 2024paperarXiv:2308.13198
59
citations
#90

DocFormerv2: Local Features for Document Understanding

Srikar Appalaraju, Peng Tang, Qi Dong et al.

AAAI 2024paperarXiv:2306.01733
58
citations
#91

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

Chenpeng Du, Yiwei Guo, Feiyu Shen et al.

AAAI 2024paperarXiv:2306.07547
58
citations
#92

Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning

Wenwen Zhuang, Xin Huang, Xiantao Zhang et al.

AAAI 2025paperarXiv:2408.08640
58
citations
#93

Correlation Matching Transformation Transformers for UHD Image Restoration

Cong Wang, Jinshan Pan, Wei Wang et al.

AAAI 2024paperarXiv:2406.00629
58
citations
#94

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Shilin Yan, Renrui Zhang, Ziyu Guo et al.

AAAI 2024paperarXiv:2305.16318
58
citations
#95

MASTER: Market-Guided Stock Transformer for Stock Price Forecasting

Tong Li, Zhaoyang Liu, Yanyan Shen et al.

AAAI 2024paperarXiv:2312.15235
57
citations
#96

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

AAAI 2024paperarXiv:2309.16137
57
citations
#97

Editing Language Model-Based Knowledge Graph Embeddings

AAAI 2024paperarXiv:2305.14908
57
citations
#98

Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

Taeyoon Kwon, Kai Ong, Dongjin Kang et al.

AAAI 2024paperarXiv:2312.07399
57
citations
#99

PC-Conv: Unifying Homophily and Heterophily with Two-Fold Filtering

Bingheng Li, Erlin Pan, Zhao Kang

AAAI 2024paperarXiv:2312.14438
57
citations
#100

SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency

Feiyu Zhu, Reid Simmons

AAAI 2024paperarXiv:2303.07033
56
citations
#101

SECap: Speech Emotion Captioning with Large Language Model

Yaoxun Xu, Hangting Chen, Jianwei Yu et al.

AAAI 2024paperarXiv:2312.10381
56
citations
#102

Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models

Yuqi Zhu, Jia Li, Ge Li et al.

AAAI 2024paperarXiv:2309.02772
56
citations
#103

DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency

Wenfang Yao, Kejing Yin, William Cheung et al.

AAAI 2024paperarXiv:2403.06197
56
citations
#104

Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy

Yu Fu, Deyi Xiong, Yue Dong

AAAI 2024paperarXiv:2307.13808
55
citations
#105

Delving into Multimodal Prompting for Fine-Grained Visual Classification

Xin Jiang, Hao Tang, Junyao Gao et al.

AAAI 2024paperarXiv:2309.08912
55
citations
#106

TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning

Siheng Xiong, Yuan Yang, Ali Payani et al.

AAAI 2024paperarXiv:2312.15816
55
citations
#107

VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting

Seunggu Kang, WonJun Moon, Euiyeon Kim et al.

AAAI 2024paperarXiv:2312.16580
54
citations
#108

VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

Chao Pang, Xingxing Weng, Jiang Wu et al.

AAAI 2025paperarXiv:2403.20213
53
citations
#109

LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction

Yucheng Li, Frank Guerin, Chenghua Lin

AAAI 2024paperarXiv:2312.12343
53
citations
#110

GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking

Shu Yin, Peican Zhu, Lianwei Wu et al.

AAAI 2024paperarXiv:2312.05739
53
citations
#111

SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

Dong Wu, Mingmin Chi, Xuan Zang et al.

AAAI 2024paperarXiv:2309.00526
52
citations
#112

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models

Weihao Ye, Qiong Wu, Wenhao Lin et al.

AAAI 2025paperarXiv:2409.10197
52
citations
#113

TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

Chuanrui Zhang, Yingshuang Zou, Zhuoling Li et al.

AAAI 2025paperarXiv:2408.13770
51
citations
#114

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark

Fangjun Li, David C. Hogg, Anthony G. Cohn

AAAI 2024paperarXiv:2401.03991
51
citations
#115

Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.

AAAI 2025paperarXiv:2405.15304
51
citations
#116

Calibrating Large Language Models with Sample Consistency

Qing Lyu, Kumar Shridhar, Chaitanya Malaviya et al.

AAAI 2025paperarXiv:2402.13904
50
citations
#117

Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition

Jianyang Xie, Yanda Meng, Yitian Zhao et al.

AAAI 2024paper
50
citations
#118

MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation

Jinfeng Xu, Zheyu Chen, Shuo Yang et al.

AAAI 2025paperarXiv:2402.19407
50
citations
#119

EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

Junjue Wang, Zhuo Zheng, Zihang Chen et al.

AAAI 2024paperarXiv:2312.12222
50
citations
#120

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

Chen Zhang, L. F. D’Haro, Yiming Chen et al.

AAAI 2024paperarXiv:2312.15407
49
citations
#121

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations

Yufeng Huang, Jiji Tang, Zhuo Chen et al.

AAAI 2024paperarXiv:2305.06152
49
citations
#122

Improving Audio-Visual Segmentation with Bidirectional Generation

Dawei Hao, Yuxin Mao, Bowen He et al.

AAAI 2024paperarXiv:2308.08288
48
citations
#123

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Hang Hua, Yunlong Tang, Chenliang Xu et al.

AAAI 2025paperarXiv:2404.12353
48
citations
#124

Feature Fusion from Head to Tail for Long-Tailed Visual Recognition

Mengke Li, Zhikai Hu, Yang Lu et al.

AAAI 2024paperarXiv:2306.06963
48
citations
#125

Reinforced Adaptive Knowledge Learning for Multimodal Fake News Detection

Litian Zhang, Xiaoming Zhang, Chaozhuo Li et al.

AAAI 2024paper
48
citations
#126

Language Model Can Listen While Speaking

Ziyang Ma, Yakun Song, Chenpeng Du et al.

AAAI 2025paperarXiv:2408.02622
47
citations
#127

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

Hongcheng Guo, Jian Yang, Jiaheng Liu et al.

AAAI 2024paperarXiv:2401.04749
47
citations
#128

Image Conductor: Precision Control for Interactive Video Synthesis

Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.

AAAI 2025paperarXiv:2406.15339
46
citations
#129

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Chenyang Zhu, Kai Li, Yue Ma et al.

AAAI 2025paperarXiv:2404.14239
46
citations
#130

S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention

Chiyu Zhang, Xiaogang Xu, Lei Wang et al.

AAAI 2024paperarXiv:2210.12381
46
citations
#131

Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification

Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.

AAAI 2024paper
45
citations
#132

TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation

Yuhao Wang, Xuehu Liu, Pingping Zhang et al.

AAAI 2024paperarXiv:2312.09612
45
citations
#133

LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs

Yan Wang, Zhixuan Chu, Xin Ouyang et al.

AAAI 2024paper
44
citations
#134

Fine-Grained Prototypes Distillation for Few-Shot Object Detection

Zichen Wang, Bo Yang, Haonan Yue et al.

AAAI 2024paperarXiv:2401.07629
44
citations
#135

DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models

Namhyuk Ahn, Junsoo Lee, Chunggi Lee et al.

AAAI 2024paperarXiv:2309.06933
44
citations
#136

End-to-End Autonomous Driving Through V2X Cooperation

Haibao Yu, Wenxian Yang, Jiaru Zhong et al.

AAAI 2025paperarXiv:2404.00717
44
citations
#137

PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine

Chenrui Zhang, Lin Liu, Chuyuan Wang et al.

AAAI 2024paperarXiv:2308.12033
43
citations
#138

Debiasing Multimodal Sarcasm Detection with Contrastive Learning

Mengzhao Jia, Can Xie, Liqiang Jing

AAAI 2024paperarXiv:2312.10493
43
citations
#139

Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style

Shuai Tan, Bin Ji, Ye Pan

AAAI 2024paperarXiv:2403.06365
43
citations
#140

DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis

Pan Wang, Qiang Zhou, Yawen Wu et al.

AAAI 2025paperarXiv:2412.12225
43
citations
#141

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

Xinyi He, Mengyu Zhou, Xinrun Xu et al.

AAAI 2024paperarXiv:2312.13671
43
citations
#142

ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval

Zixu Li, Zhiwei Chen, Haokun Wen et al.

AAAI 2025paper
42
citations
#143

Large Language Models Are Neurosymbolic Reasoners

Meng Fang, Shilong Deng, Yudi Zhang et al.

AAAI 2024paperarXiv:2401.09334
41
citations
#144

Attribute-Missing Graph Clustering Network

Wenxuan Tu, Renxiang Guan, Sihang Zhou et al.

AAAI 2024paper
41
citations
#145

Learning to Prompt with Text Only Supervision for Vision-Language Models

Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.

AAAI 2025paperarXiv:2401.02418
41
citations
#146

Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector

An Lao, Qi Zhang, Chongyang Shi et al.

AAAI 2024paperarXiv:2312.11023
41
citations
#147

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang et al.

AAAI 2024paperarXiv:2312.00050
41
citations
#148

Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition

Kun Li, Dan Guo, Guoliang Chen et al.

AAAI 2025paperarXiv:2412.14719
41
citations
#149

TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling

Shimin Zhang, Qu Yang, Chenxiang Ma et al.

AAAI 2024paperarXiv:2308.13250
41
citations
#150

Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer with Adaptive Channel Expansion

Shenghong Luo, Xuhang Chen, Weiwen Chen et al.

AAAI 2024paperarXiv:2308.13739
40
citations
#151

A Diffusion-Based Framework for Multi-Class Anomaly Detection

Haoyang He, Jiangning Zhang, Hongxu Chen et al.

AAAI 2024paperarXiv:2312.06607
40
citations
#152

Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval

Zhihang Liu, Jun Li, Hongtao Xie et al.

AAAI 2024paperarXiv:2312.12155
40
citations
#153

XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning

Pritam Sarkar, Ali Etemad

AAAI 2024paperarXiv:2211.13929
40
citations
#154

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh

AAAI 2025paperarXiv:2412.14995
39
citations
#155

Controllable Mind Visual Diffusion Model

Bohan Zeng, Shanglin Li, Xuhui Liu et al.

AAAI 2024paperarXiv:2305.10135
39
citations
#156

No Prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation

Nimesh Agrawal, Anuj Sirohi, Sandeep Kumar et al.

AAAI 2024paperarXiv:2312.10080
39
citations
#157

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

AAAI 2024paperarXiv:2312.10741
39
citations
#158

Text-Guided Molecule Generation with Diffusion Language Model

Haisong Gong, Qiang Liu, Shu Wu et al.

AAAI 2024paperarXiv:2402.13040
39
citations
#159

Towards Continual Knowledge Graph Embedding via Incremental Distillation

Jiajun Liu, Ke Wenjun, Peng Wang et al.

AAAI 2024paperarXiv:2405.04453
39
citations
#160

Multi-Architecture Multi-Expert Diffusion Models

Yunsung Lee, Jin-Young Kim, Hyojun Go et al.

AAAI 2024paperarXiv:2306.04990
39
citations
#161

RATT: A Thought Structure for Coherent and Correct LLM Reasoning

Jinghan Zhang, Xiting Wang, Weijieying Ren et al.

AAAI 2025paperarXiv:2406.02746
39
citations
#162

Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection

Soopil Kim, Sion An, Philip Chikontwe et al.

AAAI 2024paperarXiv:2312.13783
38
citations
#163

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

Xiantao Hu, Ying Tai, Xu Zhao et al.

AAAI 2025paperarXiv:2412.15691
38
citations
#164

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Bojia Zi, Shihao Zhao, Xianbiao Qi et al.

AAAI 2025paperarXiv:2403.12035
38
citations
#165

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Clément Chadebec, Onur Tasar, Eyal Benaroche et al.

AAAI 2025paperarXiv:2406.02347
38
citations
#166

Rethinking Reverse Distillation for Multi-Modal Anomaly Detection

Zhihao Gu, Jiangning Zhang, Liang Liu et al.

AAAI 2024paper
38
citations
#167

Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer

Yaoting Wang, Liu Weisong, Guangyao Li et al.

AAAI 2024paperarXiv:2309.07929
38
citations
#168

Latent Space Editing in Transformer-Based Flow Matching

Vincent Tao Hu, Wei Zhang, Meng Tang et al.

AAAI 2024paperarXiv:2312.10825
38
citations
#169

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025paperarXiv:2412.12444
38
citations
#170

STEM: Unleashing the Power of Embeddings for Multi-Task Recommendation

Liangcai Su, Junwei Pan, Ximei Wang et al.

AAAI 2024paperarXiv:2308.13537
37
citations
#171

Deep Variational Incomplete Multi-View Clustering: Exploring Shared Clustering Structures

Gehui Xu, Jie Wen, Chengliang Liu et al.

AAAI 2024paper
37
citations
#172

Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning

Shangchao Su, Mingzhao Yang, Bin Li et al.

AAAI 2024paperarXiv:2211.07864
37
citations
#173

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.

AAAI 2025paperarXiv:2407.19548
37
citations
#174

Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions

Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.

AAAI 2025paperarXiv:2408.08781
37
citations
#175

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Han Shu, Wenshuo Li, Yehui Tang et al.

AAAI 2025paperarXiv:2312.13789
37
citations
#176

MathAttack: Attacking Large Language Models towards Math Solving Ability

Zihao Zhou, Qiufeng Wang, Mingyu Jin et al.

AAAI 2024paperarXiv:2309.01686
37
citations
#177

SlowTrack: Increasing the Latency of Camera-Based Perception in Autonomous Driving Using Adversarial Examples

Chen Ma, Ningfei Wang, Qi Alfred Chen et al.

AAAI 2024paperarXiv:2312.09520
37
citations
#178

SUTrack: Towards Simple and Unified Single Object Tracking

Xin Chen, Ben Kang, Wanting Geng et al.

AAAI 2025paperarXiv:2412.19138
37
citations
#179

U-mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting

Xiang Ma, Xuemei Li, Lexin Fang et al.

AAAI 2024paperarXiv:2401.02236
36
citations
#180

VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding

Guibiao Liao, Jiankun Li, Xiaoqing Ye

AAAI 2024paper
36
citations
#181

DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception

Jiayu Zou, Kun Tian, Zheng Zhu et al.

AAAI 2024paperarXiv:2303.08333
36
citations
#182

Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

Liqi He, Zuchao Li, Xiantao Cai et al.

AAAI 2024paperarXiv:2312.08762
36
citations
#183

Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment

Congzhi Zhang, Linhai Zhang, Jialong Wu et al.

AAAI 2025paperarXiv:2403.02738
36
citations
#184

LION: Implicit Vision Prompt Tuning

Haixin Wang, Jianlong Chang, Yihang Zhai et al.

AAAI 2024paperarXiv:2303.09992
35
citations
#185

Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance

Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.

AAAI 2025paperarXiv:2412.12974
35
citations
#186

SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization

Zhenlong Yuan, Jiakai Cao, Zhaoxin Li et al.

AAAI 2024paperarXiv:2401.06385
35
citations
#187

Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models

Shuang Li, Jiangjie Chen, Siyu Yuan et al.

AAAI 2024paperarXiv:2308.13961
35
citations
#188

NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views

Han Huang, Yulun Wu, Junsheng Zhou et al.

AAAI 2024paperarXiv:2312.13977
35
citations
#189

Mono3DVG: 3D Visual Grounding in Monocular Images

Yangfan Zhan, Yuan Yuan, Zhitong Xiong

AAAI 2024paperarXiv:2312.08022
35
citations
#190

FedMut: Generalized Federated Learning via Stochastic Mutation

Ming Hu, Cao Yue, Anran Li et al.

AAAI 2024paper
35
citations
#191

Exploiting Label Skews in Federated Learning with Model Concatenation

Yiqun Diao, Qinbin Li, Bingsheng He

AAAI 2024paperarXiv:2312.06290
35
citations
#192

Efficient Spiking Neural Networks with Sparse Selective Activation for Continual Learning

Jiangrong Shen, Wenyao Ni, Qi Xu et al.

AAAI 2024paper
35
citations
#193

Multi-Objective Evolution of Heuristic Using Large Language Model

Shunyu Yao, Fei Liu, Xi Lin et al.

AAAI 2025paperarXiv:2409.16867
34
citations
#194

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

Qijian Tian, Xin Tan, Yuan Xie et al.

AAAI 2025paperarXiv:2409.12753
34
citations
#195

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

Yiwen Tang, Ray Zhang, Zoey Guo et al.

AAAI 2024paperarXiv:2310.03059
34
citations
#196

xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition

Artyom Stitsyuk, Jaesik Choi

AAAI 2025paperarXiv:2412.17323
34
citations
#197

Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model

Decheng Liu, Xijun Wang, Chunlei Peng et al.

AAAI 2024paperarXiv:2312.11285
34
citations
#198

DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization

Aritra Bhowmick, Mert Kosan, Zexi Huang et al.

AAAI 2024paperarXiv:2312.12697
34
citations
#199

Transformer-Based No-Reference Image Quality Assessment via Supervised Contrastive Learning

Jinsong Shi, Pan Gao, Jie Qin

AAAI 2024paperarXiv:2312.06995
34
citations
#200

Concept-Guided Prompt Learning for Generalization in Vision-Language Models

Yi Zhang, Ce Zhang, Ke Yu et al.

AAAI 2024paperarXiv:2401.07457
33
citations