Research Alpha Leak - Rising Stars in Research

#1

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models

Chong Mou, Xintao Wang, Liangbin Xie et al.

AAAI 2024

1,423 citations

#2

Benchmarking Large Language Models in Retrieval-Augmented Generation

Jiawei Chen, Hongyu Lin, Xianpei Han et al.

AAAI 2024

458 citations

#3

Preference Ranking Optimization for Human Alignment

Feifan Song, Bowen Yu, Minghao Li et al.

AAAI 2024

334 citations

#4

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos

Yue Ma, Yingqing He, Xiaodong Cun et al.

AAAI 2024

276 citations

#5

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

Gengze Zhou, Yicong Hong, Qi Wu

AAAI 2024

276 citations

#6

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving

Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.

AAAI 2024

266 citations

#7

MedSegDiff-V2: Diffusion-based Medical Image Segmentation with Transformer

Junde Wu, Wei Ji, Huazhu Fu et al.

AAAI 2024

259 citations

#8

Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal, Jihan Yin, Erhan Bas

AAAI 2024

256 citations

#9

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

AAAI 2024

240 citations

#10

Omni-Kernel Network for Image Restoration

Yuning Cui, Wenqi Ren, Alois Knoll

AAAI 2024

235 citations

#11

Knowledge Graph Prompting for Multi-Document Question Answering

Yu Wang, Nedim Lipka, Ryan A. Rossi et al.

AAAI 2024

231 citations

#12

Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue

Songhua Yang, Hanjie Zhao, Senbin Zhu et al.

AAAI 2024

204 citations

#13

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Wenbo Hu, Yifan Xu, Yi Li et al.

AAAI 2024

190 citations

#14

MSGNet: Learning Multi-Scale Inter-series Correlations for Multivariate Time Series Forecasting

Wanlin Cai, Yuxuan Liang, Xianggen Liu et al.

AAAI 2024

177 citations

#15

ODTrack: Online Dense Temporal Token Learning for Visual Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2024

173 citations

#16

Fast Machine Unlearning without Retraining through Selective Synaptic Dampening

Jack Foster, Stefan Schoepf, Alexandra Brintrup

AAAI 2024

170 citations

#17

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

AAAI 2024

156 citations

#18

ResDiff: Combining CNN and Diffusion Model for Image Super-resolution

Shuyao Shang, Zhengyang Shan, Guangxing Liu et al.

AAAI 2024

139 citations

#19

Task Contamination: Language Models May Not Be Few-Shot Anymore

Changmao Li, Jeffrey Flanigan

AAAI 2024

130 citations

#20

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

Liangtai Sun, Yang Han, Zihan Zhao et al.

AAAI 2024

127 citations

AAAI

Top Papers in AAAI 2024
