Most Cited AAAI 2025 "reasoning" Papers
3,028 papers found • Page 1 of 16
Conference
U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
Chenxin Li, Xinyu Liu, Wuyang Li et al.
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu et al.
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
Xiaohuan Pei, Tao Huang, Chang Xu
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen et al.
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
Guosheng Zhao, Xiaofeng Wang, Zheng Zhu et al.
Segment Any 3D Gaussians
Jiazhong Cen, Jiemin Fang, Chen Yang et al.
SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery
Konstantin Klemmer, Esther Rolf, Caleb Robinson et al.
OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-On
Yuhao Xu, Tao Gu, Weifeng Chen et al.
Language Prompt for Autonomous Driving
Dongming Wu, Wencheng Han, Yingfei Liu et al.
C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness
Yu Kang, Xianghui Sun, Liangyu Chen et al.
Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection
Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang, Jiaming Liu, Renrui Zhang et al.
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao, Min Zhang, Wei Zhao et al.
IMAGDressing-v1: Customizable Virtual Dressing
Fei Shen, Xin Jiang, Xin He et al.
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Xianjie Wu, Jian Yang, Linzheng Chai et al.
TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment
Chenxi Liu, Qianxiong Xu, Hao Miao et al.
CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning
Peiyuan Liu, Hang Guo, Tao Dai et al.
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Zhihang Lin, Mingbao Lin, Luxi Lin et al.
AnalogCoder: Analog Circuit Design via Training-Free Code Generation
Yao Lai, Sungyoung Lee, Guojin Chen et al.
Point Cloud Mamba: Point Cloud Learning via State Space Model
Tao Zhang, Haobo Yuan, Lu Qi et al.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
Yao Zhang, Zijian Ma, Yunpu Ma et al.
DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching
Ming Gui, Johannes Schusterbauer, Ulrich Prestel et al.
Enhance Vision-Language Alignment with Noise
Sida Huang, Hongyuan Zhang, Xuelong Li
Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models
Wenbin Wang, Liang Ding, Minyan Zeng et al.
Mamba YOLO: A Simple Baseline for Object Detection with State Space Model
Zeyu Wang, Chen Li, Huiying Xu et al.
VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool
Chia-Tung Ho, Haoxing Ren, Brucek Khailany
ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data
Chengsen Wang, Qi Qi, Jingyu Wang et al.
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Zhen Ye, Peiwen Sun, Jiahe Lei et al.
DiT4Edit: Diffusion Transformer for Image Editing
Kunyu Feng, Yue Ma, Bingyuan Wang et al.
Augmenting Math Word Problems via Iterative Question Composing
Haoxiong Liu, Yifan Zhang, Yifan Luo et al.
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
Yakun Song, Zhuo Chen, Xiaofei Wang et al.
XCOT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning
Linzheng Chai, Jian Yang, Tao Sun et al.
Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning
Yiming Huang, Xiao Liu, Yeyun Gong et al.
Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models
Fei Shen, Hu Ye, Sibo Liu et al.
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
Weihao Ye, Qiong Wu, Wenhao Lin et al.
FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection
Yao Xiao, Tingfa Xu, Yu Xin et al.
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Wenyi Xiao, Ziwei Huang, Leilei Gan et al.
C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection
Chuangchuang Tan, Renshuai Tao, Huan Liu et al.
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning
Wenwen Zhuang, Xin Huang, Xiantao Zhang et al.
Unlocking the Power of LSTM for Long Term Time Series Forecasting
Yaxuan Kong, Zepu Wang, Yuqi Nie et al.
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
Junxian Li, Di Zhang, Xunzhi Wang et al.
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
Yongxin Guo, Jingyu Liu, Mingda Li et al.
FlowPolicy: Enabling Fast and Robust 3D Flow-Based Policy via Consistency Flow Matching for Robot Manipulation
Qinglun Zhang, Zhen Liu, Haoqiang Fan et al.
DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving
Wencheng Han, Dongqian Guo, Cheng-Zhong Xu et al.
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
Tao Wu, Yong Zhang, Xintao Wang et al.
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
Chao Pang, Xingxing Weng, Jiang Wu et al.
Calibrating Large Language Models with Sample Consistency
Qing Lyu, Kumar Shridhar, Chaitanya Malaviya et al.
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.
Language Model Can Listen While Speaking
Ziyang Ma, Yakun Song, Chenpeng Du et al.
TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers
Chuanrui Zhang, Yingshuang Zou, Zhuoling Li et al.
MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation
Jinfeng Xu, Zheyu Chen, Shuo Yang et al.
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua, Yunlong Tang, Chenliang Xu et al.
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation
Yuyang Ye, Zhi Zheng, Yishan Shen et al.
Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation
Jiaqi Chen, Bingqian Lin, Xinmin Liu et al.
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
Pan Wang, Qiang Zhou, Yawen Wu et al.
End-to-End Autonomous Driving Through V2X Cooperation
Haibao Yu, Wenxian Yang, Jiaru Zhong et al.
HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs
Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Chenyang Zhu, Kai Li, Yue Ma et al.
HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection
Zican Shi, Jing Hu, Jie Ren et al.
Image Conductor: Precision Control for Interactive Video Synthesis
Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.
Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition
Kun Li, Dan Guo, Guoliang Chen et al.
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
Yu Yang, Jianbiao Mei, Yukai Ma et al.
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Wanggui He, Siming Fu, Mushui Liu et al.
Transformer Layers as Painters
Qi Sun, Marc Pickett, Aakash Kumar Nain et al.
Learning to Prompt with Text Only Supervision for Vision-Language Models
Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking
Xiantao Hu, Ying Tai, Xu Zhao et al.
ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval
Zixu Li, Zhiwei Chen, Haokun Wen et al.
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Han Shu, Wenshuo Li, Yehui Tang et al.
SUTrack: Towards Simple and Unified Single Object Tracking
Xin Chen, Ben Kang, Wanting Geng et al.
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Clément Chadebec, Onur Tasar, Eyal Benaroche et al.
RATT: A Thought Structure for Coherent and Correct LLM Reasoning
Jinghan Zhang, Xiting Wang, Weijieying Ren et al.
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
Bojia Zi, Shihao Zhao, Xianbiao Qi et al.
Multi-Objective Evolution of Heuristic Using Large Language Model
Shunyu Yao, Fei Liu, Xi Lin et al.
Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions
Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
Xuan Shen, Zhao Song, Yufa Zhou et al.
GFlow: Recovering 4D World from Monocular Video
Shizun Wang, Xingyi Yang, Qiuhong Shen et al.
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community
Jiancheng Pan, Yanxing Liu, Yuqian Fu et al.
Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment
Congzhi Zhang, Linhai Zhang, Jialong Wu et al.
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong, Yunji Kim, Sanghyuk Chun et al.
xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition
Artyom Stitsyuk, Jaesik Choi
MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL
Arian Askari, Christian Poelitz, Xinye Tang
Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance
Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.
Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
Lingzhi Wang, Xingshan Zeng, Jinsong Guo et al.
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
Haojun Shi, Suyu Ye, Xinyu Fang et al.
SCALM: Detecting Bad Practices in Smart Contracts Through LLMs
Zongwei Li, Xiaoqi Li, Wenkai Li et al.
DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input
Qijian Tian, Xin Tan, Yuan Xie et al.
Improving Retrieval Augmented Language Model with Self-Reasoning
Yuan Xia, Jingbo Zhou, Zhenhui Shi et al.
Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting
Yifan Hu, Peiyuan Liu, Peng Zhu et al.
LLM-Powered User Simulator for Recommender System
Zijian Zhang, Shuchang Liu, Ziru Liu et al.
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation
Derong Xu, Xinhang Li, Ziheng Zhang et al.
TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents
Geon Lee, Wenchao Yu, Kijung Shin et al.
Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning
Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye et al.
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
Mushui Liu, Yuhang Ma, Zhen Yang et al.
Stable-Hair: Real-World Hair Transfer via Diffusion Model
Yuxuan Zhang, Qing Zhang, Yiren Song et al.
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
Qihan Huang, Siming Fu, Jinlong Liu et al.
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.
Guided Real Image Dehazing Using YCbCr Color Space
Wenxuan Fang, Junkai Fan, Yu Zheng et al.
Fair Text-to-Image Diffusion via Fair Mapping
Jia Li, Lijie Hu, Jingfeng Zhang et al.
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
Qingdong He, Jiangning Zhang, Jinlong Peng et al.
Evolutionary Large Language Model for Automated Feature Transformation
Nanxu Gong, Chandan K Reddy, Wangyang Ying et al.
ACPBench: Reasoning About Action, Change, and Planning
Harsha Kokel, Michael Katz, Kavitha Srinivas et al.
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang, Wentao Yang, Songxuan Lai et al.
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
Yuchen Tian, Weixiang Yan, Qian Yang et al.
ConDSeg: A General Medical Image Segmentation Framework via Contrast-Driven Feature Enhancement
Mengqi Lei, Haochen Wu, Xinhua Lv et al.
Mesoscopic Insights: Orchestrating Multi-Scale & Hybrid Architecture for Image Manipulation Localization
Xuekang Zhu, Xiaochen Ma, Lei Su et al.
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers
Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao et al.
LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application
Jian Jia, Yipei Wang, Yan Li et al.
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng, Qiguang Chen, Jin Zhang et al.
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
Can Jin, Tianjin Huang, Yihua Zhang et al.
Exploring Enhanced Contextual Information for Video-Level Object Tracking
Ben Kang, Xin Chen, Simiao Lai et al.
Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems
Weibo Gao, Qi Liu, Linan Yue et al.
Graphic Design with Large Multimodal Model
Yutao Cheng, Zhao Zhang, Maoke Yang et al.
Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions of Visual-Audio Content
Sheng Wu, Dongxiao He, Xiaobao Wang et al.
TrackGo: A Flexible and Efficient Method for Controllable Video Generation
Haitao Zhou, Chuang Wang, Rui Nie et al.
SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks
Meng Lou, Yunxiang Fu, Yizhou Yu
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi, Shunfan Zheng, Linlin Wang et al.
When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline
Ming Li, Yongchun Gu, Yi Wang et al.
DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
Yuhao Wang, Yang Liu, Aihua Zheng et al.
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
Qirui Chen, Shangzhe Di, Weidi Xie
A Comprehensive Overhaul of Multimodal Assistant with Small Language Models
Minjie Zhu, Yichen Zhu, Ning Liu et al.
Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Spare-Coding Transformer
Lei Su, Xiaochen Ma, Xuekang Zhu et al.
Perception-Guided Jailbreak Against Text-to-Image Models
Yihao Huang, Le Liang, Tianlin Li et al.
Why Does Dropping Edges Usually Outperform Adding Edges in Graph Contrastive Learning?
Yanchen Xu, Siqi Huang, Hongyuan Zhang et al.
CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs
Siyu Wang, Cailian Chen, Xinyi Le et al.
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
Baichuan Zhou, Haote Yang, Dairong Chen et al.
DisCo: Graph-Based Disentangled Contrastive Learning for Cold-Start Cross-Domain Recommendation
Hourun Li, Yifan Wang, Zhiping Xiao et al.
DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance
Younghyun Kim, Geunmin Hwang, Junyu Zhang et al.
Debate on Graph: A Flexible and Reliable Reasoning Framework for Large Language Models
Jie Ma, Zhitao Gao, Qi Chai et al.
BLADE: Enhancing Black-Box Large Language Models with Small Domain-Specific Models
Haitao Li, Qingyao Ai, Jia Chen et al.
Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
Lucio La Cava, Andrea Tagarelli
Robust Tracking via Mamba-based Context-aware Token Learning
Jinxia Xie, Bineng Zhong, Qihua Liang et al.
Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing
Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.
LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation
Qidong Liu, Xian Wu, Wanyu Wang et al.
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang, Bin Shan, Wei Shi et al.
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Yunlong Tang, Daiki Shimada, Jing Bi et al.
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
Hongbang Yuan, Zhuoran Jin, Pengfei Cao et al.
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models
Kazi Hasan Ibn Arif, JinYi Yoon, Dimitrios S. Nikolopoulos et al.
BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion
Huafeng Li, Dayong Su, Qing Cai et al.
MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
Zixuan Gong, Qi Zhang, Guangyin Bao et al.
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine
Xiaoshuang Huang, Lingdong Shen, Jia Liu et al.
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
Kaicheng Yang, Tiancheng Gu, Xiang An et al.
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
Xiao Cui, Mo Zhu, Yulei Qin et al.
Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking
Yaozong Zheng, Bineng Zhong, Qihua Liang et al.
FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping
Yuzhou Ji, He Zhu, Junshu Tang et al.
WPMixer: Efficient Multi-Resolution Mixing for Long-Term Time Series Forecasting
Md Mahmuddun Nabi Murad, Mehmet Aktukmak, Yasin Yilmaz
Leveraging Large Language Models for Node Generation in Few-Shot Learning on Text-Attributed Graphs
Jianxiang Yu, Yuxiang Ren, Chenghua Gong et al.
EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding
Muye Huang, Han Lai, Xinyu Zhang et al.
MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt
Yuhao Wang, Xuehu Liu, Tianyu Yan et al.
Exploiting Diffusion Prior for Real-World Image Dehazing with Unpaired Training
Yunwei Lan, Zhigao Cui, Chang Liu et al.
NightHaze: Nighttime Image Dehazing via Self-Prior Learning
Beibei Lin, Yeying Jin, Yan Wending et al.
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
Yun Qu, Yuhang Jiang, Boyuan Wang et al.
OpenVIS: Open-vocabulary Video Instance Segmentation
Pinxue Guo, Hao Huang, Peiyang He et al.
Effective Diffusion Transformer Architecture for Image Super-Resolution
Kun Cheng, Lei Yu, Zhijun Tu et al.
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Jiaxiang Cheng, Pan Xie, Xin Xia et al.
Numerical Pruning for Efficient Autoregressive Models
Xuan Shen, Zhao Song, Yufa Zhou et al.
SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering
Zouying Cao, Yifei Yang, Hai Zhao
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection
Xun Huang, Ziyu Xu, Hai Wu et al.
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Jian Ma, Yonglin Deng, Chen Chen et al.
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
Angela Castillo, Jonas Kohler, Juan C. Pérez et al.
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation
Qiming Zhu, Jialun Cao, Yaojie Lu et al.
Game4Loc: A UAV Geo-Localization Benchmark from Game Data
Yuxiang Ji, Boyong He, Zhuoyue Tan et al.
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
Soham Deshmukh, Shuo Han, Hazim Bukhari et al.
FoldToken: Learning Protein Language via Vector Quantization and Beyond
Zhangyang Gao, Cheng Tan, Jue Wang et al.
B2Opt: Learning to Optimize Black-box Optimization with Little Budget
Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
Yuanzhao Zhai, Tingkai Yang, Kele Xu et al.
MV-VTON: Multi-View Virtual Try-On with Diffusion Models
Haoyu Wang, Zhilu Zhang, Donglin Di et al.
Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation
Chengyang Ye, Yunzhi Zhuge, Pingping Zhang
Training on the Benchmark Is Not All You Need
Shiwen Ni, Xiangtao Kong, Chengming Li et al.
Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification
Haojian Huang, Chuanyu Qin, Zhe Liu et al.
Hierarchical Classification Auxiliary Network for Time Series Forecasting
Yanru Sun, Zongxia Xie, Dongyue Chen et al.
Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks
Shengbin Yue, Siyuan Wang, Wei Chen et al.
Spatiotemporal-aware Trend-Seasonality Decomposition Network for Traffic Flow Forecasting
Lingxiao Cao, Bin Wang, Guiyuan Jiang et al.
LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction
Er Jin, Qihui Feng, Yongli Mou et al.
Argumentative Large Language Models for Explainable and Contestable Claim Verification
Gabriel Freedman, Adam Dejl, Deniz Gorur et al.
Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?
Ben Yao, Yazhou Zhang, Qiuchi Li et al.
SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control
Binyuan Huang, Yuqing Wen, Yucheng Zhao et al.
3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering
Qingyuan Zhou, Weidong Yang, Ben Fei et al.
ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance
Shuwei Shi, Wenbo Li, Yuechen Zhang et al.
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian et al.
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
Jiaqi Huang, Zunnan Xu, Ting Liu et al.
The Illusion of Empathy: How AI Chatbots Shape Conversation Perception
Tingting Liu, Salvatore Giorgi, Ankit Aich et al.
Controlling Large Language Models Through Concept Activation Vectors
Hanyu Zhang, Xiting Wang, Chengao Li et al.
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Michael-Andrei Panaitescu-Liess, Zora Che, Bang An et al.
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
Shuaijie Shen, Chao Wang, Renzhuo Huang et al.
On Oversquashing in Graph Neural Networks Through the Lens of Dynamical Systems
Alessio Gravina, Moshe Eliasof, Claudio Gallicchio et al.
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
Yuchi Wang, Junliang Guo, Jianhong Bai et al.
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
Jixun Yao, Yang Yuguang, Yu Pan et al.
Wavelet-Assisted Multi-Frequency Attention Network for Pansharpening
Jie Huang, Rui Huang, Jinghao Xu et al.
What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph
Yutao Jiang, Qiong Wu, Wenhao Lin et al.
SAFIRE: Segment Any Forged Image Region
Myung-Joon Kwon, Wonjun Lee, Seung-Hun Nam et al.
Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration
Ziheng Zhou, Jinxing Zhou, Wei Qian et al.
AdaDiff: Adaptive Step Selection for Fast Diffusion Models
Hui Zhang, Zuxuan Wu, Zhen Xing et al.
Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns
Yufeng Zhang, Xuepeng Wang, Lingxiang Wu et al.
GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution
Jintong Hu, Bin Xia, Bin Chen et al.
Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution
Zeyu Xiao, Zhuoyuan Li, Wei Jia
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Shengqiong Wu, Hao Fei, Liangming Pan et al.
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
Boyi Deng, Wenjie Wang, Fengbin Zhu et al.
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead
Amir Zandieh, Majid Daliri, Insu Han
Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising
Junyi Li, Zhilu Zhang, Wangmeng Zuo