Most Cited AAAI Paper "generalized reward policy optimization" Papers
5,317 papers found • Page 1 of 27
Conference
T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion
Chong Mou, Xintao Wang, Liangbin Xie et al.
Benchmarking Large Language Models in Retrieval-Augmented Generation
Jiawei Chen, Hongyu Lin, Xianpei Han et al.
Preference Ranking Optimization for Human Alignment
Feifan Song, Bowen Yu, Minghao Li et al.
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu et al.
Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos
Yue Ma, Yingqing HE, Xiaodong Cun et al.
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Gengze Zhou, Yicong Hong, Qi Wu
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving
Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
MedSegDiff-V2: Diffusion-based Medical Image Segmentation with Transformer
Junde Wu, Wei Ji, Huazhu Fu et al.
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
Omni-Kernel Network for Image Restoration
Yuning Cui, Wenqi Ren, Alois Knoll
Knowledge Graph Prompting for Multi-Document Question Answering
Yu Wang, Nedim Lipka, Ryan A. Rossi et al.
Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue
Songhua Yang, Hanjie Zhao, Senbin Zhu et al.
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu, Yifan Xu, Yi Li et al.
MSGNet: Learning Multi-Scale Inter-series Correlations for Multivariate Time Series Forecasting
Wanlin Cai, Yuxuan Liang, Xianggen Liu et al.
Fast Machine Unlearning without Retraining through Selective Synaptic Dampening
Jack Foster, Stefan Schoepf, Alexandra Brintrup
ODTrack: Online Dense Temporal Token Learning for Visual Tracking
Yaozong Zheng, Bineng Zhong, Qihua Liang et al.
VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang et al.
AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model
Teng Hu, Jiangning Zhang, Ran Yi et al.
ResDiff: Combining CNN and Diffusion Model for Image Super-resolution
Shuyao Shang, Zhengyang Shan, Guangxing Liu et al.
SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery
Konstantin Klemmer, Esther Rolf, Caleb Robinson et al.
Language Prompt for Autonomous Driving
Dongming Wu, Wencheng Han, Yingfei Liu et al.
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li, Jeffrey Flanigan
SCTNet: Single Branch CNN with Transformer Semantic Information for Real-Time Segmentation
Authors: Zhengze Xu, Dongyue Wu, Changqian Yu et al.
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun, Yang Han, Zihan Zhao et al.
C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness
Yu Kang, Xianghui Sun, Liangyu Chen et al.
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
Zhenyu Li, Sunqi Fan, Yu Gu et al.
Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection
Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation
Wenxi Yue, Jing Zhang, Kun Hu et al.
Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting
Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.
Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations
Likang Wu, Zhaopeng Qiu, Zhi Zheng et al.
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao, Min Zhang, Wei Zhao et al.
TimesURL: Self-Supervised Contrastive Learning for Universal Time Series Representation Learning
jiexi Liu, Songcan Chen
Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data
Yucheng Wang, Yuecong Xu, Jianfei Yang et al.
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models
Changhun Lee, Jungyu Jin, Taesu Kim et al.
An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention
Yehjin Shin, Jeongwhan Choi, Hyowon Wi et al.
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Xianjie Wu, Jian Yang, Linzheng Chai et al.
Rolling-Unet: Revitalizing MLP’s Ability to Efficiently Extract Long-Distance Dependencies for Medical Image Segmentation
Yutong Liu, Haijiang Zhu, Mengting Liu et al.
UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation
Kefu Yi, Kai Luo, Xiaolei Luo et al.
An Empirical Study of CLIP for Text-Based Person Search
Cao Min, Yang Bai, ziyin Zeng et al.
Fluctuation-Based Adaptive Structured Pruning for Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
8976 PointAttN: You Only Need Attention for Point Cloud Completion
Jun Wang, Ying Cui, Dongyan Guo et al.
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
Ruichen Wang, Zekang Chen, Chen Chen et al.
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena Hwang et al.
Decoupled Contrastive Multi-View Clustering with High-Order Random Walks
Yiding Lu, Yijie Lin, Mouxing Yang et al.
Reliable Conflictive Multi-View Learning
Cai Xu, Jiajun Si, Ziyu Guan et al.
VIGC: Visual Instruction Generation and Correction
Théo Delemazure, Jérôme Lang, Grzegorz Pierczyński
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen, Yao Zhang, Denis Krompass et al.
VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding
Yi Xin, Junlong Du, Qiang Wang et al.
Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation
Shuanghao Bai, Min Zhang, Wanqi Zhou et al.
DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching
Ming Gui, Johannes Schusterbauer, Ulrich Prestel et al.
DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection
Yunfan Ye, Yuhang Huang, Renjiao Yi et al.
Point Cloud Mamba: Point Cloud Learning via State Space Model
Tao Zhang, Haobo Yuan, Lu Qi et al.
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
Yao Zhang, Zijian Ma, Yunpu Ma et al.
AnalogCoder: Analog Circuit Design via Training-Free Code Generation
Yao Lai, Sungyoung Lee, Guojin Chen et al.
GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time
Haoran Ye, Jiarui Wang, Helan Liang et al.
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.
MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning
Baoquan Zhang, Chuyao Luo, Demin Yu et al.
VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool
Chia-Tung Ho, Haoxing Ren, Brucek Khailany
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang, Jianbo Ma, Santiago Pascual et al.
FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
Zhenhua Yang, Dezhi Peng, Yuxin Kong et al.
Graph Neural Prompting with Large Language Models
Yijun Tian, Huan Song, Zichen Wang et al.
ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data
Chengsen Wang, Qi Qi, Jingyu Wang et al.
Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models
Wenbin Wang, Liang Ding, Minyan Zeng et al.
FedASMU: Efficient Asynchronous Federated Learning with Dynamic Staleness-Aware Model Update
Ji Liu, Juncheng Jia, Tianshi Che et al.
Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks
Yingpeng Du, Di Luo, Rui Yan et al.
Temporal Adaptive RGBT Tracking with Modality Prompt
Hongyu Wang, Xiaotao Liu, Yifan Li et al.
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li et al.
SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
Malyaban Bal, Abhronil Sengupta
SkeletonGait: Gait Recognition Using Skeleton Maps
Chao Fan, Jingzhe Ma, Dongyang Jin et al.
Plug-In Diffusion Model for Sequential Recommendation
Haokai Ma, Ruobing Xie, Lei Meng et al.
DiT4Edit: Diffusion Transformer for Image Editing
Kunyu Feng, Yue Ma, Bingyuan Wang et al.
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors
Xiao Wang, Zongzhen Wu, Bo Jiang et al.
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Zhen Ye, Peiwen Sun, Jiahe Lei et al.
Learning to Rank in Generative Retrieval
Yongqi Li, Nan Yang, Liang Wang et al.
NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields
Junge Zhang, Feihu Zhang, Shaochen Kuang et al.
Augmenting Math Word Problems via Iterative Question Composing
Haoxiong Liu, Yifan Zhang, Yifan Luo et al.
DiffusionTrack: Diffusion Model for Multi-Object Tracking
Run Luo, Zikai Song, Lintao Ma et al.
Make RepVGG Greater Again: A Quantization-Aware Approach
Xuesong Nie, Yunfeng Yan, Siyuan Li et al.
Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning
Yiming Huang, Xiao Liu, Yeyun Gong et al.
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
Yakun Song, Zhuo Chen, Xiaofei Wang et al.
Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models
Fei Shen, Hu Ye, Sibo Liu et al.
Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders
Yaohua Zha, Huizhen Ji, Jinmin Li et al.
Gated Attention Coding for Training High-Performance and Efficient Spiking Neural Networks
Xuerui Qiu, Rui-Jie Zhu, Yuhong Chou et al.
Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks
Yufei Guo, Yuanpei Chen, Xiaode Liu et al.
FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection
Yao Xiao, Tingfa Xu, Yu Xin et al.
HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning
Xingtong Yu, Yuan Fang, Zemin Liu et al.
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Yuheng Chen, Pengfei Cao, Yubo Chen et al.
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju, Peng Tang, Qi Dong et al.
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du, Yiwei Guo, Feiyu Shen et al.
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning
Wenwen Zhuang, Xin Huang, Xiantao Zhang et al.
Correlation Matching Transformation Transformers for UHD Image Restoration
Cong Wang, Jinshan Pan, Wei Wang et al.
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan, Renrui Zhang, Ziyu Guo et al.
MASTER: Market-Guided Stock Transformer for Stock Price Forecasting
Tong Li, Zhaoyang Liu, Yanyan Shen et al.
Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
Editing Language Model
Based Knowledge Graph Embeddings
Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales
Taeyoon Kwon, Kai Ong, Dongjin Kang et al.
PC-Conv: Unifying Homophily and Heterophily with Two-Fold Filtering
Bingheng Li, Erlin Pan, Zhao Kang
SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency
8137 Feiyu Zhu, Reid Simmons
SECap: Speech Emotion Captioning with Large Language Model
Yaoxun Xu, Hangting Chen, Jianwei Yu et al.
Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models
Yuqi Zhu, Jia Li, Ge Li et al.
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
Wenfang Yao, Kejing Yin, William Cheung et al.
Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy
Yu Fu, Deyi Xiong, Yue Dong
Delving into Multimodal Prompting for Fine-Grained Visual Classification
Xin Jiang, Hao Tang, Junyao Gao et al.
TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning
Siheng Xiong, Yuan Yang, Ali Payani et al.
VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting
Seunggu Kang, WonJun Moon, Euiyeon Kim et al.
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
Chao Pang, Xingxing Weng, Jiang Wu et al.
LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time
Sensitive Test Construction - Yucheng Li, Frank Guerin, Chenghua Lin
GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking
Shu Yin, Peican Zhu, Lianwei Wu et al.
SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation
Dong Wu, Mingmin Chi, Xuan Zang et al.
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
Weihao Ye, Qiong Wu, Wenhao Lin et al.
TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers
Chuanrui Zhang, Yingshuang Zou, Zhuoling Li et al.
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
Fangjun Li, David C. Hogg, Anthony G. Cohn
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.
Calibrating Large Language Models with Sample Consistency
Qing Lyu, Kumar Shridhar, Chaitanya Malaviya et al.
Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition
Jianyang Xie, Yanda Meng, Yitian Zhao et al.
MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation
Jinfeng Xu, Zheyu Chen, Shuo Yang et al.
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
Junjue Wang, Zhuo Zheng, Zihang Chen et al.
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang, L. F. D’Haro, Yiming Chen et al.
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
Yufeng Huang, Jiji Tang, Zhuo Chen et al.
Improving Audio-Visual Segmentation with Bidirectional Generation
Dawei Hao, Yuxin Mao, Bowen He et al.
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua, Yunlong Tang, Chenliang Xu et al.
Feature Fusion from Head to Tail for Long-Tailed Visual Recognition
Mengke Li, Zhikai HU, Yang Lu et al.
Reinforced Adaptive Knowledge Learning for Multimodal Fake News Detection
Litian Zhang, Xiaoming Zhang, Chaozhuo Li et al.
Language Model Can Listen While Speaking
Ziyang Ma, Yakun Song, Chenpeng Du et al.
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection
hongcheng Guo, Jian Yang, Jiaheng Liu et al.
Image Conductor: Precision Control for Interactive Video Synthesis
Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Chenyang Zhu, Kai Li, Yue Ma et al.
S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention
Chiyu Zhang, Xiaogang Xu, Lei Wang et al.
Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification
Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.
TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation
Yuhao Wang, Xuehu Liu, Pingping Zhang et al.
LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs
Yan Wang, Zhixuan Chu, Xin Ouyang et al.
Fine-Grained Prototypes Distillation for Few-Shot Object Detection
Zichen Wang, Bo Yang, Haonan Yue et al.
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
Namhyuk Ahn, Junsoo Lee, Chunggi Lee et al.
End-to-End Autonomous Driving Through V2X Cooperation
Haibao Yu, Wenxian Yang, Jiaru Zhong et al.
PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine
Chenrui Zhang, Lin Liu, Chuyuan Wang et al.
Debiasing Multimodal Sarcasm Detection with Contrastive Learning
Mengzhao Jia, Can Xie, Liqiang Jing
Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style
Shuai Tan, Bin Ji, Ye Pan
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
Pan Wang, Qiang Zhou, Yawen Wu et al.
Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries
Xinyi He, Mengyu Zhou, Xinrun Xu et al.
ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval
Zixu Li, Zhiwei Chen, Haokun Wen et al.
Large Language Models Are Neurosymbolic Reasoners
Meng Fang, Shilong Deng, Yudi Zhang et al.
Attribute-Missing Graph Clustering Network
Wenxuan Tu, Renxiang Guan, Sihang Zhou et al.
Learning to Prompt with Text Only Supervision for Vision-Language Models
Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.
Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector
An Lao, Qi Zhang, Chongyang Shi et al.
Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift
Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang et al.
Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition
Kun Li, Dan Guo, Guoliang Chen et al.
TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling
Shimin Zhang, Qu Yang, Chenxiang Ma et al.
Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer with Adaptive Channel Expansion
Shenghong Luo, Xuhang Chen, Weiwen Chen et al.
A Diffusion-Based Framework for Multi-Class Anomaly Detection
Haoyang He, Jiangning Zhang, Hongxu Chen et al.
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu, Jun Li, Hongtao Xie et al.
XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar, Ali Etemad
HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs
Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh
Controllable Mind Visual Diffusion Model
Bohan Zeng, Shanglin Li, Xuhui Liu et al.
No Prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation
Nimesh Agrawal, Anuj Sirohi, Sandeep Kumar et al.
StyleSinger: Style Transfer for Out
of-Domain Singing Voice Synthesis
Text-Guided Molecule Generation with Diffusion Language Model
Haisong Gong, Qiang Liu, Shu Wu et al.
Towards Continual Knowledge Graph Embedding via Incremental Distillation
Jiajun Liu, Ke Wenjun, Peng Wang et al.
Multi-Architecture Multi-Expert Diffusion Models
Yunsung Lee, Jin-Young Kim, Hyojun Go et al.
RATT: A Thought Structure for Coherent and Correct LLM Reasoning
Jinghan Zhang, Xiting Wang, Weijieying Ren et al.
Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection
Soopil Kim, Sion An, Philip Chikontwe et al.
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking
Xiantao Hu, Ying Tai, Xu Zhao et al.
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
Bojia Zi, Shihao Zhao, Xianbiao Qi et al.
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Clément Chadebec, Onur Tasar, Eyal Benaroche et al.
Rethinking Reverse Distillation for Multi-Modal Anomaly Detection
Zhihao Gu, Jiangning Zhang, Liang Liu et al.
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang, Liu Weisong, Guangyao Li et al.
Latent Space Editing in Transformer-Based Flow Matching
Vincent Tao Hu, Wei Zhang, Meng Tang et al.
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
Xuan Shen, Zhao Song, Yufa Zhou et al.
STEM: Unleashing the Power of Embeddings for Multi-Task Recommendation
Liangcai Su, Junwei Pan, Ximei Wang et al.
Deep Variational Incomplete Multi-View Clustering: Exploring Shared Clustering Structures
Gehui Xu, Jie Wen, Chengliang Liu et al.
Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning
Shangchao Su, Mingzhao Yang, Bin Li et al.
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.
Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions
Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Han Shu, Wenshuo Li, Yehui Tang et al.
MathAttack: Attacking Large Language Models towards Math Solving Ability
Zihao Zhou, Qiufeng Wang, Mingyu Jin et al.
SlowTrack: Increasing the Latency of Camera-Based Perception in Autonomous Driving Using Adversarial Examples
Chen Ma, Ningfei Wang, Qi Alfred Chen et al.
SUTrack: Towards Simple and Unified Single Object Tracking
Xin Chen, Ben Kang, Wanting Geng et al.
U-mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting
Xiang Ma, Xuemei Li, Lexin Fang et al.
VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding
Guibiao Liao, Jiankun Li, Xiaoqing Ye
DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception
Jiayu Zou, Kun Tian, Zheng Zhu et al.
Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models
Liqi He, Zuchao Li, Xiantao Cai et al.
Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment
Congzhi Zhang, Linhai Zhang, Jialong Wu et al.
LION: Implicit Vision Prompt Tuning
Haixin Wang, Jianlong Chang, Yihang Zhai et al.
Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance
Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.
SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization
Zhenlong Yuan, Jiakai Cao, Zhaoxin Li et al.
Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models
Shuang Li, Jiangjie Chen, Siyu Yuan et al.
NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views
Han Huang, Yulun Wu, Junsheng Zhou et al.
Mono3DVG: 3D Visual Grounding in Monocular Images
Yangfan Zhan, Yuan Yuan, Zhitong Xiong
FedMut: Generalized Federated Learning via Stochastic Mutation
Ming Hu, Cao Yue, Anran Li et al.
Exploiting Label Skews in Federated Learning with Model Concatenation
Yiqun Diao, Qinbin Li, Bingsheng He
6385 Efficient Spiking Neural Networks with Sparse Selective Activation for Continual Learning
Jiangrong Shen, Wenyao Ni, Qi Xu et al.
Multi-Objective Evolution of Heuristic Using Large Language Model
Shunyu Yao, Fei Liu, Xi Lin et al.
DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input
Qijian Tian, Xin Tan, Yuan Xie et al.
Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
Yiwen Tang, Ray Zhang, Zoey Guo et al.
xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition
Artyom Stitsyuk, Jaesik Choi
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model
Decheng Liu, Xijun Wang, Chunlei Peng et al.
DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization
Aritra Bhowmick, Mert Kosan, Zexi Huang et al.
Transformer-Based No-Reference Image Quality Assessment via Supervised Contrastive Learning
Jinsong Shi, Pan Gao, Jie Qin
Concept-Guided Prompt Learning for Generalization in Vision-Language Models
Yi Zhang, Ce Zhang, Ke Yu et al.