Most Cited AAAI 2025 "spatial mutual information" Papers
3,028 papers found
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu et al.
SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery
Konstantin Klemmer, Esther Rolf, Caleb Robinson et al.
Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection
Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.
C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness
Yu Kang, Xianghui Sun, Liangyu Chen et al.
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao, Min Zhang, Wei Zhao et al.
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Xianjie Wu, Jian Yang, Linzheng Chai et al.
DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching
Ming Gui, Johannes Schusterbauer, Ulrich Prestel et al.
Point Cloud Mamba: Point Cloud Learning via State Space Model
Tao Zhang, Haobo Yuan, Lu Qi et al.
AnalogCoder: Analog Circuit Design via Training-Free Code Generation
Yao Lai, Sungyoung Lee, Guojin Chen et al.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
Yao Zhang, Zijian Ma, Yunpu Ma et al.
Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models
Wenbin Wang, Liang Ding, Minyan Zeng et al.
ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data
Chengsen Wang, Qi Qi, Jingyu Wang et al.
VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool
Chia-Tung Ho, Haoxing Ren, Brucek Khailany
DiT4Edit: Diffusion Transformer for Image Editing
Kunyu Feng, Yue Ma, Bingyuan Wang et al.
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Zhen Ye, Peiwen Sun, Jiahe Lei et al.
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
Yakun Song, Zhuo Chen, Xiaofei Wang et al.
Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning
Yiming Huang, Xiao Liu, Yeyun Gong et al.
Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models
Fei Shen, Hu Ye, Sibo Liu et al.
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning
Wenwen Zhuang, Xin Huang, Xiantao Zhang et al.
FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection
Yao Xiao, Tingfa Xu, Yu Xin et al.
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
Weihao Ye, Qiong Wu, Wenhao Lin et al.
MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation
Jinfeng Xu, Zheyu Chen, Shuo Yang et al.
TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers
Chuanrui Zhang, Yingshuang Zou, Zhuoling Li et al.
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.
Calibrating Large Language Models with Sample Consistency
Qing Lyu, Kumar Shridhar, Chaitanya Malaviya et al.
Language Model Can Listen While Speaking
Ziyang Ma, Yakun Song, Chenpeng Du et al.
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua, Yunlong Tang, Chenliang Xu et al.
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Chenyang Zhu, Kai Li, Yue Ma et al.
Image Conductor: Precision Control for Interactive Video Synthesis
Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.
End-to-End Autonomous Driving Through V2X Cooperation
Haibao Yu, Wenxian Yang, Jiaru Zhong et al.
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
Pan Wang, Qiang Zhou, Yawen Wu et al.
Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition
Kun Li, Dan Guo, Guoliang Chen et al.
Learning to Prompt with Text Only Supervision for Vision-Language Models
Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.
RATT: A Thought Structure for Coherent and Correct LLM Reasoning
Jinghan Zhang, Xiting Wang, Weijieying Ren et al.
HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs
Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
Bojia Zi, Shihao Zhao, Xianbiao Qi et al.
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking
Xiantao Hu, Ying Tai, Xu Zhao et al.
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Clément Chadebec, Onur Tasar, Eyal Benaroche et al.
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Han Shu, Wenshuo Li, Yehui Tang et al.
SUTrack: Towards Simple and Unified Single Object Tracking
Xin Chen, Ben Kang, Wanting Geng et al.
Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment
Congzhi Zhang, Linhai Zhang, Jialong Wu et al.
Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions
Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.
DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input
Qijian Tian, Xin Tan, Yuan Xie et al.
Multi-Objective Evolution of Heuristic Using Large Language Model
Shunyu Yao, Fei Liu, Xi Lin et al.
SCALM: Detecting Bad Practices in Smart Contracts Through LLMs
Zongwei Li, Xiaoqi Li, Wenkai Li et al.
MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL
Arian Askari, Christian Poelitz, Xinye Tang
Stable-Hair: Real-World Hair Transfer via Diffusion Model
Yuxuan Zhang, Qing Zhang, Yiren Song et al.
Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
Lingzhi Wang, Xingshan Zeng, Jinsong Guo et al.
Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance
Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
Mushui Liu, Yuhang Ma, Zhen Yang et al.
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation
Derong Xu, Xinhang Li, Ziheng Zhang et al.
Guided Real Image Dehazing Using YCbCr Color Space
Wenxuan Fang, Junkai Fan, Yu Zheng et al.
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
Qingdong He, Jiangning Zhang, Jinlong Peng et al.
Evolutionary Large Language Model for Automated Feature Transformation
Nanxu Gong, Chandan K Reddy, Wangyang Ying et al.
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng, Qiguang Chen, Jin Zhang et al.
Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning
Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye et al.
xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition
Artyom Stitsyuk, Jaesik Choi
Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions of Visual-Audio Content
Sheng Wu, Dongxiao He, Xiaobao Wang et al.
Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems
Weibo Gao, Qi Liu, Linan Yue et al.
DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
Yuhao Wang, Yang Liu, Aihua Zheng et al.
Graphic Design with Large Multimodal Model
Yutao Cheng, Zhao Zhang, Maoke Yang et al.
When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline
Ming Li, Yongchun Gu, Yi Wang et al.
A Comprehensive Overhaul of Multimodal Assistant with Small Language Models
Minjie Zhu, Yichen Zhu, Ning Liu et al.
Exploring Enhanced Contextual Information for Video-Level Object Tracking
Ben Kang, Xin Chen, Simiao Lai et al.
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
Baichuan Zhou, Haote Yang, Dairong Chen et al.
DiffuseHigh: Training-Free Progressive High-Resolution Image Synthesis Through Structure Guidance
Younghyun Kim, Geunmin Hwang, Junyu Zhang et al.
CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs
Siyu Wang, Cailian Chen, Xinyi Le et al.
LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application
Jian Jia, Yipei Wang, Yan Li et al.
Perception-Guided Jailbreak Against Text-to-Image Models
Yihao Huang, Le Liang, Tianlin Li et al.
Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
Lucio La Cava, Andrea Tagarelli
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
Qirui Chen, Shangzhe Di, Weidi Xie
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi, Shunfan Zheng, Linlin Wang et al.
Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing
Xinghe Fu, Zhiyuan Yan, Taiping Yao et al.
SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks
Meng Lou, Yunxiang Fu, Yizhou Yu
Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking
Yaozong Zheng, Bineng Zhong, Qihua Liang et al.
FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping
Yuzhou Ji, He Zhu, Junshu Tang et al.
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Yunlong Tang, Daiki Shimada, Jing Bi et al.
MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
Zixuan Gong, Qi Zhang, Guangyin Bao et al.
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
Yun Qu, Yuhang Jiang, Boyuan Wang et al.
Robust Tracking via Mamba-based Context-aware Token Learning
Jinxia Xie, Bineng Zhong, Qihua Liang et al.
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Jian Ma, Yonglin Deng, Chen Chen et al.
Hierarchical Classification Auxiliary Network for Time Series Forecasting
Yanru Sun, Zongxia Xie, Dongyue Chen et al.
Numerical Pruning for Efficient Autoregressive Models
Xuan Shen, Zhao Song, Yufa Zhou et al.
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
Hongbang Yuan, Zhuoran Jin, Pengfei Cao et al.
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
Soham Deshmukh, Shuo Han, Hazim Bukhari et al.
NightHaze: Nighttime Image Dehazing via Self-Prior Learning
Beibei Lin, Yeying Jin, Yan Wending et al.
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
Xiao Cui, Mo Zhu, Yulei Qin et al.
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
Angela Castillo, Jonas Kohler, Juan C. Pérez et al.
Training on the Benchmark Is Not All You Need
Shiwen Ni, Xiangtao Kong, Chengming Li et al.
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Jiaxiang Cheng, Pan Xie, Xin Xia et al.
3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering
Qingyuan Zhou, Weidong Yang, Ben Fei et al.
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
Yuanzhao Zhai, Tingkai Yang, Kele Xu et al.
ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance
Shuwei Shi, Wenbo Li, Yuechen Zhang et al.
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
Jiaqi Huang, Zunnan Xu, Ting Liu et al.
Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification
Haojian Huang, Chuanyu Qin, Zhe Liu et al.
FoldToken: Learning Protein Language via Vector Quantization and Beyond
Zhangyang Gao, Cheng Tan, Jue Wang et al.
BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion
Huafeng Li, Dayong Su, Qing Cai et al.
Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns
Yufeng Zhang, Xuepeng Wang, Lingxiang Wu et al.
MV-VTON: Multi-View Virtual Try-On with Diffusion Models
Haoyu Wang, Zhilu Zhang, Donglin Di et al.
LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction
Er Jin, Qihui Feng, Yongli Mou et al.
Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection
Kaiqing Lin, Yuzhen Lin, Weixiang Li et al.
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
Yuchi Wang, Junliang Guo, Jianhong Bai et al.
Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution
Zeyu Xiao, Zhuoyuan Li, Wei Jia
WPMixer: Efficient Multi-Resolution Mixing for Long-Term Time Series Forecasting
Md Mahmuddun Nabi Murad, Mehmet Aktukmak, Yasin Yilmaz
MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls
Yuxuan Bian, Ailing Zeng, Xuan Ju et al.
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
Boyi Deng, Wenjie Wang, Fengbin Zhu et al.
Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?
Ben Yao, Yazhou Zhang, Qiuchi Li et al.
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
Jixun Yao, Yang Yuguang, Yu Pan et al.
Spectral Motion Alignment for Video Motion Transfer Using Diffusion Models
Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee et al.
SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering
Zouying Cao, Yifei Yang, Hai Zhao
AdaDiff: Adaptive Step Selection for Fast Diffusion Models
Hui Zhang, Zuxuan Wu, Zhen Xing et al.
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian et al.
TIME-FS: Joint Learning of Tensorial Incomplete Multi-View Unsupervised Feature Selection and Missing-View Imputation
Yanyong Huang, Minghui Lu, Wei Huang et al.
MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
Zhenlong Yuan, Cong Liu, Fei Shen et al.
FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency
Han Huang, Yulun Wu, Chao Deng et al.
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
Shuaijie Shen, Chao Wang, Renzhuo Huang et al.
What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph
Yutao Jiang, Qiong Wu, Wenhao Lin et al.
A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection
Junjun Pan, Yixin Liu, Xin Zheng et al.
Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration
Ziheng Zhou, Jinxing Zhou, Wei Qian et al.
B2Opt: Learning to Optimize Black-box Optimization with Little Budget
Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.
The Illusion of Empathy: How AI Chatbots Shape Conversation Perception
Tingting Liu, Salvatore Giorgi, Ankit Aich et al.
Towards Adversarially Robust Dataset Distillation by Curvature Regularization
Eric Xue, Yijiang Li, Haoyang Liu et al.
Design Principle Transfer in Neural Architecture Search via Large Language Models
Xun Zhou, Xingyu Wu, Liang Feng et al.
Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving
Yuhang Lu, Yichen Yao, Jiadong Tu et al.
Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach
Zhiwei Li, Guodong Long, Tianyi Zhou et al.
Motion Prior Knowledge Learning with Homogeneous Language Descriptions for Moving Infrared Small Target Detection
Shengjia Chen, Luping Ji, Weiwei Duan et al.
MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning
Hai-Long Sun, Da-Wei Zhou, Hanbin Zhao et al.
A Many-Objective Problem Where Crossover Is Provably Indispensable
Andre Opris
Prior-guided Hierarchical Harmonization Network for Efficient Image Dehazing
Xiongfei Su, Siyuan Li, Yuning Cui et al.
Structured Packing in LLM Training Improves Long Context Utilization
Konrad Staniszewski, Szymon Tworkowski, Sebastian Jaszczur et al.
Controlling Large Language Models Through Concept Activation Vectors
Hanyu Zhang, Xiting Wang, Chengao Li et al.
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
Wenxiang Guo, Yu Zhang, Changhao Pan et al.
Large Images Are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting
Lingting Zhu, Guying Lin, Jinnan Chen et al.
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Artem Zholus, Maksim Kuznetsov, Roman Schutski et al.
Improved Bounds for Online Facility Location with Predictions
Dimitris Fotakis, Evangelia Gergatsouli, Themistoklis Gouleakis et al.
Security Attacks on LLM-based Code Completion Tools
Wen Cheng, Ke Sun, Xinyu Zhang et al.
FD2-Net: Frequency-Driven Feature Decomposition Network for Infrared-Visible Object Detection
Ke Li, Di Wang, Zhangyuan Hu et al.
Speeding Up the NSGA-II with a Simple Tie-Breaking Rule
Benjamin Doerr, Tudor Ivan, Martin S. Krejca
CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection
Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.
Falcon: Faster and Parallel Inference of Large Language Models Through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
Xiangxiang Gao, Weisheng Xie, Yiwei Xiang et al.
Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking
Jiawen Zhu, Huayi Tang, Xin Chen et al.
Kernel-Aware Graph Prompt Learning for Few-Shot Anomaly Detection
Fenfang Tao, Guo-Sen Xie, Fang Zhao et al.
Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs
Lei Zhang, Yunshui Li, Jiaming Li et al.
GraphMoRE: Mitigating Topological Heterogeneity via Mixture of Riemannian Experts
Zihao Guo, Qingyun Sun, Haonan Yuan et al.
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models
Jean Park, Kuk Jin Jang, Basam Alasaly et al.
NEST: A Neuromodulated Small-world Hypergraph Trajectory Prediction Model for Autonomous Driving
Chengyue Wang, Haicheng Liao, Bonan Wang et al.
Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization
Guanghan Li, Xun Zhang, Yufei Zhang et al.
Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model
Hang Zhou, Jiale Cai, Yuteng Ye et al.
Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning
Jian Lang, Zhangtao Cheng, Ting Zhong et al.
Explore In-Context Segmentation via Latent Diffusion Models
Chaoyang Wang, Xiangtai Li, Henghui Ding et al.
Federated Unlearning with Gradient Descent and Conflict Mitigation
Zibin Pan, Zhichao Wang, Chi Li et al.
TimeDP: Learning to Generate Multi-Domain Time Series with Domain Prompts
Yu-Hao Huang, Chang Xu, Yueying Wu et al.
HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection
Zijian Gu, Jianwei Ma, Yan Huang et al.
Large Language Model Meets Graph Neural Network in Knowledge Distillation
Shengxiang Hu, Guobing Zou, Song Yang et al.
Low-Light Image Enhancement via Generative Perceptual Priors
Han Zhou, Wei Dong, Xiaohong Liu et al.
Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering
Yifan Lu, Yigeng Zhou, Jing Li et al.
Multi-Turn Jailbreaking Large Language Models via Attention Shifting
Xiaohu Du, Fan Mo, Ming Wen et al.
Video Diffusion Models Are Strong Video Inpainter
Minhyeok Lee, Suhwan Cho, Chajin Shin et al.
Geolocation Representation from Large Language Models Are Generic Enhancers for Spatio-Temporal Learning
Junlin He, Tong Nie, Wei Ma
LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph
Tu Ao, Yanhua Yu, Yuling Wang et al.
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks
Zekang Yang, Wang Zeng, Sheng Jin et al.
Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach
Qingxiang Liu, Sheng Sun, Yuxuan Liang et al.
Local Conditional Controlling for Text-to-Image Diffusion Models
Yibo Zhao, Liang Peng, Yang Yang et al.
DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
Zhenlong Yuan, Jinguo Luo, Fei Shen et al.
BotSim: LLM-Powered Malicious Social Botnet Simulation
Boyu Qiao, Kun Li, Wei Zhou et al.
KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy
Qianxiong Xu, Cheng Long, Ziyue Li et al.
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai, Jian Li, Jiedong Zhuang et al.
Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing
Pengcheng Zhao, Jinxing Zhou, Yang Zhao et al.
Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts
Miao Rang, Zhenni Bi, Chuanjian Liu et al.
MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert
Dapeng Zhang, Dayu Chen, Peng Zhi et al.
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
Yaming Yang, Dilxat Muhtar, Yelong Shen et al.
RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning
Kunming Su, Qiuxia Wu, Panpan Cai et al.
Sum of Squares Circuits
Lorenzo Loconte, Stefan Mengel, Antonio Vergari
On the Relationship Between Monotone and Squared Probabilistic Circuits
Benjie Wang, Guy Van den Broeck
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Yunlong Tang, Gen Zhan, Li Yang et al.
Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels
Ruitao Pu, Yuan Sun, Yang Qin et al.
Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation
Zhenxin Lei, Man Yao, Jiakui Hu et al.
HEROS-GAN: Honed-Energy Regularized and Optimal Supervised GAN for Enhancing Accuracy and Range of Low-Cost Accelerometers
Yifeng Wang, Yi Zhao
Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production
Shengeng Tang, Jiayi He, Dan Guo et al.
Exploring More from Multiple Gait Modalities for Human Identification
Dongyang Jin, Chao Fan, Weihua Chen et al.
PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis
Xinlei Huang, Zhiqi Ma, Dian Meng et al.
AWRaCLe: All-Weather Image Restoration Using Visual In-Context Learning
Sudarshan Rajagopalan, Vishal M. Patel
OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving
Tianyi Yan, Junbo Yin, Xianpeng Lang et al.
CL-Attack: Textual Backdoor Attacks via Cross-Lingual Triggers
Jingyi Zheng, Tianyi Hu, Tianshuo Cong et al.
Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
Chuanguang Yang, XinQiang Yu, Han Yang et al.
Debiased All-in-one Image Restoration with Task Uncertainty Regularization
Gang Wu, Junjun Jiang, Yijun Wang et al.
GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection
Jinqing Zhang, Yanan Zhang, Yunlong Qi et al.
Yuan: Yielding Unblemished Aesthetics Through a Unified Network for Visual Imperfections Removal in Generated Images
Zhenyu Yu, Chee Seng Chan
UniMuMo: Unified Text, Music, and Motion Generation
Han Yang, Kun Su, Yutong Zhang et al.
Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts
Lihu Chen, Adam Dejl, Francesca Toni
Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition
Changwei Wang, Shunpeng Chen, Yukun Song et al.
CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification
Chenyang Yu, Xuehu Liu, Jiawen Zhu et al.
SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance
Hongyu Yan, Zijun Li, Kunming Luo et al.
Citations and Trust in LLM Generated Responses
Yifan Ding, Matthew Facciani, Ellen Joyce et al.
CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions
Matan Levi, Yair Allouche, Daniel Ohayon et al.
Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration
Zhixuan Shen, Haonan Luo, Kexun Chen et al.
CoRA: Collaborative Information Perception by Large Language Model’s Weights for Recommendation
Yuting Liu, Jinghao Zhang, Yizhou Dang et al.
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Fengshuo Bai, Runze Liu, Yali Du et al.