Most Cited AAAI "u-net architecture" Papers
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model
Lingjun Zhang, Xinyuan Chen, Yaohui Wang et al.
MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation
Jinfeng Xu, Zheyu Chen, Shuo Yang et al.
Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition
Jianyang Xie, Yanda Meng, Yitian Zhao et al.
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua, Yunlong Tang, Chenliang Xu et al.
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang, L. F. D’Haro, Yiming Chen et al.
Gramformer: Learning Crowd Counting via Graph-Modulated Transformer
Hui LIN, Zhiheng Ma, Xiaopeng Hong et al.
ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank
Zhanjie Zhang, Quanwei Zhang, Wei Xing et al.
Feature Fusion from Head to Tail for Long-Tailed Visual Recognition
Mengke Li, Zhikai HU, Yang Lu et al.
Improving Audio-Visual Segmentation with Bidirectional Generation
Dawei Hao, Yuxin Mao, Bowen He et al.
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation
Yuyang Ye, Zhi Zheng, Yishan Shen et al.
Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation
Jiaqi Chen, Bingqian Lin, Xinmin Liu et al.
DeS3: Adaptive Attention-Driven Self and Soft Shadow Removal Using ViT Similarity
Yeying Jin, Wenhan Yang, W. Ye et al.
Improving Automatic VQA Evaluation Using Large Language Models
Oscar Mañas, Benno Krojer, Aishwarya Agrawal
Reinforced Adaptive Knowledge Learning for Multimodal Fake News Detection
Litian Zhang, Xiaoming Zhang, Chaozhuo Li et al.
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
Pan Wang, Qiang Zhou, Yawen Wu et al.
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
Fulong Ye, Guang Liu, Xinya Wu et al.
Unifying Visual and Vision-Language Tracking via Contrastive Learning
Image Conductor: Precision Control for Interactive Video Synthesis
Yaowei Li, Xintao Wang, Zhaoyang Zhang et al.
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
Namhyuk Ahn, Junsoo Lee, Chunggi Lee et al.
End-to-End Autonomous Driving Through V2X Cooperation
Haibao Yu, Wenxian Yang, Jiaru Zhong et al.
HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs
Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Chenyang Zhu, Kai Li, Yue Ma et al.
DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning
Huiping Zhuang, Run He, Kai Tong et al.
HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection
Zican Shi, Jing Hu, Jie Ren et al.
Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang et al.
Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning
Alexey Skrynnik, Anton Andreychuk, Maria Nesterova et al.
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
Xiangpeng Yang, Linchao Zhu, Xiaohan Wang et al.
Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification
Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.
TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation
Yuhao Wang, Xuehu Liu, Pingping Zhang et al.
Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition
Kun Li, Dan Guo, Guoliang Chen et al.
Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style
Shuai Tan, Bin Ji, Ye Pan
Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries
Xinyi He, Mengyu Zhou, Xinrun Xu et al.
Fine-Grained Prototypes Distillation for Few-Shot Object Detection
Zichen Wang, Bo Yang, Haonan Yue et al.
Unsupervised Continual Anomaly Detection with Contrastively-Learned Prompt
Jiaqi Liu, Kai Wu, Qiang Nie et al.
LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs
Yan Wang, Zhixuan Chu, Xin Ouyang et al.
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection
XiaoHui Zhang, Jiangyan Yi, Chenglong Wang et al.
PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine
Chenrui Zhang, Lin Liu, Chuyuan Wang et al.
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
Yu Yang, Jianbiao Mei, Yukai Ma et al.
Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift
Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang et al.
Towards Real-World Test-Time Adaptation: Tri-net Self-Training with Balanced Normalization
Yongyi Su, Xun Xu, Kui Jia
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Wanggui He, Siming Fu, Mushui Liu et al.
TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling
Shimin Zhang, Qu Yang, Chenxiang Ma et al.
Debiasing Multimodal Sarcasm Detection with Contrastive Learning
Mengzhao Jia, Can Xie, Liqiang Jing
Transformer Layers as Painters
Qi Sun, Marc Pickett, Aakash Kumar Nain et al.
Object-Aware Domain Generalization for Object Detection
WooJu Lee, Dasol Hong, Hyungtae Lim et al.
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models
Yubin Wang, Xinyang Jiang, De Cheng et al.
Fine-Grained Distillation for Long Document Retrieval
Yucheng Zhou, Tao Shen, Xiubo Geng et al.
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking
Xiantao Hu, Ying Tai, Xu Zhao et al.
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
Qingping Zheng, Yuanfan Guo, Jiankang Deng et al.
Learning to Prompt with Text Only Supervision for Vision-Language Models
Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.
EulerMormer: Robust Eulerian Motion Magnification via Dynamic Filtering within Transformer
Fei Wang, Dan Guo, Kun Li et al.
ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval
Zixu Li, Zhiwei Chen, Haokun Wen et al.
SUTrack: Towards Simple and Unified Single Object Tracking
Xin Chen, Ben Kang, Wanting Geng et al.
A Diffusion-Based Framework for Multi-Class Anomaly Detection
Haoyang He, Jiangning Zhang, Hongxu Chen et al.
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu, Jun Li, Hongtao Xie et al.
Attribute-Missing Graph Clustering Network
Wenxuan Tu, Renxiang Guan, Sihang Zhou et al.
Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector
An Lao, Qi Zhang, Chongyang Shi et al.
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Han Shu, Wenshuo Li, Yehui Tang et al.
Large Language Models Are Neurosymbolic Reasoners
Meng Fang, Shilong Deng, Yudi Zhang et al.
Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models
Liang Li, Qingyuan Li, Bo Zhang et al.
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
Taolin Zhang, Sunan He, Tao Dai et al.
How to Protect Copyright Data in Optimization of Large Language Models?
Timothy Chu, Zhao Song, Chiwun Yang
Text-Guided Molecule Generation with Diffusion Language Model
Haisong Gong, Qiang Liu, Shu Wu et al.
Rethinking Propagation for Unsupervised Graph Domain Adaptation
Meihan Liu, Zeyu Fang, Zhen Zhang et al.
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer with Adaptive Channel Expansion
Shenghong Luo, Xuhang Chen, Weiwen Chen et al.
XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar, Ali Etemad
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu, Dongfang Li, Zihao Zheng et al.
Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models
Shuang Li, Jiangjie Chen, Siyu Yuan et al.
Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection
Soopil Kim, Sion An, Philip Chikontwe et al.
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Conglong Li, Zhewei Yao, Xiaoxia Wu et al.
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Clément Chadebec, Onur Tasar, Eyal Benaroche et al.
No Prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation
Nimesh Agrawal, Anuj Sirohi, Sandeep Kumar et al.
Towards Continual Knowledge Graph Embedding via Incremental Distillation
Jiajun Liu, Ke Wenjun, Peng Wang et al.
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
Bojia Zi, Shihao Zhao, Xianbiao Qi et al.
RATT: A Thought Structure for Coherent and Correct LLM Reasoning
Jinghan Zhang, Xiting Wang, Weijieying Ren et al.
Controllable Mind Visual Diffusion Model
Bohan Zeng, Shanglin Li, Xuhui Liu et al.
Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions
Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.
STEM: Unleashing the Power of Embeddings for Multi-Task Recommendation
Liangcai Su, Junwei Pan, Ximei Wang et al.
Approximating the Shapley Value without Marginal Contributions
Patrick Kolpaczki, Viktor Bengs, Maximilian Muschalik et al.
Multi-Objective Evolution of Heuristic Using Large Language Model
Shunyu Yao, Fei Liu, Xi Lin et al.
SlowTrack: Increasing the Latency of Camera-Based Perception in Autonomous Driving Using Adversarial Examples
Chen Ma, Ningfei Wang, Qi Alfred Chen et al.
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang, Liu Weisong, Guangyao Li et al.
Latent Space Editing in Transformer-Based Flow Matching
Vincent Tao Hu, Wei Zhang, Meng Tang et al.
Multi-Architecture Multi-Expert Diffusion Models
Yunsung Lee, Jin-Young Kim, Hyojun Go et al.
MCL-NER: Cross-Lingual Named Entity Recognition via Multi-View Contrastive Learning
Ying Mo, Jian Yang, Jiahao Liu et al.
Rethinking Reverse Distillation for Multi-Modal Anomaly Detection
Zhihao Gu, Jiangning Zhang, Liang Liu et al.
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
Xuan Shen, Zhao Song, Yufa Zhou et al.
Transformer-Based No-Reference Image Quality Assessment via Supervised Contrastive Learning
Jinsong Shi, Pan Gao, Jie Qin
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.
U-mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting
Xiang Ma, Xuemei Li, Lexin Fang et al.
GFlow: Recovering 4D World from Monocular Video
Shizun Wang, Xingyi Yang, Qiuhong Shen et al.
Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning
Shangchao Su, Mingzhao Yang, Bin Li et al.
Exploiting Label Skews in Federated Learning with Model Concatenation
Yiqun Diao, Qinbin Li, Bingsheng He
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community
Jiancheng Pan, Yanxing Liu, Yuqian Fu et al.
MathAttack: Attacking Large Language Models towards Math Solving Ability
Zihao Zhou, Qiufeng Wang, Mingyu Jin et al.
SAM-PARSER: Fine-Tuning SAM Efficiently by Parameter Space Reconstruction
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
Deep Variational Incomplete Multi-View Clustering: Exploring Shared Clustering Structures
Gehui Xu, Jie Wen, Chengliang Liu et al.
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model
Decheng Liu, Xijun Wang, Chunlei Peng et al.
Robust Node Classification on Graph Data with Graph and Label Noise
Yonghua Zhu, Lei Feng, Zhenyun Deng et al.
Parallel Vertex Diffusion for Unified Visual Grounding
Zesen Cheng, Kehan Li, Peng Jin et al.
Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models
Liqi He, Zuchao Li, Xiantao Cai et al.
Synergistic Multiscale Detail Refinement via Intrinsic Supervision for Underwater Image Enhancement
Dehuan Zhang, Jingchun Zhou, Chunle Guo et al.
When Model Meets New Normals: Test-Time Adaptation for Unsupervised Time-Series Anomaly Detection
MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL
Arian Askari, Christian Poelitz, Xinye Tang
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka, Taichi Iki, Kyosuke Nishida et al.
Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
Lingzhi Wang, Xingshan Zeng, Jinsong Guo et al.
VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding
Guibiao Liao, Jiankun Li, Xiaoqing Ye
Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance
Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.
xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition
Artyom Stitsyuk, Jaesik Choi
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
Haojun Shi, Suyu Ye, Xinyu Fang et al.
DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception
Jiayu Zou, Kun Tian, Zheng Zhu et al.
NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views
Han Huang, Yulun Wu, Junsheng Zhou et al.
AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection
Jingchun Zhou, Zongxin He, Kin-Man Lam et al.
When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming
Hussein Mozannar, Gagan Bansal, Adam Fourney et al.
Mono3DVG: 3D Visual Grounding in Monocular Images
Yangfan Zhan, Yuan Yuan, Zhitong Xiong
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong, Yunji Kim, Sanghyuk Chun et al.
LION: Implicit Vision Prompt Tuning
Haixin Wang, Jianlong Chang, Yihang Zhai et al.
Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment
Congzhi Zhang, Linhai Zhang, Jialong Wu et al.
Generative Multi-Modal Knowledge Retrieval with Large Language Models
Xinwei Long, Jiali Zeng, Fandong Meng et al.
SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization
Zhenlong Yuan, Jiakai Cao, Zhaoxin Li et al.
Improving Retrieval Augmented Language Model with Self-Reasoning
Yuan Xia, Jingbo Zhou, Zhenhui Shi et al.
SCALM: Detecting Bad Practices in Smart Contracts Through LLMs
Zongwei Li, Xiaoqi Li, Wenkai Li et al.
FedMut: Generalized Federated Learning via Stochastic Mutation
Ming Hu, Cao Yue, Anran Li et al.
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
Hailang Huang, Zhijie Nie, Ziqiao Wang et al.
DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input
Qijian Tian, Xin Tan, Yuan Xie et al.
Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting
Yifan Hu, Peiyuan Liu, Peng Zhu et al.
LLM-Powered User Simulator for Recommender System
Zijian Zhang, Shuchang Liu, Ziru Liu et al.
Training-Free Quantum Architecture Search
Zhimin He, Maijie Deng, Shenggen Zheng et al.
Propagation Tree Is Not Deep: Adaptive Graph Contrastive Learning Approach for Rumor Detection
Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition
Qianrui Zhou, Hua Xu, Hao Li et al.
Learning Continuous Implicit Field with Local Distance Indicator for Arbitrary-Scale Point Cloud Upsampling
Shujuan Li, Junsheng Zhou, Baorui Ma et al.
Efficient Spiking Neural Networks with Sparse Selective Activation for Continual Learning
Jiangrong Shen, Wenyao Ni, Qi Xu et al.
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation
Derong Xu, Xinhang Li, Ziheng Zhang et al.
DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization
Aritra Bhowmick, Mert Kosan, Zexi Huang et al.
GIN-SD: Source Detection in Graphs with Incomplete Nodes via Positional Encoding and Attentive Fusion
Le Cheng, Peican Zhu, Keke Tang et al.
Online Boosting Adaptive Learning under Concept Drift for Multistream Classification
En Yu, Jie Lu, Bin Zhang et al.
Probabilities of Causation with Nonbinary Treatment and Effect
Ang Li, Judea Pearl
Concept-Guided Prompt Learning for Generalization in Vision-Language Models
Yi Zhang, Ce Zhang, Ke Yu et al.
Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction
Senqiao Yang, Jiarui Wu, Jiaming Liu et al.
Towards Effective and General Graph Unlearning via Mutual Evolution
Xunkai Li, Yulin Zhao, Zhengyu Wu et al.
Hierarchical Multi-Marginal Optimal Transport for Network Alignment
Zhichen Zeng, Boxin Du, Si Zhang et al.
MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models
Yan Cai, Linlin Wang, Ye Wang et al.
An Autoregressive Text-to-Graph Framework for Joint Entity and Relation Extraction
Urchade Zaratiana, Nadi Tomeh, Pierre Holat et al.
Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
Yiwen Tang, Ray Zhang, Zoey Guo et al.
Graph Invariant Learning with Subgraph Co-mixup for Out-of-Distribution Generalization
Tianrui Jia, Haoyang Li, Cheng Yang et al.
Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification
Yajing Zhai, Yawen Zeng, Zhiyong Huang et al.
Stable-Hair: Real-World Hair Transfer via Diffusion Model
Yuxuan Zhang, Qing Zhang, Yiren Song et al.
FairSIN: Achieving Fairness in Graph Neural Networks through Sensitive Information Neutralization
Cheng Yang, Jixi Liu, Yunhe Yan et al.
Provably Powerful Graph Neural Networks for Directed Multigraphs
Beni Egressy, Luc von Niederhäusern, Jovan Blanuša et al.
TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents
Geon Lee, Wenchao Yu, Kijung Shin et al.
Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles
Maximilian Muschalik, Fabian Fumagalli, Barbara Hammer et al.
Explaining Generalization Power of a DNN Using Interactive Concepts
Huilin Zhou, Hao Zhang, Huiqi Deng et al.
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
Mushui Liu, Yuhang Ma, Zhen Yang et al.
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling
Xiaobao Wu, Fengjun Pan, Thong Nguyen et al.
TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection
Tianxiang Chen, Zhentao Tan, Qi Chu et al.
Frequency-Adaptive Pan-Sharpening with Mixture of Experts
Xuanhua He, Keyu Yan, Rui Li et al.
Adaptive Integration of Partial Label Learning and Negative Learning for Enhanced Noisy Label Learning
Mengmeng Sheng, Zeren Sun, Zhenhuang Cai et al.
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
Barys Liskavets, Maxim Ushakov, Shuvendu Roy et al.
Relevant Intrinsic Feature Enhancement Network for Few-Shot Semantic Segmentation
Xiaoyi Bao, Jie Qin, Siyang Sun et al.
Guided Real Image Dehazing Using YCbCr Color Space
Wenxuan Fang, Junkai Fan, Yu Zheng et al.
Urban Region Embedding via Multi-View Contrastive Prediction
Zechen Li, Weiming Huang, Kai Zhao et al.
Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning
Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye et al.
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
Qihan Huang, Siming Fu, Jinlong Liu et al.
Fair Text-to-Image Diffusion via Fair Mapping
Jia Li, Lijie Hu, Jingfeng Zhang et al.
Graph-Aware Contrasting for Multivariate Time-Series Classification
Yucheng Wang, Yuecong Xu, Jianfei Yang et al.
G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model
Pan Xie, Qipeng Zhang, Peng Taiying et al.
Audio Generation with Multiple Conditional Diffusion Model
Zhifang Guo, Jianguo Mao, Tao Rui et al.
Distilling Autoregressive Models to Obtain High-Performance Non-autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed
Yubin Xiao, Di Wang, Boyang Li et al.
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang, Wentao Yang, Songxuan Lai et al.
Rethinking Graph Masked Autoencoders through Alignment and Uniformity
Liang Wang, Xiang Tao, Qiang Liu et al.
Root Cause Analysis in Microservice Using Neural Granger Causal Discovery
Cheng-Ming Lin, Ching Chang, Wei-Yao Wang et al.
QAGait: Revisit Gait Recognition from a Quality Perspective
Zengbin Wang, Saihui Hou, Man Zhang et al.
A Non-parametric Graph Clustering Framework for Multi-View Data
Shengju Yu, Siwei Wang, Zhibin Dong et al.
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
Qingdong He, Jiangning Zhang, Jinlong Peng et al.
Region-Disentangled Diffusion Model for High-Fidelity PPG-to-ECG Translation
Debaditya Shome, Pritam Sarkar, Ali Etemad
Adaptive Hardness Negative Sampling for Collaborative Filtering
Riwei Lai, Rui Chen, Qilong Han et al.
Shrinking Your TimeStep: Towards Low-Latency Neuromorphic Object Recognition with Spiking Neural Networks
Yongqi Ding, Lin Zuo, Mengmeng Jing et al.
RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation
Haiming Zhang, Xu Yan, Dongfeng Bai et al.
Test-Time Domain Adaptation by Learning Domain-Aware Batch Normalization
Yanan Wu, Zhixiang Chi, Yang Wang et al.
Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
Saebom Leem, Hyunseok Seo
ACPBench: Reasoning About Action, Change, and Planning
Harsha Kokel, Michael Katz, Kavitha Srinivas et al.
Evolutionary Large Language Model for Automated Feature Transformation
Nanxu Gong, Chandan K Reddy, Wangyang Ying et al.
CFR-ICL: Cascade-Forward Refinement with Iterative Click Loss for Interactive Image Segmentation
Shoukun Sun, Min Xian, Fei Xu et al.
Domain-Controlled Prompt Learning
Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.
Learning Generalized Medical Image Segmentation from Decoupled Feature Queries
Qi Bi, Jingjun Yi, Hao Zheng et al.
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
Yuchen Tian, Weixiang Yan, Qian Yang et al.
Deep Contrastive Graph Learning with Clustering-Oriented Guidance
Mulin Chen, Bocheng Wang, Xuelong Li
Mesoscopic Insights: Orchestrating Multi-Scale & Hybrid Architecture for Image Manipulation Localization
Xuekang Zhu, Xiaochen Ma, Lei Su et al.
ConDSeg: A General Medical Image Segmentation Framework via Contrast-Driven Feature Enhancement
Mengqi Lei, Haochen Wu, Xinhua Lv et al.
G-Adapter: Towards Structure-Aware Parameter-Efficient Transfer Learning for Graph Transformer Networks
Anchun Gui, Jinqiang Ye, Han Xiao
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng, Qiguang Chen, Jin Zhang et al.
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers
Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao et al.
Unifying Multi-Modal Uncertainty Modeling and Semantic Alignment for Text-to-Image Person Re-identification
Zhiwei Zhao, Bin Liu, Yan Lu et al.
LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application
Jian Jia, Yipei Wang, Yan Li et al.
Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser
Qingyuan Cai, Xuecai Hu, Saihui Hou et al.
TopoGCL: Topological Graph Contrastive Learning
Yuzhou Chen, Jose Frias, Yulia Gel
DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction)
Qiaoyue Tang, Frederick Shpilevskiy, Mathias Lécuyer
Exploring Enhanced Contextual Information for Video-Level Object Tracking
Ben Kang, Xin Chen, Simiao Lai et al.
CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers
Yi Rong, Haoran Zhou, Lixin Yuan et al.