Most Cited AAAI 2024 "post-training llms" Papers
T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion
Chong Mou, Xintao Wang, Liangbin Xie et al.
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Maciej Besta, Nils Blach, Ales Kubicek et al.
Benchmarking Large Language Models in Retrieval-Augmented Generation
Jiawei Chen, Hongyu Lin, Xianpei Han et al.
ExpeL: LLM Agents Are Experiential Learners
Andrew Zhao, Daniel Huang, Quentin Xu et al.
Preference Ranking Optimization for Human Alignment
Feifan Song, Bowen Yu, Minghao Li et al.
MemoryBank: Enhancing Large Language Models with Long-Term Memory
Wanjun Zhong, Lianghong Guo, Qiqi Gao et al.
Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos
Yue Ma, Yingqing He, Xiaodong Cun et al.
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Gengze Zhou, Yicong Hong, Qi Wu
MedSegDiff-V2: Diffusion-based Medical Image Segmentation with Transformer
Junde Wu, Wei Ji, Huazhu Fu et al.
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving
Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
Knowledge Graph Prompting for Multi-Document Question Answering
Yu Wang, Nedim Lipka, Ryan A. Rossi et al.
Omni-Kernel Network for Image Restoration
Yuning Cui, Wenqi Ren, Alois Knoll
Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue
Songhua Yang, Hanjie Zhao, Senbin Zhu et al.
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu, Yifan Xu, Yi Li et al.
ODTrack: Online Dense Temporal Token Learning for Visual Tracking
Yaozong Zheng, Bineng Zhong, Qihua Liang et al.
PMET: Precise Model Editing in a Transformer
Xiaopeng Li, Shasha Li, Shezheng Song et al.
MSGNet: Learning Multi-Scale Inter-series Correlations for Multivariate Time Series Forecasting
Wanlin Cai, Yuxuan Liang, Xianggen Liu et al.
Generalized Planning in PDDL Domains with Pretrained Large Language Models
Tom Silver, Soham Dan, Kavitha Srinivas et al.
Fast Machine Unlearning without Retraining through Selective Synaptic Dampening
Jack Foster, Stefan Schoepf, Alexandra Brintrup
VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang et al.
AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model
Teng Hu, Jiangning Zhang, Ran Yi et al.
Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking
Mingzhan Yang, Guangxin Han, Bin Yan et al.
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang, Rajanie Prabha, Tianyuan Huang et al.
ResDiff: Combining CNN and Diffusion Model for Image Super-resolution
Shuyao Shang, Zhengyang Shan, Guangxing Liu et al.
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
Haibo Jin, Haoxuan Che, Yi Lin et al.
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li, Jeffrey Flanigan
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun, Yang Han, Zihan Zhao et al.
SCTNet: Single Branch CNN with Transformer Semantic Information for Real-Time Segmentation
Zhengze Xu, Dongyue Wu, Changqian Yu et al.
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
Zhenyu Li, Sunqi Fan, Yu Gu et al.
LGMRec: Local and Global Graph Learning for Multimodal Recommendation
Zhiqiang Guo, Jianjun Li, Guohui Li et al.
ProAgent: Building Proactive Cooperative Agents with Large Language Models
Ceyao Zhang, Kaijie Yang, Siyi Hu et al.
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation
Wenxi Yue, Jing Zhang, Kun Hu et al.
Incomplete Contrastive Multi-View Clustering with High-Confidence Guiding
Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting
Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
Raphael Schumann, Wanrong Zhu, Weixi Feng et al.
Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations
Likang Wu, Zhaopeng Qiu, Zhi Zheng et al.
Fluctuation-Based Adaptive Structured Pruning for Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models
Changhun Lee, Jungyu Jin, Taesu Kim et al.
TimesURL: Self-Supervised Contrastive Learning for Universal Time Series Representation Learning
Jiexi Liu, Songcan Chen
Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data
Yucheng Wang, Yuecong Xu, Jianfei Yang et al.
Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis
Caoyun Fan, Jindou Chen, Yaohui Jin et al.
An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention
Yehjin Shin, Jeongwhan Choi, Hyowon Wi et al.
UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation
Kefu Yi, Kai Luo, Xiaolei Luo et al.
An Empirical Study of CLIP for Text-Based Person Search
Cao Min, Yang Bai, Ziyin Zeng et al.
Rolling-Unet: Revitalizing MLP’s Ability to Efficiently Extract Long-Distance Dependencies for Medical Image Segmentation
Yutong Liu, Haijiang Zhu, Mengting Liu et al.
LDMVFI: Video Frame Interpolation with Latent Diffusion Models
Duolikun Danier, Fan Zhang, David Bull
Explicit Visual Prompts for Visual Object Tracking
Liangtao Shi, Bineng Zhong, Qihua Liang et al.
Reliable Conflictive Multi-View Learning
Cai Xu, Jiajun Si, Ziyu Guan et al.
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
Ruichen Wang, Zekang Chen, Chen Chen et al.
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena Hwang et al.
Decoupled Contrastive Multi-View Clustering with High-Order Random Walks
Yiding Lu, Yijie Lin, Mouxing Yang et al.
PointAttN: You Only Need Attention for Point Cloud Completion
Jun Wang, Ying Cui, Dongyan Guo et al.
VIGC: Visual Instruction Generation and Correction
Bin Wang, Fan Wu, Xiao Han et al.
MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning
Yi Xin, Junlong Du, Qiang Wang et al.
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen, Yao Zhang, Denis Krompass et al.
FocalDreamer: Text-Driven 3D Editing via Focal-Fusion Assembly
Yuhan Li, Yishun Dou, Yue Shi et al.
Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation
Shuanghao Bai, Min Zhang, Wanqi Zhou et al.
VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding
Yi Xin, Junlong Du, Qiang Wang et al.
GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time
Haoran Ye, Jiarui Wang, Helan Liang et al.
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
Wan-Duo Ma, Avisek Lahiri, J. P. Lewis et al.
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.
Exposing the Deception: Uncovering More Forgery Clues for Deepfake Detection
Zhongjie Ba, Qingyu Liu, Zhenguang Liu et al.
AVSegFormer: Audio-Visual Segmentation with Transformer
Shengyi Gao, Zhe Chen, Guo Chen et al.
DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection
Yunfan Ye, Yuhang Huang, Renjiao Yi et al.
MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA
Lang Yu, Qin Chen, Jie Zhou et al.
EcomGPT: Instruction-Tuning Large Language Models with Chain-of-Task Tasks for E-commerce
Li Yangning, Shirong Ma, Xiaobin Wang et al.
PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology
Yuxuan Sun, Chenglu Zhu, Sunyi Zheng et al.
MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning
Baoquan Zhang, Chuyao Luo, Demin Yu et al.
SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking
Wang Yu Hsiang, Jun-Wei Hsieh, Ping-Yang Chen et al.
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.
Graph Neural Prompting with Large Language Models
Yijun Tian, Huan Song, Zichen Wang et al.
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye et al.
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis
Yiwen Chen, Chi Zhang, Xiaofeng Yang et al.
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang, Jianbo Ma, Santiago Pascual et al.
FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
Zhenhua Yang, Dezhi Peng, Yuxin Kong et al.
FedASMU: Efficient Asynchronous Federated Learning with Dynamic Staleness-Aware Model Update
Ji Liu, Juncheng Jia, Tianshi Che et al.
Temporal Adaptive RGBT Tracking with Modality Prompt
Hongyu Wang, Xiaotao Liu, Yifan Li et al.
SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
Malyaban Bal, Abhronil Sengupta
Teaching Large Language Models to Translate with Comparison
Jiali Zeng, Fandong Meng, Yongjing Yin et al.
Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks
Yingpeng Du, Di Luo, Rui Yan et al.
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors
Xiao Wang, Zongzhen Wu, Bo Jiang et al.
Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum
Zhengliang Shi, Shen Gao, Minghang Zhu et al.
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
Guy Yariv, Itai Gat, Sagie Benaim et al.
SkeletonGait: Gait Recognition Using Skeleton Maps
Chao Fan, Jingzhe Ma, Dongyang Jin et al.
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
Hao Sun, Mingyao Zhou, Wenjing Chen et al.
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li et al.
HDMixer: Hierarchical Dependency with Extendable Patch for Multivariate Time Series Forecasting
Qihe Huang, Lei Shen, Ruixin Zhang et al.
Plug-In Diffusion Model for Sequential Recommendation
Haokai Ma, Ruobing Xie, Lei Meng et al.
Learning Content-Enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation
Qi Bi, Shaodi You, Theo Gevers
Learning to Rank in Generative Retrieval
Yongqi Li, Nan Yang, Liang Wang et al.
Generative-Based Fusion Mechanism for Multi-Modal Tracking
Zhangyong Tang, Tianyang Xu, Xiaojun Wu et al.
Learning to Unlearn: Instance-Wise Unlearning for Pre-trained Classifiers
Sungmin Cha, Sungjun Cho, Dasol Hwang et al.
Generating Images of Rare Concepts Using Pre-trained Diffusion Models
Dvir Samuel, Rami Ben-Ari, Simon Raviv et al.
NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields
Junge Zhang, Feihu Zhang, Shaochen Kuang et al.
BAT: Behavior-Aware Human-Like Trajectory Prediction for Autonomous Driving
Haicheng Liao, Zhenning Li, Huanming Shen et al.
DiffusionTrack: Diffusion Model for Multi-Object Tracking
Run Luo, Zikai Song, Lintao Ma et al.
Make RepVGG Greater Again: A Quantization-Aware Approach
Xuesong Nie, Yunfeng Yan, Siyuan Li et al.
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Yuheng Chen, Pengfei Cao, Yubo Chen et al.
Gated Attention Coding for Training High-Performance and Efficient Spiking Neural Networks
Xuerui Qiu, Rui-Jie Zhu, Yuhong Chou et al.
Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales
Taeyoon Kwon, Kai Ong, Dongjin Kang et al.
Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders
Yaohua Zha, Huizhen Ji, Jinmin Li et al.
HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning
Xingtong Yu, Yuan Fang, Zemin Liu et al.
Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks
Yufei Guo, Yuanpei Chen, Xiaode Liu et al.
Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
Delving into Multimodal Prompting for Fine-Grained Visual Classification
Xin Jiang, Hao Tang, Junyao Gao et al.
MASTER: Market-Guided Stock Transformer for Stock Price Forecasting
Tong Li, Zhaoyang Liu, Yanyan Shen et al.
Correlation Matching Transformation Transformers for UHD Image Restoration
Cong Wang, Jinshan Pan, Wei Wang et al.
VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting
Seunggu Kang, WonJun Moon, Euiyeon Kim et al.
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du, Yiwei Guo, Feiyu Shen et al.
FFT-Based Dynamic Token Mixer for Vision
Yuki Tatsunami, Masato Taki
Revisiting Graph-Based Fraud Detection in Sight of Heterophily and Spectrum
Fan Xu, Nan Wang, Hao Wu et al.
SECap: Speech Emotion Captioning with Large Language Model
Yaoxun Xu, Hangting Chen, Jianwei Yu et al.
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju, Peng Tang, Qi Dong et al.
Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models
Yuqi Zhu, Jia Li, Ge Li et al.
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan, Renrui Zhang, Ziyu Guo et al.
PC-Conv: Unifying Homophily and Heterophily with Two-Fold Filtering
Bingheng Li, Erlin Pan, Zhao Kang
Editing Language Model-Based Knowledge Graph Embeddings
SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency
Feiyu Zhu, Reid Simmons
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
Wenfang Yao, Kejing Yin, William Cheung et al.
SwitchTab: Switched Autoencoders Are Effective Tabular Learners
Jing Wu, Suiyao Chen, Qi Zhao et al.
TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning
Siheng Xiong, Yuan Yang, Ali Payani et al.
CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models
Zhongxi Chen, Ke Sun, Xianming Lin
Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy
Yu Fu, Deyi Xiong, Yue Dong
Data Roaming and Quality Assessment for Composed Image Retrieval
Matan Levy, Rami Ben-Ari, Nir Darshan et al.
Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects
Jian Hu, Jiayi Lin, Shaogang Gong et al.
LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction
Yucheng Li, Frank Guerin, Chenghua Lin
Panoptic Scene Graph Generation with Semantics-Prototype Learning
Li Li, Wei Ji, Yiming Wu et al.
Visual Instruction Tuning with Polite Flamingo
Delong Chen, Jianfeng Liu, Wenliang Dai et al.
SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation
Dong Wu, Mingmin Chi, Xuan Zang et al.
SGNet: Structure Guided Network via Gradient-Frequency Awareness for Depth Map Super-resolution
Zhengxue Wang, Zhiqiang Yan, Jian Yang
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
Yufeng Huang, Jiji Tang, Zhuo Chen et al.
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
Fangjun Li, David C. Hogg, Anthony G. Cohn
Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers
Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos et al.
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
Junjue Wang, Zhuo Zheng, Zihang Chen et al.
GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking
Shu Yin, Peican Zhu, Lianwei Wu et al.
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Xuan Shen, Peiyan Dong, Lei Lu et al.
S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention
Chiyu Zhang, Xiaogang Xu, Lei Wang et al.
SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation
Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption
Ziteng Cui, Lin Gu, Xiao Sun et al.
Understanding the Role of the Projector in Knowledge Distillation
Prompt to Transfer: Sim-to-Real Transfer for Traffic Signal Control with Prompt Learning
Longchao Da, Minquan Gao, Hua Wei et al.
Spatial Transform Decoupling for Oriented Object Detection
Hongtian Yu, Yunjie Tian, Qixiang Ye et al.
M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy
Hansong Zhang, Shikun Li, Pengju Wang et al.
High-Order Structure Based Middle-Feature Learning for Visible-Infrared Person Re-identification
Liuxiang Qiu, Si Chen, Yan Yan et al.
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection
Hongcheng Guo, Jian Yang, Jiaheng Liu et al.
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection
Li Xiang, Junbo Yin, Wei Li et al.
Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition
Jianyang Xie, Yanda Meng, Yitian Zhao et al.
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model
Lingjun Zhang, Xinyuan Chen, Yaohui Wang et al.
Strong Baselines for Parameter-Efficient Few-Shot Fine-Tuning
Samyadeep Basu, Shell Hu, Daniela Massiceti et al.
CUTS+: High-Dimensional Causal Discovery from Irregular Time-Series
Yuxiao Cheng, Lianglong Li, Tingxiong Xiao et al.
ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank
Zhanjie Zhang, Quanwei Zhang, Wei Xing et al.
Feature Fusion from Head to Tail for Long-Tailed Visual Recognition
Mengke Li, Zhikai Hu, Yang Lu et al.
Gramformer: Learning Crowd Counting via Graph-Modulated Transformer
Hui Lin, Zhiheng Ma, Xiaopeng Hong et al.
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang, L. F. D’Haro, Yiming Chen et al.
Improving Audio-Visual Segmentation with Bidirectional Generation
Dawei Hao, Yuxin Mao, Bowen He et al.
Improving Automatic VQA Evaluation Using Large Language Models
Oscar Mañas, Benno Krojer, Aishwarya Agrawal
Reinforced Adaptive Knowledge Learning for Multimodal Fake News Detection
Litian Zhang, Xiaoming Zhang, Chaozhuo Li et al.
DeS3: Adaptive Attention-Driven Self and Soft Shadow Removal Using ViT Similarity
Yeying Jin, Wenhan Yang, W. Ye et al.
Unifying Visual and Vision-Language Tracking via Contrastive Learning
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
Fulong Ye, Guang Liu, Xinya Wu et al.
Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang et al.
DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning
Huiping Zhuang, Run He, Kai Tong et al.
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
Namhyuk Ahn, Junsoo Lee, Chunggi Lee et al.
Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning
Alexey Skrynnik, Anton Andreychuk, Maria Nesterova et al.
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
Xiangpeng Yang, Linchao Zhu, Xiaohan Wang et al.
TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation
Yuhao Wang, Xuehu Liu, Pingping Zhang et al.
Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification
Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.
Fine-Grained Prototypes Distillation for Few-Shot Object Detection
Zichen Wang, Bo Yang, Haonan Yue et al.
LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs
Yan Wang, Zhixuan Chu, Xin Ouyang et al.
Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries
Xinyi He, Mengyu Zhou, Xinrun Xu et al.
Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style
Shuai Tan, Bin Ji, Ye Pan
Unsupervised Continual Anomaly Detection with Contrastively-Learned Prompt
Jiaqi Liu, Kai Wu, Qiang Nie et al.
Towards Real-World Test-Time Adaptation: Tri-net Self-Training with Balanced Normalization
Yongyi Su, Xun Xu, Kui Jia
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection
XiaoHui Zhang, Jiangyan Yi, Chenglong Wang et al.
TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling
Shimin Zhang, Qu Yang, Chenxiang Ma et al.
PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine
Chenrui Zhang, Lin Liu, Chuyuan Wang et al.
Debiasing Multimodal Sarcasm Detection with Contrastive Learning
Mengzhao Jia, Can Xie, Liqiang Jing
Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift
Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang et al.
Fine-Grained Distillation for Long Document Retrieval
Yucheng Zhou, Tao Shen, Xiubo Geng et al.
Object-Aware Domain Generalization for Object Detection
WooJu Lee, Dasol Hong, Hyungtae Lim et al.
EulerMormer: Robust Eulerian Motion Magnification via Dynamic Filtering within Transformer
Fei Wang, Dan Guo, Kun Li et al.
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models
Yubin Wang, Xinyang Jiang, De Cheng et al.
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
Qingping Zheng, Yuanfan Guo, Jiankang Deng et al.
Large Language Models Are Neurosymbolic Reasoners
Meng Fang, Shilong Deng, Yudi Zhang et al.
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu, Jun Li, Hongtao Xie et al.
A Diffusion-Based Framework for Multi-Class Anomaly Detection
Haoyang He, Jiangning Zhang, Hongxu Chen et al.
Attribute-Missing Graph Clustering Network
Wenxuan Tu, Renxiang Guan, Sihang Zhou et al.
Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models
Liang Li, Qingyuan Li, Bo Zhang et al.
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
Taolin Zhang, Sunan He, Tao Dai et al.
Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector
An Lao, Qi Zhang, Chongyang Shi et al.
Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models
Shuang Li, Jiangjie Chen, Siyu Yuan et al.
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu, Dongfang Li, Zihao Zheng et al.
XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar, Ali Etemad
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Conglong Li, Zhewei Yao, Xiaoxia Wu et al.
Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection
Soopil Kim, Sion An, Philip Chikontwe et al.
How to Protect Copyright Data in Optimization of Large Language Models?
Timothy Chu, Zhao Song, Chiwun Yang
Rethinking Propagation for Unsupervised Graph Domain Adaptation
Meihan Liu, Zeyu Fang, Zhen Zhang et al.
Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer with Adaptive Channel Expansion
Shenghong Luo, Xuhang Chen, Weiwen Chen et al.