Most Cited AAAI Poster "moral uncertainty" Papers
T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion
Chong Mou, Xintao Wang, Liangbin Xie et al.
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Maciej Besta, Nils Blach, Ales Kubicek et al.
Benchmarking Large Language Models in Retrieval-Augmented Generation
Jiawei Chen, Hongyu Lin, Xianpei Han et al.
ExpeL: LLM Agents Are Experiential Learners
Andrew Zhao, Daniel Huang, Quentin Xu et al.
U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
Chenxin Li, Xinyu Liu, Wuyang Li et al.
Preference Ranking Optimization for Human Alignment
Feifan Song, Bowen Yu, Minghao Li et al.
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu et al.
MemoryBank: Enhancing Large Language Models with Long-Term Memory
Wanjun Zhong, Lianghong Guo, Qiqi Gao et al.
Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos
Yue Ma, Yingqing He, Xiaodong Cun et al.
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Gengze Zhou, Yicong Hong, Qi Wu
MedSegDiff-V2: Diffusion-based Medical Image Segmentation with Transformer
Junde Wu, Wei Ji, Huazhu Fu et al.
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving
Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
Knowledge Graph Prompting for Multi-Document Question Answering
Yu Wang, Nedim Lipka, Ryan A. Rossi et al.
Omni-Kernel Network for Image Restoration
Yuning Cui, Wenqi Ren, Alois Knoll
Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue
Songhua Yang, Hanjie Zhao, Senbin Zhu et al.
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
Xiaohuan Pei, Tao Huang, Chang Xu
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu, Yifan Xu, Yi Li et al.
ODTrack: Online Dense Temporal Token Learning for Visual Tracking
Yaozong Zheng, Bineng Zhong, Qihua Liang et al.
PMET: Precise Model Editing in a Transformer
Xiaopeng Li, Shasha Li, Shezheng Song et al.
MSGNet: Learning Multi-Scale Inter-series Correlations for Multivariate Time Series Forecasting
Wanlin Cai, Yuxuan Liang, Xianggen Liu et al.
Generalized Planning in PDDL Domains with Pretrained Large Language Models
Tom Silver, Soham Dan, Kavitha Srinivas et al.
Fast Machine Unlearning without Retraining through Selective Synaptic Dampening
Jack Foster, Stefan Schoepf, Alexandra Brintrup
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen et al.
VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang et al.
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
Guosheng Zhao, Xiaofeng Wang, Zheng Zhu et al.
Segment Any 3D Gaussians
Jiazhong Cen, Jiemin Fang, Chen Yang et al.
AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model
Teng Hu, Jiangning Zhang, Ran Yi et al.
Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking
Mingzhan Yang, Guangxin Han, Bin Yan et al.
SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery
Konstantin Klemmer, Esther Rolf, Caleb Robinson et al.
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang, Rajanie Prabha, Tianyuan Huang et al.
ResDiff: Combining CNN and Diffusion Model for Image Super-resolution
Shuyao Shang, Zhengyang Shan, Guangxing Liu et al.
Language Prompt for Autonomous Driving
Dongming Wu, Wencheng Han, Yingfei Liu et al.
OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-On
Yuhao Xu, Tao Gu, Weifeng Chen et al.
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
Haibo Jin, Haoxuan Che, Yi Lin et al.
C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness
Yu Kang, Xianghui Sun, Liangyu Chen et al.
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li, Jeffrey Flanigan
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun, Yang Han, Zihan Zhao et al.
SCTNet: Single Branch CNN with Transformer Semantic Information for Real-Time Segmentation
Zhengze Xu, Dongyue Wu, Changqian Yu et al.
Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection
Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
Zhenyu Li, Sunqi Fan, Yu Gu et al.
ProAgent: Building Proactive Cooperative Agents with Large Language Models
Ceyao Zhang, Kaijie Yang, Siyi Hu et al.
LGMRec: Local and Global Graph Learning for Multimodal Recommendation
Zhiqiang Guo, Jianjun Li, Guohui Li et al.
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang, Jiaming Liu, Renrui Zhang et al.
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation
Wenxi Yue, Jing Zhang, Kun Hu et al.
Incomplete Contrastive Multi-View Clustering with High-Confidence Guiding
Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting
Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao, Min Zhang, Wei Zhao et al.
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
Raphael Schumann, Wanrong Zhu, Weixi Feng et al.
Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations
Likang Wu, Zhaopeng Qiu, Zhi Zheng et al.
IMAGDressing-v1: Customizable Virtual Dressing
Fei Shen, Xin Jiang, Xin He et al.
Fluctuation-Based Adaptive Structured Pruning for Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Xianjie Wu, Jian Yang, Linzheng Chai et al.
TimesURL: Self-Supervised Contrastive Learning for Universal Time Series Representation Learning
Jiexi Liu, Songcan Chen
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models
Changhun Lee, Jungyu Jin, Taesu Kim et al.
Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis
Caoyun Fan, Jindou Chen, Yaohui Jin et al.
Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data
Yucheng Wang, Yuecong Xu, Jianfei Yang et al.
An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention
Yehjin Shin, Jeongwhan Choi, Hyowon Wi et al.
UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation
Kefu Yi, Kai Luo, Xiaolei Luo et al.
TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment
Chenxi Liu, Qianxiong Xu, Hao Miao et al.
Rolling-Unet: Revitalizing MLP’s Ability to Efficiently Extract Long-Distance Dependencies for Medical Image Segmentation
Yutong Liu, Haijiang Zhu, Mengting Liu et al.
An Empirical Study of CLIP for Text-Based Person Search
Cao Min, Yang Bai, Ziyin Zeng et al.
LDMVFI: Video Frame Interpolation with Latent Diffusion Models
Duolikun Danier, Fan Zhang, David Bull
Reliable Conflictive Multi-View Learning
Cai Xu, Jiajun Si, Ziyu Guan et al.
Explicit Visual Prompts for Visual Object Tracking
Liangtao Shi, Bineng Zhong, Qihua Liang et al.
CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning
Peiyuan Liu, Hang Guo, Tao Dai et al.
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
Ruichen Wang, Zekang Chen, Chen Chen et al.
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena Hwang et al.
PointAttN: You Only Need Attention for Point Cloud Completion
Jun Wang, Ying Cui, Dongyan Guo et al.
Decoupled Contrastive Multi-View Clustering with High-Order Random Walks
Yiding Lu, Yijie Lin, Mouxing Yang et al.
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Zhihang Lin, Mingbao Lin, Luxi Lin et al.
MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning
Yi Xin, Junlong Du, Qiang Wang et al.
VIGC: Visual Instruction Generation and Correction
Bin Wang, Fan Wu, Xiao Han et al.
AnalogCoder: Analog Circuit Design via Training-Free Code Generation
Yao Lai, Sungyoung Lee, Guojin Chen et al.
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen, Yao Zhang, Denis Krompass et al.
FocalDreamer: Text-Driven 3D Editing via Focal-Fusion Assembly
Yuhan Li, Yishun Dou, Yue Shi et al.
VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding
Yi Xin, Junlong Du, Qiang Wang et al.
Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation
Shuanghao Bai, Min Zhang, Wanqi Zhou et al.
GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time
Haoran Ye, Jiarui Wang, Helan Liang et al.
Point Cloud Mamba: Point Cloud Learning via State Space Model
Tao Zhang, Haobo Yuan, Lu Qi et al.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
Yao Zhang, Zijian Ma, Yunpu Ma et al.
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
Wan-Duo Ma, Avisek Lahiri, J. P. Lewis et al.
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.
DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching
Ming Gui, Johannes Schusterbauer, Ulrich Prestel et al.
Enhance Vision-Language Alignment with Noise
Sida Huang, Hongyuan Zhang, Xuelong Li
AVSegFormer: Audio-Visual Segmentation with Transformer
Shengyi Gao, Zhe Chen, Guo Chen et al.
Exposing the Deception: Uncovering More Forgery Clues for Deepfake Detection
Zhongjie Ba, Qingyu Liu, Zhenguang Liu et al.
Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models
Wenbin Wang, Liang Ding, Minyan Zeng et al.
DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection
Yunfan Ye, Yuhang Huang, Renjiao Yi et al.
MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA
Lang Yu, Qin Chen, Jie Zhou et al.
Mamba YOLO: A Simple Baseline for Object Detection with State Space Model
Zeyu Wang, Chen Li, Huiying Xu et al.
MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning
Baoquan Zhang, Chuyao Luo, Demin Yu et al.
PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology
Yuxuan Sun, Chenglu Zhu, Sunyi Zheng et al.
EcomGPT: Instruction-Tuning Large Language Models with Chain-of-Task Tasks for E-commerce
Li Yangning, Shirong Ma, Xiaobin Wang et al.
SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking
Wang Yu Hsiang, Jun-Wei Hsieh, Ping-Yang Chen et al.
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.
VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool
Chia-Tung Ho, Haoxing Ren, Brucek Khailany
ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data
Chengsen Wang, Qi Qi, Jingyu Wang et al.
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye et al.
Graph Neural Prompting with Large Language Models
Yijun Tian, Huan Song, Zichen Wang et al.
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang, Jianbo Ma, Santiago Pascual et al.
FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
Zhenhua Yang, Dezhi Peng, Yuxin Kong et al.
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis
Yiwen Chen, Chi Zhang, Xiaofeng Yang et al.
Temporal Adaptive RGBT Tracking with Modality Prompt
Hongyu Wang, Xiaotao Liu, Yifan Li et al.
FedASMU: Efficient Asynchronous Federated Learning with Dynamic Staleness-Aware Model Update
Ji Liu, Juncheng Jia, Tianshi Che et al.
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Zhen Ye, Peiwen Sun, Jiahe Lei et al.
SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
Malyaban Bal, Abhronil Sengupta
Teaching Large Language Models to Translate with Comparison
Jiali Zeng, Fandong Meng, Yongjing Yin et al.
DiT4Edit: Diffusion Transformer for Image Editing
Kunyu Feng, Yue Ma, Bingyuan Wang et al.
Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks
Yingpeng Du, Di Luo, Rui Yan et al.
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
Guy Yariv, Itai Gat, Sagie Benaim et al.
Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum
Zhengliang Shi, Shen Gao, Minghang Zhu et al.
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors
Xiao Wang, Zongzhen Wu, Bo Jiang et al.
SkeletonGait: Gait Recognition Using Skeleton Maps
Chao Fan, Jingzhe Ma, Dongyang Jin et al.
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li et al.
HDMixer: Hierarchical Dependency with Extendable Patch for Multivariate Time Series Forecasting
Qihe Huang, Lei Shen, Ruixin Zhang et al.
Plug-In Diffusion Model for Sequential Recommendation
Haokai Ma, Ruobing Xie, Lei Meng et al.
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
Hao Sun, Mingyao Zhou, Wenjing Chen et al.
NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields
Junge Zhang, Feihu Zhang, Shaochen Kuang et al.
Learning to Rank in Generative Retrieval
Yongqi Li, Nan Yang, Liang Wang et al.
Generating Images of Rare Concepts Using Pre-trained Diffusion Models
Dvir Samuel, Rami Ben-Ari, Simon Raviv et al.
Generative-Based Fusion Mechanism for Multi-Modal Tracking
Zhangyong Tang, Tianyang Xu, Xiaojun Wu et al.
Augmenting Math Word Problems via Iterative Question Composing
Haoxiong Liu, Yifan Zhang, Yifan Luo et al.
Learning to Unlearn: Instance-Wise Unlearning for Pre-trained Classifiers
Sungmin Cha, Sungjun Cho, Dasol Hwang et al.
Learning Content-Enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation
Qi Bi, Shaodi You, Theo Gevers
BAT: Behavior-Aware Human-Like Trajectory Prediction for Autonomous Driving
Haicheng Liao, Zhenning Li, Huanming Shen et al.
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
Yakun Song, Zhuo Chen, Xiaofei Wang et al.
DiffusionTrack: Diffusion Model for Multi-Object Tracking
Run Luo, Zikai Song, Lintao Ma et al.
XCOT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning
Linzheng Chai, Jian Yang, Tao Sun et al.
Make RepVGG Greater Again: A Quantization-Aware Approach
Xuesong Nie, Yunfeng Yan, Siyuan Li et al.
Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning
Yiming Huang, Xiao Liu, Yeyun Gong et al.
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Yuheng Chen, Pengfei Cao, Yubo Chen et al.
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
Weihao Ye, Qiong Wu, Wenhao Lin et al.
Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models
Fei Shen, Hu Ye, Sibo Liu et al.
FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection
Yao Xiao, Tingfa Xu, Yu Xin et al.
Gated Attention Coding for Training High-Performance and Efficient Spiking Neural Networks
Xuerui Qiu, Rui-Jie Zhu, Yuhong Chou et al.
Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales
Taeyoon Kwon, Kai Ong, Dongjin Kang et al.
HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning
Xingtong Yu, Yuan Fang, Zemin Liu et al.
Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders
Yaohua Zha, Huizhen Ji, Jinmin Li et al.
C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection
Chuangchuang Tan, Renshuai Tao, Huan Liu et al.
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Wenyi Xiao, Ziwei Huang, Leilei Gan et al.
Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks
Yufei Guo, Yuanpei Chen, Xiaode Liu et al.
Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning
Wenwen Zhuang, Xin Huang, Xiantao Zhang et al.
MASTER: Market-Guided Stock Transformer for Stock Price Forecasting
Tong Li, Zhaoyang Liu, Yanyan Shen et al.
Unlocking the Power of LSTM for Long Term Time Series Forecasting
Yaxuan Kong, Zepu Wang, Yuqi Nie et al.
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du, Yiwei Guo, Feiyu Shen et al.
Delving into Multimodal Prompting for Fine-Grained Visual Classification
Xin Jiang, Hao Tang, Junyao Gao et al.
Correlation Matching Transformation Transformers for UHD Image Restoration
Cong Wang, Jinshan Pan, Wei Wang et al.
VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting
Seunggu Kang, WonJun Moon, Euiyeon Kim et al.
FFT-Based Dynamic Token Mixer for Vision
Yuki Tatsunami, Masato Taki
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju, Peng Tang, Qi Dong et al.
SECap: Speech Emotion Captioning with Large Language Model
Yaoxun Xu, Hangting Chen, Jianwei Yu et al.
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan, Renrui Zhang, Ziyu Guo et al.
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
Junxian Li, Di Zhang, Xunzhi Wang et al.
Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models
Yuqi Zhu, Jia Li, Ge Li et al.
Revisiting Graph-Based Fraud Detection in Sight of Heterophily and Spectrum
Fan Xu, Nan Wang, Hao Wu et al.
Editing Language Model-Based Knowledge Graph Embeddings
PC-Conv: Unifying Homophily and Heterophily with Two-Fold Filtering
Bingheng Li, Erlin Pan, Zhao Kang
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
Yongxin Guo, Jingyu Liu, Mingda Li et al.
SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency
Feiyu Zhu, Reid Simmons
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
Wenfang Yao, Kejing Yin, William Cheung et al.
TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning
Siheng Xiong, Yuan Yang, Ali Payani et al.
DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving
Wencheng Han, Dongqian Guo, Cheng-Zhong Xu et al.
FlowPolicy: Enabling Fast and Robust 3D Flow-Based Policy via Consistency Flow Matching for Robot Manipulation
Qinglun Zhang, Zhen Liu, Haoqiang Fan et al.
SwitchTab: Switched Autoencoders Are Effective Tabular Learners
Jing Wu, Suiyao Chen, Qi Zhao et al.
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
Tao Wu, Yong Zhang, Xintao Wang et al.
Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy
Yu Fu, Deyi Xiong, Yue Dong
CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models
Zhongxi Chen, Ke Sun, Xianming Lin
Data Roaming and Quality Assessment for Composed Image Retrieval
Matan Levy, Rami Ben-Ari, Nir Darshan et al.
LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction
Yucheng Li, Frank Guerin, Chenghua Lin
Panoptic Scene Graph Generation with Semantics-Prototype Learning
Li Li, Wei Ji, Yiming Wu et al.
Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects
Jian Hu, Jiayi Lin, Shaogang Gong et al.
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
Chao Pang, Xingxing Weng, Jiang Wu et al.
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
Yufeng Huang, Jiji Tang, Zhuo Chen et al.
Visual Instruction Tuning with Polite Flamingo
Delong Chen, Jianfeng Liu, Wenliang Dai et al.
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
Fangjun Li, David C. Hogg, Anthony G. Cohn
SGNet: Structure Guided Network via Gradient-Frequency Awareness for Depth Map Super-resolution
Zhengxue Wang, Zhiqiang Yan, Jian Yang
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Xuan Shen, Peiyan Dong, Lei Lu et al.
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
Junjue Wang, Zhuo Zheng, Zihang Chen et al.
Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers
Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos et al.
SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation
Dong Wu, Mingmin Chi, Xuan Zang et al.
GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking
Shu Yin, Peican Zhu, Lianwei Wu et al.
Spatial Transform Decoupling for Oriented Object Detection
Hongtian Yu, Yunjie Tian, Qixiang Ye et al.
M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy
Hansong Zhang, Shikun Li, Pengju Wang et al.
Calibrating Large Language Models with Sample Consistency
Qing Lyu, Kumar Shridhar, Chaitanya Malaviya et al.
Prompt to Transfer: Sim-to-Real Transfer for Traffic Signal Control with Prompt Learning
Longchao Da, Minquan Gao, Hua Wei et al.
S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention
Chiyu Zhang, Xiaogang Xu, Lei Wang et al.
Understanding the Role of the Projector in Knowledge Distillation
Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption
Ziteng Cui, Lin Gu, Xiao Sun et al.
SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection
Hongcheng Guo, Jian Yang, Jiaheng Liu et al.
High-Order Structure Based Middle-Feature Learning for Visible-Infrared Person Re-identification
Liuxiang Qiu, Si Chen, Yan Yan et al.
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection
Li Xiang, Junbo Yin, Wei Li et al.
Language Model Can Listen While Speaking
Ziyang Ma, Yakun Song, Chenpeng Du et al.
TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers
Chuanrui Zhang, Yingshuang Zou, Zhuoling Li et al.
Strong Baselines for Parameter-Efficient Few-Shot Fine-Tuning
Samyadeep Basu, Shell Hu, Daniela Massiceti et al.
Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition
Jianyang Xie, Yanda Meng, Yitian Zhao et al.