Tat-Seng Chua

Papers: 47

Total Citations: 578

Papers (47)

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Towards 3D Molecule-Text Interpretation in Language Models

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Language Representations Can be What Recommenders Need: Findings and Potentials

GOODAT: Towards Test-Time Graph Out-of-Distribution Detection

Temporally and Distributionally Robust Optimization for Cold-Start Recommendation

LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program

Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

NeurIPS 2025 (arXiv)

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation

Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models

Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning

Optimize Incompatible Parameters Through Compatibility-aware Knowledge Integration

Neural Causal Graph for Interpretable and Intervenable Classification

Learning Image and User Features for Recommendation in Social Networks

Discovering Spatio-Temporal Rationales for Video Question Answering

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

Visual Relation Grounding in Videos

Fine-Grained Scene Graph Generation with Data Transfer

Video Graph Transformer for Video Question Answering

Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning

Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis

Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction

Auto-Encoding Morph-Tokens for Multimodal LLM

NExT-GPT: Any-to-Any Multimodal LLM

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

NExT-Chat: An LMM for Chat, Detection and Segmentation

Online Collaborative Learning for Open-Vocabulary Visual Classifiers

Visual Translation Embedding Network for Visual Relation Detection

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning

Meta-Transfer Learning for Few-Shot Learning

Hyperbolic Visual Embedding Learning for Zero-Shot Recognition

SESS: Self-Ensembling Semi-Supervised 3D Object Detection

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions

Few-Shot 3D Point Cloud Semantic Segmentation

Invariant Grounding for Video Question Answering

Learning to Self-Train for Semi-Supervised Few-Shot Classification

Neural Sparse Voxel Fields

Towards Multi-Grained Explainability for Graph Neural Networks

Incorporating Bias-aware Margins into Contrastive Loss for Collaborative Filtering

LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model

Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss

VPGTrans: Transfer Visual Prompt Generator across LLMs

Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules

Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion