Tai Wang

Papers: 16

Total Citations: 278

Papers (16)

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

Monocular 3D Object Detection with Depth from Motion

GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection

Scene as Occupancy