Sen Wang

28 Papers

12 Total Citations

Papers (28)

General Scene Adaptation for Vision-and-Language Navigation

Quantifying and Narrowing the Unknown: Interactive Text-to-Video Retrieval via Uncertainty Minimization

FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation

Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation

From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning

SAMPO: Scale-wise Autoregression with Motion Prompt for Generative World Models

DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation

NeurIPS 2025, arXiv

Medium-Difficulty Samples Constitute Smoothed Decision Boundary for Knowledge Distillation on Pruned Datasets

MoMask: Generative Masked Modeling of 3D Human Motions

VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization

Detailed Human Shape Estimation From a Single Image by Hierarchical Mesh Deformation

ZSTAD: Zero-Shot Temporal Activity Detection

LiDAR-Aug: A General Rendering-Based Augmentation Framework for 3D Object Detection

Generating Diverse and Natural 3D Human Motions From Text

DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets

Interactive Visual Hull Refinement for Specular and Transparent Object Surface Reconstruction

Detailed Surface Geometry and Albedo Recovery From RGB-D Video Under Natural Illumination

TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts

Semantics Disentangling for Generalized Zero-Shot Learning

EventHPE: Event-Based 3D Human Pose and Shape Estimation

3D Human Shape Reconstruction from a Polarization Image

Object Wake-Up: 3D Object Rigging from a Single Image

TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts

PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation

M3GYM: A Large-Scale Multimodal Multi-view Multi-person Pose Dataset for Fitness Activity Understanding in Real-world Settings

Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds

Improved Feature Distillation via Projector Ensemble

RVD: A Handheld Device-Based Fundus Video Dataset for Retinal Vessel Segmentation