Yi Liu

Google Scholar · OpenReview

Papers: 43

Total Citations: 3,949

h-index: 10

Papers (43)

DETRs Beat YOLOs on Real-time Object Detection

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

RRM: Robust Reward Model Training Mitigates Reward Hacking

On the Role of Attention Heads in Large Language Model Safety

SUTrack: Towards Simple and Unified Single Object Tracking

Exploring Enhanced Contextual Information for Video-Level Object Tracking

Training-free Video Temporal Grounding using Large-scale Pre-trained Models

Temporal Reasoning Transfer from Text to Video

Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning

CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model

RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video

NeurIPS 2025 · arXiv

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning

NeurIPS 2025 · arXiv

Can LLMs Outshine Conventional Recommenders? A Comparative Evaluation

NeurIPS 2025 · arXiv

DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding

NeurIPS 2025 · arXiv

TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels

NeurIPS 2025 · arXiv

CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective

Stability and Sharper Risk Bounds with Convergence Rate $\tilde{O}(1/n^2)$

NeurIPS 2025 · arXiv

Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning

Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation

NeurIPS 2025 · arXiv

Sampled Estimators For Softmax Must Be Biased

Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

NeurIPS 2025 · arXiv

Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space

VisualLens: Personalization through Task-Agnostic Visual History

NeurIPS 2025 · arXiv

Controllable Protein Sequence Generation with LLM Preference Optimization

DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

NeurIPS 2025 · arXiv

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

NeurIPS 2025 · arXiv

VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

Chains of Diffusion Models

Benchmarking Large Language Models on Controllable Generation under Diversified Instructions

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

CAMixerSR: Only Details Need More "Attention"

Tactile-Augmented Radiance Fields

CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation

Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways

Tuning-free Estimation and Inference of Cumulative Distribution Function under Local Differential Privacy

Multi-Source Conformal Inference Under Distribution Shift