Jie Tang

Papers: 16

Total Citations: 1,836

Papers (16)

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

LVBench: An Extreme Long Video Understanding Benchmark

KoLA: Carefully Benchmarking World Knowledge of Large Language Models

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Bilateral Propagation Network for Depth Completion

Scaling Speech-Text Pre-training with Synthetic Interleaved Data

CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

Sketch and Refine: Towards Fast and Accurate Lane Detection

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

TriSampler: A Better Negative Sampling Principle for Dense Retrieval

Small Language Model Makes an Effective Long Text Extractor

Towards Efficient Exact Optimization of Language Model Alignment

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

CogAgent: A Visual Language Model for GUI Agents

AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning