Bin Wang

Papers: 26

Total Citations: 601

Papers (26)

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

ToolACE: Winning the Points of LLM Function Calling

LEGION: Learning to Ground and Explain for Synthetic Image Detection

Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts

Towards Faithful XAI Evaluation via Generalization-Limited Backdoor Watermark

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding

ROSE: Remove Objects with Side Effects in Videos

A New Dataset and Framework for Real-World Blurred Images Super-Resolution

Walk Wisely on Graph: Knowledge Graph Reasoning with Dual Agents via Efficient Guidance-Exploration

LLM4RSR: Large Language Models as Data Correctors for Robust Sequential Recommendation

Towards Ship License Plate Recognition in the Wild: A Large Benchmark and Strong Baseline

Stability and Generalization of Zeroth-Order Decentralized Stochastic Gradient Descent with Changing Topology

Distributed Bilevel Optimization with Communication Compression

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Chimera: Improving Generalist Model with Domain-Specific Experts

Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection

Hybrid Layout Control for Diffusion Transformer: Fewer Annotations, Superior Aesthetics

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Spatiotemporal-aware Trend-Seasonality Decomposition Network for Traffic Flow Forecasting

Reverse Distribution Based Video Moment Retrieval for Effective Bias Elimination

IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities

W2P: Switching from Weak Supervision to Partial Supervision for Semantic Segmentation

Shift the Lens: Environment-Aware Unsupervised Camouflaged Object Detection