Sanjiv Kumar
Papers: 11
Total Citations: 247
Papers (11)
Think before you speak: Training Language Models With Pause Tokens
ICLR 2024, 187 citations

Two-stage LLM Fine-tuning with Less Specialization and More Generalization
ICLR 2024, 42 citations

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
ICLR 2025, 15 citations

Spark Transformer: Reactivating Sparsity in Transformer FFN and Attention
NeurIPS 2025, 2 citations

Analyzing Similarity Metrics for Data Selection for Language Model Pretraining
NeurIPS 2025, 1 citation

Rethinking FID: Towards a Better Evaluation Metric for Image Generation
CVPR 2024, 0 citations

MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
CVPR 2024, 0 citations

Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
ICML 2024, 0 citations

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
ICML 2024, 0 citations

USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval
ICML 2024, 0 citations

Tandem Transformers for Inference Efficient LLMs
ICML 2024, 0 citations