Ruoxi Jia

Papers: 10

Total Citations: 20

Papers (10)

LLMs Can Plan Only If We Tell Them

Detecting Adversarial Data Using Perturbation Forgery

Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning

Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Position: A Safe Harbor for AI Evaluation and Red Teaming

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models

Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

Probing Hidden Knowledge Holes in Unlearned LLMs