5 min read
Research New research demonstrates that backdoor behaviours introduced into LLMs during fine-tuning can persist through subsequent safety alignment procedures, including RLHF and adversarial training, posing significant supply chain risks.