Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models
This research discovers that benign fine-tuning can cause language models to lose their ability to respect contextual privacy whilst maintaining strong performance on standard safety benchmarks. The study shows that diverse training data characteristics, from emotional dialogue to debugging code, degrade privacy reasoning in a "silent failure" that current evaluations fail to detect.