Efficient Differentially Private Fine-Tuning with QLoRA and Prefix Tuning for Large Language Models
DOI: https://doi.org/10.54097/we271q84
Keywords: QLoRA, Prefix Tuning, Differential Privacy, Fine-Tuning, Large Language Models
Abstract
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP) tasks. However, fine-tuning LLMs using private datasets raises significant privacy concerns, as models can inadvertently memorize sensitive information. Differentially Private Stochastic Gradient Descent (DP-SGD) provides a mathematically rigorous solution but suffers from high computational overhead, slow convergence, and excessive privacy budget consumption, making it impractical for large-scale models. To address these challenges, we propose an efficient differentially private fine-tuning method that combines Quantized Low-Rank Adaptation (QLoRA) and Prefix Tuning. QLoRA employs 4-bit NormalFloat quantization and low-rank adaptation, significantly reducing memory consumption and improving computational efficiency. Prefix Tuning optimizes a small set of prefix vectors without modifying the model’s main parameters, further reducing the impact of DP noise. Additionally, we introduce a hybrid adaptive gradient clipping strategy, which applies sample-wise adaptive clipping for Prefix Tuning and group-wise clipping for QLoRA, effectively balancing privacy protection and model utility. We evaluate our approach on GPT-2 using benchmark datasets including E2E NLG Challenge, XSum, SST-2, and DART, measuring performance using BLEU, ROUGE, and F1-score. Results demonstrate that QLoRA + Prefix Tuning achieves up to 75% memory reduction while maintaining over 95% of the original model performance under a moderate privacy budget (ε=3), outperforming traditional DP fine-tuning methods. Our work provides a practical and scalable solution for privacy-preserving LLM fine-tuning in resource-constrained environments.
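The hybrid clipping strategy described above can be illustrated with a minimal NumPy sketch: per-sample adaptive clipping for the prefix-vector gradients (with the threshold set from a quantile of the batch's per-sample gradient norms) and a fixed group-wise bound for the QLoRA adapter gradients, followed by Gaussian noise scaled to each component's sensitivity. The function names, the median-quantile choice, and the flattened gradient shapes are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def clip_per_sample(grads, c):
    """Scale each row (one sample's gradient) so its L2 norm is at most c."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    return grads * np.minimum(1.0, c / np.maximum(norms, 1e-12))

def hybrid_dp_step(prefix_grads, lora_grads, noise_mult,
                   quantile=0.5, group_clip=1.0, rng=None):
    """One noisy gradient-aggregation step with hybrid clipping:
    adaptive sample-wise clipping for prefix gradients, a fixed
    group-wise bound for LoRA gradients (both shapes: batch x dim)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Adaptive threshold: a quantile of the per-sample prefix gradient norms.
    c_prefix = np.quantile(np.linalg.norm(prefix_grads, axis=1), quantile)
    prefix_sum = clip_per_sample(prefix_grads, c_prefix).sum(axis=0)
    lora_sum = clip_per_sample(lora_grads, group_clip).sum(axis=0)
    n = prefix_grads.shape[0]
    # Gaussian noise calibrated to each component's clipping bound.
    prefix_noisy = (prefix_sum + rng.normal(0.0, noise_mult * c_prefix,
                                            prefix_sum.shape)) / n
    lora_noisy = (lora_sum + rng.normal(0.0, noise_mult * group_clip,
                                        lora_sum.shape)) / n
    return prefix_noisy, lora_noisy

# Illustrative usage with random per-sample gradients.
rng = np.random.default_rng(42)
pg = rng.normal(size=(8, 16))   # 8 samples, 16-dim prefix gradients
lg = rng.normal(size=(8, 32))   # 8 samples, 32-dim LoRA gradients
p_update, l_update = hybrid_dp_step(pg, lg, noise_mult=1.1)
```

Because only the small prefix and adapter gradients pass through clipping and noising, the per-step noise is added to far fewer coordinates than in full-model DP-SGD, which is the intuition behind the reduced utility loss.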
References
[1] T. B. Brown, B. Mann, N. Ryder, et al., "Language models are few-shot learners," Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 1877–1901, 2020.
[2] N. Papernot, M. Abadi, Ú. Erlingsson, I. Goodfellow, and K. Talwar, "Semi-supervised knowledge transfer for deep learning from private training data," Proceedings of the 5th International Conference on Learning Representations (ICLR), Toulon, France, 2017.
[3] R. Shokri and V. Shmatikov, "Privacy-preserving deep learning," Proceedings of the 22nd ACM Conference on Computer and Communications Security (CCS), Denver, CO, USA, 2015, pp. 1310–1321.
[4] L. Zhu, Z. Liu, and S. Han, "Deep leakage from gradients," Advances in Neural Information Processing Systems (NeurIPS), vol. 32, 2019.
[5] C. Dwork, "Differential privacy," Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP), Venice, Italy, Jul. 2006, pp. 1–12.
[6] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, "Deep learning with differential privacy," Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS), Vienna, Austria, 2016, pp. 308–318.
[7] H. Zhu, X. Zhang, and L. Liu, "Efficient differential privacy in NLP: Trade-offs between privacy, efficiency, and performance," Transactions of the Association for Computational Linguistics (TACL), vol. 10, pp. 328–345, 2022.
[8] J. Xie, A. K. Ghosh, Y. Liu, and T. Goldstein, "DP-Friendly Transformer Adaptation," Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI), 2023.
[9] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," Proceedings of the 10th International Conference on Learning Representations (ICLR), 2022.
[10] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, "QLoRA: Efficient finetuning of quantized LLMs," Advances in Neural Information Processing Systems (NeurIPS), vol. 36, 2023.
[11] X. Li and P. Liang, "Prefix tuning: Optimizing continuous prompts for generation," Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021, pp. 4582–4597.
[12] D. Zhang, S. Ji, and X. Wang, "Differentially private gradient descent with adaptive clipping," Advances in Neural Information Processing Systems (NeurIPS), 2020.
[13] A. Yousefpour, I. Shilov, A. Sablayrolles, et al., "Opacus: User-friendly differential privacy library in PyTorch," NeurIPS Workshop on Privacy in Machine Learning, 2021.
[14] D. Yu, S. Naik, A. Backurs, et al., "Differentially private fine-tuning of language models," Proceedings of the 10th International Conference on Learning Representations (ICLR), 2022.
[15] B. Chien, J. Clark, and C. Raffel, "Efficient self-supervised learning with LoRA," Advances in Neural Information Processing Systems (NeurIPS), 2022.
[16] Y. Sun, S. Xu, and J. Wang, "Efficient differentially private fine-tuning with gradient clipping strategies," Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI), 2022.
[17] S. Wang, A. Lin, J. Hilton, and O. Thakkar, "Benchmarking differentially private fine-tuning of large language models," Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023.
[18] H. Yu, S. Guo, and Y. Zhou, "LoRA-based differentially private NLP fine-tuning," Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
[19] R. Pinsker and D. Rothschild, "Optimizing fine-tuning for differentially private NLP," Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
[20] N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song, "The secret sharer: Evaluating and testing unintended memorization in neural networks," Proceedings of the 28th USENIX Security Symposium, 2019.
License
Copyright (c) 2025 Journal of Computer Science and Artificial Intelligence

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.