A Brief Guide to Data Privacy and GenAI Chatbots

Undergraduate

Instructor

As we integrate AI tools such as OpenAI’s ChatGPT and Anthropic’s Claude into our teaching and research, it’s crucial to understand how these platforms handle the data we provide. This is particularly important concerning our intellectual property (IP) and compliance with FERPA regulations on student privacy.

Author

Keefe Reuther

Published

October 16, 2024

Key Points to Consider

Data Usage for Model Training
- OpenAI’s ChatGPT:
  - By default, any content you input—including prompts, file uploads, and feedback—may be used to improve their services, including training AI models like ChatGPT. They don’t use content from business offerings such as ChatGPT Team, ChatGPT Enterprise, and their API Platform to train models.
- Anthropic’s Claude:
  - Anthropic will not use your inputs or outputs to train their models unless:
    - Your conversations are flagged for Trust & Safety review.
    - You explicitly report materials via feedback mechanisms.
    - You have explicitly opted in to the use of your inputs and outputs for training purposes.
Data Retention Policies
- OpenAI retains records of your interactions for up to 30 days, even if you opt out, to monitor for misuse or abuse before deleting them.
- Anthropic allows you to delete individual conversations, which will be removed immediately from your conversation history and automatically deleted from their back-end systems within 30 days.
Legal Disclosures and Third-Party Access
- Both OpenAI and Anthropic may disclose your content to comply with legal obligations, meaning your data could potentially be accessed by third parties under certain circumstances.
- Data may also be shared with affiliates and service providers who process data on their behalf.
Best Practices for Protecting IP and Student Privacy
- De-identify Sensitive Information: Always remove or anonymize any personal, student-related, or proprietary information before inputting it into AI tools.
- Avoid Sharing Protected Data: Do not input any data that could violate FERPA regulations or disclose confidential research.
- Stay Informed: Regularly review the privacy policies of these AI services to stay updated on how your data is handled.
- Institutional Guidelines: Ensure compliance with your institution’s policies regarding the use of AI tools and data privacy.

Actionable Steps

Opt Out of Data Usage for Training:
- For ChatGPT, follow the opt-out instructions below.
- For Claude, refrain from opting in to data usage for training and avoid submitting feedback that could be used for this purpose.
Delete Unnecessary Conversations:
- Regularly delete any conversations that may contain sensitive information to minimize data retention risks.
Exercise Caution with Input Content:
- Be mindful of the information you share in AI chat windows. When in doubt, rephrase or generalize the content to avoid including identifiable information.
Consider Going Local:
- While it takes a bit of technical training, you can run many high quality LLM models locally. If interested, Hugging Face is the place to go for Open Source models. Also check out Ollama for a more user friendly interface.

Additional Resources

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{reuther2024,
  author = {Reuther, Keefe},
  title = {A {Brief} {Guide} to {Data} {Privacy} and {GenAI} {Chatbots}},
  date = {2024-10-16},
  url = {https://reutherlab.netlify.app/docs/blog/posts/aiprivacypolicy/},
  langid = {en}
}

For attribution, please cite this work as:

Reuther, Keefe. 2024. “A Brief Guide to Data Privacy and GenAI Chatbots.” October 16, 2024. https://reutherlab.netlify.app/docs/blog/posts/aiprivacypolicy/.