AIML-05

Have you limited access to your ML training data to only staff with an explicit business need?

Explanation

This question asks whether your organization restricts access to machine learning (ML) training data to only those employees who have a legitimate business reason to access it. Training data for ML models often contains sensitive information, such as personal data, proprietary business information, or other confidential content.

The question is asked because unrestricted access to this data creates security and privacy risks. If too many people can access training data, the risk of data breaches, unauthorized use, or data leakage increases. The security assessment is trying to determine whether you have proper access controls in place that follow the principle of least privilege: giving people only the access they absolutely need to do their jobs. This is a fundamental security practice that reduces your attack surface and limits the potential damage from insider threats or compromised accounts.

To best answer this question, you should describe:

1. How access to ML training data is controlled (technical controls)
2. The process for granting access (administrative controls)
3. How you determine who has a legitimate business need
4. How you audit or review these access permissions over time
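The technical side of the controls above can be sketched as a minimal role-based access check with audit logging. This is an illustrative sketch only: the role names, dataset identifiers, and permission mapping are assumptions for the example, not part of any specific platform.

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("training-data-access")

# Hypothetical mapping of roles to the datasets they need. Least privilege
# means a role appears here only if it has an explicit business need.
ROLE_PERMISSIONS = {
    "data_scientist": {"training/customer-churn", "training/fraud-detection"},
    "ml_engineer": {"training/customer-churn"},
    "data_governance": {"training/customer-churn", "training/fraud-detection"},
}

def can_access(role: str, dataset: str) -> bool:
    """Return True only if the role is explicitly granted the dataset."""
    allowed = dataset in ROLE_PERMISSIONS.get(role, set())
    # Every attempt (granted or denied) is logged, so unusual access
    # patterns can be reviewed later.
    log.info("%s access=%s role=%s dataset=%s",
             datetime.now(timezone.utc).isoformat(),
             "granted" if allowed else "denied", role, dataset)
    return allowed

print(can_access("ml_engineer", "training/customer-churn"))  # True
print(can_access("qa_engineer", "training/customer-churn"))  # False
```

Note that any role absent from the mapping is denied by default, which is the deny-by-default posture assessors expect to see.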

Guidance

Assessors are looking for evidence that access to training data is limited to staff with an explicit business need.

Example Responses

Example Response 1

Yes, we strictly limit access to ML training data using role-based access controls (RBAC) implemented through our data lake security policies. Only data scientists, ML engineers, and specific data governance personnel have access to training datasets. Access requests require manager approval and business justification documentation through our access management system. We conduct quarterly access reviews to verify that only authorized personnel maintain access, and we immediately revoke access when team members change roles or leave the organization. All access to training data is logged and monitored for unusual patterns.
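The quarterly access review this response describes can be sketched as a simple reconciliation between the current access list and HR records, flagging accounts to revoke. The usernames, record shapes, and authorized roles below are illustrative assumptions, not a real system's data model.

```python
# Hypothetical current access list and HR records for the review.
current_access = ["alice", "bob", "carol"]

hr_records = {
    "alice": {"active": True,  "role": "data_scientist"},
    "bob":   {"active": True,  "role": "sales"},        # changed roles
    "carol": {"active": False, "role": "ml_engineer"},  # left the company
}

# Roles assumed to have a legitimate business need for training data.
AUTHORIZED_ROLES = {"data_scientist", "ml_engineer", "data_governance"}

def access_review(access_list, records):
    """Return usernames whose training-data access should be revoked:
    anyone unknown, inactive, or no longer in an authorized role."""
    revoke = []
    for user in access_list:
        rec = records.get(user)
        if rec is None or not rec["active"] or rec["role"] not in AUTHORIZED_ROLES:
            revoke.append(user)
    return revoke

print(access_review(current_access, hr_records))  # ['bob', 'carol']
```

In practice this reconciliation would run against an identity provider or HR system of record, but the logic is the same: access persists only while both employment status and role justify it.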

Example Response 2

Yes, our ML training data is stored in a segregated environment with restricted access. We implement a multi-layered security approach where training data access requires both membership in authorized security groups and just-in-time privileged access management. Before gaining access, staff must submit a formal request detailing their specific business need, which must be approved by both the data owner and the security team. Our system automatically tracks and logs all access to training data, and we perform monthly entitlement reviews to ensure access remains appropriate. Additionally, we use data loss prevention tools to prevent unauthorized exfiltration of training data.

Example Response 3

No, we currently don't have formal restrictions on who can access our ML training data within our development team. All of our developers, data scientists, and QA engineers (approximately 35 people) have access to the training datasets to facilitate collaboration and speed up development cycles. We recognize this is a gap in our security controls and are working to implement a more restrictive access model. Our planned improvements include implementing role-based access controls, developing a formal access request process, and conducting regular access reviews. We expect to have these controls in place within the next quarter.

Context

Tab
AI
Category
AI Machine Learning

ResponseHub is the product I wish I had when I was a CTO

Previously I was co-founder and CTO of Progression, a VC backed HR-tech startup used by some of the biggest names in tech.

As our sales grew, security questionnaires quickly became one of my biggest pain points. They were confusing, hard to delegate and arrived like London buses - 3 at a time!

I'm building ResponseHub so that other teams don't have to go through this. Leave the security questionnaires to us so you can get back to closing deals, shipping product and building your team.

Signature
Neil Cameron
Founder, ResponseHub