Is your ML training data monitored and audited?
Explanation
This question assesses whether you can detect unauthorized changes to, or poisoning of, the data used to train your ML models, and whether access to that data is tracked and reviewed.
Guidance
Looking for how you reduce the risk of your training data being compromised.
Example Responses
Example Response 1
Yes, our ML training data is comprehensively monitored and audited through multiple layers of controls. We maintain a data provenance system that tracks the origin, transformations, and usage of all training datasets. Access to training data is restricted through role-based access controls, and all access events are logged in our SIEM system. We employ automated data quality monitoring tools that continuously scan for anomalies, drift, or potential poisoning attempts. Our Data Governance team conducts quarterly audits of all training datasets, reviewing access logs, transformation history, and data quality metrics. Additionally, we use cryptographic hashing to verify data integrity throughout the ML pipeline and maintain immutable audit logs of all data modifications. These processes are documented in our ML Data Governance Policy, which is reviewed annually and after any security incident.
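The integrity-hashing control described in this response can be sketched as a check of training files against a recorded hash manifest. This is a minimal illustration, not the responder's actual pipeline; the manifest path and JSON layout are assumptions:

```python
# Minimal sketch: verify training files against an immutable hash manifest.
# The manifest format {"files": {"relative/path": "<sha256>"}} is hypothetical.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large datasets never load into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest_path: Path) -> list[str]:
    """Compare current file hashes against the recorded manifest and
    return the paths whose contents no longer match."""
    manifest = json.loads(manifest_path.read_text())
    tampered = []
    for rel_path, expected in manifest["files"].items():
        if sha256_of(manifest_path.parent / rel_path) != expected:
            tampered.append(rel_path)
    return tampered

if __name__ == "__main__":
    changed = verify_manifest(Path("training_data/manifest.json"))  # illustrative path
    if changed:
        raise SystemExit(f"Integrity check failed for: {changed}")
    print("All training files match the recorded hashes.")
```

Storing the manifest in an append-only or signed location is what makes the audit log "immutable" in practice; the hash comparison alone only detects that something changed, not who changed it.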
Example Response 2
Yes, we have implemented a robust monitoring and auditing framework for our ML training data. Our approach includes: 1) a centralized data catalog that maintains metadata about all training datasets, including source, ownership, sensitivity classification, and usage history; 2) automated data lineage tracking that records all transformations applied to datasets; 3) continuous monitoring through our DataGuard platform, which alerts on unusual access patterns or unexpected modifications to training data; 4) monthly automated data quality assessments that check for drift, outliers, and potential poisoning; and 5) bi-annual formal audits conducted by our internal audit team in collaboration with ML engineers. All training data access requires multi-factor authentication, and privileged operations (such as deletion or bulk modification) require approval through our change management system. We retain these logs for a minimum of 18 months to support forensic analysis if needed.
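The drift portion of the automated quality assessment in item 4 could be implemented with a two-sample Kolmogorov-Smirnov test. The sketch below is a generic illustration under assumed thresholds and synthetic data, not the DataGuard platform itself:

```python
# Hedged sketch: flag a numeric training feature whose current distribution
# differs significantly from a baseline snapshot recorded at audit time.
import numpy as np
from scipy.stats import ks_2samp

def drift_alert(baseline: np.ndarray, current: np.ndarray,
                p_threshold: float = 0.01) -> bool:
    """Two-sample KS test: a small p-value means the current values are
    unlikely to come from the baseline distribution, which can indicate
    drift or tampering worth routing to a reviewer."""
    _statistic, p_value = ks_2samp(baseline, current)
    return p_value < p_threshold

# Synthetic demonstration data; real checks would read catalog snapshots.
rng = np.random.default_rng(seed=0)
baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)
current = rng.normal(loc=0.4, scale=1.0, size=10_000)  # shifted distribution
if drift_alert(baseline, current):
    print("ALERT: feature distribution drifted; flag dataset for review.")
```

A per-feature statistical test like this catches distribution shifts but not label-flipping attacks, which is why responses of this kind typically pair it with lineage tracking and access-pattern monitoring.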
Example Response 3
We currently do not have a formal monitoring and auditing system specifically for ML training data. Our data scientists maintain their own datasets and are responsible for ensuring data quality. While we do have general system access logs that would capture who accessed data storage systems, we don't have specialized tools for tracking ML data lineage or monitoring for potential data poisoning attempts. Our development team plans to implement a data governance framework next quarter that will include monitoring and auditing capabilities for ML training data, but this is still in the planning phase. In the interim, we mitigate risk through strict access controls on our data storage systems and by conducting manual reviews of datasets before they are used to train production models.
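A manual pre-training review like the interim mitigation described above might start from basic dataset sanity checks. The path, CSV format, and report fields below are hypothetical:

```python
# Illustrative sketch of a pre-training review report a data scientist
# might eyeball before approving a dataset for production training.
import pandas as pd

def review_dataset(path: str) -> dict:
    """Summarize properties a reviewer would check before sign-off."""
    df = pd.read_csv(path)
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "null_fraction_by_column": df.isna().mean().round(4).to_dict(),
        "numeric_summary": df.describe().to_dict(),
    }

if __name__ == "__main__":
    report = review_dataset("datasets/candidate_training_set.csv")  # assumed path
    for key, value in report.items():
        print(f"{key}: {value}")
```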
Context
- Tab: AI
- Category: AI Machine Learning

