Audit logging and monitoring overview
How do Microsoft online services employ audit logging?
Microsoft online services employ audit logging to detect unauthorized activities and provide accountability for Microsoft personnel. Audit logs capture details about system configuration changes and access events, with details to identify who was responsible for the activity, when and where the activity took place, and what the outcome of the activity was. Automated log analysis supports near real-time detection of suspicious behavior. Potential incidents are escalated to the appropriate Microsoft security response team for further investigation.
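The paragraph above lists the details an audit record must carry: who was responsible, when and where the activity took place, what was done, and what the outcome was. The sketch below shows one way to model such a record. It is a minimal illustration only; the class name, field names, and JSON-lines serialization are assumptions, not Microsoft's internal log schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit record holding the who/when/where/what/outcome details."""
    actor: str        # who was responsible (for example, an operator alias)
    timestamp: str    # when the activity took place (ISO 8601, UTC)
    source_host: str  # where the activity took place
    action: str       # what was done (configuration change, access event, ...)
    outcome: str      # result of the activity, e.g. "success" or "failure"


def make_record(actor: str, source_host: str, action: str, outcome: str) -> AuditRecord:
    """Build a record stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return AuditRecord(actor, now, source_host, action, outcome)


def to_json(record: AuditRecord) -> str:
    """Serialize a record as a single JSON line for a log pipeline."""
    return json.dumps(asdict(record), sort_keys=True)


rec = make_record("operator1", "host-01", "config.update:firewall", "success")
print(to_json(rec))
```

Making the dataclass `frozen` means a record cannot be modified in place after creation, which loosely mirrors the requirement that original audit content stays unchanged.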
Microsoft online services internal audit logging captures log data from various sources, such as:
- Event logs
- AppLocker logs
- Performance data
- System Center data
- Call detail records
- Quality of experience data
- IIS Web Server logs
- SQL Server logs
- Syslog data
- Security audit logs
How do Microsoft online services centralize and report on audit logs?
Many types of log data are uploaded from Microsoft servers to a proprietary security monitoring solution for near real-time (NRT) analysis, and to an internal big data computing service (Cosmos) or Azure Data Explorer (Kusto) for long-term storage. Automated log management tools perform this transfer over a FIPS 140-2 validated TLS connection on approved ports and protocols.
Logs are processed in NRT using rule-based, statistical, and machine learning methods to detect system performance indicators and potential security events. Machine learning models use incoming log data and historical log data stored in Cosmos or Kusto to continuously improve detection capabilities. Security-related detections generate alerts, notifying on-call engineers of a potential incident and triggering automated remediation actions when applicable. In addition to automated security monitoring, service teams use analysis tools and dashboards for data correlation, interactive queries, and data analytics. The resulting reports are used to monitor and improve the overall performance of the service.
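To make the difference between rule-based and statistical detection concrete, the sketch below pairs a fixed-threshold rule (repeated failed sign-ins by one actor) with a simple z-score test on an event count. It is an illustrative assumption, not Microsoft's detection logic: the event fields, the threshold of 5, and the 3-sigma cutoff are all hypothetical, and the machine learning methods mentioned above are not shown.

```python
from statistics import mean, pstdev


def rule_failed_logins(events, threshold=5):
    """Rule-based check: return actors with at least `threshold` failed logins."""
    counts = {}
    for e in events:
        if e["action"] == "login" and e["outcome"] == "failure":
            counts[e["actor"]] = counts.get(e["actor"], 0) + 1
    return sorted(actor for actor, n in counts.items() if n >= threshold)


def zscore_anomaly(history, current, z_threshold=3.0):
    """Statistical check: flag `current` when it lies more than `z_threshold`
    population standard deviations above the mean of `history`."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A flat baseline has no spread; treat any increase as anomalous.
        return current > mu
    return (current - mu) / sigma > z_threshold


events = [{"actor": "mallory", "action": "login", "outcome": "failure"}] * 5
events += [{"actor": "alice", "action": "login", "outcome": "failure"}]
print(rule_failed_logins(events))

hourly_counts = [10, 12, 11, 9, 10, 12, 11, 10]
print(zscore_anomaly(hourly_counts, 40), zscore_anomaly(hourly_counts, 11))
```

The rule catches a known attack pattern precisely, while the statistical check needs no prior knowledge of the attack but depends on a stable historical baseline; this is why the two approaches are typically combined.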
How do Microsoft online services protect audit logs?
The tools used in Microsoft online services to collect and process audit records don’t allow permanent or irreversible changes to the original audit record content or time ordering. Access to Microsoft online service data stored in Cosmos or Kusto is restricted to authorized personnel. In addition, Microsoft restricts the management of audit logs to a limited subset of security team members responsible for audit functionality. Security team personnel don’t have standing administrative access to Cosmos or Kusto. Administrative access requires Just-In-Time (JIT) access approval, and all changes to logging mechanisms for Cosmos are recorded and audited. Audit logs are retained long enough to support incident investigations and meet regulatory requirements. The exact retention period for audit log data is determined by the service teams; most audit log data is retained for 90 days in Cosmos and 180 days in Kusto.
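One common way to make changes to record content or time ordering detectable is a hash chain, in which each entry's hash covers both the record and the previous entry's hash. This article does not say that Microsoft uses hash chaining; the sketch below is a swapped-in, generic technique shown only to illustrate the property described above. The function names and the all-zero genesis value are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for an empty chain


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON form of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Append a record, linking it to the hash of the last entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "hash": entry_hash(prev, record)})


def verify(chain: list) -> bool:
    """Recompute every link; editing or reordering any entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


chain = []
for i, outcome in enumerate(["success", "failure", "success"]):
    append(chain, {"seq": i, "actor": "operator1", "outcome": outcome})
print(verify(chain))
```

A chain on its own only detects tampering by someone who cannot recompute it; in practice the latest hash would be stored somewhere the log writer cannot modify, which complements the access restrictions and JIT approval described above.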
How do Microsoft online services protect user personal data that may be captured in audit logs?
Prior to uploading log data, an automated log management application uses a scrubbing service to remove any fields that contain customer data, such as tenant information and user personal data, and replace those fields with a hash value. The anonymized and hashed logs are rewritten and then uploaded into Cosmos. All log transfers occur over a FIPS 140-2 validated TLS encrypted connection.
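The scrubbing step above replaces customer-data fields with a hash value before upload. The sketch below illustrates the idea with a keyed hash (HMAC-SHA256) from the Python standard library. The field list and the use of a secret key are assumptions: the article says only that fields are replaced with "a hash value", and a keyed hash is shown here because an unkeyed hash of a guessable value such as a user name can be reversed by trying candidate inputs.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as customer data in this sketch.
SENSITIVE_FIELDS = {"tenant_id", "user_upn", "client_ip"}


def scrub(record: dict, key: bytes) -> dict:
    """Return a copy of `record` in which each sensitive field is replaced by
    an HMAC-SHA256 hex digest; all other fields are copied unchanged."""
    scrubbed = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS and value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            scrubbed[field] = digest.hexdigest()
        else:
            scrubbed[field] = value
    return scrubbed


key = b"example-secret-key"  # placeholder; a real key would come from a secret store
raw = {"user_upn": "alice@contoso.com", "tenant_id": "t-123",
       "action": "login", "outcome": "success"}
print(scrub(raw, key))
```

Because the same input and key always produce the same digest, analysts can still correlate events from one user or tenant across logs without seeing the underlying identifier.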
What is Microsoft's strategy for monitoring security?
Microsoft engages in continuous security monitoring of its systems to detect and respond to threats to Microsoft online services. Our key principles for security monitoring and alerting are:
- Robustness: signals and logic to detect various attack behaviors
- Accuracy: meaningful alerts to avoid distractions from noise
- Speed: ability to catch attackers quickly enough to stop them
Automation, scale, and cloud-based solutions are key pillars of our monitoring and response strategy. For us to effectively prevent attacks at the scale of some of the Microsoft online services, our monitoring systems need to automatically raise highly accurate alerts in near real time. Likewise, when an issue is detected, we need the ability to mitigate the risk at scale; we can't rely on our team to manually fix issues machine by machine. To mitigate risks at scale, we use cloud-based tools to automatically apply countermeasures and provide engineers with tools to apply approved mitigation actions quickly across the environment.
How do Microsoft online services perform security monitoring?
Microsoft online services use centralized logging to collect and analyze log events for activities that might indicate a security incident. Centralized logging tools aggregate logs from all system components, including event logs, application logs, access control logs, and network-based intrusion detection systems. In addition to server logging and application-level data, core infrastructure is equipped with customized security agents that generate detailed telemetry and provide host-based intrusion detection. We use this telemetry for monitoring and forensics.
The logging and telemetry data we collect enables 24/7 security alerting. Our alerting system analyzes log data as it gets uploaded, producing alerts in near real time. This includes rules-based alerts and more sophisticated alerting based on machine learning models. Our monitoring logic goes beyond generic attack scenarios and incorporates deep awareness of service architecture and operations. We analyze security monitoring data to continuously improve our models to detect new kinds of attacks and improve the accuracy of our security monitoring.
How do Microsoft online services respond to security monitoring alerts?
When security events that trigger alerts require responsive action or further investigation of forensic evidence, our cloud-based tools allow for rapid response throughout the environment. These tools include fully automated, intelligent agents that respond to detected threats with security countermeasures. In many cases, these agents deploy automatic countermeasures to mitigate security detections at scale without human intervention. When an automated response isn't possible, the security monitoring system automatically alerts the appropriate on-call engineers, who are equipped with a set of tools that enable them to act in real time to mitigate detected threats at scale. Potential incidents are escalated to the appropriate Microsoft security response team and are resolved using the security incident response process.
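The response flow above, an automatic countermeasure where one is available and otherwise an on-call alert, can be sketched as a lookup in a table of pre-approved actions. Everything in this sketch is hypothetical: the alert types, the `PLAYBOOK` mapping, and the placeholder countermeasures, which return strings instead of calling real infrastructure.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Alert:
    kind: str  # hypothetical alert type, e.g. "malware_detected"
    host: str  # affected machine


def isolate_host(alert: Alert) -> str:
    # Placeholder countermeasure; a real one would act on the host or network.
    return f"isolated {alert.host}"


def revoke_sessions(alert: Alert) -> str:
    # Placeholder countermeasure for suspected credential misuse.
    return f"revoked sessions on {alert.host}"


# Hypothetical mapping from alert type to a pre-approved automated countermeasure.
PLAYBOOK: Dict[str, Callable[[Alert], str]] = {
    "malware_detected": isolate_host,
    "credential_theft": revoke_sessions,
}


def handle(alert: Alert, page_oncall: Callable[[Alert], None]) -> str:
    """Run the approved countermeasure if one exists; otherwise page on-call."""
    action = PLAYBOOK.get(alert.kind)
    if action is not None:
        return action(alert)
    page_oncall(alert)
    return f"escalated {alert.kind} on {alert.host}"


paged = []
print(handle(Alert("malware_detected", "host-07"), paged.append))
print(handle(Alert("unusual_dns", "host-09"), paged.append))
```

Keeping countermeasures in an explicit, reviewed table reflects the idea of applying only approved mitigation actions, while anything unrecognized falls through to a human.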
How do Microsoft online services monitor system availability?
Microsoft actively monitors its systems for indicators of resource over-utilization and abnormal use. Resource monitoring is complemented by service redundancies to help avoid unexpected downtime and provide customers with reliable access to products and services. Microsoft online service health issues are communicated promptly to customers through the Service Health Dashboard (SHD).
Azure and Dynamics 365 online services use multiple infrastructure services to monitor their security, health, and availability. Synthetic Transaction (STX) testing allows Azure and Dynamics 365 services to check the availability of their services. The STX framework is designed to support the automated testing of components in running services and is tested on live site failure alerts. Additionally, the Azure Security Monitoring (ASM) service has implemented centralized synthetic testing procedures to verify that security alerts function as expected in both new and running services.
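A synthetic transaction is a scripted operation run against a live service on a schedule to confirm it still works. This article doesn't describe how STX is implemented, so the sketch below is a generic, hypothetical probe: the `ProbeResult` type, the latency budget, and the alert rule are assumptions. Note that the latency budget is checked after the call returns rather than enforced as a hard timeout.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    name: str
    ok: bool
    latency_ms: float
    error: str = ""


def run_synthetic_transaction(name: str, transaction: Callable[[], None],
                              budget_ms: float = 2000.0) -> ProbeResult:
    """Run one scripted transaction and record success and latency.
    Exceeding `budget_ms` counts as a failure, but the call is not interrupted."""
    start = time.perf_counter()
    try:
        transaction()
    except Exception as exc:
        elapsed = (time.perf_counter() - start) * 1000
        return ProbeResult(name, False, elapsed, repr(exc))
    elapsed = (time.perf_counter() - start) * 1000
    if elapsed > budget_ms:
        return ProbeResult(name, False, elapsed, "latency budget exceeded")
    return ProbeResult(name, True, elapsed)


def should_alert(results: List[ProbeResult], max_failures: int = 1) -> bool:
    """Raise an availability alert once `max_failures` probes have failed."""
    return sum(1 for r in results if not r.ok) >= max_failures


def healthy() -> None:
    pass  # stands in for, e.g., a sign-in or a read of a test record


def broken() -> None:
    raise ConnectionError("service unavailable")


results = [run_synthetic_transaction("read-test-record", healthy),
           run_synthetic_transaction("sign-in", broken)]
print([(r.name, r.ok) for r in results], should_alert(results))
```

The same pattern can check security tooling itself: a probe that emits a known-benign test event and confirms the expected alert fires is one way to read the ASM synthetic testing described above.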
Related external regulations & certifications
Microsoft's online services are regularly audited for compliance with external regulations and certifications. Refer to the following tables for validation of controls related to audit logging and monitoring.
Azure and Dynamics 365
| External audits | Section | Latest report date |
|---|---|---|
| ISO 27001 Statement of Applicability, Certificate | A.12.1.3: Availability monitoring and capacity planning; A.12.4: Logging and monitoring | April 8, 2024 |
| ISO 27017 Statement of Applicability, Certificate | A.12.1.3: Availability monitoring and capacity planning; A.12.4: Logging and monitoring; A.16.1: Management of information security incidents and improvements | April 8, 2024 |
| ISO 27018 Statement of Applicability, Certificate | A.12.4: Logging and monitoring | April 8, 2024 |
| SOC 1 | IM-1: Incident management framework; IM-2: Incident detection configuration; IM-3: Incident management procedures; IM-4: Incident post-mortem; VM-1: Security event logging and collection; VM-12: Azure services availability monitoring; VM-4: Malicious events investigation; VM-6: Security vulnerability monitoring | May 20, 2024 |
| SOC 2, SOC 3 | C5-6: Restricted access to logs; IM-1: Incident management framework; IM-2: Incident detection configuration; IM-3: Incident management procedures; IM-4: Incident post-mortem; PI-2: Azure portal SLA performance review; VM-1: Security event logging and collection; VM-12: Azure services availability monitoring; VM-4: Malicious events investigation; VM-6: Security vulnerability monitoring | May 20, 2024 |
Microsoft 365
| External audits | Section | Latest report date |
|---|---|---|
| FedRAMP (Office 365) | AC-2: Account management; AC-17: Remote access; AU-2: Audit events; AU-3: Content of audit records; AU-4: Audit storage capacity; AU-5: Response to audit processing failures; AU-6: Audit review, analysis, and reporting; AU-7: Audit reduction and report generation; AU-8: Time stamps; AU-9: Protection of audit information; AU-10: Nonrepudiation; AU-11: Audit record retention; AU-12: Audit generation; SI-4: Information system monitoring; SI-7: Software, firmware, and information integrity | July 31, 2023 |
| ISO 27001/27017 Statement of Applicability, Certification (27001), Certification (27017) | A.12.3: Availability monitoring and capacity planning; A.12.4: Logging and monitoring | March 2024 |
| SOC 1, SOC 2 | CA-19: Change monitoring; CA-26: Security incident reporting; CA-29: On-call engineers; CA-30: Availability monitoring; CA-48: Datacenter logging; CA-60: Audit logging | January 23, 2024 |
| SOC 3 | CUEC-08: Reporting incidents; CUEC-10: Service contracts | January 23, 2024 |