CrowdStrike Overhauls Testing and Rollout Procedures to Avoid System Crashes


CrowdStrike says it has revamped several testing, validation, and update rollout processes to prevent a repeat of the embarrassing July outage that caused widespread disruption on Windows systems around the world.

In testimony before the House Subcommittee on Cybersecurity, CrowdStrike vice president Adam Meyers outlined a new set of protocols that include carefully controlled rollouts of software updates, better validation of code inputs, and new testing procedures to cover a broader array of problematic scenarios.

“Our threat detection configuration information, known as Rapid Response Content, is now released gradually across increasing rings of deployment. This allows us to monitor for issues in a controlled environment and proactively roll back changes if problems are detected before affecting a wider population,” Meyers said.
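The ring-based release Meyers describes can be illustrated with a short sketch. This is not CrowdStrike's implementation: the ring names and the `deploy`, `healthy`, and `rollback` callbacks are hypothetical placeholders, chosen only to show the idea of widening a release one ring at a time and undoing it when a problem appears.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    rings: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Release an update ring by ring, smallest audience first.

    Returns True if every ring received the update and stayed healthy.
    If a ring reports a problem, every ring updated so far is rolled
    back (most recent first) and later rings are never touched.
    """
    updated: List[str] = []
    for ring in rings:
        deploy(ring)
        updated.append(ring)
        if not healthy(ring):
            # Stop widening the release and undo what was already shipped.
            for done in reversed(updated):
                rollback(done)
            return False
    return True
```

In this sketch a failure in an early ring means the widest ring is never updated, which is the property the quoted testimony emphasizes.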

In July 2024, a routine content update to CrowdStrike’s flagship Falcon platform caused sensor malfunctions across numerous Windows systems. In his testimony, Meyers explained that a sensor configuration update triggered a logic error that blue-screened critical computer systems around the world.

In response, Meyers said CrowdStrike has introduced new validation checks to help ensure that the number of inputs expected by the sensor and its predefined rules matches the number of threat detection configurations provided.

“This is designed to prevent similar mismatches from occurring in the future,” he stressed.
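A count check of this kind can be sketched in a few lines. The example below is a generic illustration, not CrowdStrike's code: the `TemplateType` class, its `expected_fields` attribute, and the `ContentValidationError` exception are hypothetical names introduced here, and the field counts in the usage note are arbitrary.

```python
from dataclasses import dataclass
from typing import Sequence


class ContentValidationError(ValueError):
    """Raised when a content update does not fit the shape the consumer expects."""


@dataclass(frozen=True)
class TemplateType:
    # Hypothetical description of what the consuming code expects.
    name: str
    expected_fields: int


def validate_field_count(template: TemplateType, values: Sequence[str]) -> None:
    """Reject an update whose number of supplied values differs from the
    number of fields the template declares, before it is ever released."""
    if len(values) != template.expected_fields:
        raise ContentValidationError(
            f"{template.name}: expected {template.expected_fields} fields, "
            f"got {len(values)}"
        )
```

With a template declared as `TemplateType("example", 3)`, passing three values succeeds silently, while passing two or four raises `ContentValidationError` at validation time rather than failing later on the endpoint.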

Meyers told the hearing that CrowdStrike software engineers have enhanced existing testing procedures to cover a broader array of scenarios, including testing all input fields under various conditions to detect potential flaws before rapidly released threat detection configuration information is sent to the sensor.

CrowdStrike has also made tweaks to provide customers with additional controls over the deployment of configuration updates to their systems, Meyers said.


He said the company has added runtime checks to the system to ensure that the data provided matches the system’s expectations before any processing occurs. This extra layer is meant to reduce the likelihood of future mismatches causing catastrophic system failures.
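As a rough illustration of a runtime check that runs before any processing, the sketch below refuses to evaluate a rule that refers to an input the caller did not supply. The `match_rule` function and its `criteria` mapping (field index to expected value) are assumptions made for this example and do not reflect how the Falcon sensor is actually written.

```python
from typing import Dict, Optional, Sequence


def match_rule(inputs: Sequence[str], criteria: Dict[int, str]) -> Optional[bool]:
    """Evaluate a rule against the supplied inputs.

    Returns None (rule skipped) if the rule refers to a field index that
    was not supplied; otherwise returns True or False for the match.
    """
    # Runtime check first: every referenced index must exist in `inputs`.
    for index in criteria:
        if not 0 <= index < len(inputs):
            return None
    # Only now is it safe to read the referenced fields.
    return all(inputs[index] == expected for index, expected in criteria.items())
```

The design choice shown here is to skip a malformed rule rather than read past the end of the input, so that bad content degrades detection instead of crashing the host.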

The July outage has also prompted Microsoft to plan a redesign of the way anti-malware products interact with the Windows kernel.

Technical details on the changes are not yet available, but Microsoft is promising “new platform capabilities” in Windows 11 to allow security vendors to operate “outside of kernel mode” in the interest of software reliability.  

“[We] explored new platform capabilities Microsoft plans to make available in Windows, building on the security investments we have made in Windows 11. Windows 11’s improved security posture and security defaults enable the platform to provide more security capabilities to solution providers outside of kernel mode,” Microsoft vice president David Weston said in a note following a summit with EDR vendors.

Related: CrowdStrike Dismisses Claims of Exploitability in Falcon Sensor Bug

Related: CrowdStrike Releases Root Cause Analysis of Falcon Sensor BSOD Crash

Related: Microsoft Says 8.5 Million Windows Devices Impacted by CrowdStrike Incident 

Related: CrowdStrike Says Logic Error Caused Windows BSOD Chaos

Related: Bad CrowdStrike Update Linked to Major IT Outages Worldwide
