On Functional Safety in the Age of Continuous Deployment

This week I hosted a workshop on continuous deployment of software subject to functional safety standards. We agreed not to disclose who participated, but several large companies from the automotive, aeronautics, industrial and defense sectors were present, including OEMs and tier 1 suppliers. The group was dominated by functional safety experts, with a smaller number of agile experts sprinkled in for good measure.

From a business perspective, it is obvious why we want continuous deployment. It’s all about shortening the feedback loop between the customer and the company providing the product. By measuring how the product performs in the field and how customers use it, we can use that data to improve the product’s functionality. With new software deployed every few weeks, the improvements, though small in each sprint, add up to a large cumulative effect. With continuous deployment coming to basically every connected product, customers increasingly expect their product to get better over time. And, purely practically, the cost of product recalls disappears if we can fix the issue with a software update that is distributed, basically for free, to products in the field. So we had better make use of that capability and opportunity.

At first sight, continuous deployment and functional safety seem to be completely at odds with each other. Agile is all about small, incremental steps in sprint-based iterations. Functional safety is about testing the heck out of the complete system, collecting all the necessary evidence that the system is safe, getting it certified by an external assessor and then preferably never touching the product again, as every change requires redoing the safety certification.

During the workshop, however, it became clear that there are concrete ways to integrate functional safety into agile development practices. Continuous, or at least sprint-based, releases of software can be accomplished even if every release requires functional safety assessment and certification. However, work is needed in several areas. For the companies building these systems, changes are needed in system architecture, development processes and automation. For assessors, the way assessment and certification are performed needs to change, but we leave this outside the scope of this article.

The system architecture of software-intensive systems tends to be highly interconnected, with numerous dependencies throughout the system. The reason for this is twofold. First, the primary driver for these systems was traditionally the bill of materials (BOM). The belief was that any small saving we could achieve in the BOM justified any R&D investment needed to accomplish it. Especially for products produced at high volume, this was true for many years. However, in many companies the cost of software R&D is becoming so high that it tends to be on par with the BOM cost, and consequently we need to make more intelligent decisions. Second, for some reason, system, mechanics and hardware architects seem to have given managing complexity a much lower priority than software architects have.

Going forward, the current approach to systems architecting and engineering is no longer feasible. First, if we want to deploy software continuously throughout the lifetime of the product, we need to put excess resources in the system at design time. Otherwise we can’t even provide the first upgrade. Second, we need to modularize our architecture so that safety-related functionality can be assigned to independent subsystems, which simplifies safety assessment.

The development process, especially the sprint activities, is the second area that needs extension. Work on functional safety, such as hazard analysis and producing evidence that known hazards have been addressed appropriately, needs to take place every sprint and needs to be added to the backlog. The real change is, however, even more fundamental: the work on functional safety needs to satisfy two characteristics, i.e. it needs to be iterative and it needs to be cumulative. Iterative means that, for each item on the backlog, a hazard analysis takes place that identifies any new hazards created by this item and its potential implications for the already identified hazards. Cumulative means that we shouldn’t be required to redo all the unaffected work, but only address the affected areas.
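To make the iterative and cumulative idea a bit more concrete, here is a minimal sketch in Python. It is an illustration only, not a method agreed at the workshop or prescribed by any safety standard. All names (Hazard, BacklogItem, plan_sprint_analysis) and the assumption that each hazard can be mapped to a set of components are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    """A hazard and the components its safety evidence depends on (hypothetical model)."""
    hazard_id: str
    components: frozenset[str]


@dataclass(frozen=True)
class BacklogItem:
    """A backlog item, the components it changes, and any new hazards it introduces."""
    item_id: str
    touched_components: frozenset[str]
    new_hazards: tuple[Hazard, ...] = ()


def plan_sprint_analysis(
    item: BacklogItem, known: list[Hazard]
) -> tuple[list[Hazard], list[Hazard]]:
    """Split hazards into those to analyse this sprint and those carried over.

    Iterative: new hazards from the item are always analysed.
    Cumulative: known hazards are re-analysed only if the item touches
    a component they depend on; the rest keep their existing evidence.
    """
    affected = [h for h in known if h.components & item.touched_components]
    carried_over = [h for h in known if not (h.components & item.touched_components)]
    to_analyse = list(item.new_hazards) + affected
    return to_analyse, carried_over


if __name__ == "__main__":
    known = [
        Hazard("H1", frozenset({"brake_ctrl"})),
        Hazard("H2", frozenset({"infotainment"})),
    ]
    item = BacklogItem(
        "B-17",
        frozenset({"brake_ctrl"}),
        new_hazards=(Hazard("H3", frozenset({"brake_ctrl", "ota_client"})),),
    )
    to_analyse, carried_over = plan_sprint_analysis(item, known)
    print("analyse:", [h.hazard_id for h in to_analyse])
    print("carry over:", [h.hazard_id for h in carried_over])
```

The sketch only works if hazards can be traced to well-separated components, which is exactly why the modular architecture discussed earlier matters: the fewer dependencies between subsystems, the smaller the set of hazards that has to be revisited each sprint.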

Finally, the third area where change is undoubtedly required is automation. The current functional safety certification process requires vast amounts of documentation that today is, to a large extent, created manually. For a sprint-based deployment and certification approach, manual document creation is clearly infeasible and we need to automate the generation of these documents. However, the focus should be on creating a database where the data required for these documents is stored. The documents can then be generated whenever necessary.
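As a rough sketch of what "database first, documents on demand" could look like, the following Python example stores evidence records in SQLite and renders a plain-text report from them. The table layout, the function names and the report format are all assumptions made for illustration; real certification documents are far richer and their required content depends on the applicable standard and assessor.

```python
import sqlite3


def open_evidence_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical evidence database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS evidence (
               hazard_id TEXT NOT NULL,
               sprint    INTEGER NOT NULL,
               kind      TEXT NOT NULL,
               summary   TEXT NOT NULL
           )"""
    )
    return conn


def record_evidence(conn, hazard_id: str, sprint: int, kind: str, summary: str) -> None:
    """Store one piece of evidence produced during a sprint."""
    conn.execute(
        "INSERT INTO evidence (hazard_id, sprint, kind, summary) VALUES (?, ?, ?, ?)",
        (hazard_id, sprint, kind, summary),
    )
    conn.commit()


def generate_report(conn, up_to_sprint: int) -> str:
    """Render all evidence up to a given sprint as a plain-text document."""
    rows = conn.execute(
        "SELECT hazard_id, sprint, kind, summary FROM evidence "
        "WHERE sprint <= ? ORDER BY hazard_id, sprint, rowid",
        (up_to_sprint,),
    ).fetchall()
    lines = [f"Safety evidence report (through sprint {up_to_sprint})"]
    current = None
    for hazard_id, sprint, kind, summary in rows:
        if hazard_id != current:  # start a new section per hazard
            lines.append(f"Hazard {hazard_id}")
            current = hazard_id
        lines.append(f"  sprint {sprint} [{kind}] {summary}")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = open_evidence_store()
    record_evidence(conn, "H1", 1, "analysis", "Initial hazard analysis completed")
    record_evidence(conn, "H1", 2, "test", "Regression suite passed after change B-17")
    record_evidence(conn, "H2", 1, "analysis", "No safety impact identified")
    print(generate_report(conn, up_to_sprint=2))
```

Because the report is a pure function of the stored records, it can be regenerated for any sprint at any time, which is the property that makes per-sprint certification evidence affordable.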

To conclude, software that is subject to functional safety standards can also be deployed continuously, and we’ve started to work our way towards making this a reality. During the workshop, we agreed to organize a follow-up workshop later this year. If you are interested in learning more and potentially joining, please send me an email at jan@janbosch.com.