
Many companies that have adopted continuous deployment have come to see the distinction between development and operations as obsolete. The term DevOps was coined for teams that are responsible for both development and operations.
Although DevOps sounds easy in theory, it’s surprisingly multifaceted in practice. Ever since software became part of embedded systems products, the development organization has played a role in operations: problems identified in the field that customer support and operations couldn’t solve were escalated to R&D. In that sense, DevOps is less a binary switch than a gradual shift in responsibility between teams.
The challenge most organizations face in this regard is scale. When we have several teams, we need to organize them so that adding teams actually delivers the development efficiency we’re after. Fundamentally, there are two ways of organizing teams: as component teams or as feature teams.
Component teams use the architecture of the system to structure the organization: each team is responsible for one component or a group of components. When an issue surfaces after a deployment, many assume it’s simply a matter of pointing to the team that caused it, but in practice it often isn’t that easy. Many issues found after deployment are caused by interactions between components, and therefore between teams. Consequently, figuring out which team should fix an issue is often far from trivial. So, at a minimum, there needs to be a ‘triaging’ team that determines the root cause of the issue and assigns a team to resolve it.
With feature teams, each team can change any component involved in realizing the feature it’s currently working on. Consequently, multiple teams frequently touch the same component during a sprint, which further complicates identifying the root cause of issues found in the field. Of course, we expect a continuous integration and test system to be in place, but the issues that slip through to the field are precisely the ones that are hardest to identify and fix.
Furthermore, many assume the main focus is on defects, where certain features don’t work or work incorrectly. In practice, however, many systems are surprisingly large and complex, and customers often track operational KPIs that indicate the performance of the product or system. When the system works as expected but some KPIs have shifted compared to the previous release, customers will often report this as an issue and ask the company to find the root cause. This is especially the case when a KPI gets worse, but not exclusively: in some cases, for instance, higher system performance causes bottlenecks downstream.
Finding the root cause of a KPI shift between releases of a system in the field can be surprisingly involved and time-consuming. This task can’t be assigned to a component team, as such a team only has in-depth knowledge of the components it’s responsible for.
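Noticing that a KPI has shifted is the easy part; explaining the shift is what takes time. As a minimal sketch, assuming KPIs are collected per release as name-to-value pairs, the hypothetical function `find_kpi_shifts` below flags every KPI whose relative change exceeds an assumed 5% threshold. It deliberately flags improvements as well as regressions, since a better number upstream can still cause bottlenecks downstream. The KPI names and the threshold are illustrative assumptions, not values from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiShift:
    name: str
    previous: float
    current: float
    relative_change: float  # (current - previous) / previous


def find_kpi_shifts(previous: dict[str, float],
                    current: dict[str, float],
                    threshold: float = 0.05) -> list[KpiShift]:
    """Return KPIs (present in both releases) whose value moved by more than
    `threshold`, relative to the previous release, in either direction.
    The 5% default is an assumption for illustration only."""
    shifts = []
    for name in sorted(previous.keys() & current.keys()):
        before, after = previous[name], current[name]
        if before == 0:
            # Relative change is undefined; skip instead of dividing by zero.
            continue
        change = (after - before) / before
        # Flag improvements as well as regressions: a faster stage can
        # still overload a downstream consumer.
        if abs(change) > threshold:
            shifts.append(KpiShift(name, before, after, change))
    return shifts


if __name__ == "__main__":
    # Hypothetical KPI snapshots for two consecutive releases.
    release_a = {"throughput_msgs_per_s": 1000.0, "p95_latency_ms": 40.0, "error_rate": 0.010}
    release_b = {"throughput_msgs_per_s": 1250.0, "p95_latency_ms": 41.0, "error_rate": 0.010}
    for s in find_kpi_shifts(release_a, release_b):
        # prints "throughput_msgs_per_s: 1000.0 -> 1250.0 (+25.0%)"
        print(f"{s.name}: {s.previous} -> {s.current} ({s.relative_change:+.1%})")
```

A check like this only tells us *which* KPI moved; tracing the movement back to specific changes across components and teams is the part that requires system-wide knowledge.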
Organizing operations and issue management therefore requires some deliberate decisions. With feature teams, one alternative is to appoint one or a few teams to handle issue management for a period of time and have teams rotate through this task. For instance, a feature team may manage issues for one quarter before being relieved by another team. Although this may not be the most popular assignment, it gives teams a much deeper understanding of the system in operation and of the differences between customer deployments, which is often an advantage.
With component teams, a separate team is needed to perform root cause analysis on issues reported from the field. This can be a dedicated full-time team or a virtual team in which representatives from all or most component teams are assigned for part of their time. With a dedicated full-time team, of course, the difference between traditional operations and DevOps virtually disappears.
When adopting DevOps, it’s key to incorporate all relevant skills into teams, which means these teams tend to become more cross-disciplinary than traditional development teams. For instance, domain experts, customer support, product management, infrastructure and other skills may be needed for a team to operate effectively. After all, intra-team coordination is orders of magnitude more efficient than cross-team coordination.
DevOps practices require merging development and operations into a unified process, often with cross-disciplinary teams. In practice, the challenge arises when organizations are large and consist of many teams. In that case, dealing with issues that slipped through to the field can prove surprisingly difficult, both in identifying the root cause and in determining the best way to resolve the problem. Consequently, close collaboration is required, not only between development and operations but also between teams. In the end, to quote Jez Humble, DevOps isn’t a goal but a never-ending process of continual improvement.
Want to read more like this? Sign up for my newsletter at jan@janbosch.com or follow me on janbosch.com/blog, LinkedIn (linkedin.com/in/janbosch) or X (@JanBosch).