From Agile to Radical: experiment

Image by Gerd Altmann from Pixabay

One of the worst misconceptions in software engineering is the assumption that if we build software based on a requirement specification, test it according to the spec and deliver it to our customers, we’ve delivered value to these customers. This may be the case when a small team of consultants develops software for a single, competent customer, but when things scale in terms of the number of customers, teams and features, it rapidly becomes much less clear what constitutes value.

Our research, as well as research by others, shows that roughly half to two-thirds of all new features are never or hardly ever used. Consequently, the R&D effort invested in building these is a complete waste. How do we end up in a situation where half or more of all new features are simply waste? What happens in companies that causes work to be prioritized in such a way that we do the wrong things?

This was the core of the [product management post series](https://bits-chips.nl/article/strategic-digital-product-management/) where we explored this challenge as well as solution approaches. However, in my experience, at heart, the main issue is that the so-called experts in the company or at customers simply have an incorrect mental model concerning the impact of the activities they’re advocating. Assuming good intent, which I think is reasonable in most contexts, they push for R&D efforts that they think will have the best possible impact on the outcomes. The problem is that these assumptions are mostly, or at least partially, wrong.

There are at least three reasons why our beliefs about the impact of R&D efforts are incorrect: fuzzy definitions, lack of feedback loops and politics. First, most companies I work with, when asked what value their offering provides to customers, respond with rather imprecise descriptions. These descriptions tend to fall into the “worthwhile many versus vital few” trap in that the aspects offered up by the people I talk to are relevant but not always the highest-priority “vital few” factors. In addition, the descriptions tend to be qualitative in nature and not very precise in definition. For instance, one company I work with referred to “customer confidence,” meaning the customer wouldn’t have to worry about the problem their offering claimed to solve. Finally, there often is no willingness to trade off between various factors or aspects. For instance, how much “customer confidence” are we willing to sacrifice to gain market share? Or how are price and quality related to each other?

Second, although we tend to make predictions about the impact of new functionality, sometimes very precise ones, we seldom go back afterward to determine whether they were indeed accurate. Instead, we’re busy pitching the next set of features we want to see built and pitching for these to be prioritized. Without a feedback loop, there’s no learning. Consequently, we’re stuck in our flawed world model and never get corrected.

Finally, the third reason is that companies are made up of humans, and humans constantly navigate a social web in which most people in our network need to get what they ask for at least occasionally, lest they become our detractors or even enemies. In many contexts, I've noticed that ideas, concepts and features get prioritized even though almost everyone knows they won't have any impact whatsoever and are, in fact, pure R&D waste. The work is prioritized only to placate certain individuals. As long as the game is played by these rules, it's impossible to increase R&D effectiveness.

To address this, we need to accept a very basic principle: most of the time, we have very little, if any, idea about the impact a feature or function will have. Our offerings tend to operate in quite complex contexts with other systems and are used by humans who are difficult, if not impossible, to predict and who have a huge internal discrepancy between what they say they do (espoused theory) and what they actually do (theory-in-use). Especially as we're expected to be experts in our field, and our reputation depends on acting like one, it feels incredibly uncomfortable to admit that some things simply are unknowable. It is, however, the starting point of any recovery process. Until we admit that the impact of our R&D efforts is unknowable, we can't make progress toward resolving this issue.

Once we admit that things are unknowable, the next step is to stop viewing the world in terms of requirements and instead start to look at it in terms of hypotheses and experiments. Rather than prioritizing requirements, our role is to collect and define hypotheses. These tend to be of the form “if we build this, the effect will be that.” The next step is to prioritize our hypotheses for evaluation. Evaluation takes place through experiments.
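To make this concrete, here's a minimal sketch, in Python, of how a team might record such hypotheses. The fields, names and example values are my own illustrative assumptions, not a prescribed format; the point is simply that the expected effect, the metric and the success criterion are written down before any evaluation starts.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One 'if we build this, the effect will be that' statement."""
    feature: str            # what we propose to build
    expected_effect: str    # the outcome we believe it will cause
    metric: str             # how that outcome will be measured
    success_criterion: str  # threshold agreed on *before* the experiment runs
    status: str = "proposed"  # e.g. proposed / in experiment / supported / rejected
    evidence: list[str] = field(default_factory=list)  # experiment results so far


# Illustrative example, matching the qualitative first experiment described
# below; the feature and numbers are assumptions, not taken from the article.
reorder = Hypothesis(
    feature="one-click reorder",
    expected_effect="customers reorder more frequently",
    metric="interviewed customers expressing willingness to pay",
    success_criterion="at least 6 out of 10 interviewed customers",
)
```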

Although we tend to think about experiments in terms of the scientific method and hard, quantitative data, in this context it's preferable to look at them as techniques to iteratively build more confidence in a hypothesis. So, initially, an experiment may take the form of "present the idea to ten customers and gauge whether there's sufficient interest." In this case, we can state beforehand that at least six have to claim that this is of interest and something they're willing to pay for.

Once we’ve established that customers say they’re interested in a specific feature, the next step becomes measuring their actual behavior. Here, an experiment can be a small-scale prototype or an A/B test where only a few people are exposed to the feature. This allows us to measure at a small scale whether people do what they say they would do. If this is also successful, based on success criteria developed before the experiment is conducted, we can scale things up.

A scaled-up approach can, for example, be a full-fledged A/B test where a significant percentage of customers and users is exposed to the new feature with the intent of measuring engagement and behavior. If a large-scale A/B test is successful, we've established that the feature is viable and relevant and that it needs to be productized beyond the level of maturity the A/B test required.
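For the quantitative stages, the analysis itself can stay simple. Below is a minimal sketch, assuming a conversion-style metric, of comparing the control and treatment groups of an A/B test with a standard two-proportion z-test; the numbers are made up for illustration, and the success criterion (significance level, minimum lift) should be fixed before the test runs, not after.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rates of control (A) and treatment (B).

    Returns the z-statistic and the two-sided p-value under the usual
    normal approximation with a pooled standard error.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal survival function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative numbers only: 10,000 users per variant, 500 conversions in
# control versus 540 in treatment.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=540, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```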

Half of all R&D effort is waste because we prioritize the wrong things for development. This is because we’re not clear on what we’re seeking to accomplish, we don’t learn from past mistakes and misconceptions and we’re subject to politics and social forces, causing us to prioritize efforts we know are wasteful from the beginning. Instead, we have to start with the belief that the impact of new features and functions is, by and large, unknowable. That insight then leads us to work with hypotheses and experiments instead of requirements. Rather than one experiment, we encourage a sequence of experiments that incrementally build up confidence in the validity of the hypothesis. To end with a quote by John Finley: “Maturity of mind is to endure uncertainty.”

Want to read more like this? Sign up for my newsletter at jan@janbosch.com or follow me on janbosch.com/blog, LinkedIn (linkedin.com/in/janbosch), Medium or Twitter (@JanBosch).