
For most of the history of software engineering, we’ve operated under a deceptively simple model. Engineers specify behavior. Systems execute it. When the behavior is wrong, engineers fix it. When requirements change, engineers rewrite it. Between releases, the system is inert. It doesn’t learn from what it observes in production. It doesn’t adapt to how users actually behave. It doesn’t improve on its own.
That model is becoming obsolete. Not because software engineers are being replaced, but because the systems they build are fundamentally changing in character. A new generation of software systems learns continuously from the data generated in operation, improves its own behavior through reinforcement learning and structured experimentation, and in some cases generates and deploys its own code at runtime.
The discipline of software engineering is shifting from building software to building learning systems. That shift is as consequential as any transition the field has seen since the move from waterfall to Agile. This was the central argument I made at the 2026 International Conference on Software Engineering in Rio de Janeiro, and I want to try to unpack it here with the help of a few companies that are already making it concrete.
The distinction between a system that uses AI and a system that learns continuously is worth drawing carefully. Most organizations have done the former. They’ve embedded AI models into their products, improved their predictions and deployed smarter features. Very few have done the latter: building production systems that close a feedback loop between deployment and improvement without requiring engineers to manually collect data, retrain models and redeploy.
The difference isn’t one of degree; it’s architectural. A learning system treats every production interaction as a training signal, every deployment as an experiment and every failure as feedback. Building one requires different design patterns, different infrastructure and a different understanding of what software engineering is fundamentally for.
Four techniques are converging to make this possible at production scale. The first is reinforcement learning. Unlike supervised learning, which requires labeled training data prepared in advance, RL allows a system to learn by doing: taking actions, observing outcomes and updating its behavior in the direction of better results. In domains involving sequential decisions, from content recommendation to logistics routing to network optimization, this is a profound capability shift. A system optimizing delivery routes with RL doesn’t need an engineer to tell it what the right route is; it discovers what works by running routes, measuring outcomes and continuously improving.
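To make that loop concrete, here is a minimal sketch of the learn-by-doing cycle: a simple epsilon-greedy bandit choosing among candidate delivery routes. The route names, the simulated delivery times and the reward (negative delivery time) are all illustrative assumptions, not anyone’s production logic.

```python
import random

# Hypothetical candidate routes; in practice these would be actions
# proposed by a routing engine.
routes = ["A", "B", "C"]
value = {r: 0.0 for r in routes}   # running estimate of each route's reward
count = {r: 0 for r in routes}

def observe_delivery_time(route):
    # Stand-in for a real production signal (e.g., measured minutes).
    base = {"A": 42.0, "B": 35.0, "C": 50.0}[route]
    return base + random.gauss(0, 5)

for step in range(1000):
    # Epsilon-greedy: mostly exploit the best-known route, sometimes explore.
    if random.random() < 0.1:
        route = random.choice(routes)
    else:
        route = max(routes, key=lambda r: value[r])

    reward = -observe_delivery_time(route)   # shorter deliveries = higher reward
    count[route] += 1
    # Incremental mean update: nudge the estimate toward the observed outcome.
    value[route] += (reward - value[route]) / count[route]

print(max(routes, key=lambda r: value[r]))   # the route the system discovered
```

No engineer ever specifies which route is best; the value estimates converge from observed outcomes, which is exactly the shift described above.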
Anyscale, the company behind the Ray distributed computing framework, has made this kind of production-grade reinforcement learning operationally accessible through RLlib. Companies including Grab, Ericsson and JPMorgan run live RL workloads on this infrastructure, and Physical Intelligence uses it to train robotic systems. What Anyscale has demonstrated is that RL is no longer a research curiosity constrained to game-playing benchmarks; it’s an operational discipline running at production scale in consequential business processes.
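To illustrate how little ceremony this now involves, here is a minimal sketch of an RLlib training loop, assuming Ray 2.x; the CartPole environment is a toy stand-in for a production environment such as a routing or network simulator, not an example drawn from any of these companies.

```python
# Minimal RLlib training loop (a sketch, assuming Ray 2.x is installed).
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")  # toy stand-in environment
algo = config.build()

for _ in range(3):
    algo.train()   # one iteration: collect experience, update the policy

algo.stop()        # release the distributed resources Ray allocated
```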
The second technique is federated learning. It addresses a structural problem that has constrained AI for its entire commercial history: Much of the world’s most valuable data can’t be moved. Medical records are bound by privacy regulations. Financial transactions carry contractual confidentiality. Sensor data from industrial equipment belongs to customers who won’t share it. Automotive telemetry sits on vehicles distributed across dozens of countries. The conventional approach to training AI models requires centralizing data, which puts the vast majority of high-value enterprise applications out of reach. Federated learning inverts this entirely. Rather than moving data to the model, the model is distributed to where the data lives. Training happens locally. Only the mathematical updates to the model, not the underlying data, are sent back and aggregated into a global model.
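The aggregation at the heart of this, federated averaging, is worth seeing in miniature. Below is an illustrative sketch in plain NumPy with a linear model; the three “sites” and their data are fabricated for the example, and real deployments weight the average by sample counts and add secure aggregation on top.

```python
import numpy as np

def local_update(w, X, y, lr=0.1):
    # One local gradient step on a site's private data (linear model);
    # the raw (X, y) never leave the site, only the updated weights do.
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def federated_round(w, sites):
    # Each site trains locally, then the server averages the returned
    # weights (federated averaging); sample-count weighting omitted for clarity.
    local_weights = [local_update(w, X, y) for X, y in sites]
    return np.mean(local_weights, axis=0)

# Toy demonstration: three "hospitals" with private data drawn from the
# same underlying relationship y = 3x (names and data are illustrative).
rng = np.random.default_rng(0)
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 1))
    y = 3 * X[:, 0] + rng.normal(scale=0.1, size=50)
    sites.append((X, y))

w = np.zeros(1)
for _ in range(100):
    w = federated_round(w, sites)
print(w)   # converges toward [3.] without pooling any site's data
```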
Flower Labs, a Cambridge and Berlin-based company that emerged from research at the University of Cambridge and is now backed by Felicis, Mozilla Ventures and Hugging Face’s CEO, among others, has built the most widely adopted open-source framework for federated learning. Nokia, Porsche, Samsung and Brave are production users. The framework has been used in experiments with up to fifteen million clients simultaneously. For software-intensive companies operating in sectors with distributed, sensitive or regulated data, including automotive, healthcare, financial services and industrial IoT, Flower represents a path to training AI systems on the data that was previously out of reach. The competitive implication is significant: The organizations that learn to train on their private distributed data will have AI systems that their competitors, relying only on public or centralized data, simply can’t match.
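For a sense of the programming model, here is a minimal sketch of a Flower client, assuming Flower’s 1.x NumPyClient API; the local model, the stubbed training step and the server address are all placeholders, not production code from any of the companies named above.

```python
import numpy as np
import flwr as fl

# Hypothetical local state: in a real deployment this would be the
# on-device model and the private data that never leaves the device.
weights = [np.zeros(10)]

class DeviceClient(fl.client.NumPyClient):
    def get_parameters(self, config):
        return weights

    def fit(self, parameters, config):
        # Train locally on private data (stubbed here), return new weights.
        new_weights = [w + 0.01 for w in parameters]  # stand-in for training
        return new_weights, 100, {}   # weights, local sample count, metrics

    def evaluate(self, parameters, config):
        loss = float(np.sum(np.square(parameters[0])))  # stand-in metric
        return loss, 100, {}

# Connect to a federation server; the address is illustrative and a
# Flower server must be running there for this client to join a round.
fl.client.start_numpy_client(server_address="127.0.0.1:8080",
                             client=DeviceClient())
```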
The third technique is runtime code generation and self-healing. This is the least commercially mature of the four, but arguably the most intellectually significant for the software engineering discipline specifically. Traditional software fails in one of two ways: It either crashes visibly, which is detectable and fixable, or it degrades silently, which is not. A learning system can do something different: When it encounters a runtime failure it hasn’t seen before, it can generate a handler for it dynamically, deploy the fix and continue operating without human intervention.
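A deliberately simplified sketch of that pattern: on an unseen failure, ask a code-generating model for a handler, install it and continue. Here `generate_handler_source` is a hypothetical stand-in for an LLM call, and a production version would need sandboxing, validation and audit trails that this sketch omits entirely.

```python
handlers = {}   # dynamically installed handlers, keyed by exception type

def generate_handler_source(exc, context):
    # Hypothetical stand-in for an LLM call that writes a recovery
    # strategy from the error, the program state and the intended behavior.
    return (
        "def handler(exc, context):\n"
        "    # fall back to a safe default and flag for offline review\n"
        "    context['status'] = 'degraded'\n"
        "    return context.get('fallback')\n"
    )

def run_step(step, context):
    try:
        return step(context)
    except Exception as exc:
        key = type(exc)
        if key not in handlers:
            source = generate_handler_source(exc, context)
            namespace = {}
            exec(source, namespace)               # compile the generated handler
            handlers[key] = namespace["handler"]  # install it for reuse
        return handlers[key](exc, context)

# Illustrative use: a step that fails on unexpected input keeps the
# system operating via the generated fallback.
result = run_step(lambda ctx: 1 / ctx["divisor"],
                  {"divisor": 0, "fallback": 0})
print(result)
```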
Research groups at different universities have demonstrated this with prototype systems that use large language models to generate exception handling strategies at runtime, drawing on the context of the error, the state of the program and the intended behavior. On the commercial side, the architecture of systems like Cursor and Windsurf is beginning to point in this direction, though they remain developer-assisted rather than fully autonomous. The gap between research and production deployment here is still wide. But the direction is unambiguous: Systems that can modify themselves in response to failure are qualitatively different from systems that require human intervention to recover.
The fourth technique, systematic A/B experimentation, may be the most underestimated. Most organizations run experiments episodically, when a product manager has a hypothesis worth testing. The fastest-learning organizations have made experimentation continuous and structural: Every deployment is an experiment, every system decision is a testable hypothesis, and the pipeline between observation and improvement runs automatically. This isn’t merely a tooling change; it represents a different theory of how software systems improve.
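The statistical core of each such experiment is small; what distinguishes the fastest learners is wiring it into the deployment pipeline so it runs on every change. A sketch of the per-experiment decision, a two-proportion z-test, with illustrative conversion counts:

```python
from math import sqrt
from statistics import NormalDist

def ab_significance(conv_a, n_a, conv_b, n_b):
    # Two-proportion z-test: is variant B's conversion rate different
    # from A's by more than chance would explain?
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)            # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value

# Illustrative counts: 480/10,000 conversions on control,
# 560/10,000 on the variant.
p_value = ab_significance(480, 10_000, 560, 10_000)
print(p_value < 0.05)   # in a continuous pipeline, this gate would
                        # automatically promote or roll back the variant
```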
Kameleoon, founded in Paris and now one of Europe’s leading experimentation platforms, has operationalized this for the web and full-stack layers, combining feature flagging, A/B testing, multivariate experiments and real-time AI-driven targeting in a single platform. The insight it has built into its architecture is that experimentation and personalization aren’t separate activities: The signal from each experiment improves the targeting for the next one, and the behavioral data from targeting informs what to test. Organizations running thousands of experiments per year learn from their systems at a categorically different rate than those running dozens.
What makes these four techniques more than an interesting collection of tools is what happens when they’re combined into a coherent learning architecture. Reinforcement learning requires a reward signal, and in production systems, that signal comes from continuous observation of user and system behavior. Federated learning requires distributed data assets, but the model it produces can itself be continuously improved through RL and experimentation. Runtime code generation provides a mechanism to close failure loops faster than any human development cycle allows. And systematic A/B experimentation validates every change against real behavior before full deployment, creating a data flywheel that compounds the quality of every subsequent learning iteration. Each technique amplifies the others. Together, they define what it means to build a learning system rather than a static system.
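Schematically, the combination is a single closed loop. In the sketch below, every function is a hypothetical stand-in for one of the subsystems described above, reduced to a toy so the composition itself is visible:

```python
import random

# Schematic of the combined learning loop; each function is a
# placeholder for a real subsystem, not an actual implementation.
def observe_production():        return random.random()          # reward signal
def reinforce(m, reward):        return m + 0.1 * (reward - m)   # RL update
def federated_round(m):          return m                        # fold in remote data
def ab_experiment(cand, base):   return cand > base              # validate on users

model = 0.0
for cycle in range(10):
    candidate = reinforce(model, observe_production())  # learn from operation
    candidate = federated_round(candidate)              # without moving the data
    if ab_experiment(candidate, model):                 # every change is tested
        model = candidate                               # promotion closes the loop
```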
The implications for the software engineering discipline aren’t trivial. Building a learning system requires capabilities that most engineering organizations haven’t developed. Reward function design, the specification of what an RL system should optimize for, is a new and genuinely difficult engineering discipline that sits at the intersection of product thinking and ML engineering. Data governance and privacy architecture are prerequisites for federated learning, not afterthoughts. Monitoring a system that modifies its own behavior requires different observability tools than monitoring a system that executes fixed code. And the organizational structures that emerged from project-centric development models are often poorly suited to the continuous, data-driven improvement cycles that learning systems demand.
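A small example makes the difficulty of reward design tangible. In this hypothetical reward for the routing case, every coefficient is a product judgment, and any mis-weighted or omitted term is an instruction the optimizer will faithfully exploit:

```python
def route_reward(delivery_minutes, fuel_cost, late, damaged):
    # Hypothetical reward for the routing example above. Each weight is a
    # product decision: set the lateness penalty too low and the optimizer
    # will trade punctuality for fuel; omit the damage term entirely and
    # it will discover that reckless routes score well.
    return (
        -1.0 * delivery_minutes
        - 0.5 * fuel_cost
        - 30.0 * float(late)
        - 100.0 * float(damaged)
    )
```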
Organizations that take this transition seriously won’t simply have better features; they’ll have a compounding structural advantage that widens with every deployment cycle. Every interaction their system has in production generates data. Every piece of data improves the model. Every model improvement makes the next experiment more informative. The learning accelerates over time. Organizations that remain on static systems face a widening capability gap that can’t be closed by hiring more engineers or shipping faster. To end with Donella Meadows: “You can’t understand a system from within the system.”
Want to read more like this? Sign up for my newsletter at jan@janbosch.com or follow me on janbosch.com/blog, LinkedIn (linkedin.com/in/janbosch) or X (@JanBosch).