Software development is a process of iterative discovery. Developers at the beginning of a project cannot know exactly what to write, because no design can tell them how user needs will evolve or how product owner requirements might change over the life of the product. Agile development philosophies embrace the unknowable by encouraging us to just get started, build something, and then continue to iterate as we learn.
Developers refer to the need to revisit previously written code as “technical debt,” a term coined by computer scientist Ward Cunningham in 1992. Like financial debt, technical debt has an interest rate and grows over time. Just as interest on financial debt adds to the amount of money that must be repaid on a loan, interest on technical debt adds to the amount of code that will need to be revised in the future.
Technical decisions made during software development are always made with imperfect information about the future of the product. Each of these choices can be thought of as technical debt and will start accruing interest. Sustainable agile software development requires a strategy for consistently paying down this interest.
Whenever we look back on software we wrote in earlier phases of development, we see all the ways our previous knowledge of user needs and stakeholder requirements was incomplete. By taking the time to refine our code and pay down our tech debt, we can apply our growing understanding of the product to stabilize what we’ve already written and prepare for the product’s future.
The process of going back and improving previously written code is called refactoring: reworking the code to make it more resilient and extensible without disrupting any of its existing functionality. Although refactoring makes the code easier to build on in the future, it’s not always easy to convince product owners to make time for it because it doesn’t add any new features to the product.
But refactoring isn’t just a matter of aesthetic preference for beautiful code. Technical debt resulting from previous decisions can create inefficiencies that drag down a development team’s velocity. If left to grow, interest on the product’s technical debt will eventually reach the point that massive investments of time and effort are required just to keep it functional. This is why it’s critical for a development team to make time regularly to improve earlier code and keep their project up to date.
The interest rate
Unfortunately, tech debt doesn’t only accumulate as engineers write code. Even in the absence of ongoing development, when no new code is written at all, technical debt grows with time. There are a couple of reasons for this.
Very little code used today is written from scratch. Instead, software products are dependent on code libraries and packages that are continually being updated to add features, squash bugs, and remediate security vulnerabilities. Sometimes, updates to these external dependencies introduce breaking changes, or packages are deprecated and discontinued. In addition, every software product has to be hosted somewhere. Operating systems are continually being updated, and cloud hosting providers change their infrastructure products often. The world our software lives in continues to evolve, even if our software product remains unchanged.
Because the software world itself is dynamic, a team that walks away from software in good working order for six months will return to discover that they cannot simply resume where they left off. The technical debt will have continued to accumulate in their absence, and it will have to be paid down before they can add anything new.
An Ad Hoc example
Here at Ad Hoc, we’ve built an internal application called the Connector App that sends notifications for internal recruiting and people management processes. The Connector App had been working reliably for years and had been maintained and improved consistently. Then business pressures pulled the engineers who had been maintaining it onto other projects. The app continued working for about six months without any changes or updates. But then a worker process in the application started failing occasionally, requiring us to manually restart the app. When we started investigating why the app was failing, we discovered that its technical debt had been quietly accumulating interest.
The Connector App runs on AWS EC2 instances and uses Que, a Ruby background worker, to send email responses to applicants and handle various onboarding and offboarding tasks. When the Que worker failed, notifications that should have gone out were never sent. As a company that prides itself on the resilient infrastructure we build for our customers, we knew we had to do better than manually restarting processes.
We initially considered a temporary workaround: using systemd to restart Que whenever it failed. But we discovered that the application was running on Amazon Linux 1, which had reached end-of-life in 2020 and does not have systemd. Our original decision to use Amazon Linux was a good one at the time, but once the operating system reached end-of-life, it became technical debt. Because we had stopped iterating on the app, we hadn’t paid that debt down, and it had kept growing. Now it had reached the point where we had no choice but to re-engineer our deployment solution.
At that point, our options were to try to make Amazon Linux 1 work, migrate the app to an Amazon Linux 2 instance and proceed with our original plan to use systemd, or find some other option. We rejected Amazon Linux 1 because it had reached its end-of-life and that would continue to cause problems moving forward. Moving to Amazon Linux 2 would have been an easier solution and would have solved the immediate issue, but we decided that we wanted a more modern solution that would give us greater flexibility in the future.
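To make the restart-on-failure idea concrete, here is a minimal Python sketch of what a process supervisor does. It is an illustration only, not systemd and not the Connector App's actual setup: the `supervise` function, its parameters, and the stand-in failing command are all hypothetical.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, delay_seconds=0.0):
    """Run `command`, restarting it each time it exits with a failure code.

    Returns the number of restarts performed. Stops when the command exits
    cleanly (code 0) or when `max_restarts` has been used up.
    """
    restarts = 0
    while True:
        exit_code = subprocess.run(command).returncode
        if exit_code == 0:
            return restarts  # clean exit: nothing left to supervise
        if restarts >= max_restarts:
            return restarts  # give up instead of looping forever
        restarts += 1
        time.sleep(delay_seconds)  # brief pause before the next attempt


if __name__ == "__main__":
    # Hypothetical stand-in for a crashing worker: a child process that exits with code 1.
    failing_worker = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing_worker, max_restarts=2))  # prints 2
```

A real supervisor also handles logging, backoff, and startup ordering, which is one reason a workaround like this treats the symptom rather than the underlying failure.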
Moving to Docker
We ultimately decided to migrate the application to run on Docker using Debian as the base. Although this wasn’t the simplest solution, it had many advantages for us. We already used Docker for development on this project, so this gave us a unified experience. It also solved the problem of the operating system being end-of-life, thereby paying down some of the technical debt for this project. Finally, using containerized applications decouples the application from the host platform, and if we decide to move the Connector App to another platform in the future (such as a container orchestration service), it will be far easier because Docker containers are very portable.
Instead of increasing our technical debt further by choosing the quickest solution, we decided to service the debt by improving the infrastructure and deployment process of the application. Putting the effort in now to do it right means that future changes will be easier, even though it wasn’t the easiest option to implement in the short term. Ultimately, by putting the Que process under systemd supervision, we gave our engineers time to diagnose and fix the underlying issue without impacting the functionality of the application.
This change required multiple engineers. We had to completely rework our build and deployment processes. None of the needed changes were that big individually, but collectively they required multiple days of effort to accomplish. The interest on our technical debt was accumulating the entire time the application went unmaintained. If this application’s technical debt had been continually paid down as each issue arose over time, no massive intervention would have been required.
Budgeting for tech debt
When systems enter Operation & Maintenance, they gradually stop supporting the evolving needs of their users and stop interacting cleanly with other systems, as explained in Ad Hoc Playbook play #12. Technical debt continues to accrue, as we saw with our Connector App, which shows how important it is to keep paying down technical debt for all of your applications. But developer time and energy are finite. How much of each should we spend to manage that debt?
One of the principles in the Agile Manifesto is: “Agile processes promote sustainable development.” The only way we’ve found to keep a sustainable, constant pace in development is by setting aside a portion of every team’s velocity for technical debt. Budgeting for your technical debt prevents the interest from adding to the principal, which helps to avoid needing major maintenance interventions.
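The financial metaphor can be sketched with a little compound-interest arithmetic. The Python example below is a toy model, not a measurement: the function name `simulate_debt`, the 10% rate, and the unit (say, days of rework) are assumptions chosen only to show the shape of the curve.

```python
def simulate_debt(principal, interest_rate, payment, periods):
    """Return the debt balance after each period.

    Each period the balance grows by `interest_rate` (0.10 means 10%),
    then `payment` is subtracted. The balance never drops below zero.
    """
    balances = []
    balance = principal
    for _ in range(periods):
        balance = balance * (1 + interest_rate)  # interest accrues first
        balance = max(0.0, balance - payment)    # then the team pays some down
        balances.append(balance)
    return balances


# Hypothetical numbers: 100 days of rework owed, growing 10% per period.
unpaid = simulate_debt(100, 0.10, payment=0, periods=3)
budgeted = simulate_debt(100, 0.10, payment=10, periods=3)

print([round(b, 1) for b in unpaid])    # [110.0, 121.0, 133.1]
print([round(b, 1) for b in budgeted])  # [100.0, 100.0, 100.0]
```

In this model, a regular payment equal to the interest keeps the balance flat, while skipping payments lets each period's interest compound on the last.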
But what exactly is the interest rate on this debt? How much of our velocity should we dedicate to keeping it under control? In The DevOps Handbook, Gene Kim and his co-authors pass along a rule of thumb from product management author Marty Cagan, who suggests that a good starting point is devoting 20% of engineering time to paying down technical debt:
The deal [between product owners and] engineering goes like this: Product management takes 20% of the team’s capacity right off the top and gives this to engineering to spend as they see fit. They might use it to rewrite, re-architect, or re-factor problematic parts of the code base…whatever they believe is necessary to avoid ever having to come to the team and say, ‘we need to stop and rewrite [all our code].’ If you’re in really bad shape today, you might need to make this 30% or even more of the resources. However, I get nervous when I find teams that think they can get away with much less than 20%.
—Gene Kim, The DevOps Handbook
Technical debt isn’t a bad thing; iterative development wouldn’t be possible without it. But teams do need to plan how to pay down their debts. There’s no math that proves that 20% is the right amount of energy to devote to tech debt, but it’s a practical starting point because it is easy to plan around: it works out to about one day per week, or roughly one week per month.
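As a rough sketch of that arithmetic, the Python helper below splits a team's capacity into feature work and debt work. The name `split_capacity` and the sample capacities are hypothetical, and rounding to whole units is just one possible convention.

```python
def split_capacity(total_units, debt_share=0.20):
    """Split capacity into (feature_units, debt_units).

    `total_units` can be story points, days, or any unit the team plans in.
    `debt_share` is the fraction reserved for technical debt.
    """
    if not 0 <= debt_share <= 1:
        raise ValueError("debt_share must be between 0 and 1")
    debt_units = round(total_units * debt_share)
    return total_units - debt_units, debt_units


print(split_capacity(5))         # (4, 1): one day of a five-day week
print(split_capacity(40))        # (32, 8): 8 of 40 story points
print(split_capacity(40, 0.30))  # (28, 12): a team in worse shape
```

Making the reserved share an explicit parameter mirrors the point of the budget: the number can move from sprint to sprint, but it is always set deliberately.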
At Ad Hoc, the exact amount of velocity our teams spend on addressing tech debt in any given sprint may be higher or lower than the 20% recommended in The DevOps Handbook, but what matters is that each team explicitly sets aside the capacity for it. When teams focus only on adding new features, they’re falling behind on their tech debt interest payments, and that debt will catch up with them sooner or later.
The level of effort spent on tech debt doesn’t have to remain constant, either. Another of the principles in the Agile Manifesto is that “At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.” This means that the team should decide together how much of their work should be reserved for managing technical debt.
Technical debt in government
The fact that tech debt continues to accumulate interest on products that aren’t actively being developed is a particularly acute problem in government. Government acquisitions and procurement processes often treat service delivery as a project rather than as a product.
Project management is perfect when a large effort over a specific timeframe is required, followed by an indefinite period of ongoing maintenance. The construction of a new building or the purchase of a new fleet of vehicles might fit this pattern, but web applications do not. They must be continuously delivered and iteratively improved using stakeholder feedback — the product management approach.
Unmaintained applications, APIs, and websites keep accumulating more and more interest on their technical debt, until they eventually become unusable. By instead treating digital services like other public services, as ongoing efforts that are never complete, the government can deliver experiences on the same level as the best commercial software products.
At Ad Hoc, we pride ourselves on bringing the modern skills necessary to help agencies transform their public services into digital services with agile development teams and a product management mindset. By regularly addressing existing technical debt as part of the product life cycle, and by having a sustainable plan to service that debt going forward, we enable the kind of digital transformation that will benefit everyone who uses government services.