Thoughts
Reading
Articles/Blogs/Essays
-
Martin Fowler:
-
LeadDev:
- What leaders get wrong about the latent cost of technical debt:
    - Problem: "Accruing technical debt isn’t necessarily a negative issue, and usually starts with good intentions. For instance, you may have made the call to accept a tech debt trade-off with the view to finishing a feature or meeting a deadline. But what happens when this becomes the norm – an ingrained component of the culture? I have seen the process of teams normalizing broken tests, sluggish builds, and tribal knowledge to the point that no one asks questions about it anymore. The longer this debt remains unaddressed, the more it interferes with the velocity of development and cognitive load, demotivating team members."
    - Operational instability: "Technical debt is not always about messy code; it is instability that sneaks into operations. Even the slightest quality problem in a regulated industry can turn into an existential threat. For example, an FAA audit conducted on Boeing found that the airplane company failed 33 out of 89 quality audits of their 737 Max production, because they were not following approved manufacturing processes. What is the cause of most of these failures? Missing documentation, haphazard tooling, and a lack of process; technical debt at scale. Similar to the weak APIs and undocumented programs in our world, these were not merely bugs, but time bombs that had systemic implications. The parallel fragility in software teams is that the mean time to recovery (MTTR) is slower, and more often, rollbacks or on-call escalations are required. You may not be building planes, but when a service fails and no one understands who owns the fallback logic or how it is tested, you are closer to the risk than you imagine."
    - The silencing effect of accumulated complexity: "Systems are bound to become more complex as they evolve. However, when technical debt accumulates without oversight, such as a lack of documentation, unclear ownership, or postponed refactoring, it accelerates complexity with no structural checks in place. When I was working in a fintech company, we learned that more than 40% of their microservices lacked identifiable owners, alongside the fact that uneven test coverage was rampant. Although the engineering department had grown at a high rate, no one bothered to assess the structural debt being incurred; issues such as tightly coupled services, legacy monoliths with hardcoded integrations, or ownership gaps that made critical systems unmaintainable. These findings illustrated how entrenched silence was in the team culture. Engineers cease to raise issues since this is how things are. New employees do not challenge inconsistencies since they presume it is deliberate. This normalization is what makes technical debt so dangerous; it becomes unseen, but highly influential."
    - Strategic cost isn’t just financial: "In addition to operational anarchy, debt constrains strategic options. It traps teams in unstable architectures, makes experimentation less desirable, and change more expensive. ... The moral is obvious: technical debt minimizes optionality. It denies organizations the flexibility to react to threats or innovate promptly when needed."
    - Approach: Start with full-scope visibility: "[R]unning engineering surveys, conducting service maturity audits, and analyzing operational metrics. These efforts often revealed debt artifacts like unowned scripts, deprecated libraries, or undocumented APIs – elements that rarely show up in standard project tracking tools. Without a structured inventory like this, teams often focus their efforts on the most obvious pain points, such as slow tests or deployment delays, rather than the most strategically important ones. Full-scope visibility means going beyond surface issues to identify and document what’s genuinely slowing down delivery, scaling, or incident response. A more modern strategy for understanding the scope of your tech debt issue incorporates telemetry-driven scans. These will be able to surface broken pipelines and flaky tests. It’s also important to gather qualitative feedback: developer pain points, support tickets, and onboarding feedback. If new engineers repeatedly encounter setup failures or unclear integration steps with a specific legacy module during onboarding, that module is a visibility gap. It’s not just a one-time inconvenience; it reflects debt that directly affects developer experience and onboarding velocity. These recurring issues should be logged and scored, as they indicate systemic friction with measurable impact." (a rough inventory sketch follows these notes)
    - Score by impact, not just frustration: "Not every debt is alike. An abandoned configuration file is an inconvenience to engineers, but a closely-coupled authentication system that drags every product update has significantly steeper consequences. I suggest a light scoring model that is determined by three factors: Severity: What is the downstream risk of this debt going unaddressed? Frequency: How frequently does it create issues? Strategic impact: Does the debt limit your ability to scale systems, like handling more users, data, or teams? Does it impede your ability to adapt your product direction, e.g., shift to a new architecture, integrate with new services, or launch a different feature? With a simple scoring system (e.g., 1-5), you will have a shared language to compare debts between teams and make decisions on what to work on first. Debt isn't solved by scope; it's solved by relevance." (a scoring sketch follows these notes)
    - Choose the right fix: Refactor, replace, or bypass: "Refactor when the debt compounds daily costs; things like developer frustration, poor test coverage, or sluggish performance. For instance, in one of our back-end services, a shared utility function was frequently modified and regularly broke downstream dependencies. A simple refactor to isolate concerns reduced change failure rates by over 30% in just two sprints. Replace when you’re scaling past its original intent, e.g., hardcoded workflows or in-memory stores. At a previous role, our real-time analytics relied on an in-memory store that had no sharding or durability guarantees. It worked at launch, but as our usage scaled 10x, data loss and throttling became common. We replaced it with a distributed store designed for high throughput and persistence. Bypass when the effort-to-impact ratio is too high, fix only what’s necessary, and document the rest. One team I worked with had a legacy admin portal with hardcoded permissions logic. Rewriting it would have taken months, but it was rarely used. We documented its quirks, added a banner to warn users of limitations, and created a wrapper for the one feature it still supported. The lesson: don’t assume all tech debt deserves your best engineering. Sometimes, clarity and containment are more valuable than cleanup." (a refactoring sketch follows these notes)
    - Accountability is a team sport: "The ownership of the debt cannot rest with tech leads alone. Teams require incentives and rituals, such as quarterly debt review, shared dashboards, and making ownership based on service health scores. An organization I consulted with tied debt scores directly to performance reviews at the top management level, not as a stick, but as an indication that quality was not a choice. Within two quarters, they saw a 25% increase in resolved debt items and a measurable drop in incident frequency across critical systems, indicating that visibility and ownership alone can drive behavior change."
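A minimal sketch of what the "start with full-scope visibility" inventory could look like in code, assuming a small ad-hoc Python script rather than any particular tool; the field names, signal sources, and example entries are hypothetical, not from the article.

```python
from dataclasses import dataclass, field

@dataclass
class DebtItem:
    """One entry in a shared tech-debt inventory."""
    name: str
    source: str            # e.g. "telemetry scan", "engineering survey", "onboarding feedback"
    owner: str | None      # None keeps missing ownership visible instead of implicit
    occurrences: int = 0   # how often this item caused friction (failed builds, tickets, questions)
    notes: list[str] = field(default_factory=list)

# Hypothetical entries mixing quantitative and qualitative signals,
# the way the notes above suggest (flaky tests, unowned services, onboarding friction).
inventory = [
    DebtItem("flaky checkout integration tests", "telemetry scan", "payments team", occurrences=14),
    DebtItem("legacy reporting module setup failures", "onboarding feedback", None, occurrences=5,
             notes=["3 new hires blocked in first week", "no README for local setup"]),
    DebtItem("deprecated auth client library", "service maturity audit", "platform team", occurrences=2),
]

# Unowned or frequently recurring items are the visibility gaps worth escalating.
for item in sorted(inventory, key=lambda i: (i.owner is not None, -i.occurrences)):
    flag = "UNOWNED" if item.owner is None else item.owner
    print(f"{item.name:45} {item.source:22} {flag:15} x{item.occurrences}")
```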
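A possible shape for the 1-5 severity/frequency/strategic-impact scoring model; multiplying the three factors is just one way to combine them, and the example values are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class DebtScore:
    """Three-factor score from the notes above, each on a 1-5 scale."""
    severity: int          # downstream risk if left unaddressed
    frequency: int         # how often it causes problems
    strategic_impact: int  # how much it limits scaling or changes in product direction

    def priority(self) -> int:
        # One simple aggregation: multiply the factors so an item that is
        # severe AND frequent AND strategically limiting clearly outranks
        # one that is merely annoying. A weighted sum would work just as well.
        for value in (self.severity, self.frequency, self.strategic_impact):
            if not 1 <= value <= 5:
                raise ValueError("each factor must be on a 1-5 scale")
        return self.severity * self.frequency * self.strategic_impact

# Hypothetical comparison echoing the examples in the notes:
abandoned_config = DebtScore(severity=1, frequency=2, strategic_impact=1)   # inconvenience
coupled_auth     = DebtScore(severity=4, frequency=5, strategic_impact=5)   # drags every release

print(abandoned_config.priority())  # 2   -> document and move on
print(coupled_auth.priority())      # 100 -> near the top of the backlog
```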
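The "refactor to isolate concerns" example could look roughly like the before/after below; the function names and the order-processing scenario are made up, not taken from the article.

```python
# Before (hypothetical): one shared helper that parses, validates, and formats,
# so any change made for one caller risks breaking the others.
def process_order(raw: dict) -> str:
    order_id = int(raw["id"])
    if raw.get("amount", 0) <= 0:
        raise ValueError("amount must be positive")
    return f"order {order_id}: {raw['amount']:.2f}"

# After: each concern isolated behind its own small function, so callers depend
# only on the piece they actually use and changes stay local.
def parse_order_id(raw: dict) -> int:
    return int(raw["id"])

def validate_amount(raw: dict) -> float:
    amount = raw.get("amount", 0)
    if amount <= 0:
        raise ValueError("amount must be positive")
    return float(amount)

def format_order(order_id: int, amount: float) -> str:
    return f"order {order_id}: {amount:.2f}"

# A caller that only needs formatting no longer inherits parsing/validation changes.
order = {"id": "42", "amount": 19.9}
print(format_order(parse_order_id(order), validate_amount(order)))
```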
Tags:
reading
development
business
Last modified 15 September 2025