Refactoring Massive Codebases

by Paco Mendes & Alberto Venturini

As successful software systems mature, maintaining their ability to adapt to market demands presents some unique engineering challenges. In massive codebases, manual refactoring becomes prohibitively expensive. As legacy patterns and quirks accumulate, new tooling is required to measure, analyze, and apply refactorings that enable businesses to remain competitive.

Codebases with millions of lines of code and configuration are typically maintained by multiple teams whose business objectives are aligned but not identical, and whose technical skills, experience, and opinions differ; over time, these differences increase the complexity of the system. While we all strive for good code coverage, large systems are often poorly tested and fragile. Despite these challenges, such systems are essential to business success and must continually evolve to meet new business needs.

In the video below, first presented at ScaleConf 2020, we address common challenges and share techniques that we have used to automate refactoring and effectively deal with technical debt in massive codebases.

Themes we discuss include:

  • Aligning refactoring efforts with business objectives
  • Explaining the benefits of refactoring and justifying the costs
  • Strategies for accelerating the refactoring of large codebases
  • Modeling large codebases to understand the risk and cost of refactoring
  • Identifying code that can be safely deleted during refactoring
  • Applying automated refactoring