Cleaning-up Tough Rails Projects
November 2, 2017
Whether we're onboarded as a new employee, or as a consultant to clean things up, we all come across the Rails project that has no tests, was built by non-Rails developers, and/or is already falling into serious disrepair. We may even find ourselves the author of such projects, but what we though would be a short-lived disposable thing actually took on a permanent status in the company's portfolio.
This article addresses those projects, and how we can approach their cleanup pragmatically.
Why Rails projects go bad
"Bad Rails projects" (those without tests, nobody knows how it works under the hood anymore, styles/assets are a mess, etc.) manifest for a number of reasons, some of which actually make sense:
- A fire-and-forget project without tests / without properly constructed styles / etc.
is the fastest way to reach a Minimum Viable Product (MVP) – quick iteration
of this sort minimizes up-front investment, and you lose a lot less if you
discover it's a bad dead-end idea
- But, as you now see, if the project does live onward, technical debt must be addressed
- The project may have been cooked up on short notice using contractors who couldn't have been properly vetted
No matter the case, the perfect time to fix things is now. If the project means something to the business, it deserves to be easy to read (i.e. easier to onboard other developers to work with it), stable, up-to-date, and ready for the new features it will inevitably need, while securing an irreversible quality foothold in the features it already supports.
Start with the process
Try to define how the project operates
The first thing to do is to get a handle on the project's contribution process. This will tell you a lot about the codebase. Code is often a reflection of the organization it serves, and the processes (or lack thereof!) around the project being worked on. If this was a "get it done now" corner project that happened to come into heavy use, it's not unusual to find that:
- There is no automated testing / Continuous Integration
- There is not a well-defined code deployment process
- There is not a clear means to bring code into the codebase such that it
meets quality standards; the toughest of projects will have everyone working
- Features added may not well-defined, or fractured. Features that seem small may blow up at the project management level, and developers may find that a 1 day task takes 4 because requirements shift under them.
- Assets (styles, JS) are in a state of disarray
It's important to map out how the organization runs the project, even if it's not ran at all. Only then can you start to make suggestions to the organization to explain where they're at now, where they should be, and why it is worth the investment to get there.
This is certainly worth its own blog post, so let's jump right into code!
Talk to other developers
If this project already has developer(s), you'll want to have 1 on 1 discussions with them. You'll want to quickly identify whether or not the project is broken because of organizational issues (e.g. lack of communication from developers to management), or if it is truly just in a state of disrepair due to lack of proper technical leadership.
As the newcomer (hopefully a management-elected leader) to making a project better, it's your place to:
- Hear, acknowledge, and put a plan of action together to take serious the advice that existing developers can offer. Be sure the advice-givers are given a chance to work on their suggestions if you decide to implement them. Foster personal investment and a sense of ownership in the product with developers.
- Work on any communication issues between management and developers. If management has a "resource" mindset toward development talent, this is an easy way to demotivate, and may be why the project has converged on being operated poorly. You can counter this by taking on a people-management role, push for 1 on 1s, and ask for increased authority in operating the project.
I'm going to assume for most of this article that you're just working with a Rails project that doesn't have any other developers. Integrating into (and leading) an existing team is a topic unto itself!
Peering into the code
Cleanup the README.md
If the README is untouched (i.e. a template), just delete everything in it. If any of it is out of date (i.e. instructions won't work), or just hint at future work, you can delete that too. Outdated/lying documentation often provides negative value, and at best can be archived away in a corner.
Then, begin setup of the project and document very carefully what you had to do to get there. This is valuable; it is going to save you in the future if you need to do a reinstall, and helps onboard other developers later.
If you are a part-time contractor, this has value for you as well: better to have clean, centralized, curated notes for the project than a wild mess of pseudo-notes locked away on your machine. You'll be context switching in and out of the project, so it helps to keep a central README you can just open back up to remind yourself of how the essentials work.
Areas you'll want to be sure are covered – even if you can't describe them right away:
- How does a developer contribute to the process? What is the workflow? You can link out to an external wiki here – we just want a new developer to know what
- How do you setup to start developing?
- What services might you need access to? Heroku? Sendgrid?
- How does code deploy?
- How can you carry out certain operations tasks, like cloning a stage/prod database to your local machine?
- How do you run the tests?
- What common errors may you hit, and how do you fix them? ("Troubleshooting")
- What's the important business terminology? What language do you and the stakeholders need to speak? Processes? Think along the lines of Domain Driven Design here insofar as needing to establish terminology.
Fortify your position with tests
Are there tests? Slim-to-none-at-all tends to be a common pattern. In this scenario, you are in a tough position. You cannot safely – and professionally – add new features without potentially breaking aspects of the system you may not even understand yet. It is far too easy to make things worse if you buy in to stacking on more code at this phase before gaining a solid footing.
The first step is to sit down with a major stakeholder and map out all of the user stories that involve significant interaction with the system.
That includes, at least,
- Logging in
- Signing up
- Resetting a password
- Changing my user profile
- … Other critical features / stories that exemplify the value this website provides – ask what aspects of the site must obviously, absolutely work no matter what.
Gherkin is a great syntax for capturing these user stories in a structured format, and maps well to writing tests later. Work with business users to build these stories after first familiarizing yourself with the system by simply reverse engineering how it works.
Now, begin writing integration tests (Rails Feature tests) that externally confirm correct system behavior using a headless web browser. Be sure to test edge conditions – ferret out all the logic branches implied by a signup process, all of the validations in use, etc., and cement them in tests.
These tests are worth their weight in gold – once written, you have the following:
- A working 80%-of-what-you-need-to-know sense of what this codebase does and how it does it
- Tests for the most critical paths through the application
- A safe foundation to begin adding additional features or modifying existing ones
- An established communication pattern with a stakeholder
- Proof that the external black-box behavior of the system is correct – unit tests are valuable, but are secondary to first verifying externally-observed behavior – can you prove the product delivers its core value without exploding?
Be sure to explain what is happening along the way. This is a phase in the project where the business won't see anything tangible from your efforts. The value you offer here is this:
- New features – or existing – can be worked on and deployed without disturbing current users with bugs. What is the acceptable business risk to important user-flows suddenly breaking? This work mitigates that risk.
- You are documenting how the system works – if you took solid Gherkin-style
user story notes, those can go into the codebase as documentation, or
better, straight into integration tests inside of
itsyntax. The system is becoming self-documenting, which speeds up the introduction of future developers if the company needs to scale up.
We've completely avoided unit tests so far. This is on purpose: unit tests often provide far less business value in this situation than integration tests. Unit tests are valuable in maintaining a high degree of software quality, but on projects of the sort you've picked up, it usually isn't tenable to entirely reverse engineer the system to produce unit tests in a timely fashion. That, and there's a good chance you will want to take existing code and refactor it significantly – when you get down at that micro-level, you can decide if you need to write a unit test before your refactor, or after that. We certainly want to add unit tests for future code we create or change, but it's not so important at the outset.
Cleanup the Gemfile
Lock versions, but allow for minor updates
Be sure the
Gemfile is locked to minor versions of everything that is used.
That means if you see the following:
You should run
bundle, pull the version of
1.2.3), and lock
it so that it will receive only non-breaking minor version updates:
gem 'foo', '~> 1.2.3'
That's equivalent to just writing
~> 1.2, but by adding the minor version (
we also imply the last-known-working minor version in case a quick revert is
Now, you can run
bundle update down the road and will be fairly certain
that this won't break large portions of the app. We'll get into long-term
maintenance (i.e. upgrading gems beyond minor versions) later.
If someone who already has the project installed (or even a remote server)
has an existing
Gemfile.lock, it is very much a good idea to start
using that instead of doing a full reinstall. You do run the risk of
accidentally installing a version of a gem that is newer than others
may be using if the version was not pinned, and this could
break the project.
Eliminate git references to repositories you don't own
If there are Gemfile references to github/bitbucket/etc. and it's to repositories the company does not own, fix these immediately!
Simply clone the target repository under your organization's account, and then point to that.
This protects you from the inevitable future where that person/maintainer
deletes the project and you suddenly can't
bundle install anymore!
Lock git references to specific commits
Have a reference like this?
gem 'administrate', github: 'thoughtbot/administrate', branch: 'master'
It would be better to point to a regular RubyGem, but maybe there was a bleeding-edge feature that just absolutely had to be used in this project.
Again, find a recent
Gemfile.lock to help specify a commit hash to point to.
By doing this, you aren't tracking each and every change that happens to appear in
breaking your application. Fix like so:
gem 'administrate', github: 'thoughtbot/administrate', branch: 'master', ref: '568722'
If you don't know which commit to point to, this could be a problem. It means
the Rails app you just did a
bundle install for is probably already broken!
In this case, lookup the git history of the Gemfile to see what date the gem
was added, then pick a reasonably-nice-looking commit in the target gem's
repository near that date.
Sort the gems by functions
I like to group gems by their function in the system, e.g.
This forces me to lookup what each gem does (to help become acquainted with the project), and helps future authors quickly figure this out too.
Replicate production to stage, and development
If not setup yet, be sure to establish a staging server. Ideally, this is a clone of production's database with cleanup / sanitizing applied afterward. You may not be able to do this depending on PII regulation around the product you're working with, or may need to perform additional sanitizing before this data can be moved from the production environment.
Second to that, it is very valuable to be able to pull this "sanitized production" down to your own machine as well. Again, regulatory environment plays a part here.
You can write a home-brewed bash script for the database cloning with the help of google in most cases. For Capistrano, I enjoy capistrano-db-tasks
The advantage to this:
- You can test production-targeted featured on staging with real data – nothing beats a test against real data by a human doing QA.
- You can develop against knowledge of production data in development instead of sneaking into the production Rails console, where you're one mis-type away from catastrophe. In the real world, the easiest way to solve most support issues is to recreate in a development environment with production data.
- For some types of products, there is non-transactional "structural data"
that is necessary for the product to work out of the box. For example,
model instances called
Templatethat store email templates are very helpful to have available in development, even though they live as editable things in the production database.
These types of projects usually end up with a copy-pasted styles file from another web property the company owns. This brings in "static" styles that affect the cleanliness of the HTML the Rails project implements (e.g. abusing HTML semantics).
Before you start cleanup, take screenshots of all the pages. I'd even suggest opening them all in new tabs in a separate browser, so you can inspect the styles later to debug issues that arise as you begin to make changes. Or, similarly, be ready to rewind to an older git commit with the old styles.
Your first step is to simply decompose the giant styles file into smaller files. For example,
typography- All font-sizing, letter-spacing, etc.
structure- Core styles (colors, hover behavior, etc)
lists- List styling
footer- Nav and footer styling
- … And any others you find natural!
While doing this, be sure to use line-by-line git blame to spot places where others in the Rails project have been tweaking the styles, and may have dropped a style definition in the wrong place. You can go ahead and move it during this process.
Also move third-party CSS/JS to a
vendor/ folder. If
.min files were
added to the project, try to find the originals (ideally not minified,
but acceptable if left as so). Document where the file comes from as a comment.
Vendored JS/CSS is very difficult to upgrade later without documentation
explaining where it came from!
After doing this, destroy the giant all-in-one styles file, and compare the website before-and-after to verify you mapped over correctly.
- Remove the unnecessary styles (if this is a copy-pasted from another web property, it is sure to contain lots of cruft the Rails project is not using)
- Assess responsiveness. If the original styles handled this poorly, this is something to bring up and address later.
- If applicable, assess the handicap accessibility of the website, which typically includes taking it for a spin with a screen-reader. Also could unpack into additional cleanup work.
Do one last pass to be sure the website styling is intact.
Assuming you've done all of the first steps, you're now able to move into writing new features, or modifying existing ones. The code you produce should be held to a new standard of quality:
- If possible, it is subjected to team review; this both verifies that your code makes sense, fits in with the team's product knowledge, and for less-skilled developers (either professionally, or just in the project), gives them a good chance to learn
- Pull requests should require that a test suite passes
- Unit tests should be written for most models/services, including the ones you are merely refactoring – write unit tests for those before making changes, then refactor afterwards only once you are certain you understand what that class is doing.
There's certainly much more to running a well-oiled machine, but that too is another subject!
Keep the dependencies updated
Very shortly after all this cleanup, and after writing some new code, you will need to help the organization understand the value of version upgrade maintenance. I am referring to the following:
- Keeping Rails up to date
- Keeping gems up to date (always do this with a Rails upgrade!) – push them up not by just minor versions, but by major versions!
- Keeping infrastructure dependencies up to date (whether that's moving to newer Heroku stacks, postgres/MySQL versions, new Linux versions, etc.)
It seems plainly clear that this is a good thing to developers, but you need to be able to very clearly articulate this to the business:
- Up to date dependencies are a lower security risk. Updates – especially minor ones – often fix critical security flaws.
- Up-to-date software takes advantage of new feature-sets in libraries it depends on. Old, out-of-date software relies more on "rewrite the wheel" development because it can't upgrade to better-featured new versions of libraries that support the features we need right now.
- Newer versions of libraries / frameworks are more relevant to today's workforce. Keeping projects in old versions of Rails requires that new hires become familiar with a framework that reduces the available set of functionality.
Keeping things up to date is largely an exercise in risk mitigation from the perspective of the business. Not all businesses will want to prioritize this, but it is your duty as a professional to still explain the necessity here.
It does come at a baseline cost that the organization must accept:
- A dedicated portion of developer-time should be set aside for upgrading dependencies, especially as new security patches are published.
- The organization will need to periodically fund larger-scale efforts to upgrade the Rails code as dependencies require it.
Refactoring is something of an art, in that knowing when to do it and how to do it is often not all that scientific. As the project progresses, a conscious effort must be made by the organization regarding what to refactor, and when.
Refactoring is often left up to developers. This is usually fine, provided that a code review process gives everyone a chance to chime in. But, some refactors are large enough that they need to be discussed with a project manager, and possibly management.
In most cases, "bad code" with "good tests" is perfectly acceptable and will serve a business for years without needing to be refactored. As a professional, you want to carefully weigh the needs of the business vs. what we know to be "technically correct."
In a similar vein, it's all too tempting to madly write new frameworks and abstractions where copy-and-pasting is actually a reasonable thing to do. Often, refactoring is a last resort only once you've determined that you fully understand a problem, and need to take that understanding and map it back into the code – to simplify the code and make it more digestible. If you're not sure if a refactor is necessary, then it probably isn't. If you repeatedly run into trouble understanding code or the concepts that drive it, then a refactor is probably in order. Sometimes a refactor is as simple as destroying premature abstractions/APIs and removing levels of indirection from code.
This is itself its own topic, but instilling a respect for refactoring at the organization level is an important take-away in keeping your newfound project up to date and clean.
Write a user manual
This feels old-fashioned, and may feel like a distraction, but if the software is complex enough for stakeholders or developers to regularly forget how to do things, then creating a manual (ideally in the codebase, presented on the web front-end) is a good solution here. Think of it more like a wiki to knowledge-share between stakeholders and developers, and to also come to agreement on what "do important business thing X" really means in concrete terms of the software. This manual is also a great place to document the hidden workings of the application – the complex behavior that could be implied by dozens of services working in concert!
By putting the manual in the codebase, you can easily subject it to code review, too.
If the project is particularly complex, take some time to learn about Domain Driven Design. If a project is arguably many "sub-projects", and you suspect it could be decoupled into separate gems, then DDD is a great framework for deciding how to isolate the gems, but allow for (loose) coupling, and also create a shared set of terminology that shareholders/developers can use to discuss the project unambigiously.
This was all high-level, but I hope that you find it a helpful reference to starting on an existing Rails project as an outsider. Keep in mind, as a fresh "outsider" being made responsible for this project, you're a fresh set of eyes on existing company processes – you'll "easily" spot issues to begin fixing. Be sure to communicate what's missing upstream, and get management-level support for your cleanup of the project, even if it's cursory. Projects are a reflection of how the organization runs and collectively understands software, so teach (to technical and non-technical stakeholders alike) as much as possible to ensure the future viability of your Rails project!
By Daniel Starling
Software consultant in Portland, OR
Software consultant in Portland, OR