The Goal: Code Nirvana
Imagine a world where you can release your app without lifting a finger to test it. Your code flows from your mind to your fingertips on the keyboard, then to your version control system, which triggers a set of tests, ensuring your vision is achieved without breaking any existing functionality. Moments later, your continuous integration pipeline prompts the compilers to work their magic, and a shiny new package appears: your team’s latest creation. Something akin to “code nirvana” — the state of enlightenment, as a team of engineers, in which newly added features produce no bugs, suffering, or regressions. Sounds nice, doesn’t it?
I have been working on a team that longs to thrive in this enlightened state of code nirvana. We’re not there yet — our tests are long, bulky, and sometimes we’re not even sure what they’re supposed to test. Our pipelines — especially the emulators we try to run on them — are flaky. We use unreliable third-party systems in our tests, which leads to failures that leave us uncertain whether we broke something, or whether something else broke and we were merely innocent bystanders. So how do we arrive at code nirvana from this land of despair?
Seeing the forest for the trees
For us, the first step was to take a step back and examine the current state of our tests. We noticed a few things:
A flaky emulator in our CI pipeline
This is a common issue for Android app development teams. The Android emulators run well on our hardware, but when we try to run them in the cloud, they are slow or unusable — when we can even get them to start at all. The reason is that the agents we used to run the tests are themselves virtual machines, and they don’t support nested virtualization. Android emulators are virtual machines too, so we were effectively trying to run one virtual machine inside another that couldn’t host it.
Reliance on third-party sandbox environments
This one may be less common, but it was a real problem for us. When our tests were originally written, they were intended to be a full end-to-end test suite, triggering actual calls to the APIs we relied on and ensuring that the integration worked properly. The problem became the reliability of those third-party systems. We were relying on real data that was subject to change on a whim, and on our partners’ sandbox environments, which were sometimes unavailable for days at a time.
Ignored test results
This one is likely another common issue and reminds me of a great article by Martin Fowler on the concept of “test cancer.” Fowler describes the problem like this: “Sometimes the tests are excluded from the build scripts, and haven’t been run in months. Sometimes the ‘tests’ are run, but a good proportion of them are commented out. Either way, our precious tests are afflicted with a nasty cancer that is time-consuming and frustrating to eradicate.” Our code was riddled with test cancer — most of our automated tests still technically ran, but their results were ignored entirely.
Unclear purpose of tests
One factor that led to our blissful ignorance of test results was that, often, we weren’t sure what they were supposed to be testing. It’s easy to ignore the result of a test if that test doesn’t provide value to your team or your client. So the tests were there, and they were running, but they weren’t telling us what we really needed to know: could our users use the app the way they needed to?
After taking stock, it was clear that we needed to make some changes. So, like any good engineering team, we made a plan. We evaluated our problems and sought solutions that would make our tests work for us to achieve code nirvana, instead of keeping us in the land of despair.
The first problem was that we had a low level of confidence in our automated tests, so manual testing was necessary for every release. In the state of code nirvana, manual testing is kept to a minimum. Achieving a higher level of confidence would also mean that if a test broke, we would know we had a real problem and something needed to be fixed. Additionally, we wanted our tests to be lightning-fast, so we could immediately know whether there was an issue. Lastly, we wanted the process of adding new tests to be simple. We didn’t want to worry about whether a feature was already tested, because we’d know, and we wanted to be able to add tests for untested features with a low level of effort and complexity, meaning our tests would be scalable.
To get there, we decided a few changes were in order.
Remove tests that didn’t prove valuable
This part was relatively easy, as it mostly involved deleting a bunch of code (one of the greatest feelings in the world, IMHO). We assigned one of our test engineers the honorable task of evaluating the tests in our automation suite against the tests we ran manually every time we did a release. The intersection of those two lists showed us which automated tests actually told us something important about our application. The other tests were dead weight, and we were happy to cut them loose.
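To make the audit concrete: it boils down to a set intersection between the automated suite and the manual release checklist. A minimal sketch in Java, with made-up test names standing in for our real ones:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class TestAudit {
    // Keep only the automated tests that also appear on the manual
    // release checklist; everything else is a candidate for deletion.
    public static Set<String> keepers(Set<String> automated, Set<String> manualChecklist) {
        Set<String> keep = new HashSet<>(automated);
        keep.retainAll(manualChecklist); // set intersection
        return keep;
    }

    public static void main(String[] args) {
        // Hypothetical test names, not from the article.
        Set<String> automated = new HashSet<>(Arrays.asList(
                "login", "checkout", "obsolete banner wording"));
        Set<String> manualChecklist = new HashSet<>(Arrays.asList(
                "login", "checkout", "push notifications"));

        // TreeSet just gives a stable, sorted printout.
        System.out.println(new TreeSet<>(keepers(automated, manualChecklist)));
        // prints [checkout, login]
    }
}
```

Anything the automated suite covers that never appears on the release checklist shows up as the complement of this intersection, which is the dead weight.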
Lift and shift user interface validation to unit tests
When we removed our low-value tests, we found that many were trying to validate our app’s user interface. They checked that headings had the correct wording, inputs were labeled correctly, and the like. In the interest of speed and clarity, we decided to lift and shift these tests to our unit testing suite. This is one of the more involved tasks our new approach demands, but we think it will pay off. Unit tests are faster and easier to maintain than end-to-end UI tests, and it is much more useful to test different screens in isolation, where we can pass state into the views and ensure that each view renders correctly for that state. It also leaves the end-to-end tests to the task they are best suited for: validating complete flows instead of individual views.
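As a sketch of what a lifted test looks like, treat the view’s text as a pure function of its state. The names below (CartState, headingFor) are hypothetical, and this is framework-free Java rather than real Android view code, but the idea carries over: pass state in, assert on what would be rendered.

```java
public class CartScreen {
    // A minimal stand-in for the state a real screen would receive.
    public static final class CartState {
        public final int itemCount;
        public CartState(int itemCount) { this.itemCount = itemCount; }
    }

    // The wording logic under test: what the heading should say
    // for a given state.
    public static String headingFor(CartState state) {
        if (state.itemCount == 0) {
            return "Your cart is empty";
        }
        return "Your cart (" + state.itemCount + " items)";
    }

    public static void main(String[] args) {
        // The same wording checks that once needed a running emulator,
        // now expressed as plain assertions (run with java -ea).
        assert headingFor(new CartState(0)).equals("Your cart is empty");
        assert headingFor(new CartState(3)).equals("Your cart (3 items)");
        System.out.println("heading checks passed");
    }
}
```

Because nothing here touches a device or an emulator, these checks run in milliseconds on any CI agent.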
Run our tests using real devices in the cloud
Running our tests on real devices was one of the most important pieces for our new approach. Without a reliable way to run our tests in a pipeline, our tests would never be as useful to us as we wanted (dare I say “needed”?) them to be. Moving our automated tests to a service that provided a surefire way to run them was a necessity.
Provide fake implementations of third-party services
The fundamental question that led us to this decision was this: what did we want our automated tests to test? At their conception, it was decided that they should be true end-to-end tests, checking our application against our third-party services and ensuring the integration was holistically sound. Ultimately, however, we determined this approach wasn’t working for us. It was too easy to blame failures on third parties while remaining uncertain whether our own work was up to par. Removing the dependency between our tests passing and our partners’ services being available, and in the state our tests expected, meant a failing test could only mean one thing: something on our side needed fixing.
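The mechanics of the swap are ordinary dependency injection: put the third-party client behind an interface, and hand the tests a fake. A minimal sketch in Java, with a hypothetical PaymentGateway standing in for our real partner APIs:

```java
public class CheckoutTest {
    // The seam: code under test depends on this interface, not on the
    // real partner client. (PaymentGateway is a hypothetical example.)
    public interface PaymentGateway {
        boolean charge(String accountId, long cents);
    }

    // The fake is deterministic and always available, so a failing test
    // can no longer be blamed on a partner sandbox being down.
    public static final class FakePaymentGateway implements PaymentGateway {
        private final boolean succeed;
        public FakePaymentGateway(boolean succeed) { this.succeed = succeed; }
        @Override
        public boolean charge(String accountId, long cents) { return succeed; }
    }

    // The production code path we want to exercise.
    public static String checkout(PaymentGateway gateway, long cents) {
        return gateway.charge("acct-123", cents) ? "order placed" : "payment failed";
    }

    public static void main(String[] args) {
        // Both outcomes can be exercised on demand, no sandbox required:
        System.out.println(checkout(new FakePaymentGateway(true), 1999));  // order placed
        System.out.println(checkout(new FakePaymentGateway(false), 1999)); // payment failed
    }
}
```

A nice side effect of the fake is that failure paths, which a real sandbox makes hard to trigger reliably, become one-line setup.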
We still have a way to go, and a fair amount of work to do, before we can say we’ve achieved true enlightenment, but we think we’ve formulated a solid plan, and we’re taking steps in the right direction. Ultimately, what will get us there is focusing on a small set of high-value automated tests that run reliably in our pipelines. And while the workload demanded by some aspects of our new approach will require us to get our hands dirty to make our tests healthy again, we think the outcome will be worth it. Not only will we be able to release with a smaller manual testing lift, but we’ll have confidence that the features in a release are healthy every time we have a new build. With our newfound confidence, we hope to soon leave behind the dreaded land of despair, and look forward to seeing what we can achieve next.