You’ve spent months setting up your test automation, making it perfect, making it automatic so you can press a button and find out if your build passes. And what happens? Failures! Not real failures, that would be fine—that’s the whole point of testing. No, you’re getting flaky failures where something went wrong with the test itself. Aargh!
Did a browser crash or a phone lose connectivity? Not your fault, but now it’s your problem. Were there race conditions (the kind people paper over with thread.sleep), or elements that weren’t visible yet when the automation tried to click a button? There’s a long list of things that can go wrong that have nothing to do with the code changes being tested.
So what can you do besides throw up your hands and cry?
You can try to fix the tests. Certainly, you want to make the tests as reliable as you can, but some failures are out of your control. Even mighty Google, which automates everything, says that about 16% of its tests have some level of flakiness, and Microsoft says that getting to 90% reliability took a lot of manual work. 90%? You need to be at 100%.
You can delete the offending tests. Hmm. Tempting. But there’s a reason you created the tests, and if you eliminate them, you risk missing real bugs and defeating the point of testing.
You can manually review all the failures. Yuck. Broken builds, big waste of your time. What was the point of setting up all that automation?
So what do Google and Microsoft do? It turns out that it’s not all that difficult to tell flaky failures from real ones. The same test passes on a second try, or the failure looks similar to previous flaky failures, or the failure happens in a way that real failures don’t. So Google and Microsoft use a post-processing module to separate the real failures from the flakes. Then they quarantine the flaky failures rather than quarantining the tests themselves.
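To make that concrete, here is a minimal sketch of what such a post-processing step might look like. It is not Google's, Microsoft's, or Appsurify's implementation; the `classify` function, the `Verdict` type, and the patterns in `KNOWN_FLAKY_PATTERNS` are all made up for illustration. It covers only two of the signals above: a pass on retry, and an error message that resembles earlier flaky failures.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical signatures of failures that were previously judged flaky.
# A real list would be built from your own failure history.
KNOWN_FLAKY_PATTERNS = (
    r"connection (reset|refused|timed out)",
    r"element .* not (visible|clickable)",
    r"browser .* crashed",
)


@dataclass
class Verdict:
    status: str  # "pass", "flaky", or "real"
    reason: str


def classify(
    run_test: Callable[[], Optional[str]],
    retries: int = 2,
    flaky_patterns: Sequence[str] = KNOWN_FLAKY_PATTERNS,
) -> Verdict:
    """Classify one test. run_test returns None on success, or an
    error message on failure (an assumed convention for this sketch)."""
    error = run_test()
    if error is None:
        return Verdict("pass", "passed on first attempt")

    # Signal 1: the same test passes when it is simply run again.
    for attempt in range(1, retries + 1):
        error = run_test()
        if error is None:
            return Verdict("flaky", f"passed on retry {attempt}")

    # Signal 2: the failure looks like earlier flaky failures.
    for pattern in flaky_patterns:
        if re.search(pattern, error, re.IGNORECASE):
            return Verdict("flaky", f"matches known flaky pattern: {pattern}")

    # Nothing suggests flakiness, so treat it as a real failure.
    return Verdict("real", error)
```

A CI step built on something like this would break the build only on "real" verdicts and write the "flaky" ones to a quarantine report for later review. The tests themselves stay in the suite.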
Automatically identifying and quarantining flaky failures prevents the build from breaking, avoids wasting QA team time, and eliminates developer frustration. It means your CI is green unless there is a real failure. And it means your test automation can run automatically, without your intervention. So you can spend your time on more important things than sorting through test failures.
Curious? Microsoft wrote a nice report about how they solved the problem of flaky tests:
You’re probably thinking, that’s great for Google and Microsoft, but you don’t have a spare team of machine learning specialists embedded in your QA group with a year or two to build this capability. Well, you’re in luck, because Appsurify has done it for you. So now all it takes is an hour to add Appsurify to your toolchain, and the days of tearing your hair out over flaky tests are over.