Automated UI tests are flaky by nature. They are not unit tests, which are stable by nature. If you try to fix a UI test that fails twice for every ten times it passes, you’ll find yourself in Heisenbug hell. Even if you throw man-years at debugging those flaky test cases, you can only reduce the flakiness of your test suite; you can’t eliminate it. When even Google’s ‘world class engineers’ can’t beat that beast, do we really think we can? You can keep playing that game, but from this day on, I am out!
What does one FAILed UI test tell us? Nothing! If a UI test PASSes only once in a hundred runs (against the same version of the AUT, the application under test), the test case has PASSed, because it practically could not PASS even once if a functional bug stood in its way. We don’t know why the other 99 runs FAILed. Perhaps the server of the test environment was under varying load (which, by the way, is also in a test server’s nature), perhaps UI element x is sometimes faster than UI element y, perhaps the test driver itself is flaky, or perhaps the devil is in the code. Even if the root cause really is a race condition in the AUT’s code, you won’t convince the developer to spend time on it when your test runs tell him ‘sometimes it works, sometimes not’, because that is no reproducible basis for debugging.
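To make that rule concrete, here is a minimal sketch of the verdict logic. The function name `aggregate_verdict` and the idea of feeding it a list of booleans are my own assumptions for illustration; no particular test framework is implied.

```python
from typing import Iterable


def aggregate_verdict(run_results: Iterable[bool]) -> str:
    """Combine the results of repeated runs of ONE test case
    against the SAME version of the AUT.

    A single passing run is enough for a PASS verdict; the test case
    counts as FAILed only if every run failed.
    (Assumption: an empty list of runs is treated as FAIL.)
    """
    return "PASS" if any(run_results) else "FAIL"


# 99 failed runs and one passing run: the test case has PASSed.
print(aggregate_verdict([False] * 99 + [True]))
# Not a single passing run: this one is really FAILed.
print(aggregate_verdict([False] * 100))
```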
CI tools have to adapt: re-run is the magic word. If an automated UI test case FAILs in the nightly test run, rerun it. Repeat that till dawn. If it doesn’t PASS once that night, then, and only then, it has really FAILed and you have to analyze it in the morning.
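A CI tool could implement the rerun-till-dawn policy roughly like this. It is only a sketch under my own assumptions: `rerun_until_pass`, `run_test`, `deadline` and `pause_seconds` are hypothetical names, `run_test` is any callable that returns True on PASS, and the deadline is expressed in `time.monotonic()` seconds.

```python
import time
from typing import Callable, Tuple


def rerun_until_pass(
    run_test: Callable[[], bool],
    deadline: float,
    pause_seconds: float = 0.0,
) -> Tuple[str, int]:
    """Rerun one UI test case until it passes once or the deadline is reached.

    run_test      -- hypothetical callable, returns True if the run PASSed
    deadline      -- point in time.monotonic() seconds at which "dawn" arrives
    pause_seconds -- optional wait between two attempts

    Returns the verdict and the number of attempts. The test is always
    run at least once, even if the deadline has already passed.
    """
    attempts = 0
    while True:
        attempts += 1
        if run_test():
            return "PASS", attempts      # one PASS is enough
        if time.monotonic() >= deadline:
            return "FAIL", attempts      # never passed all night: really FAILed
        time.sleep(pause_seconds)


if __name__ == "__main__":
    # Simulated flaky test: fails twice, then passes on the third attempt.
    outcomes = iter([False, False, True])
    verdict, attempts = rerun_until_pass(
        lambda: next(outcomes),
        deadline=time.monotonic() + 5.0,
    )
    print(verdict, attempts)
```

In a real nightly job the deadline would be the start of the working day, and only the test cases that come back as FAIL would land on your desk in the morning.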
These thoughts are nothing less than a paradigm shift in UI test automation. In the aseptic world of quality assurance you were taught that tests have to be stable, accurate and repeatable to be reliable. Now you have to convince yourself that a FAIL is not an alarm. It’s like the change of mindset you have to make when moving from classical logic to fuzzy logic. Reality is fuzzy, UI tests are flaky.
Honor to my great test automation colleagues @ GfK.