How an automated UI test helped us defuse a time bomb in our code

Picture this: your latest feature is almost ready for production; all unit tests are passing on the build server, and everything is working smoothly in the staging environment. Are you done? Not quite. It's time to create a UI test! A recent experience at work illustrates how such a test can help you find bugs before the code makes it to production.

Why UI tests?

I must admit I arrived late to the test automation party. I had the "I know better" attitude common amongst some developers, which manifested itself as a strong resistance to new and different working practices. Thanks to the persuasion efforts of my senior colleagues, I grudgingly adopted unit tests in my day-to-day work. Fast forward five years, and I am now a firm believer in test automation as a key part of software stability. A good automated test suite may not help you deliver working software faster in the short term, but it will catch errors when you extend or rewrite the code further down the line, speeding up delivery in the mid-to-long term.

UI tests are at the top of the test pyramid: they are expensive to write and slow to run (and they're not the most popular piece of work amongst developers). That's why we write few of them and have them smoke test the happy path through features.

We use Selenium for this kind of test. Personally, I find the library a bit clunky and sometimes unreliable, but it remains one of the best technologies for programmatically driving a web browser. We run our UI test suite against a dedicated test environment, which is torn down and re-created every time a new version of our software is built.

The test we wrote for this feature is quite simple: load the page (initially with no comments), click the "Add comments" button, type some text, click "Save", and re-read the DOM to make sure that the comment is there.
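For illustration, here is roughly what such a test looks like using Selenium's C# bindings. This is a minimal sketch only: the URL, element IDs, and CSS selector are hypothetical stand-ins, not our actual markup.

using System;
using System.Linq;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class CommentsSmokeTest
{
    static void Main()
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            // Load the comments page in the test environment (URL hypothetical).
            driver.Navigate().GoToUrl("https://test-env.example.com/comments");

            // Click "Add comments", type some text, and click "Save".
            driver.FindElement(By.Id("add-comment")).Click();
            driver.FindElement(By.Id("comment-text")).SendKeys("Automated test comment");
            driver.FindElement(By.Id("save-comment")).Click();

            // Re-read the DOM and check that the new comment is displayed.
            // Until throws a WebDriverTimeoutException if it never appears.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
            wait.Until(d =>
                d.FindElements(By.CssSelector(".comment"))
                 .Any(e => e.Text.Contains("Automated test comment")));

            Console.WriteLine("PASS");
        }
    }
}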

Surprise!

To my astonishment, the test revealed a 500 Internal Server Error. Comments were created successfully in the database, but when the browser queried the server for the newly created data, the error came back.

Unfortunately, the error body contained nothing but the text {message: "An error has occurred."}. There was no stack trace or any additional information to give us a clue as to what was failing. The message was being obfuscated by some element of our test environment.

Digging deeper

I wanted to understand whether the issue was caused by the test itself (i.e. a bad test) or whether it was a genuine bug in our code. I pointed my browser at the automated test environment and, yes, the error was genuine.

After chasing a couple of false leads, I learnt that the full error message, including the stack trace, is visible if you use the browser on the web server itself. We call that "security" and apparently it's a good thing.
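As an aside, that behaviour matches ASP.NET Web API's error-detail policy; I'm assuming that's the stack in play here, based on the telltale {message: "An error has occurred."} body. A sketch of the setting that controls it:

using System.Web.Http;

public static class WebApiConfig
{
    public static void Register(HttpConfiguration config)
    {
        // LocalOnly includes exception details only for requests made from
        // the server itself, which matches what we saw: the full stack
        // trace locally, a bare "An error has occurred." remotely.
        config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.LocalOnly;
    }
}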

I logged onto the server, connected to the comments page locally, replicated the error, and managed to get the full trace. The exception stack (a thing of beauty!) is shown below:

<Error>
    <Message>An error has occurred.</Message>
    <ExceptionMessage>
        Value was either too large or too small for an Int32.
    </ExceptionMessage>
    <ExceptionType>System.OverflowException</ExceptionType>
    <StackTrace>
        at System.Convert.ToInt32(Int64 value)
        at System.Int64.System.IConvertible.ToInt32(IFormatProvider provider)
        etc...
    </StackTrace>
</Error>

The "aha!" moment

The error message is quite self-explanatory. Digging into the repository code, we found this line:

IComment Comment = new Comment(reader.GetInt32(reader.GetOrdinal("Id")));

In the database, the "Id" field is stored as a bigint, which is a 64-bit integer. We were reading this value and storing it in a 32-bit integer. This is a potential time bomb: the narrowing conversion happens at runtime (via Convert.ToInt32, as the stack trace shows), and it throws an OverflowException as soon as the value no longer fits in an Int32.
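To see the failure mode in isolation, here's a minimal standalone sketch; the value is hypothetical, chosen to mirror the IDs we found, as described below.

using System;

class OverflowDemo
{
    static void Main()
    {
        long id = 4200000000L; // a bigint value beyond Int32.MaxValue (2,147,483,647)

        try
        {
            // Convert.ToInt32 performs a checked conversion and throws --
            // this is exactly the OverflowException from the stack trace.
            int narrowed = Convert.ToInt32(id);
        }
        catch (OverflowException ex)
        {
            Console.WriteLine(ex.Message); // "Value was either too large or too small for an Int32."
        }

        // An unchecked explicit cast would not throw; it silently wraps
        // around instead, producing a wrong ID with no error at all.
        Console.WriteLine(unchecked((int)id)); // -94967296
    }
}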

Just to double-check that this was the cause, I queried the IDs of existing comments in the database.

Aha! There were IDs around 4.2 billion, well above the maximum value of a signed 32-bit integer (2,147,483,647).

The fix was as simple as replacing the line above with:

IComment Comment = new Comment(reader.GetInt64(reader.GetOrdinal("Id")));

Once the fix was built and deployed, the UI test started passing happily.

It would take a very long time for our users to create over 2.1 billion records; in all likelihood, it wouldn't happen for years. But the potential for disaster is there.

Conclusion

An automated UI test exposed a latent issue in our code. Strictly speaking, it was the act of testing in a different environment, with a different database, that revealed the bug.

Regardless of the specifics, automated UI testing saved the day here. If we had skipped this step, the issue could have lain dormant in production, only to explode in a customer's face years down the line. Moral of the story? Invest some time in an automated test pipeline. It will prevent things from blowing up in production.

Setting up such a pipeline is not an easy task. It's not something a single, heroic developer can achieve; it's a team effort that takes a lot of time. It requires buy-in from management and an investment in resources, but it's well worth the effort.