The Danger Of Written Test Procedures
There might be good reasons both for and against writing down test procedures. I’ll explore the latter in this article. I’ll try to show it on a realistic example (invented for the purposes of this article, though I’ve seen very similar examples before) and include a bunch of considerations that might be worth exploring in a real-world scenario.
To start with a little bit more context, let’s define what I mean by test procedure, written test procedure, and the like.
A test procedure is a particular way of going about testing. That typically translates into a list of steps a tester is asked to follow. We can then say that this list of steps drives (to some extent) the testing.
A written test procedure is then a test procedure committed to a file, so it exists in textual form. Examples of this could be an internal Confluence page, a Word document, and all sorts of other formats testers are asked to use.
Some people might also use the following terminology:
- to write a test
- to write a test case
- to write a check
- and likely a bunch of other expressions
However, for the purposes of this article, I’ll consider all of these expressions analogous.
Let’s now look at an example. I hope that you can relate to it as something you have come across as well.
Here comes the written test procedure:
1. Choose a random item.
2. Add the item to a shopping cart.
To add a bit more context, let’s say this is an e-commerce platform, an e-shop where people can buy things. Again, it’s just an example that’s easy to imagine because people have some experience with online shopping systems. If you work on a different system, feel free to create a similar example relevant to your situation.
If I’m given such a test procedure, I’d really not be happy. The rest of this article is about why.
A few questions and possible risks come to mind:
- What does random mean? Random is really not random in many cases. I think this step would simply lead to choosing whatever item is closest to the tester on the UI (for example right on the homepage as opposed to inside a category), which is anything but random.
- How should I choose the item? Can an item be added to a cart from different places? E.g. from a product detail page as well as from a listing page.
- Is one way of choosing preferable over the others? Is one way more commonly used by real customers? Which ways do we want customers to be able to use? Are there any ways we want to block? E.g. do we want to build a platform that considers accessibility? An example of that might be being able to control the system using a keyboard only.
- What does an item mean? E-shops sell all sorts of products; they might differ in their properties, in how they are represented in databases, and in how the system works with these representations. I’ve seen e-shops successfully add some items to a cart while failing to add different items to the same cart. Just because one item can be added to a cart, it doesn’t automatically follow that the same is true for all items.
- It seems to me that the test procedure suggests I should choose an item that’s also in stock. What if I choose one that’s not in stock at the moment? Can that still be added to a cart? Why? Why not? Perhaps it can still be added but the context changes: instead of buying it straight away, it will be reserved and shipped when it’s back in stock at some point in the future.
- How about items that are discounted? Should they be included in the pool of items I’m “randomly” choosing from? Or perhaps it’s a new test idea that’s worth experimenting with.
- How about a choice of a device? Many e-shop platforms are built with responsiveness in mind. That means they could be used on various devices, screen sizes, operating systems, etc. It’s a good idea to gather evidence about how the system actually behaves in different environments.
- What should happen in the background, and why? Should the system subtract from the number of items in stock? Or should this happen only after the order is completed?
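To make the “random” question concrete: a deliberately random choice would sample from the whole catalogue, not just whatever happens to be nearest on the homepage. Here is a minimal sketch in Python; the catalogue, field names, and selection flags are all made up for illustration. The keyword flags exist precisely to force the hidden questions (out of stock? discounted?) to be answered explicitly rather than silently defaulted:

```python
import random

# Hypothetical catalogue; in reality this would come from the shop's database or API.
catalogue = [
    {"id": 1, "name": "Mug", "in_stock": True, "discounted": False},
    {"id": 2, "name": "Poster", "in_stock": False, "discounted": False},
    {"id": 3, "name": "T-shirt", "in_stock": True, "discounted": True},
]

def pick_random_item(items, *, include_out_of_stock=False, include_discounted=True):
    """Pick an item uniformly at random from the whole pool.

    The keyword flags make the hidden assumptions of "choose a random item"
    explicit: should out-of-stock or discounted items be in the pool at all?
    """
    pool = [
        item for item in items
        if (include_out_of_stock or item["in_stock"])
        and (include_discounted or not item["discounted"])
    ]
    return random.choice(pool)

print(pick_random_item(catalogue)["name"])
```

Even this tiny sketch shows that “choose a random item” is really several decisions in disguise, each of which changes what the test can find.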
Many questions, and I’ve barely moved past the first point :) Let’s continue:
- Should I add just one item? Why not more? Why does the procedure seem to suggest that adding one item is equivalent to adding more? The system might very well behave differently depending on the quantity of items being added to a cart. An example could be quantity discounts: a customer gets a discount for buying more of the same product.
- What does a shopping cart mean? There might be more than one instance of it. E.g. when a customer logs in, the shopping cart might be personalised. Or perhaps there are already other items in the shopping cart, which might play a significant role. Simply put, it says a shopping cart, not the shopping cart, so I don’t really know which one is meant.
- Should we log the user’s actions? Perhaps we want to monitor users more closely so we can use the data for marketing purposes. That leads to all sorts of ethical questions, which nowadays usually translate into GDPR compliance (in Europe) and informing users about these practices before we gather any personal data.
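The quantity question hints at a general technique: instead of hard-coding “add one item”, parametrize the check over several quantities, since one item and ten items might exercise different code paths. A sketch, with a fake in-memory cart standing in for the real system (everything here is invented for illustration):

```python
class FakeCart:
    """A stand-in for the real shopping cart, just enough to run the check."""
    def __init__(self):
        self.lines = {}  # item_id -> quantity

    def add(self, item_id, quantity=1):
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        self.lines[item_id] = self.lines.get(item_id, 0) + quantity

    def quantity_of(self, item_id):
        return self.lines.get(item_id, 0)

def check_add_to_cart(cart_factory, item_id, quantities):
    """Run the same 'add to cart' check once per quantity, each in a fresh cart."""
    for qty in quantities:
        cart = cart_factory()
        cart.add(item_id, qty)
        assert cart.quantity_of(item_id) == qty, f"failed for quantity {qty}"

# Quantities chosen to cover a plausible boundary and a bulk case.
check_add_to_cart(FakeCart, item_id=42, quantities=[1, 2, 10])
print("all quantity checks passed")
```

The point is not this particular fake, but the shape: the quantity becomes a parameter you deliberately vary, rather than a detail the procedure quietly fixed at one.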
I could continue like this for a while, but I’ll limit myself here. The point is not to come up with a finite set of questions and ideas; that is not really possible in the first place. The point is to get into the habit of asking questions and coming up with test ideas that might be relevant in a particular situation.
At this point, I think it’s a little clearer why it might not be a good idea to create written test procedures in situations like this one. Let’s list some of the reasons:
- It’s difficult to capture exactly what is meant by such steps. The steps are often ambiguous or too general, they omit important aspects, and they can’t really capture tacit knowledge. That can lead to more confusion than having no written procedure at all.
- A written test procedure might limit the thinking a tester puts into the task. Some testers might simply take the two steps, build a mental model based on them, perform the testing, and mark the test procedure as passed. All the questioning might be left out because the tester was given these steps rather than an open-ended task.
- It’s not easy to follow procedures all the time. If this is all you are asked to do, you might find it boring. And people don’t like boring tasks. So you might start thinking about different things, daydreaming, etc. You might also start thinking about other ways of performing the steps, which is paradoxically exactly what you should do. But you should do it consciously, not as a by-product.
- Written test procedures tend to get obsolete quickly. Systems change, documentation not so much. It’s expensive to keep documentation in sync with the actual system.
- If the intention of written test procedures is to guide junior testers, it’s still a bad idea. Junior testers need to learn to think about their testing; that’s their first and foremost task. It’s better to give junior testers challenges, do pair testing with them, and debrief with them after their test sessions.
How about if you create an automated test procedure in this situation? You turn those two steps into an automated check that you can run, or that’s run automatically in a pipeline. It seems like an ideal solution in this example. Or does it not?
Well, all those points and questions I had don’t really go away then, do they? By automating the test procedure, you have saved effort and time on future executions, but you have not answered any of those questions.
I think this is when the real danger comes in.
When you actually work with the system, there is at least some chance you will not stick to the exact steps (humans are not good at that, although some are better than others), thereby adding some variation to your testing. That can uncover serious problems. Real customers don’t follow a set of steps when shopping online; they vary their steps.
When you automate these steps, you take the variation away completely. The scripted test procedure won’t add any variation to its execution. It also won’t notice anything it was not programmed to notice at the time it was created. A typical example: when going to the cart, your script checks that a certain button is present, or that the URL has changed and is now /cart or something similar. But let’s say a frontend developer changes the CSS files and breaks the menu. Unless you explicitly check for this, which means more coding at this point, the scripted test procedure will never notice it.
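To illustrate just how narrow such a scripted check is, here is a deliberately simplified sketch. The page is a plain dictionary standing in for a real browser session (no real UI library is involved, and the field names are invented). The check only looks at the URL and one button, so a completely broken menu sails through:

```python
def scripted_cart_check(page):
    """The automated version of the two-step procedure: it only notices
    what it was programmed to notice at the time it was written."""
    assert page["url"].endswith("/cart"), "expected to land on the cart page"
    assert "checkout" in page["buttons"], "expected a checkout button"
    return "passed"

# A page state after a CSS change broke the menu completely.
broken_page = {
    "url": "https://example.test/cart",
    "buttons": ["checkout"],
    "menu_rendered": False,  # a human tester would likely spot this immediately
}

# The check still passes, because nothing told it to look at the menu.
print(scripted_cart_check(broken_page))  # -> passed
```

A human following the same two steps would at least have a chance of noticing the broken menu in passing; the script, by construction, cannot.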
That said, there are better and more useful ways of applying automated checking than pointing it at user interfaces. That “user” in UI exists for a purpose.
Have I gone too far with this example? In my experience, I have not. This is how many testers operate on a daily basis. It’s sad, but it is reality.
All that said, I’m all for automated checks. However, they need to be applied in specific situations where they make sense, and they are limited in what information they bring us. Over-relying on them is not a good idea.
And when it comes to written test procedures, I’m not a big proponent of them. There might be some specific situations, like a heavily regulated industry, where creating written test procedures is required. But in other situations, I’d most likely be against them, because I don’t really see any real benefit.