Musing on Testing
Purpose #
We use tests in order to get confidence in the product:
- Validation: to prove that the product does what it’s supposed to do.
- Regression: to prove that changes to the product do not break existing behaviors.
Tests can also provide documentation for how the product is expected to be used, and what assumptions are involved in using it.
Traditional approaches to testing tend to focus on the confidence aspects, with documentation as a by-product that is often overlooked and unrefined, while approaches like TDD (Test Driven Development) put the documentation aspect first.
Requirements #
Correctness #
The test should pass if and only if the product behaves correctly.
- Tests should fail by default.
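As an illustration (a minimal sketch using pytest; `parse_amount` is a hypothetical function under test), asserting the expected failure directly makes the test fail by default, whereas a hand-rolled try/except can pass even when nothing is raised:

```python
import pytest

def parse_amount(text: str) -> int:
    """Hypothetical function under test."""
    return int(text)

# Fails by default: only passes if ValueError is actually raised.
def test_parse_amount_rejects_non_numeric_input():
    with pytest.raises(ValueError):
        parse_amount("abc")

# Anti-pattern: silently passes even if no exception is raised at all.
def test_parse_amount_rejects_non_numeric_input_weak():
    try:
        parse_amount("abc")
    except ValueError:
        pass
```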
Specificity #
The test should target a specific behavior, make it clear what it verifies, and, in case of failure, make it clear what failed.
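For example (a sketch; the cart helper is hypothetical), one behavior per test, a descriptive name, and an explicit assertion message make it obvious what is verified and what failed:

```python
def add_item(cart: list[str], item: str) -> list[str]:
    """Hypothetical function under test."""
    return [*cart, item]

def test_adding_an_item_appends_it_to_the_cart():
    cart = add_item([], "book")
    assert cart == ["book"], f"expected a cart containing only 'book', got {cart}"
```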
Isolation #
The test should not affect the environment in a way that can change the behavior of the product.
Tests should be able to run in any order, ideally in parallel.
Some examples:
- Run each test against a dedicated DB.
- Run each test in a separate transaction scope (e.g. using repeatable read isolation level).
- Use test specific entities.
- Run each test in an isolated environment.
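A common way to get DB-level isolation (a sketch assuming SQLAlchemy and pytest; adapt to your data access layer) is a fixture that wraps each test in its own transaction and rolls it back afterwards:

```python
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

# A dedicated, throwaway database per test run (here: in-memory SQLite).
engine = create_engine("sqlite:///:memory:")

@pytest.fixture
def db_session():
    connection = engine.connect()
    transaction = connection.begin()      # each test gets its own transaction...
    session = Session(bind=connection)
    try:
        yield session
    finally:
        session.close()
        transaction.rollback()            # ...rolled back, so no test sees another's data
        connection.close()
```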
Reproducibility #
You should be able to reproduce test results, given the same starting conditions.
Given the same starting conditions, tests should return predictable, consistent results on every execution, unless the logic being tested is modified.
- Explicitly define the starting conditions.
- For tests that rely on randomized data: store the seeds used to generate it, and make it simple to reuse those seeds to reproduce the test.
- Use time service abstractions / mocks to control dates and times.
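A minimal sketch of both ideas (the environment variable and the `FixedClock` abstraction are illustrative names): report the seed so a failing run can be replayed, and inject time instead of reading the system clock inside the code under test:

```python
import os
import random
from dataclasses import dataclass
from datetime import datetime, timezone

# Reuse a seed from the environment to reproduce a failure; otherwise pick one and report it.
SEED = int(os.environ.get("TEST_SEED", str(random.randrange(2**32))))
print(f"test seed: {SEED} (re-run with TEST_SEED={SEED} to reproduce)")
rng = random.Random(SEED)

@dataclass
class FixedClock:
    """Time abstraction injected into the code under test instead of datetime.now()."""
    current: datetime

    def now(self) -> datetime:
        return self.current

clock = FixedClock(current=datetime(2024, 1, 1, tzinfo=timezone.utc))
```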
Some tests (e.g. functional verifications) should be fully deterministic. Others (e.g. fuzzing, load and stress tests, exploratory tests) might rely on more stochastic behavior. Ensure that you have enough information to understand the variables and the invariants in the tests, so that failures can be reproduced, addressed, and the fixes validated.
Tensions #
A good test needs to balance the tension between competing objectives:
High relevance #
How much confidence the test provides: what is being tested, and how closely it resembles real-world usage.
Fast feedback #
How quickly the test can provide meaningful results:
- Time to author
- Time to execute
Low maintenance #
How resilient the test is to changes in the product.
Approaches #
- Test behaviors, not mocks.
- Test names should describe the behavior under test, not the outcome.
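For instance (a sketch; the discount logic is hypothetical), assert on the observable behavior rather than on which mocks were called, and name the test after the behavior being exercised:

```python
def apply_discount(price: float, is_member: bool) -> float:
    """Hypothetical function under test."""
    return price * 0.9 if is_member else price

# Named after the behavior under test, not after a specific return value,
# and asserting on the result rather than on interactions with mocks.
def test_members_receive_a_discount_on_the_listed_price():
    assert apply_discount(100.0, is_member=True) == 90.0

def test_non_members_pay_the_listed_price():
    assert apply_discount(100.0, is_member=False) == 100.0
```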
Observations #
- The debugger is a time sink, if it's even an option. Test results and logs should provide enough information to understand what is going on without the need to debug. Using the same observability tools (traces, logs, metrics) in the development process (e.g. test failure analysis) as in production will also help drive improvements in observability overall.
- Guarantees are better than assertions - an impossible code path is better than a handled invalid path.
- The quickest and most effective test is the compiler. Make the most of static analysis.
- Style matters - make suspicious code stand out (e.g. yoda conditions, avoid negative conditions and `!` operators, use early returns, avoid booleans, avoid nulls, use exhaustive checks, etc.)
- Code coverage is not a measure of quality. It's a development tool to identify gaps in tests, and unused code paths. Optimizing for high code coverage targets often leads to micro-optimizations of tests that verify minute implementation details that don't add value, and increase test fragility.
- Code is a liability. Test code does not add value to the user directly, so make it count. Test behaviors, not mocks.
- The Test Pyramid is a myth. See Microtesting - How We Set Fire to the Testing Pyramid While Maintaining Confidence.
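To illustrate the earlier points about guarantees, static analysis, and exhaustive checks (a sketch; the order domain is hypothetical, and `assert_never` needs Python 3.11+ or typing_extensions): a constrained type removes the invalid path entirely, and the type checker flags non-exhaustive handling before a single test runs:

```python
from enum import Enum
from typing import assert_never  # Python 3.11+; on older versions, use typing_extensions

class OrderState(Enum):
    OPEN = "open"
    SHIPPED = "shipped"

# Guarantee: the parameter type only admits valid states, so there is no
# "unknown state" path to handle or to assert against at runtime.
def describe(state: OrderState) -> str:
    # Early returns keep each case flat; assert_never makes the check exhaustive,
    # so adding a new enum member becomes a type error until it is handled.
    if state is OrderState.OPEN:
        return "awaiting shipment"
    if state is OrderState.SHIPPED:
        return "on its way"
    assert_never(state)
```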