When we write tests to check the correctness of our code, we're used to thinking in examples: base cases, boundary conditions, edge cases, and so on. We enumerate these examples and write example-based unit tests that check that, given a specific input, we get the expected output.
For example, if we're testing a function that reverses a list, we might write tests like:
def test_reverse_empty():
    assert reverse([]) == []

def test_reverse_single():
    assert reverse([1]) == [1]

def test_reverse_multiple():
    assert reverse([1, 2, 3]) == [3, 2, 1]
This approach is fine, but it has several problems. First, it's laborious: we have to come up with the examples by hand and write a test for each one. Second, if we don't think of an example we really should be covering, we'll never write that test, and any related bugs will go unfound. What about lists with duplicate elements? Lists with negative numbers? Very long lists? The combinatorial explosion of possible inputs makes it impractical to test them all manually.
This is where property-based testing can help us.
What Is Property-Based Testing?
Property-based testing allows us to write down the properties that our code should always satisfy, rather than enumerating specific examples. A property is a statement that should be true for all valid inputs.
If we want to test a sorting function, we can identify several properties it should satisfy:
- The output list should always have the same elements as the input (just reordered)
- The output list should be in non-decreasing order
- Applying sort twice should give the same result as sorting once (idempotency)
- The length of the output should match the length of the input
Property-based tests let you write down these properties, and then the framework will:
- Generate random inputs according to a strategy
- Execute the code under test with those inputs
- Verify that the property holds
- Search for counterexamples - inputs where the property fails
- Shrink failing examples to find the minimal case that reproduces the bug
By focusing on properties rather than examples, we can test a much larger space of inputs automatically, often catching bugs that manual example-based tests would miss.
Property-Based Testing with Python
Hypothesis is a Python library for property-based testing, inspired by Haskell's QuickCheck.
First, install Hypothesis (see the documentation for details). In a virtual environment:
pip install hypothesis
How Hypothesis Works
When you write a property-based test with Hypothesis, the @given decorator takes a strategy that describes what kind of data to generate. Strategies are composable: you can build complex data types from simple ones, and you can write strategies for your own custom data types.
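For instance, here's a minimal sketch of a strategy for a custom type, using the built-in st.builds (the User dataclass is just an illustration, not from any real codebase):

from dataclasses import dataclass
from hypothesis import strategies as st

@dataclass
class User:
    name: str
    age: int

# st.builds calls the constructor with values drawn from the given strategies
users = st.builds(
    User,
    name=st.text(min_size=1),
    age=st.integers(min_value=0, max_value=120),
)

A test decorated with @given(users) then receives randomly generated User objects.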
By default, Hypothesis then runs your test function 100 times (configurable), each time with a different randomly generated input. If an assertion fails, Hypothesis will try to find a simpler input that also fails, a process called shrinking.
Also, Hypothesis maintains a database of previously found failing examples, so it can replay them in future test runs to ensure regressions don't reoccur.
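Relatedly, if you already know inputs that have caused trouble, you can pin them with the @example decorator so they are checked on every run in addition to the randomly generated cases (the pinned values here are just illustrative):

from hypothesis import example, given
from hypothesis import strategies as st

@given(st.lists(st.integers()))
@example([])         # always exercise the empty list
@example([2, 1, 1])  # a previously failing input, pinned as a regression test
def test_sort_preserves_length_pinned(xs):
    assert len(sorted(xs)) == len(xs)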
Here's a simple example:
from hypothesis import given
from hypothesis import strategies as st
@given(st.lists(st.integers()))
def test_sort_preserves_length(xs):
    sorted_xs = sorted(xs)
    assert len(xs) == len(sorted_xs)
In this example, st.lists(st.integers()) is a strategy that generates random lists of integers. Hypothesis will generate many such lists (empty lists, single-element lists, large lists, lists with negative numbers, etc.) and verify that sorting doesn't change the length.
Strategies
Strategies define what data Hypothesis should generate. Hypothesis provides many built-in strategies; we just saw st.integers(), which generates integers, combined with st.lists(), which generates lists of elements drawn from another strategy.
There are also combinators: for example, st.one_of(...) randomly chooses between several strategies.
You can also constrain strategies, for example st.integers(min_value=0, max_value=100) generates integers in the specified range.
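Here's a small sketch combining both ideas, generating either a constrained integer or None:

from hypothesis import given
from hypothesis import strategies as st

# Either a small non-negative integer or None
maybe_small_int = st.one_of(
    st.integers(min_value=0, max_value=100),
    st.none(),
)

@given(maybe_small_int)
def test_handles_optional_ints(value):
    assert value is None or 0 <= value <= 100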
Shrinking
When Hypothesis finds a failure, it doesn't just report the first failing input it encountered. Instead, it performs shrinking - it tries to find the simplest example that still violates the property. This makes debugging much easier.
For example, if your test fails on a list with 1000 elements [1, 2, 3, ..., 1000], Hypothesis might shrink it down to just [1, 2, 3] or even [1, 0] if that's what actually triggers the bug. The shrinking process tries various techniques:
- Removing elements from collections
- Making numbers smaller
- Simplifying strings
- Trying edge cases like empty collections, zero, or negative numbers
This gives you a minimal reproducible example that's much easier to understand and debug than a large, complex input.
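To see shrinking in action, here's a deliberately broken property. Rather than reporting whatever large random list first triggered the failure, Hypothesis will typically shrink it to a minimal counterexample like xs=[100]:

from hypothesis import given
from hypothesis import strategies as st

@given(st.lists(st.integers()))
def test_no_large_sums(xs):
    # Deliberately wrong: fails for any list whose sum reaches 100
    assert sum(xs) < 100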
More Examples
Here are a few more cases where property-based testing can be helpful.
Round-Trip Testing
One common pattern is testing that operations are inverses of each other. For example, encoding and decoding should round-trip:
import base64

@given(st.text())
def test_encode_decode_roundtrip(text):
    encoded = base64.b64encode(text.encode()).decode()
    decoded = base64.b64decode(encoded).decode()
    assert decoded == text
Or for serialization/deserialization:
import json

@given(st.dictionaries(st.text(), st.one_of(st.integers(), st.text(), st.booleans())))
def test_json_roundtrip(data):
    serialized = json.dumps(data)
    deserialized = json.loads(serialized)
    assert deserialized == data
Invariants
Properties that should always be true, regardless of input:
@given(st.lists(st.integers()))
def test_sort_is_ordered(xs):
    sorted_xs = sorted(xs)
    for i in range(len(sorted_xs) - 1):
        assert sorted_xs[i] <= sorted_xs[i + 1]
from collections import Counter

@given(st.lists(st.integers()))
def test_sort_preserves_elements(xs):
    sorted_xs = sorted(xs)
    # Compare as multisets: set equality plus a length check would still
    # miss cases where duplicate counts change, e.g. [1, 1, 2] vs [1, 2, 2]
    assert Counter(xs) == Counter(sorted_xs)
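The idempotency property mentioned earlier fits the same pattern:

@given(st.lists(st.integers()))
def test_sort_is_idempotent(xs):
    sorted_xs = sorted(xs)
    # Sorting an already-sorted list should change nothing
    assert sorted(sorted_xs) == sorted_xs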
Tips and Best Practices
Configuration
You can configure how Hypothesis runs your tests:
from hypothesis import settings, given
from hypothesis import strategies as st
@settings(max_examples=200, deadline=500)  # run 200 examples, allow 500 ms per example
@given(st.lists(st.integers()))
def test_blah(xs):
    pass
Common settings:
- max_examples: how many examples to try (default: 100)
- deadline: maximum time allowed per example, in milliseconds (or a timedelta; None disables the check)
Note that the random seed is not a settings option. For reproducible test runs, use the separate @seed decorator:

from hypothesis import given, seed
from hypothesis import strategies as st

@seed(42)
@given(st.lists(st.integers()))
def test_deterministic(xs):
    pass
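If you need different configurations in different environments (say, more examples on CI), Hypothesis supports named settings profiles; the profile names below are just examples:

from hypothesis import settings

settings.register_profile("ci", max_examples=1000)
settings.register_profile("dev", max_examples=20)
settings.load_profile("dev")  # or select one via pytest's --hypothesis-profile flag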
Using assume() for Conditional Tests
Sometimes you want to skip certain inputs that aren't relevant to the property you're testing:
from hypothesis import assume
@given(st.integers(), st.integers())
def test_division(a, b):
    assume(b != 0)  # skip test cases where b is 0
    result = a / b
    assert isinstance(result, float)  # true division always returns a float in Python 3
assume() tells Hypothesis to skip this example and try another one.
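An alternative to assume() is to filter at the strategy level with .filter(), which keeps the condition next to the data description:

@given(st.integers(), st.integers().filter(lambda b: b != 0))
def test_division_filtered(a, b):
    assert isinstance(a / b, float)

Either way, if the condition rejects too many candidate inputs, Hypothesis will flag the test via its health checks, so prefer constraints built into the strategy (like min_value) where possible.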
Summary
Property-based testing helps you find bugs you didn't know to look for, think more rigorously about your code and the properties it should maintain, document that behavior as an executable specification, and catch and record regressions.
Check out the Hypothesis documentation for more.