How to write tests that need a lot of data?

Based on a true story.

Work in progress. This article is not finished yet.

Introduction

For 20 years I developed applications in Java. Over the years I have gained a lot of experience with writing tests (unit tests/integration tests). I have used different styles of writing tests. Some styles seemed promising at first, but turned out horrible later on. Often the cause of tests becoming horrible was the large dataset required by the test. A couple of techniques discovered by my team mates and I helped us to write clean tests that used a lot of data. In this post I want to share these techniques with you.

I present the techniques with code samples of a Warehouse Management System (WMS). As you can find on my website, I have worked for a paint factory for about 8 years and worked on a WMS, which later grew into an ERP. That application was developed in Java 8.

In March 2019 I started to work for a completely different company, Protix, and since that time I develop in Python 3.6. Since this post is a preparation for my presentation at PyCon DE & PyData Berlin, I show code samples in Python.

The code samples I show you are not from the real WMS I have worked on before. Nevertheless, the code samples are realist and complex enough to explain the techniques for writing tests that need a lot of data. That is why I say: this post is based on a true story.

Finally I want to thank one of my team mates at the paint factory: John Castelijn. John pointed me to some interesting presentations online that helped the team to improve writing tests. Together we improved the quality of our tests over time.

What is a WMS?

WMS is an abbreviation of Warehouse Management System. A WMS is the application used by planners and operators to run a warehouse. Imagine a warehouse storing thousands of pallets. Each pallet contains many items, for example 200 cans of paint.

Photo by Ruchindra Gunasekara on Unsplash

A warehouse is divided in several areas:

Everything in the warehouse has an id:

Here is an example of a barcode and text of an id of a bin in the pick area:

A WMS gives planners and operators an overview of the current stock and helps to execute the workflows in the warehouse. The main workflows in a WMS are:

What am I not going to explain about tests?

I assume that by now you have learned the following rules about writing tests:

What is a lot of data in a test?

When learning Test-Driven Development many people practice implementing small algorithms or data structures, for example, a stack. This is what unit tests for a Stack class look like:

class TestStack(TestCase):
    stack = Stack()

    def test_empty_stack_is_empty(self):
        self.assertTrue(self.stack.is_empty())

    def test_empty_stack_count_returns_zero(self):
        self.assertEqual(self.stack.count(), 0)

    def test_pop_from_empty_stack_raises_exception(self):
        self.assertRaises(StackIsEmptyError, self.stack.pop)

    def test_stack_with_one_item_is_not_empty(self):
        self.stack.push('foo')

        self.assertFalse(self.stack.is_empty())

    def test_stack_with_one_item_has_count_one(self):
        self.stack.push('foo')

        self.assertEqual(self.stack.count(), 1)

    def test_pop_from_stack_with_one_item_returns_item(self):
        self.stack.push('foo')

        self.assertEqual(self.stack.pop(), 'foo')

    def test_multiple_items_on_stack_are_popped_in_reverse_order_of_push(self):
        self.stack.push('foo')
        self.stack.push('bar')

        self.assertEqual(self.stack.pop(), 'bar')
        self.assertEqual(self.stack.pop(), 'foo')

These tests use at most two elements of data: the strings foo and bar. It is very easy to provide such few elements in tests.

Most developers work on applications that store data in a database. Over time, the database consists of tens to hundreds of tables. Implementing new features might introduce a new table or add a few columns to an existing table. To test the new feature, requires data in tables. If you are lucky, your test only needs data form one or two tables. If you are unlucky, you need data from tens of different tables.

Depending on the process under test, different locations need to be created. For example, to test stocking, a relatively large bulk area is needed, for example with 100 bins. To test replenishment, a small bulk and pick area can suffice. For picking, a relatively large pick area must be used. See the following table to get an impression of the number of locations and bins that are needed to test a process. Note that apart from this, we always need items, users and at least one forklift.

process bulk pick staging others (problem, wrapping)
# objects 20-100 10-20 20-30 1-5
stocking XXXXX XXXXX
picking XXXXX XXXXX XXXXX
audit XXXXX XXXXX
replenishment XXXXX XXXXX XXXXX
loading XXXXX XXXXX XXXXX

The Good, the Bad and the Ugly

I have seen three ways to set up test data in a database:

  1. Per test case, use business logic to set up the data
  2. Per test case, fill the database directly using SQL statements
  3. For all test cases, fill the database only once and use that database for all the tests

For the Subject Under Test (SUT) it does not matter how the data was set up, as long as the data is present.

The following table shows for each of these ways how well they score on several aspects:

Aspect Use business logic Fill database directly (per test case) Fill database directly (once for all test cases)
Test data correct, realistic and complete? ++ -
Impact on test case execution time - + ++
Easy to deal with changes in structure of database? ++
Generates more data than needed by the test? - ++
Size of code that sets up the data – (in case of many steps) ++ - (if extra data is needed for test case)

Technique 1: Test Data Builder

One of the features of our WMS is that when a pallet gets moved to a problem location, the pallet gets blocked.

def test_pallet_moved_to_problem_location_gets_blocked(self):
    forklift = Location.objects.create(
        id='FORKLIFT01', sequence=0, level=0, type=LocationType.FORKLIFT, blocked=False)
    problem_location = Location.objects.create(
        id='PROBLEM', sequence=0, level=0, type=LocationType.PROBLEM, blocked=False)
    item = Item.objects.create(description='can 1 liter white paint')
    pallet = Pallet.objects.create(location=forklift, blocked=False)
    pallet.items.add(ItemOnPallet.objects.create(pallet=pallet, item=item, quantity=100, batch='2019401234'))

    self.service.move_pallet(pallet, problem_location)

    self.assertTrue(pallet.blocked)

The arrange part is the largest part of the test. Creating a location requires to fill in a lot of parameters, most of which are not even relevant for the test.

By extracting the code to create a pallet, forklift and problem location to a separate module, called test_data_builder, the test code gets smaller. Also the fact that we move a location from a forklift to a problem location stands out more clear now:

from wms.tests import test_data_builder as tdb

def test_pallet_moved_to_problem_location_gets_blocked(self):
    pallet = tdb.create_pallet(tdb.create_forklift(), items={tdb.item_1(): 100})
    problem_location = tdb.create_problem_location()

    self.service.move_pallet(pallet, problem_location)

    self.assertTrue(pallet.blocked)

Here is the code from the test_data_builder module:

def item_1():
    return Item.objects.get_or_create(description='White paint (1 liter)')[0]


def create_forklift(**kwargs):
    _id = kwargs.get('id', 'FORKLIFT01')
    return Location.objects.create(id=_id, sequence=0, level=0, type=LocationType.FORKLIFT, blocked=False)


def create_problem_location(**kwargs):
    _id = kwargs.get('id', 'PROBLEM')
    return Location.objects.create(id=_id, sequence=0, level=0, type=LocationType.PROBLEM, blocked=False)


def create_pallet(location: Location, **kwargs):
    global _next_batch
    pallet = Pallet.objects.create(location=location, blocked=kwargs.get('blocked', False))
    if 'items' in kwargs:
        for item, quantity_with_optional_batch in kwargs['items'].items():
            if isinstance(quantity_with_optional_batch, tuple):
                quantity, batch = quantity_with_optional_batch
            else:
                quantity = quantity_with_optional_batch
                batch = f'{_next_batch:06d}'
                _next_batch += 1

            pallet.items.add(
                ItemOnPallet.objects.create(pallet=pallet, item=item, quantity=quantity, batch=batch))

    return pallet

See how some functions create a thing and return the same thing the next time the function is called? That is useful for things that are constant for your tests. item_1() will always return the same item. The exact details of the item are not relevant for most of the test. The Test Data Builder makes it easy to use a small set of items.

Other functions create a new thing every time you call them. Pallets are typically created specifically for a test. The function create_pallet() takes an optional items argument, which can be either a tuple of items and quantities or a tuple of items, quantities and batches.

The code in the Test Data Builder can get a bit messy, like create_pallet() shows. Note that you typically write the functions in the Test Data Builder once, and modify/extend it a couple of times, but you used the functions hundreds of times. So while writing functions for Test Data Builder, focus on clean and simple test code.

Functions in Test Data Builders have the following properties:

A trick to ensure the function is simple to use is to first use it in a test and then implement it.

When using the Test Data Builder in a test should result in simple and clean code. However, tests should not depend on a specific default value. If a test needs some pallet that is full, and a Test Data Builder function named create_pallet() is called without specific items, then this function must be called with an argument that makes clear that the pallet is full: create_pallet(items={tdb.item_1(), tdb.item_1_full_pallet_quantity}. If you need full pallets in many tests, then you had better create a new function named create_full_pallet() with an optional parameter to override the item.

Technique 2: Visualize the data

The second technique is illustrated by testing an important feature of stocking. Remember that stocking is the process of putting full pallets, which just arrived from the factory, in bins in the bulk area. The diagram below shows a side view of an aisle in the bulk area. The WMS will tell the forklift driver which bin should be filled first with the next pallet.

Filling the aisle level by level from bottom to top is not efficient, because the horizontal distance the forklift has to travel increases a lot. In real life there are more than 5 bins at one level. Filling the aisle column by column from left to right is also not efficient, because forklifts move faster horizontally than vertically. So the optimal way to fill the aisle is under an angle.

Here is the code of two test cases that test the correct bin is returned to stock a pallet in an empty aisle. In the first test the pallet is not blocked, in the second test the pallet is blocked.

def test_stocking_pallet_in_empty_aisle(self):
    pallet = tdb.create_pallet(tdb.create_forklift(), items={(tdb.item_1()): 100})
    for sequence in range(1, 6):
        for level in range(0, 5):
            tdb.create_bulk_location(aisle=self.AISLE, sequence=sequence, level=level)

    destination = self.service.get_stock_location(pallet, self.AISLE)

    self.assertEqual(destination.id, f'AISLE001A')

def test_problem_pallet_does_not_get_stock_location(self):
    pallet = tdb.create_pallet(tdb.create_forklift(), blocked=True, items={(tdb.item_1()): 100})
    for sequence in range(1, 6):
        for level in range(0, 5):
            tdb.create_bulk_location(aisle=self.AISLE, sequence=sequence, level=level)

    destination = self.service.get_stock_location(pallet, self.AISLE)

    self.assertIsNone(destination)

See how using a Test Data Builder in nested loops build an empty aisle in just 3 lines of code. This way of generating data gets messy when certain bins in the aisle are already occupied by pallets.

The diagram above inspires me to represent the aisle textually as a multiline string. For example, the following string represents the empty aisle and the asterisk indicates the bin where the next pallet must be stocked:

"""|     |
   |     |
   |     |
   |*    |"""

Here is the code that uses such textual representations to specify the aisle that has to be configured and indicates the expected bin where the next pallet must be stocked.

@parameterized.expand([
    ('empty aisle',
     """|     |
        |     |
        |     |
        |*    |"""),

    ('one pallet present',
     """|     |
        |     |
        |     |
        |o*   |"""),

    ('two pallets present',
     """|     |
        |     |
        |*    |
        |oo   |"""),

    ('three pallets present',
     """|     |
        |     |
        |o    |
        |oo*  |"""),

    ('four pallets present',
     """|     |
        |     |
        |o*   |
        |ooo  |"""),

    ('no free bin in aisle',
     """|ooooo|
        |ooooo|
        |ooooo|
        |ooooo|"""),

    ('first gap is filled',
     """|     |
        |     |
        | o   |
        |o*o  |"""),

    ('blocked bin is skipped',
     """|     |
        |     |
        |     |
        |x*   |"""),

])
def test_stocking_pallet(self, _, bulk_aisle_map: str):
    """
    Test if a non-blocked pallet gets the correct stock location within an aisle.
    :param _: is added to the name of the test. Not used otherwise.
    :param bulk_aisle_map: two-dimensional map of the aisle. The pipes indicate the start and end of a level
    in the rack. The meaning of the characters between the pipes is:
    o: a pallet
    *: empty bin, this is the expected location
    x: empty bin, blocked
    :return:
    """
    expected_location = self._create_locations_in_aisle(bulk_aisle_map)
    pallet = tdb.create_pallet(tdb.create_forklift(), items={tdb.item_1(): 100})

    destination = self.service.get_stock_location(pallet, self.AISLE)

    if expected_location:
        self.assertEqual(destination, expected_location)
    else:
        self.assertIsNone(destination)

def _create_locations_in_aisle(self, bulk_aisle_map):
    lines = re.findall('[|][^|]+[|]', bulk_aisle_map)
    lines.reverse()
    expected_location = None
    level = 0
    for line in lines:
        for sequence in range(1, len(line) - 1):
            bulk_location = tdb.create_bulk_location(aisle=self.AISLE, sequence=sequence, level=level)
            if line[sequence] == 'o':
                tdb.create_pallet(bulk_location, items={tdb.item_1(): 100})
            if line[sequence] == '*':
                expected_location = bulk_location
            if line[sequence] == 'x':
                bulk_location.blocked = True
                bulk_location.save()
        level += 1
    return expected_location

The logic of parsing a multiline string and generating the data is implemented by _create_locations_in_aisle(). This code is a bit complex, but once this method has been written, generating test cases becomes a piece of cake. You can pair program with your product owner to write more test cases.

An other situation where this technique can be applied is when you test actor based code. You could use one string per actor to indicate at which moment in time a specific message is sent to that actor. Another string could be used to describe the expected messages sent by a specific actor:

in_1: "1        4          3     "
in_2: "  a        b      c       "
out:  "   (a,1)    (b,4)    (c,3)"

Technique 3: Workflow

The third technique is explained is illustrated by testing parts of the picking workflow. The picking workflow is illustrated by the following activity diagram:

The WMS implements the following methods that are used by the terminal during the picking process:

To properly test these methods, we need an order an a pick list and a pallet on a forklift that contains the items that have already been picked. There are many scenarios we want to test:

The more picked items are required by the arrange part of the test, the more lines of code are added to the arrange part. This reduces the readability and maintainability of these tests.

Since the picking workflow is well defined, it is quite easy to write a Workflow class that will use the business logic to get the database in a specific state.

class PickWorkflow:

    def __init__(self, generate_pick_bins=False, forklift: Optional[Location] = None, **kwargs) -> None:
        self.service = Service()
        if not forklift:
            forklift = Location.objects.filter(type=LocationType.FORKLIFT).first()
        self.forklift = forklift or tdb.create_forklift()
        self.order = tdb.create_order(**kwargs)
        self.pick_list: Optional[PickList] = None
        self.pick_pallet: Optional[Pallet] = None
        self.picked_pallets = []

        if generate_pick_bins:
            for order_line in self.order.lines.all():
                location = tdb.create_pick_bin(item=order_line.item, max_quantity=order_line.quantity)
                tdb.create_pallet(location, items={order_line.item: order_line.quantity})

    def generate_pick_list(self) -> 'PickWorkflow':
        if self.pick_list:
            raise Exception('A pick list has already been generated for the order.')

        self.pick_list = self.service.create_pick_list(self.order)

        return self

    def pick(self, items: Dict[str, float]) -> 'PickWorkflow':
        self._ensure_pick_pallet_is_created()
        for item_id, quantity in items.items():
            location = self.service.get_next_pick_location(self.pick_list, None)
            self.service.pick(self.pick_list, location, self.pick_pallet, tdb.item_by_id(item_id), quantity)
        return self

    def _ensure_pick_list_is_generated(self) -> None:
        if not self.pick_list:
            self.generate_pick_list()

    def _ensure_pick_pallet_is_created(self):
        if self.pick_pallet:
            return

        self.pick_pallet = tdb.create_pallet(self.forklift, pick_list=self.pick_list)
        self.picked_pallets.append(self.pick_pallet)

Here are examples of tests that use this PickWorkflow class:

class TestPicking(TestCase):
    service = Service()

    def setUp(self) -> None:
        tdb.create_basic_warehouse()

    def test_pick_part_of_pick_list_get_pick_quantity_for_item(self):
        self._create_pick_bins('P_1: 100xitem_1')
        workflow = PickWorkflow(items={'item_1': 5})\
            .generate_pick_list()\
            .pick(items={'item_1': 2})

        quantity = self._get_pick_quantity('P_1', workflow)

        self.assertEqual(quantity, 3)

    def test_pick_complete_pick_list_get_pick_quantity_for_item(self):
        self._create_pick_bins('P_1: 100xitem_1')
        workflow = PickWorkflow(items={'item_1': 5})\
            .generate_pick_list()\
            .pick(items={'item_1': 5})

        quantity = self._get_pick_quantity('P_1', workflow)

        self.assertEqual(quantity, 0)

    def test_get_pick_quantity_when_nothing_picked_yet(self):
        self._create_pick_bins('P_1: 100xitem_1')
        workflow = PickWorkflow(items={'item_1': 5}).generate_pick_list()

        quantity = self._get_pick_quantity('P_1', workflow)

        self.assertEqual(quantity, 5)

    def test_get_pick_quantity_for_bin_that_has_less_than_remaining_quantity_to_be_picked(self):
        self._create_pick_bins('P_1: 3xitem_1')
        workflow = PickWorkflow(items={'item_1': 5}).generate_pick_list()

        quantity = self._get_pick_quantity('P_1', workflow)

        self.assertEqual(quantity, 3)

    def test_get_pick_quantity_for_empty_bin(self):
        self._create_pick_bins('P_1: 0xitem_1')
        workflow = PickWorkflow(items={'item_1': 5}).generate_pick_list()

        quantity = self._get_pick_quantity('P_1', workflow)

        self.assertEqual(quantity, 0)

Focus on clean test code

When writing tests, it might happen that you write the same lines of code in a couple of tests. When extracting this code to a method, make sure that using the extracted method is as simple as possible.

Imagine that you want to extract the two lines calling move_pallet() and register_audit_result() from this method:

def test_move_pallet_to_dock_with_differences_found_during_audit(self):
    pallet = tdb.create_pallet(tdb.create_bulk_location())
    service.move_pallet(pallet.id, tdb.create_audit_location().id))
    service.register_audit_result(pallet.id, True)
        
    self.assertRaises(Exception, self.service.move_pallet, pallet.id, tdb.create_dock().id)

Don’t do it like this:

def test_move_pallet_to_dock_with_differences_found_during_audit(self):
    pallet = tdb.create_pallet(tdb.create_bulk_location())
    self._audit_pallet_with_differences_found(pallet.id, tdb.create_audit_location().id)

    self.assertRaises(Exception, self.service.move_pallet, pallet.id, tdb.create_dock().id)

But do it like this:

def test_move_pallet_to_dock_with_differences_found_during_audit(self):
    pallet = tdb.create_pallet(tdb.create_bulk_location())
    self._audit_pallet_with_differences_found(pallet, tdb.create_audit_location())

    self.assertRaises(Exception, self.service.move_pallet, pallet.id, tdb.create_dock().id)

Conclusions

Three techniques to generate test data have been explained:

When using these techniques focus on ease of use and readability for the tests.