I learned a lot of great things during physics grad school, but code design and testing were not among them. Fortunately, Athena, now I’m learning! And I’m sure you’re excited to be learning too.
Okay, I see from that look on your face that you’re not even sure what testing is. Don’t worry: in this post, I’ll explain that, as well as:
- Why testing is useful to software engineers and data people
- What Python testing frameworks exist, and their pros and cons
- How I use testing frameworks in my code, and why I usually use pytest
So what is testing? Well, coding is a little like chasing a string toy in that you rarely get it right on the first try. And, like chasing a string toy, there are all kinds of ways for your code to go wrong.
Unlike chasing a string toy, however, it isn’t always obvious when you get your code wrong. Testing means running your code on specific inputs and checking that it produces the outputs you expect.
In this post, I will be focusing on automated testing, which allows you to run lots of different tests with a single command. I will be demonstrating how my testing works by running code from a recommendation system I’m writing. I’ll write another blog post about that in the future, but for now, if you’re interested, you can look at the source code on my GitHub.
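To make the idea concrete, here is a minimal sketch of automated testing in plain Python, with no framework at all. The function normalize and its test cases are hypothetical examples made up for illustration (they are not taken from my recommendation system). Each case pairs an input with the output we expect, and a single command runs them all.

```python
# Hypothetical function to test (made up for illustration, not from my project).
def normalize(scores):
    """Scale a list of non-negative scores so they sum to 1."""
    total = sum(scores)
    if total == 0:
        # Avoid dividing by zero: return all zeros instead.
        return [0.0 for _ in scores]
    return [s / total for s in scores]


# Each test case pairs an input with the output we expect.
TEST_CASES = [
    ([1, 1], [0.5, 0.5]),
    ([2, 6], [0.25, 0.75]),
    ([0, 0], [0.0, 0.0]),
    ([], []),
]


def run_tests():
    """Run every case, print a summary, and return the number of failures."""
    failures = 0
    for scores, expected in TEST_CASES:
        actual = normalize(scores)
        if actual != expected:
            failures += 1
            print(f"FAIL: normalize({scores}) returned {actual}, expected {expected}")
    print(f"{len(TEST_CASES) - failures} passed, {failures} failed")
    return failures


if __name__ == "__main__":
    run_tests()
```

Running this file prints a one-line summary. A testing framework takes over exactly this bookkeeping (finding the tests, running them, and reporting failures) so you don’t have to write it yourself.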
So why would an aspiring data scientist/software developer like myself be interested in automated testing? Why would I write a whole bunch of tests, instead of just running the code on specific values whenever the need happens to come up? Honestly, until recently, that’s exactly what I did, and it wasted a lot of time. Writing tests for each part of your code takes time upfront, but it saves time in the long run. Without them, every time you want to check your code, you have to think up test cases, set up the code to run them, run them, and then figure out whether the code handled them correctly. With automated tests, you can just sit back, type a command, and watch your computer run the tests for you. It’s even easier than hiding behind the curtain while you wait to pounce on the string!
A testing framework is a library designed to make tests easier to write and run. In Python, the primary language I use, there are many testing frameworks. I’ll focus on four of them: unittest, doctest, pytest, and hypothesis.
unittest is Python’s built-in testing framework. This means that it ships with Python itself as part of the standard library, so you can pretty much guarantee that it will keep being updated whenever Python changes. To write tests in unittest, you create a test case class that holds all the tests you want your computer to run. This class inherits from unittest.TestCase, which means it gets all the general machinery of unittest’s generic test case (assertion methods such as assertEqual, plus setUp and tearDown hooks) while adding test methods specific to your situation. unittest, while relatively convenient and definitely reliable, is not a very flexible testing framework. You have to write an entire class when maybe only a few test functions would do the trick. It’s also sometimes considered bad practice to rely too heavily on inheritance, as unittest requires you to do.
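Here is a minimal sketch of what a unittest test case class looks like, using the same hypothetical normalize function (an illustrative example, not code from my project). Each method whose name starts with test becomes a separate test, and the assertEqual and assertAlmostEqual methods come from the inherited unittest.TestCase class.

```python
import unittest


def normalize(scores):
    """Hypothetical function under test: scale scores so they sum to 1."""
    total = sum(scores)
    if total == 0:
        return [0.0 for _ in scores]
    return [s / total for s in scores]


class TestNormalize(unittest.TestCase):
    """Each method whose name starts with 'test' is run as a separate test."""

    def test_equal_scores(self):
        self.assertEqual(normalize([1, 1]), [0.5, 0.5])

    def test_sums_to_one(self):
        # assertAlmostEqual tolerates tiny floating-point differences.
        self.assertAlmostEqual(sum(normalize([1, 2, 3])), 1.0)

    def test_all_zero_scores(self):
        self.assertEqual(normalize([0, 0]), [0.0, 0.0])
```

Saved in a file with a hypothetical name such as test_normalize.py, this can be run with python -m unittest test_normalize. Notice that even three short checks need a class wrapped around them.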
(Note: There is another testing framework, nose, that is similar to unittest but has better formatting and fixes a few of unittest’s problems. However, it is no longer actively maintained and does not work with recent versions of Python 3, so I didn’t include it in my set of frameworks to investigate. It is worth mentioning, though, that its successor, nose2, is available.)
doctest is also built into Python. Writing a doctest for a function is as simple as adding examples to its docstring (the part of the function that describes the function’s behavior, not to be confused with a cat toy). Each example shows a call after a >>> prompt, followed by the output you expect. When you run the doctest module on your code (for example, by calling doctest.testmod() or by running python -m doctest on the file), it executes those examples and reports an error whenever the code produces an output the docstring doesn’t expect. In this way, doctests push you to keep your docstrings updated whenever the code changes, which makes your functions much easier for someone else to read! Doctests only accept exactly the output you give them, though: even a slight difference in the printed output will make them fail. Thus, they tend to be a poor match for a calculation where the expected and actual floating-point values might differ slightly. By default, they also don’t tell you when a test has passed, only when it has failed.
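A small sketch of a doctest, using a hypothetical average_rating function made up for this example. The lines starting with >>> are executed, and the line after each one is the expected output. The last example rounds its result, which is one common way to keep a floating-point answer from breaking the exact text comparison.

```python
def average_rating(ratings):
    """Return the mean of a non-empty list of ratings.

    Each example below is a doctest: the call after the prompt is run,
    and the line after it is the output the test expects.

    >>> average_rating([4, 5])
    4.5
    >>> average_rating([5, 5, 5])
    5.0

    Floating-point results are compared as text, so rounding keeps
    this example from breaking on tiny differences:

    >>> round(average_rating([1, 2, 2]), 3)
    1.667
    """
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    import doctest

    # Prints nothing when every example passes; use verbose=True to see each check.
    doctest.testmod()
```

Running the file directly calls doctest.testmod(), which stays silent when everything passes. Running python -m doctest -v on the file instead lists each example as it is checked.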
pytest is probably the most popular testing framework in the Python community these days. Unlike unittest and doctest, it isn’t part of the standard library, so you install it separately (e.g., with pip install pytest). It’s actively maintained, and has nice formatting and helpful error messages. You can write pytest tests in a class (like unittest) or as individual functions, depending on what suits your code base best. Also, if your tests need to make changes to your system (e.g., reading or writing files), pytest’s fixtures let you write setup and tear-down code without much difficulty. If you want a testing framework that is guaranteed to work with very old Python code, unittest may be the answer. But pytest has one more perk: it can run your old unittest code, with error messages that are usually more helpful (and definitely more colorful!) than unittest would provide.
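A minimal sketch of similar tests written for pytest, again with the hypothetical normalize function. It shows plain test functions, a test class that needs no inheritance, pytest.approx for floating-point comparisons, and a fixture that does setup and tear-down around a test that reads a file. The file name scores.json and the fixture name scores_file are made up for illustration; tmp_path is a fixture that comes built into pytest.

```python
import json

import pytest


def normalize(scores):
    """Hypothetical function under test: scale scores so they sum to 1."""
    total = sum(scores)
    if total == 0:
        return [0.0 for _ in scores]
    return [s / total for s in scores]


# Plain functions: pytest collects any function whose name starts with "test_".
def test_equal_scores():
    assert normalize([1, 1]) == [0.5, 0.5]


def test_sums_to_one():
    # pytest.approx tolerates tiny floating-point differences.
    assert sum(normalize([0.1, 0.2, 0.3])) == pytest.approx(1.0)


# Tests can also be grouped in a plain class; no inheritance is required.
class TestEdgeCases:
    def test_all_zero_scores(self):
        assert normalize([0, 0]) == [0.0, 0.0]

    def test_empty_list(self):
        assert normalize([]) == []


# A fixture handles setup and tear-down: code before `yield` runs before the
# test, and code after it runs once the test has finished.
@pytest.fixture
def scores_file(tmp_path):
    path = tmp_path / "scores.json"
    path.write_text(json.dumps([2, 6]))  # setup: write a file for the test
    yield path
    path.unlink()  # tear-down: delete the file after the test


# A test requests a fixture simply by naming it as an argument.
def test_normalize_scores_from_file(scores_file):
    scores = json.loads(scores_file.read_text())
    assert normalize(scores) == [0.25, 0.75]
```

Saved in a file whose name starts with test_, running pytest in that directory collects and runs all five tests. Because fixtures are requested by argument name, setup code can be shared without a class or inheritance.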
hypothesis is quite different from the other three frameworks discussed so far. Rather than checking specific cases you’ve written by hand, you describe the kinds of inputs your code should accept and the properties its output should always satisfy. hypothesis then generates many inputs (randomly, but with a deliberate bias toward tricky values), and the test passes or fails based on whether every input meets those conditions. This approach is often called property-based testing. hypothesis tends to produce edge cases: cases that you as a coder might not consider because they are atypical inputs (e.g., empty lists). When it finds a failing input, it also shrinks it to a simpler example that still fails, which makes the bug easier to understand. It can be very helpful for catching bugs in weird cases you might not think of, but it may not work as well for testing how code interacts with external files or databases. You can run it with pytest, so you also get pytest’s helpful formatting and error messages, or, if you prefer, with most other Python testing frameworks. (Technically, you can also run a hypothesis test on its own by calling the decorated test function directly, but that seems a bit more unwieldy than running hypothesis via a test framework.)
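A minimal sketch of a property-based test with hypothesis, assuming hypothesis is installed and reusing the hypothetical normalize function. Instead of listing inputs, the @given decorator describes the kind of input to generate (here, lists of non-negative integers, including empty lists), and the test checks properties that should hold for every one of them.

```python
from hypothesis import given
from hypothesis import strategies as st


def normalize(scores):
    """Hypothetical function under test: scale scores so they sum to 1."""
    total = sum(scores)
    if total == 0:
        return [0.0 for _ in scores]
    return [s / total for s in scores]


# Describe the kind of input to generate rather than specific cases:
# lists (possibly empty) of integers between 0 and 1000.
@given(st.lists(st.integers(min_value=0, max_value=1000)))
def test_normalize_properties(scores):
    result = normalize(scores)
    # Property 1: the output has the same length as the input.
    assert len(result) == len(scores)
    # Property 2: every normalized score lies between 0 and 1.
    assert all(0.0 <= r <= 1.0 for r in result)
    # Property 3: if any score is positive, the results sum to (about) 1.
    if sum(scores) > 0:
        assert abs(sum(result) - 1.0) < 1e-9
```

Run it with pytest like any other test. If you delete the total == 0 guard from normalize, hypothesis will very likely report a ZeroDivisionError and shrink the failing input to something as small as [0]. Integers are used here instead of floats because st.floats() also generates NaN and infinity by default, which would need extra handling.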
So after all that testing, my favorite is…pytest! I love the clean formatting and colors in pytest’s output, and I like that it is clear about which tests pass and which tests fail. It’s easy to write, has clean and expressive ways to organize your tests, and has helpful error messages. That said, I’m starting to appreciate the value of a hybrid approach, with different kinds of testing for different kinds of problems. pytest is good as a general-purpose testing framework, but for a very small piece of code with predictable outputs, doctest can be the better choice. (Conveniently, pytest can run your doctests too, via its --doctest-modules option.) If your code doesn’t interact with the external world too much, and you are worried your test cases might not be sufficiently comprehensive, hypothesis makes an excellent choice. (And some people swear by hypothesis even when your code does lots of reading/writing to your system or an external database; see Brennan Holt Chesley’s Generative Testing talk for details.)
So Athena, I hope you’ve enjoyed this overview of Python testing frameworks, and that you’ve picked up a thing or two about the benefits of testing your code! Once you get in the habit, it’s really not much harder than chasing that string toy. And there’s a multitude of tools to choose from as you pursue your testing (and string-chasing) goals.