How to Hire an Engineer Like Me

Hello Athena! It’s been an awfully long time since I’ve posted new educational content for you – sorry about that. I found it surprisingly difficult to keep up with writing and projects while interviewing for jobs. But – hooray! – I’ve recently been hired as a software engineer at GE, where I’ll integrate machine learning models into our customers’ businesses. You won’t be hearing many specifics about those projects – GE has trade secrets and all that – but I plan to keep sharing the basics of what I’m learning along the way.

This blog post, though, is a reflection on my interview process: what happened, what I looked for in a team, what kinds of interviews discouraged me, positive interview experiences that kept me going, and what I’ve learned from the whole deal. I particularly want to emphasize how the structure of different kinds of interviews affected me as a female junior engineer. Plenty of companies are looking to hire great talent, and plenty of talented female junior engineers are looking to get hired. I hope the ideas in this post are helpful to both of those groups.


The Statistics†

Engineering interviews are pretty involved. There are usually at least four stages: a resume screen, a phone call with a recruiter or manager, a technical phone interview, and an onsite consisting of 4-6 interviews. Some companies give a take-home assignment instead of, or in addition to, a technical phone interview.

Many companies rejected my application outright because they wanted to hire senior engineers. That said, I found companies were MUCH more likely to be interested in my application (i.e., to contact me for an interview) if I was referred by an employee or by the Recurse Center jobs team.

Type of application    # Interested    % Interested
Online                  1 / 9          11.1 %
Referral               13 / 34         38.2 %
Total                  14 / 43         32.6 %


There were a lot of reasons interviews didn’t work out, some of which had to do with how I stacked up to a company’s interview standards, and some of which had nothing to do with my interview performance (e.g., company had moved forward with someone else, or realized mid-process that they wanted someone more senior).


Evaluating The Interviewers

Interviews are a two-way street. I asked a lot of questions of my interviewers! This gave me an opportunity to connect with my interviewers as people (after all, they’re potential co-workers) and also to get a sense of the company itself.

For me, it is very important to work in a collaborative, growth-oriented environment, so I spent a while trying to make sure that each company I interviewed with was a good fit. When a job posting described an ideal employee as being a “rock star” or “code ninja”, or if the listing required a large number of years of experience (4+) for a job description that seemed fairly entry-level, that suggested to me that the company might be individualistic or uninterested in helping employees grow into the role.

Similarly, when I interviewed and asked about opportunities for continuing education (conferences etc.), I looked for an enthusiastic response with specific details about how the company fostered employee learning. A tepid response suggested to me that growth may not be an important part of the organization (worse was when the interviewer was confused by the very premise of a desire to learn).

Of course, these signals aren’t always reliable – there were plenty of times I chose to be flexible on these specifics based on more reliable information, such as the recommendation of a close friend. Some of my initial reactions turned out to be unfair, so try to find people who are as close to the position as possible. And while you’re asking questions, stay as positive as you can so you convey enthusiasm to your interviewer, even as you’re vetting what they say about their organization.


The Hard Stuff

I put a lot of time and thought into understanding the companies at which I was interviewing. Similarly, companies rely on certain signals from their candidates in order to make a hire, presumably to ensure that the candidate is a well-adjusted and technically competent worker.

In my experience, though, some interviews tended to rely on signals that may not have much to do with aptitude, and that can disproportionately hurt junior or minority candidates. For instance, I sometimes encountered interviewers who judged candidates primarily on coding speed. Companies that do this may reject a good candidate who is nervous, or worse, implicitly penalize clear communication. In a pretty good article about best hiring practices, Marco Rogers, engineering lead and hiring manager for Lever, describes another example of this problem:

“No matter how repeatable and standardized your interview process gets, it’s driven forward by humans — and with that comes judgment and bias…For example, some engineers discount candidates if they don’t name their variables well in a technical exercise. The truth is that’s really common. So it’s important to level: ‘Is naming variables well in code really an important thing? Yes, it is. Is it a really coachable and fixable thing? Yes that’s true. So it shouldn’t disqualify people. That’s why we do code reviews. We teach this.’”

All that to say, it’s important for interviewers to check their biases. Especially among junior candidates, unfamiliarity with certain terms, poor variable naming, or code speed may be stronger signals of confidence or experience than of coding ability.

Other companies didn’t seem to think through the interview process at all. I sometimes hear stories of people getting rejected from jobs after failing to solve an NP-complete problem in polynomial time. While I didn’t run into that specific situation, I did run into interviewers who expected me to produce a very specific solution and penalized anything outside their expectations. This tendency stifled my creativity during the interview and, on the hiring end, would make it hard to determine how a candidate thinks about a new problem.


The Good Stuff

On the other hand, I found even small changes to an interview process could make it much more friendly, and produce more reliable signals! In this section, I’ll describe some of the specific measures interviewers took that helped me present at my best.

One of the most helpful things an interviewer could do to make an interview better was to build trust at the very beginning of the interview. Interviewing is a very vulnerable experience, so interviews that directly and proactively addressed my nervousness with kind assurances were very refreshing. Examples of kind assurances:

  • “Feel free to take your time solving this problem.”
  • “This interview is mostly about understanding how you think, so don’t worry about getting every detail right.”
  • “I’m excited to see how you approach this.”

In some sense, it was weird that these assurances worked – no matter what the interviewer says, you’re still going to be accepted or rejected based on their evaluation of your interview performance – but they did help me feel like the interviewer saw me as a person and not just as a candidate. This made me less nervous, and more likely to code at my best.

I also appreciated companies that structured the interview process to learn something new or to let candidates choose how to show off their skills. One interviewer gave me a choice between an “easier” and a “harder” interview question, and assured me that my choice had no impact on his evaluation (it helped me relax so much that I ended up solving both). Another team asked me to give a technical talk in a field in which I’d excelled. A few companies carefully structured interview problems to start easy and give me lots of room to go deep into understanding them. These interviewers tended to respond enthusiastically or compassionately when I said I didn’t know how to do something – a clear signal that the team valued learning and growth.

Finally, some companies were very explicit in emphasizing non-technical skills/attributes and emotional intelligence just as much as coding ability. One hiring manager made it clear he’d rather hire a less competent but emotionally mature person than a “brilliant asshole”. A few companies spent large parts of the interview asking me about my learning style or about something interesting I’d explored in the past month. These interview elements assured me that a company was evaluating me as a whole person rather than just as a “code ninja”.

To be clear, I did not pass all these interviews! My pass rate for interviews I liked vs. interviews I didn’t was actually about the same. But nearly all the companies with friendly interviews also seemed like places I’d be really happy to work. I don’t think it’s a coincidence that the companies putting thought into an empathetic interview process left a good impression on me 🙂



Interviewing wasn’t my favorite process, but it did give me lots of things to think about, both for the next time I’m looking for a job and for when I’m the person interviewing a candidate.

For interviewees:

  • Acknowledge that the person interviewing you is a real human with real feelings. Connecting with them as a person will probably make them feel more comfortable, and will give you more information about whether their company is a good fit for you.
  • Interviewing is wild and lots of places reject you for lots of reasons. Don’t take it too personally. Improve in the areas you can, and don’t worry too much if you get rejected for things that are unreasonable or beyond your control.

For interviewers:

  • Acknowledge that the person you’re interviewing is a real human with real feelings. Connecting with them as a person will probably make them feel more comfortable, and will give you more information about whether they’re a good fit for your company.
  • Good interviews make your company a more desirable place to work, so put thought into the process. Questions that help people learn or that give them a way to show off their skills will help you see what they’re capable of doing.

So keep this in mind, Athena, next time you’re interviewing. But given how you’re currently occupied, I suspect you won’t have to worry about this for a while.



† This section of the blog post is heavily inspired by Harold Treen’s interview reflection, and its format is used with his permission.

Bangladesh: An Interlude

Hello Athena! You’ve probably been wondering why I haven’t posted any educational material in the last two months (and why I haven’t been in the apartment!). I was out of town visiting Dhaka, Bangladesh, where two of my close friends got married. I couldn’t possibly fit everything in a single blog post, but here’s a smattering of photos and impressions.

I spent most of my time in Dhaka City, the capital of Bangladesh. There are about 20 million people in the greater Dhaka area, making it the 14th largest urban area in the world. It’s also by many measures the most densely populated city in the world.

In many ways, Dhaka reminded me of New York City, with tall buildings and tiny shops lining the streets and smog and very assertive drivers. Dhaka has an impressive collection of universities and a stunning botanical garden. You can be stuck in hours-long traffic jams, and the streets are filled with all manner of vehicles – buses, cars, bikes, motorcycles, rickshaws, CNGs (compressed natural gas vehicles, which look like tiny open-window three-wheeled cars) – as well as pedestrians and livestock and stray dogs. As in New York, one of my proudest accomplishments while in Dhaka was figuring out the public transportation!


I found people in Dhaka City to be overwhelmingly friendly. While Bangla† is the language most commonly spoken, English is widely taught in school, though it’s fairly rare to meet someone who speaks English as a first language. I sometimes found myself surrounded by groups of friendly people who wanted to take selfies, practice English with me, or help me get wherever I was going.

Most of Bangladesh is near sea level, so water travel is an important feature of the Bangladeshi economy. For this reason, the port of Sadarghat, where Dhaka City meets the Buriganga River, gets a lot of traffic.


My friend’s family brought me and a few other wedding guests to see Rajbary, a town a half day’s drive outside Dhaka. In that area, we saw the homes of Nobel Prize-winning poet Rabindranath Tagore and of Bengali folk singer/philosopher Lalon. Unfortunately, I didn’t bring my phone on those visits, but pictures wouldn’t have done justice to Tagore’s art and poetry or Lalon’s music anyway.

We spent a morning visiting Sonargaon, capital of the Bengal empire during the Moghul period (13th-17th centuries). It’s about an hour’s drive southeast of Dhaka City.


Near Sonargaon is Panam City, a hub of trade during the Moghul period. Visitors are free to walk through the fifty-two buildings. A kind archaeologist showed us around, and it was incredibly cool to stand in the unrestored ballrooms and bedrooms of a city hundreds of years old. There weren’t many informative signs, and there were a bunch of goats wandering on the periphery. This lack of curation, the past co-mingling with the present, made it one of the more breathtaking encounters with history I’ve ever had.

Stained glass in a 16th-century Moghul building in Panam City.

Later in the week I visited Lalbagh Fort, a Moghul military stronghold and mausoleum, now located in the southwest part of Dhaka. It’s one of the quieter places I found in Dhaka City and appears to be a prime spot to take a date 🙂

Money goes farther in Bangladesh than it does anywhere else I’ve traveled. Near Lalbagh Fort is a wonderful restaurant where I ordered a hearty and delicious vegetable breakfast for 23 Taka (the US equivalent of $0.28). The restaurant staff were super-accommodating, despite the complications presented by my limited Bangla vocabulary.

Bangladesh is majority-Muslim, but in southeast Dhaka you can find the Armenian Church of the Holy Resurrection. It was built in the 1700s to serve the community of Armenian Christian merchants and traders living in Dhaka City; after the British colonized Bangladesh, they used the church for their own religious services.

I spent one afternoon with a university student I met on the bus – he was worried I was going the wrong way and decided to spend his day off exploring the city with me. He and I swapped details about the governmental structures of our respective countries. Bangladesh is a parliamentary democracy, so representation in the government is exclusively related to which political party gets the most votes. He expressed concern that this can sometimes squelch the voices of political minorities.

The university student and I went to the stunningly beautiful Tara Masjid (Star Mosque), but at the time I visited there were prayer services happening and I didn’t want to disrespect the folks who were there to worship by taking a picture. Instead I’ll shamelessly include a photo from the Internet.


I saw many more amazing sights and met lots of awesome people – if you’re curious, you should definitely ask me for more stories! I’m excited to travel to Bangladesh again some day – hopefully after learning some more Bangla – but for now, Athena, I’m happy to be home with you.


† Bangla is the sixth most commonly spoken language in the world, the first language of Bangladesh and of several districts in eastern India. It has a rich literary tradition and over forty distinct letters (I think? Nobody could really tell me how many letters are in the Bengali alphabet since several of them are deprecated.)  I had a lot of fun learning words and phrases in Bangla – my friend had tried to teach me some in the US, but I found it a lot easier to learn Bangla after spending time in a context where people regularly speak it.

Blameless Postmortem 1: The Dataviz Disappointment

Athena, have you ever tried to hunt a feather toy, and completely missed it? Maybe you spent a half hour chasing after a toy that you never successfully caught? The experience of failure, in toy-hunting and in software development, can be immensely frustrating, but it’s difficult to learn from these experiences unless you set aside your feelings and consider what went wrong.


That’s the idea behind a blameless postmortem. A blameless postmortem is a document that a person or team writes in response to a project that just didn’t work. It highlights the things that went wrong, with the intent of learning from those experiences rather than of passing judgment.

Recently I had an experience in which I tried to build a data visualization using a new library, an experience which ended in a fair bit of frustration. So here’s a blameless postmortem describing that experience and what I’m learning from it. For context, I am pulling the format of the postmortem from Dan Puttick’s excellent blog post.



I am currently applying to a company (let’s call it TeachCo.) that values education. I wanted to highlight my education experience in my application, so at the suggestion of Jared Garst, I decided to build an interactive data visualization of my teaching experience. I wanted a visualization that could convey how long each teaching experience lasted, the age(s) of the students I taught, and the subject I was teaching. It would also be nice to include information about how many students I was teaching during each experience, and, if relevant, a brief description of the technologies I used. Clearly, this is a lot to include in a visualization.

I have a fair amount of background in creating data visualizations, but most of that work has been in the context of scientific analyses, and none of it involved interactive displays. Much of my visualization work has used the robust but sometimes-painful library matplotlib, or, for astronomy-related figures, the robust and less-painful library astropy.visualization. The visualization I wanted to present to TeachCo. was fairly different from anything I’d done before – it involved properly displaying both categorical and numerical data, and it needed to be an exploratory visualization rather than a plot made for scientific explanation.


The Incident

I decided to create a visualization that put dates on the x-axis (allowing me to demonstrate how long I had been teaching) and ages of students on the y-axis, with different colors representing different subjects I was teaching. I included a projection of the x-axis to highlight that I have been continuously teaching since high school, even if specific jobs only occurred on a short-term basis. I knew that seaborn (a wrapper for matplotlib) included a straightforward and nice-looking projection plot object with a fairly simple API. Using seaborn, I created the first draft of my plot.


I next tried to introduce interactivity using mpld3, a library that adds interactive widgets to matplotlib objects. mpld3 works by converting the underlying matplotlib code into d3, a JavaScript library considered to be the gold standard for browser-based data visualizations. mpld3 had exactly what I needed – when a user hovers over an area of the visualization, mpld3 is capable of showing text or of highlighting related data on the visualization. However, as I discovered after searching StackOverflow and the mpld3 issue tracker, mpld3 does not support axis customization. I couldn’t create axis labels and it rendered years as floats, making them much more difficult for a user to interpret.


Aftermath and Response

By the time it became clear that mpld3 would not render my axes correctly, I had spent about four hours working on my visualization. I responded by spending another two hours trying over and over to make the plot work with some combination of mpld3/matplotlib/seaborn – libraries that were clearly insufficient for the task at hand. I wanted to submit my application to TeachCo. that day, but I would have needed to switch to a different library to include the visualization. I decided not to give in to the sunk cost fallacy and simply submitted the application without it.


Ultimate Causes

There were two root causes to this problem: my choice of data visualization libraries and my approach to them.

I chose the combination of mpld3/matplotlib/seaborn because these libraries seemed most familiar to me. I had worked with matplotlib/seaborn before, and mpld3 appeared to modify matplotlib in a fairly straightforward manner. I wanted to submit my data visualization as part of a job application, and I didn’t want to spend too long working on that particular application. However, these libraries did not play well with each other and could not do what was needed to create a satisfactory visualization. Moreover, mpld3 is not very robust – indeed, the primary developers have abandoned the project in favor of contributing to another data visualization library. Even after this became evident, I kept trying to fix the problem with the incorrect tools rather than using different tools.

The other root cause was that I let a negative attitude toward data visualization persist. One of the things I enjoy about coding is that if I encounter a problem, I know there is a specific reason why that problem exists, even if I don’t understand the reason. The more I learn about my code or about the library I’m using, the more likely I am to be able to solve the problem. But sometimes there is a trade-off between understanding a library and getting a job done quickly, and I notice that trade-off most prominently when using data visualization libraries. Past problems I’ve solved haven’t required much understanding of my visualization libraries and have usually occurred during a time crunch (e.g., trying to finish a paper for submission to an academic journal). So the commands/objects I use to create visualizations feel like black boxes: I don’t understand them very well, and they usually frustrate me. Rather than rethinking that approach to data visualization libraries, I kept being frustrated that the libraries didn’t work the way I wanted them to.


Analysis and Prevention

The most immediate solution to this sort of problem is to use a different library for interactive exploratory data visualizations. Several Python libraries are specifically designed for this sort of problem, including Bokeh, Dash, and altair. I could even learn some JavaScript and use d3 (or its slightly friendlier cousins, Vega/Vega-Lite) to avoid the problems of a visualization library that may be unfinished or half-baked. Clearly, using the right tools is a more important part of the process than I had previously thought.

One of the lessons I’ve learned as part of this process is that I need to be more careful about making sure that any new library I include in my workflow is robust and well-maintained. I chose mpld3 because it was closest to the library with which I was most familiar, and I didn’t ask the right questions before using it in my code. If I had read the mpld3 issue tracker or documentation in more detail before using the library, it would have been fairly easy to figure out that it was the wrong tool for my project.

A larger problem was that I wasted several hours trying to make my visualization work even after I knew the tool wasn’t right. I didn’t use another library because I assumed a defeatist attitude (e.g., “it is impossible to understand the internal logic of data visualization libraries”), rather than approaching the code with the learning-oriented mindset I apply to other programming tasks.

This experience has convinced me that I need to spend some time truly understanding my data visualization libraries. I find it much easier to learn about my tools when I am trying to solve a specific problem, so I am planning to complete this data visualization task using another library some time in the next few weeks. Stay tuned, Athena!


Recurse Center: The Laundry List

If you’ve been wondering what I’ve been up to at the Recurse Center, here’s a list of my weekly updates! I wrote each update immediately after the week ended, so my progress was fresh in my mind. Lightly edited from the version posted on the Recurse Center messaging interface.


  • Pair-programmed with 20 people
  • Wrote 20 modules and 12 additional test modules
  • Contributed to 8 repositories
  • Visited 8 churches, 10 parks/memorials, and 5 museums
  • Wrote 7 blog posts
  • Watched 17 PyCon talks
  • Gave 3 “lightning presentations” (5 min)
  • Posted 127 tweets
  • Finished reading 5 books
  • Started reading another 5 books
  • Made 3 batches of ice cream


Week 1 (Aug 14-20)

  • Built a (super duper) toy recommender system. It doesn’t work yet. But it’s getting there 😀
  • Made my first HTTP request and started to read about data APIs
  • Looked into the data API for Charity Navigator, which (I think?) will be a good data set for my project
  • Watched some Dave Beazley and Raymond Hettinger Python talks
  • Set up a statistics/ML study group with Karen Ellenberger and Jayant Jain
  • Talked about testing with Nathan Weeks Shane and Kate Ray
  • Paired on a computer vision project with Wesley Aptekar-Cassels
  • Ate delicious salad with Kate Murphy and Kate Ray
  • Threw off my sleep schedule by staying up way too late reading a book
  • Found a church I really like (yay!)
  • Planned out a few projects/goals (in no particular order)
    • Building the charity recommender system
    • Completing the Mode Analytics SQL tutorials
    • Learning about Python generators
    • Reviewing basic statistics and ML concepts
    • Building Verbal Infusions, a TwitterBot that generates herbal tea copy
    • Reading/contributing to Python’s scikit-learn and scikit-image libraries
    • Pairing with at least one person every day


Week 2 (Aug 21-27)

  • Watched James Powell’s PyData talk about generators. SO GOOD. It made me think about how to structure code for reproducibility.
  • Got my API code up and running, and now I can make HTTP requests to the Charity Navigator database.
  • Refactored my API code so that it uses generators to handle pagination in the CN data API.
  • Reminded my very rusty brain how classes work in Python.
  • Refactored my recommender system code so that it’s better-structured for reproducibility.
  • Started studying up on divide-and-conquer algorithms (yay mergesort)
  • Started studying intermediate/advanced Python concepts with Brennan Holt Chesley, Dan Luu, and Michael Noronha
  • Paired on a Pandas project with Kate Ray
  • Talked with Jake Hickey about image processing
  • Attended the ACT-W conference in Boston and got way too excited about writing a stock market predictor 😀 😀 😀
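
For anyone curious what “generators to handle pagination” means in practice, here’s a rough sketch with a stubbed-out API standing in for the real Charity Navigator endpoint – none of the names below come from my actual code:

```python
# Sketch of hiding API pagination behind a generator. fake_api stands
# in for an HTTP request that returns one page of results.

def fake_api(page, page_size=2):
    """Stand-in for a paginated API call; returns one page of records."""
    data = ["charity-%d" % i for i in range(5)]
    start = page * page_size
    return data[start:start + page_size]

def fetch_all(page_size=2):
    """Yield every record, requesting new pages only as needed."""
    page = 0
    while True:
        results = fake_api(page, page_size)
        if not results:
            return  # an empty page means we've run out of data
        yield from results
        page += 1

# Callers can iterate without knowing pagination exists:
records = list(fetch_all())
print(records)  # all five charities, fetched two per "request"
```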


Week 3 (Aug 28-Sep 3)

  • Watched a bunch of talks
    • Andrew Knight’s talk on testing in Python was a super-helpful overview of testing frameworks in Python and how people think about testing
    • Dan Crosta’s talk on testing helped me make sense of some of the contradictory advice about testing I’ve encountered
    • Colton Myers’s talk on decorators was helpful in starting to learn…what decorators are
  • Built some (very tiny) test suites for my recommender system in unittest, doctest, pytest, and hypothesis
  • Continued the intermediate/advanced Python study group, learned about Python’s package structure
  • Paired on a Pandas problem with Parker Higgins
  • Continued stats study with Jayant Jain
  • Paired with Jake Hickey on diffusion algorithms in image analysis and learned SO MUCH about vim, C++, and system memory
  • Went to the generative testing talk given by Brennan Holt Chesley
  • Did a bunch of algorithms study and tried to figure out how quicksort works


Week 4 (Sep 4-10)

  • Figured out how quicksort works! And learned about a related randomized selection algorithm.
  • Blogged about my testing experience, with many cat pictures.
  • Continued the intermediate/advanced Python study group, learned about profilers and a little about parsing/context-free grammars
  • Continued stats study with Jayant Jain, paired on building probability distribution simulators
  • Paired with Jake Hickey on diffusion algorithms in image analysis, talked about floating-point arithmetic
  • Bought an emotional support plant 🌴 😀


Week 5 (Sep 11-17)

  • Read up on a lot about statistics (regression, classification) and floating-point arithmetic.
  • Spent a bunch of time looking over the recommender system, and blogged about it (yay cat pictures!)
  • Built some more probability distributions with Jayant Jain
    • We modeled a bunch of major discrete probability distributions
    • We discussed how continuous distributions are built in scipy.stats and came up with real-world continuous distribution examples
  • Presented about generators at the Thursday evening activities
  • Did practice interviewing with Kenneth Alexander Durril and Abraham Hmiel
  • Weekend shenanigans! Commentated Movie Night, walking in Prospect Park with Allie Crevier, made ice cream with Brennan Holt Chesley and then ate it at Sundae, Sundae, Sunday!


Week 6 (Sep 18-24)

  • I made a WHOLE BUNCH OF PROGRESS on the herbal tea parody Twitter Bot!
    • Built a recursive breadth-first web crawler/scraper to obtain a bunch of herbal tea descriptions. (Parker Higgins helped a bunch.)
    • Cleaned the HTML data with Beautiful Soup and Kenneth Alexander Durril, creating a text corpus for the Twitter Bot.
    • Refactored my code a bunch so now it’s pretty and readable.
    • Tried out a profiler and found out that my web scraper really needs some concurrency because wow network requests take a lot of time.
  • Speaking of refactoring, I watched two great PyCon videos about refactoring!
  • Wrote a blog post about HTTP requests.
  • Learned about some basics of linked lists during interview practice with Christian Ternus and Brennan Holt Chesley (among others)
  • Had a meeting with Nancy Thomas about my jobs profile and setting up job interviews.
  • Visited Coney Island with Kimberly Michelle McCarty and Justyna (JJ) Janczyszyn
  • Visited the Cloisters with Natalia Rodriguez and ate some DELICIOUS Venezuelan sandwiches!
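
The breadth-first crawler logic, stripped of the actual HTTP requests and HTML parsing, looks roughly like this – the hard-coded link graph below is a stand-in for fetching a page and extracting its links with Beautiful Soup:

```python
from collections import deque

# Toy link graph standing in for real pages on a tea-company site.
LINKS = {
    "/teas": ["/teas/mint", "/teas/chamomile"],
    "/teas/mint": ["/teas"],
    "/teas/chamomile": ["/teas/hibiscus"],
    "/teas/hibiscus": [],
}

def crawl(start):
    """Visit pages breadth-first, returning URLs in visit order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)          # "scrape" the page here
        for link in LINKS.get(url, []):
            if link not in seen:   # never re-crawl a page
                seen.add(link)
                queue.append(link)
    return order

print(crawl("/teas"))
```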


Week 7 (Sep 25-Oct 1)

  • Finished the herbal tea parody Twitter Bot, at least enough that I’m ready to put it down and prioritize other projects.
    • Read through the tweepy source code and wrote the Twitter Bot infrastructure.
    • Wrote the code to make the Markov-generated tweets, and then refactored and improved everything.
    • Set up a cron job so the Twitter Bot posts new herbal tea description parodies daily.
  • Learned that debuggers are the greatest – I wrote a blog post about that and gave a presentation.
  • Paired with Jake Hickey to implement Cascading Fast Explicit Diffusion
    • And learned a ton about writing for the GPU in the process!
    • Also I wrote more or less my first C++ code and I think I’m in love. SO MANY DETAILS. SO PRECISE.
  • Spent a day practicing Git basics, which I highly recommend.
  • Visited a friend in Philly and saw lots of historical things. Fun fact: did you know that one dude once held 95% of the US government’s loans to wage the War of 1812?
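
For reference, the daily-posting cron job mentioned above amounts to a single crontab line, something like this (the time and path here are made up):

```shell
# Hypothetical crontab entry (edit with `crontab -e`): run the bot's
# posting script every day at 15:00. The script path is a placeholder.
0 15 * * * /usr/bin/python3 /home/me/teabot/post_tweet.py
```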


Week 8 (Oct 2-8)

  • Fixed my algorithms code, which was immensely satisfying.
  • Paired with Omar Bohsali to read articles about asyncio and play around with concurrency ideas a bit.
  • Read a bunch about tail call optimization, probability, coroutines, and asyncio.
  • Refactored my web scraping/crawling code to make it more generalizable, and then started making it asynchronous.
  • Did more thorough study of graphs, including BFS and DFS.
  • Wrote most of a blog post about generators.
  • Worked with Abraham Hmiel to read through referee comments on an astrophysics paper I’m trying to publish.
  • Had a much-needed visit to MA, during which I got to hang out with family and go apple-picking and watch the cranberry harvest (yay) and re-injured my wrist (boo).


Week 9 (Oct 9-15)

  • Watched some helpful PyCon videos:
    • Dave Beazley on Python concurrency
    • Dave Beazley on understanding the Python GIL
    • Miguel Grinberg on async Python (SO HELPFUL)
    • Yury Selivanov on async/await and asyncio
  • Wrote my asynchronous web scraper, and it appears to work! Paired with Kimberly Michelle McCarty, Laura White-Avian, and Anja Boskovic in the process 🙂
  • Practiced my vim skillz with Anja Boskovic
  • Worked some practice interview questions.
  • Practiced BFS and DFS
  • Went to the resume-writing/code review workshops with Emil Sit and got some very helpful advice on both.
  • Talked to folks at the job fair.


Week 10 (Oct 16-22)

  • Refactored and tested the asynchronous web scraper – it works now! Glory be!
  • Learned about a bunch of features of pytest (fixtures, mocks/patching, pytest.raises, parametrization, why you want to use CI). Turns out I really like testing, a lot.
  • Finished some practice interview questions, which gave me a much better sense of what data science interviews are like and which Pandas commands to have at my fingertips.
  • Updated my resume and jobs profile, and talked to lots of people at the RC jobs fair.
  • Compiled a list of places I might want to apply for jobs, and did a practice interview (during which I learned what a trie is! 😀 😀 😀 )
  • Had a really nice weekend showing my mom around New York City and playing games with Brian Glusman et al.
  • Agreed to an ill-advised incentive scheme for applying to jobs.


Week 11 (Oct 23-29)

  • Started scraping data for the charity recommender project, so now I have an ugly mess of HTML to go through. (Yay! 😀 )
  • Paired with Julian Squires on fixie tries.
  • Finished my blog post about generators and a blog post about how I respond when I don’t want to code.
  • Worked more on practice-interviewing and algorithms (BFS/DFS and Dijkstra’s algorithm).
  • Paired with Jake Hickey on…mostly just installing stuff for the CFED project. And ate yummy food and talked about block chains (Kimberly Michelle McCarty) and Karatsuba multiplication (Kadeem Nibbs).
  • Finished the paper comment review I started with Abraham Hmiel.
  • Followed up with folks I met at the job fair, as well as a few connections in San Francisco.
  • Cleaned up my resume/LinkedIn/GitHub/Recurse Center jobs profile, so I’m all ready for job applications.


Week 12 (Oct 30-Nov 5)

  • Did some significant HTML parsing for the Charity Navigator project with Elly Kuhlman.
  • Paired on building tries with Julian Squires and Kadeem Nibbs.
  • Started to write a blog post listing everything I’ve accomplished at RC.
  • Paired with Jake Hickey on image compression.
  • Made and ate ice cream (White Russian flavored! Alcoholic and yummy!)
  • Nevergraduated!

Generators: A Socratic Inquiry

In this post, Athena, I am going to try to work through some questions I’ve had about generators. I assume you’ve had these questions about generators, too, because you are a cat and it’s my impression that cats spend all day thinking about generators. I can’t imagine what else you would do with all that time you spend lounging under the bed.


Since you’ve spent so much time contemplating generators, I’ll let you ask the questions. Or at least, I assume these are the questions you would be asking, if I could just understand your meowing.


  • What’s a generator?

Suppose you wanted to write a function that calculates some value and then gives you that value. In Python, the keyword you would typically use at the end of that function is return: the function calculates the desired value and returns it to the user. After the value is returned, the function ends.

But what if instead of ending the function, we simply wanted to pause it and leave open the possibility of asking the function to give us a different value in the future? This is the premise of the yield keyword. When a function yields a value, it does not end; it simply pauses to wait for your next instruction. When you ask for the next value, the function continues executing until it reaches the next yield keyword, at which point it pauses again.

Essentially, a generator is any function that produces a value (or values) via the yield keyword rather than by returning the value (and thus ending the function). In Python, when you call a function containing the yield keyword, the function returns not a value but rather a generator object that can produce your desired value(s) when instructed to do so.


  • What are the benefits of generators?

In the canonical example of a generator, we can produce the first ten values of the Fibonacci sequence. We define a generator function fib(), and then use the islice() function to ask fib() for the first ten Fibonacci numbers.
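The original snippet didn't survive here, so this is a sketch of the canonical version (the exact starting values vary between formulations):

```python
from itertools import islice

def fib():
    # Yield Fibonacci numbers forever; islice() decides how many we take.
    x, y = 0, 1
    while True:
        yield x
        x, y = y, x + y

print(list(islice(fib(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Note that fib() itself never ends – it's islice() that stops asking after ten values.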

I first came across this code in James Powell’s excellent PyData 2014 talk “Generators Will Free Your Mind”, which is available on YouTube; the IPython notebooks he used for the talk are also available online.

The generator yields the value of the first stored Fibonacci number, and then when it is called again, it increments the values of both stored Fibonacci numbers. Note that after a Fibonacci value is yielded, the generator does not store or keep track of it. It simply produces the values and moves on. This is a HUGE benefit – generators are more memory-efficient than building a full list of results, because they only hold the small bit of state needed to produce the next value. For instance, if we only wanted the twentieth Fibonacci number, the generator wouldn’t keep a list of all the previous Fibonacci numbers, saving a great deal of memory.

Significantly, generators make no assumptions about which values the user would like to receive, but rather give users agency to specify them. James Powell explains the benefits of this setup far better than I could in the talk mentioned above.


  • Why would you want to use generators?

Generators are especially useful when making calls that require us to process a lot of data (but not necessarily store it in local memory). Databases and HTTP requests are probably the two most common such examples. Let’s give an example with HTTP requests, since I’ve discussed that concept in a previous post.


This is a generator that gives the user data from an HTTP request. Each time this generator is called, it makes a server request, and yields data to the user if the request is successful. If called again, it will update the server request parameters (e.g., it will ask the server for a different page of data) and repeat the process. It seems reasonable that we, the user, might not want to request every page available on this server. A generator allows us to specify which pages we want, and what we want to do with each page, rather than storing every page we encounter.
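The original code is missing here, but a generator like the one described might be sketched as follows (fetch_page and the pagination scheme are illustrative, not a real API):

```python
def fetch_pages(fetch_page, start=1):
    # fetch_page(n) should return (status_code, data) for page n of the server's data.
    page = start
    while True:
        status, data = fetch_page(page)
        if status != 200:
            return  # stop when the server has no more pages for us
        yield data
        page += 1  # the next request asks for a different page
```

Because it's a generator, the caller decides how many pages to actually request – for example, `islice(fetch_pages(my_fetcher), 3)` would make only three server requests.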


  • But isn’t that just a glorified for loop?

It kind of looks that way, and it’s often used in conjunction with a for loop, but no. To understand this we have to look at how Python makes function calls, which requires us to understand a computer’s memory.

A computer has two ways to store information: the stack and the heap. The stack follows a last-in-first-out protocol, which means that it acts first on the most recent piece of information it stored. The heap, on the other hand, stores information wherever it happens to have space, and uses that information only when the need arises. This is very abstract, so let’s use an example.

Remembering your favorite hiding spots, using a stack vs. a heap.†

Suppose you wanted to store information about your favorite hiding spots, so that you could visit those hiding spots later. You first notice you like hiding under the bed, then that you like hiding behind a chair, and finally that you like hiding in the closet. Later, when scared, you would go hide in the closet, because that is the last hiding spot you remember – this is like the stack. Alternatively, you could just pick the hiding spot that happened to be closest – this is like the heap.†
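The hiding-spot stack from this analogy can be sketched with an ordinary Python list (just an illustration of last-in-first-out, not anything to do with the interpreter's real call stack):

```python
# Remember hiding spots in the order you notice them.
hiding_spots = []
hiding_spots.append("under the bed")     # noticed first
hiding_spots.append("behind a chair")
hiding_spots.append("in the closet")     # noticed last

# When scared, use the last spot you remembered -- last in, first out.
print(hiding_spots.pop())  # prints "in the closet"
```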


  • So how is that related to generators?

I’m glad you asked! Typically, the computer keeps function calls on the stack, since it’s important to know the order in which your functions should execute. This is a useful system – when the computer makes a function call, it puts that call on the stack so it knows what to execute next. But in Python, the stack frame – the object that keeps track of the instructions your function gave to the computer – is stored in the heap, to be used only when called upon. Normally, when a function finishes, its stack frame is removed from the heap because its data have already been used.

For generators, the stack frame is stored but not immediately executed. When you call next() on a generator, execution resumes from wherever the frame last paused, the generator yields a result, and the stack frame is kept rather than deleted. This means you don’t need to evaluate generators in any particular order. You just call on the generator’s stack frame whenever you need it.†† Clearly, generators are executed in a different way than ordinary functions – and that’s pretty useful.
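Here's a tiny sketch of that difference: the generator's frame, with its local variable, survives between calls.

```python
def counter():
    # `n` lives in the generator's frame, which is kept around (not discarded)
    # every time the generator pauses at `yield`.
    n = 0
    while True:
        yield n
        n += 1

c = counter()
print(next(c))  # 0
print(next(c))  # 1 -- the paused frame remembered n
```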


  • So, is it true that generators are the greatest?

Yes, Athena. Yes they are.


† This is not a perfect analogy, because the stack and the heap correspond to physical processes in the computer’s memory, and the heap is not random but rather has to do with where there is space in memory for storing an object of a given size. That said, this analogy will give enough of an idea of what’s happening to make sense of generators.

†† If you want more detailed information, check out this blog post about using generators to write an asynchronous web scraper.

How I Navigate Joyless Coding

For most of my past ten weeks at the Recurse Center, I’ve been having a lot of fun exploring new ideas and projects, and eliminating a few bad coding habits I’d picked up during my time in academia. This time has been productive and enriching – I will be walking away from this program a much stronger developer than when I arrived. But the last week has been challenging for me. As I’m preparing myself to enter the job market, I’ve been approaching my code as “a fatiguing variety of things which [I] can barely keep together”† rather than as a joyful exercise in growing as a programmer. Here’s how I respond when I’m feeling frustrated or uninspired.

Pause. The first thing I do is check in with my body. Sometimes the solution is as simple as noticing I’m thirsty and taking a drink of water.

Take a walk. I’m not a programming machine, I’m a person who programs. Taking a walk can be a really useful way to remind myself that I don’t need to solve every problem immediately. Often, giving my brain a break helps it to relax enough to take a new approach to my project after I get back. Usually I go get bubble tea during my walks, because the shop is close and there’s nothing like giving myself something tasty to remind myself that programming is pretty sweet, too.

Refocus on learning. I tend to regain my inspiration when I’m learning something, and every challenge in my code is an indication that I have something to learn. Orienting myself toward learning can take a lot of different forms; some of my favorites include watching PyCon videos and writing down questions I have about my code. For instance, one of the problems I’m currently solving is how to clean a somewhat messy bit of JSON I’ve scraped from a website. As I refocused on learning, I realized that I’d never taken the time to properly figure out how JSON is formatted or how Python’s tools for parsing JSON work. Defining that set of questions gave me a concrete, learning-focused mindset with which to start tackling my problem again.

Give myself space to make mistakes. A surefire sign that I’m about to feel unmotivated to code is when I’ve stopped giving myself permission to write bad code. It’s really hard to write clean, organized, and well-designed code on the first try – that’s why refactoring and testing are such important parts of the process. And when I’m applying pressure on myself to write good code immediately, I’m much less likely to write anything at all. So when I’m feeling really stuck, it’s important for me to take whatever time I need to remind myself that I don’t have to be perfect, I don’t have to know everything already, it’s okay if I make a mistake. Mistakes are inevitable if I’m taking the risks necessary to become a better developer. Pointing that out to myself goes a long way toward re-engaging myself with my projects.

Talk with someone else. Often talking out a problem will bring it into focus. The Recurse Center places a lot of emphasis on pair programming for this reason. Communicating what’s going on, and asking for help when necessary, can turn a significant and deflating roadblock into a much more manageable challenge. And when I’m just feeling uninspired, talking about a problem with someone else can remind me of why I got excited about a project in the first place.

Remind myself of my goals. At the beginning of every new season of my life (a project, a job, etc.), I’ve developed a ritual in which I write down personal and professional goals for what I’d like to learn and do during that period of time. Some of these goals are very specific (e.g., “build a Twitter Bot that parodies herbal tea descriptions”) and some of them are much more ill-defined (e.g., “learn to take my emotions in stride without ignoring them”). When I’m feeling frustrated and other methods of engaging with a project aren’t working, it can be very helpful to return to my goals as a reminder of where I want to be growing during this project. That can help reformulate my questions about the project or even just remind me there was a reason why I started doing it.

Keep myself accountable. An occasional sense of ennui/laziness is pretty normal, even for projects I find super-exciting. One of my favorite ways to counteract my own inertia is to document my learning publicly. I’ve made a promise to myself that at the end of every day of coding, I must post something I’ve learned to Twitter. Regardless of how I’m feeling, I have a record of my own progress, and a way of ensuring that I don’t spend even a single day slipping into stagnation.

Just do it. Sometimes the best way to get back into the flow of programming is to open my IDE and start programming. I might try to take my coding five minutes at a time – usually the first five minute chunk is enough to push my brain back into my project. I have a playlist of fun dance music I listen to during these times too! There’s nothing like dancing in my chair to bring me back to the joy I experience while coding.


† Henri Nouwen, “Making All Things New: An Invitation to the Spiritual Life”

A Love Song to My Debugger

I’m almost done with @VerbalInfusions, a project that generates parody herbal tea descriptions and posts them on Twitter. And today I’m one step closer to finishing the project than I was yesterday. Let me tell you a tale of how in ten minutes, one debugger turned discouragement into delight.

I built Verbal Infusions with a cool library called tweepy. Remember how we discussed server requests in the last post? You can think of an API as a set of commands that create server requests to communicate with a website. Websites like Twitter often have their own custom APIs. So when developers like me want to build projects like Verbal Infusions, we have to deal with Twitter’s API rather than our programming language of choice (e.g., Python). Unless, of course, a developer takes it upon themselves to write a library to translate the API into Python, which is exactly what tweepy does.

When I first built the project, it seemed to be working perfectly! Except for one problem: when I tried to publish a status to Twitter, there was an authentication error.


When an error like this arises, and I don’t immediately recognize what to do, my first thought is usually to Google the error to see if someone else has solved a similar problem. So I did. I found some solutions, but none of them seemed totally relevant to my situation.


At that point, I had a few things to consider:

  1. There could be a problem with my code.
  2. There could be a problem with the library I’m using.

Some of the posts online suggested that this error arose when Twitter changed their version of the API, so I decided to check to make sure that tweepy was up-to-date. It was.


The next step: check to see if I’d installed the most recent version. I had.

At that point, I needed to start delving into the source code to figure out exactly where the authentication went wrong. Usually when I’m trying to find a bug, I put a lot of print statements in my code to figure out what’s going on – a quick-and-dirty version of what’s called logging. And that had worked fairly well for me up until this point. I spent some time looking over the tweepy source code, which was helpful for understanding how the wrapper worked, but it still didn’t give me a great idea of where the problem might lie. Clearly logging was not going to help with a problem that might or might not be related to an outside library.

To be honest, Athena, by then it was getting late and I was frustrated with the whole thing. So I stopped working for the night and watched Gilmore Girls.


The next morning, the bug remained, and I still didn’t know where the authentication was going wrong, so I tried something I’d never done before – I installed a debugger. My debugger of choice was ipdb, which is described really nicely in this blog post. To use this debugger, you include a line in your code that sets a trace. When the code gets to the trace, it stops running and waits for you to give a command.


You can give commands to print all variables the code sees at that point in the execution (pp locals()). You can give a command to execute the code one line at a time (n). You can type any object name and see every attribute that object has. It’s amazing.

With just a little bit of digging, I found that my problem was really straightforward to fix! I’d just mixed up my API key and API secret (analogous to a username and a password) when I was reading in the file that contained them. One quick fix later, and I had a functional Twitter Bot!
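The mix-up might have looked something like this – a hypothetical reconstruction (the file layout and names are just illustrative), with the ipdb trace line shown where it would help:

```python
def load_credentials(path):
    # Hypothetical layout: the API key is on line 1, the API secret on line 2.
    with open(path) as f:
        key_line, secret_line = [line.strip() for line in f][:2]
    # import ipdb; ipdb.set_trace()  # execution pauses here; `pp locals()`
    #                                # makes a key/secret swap easy to spot
    # The original bug was returning these two in the wrong order --
    # the one-line fix is returning (key, secret) as the caller expects.
    return key_line, secret_line
```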


Moral of the story: WHY HAVE I NEVER USED A DEBUGGER BEFORE. This was a life-changing experience. If all my bugs get solved at this rate, this one tool is going to make me a 20x more productive programmer, and a 100000000000x less frustrated one! My advice to you, Athena: use a debugger in your code. It makes everything better.

Server Requests

Last post, Athena, we talked about building a recommender system. I used a very small database for that recommender system in my example, because I was mostly trying to test to see that the recommender system was working. But where do you go to obtain real-life data? If that’s what you’re looking for, read on to find out about:

  • Possible sources of interesting data sets
  • How to access servers that have a public database
  • A short introduction to using Python’s requests library


If I know anything about you, Athena, I know that you are obsessed with head scratches. As soon as an amiable human arrives in the house, you are waiting in expectation for some good head scratching, the more the better. Likewise, for the typical data scientist, more data are usually better.†

Data scientists may find their data sets from a variety of different sources. For one, we may have collected the data ourselves! This is convenient because we know exactly what sorts of assumptions were made during the data collection, and we know exactly how the data are organized. (We’ll talk more about data organization and cleaning in a future post, although that’s a topic worthy of books.) Or we may have employers or co-workers who have collected the data for us, and it’s our job to clean up and analyze the data.

But what if we don’t own a data set, or otherwise have immediate access to it? If the data set is online, there are two primary ways to obtain it: through web scraping and through requesting the data from a server. Web scraping involves reading the data directly from a web page; we will save that topic for another post. Today we will focus on requesting data from a cooperative server using HTTP requests.

First let’s define some terms. HTTP stands for “HyperText Transfer Protocol”. This is the language of servers and websites talking to each other. An HTTP request is another way of saying “I’m asking your server for information”. Let’s go back to our analogy of head-scratching. Suppose you have had a long day in the house, and then suddenly, one of your humans comes home! You want to have your head scratched, so you might make a request to that human by walking up and meowing at them. But you don’t yet know how they will respond to your request, or whether they understood your request, or even if they heard it. That is like making an HTTP request – you have to wait for the server to respond before you know whether your request was understood.


The server’s response includes a status code. A status code might let you know that your request was successful, that your request was denied, or that your request format is broken, among a number of possible messages. Here are some ways I’ve seen a server respond to an HTTP request:

  • Status code 200 OK: your server request was successful, and the server will respond by giving you data! This is like if the human you meowed at turns around and starts immediately scratching you.
  • Status code 400 Bad Request: your request was formed improperly and the server didn’t understand what you meant. If you meow at a human and they act confused, that is like a status code 400. You might need to form a different request in a form the server can understand, analogous to meowing and rolling over to show off your belly.
  • Status code 401 Unauthorized: the server understood your request, but you don’t have permission to access those data. Fortunately, you’ve never encountered the analogous head-scratching situation; you always have permission to ask for head-scratches from me 🙂
  • Status code 403 Forbidden: the server understood your request, but those data are forbidden from being accessed. This is like you meowing at me while I am on a video call. I understand that you want me to scratch you, but I won’t do it because my hands are 3000 miles away.
  • Status code 404 Not Found: the server understood your request, but still couldn’t find your data. This is like if the human you meowed at turns around to acknowledge you, but their hands are full and they’re not going to scratch you right now. But if they update the server later (or put down their groceries), those data (scratches) might become available.

So now that you understand the procedure for talking to a server, how might we actually write code to do so? In Python, the requests library is a great place to start. We might use code like this to make an HTTP request.
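The original snippet didn't survive here, but a minimal sketch of such a function (illustrative, not the post's exact code) might look like this:

```python
import requests

def http_request(url, params=None):
    # GET the URL with our parameters (e.g. authentication keys, page number)
    # and return the status code plus the JSON payload on a 200 OK.
    response = requests.get(url, params=params)
    if response.status_code == 200:
        return response.status_code, response.json()
    return response.status_code, None  # no data; the caller can inspect the code
```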


Notice how simple it is! All we have to do is give this function a URL and certain parameters (like our authentication keys and what database page we want). The requests.get() function returns a response object that has a status code (like we discussed above) and, if successful, also has some data to analyze!††


It’s so great to be using a library that makes HTTP requests just as easily as you make head-scratching requests. Without too much trouble, I can write a function that gets me data from an external server – and then I can sit back and relax.


† To be clear, this is NOT universally true! Sometimes people decide to blindly collect data when they should be reconsidering the methods they’re using to analyze those data. No amount of data collection will help them if those data are biased or their data analysis methods are somehow misguided. In analogy to scratching: if someone is trying to scratch your ears when really you want your whole head scratched, more scratches are not better – they’re just not doing it right.

†† Note that in my test_http_request() function, I am calling a function complete_http_request_generators() rather than the function http_request() I showed in the blog. The complete_http_request_generators() function behaves similarly to the http_request() function, except that it correctly handles calling multiple pages of the database. If you are interested in looking at the entire code, check it out on my GitHub.

Charity Recommender 1.0

For the past few weeks at the Recurse Center, I’ve been building a charity recommender system to help me understand testing, HTTP requests, and various other useful programming concepts. I’ve already discussed the Python testing frameworks I’ve explored in another blog post, but I promised to provide an explanation of the project itself. So, Athena, here goes! In this blog post I’ll discuss:

  • What a recommender system is and what types of recommender systems exist
  • A brief overview of how content-based recommender systems work
  • How I implemented my toy content-based recommender system

We have to start with…what is a recommender system? My recommender system was based on recommending charities to people interested in donating, but the ideas and code apply equally well to, say, recommending cat food.† Suppose I knew a certain cat liked chicken-based cat food, and didn’t like pâté-style cat food. Suppose further that I had to choose between dozens of brands of cat food, each of which has dozens of flavors and textures, in order to find the best cat food for this cat. How would I ever find the right cat food?


I could just randomly choose a brand and flavor of cat food, but that doesn’t account very well for the information I already have about what the cat likes and doesn’t like. A recommender system is a program that keeps track of the cat’s likes and dislikes, and recommends cat food brands and flavors based on those preferences.

There are two major kinds of recommender systems: content-based recommender systems and collaborative filtering recommender systems. A content-based recommender system would only recommend cat food based on what qualities the cat liked and didn’t like in a cat food. A collaborative filtering recommender system would recommend cat food based on what cat foods were popular among other cats with similar cat food preferences. Content-based recommender systems tend to be ideal when you don’t have data about the preferences of a user base. Collaborative filtering recommender systems incorporate an item’s popularity into their final recommendations, so they really require a lot of users in order to be effective.

I implemented a content-based recommender system, so let’s talk about how that works. Let’s use this tiny example database of cat food.


One way to analyze a database like this is to create a “feature vector” for every type of cat food. (If you’re not familiar with vectors, here’s a quick refresher.) This vector would be composed of numerical quantities describing each category the cat food could fall into. To turn our categories of protein into numerical quantities, for instance, we might create four separate features, one for each type of protein used in cat food. We would have a feature describing how “chicken-y” the cat food is, where the “Chicken” feature would equal 1 if there were chicken in the cat food and 0 if there weren’t. We would repeat this for each of the possible protein options, and for each of our other categories, producing a database as shown.


Now that we have a nice way to describe our data, we would also need to create a user profile in the same format. Remember how the cat in our example likes chicken, and does not like a pâté-style texture? We would then make sure that the vector describing our user has a 1 in the “Chicken” feature and a 0 in the “Pate(y/n)” feature.††
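In code, that one-hot encoding might look like this toy sketch (category names invented for illustration, and ignoring the 0.5-for-unknowns refinement discussed in the second footnote):

```python
PROTEINS = ["chicken", "beef", "pork", "fish"]

def encode(proteins, pate):
    # One feature per protein: 1 if the food (or user) includes it, 0 if not...
    vector = [1 if p in proteins else 0 for p in PROTEINS]
    # ...plus a final feature for the Pate(y/n) category.
    vector.append(1 if pate else 0)
    return vector

# The chicken-loving, pate-hating user profile:
print(encode({"chicken"}, pate=False))  # [1, 0, 0, 0, 0]
```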

We next need a way to compare how similar the user’s preference is to each cat food option in our database. Let’s visualize this:


The cat’s preference vector is in purple, the leftmost vector on the graph. There are two possible cat food options, the vectors in green and orange in the middle and on the bottom, respectively. When we look at the angles between the vectors, it is straightforward to see that the angle between the user preference vector and the Yummy™ Chicken vector is smaller than the angle between the user and the Yummy™ Beef and Pork vector. This makes sense; we already know the cat likes chicken, so we expect the vector describing the chicken cat food to be closer to the cat’s preference than a vector describing a cat food with no chicken.

The way I chose to mathematically encode this intuition is with cosine similarity, which is essentially a way of measuring the angle between two vectors. (The cosine distance is just one minus this similarity.) The mathematical relationship is expressed as

cos(θ) = (u · v) / (‖u‖ ‖v‖)

The numerator of this expression is the dot product of the two vectors, and the denominator multiplies the magnitudes of each of the vectors. The result measures how closely the two vectors point in the same direction. After we have calculated the similarity, we repeat this process with every cat food in the database, determining how similar it is to the user’s preferences.

Here’s the code I used to implement this idea.
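The original snippet is missing here, so this is a minimal sketch of the calculation (illustrative names, not the project's exact code):

```python
import math

def cosine_similarity(u, v):
    # Dot product of the vectors, over the product of their magnitudes.
    dot = sum(a * b for a, b in zip(u, v))
    mag_u = math.sqrt(sum(a * a for a in u))
    mag_v = math.sqrt(sum(b * b for b in v))
    return dot / (mag_u * mag_v)

# Feature order: [chicken, beef, pork, pate]
user            = [1, 0, 0, 0]   # likes chicken, dislikes pate
yummy_chicken   = [1, 0, 0, 0]
yummy_beef_pork = [0, 1, 1, 0]

# The chicken food's vector is closer to the user's preferences.
print(cosine_similarity(user, yummy_chicken) >
      cosine_similarity(user, yummy_beef_pork))  # True
```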


Finally, once we have calculated all the similarities, we choose the cat food that is most similar to the user’s preferences and recommend that cat food to our chicken-loving, pâté-hating feline friend.

To take a closer look at how this sort of problem might be implemented, here’s a link to my first-draft charity recommender. So far, I haven’t built a recommender system that handles large datasets, nor have I decided how to pick the categories to use in my model’s feature vectors. Those ideas will be implemented in a future version of the project (unless I get distracted by a different project!)



† Athena, I should point out that while all my examples here are based on cat food, I originally wrote the recommender system to deal with charities! With that in mind, I consider it appropriate to include a shout-out to Lapcats, an organization that matches cats to adopt with humans who want to adopt them, and from which I adopted you! So Athena, if you happen to have a cash stream I don’t know about, you might consider directing some of it toward the organization that rescued you from a bad situation and brought you to my (very educational) adoptive care 🙂

†† You may be wondering: what do we do if the cat has no preference in a certain category – say, the cat might not have a preferred brand of food? That’s a great and difficult question to answer. For this example, I will give the cat’s user vector a 0.5 in each of the brand categories. I will also assign 0.5 to each of the protein options besides chicken, since knowing the cat likes chicken doesn’t tell us anything about whether the cat likes beef. But making a guess about the user’s preference without much initial information can be quite hard – it’s sometimes referred to as the “cold start” problem.

Python Testing Frameworks

I learned a lot of great things during physics grad school, but code design and testing were not among them. Fortunately, Athena, now I’m learning! And I’m sure you’re excited to be learning too.


Okay, I see from that look on your face that you’re not even sure what testing is. Fortunately, during this post, I’ll explain that, as well as:

  • Why testing is useful to software engineers and data people
  • What Python testing frameworks exist, and their pros and cons
  • How I use testing frameworks in my code, and why I usually use pytest

So what is testing? Well, coding is a little like chasing a string toy in that you rarely get it right on the first try. And, like chasing a string toy, there are all kinds of ways for your code to go wrong.

Unlike with a string toy, however, when your code goes wrong it isn’t always obvious. Testing is when you run your code on specific values to check that it handles those values correctly.
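To make that concrete, here’s what a manual, by-hand test looks like – run a function on specific values and eyeball the results. The function is a made-up example, not anything from my projects:

```python
# A manual "test": run the code on specific values and check by eye.
# `servings_needed` is a hypothetical example function.

def servings_needed(cats, servings_per_cat=2):
    """How many servings of food do we need for this many cats?"""
    return cats * servings_per_cat

print(servings_needed(1))  # expect 2
print(servings_needed(3))  # expect 6
```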

In this post, I will be focusing on automated testing, which allows you to run lots of different tests with a single command. I will be demonstrating how my testing works by running code from a recommendation system I’m writing. I’ll write another blog post about that in the future, but for now, if you’re interested, you can look at the source code on my GitHub.

So why would an aspiring data scientist/software developer like myself be interested in automated testing? Why write a whole bunch of tests, instead of just running the code on specific values whenever the question happens to come up? Honestly, until recently, that’s exactly what I did, and it wasted a lot of time. Writing tests for each part of your code takes time upfront, but in the long run it saves time: without them, every time you want to check your code, you have to think up test cases, set up the code to run them, run them, and figure out whether the code handled them correctly. With automated tests, you can just sit back, type a command, and watch your computer run the tests for you. It’s even easier than hiding behind the curtain while you wait to pounce on the string!


A testing framework is a set of code that’s designed to make it easier for you to write and run tests. In Python, the primary language I use, there are many testing frameworks. I’ll focus on four of them: unittest, doctest, pytest, and hypothesis.


unittest is Python’s built-in testing framework. Because it’s part of the standard library, you can pretty much guarantee that it will keep being updated whenever Python changes. To write a test in unittest, you create a test case class, which stores all the tests you want your computer to run. This class inherits from unittest.TestCase, which basically means it shares all the general properties of unittest’s generic test case while adding the tests specific to your situation. unittest, while relatively convenient and definitely reliable, is not a very flexible testing framework. You have to write an entire class when maybe only a few test functions would do the trick. It’s also sometimes considered bad practice to rely too heavily on inheriting from another class, as unittest requires you to do.

(Note: There is another testing framework, nose, that is similar to unittest but has better formatting and fixes a few of unittest’s problems. However, it is no longer actively maintained and has compatibility problems with Python 3, so I didn’t include it in my set of frameworks to investigate. It is worth mentioning that a successor to nose, nose2, is also available.)


doctests are also built into Python. Writing a doctest for a function is as simple as adding an example to the docstring (the part of the function that describes the function’s behavior – not to be confused with a cat toy). The doctests run when you invoke them – for example, with `python -m doctest yourfile.py` or by calling `doctest.testmod()` – and produce errors whenever the code gives an output the test doesn’t expect. In this way, doctest encourages you to keep your docstrings updated whenever the code changes, which makes your functions much easier for someone else to read! Doctests only accept exactly the output you give them, though – even a slight difference in output and they will throw an error. Thus, they tend to be a poor match for a calculation that might include a small difference between an expected and actual floating-point value. By default, they also don’t tell you when a test has passed – only when it has failed.


pytest is probably the most popular testing framework in the Python community these days. It’s actively maintained, and has nice formatting and helpful error messages. You can write tests for pytest in a class (like unittest) or as individual functions, depending on what suits the needs of your code base best. Also, if your code needs to make changes in your system (e.g., reading/writing files), pytest’s fixtures let you write setup and tear-down code without much difficulty. If you want a testing framework that is guaranteed to work on very old Python code, unittest may be the answer. But pytest has one more perk – it can run your old unittest code, with error messages that are usually more helpful (and definitely more colorful!) than unittest would provide.
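For comparison, here are the same checks from the unittest example rewritten as pytest tests – plain functions with bare `assert` statements, no class required. Again, `similarity` is a hypothetical helper, not my project’s real code:

```python
# pytest tests are plain functions whose names start with "test_".
# `similarity` is a made-up helper. Run with: pytest test_similarity.py

def similarity(a, b):
    """Fraction of positions where two preference vectors agree."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def test_identical_vectors():
    assert similarity([1, 0, 1], [1, 0, 1]) == 1.0

def test_different_vectors():
    assert similarity([1, 0], [0, 1]) == 0.0
```

When an assert fails, pytest rewrites it to show the actual values on each side of the comparison – no `assertEqual`-style method names to memorize.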


hypothesis is quite different from the other three frameworks discussed so far. Rather than testing specific cases, hypothesis randomly generates a set of possible inputs, and your tests pass or fail based on whether the outputs meet conditions you specify. hypothesis deliberately tries edge cases – inputs that you as a coder might not consider because they are atypical (e.g., empty lists). It can be very helpful for catching bugs in weird cases you might not think of, but might not work as well when it needs to test how code interacts with external files or databases. You can run it with pytest, so you also get pytest’s helpful formatting and error messages, or if you prefer you could run it with most other Python testing frameworks. (hypothesis technically has its own test runners, i.e., functions that run tests, but these seem a bit more unwieldy than running hypothesis via another test framework.)


So after all that testing, my favorite is…pytest! I love the clean formatting and colors in pytest’s output, and like that it is clear about which tests pass and which tests fail. It’s easy to write, has clean and expressive ways to organize your code, and has helpful error messages. That said, I’m starting to appreciate the value of using a hybrid approach, with different kinds of testing for different kinds of problems. pytest is good as a general-purpose testing framework, but for a very small piece of code with predictable outputs, doctest is the better choice. If your code doesn’t interact with the external world too much, and you are worried your test cases might not be sufficiently comprehensive, hypothesis makes an excellent choice. (And some people swear by hypothesis even in cases when you are doing lots of reading/writing to your system or an external database; see Brennan Holt Chesley’s Generative Testing talk for details.)

So Athena, I hope you’ve enjoyed this overview of Python testing frameworks, and you’ve picked up a thing or two about the benefits of testing your code! Once you get in the habit, it’s really not so much harder than chasing that string toy. And there’s a multitude of tools you can choose to pursue your testing (and string-chasing) goals.