Effective Snapshot Testing

Created: Sep 27, 2020 7:36 PM
Tags: devtesting
Type: Article

Snapshot tests can be useless or super useful. Your choice. Let's talk about how to make them useful.

In a tweet, Justin shares a screenshot of a bunch of his thoughts on snapshot testing. For the sake of accessibility, I've typed out the entirety of what he wrote below. I'll then talk about some of my thoughts on what he says.

Takes on snapshot testing schemes to follow. There are numerous categories of failures surrounding snapshot testing. Most of this is informed by my experience in 2008–2011 when QA teams thought Selenium RC record playback scripts were a panacea, but I've seen the same thing with tools like VCR in Ruby, HTML fixtures in JS tests, and other attempts at "easy" controls over API & DB dependencies:
1. They are tests you don't understand, so when they fail, you don't usually understand why or how to fix it. That means you have to do true/false negative analysis & then suffer indirection as you debug how to resolve the issue.
2. Good tests encode the developer's intention, they don't only lock in the test's behavior without editorialization of what's important and why. Snapshot tests lack (or at least, fail to encourage) expressing the author's intent as to what the code does (much less why)
3. They are generated files, and developers tend to be undisciplined about scrutinizing generated files before committing them, if not at first then definitely over time. Most developers, upon seeing a snapshot test fail, will sooner just nuke the snapshot and record a fresh passing one instead of agonizing over what broke it.
4. Because they're more integrated and try to serialize an incomplete system (e.g. one with some kind of side effects: from browser/library/runtime versions to environment to database/API changes), they will tend to have high false-negatives (failing test for which the production code is actually fine and the test just needs to be changed). False negatives quickly erode the team's trust in a test to actually find bugs and instead come to be seen as a chore on a checklist they need to satisfy before they can move on to the next thing.
These four things lead to a near total loss in the intended utility of integrated/functional tests: as the code changes make sure nothing is broken.
Instead, when the code changes, the tests will surely fail, but determining whether and what is actually "broken" by that failure is the more painful path than simply re-recording & committing a fresh snapshot. (After all, it's not like the past snapshot was well understood or carefully expressed authorial intent.) As a result, if a snapshot test fails because some intended behavior disappeared, then there's little stated intention describing it and we'd much rather regenerate the file than spend a lot of time agonizing over how to get the same test green again.

One thing I want to make clear before continuing is that snapshot testing is an assertion, just like the toBe in: expect('foo').toBe('foo'). I think there's sometimes confusion on this point, so I just wanted to clear that up.

Despite Justin's arguments against snapshots, I'd suggest that there is value in them if you use them effectively. With that in mind, I thought I'd share a few cases where snapshot testing really shines, things to avoid with snapshots, and things you can do to make your snapshots more effective:

Where snapshot testing shines

Error messages and logs

If you're writing a tool for developers, it's a really common case that you want to write a test to ensure that a good error or warning message is logged to the console for the developers using your tool. Before snapshot testing I would always write a silly regex that got the basic gist of what the message should say, but with snapshot testing it's so much easier.

    exports[`importAll.sync uses static imports 1`] = `

    import importAll from 'import-all.macro'

    const a = importAll.sync('./files/*.js')

          ↓ ↓ ↓ ↓ ↓ ↓

    import * as _filesAJs from './files/a.js'
    import * as _filesBJs from './files/b.js'
    import * as _filesCJs from './files/c.js'
    import * as _filesDJs from './files/d.js'

What makes this snapshot good is that it communicates the intent through the title of the snapshot and the before/after (with the ↓ ↓ ↓ ↓ ↓ ↓ separating the before from the after). What's cool about this is that for any plugin using babel-plugin-tester, I can just look at the snapshot file and get a great idea of how the plugin works. 😎

Have you ever shipped code that busted your app's user experience because styling wasn't applied properly? I have. Writing tests that give you confidence here is really difficult. Even E2E tests can't reliably test this kind of thing. There are tools that will take visual snapshots and do visual diff comparisons, but these tools are difficult to set up and run, and they are often quite flaky. On top of that, they're basically snapshot tests, so they suffer from many of the same things Justin calls out about snapshot tests too!

    exports[`enzyme.render 1`] = `
    …
        Hello World, this is my first glamor styled component!

This is nice because now if we change logic so some styles aren't applied properly we'll know about it. It does still suffer from the problems Justin mentions, but I think it's worth it.

Things to avoid with snapshots

HUGE snapshots

This is probably the biggest cause of all the things that Justin's talking about. When your snapshot is more than a few dozen lines, it's going to suffer major maintenance issues and slow you and your team down. Remember that tests are all about giving you confidence that you won't ship things that are broken, and you're not going to be able to ensure that very well if you have huge snapshots that nobody will review carefully. I've personally experienced this with a snapshot that's over 640 lines long. Nobody reviews it; the only care anyone puts into it is to nuke it and retake it whenever there's a change (like Justin mentioned).

So, avoid huge snapshots and take smaller, more focused ones. While you're at it, see if you can actually change it from a snapshot to a more explicit assertion (because you probably can 😉).

I should add that even huge snapshots aren't entirely useless: if the snapshot changes unexpectedly, it can inform us (and has informed us) that we've made a change with further-reaching impacts than anticipated.

Making your snapshots more effective

Custom serializers

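    const path = require('path')

    const projectRoot = path.join(__dirname, '../../')

    // Replace the machine-specific project root in every string snapshot.
    expect.addSnapshotSerializer({
      test: val => typeof val === 'string',
      print: val =>
        val
          .split(projectRoot)
          .join('<PROJECT_ROOT>/') // placeholder text is an assumption
          .replace(/\\/g, '/'), // normalize Windows backslashes
    })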

One of the most useful things I've found for test maintainability is this: when you have many tests that look the same, try to make their differences stand out. This makes it easier for people coming into your codebase to know what the important pieces are. So you can try to split out the common setup/teardown into a small helper function so that the tests have more differences and fewer commonalities with each other.

I've seen some tests where you take one snapshot of a React component before the user interacts and another after the user interacts. What you're trying to assert on is the difference between the before and after, but you get much more than you bargained for, and this results in more of the false negatives that Justin's talking about. However, if you could serialize the difference between the two states, that would be much more helpful. And that's what snapshot-diff can do for you.

With it, the snapshot is just the difference between the two (it looks like a git diff).