Testing software is hard. Maintaining a fast, comprehensible, and meaningful test suite that can grow alongside an application for years is really hard.
In our experience working with dozens of teams, we’ve run into a bunch of the same testing problems over and over again. This talk is an effort to analyze the most common reasons that teams become disillusioned with their tests, and offer some targeted advice to help teams prevent these problems from ever materializing by tweaking their tools, workflow, and perspective.
The video above was recorded at RubyConf 2015 on November 15, 2015 (incidentally, Test Double's 4th anniversary!).
I referenced a few things in the talk that I ought to link to here:
- A talk on creatively making targeted test suites, Breaking Up (With) Your Test Suite
- A screencast on ‘Discovery Testing’
- A wiki of testing topics I assembled while preparing the talk
- A test double library for JavaScript that we maintain
Transcript of the talk
[00:00:00] Woo! Alright! High energy, I love it, alright. Doors are closing and now we're covered. Alright, great. Can we get my slides up on the monitors? Alright, great. Let me start my timer. Where's my phone? Oh! Who has my phone? Where's my timer? Alright, we'll start. Alright. There's this funny thing where every year conference season lines up with Apple's operating system release schedule and I'm a big Apple fanboy and so I like, on one hand, really want to upgrade and on the other hand, really want my slide deck to work.
[00:00:37] This year it was because they announced the iPad Pro. I was pretty excited. I was like, maybe this year finally OS 9 is going to be ready for me to give like a real talk out of and build my entire talk out of it. So this talk was built entirely in OS 9. So let's just start it up, see how it goes.
[00:00:52] I'm a little bit nervous.
[00:00:59] Alright, so it's a little retro. It takes a while to start up. I built my entire presentation in AppleWorks. So I gotta open up my AppleWorks presentation. Okay, there it is. I gotta find the play button. And here we go. And good. Alright. So this talk is How to Stop Hating Your Tests. My name is Justin. I play a guy named Searls on the internet and I work at the best software agency in the world, Test Double.
[00:01:23] Why do people hate their tests? I think a lot of teams start off in experimentation mode. Like, everything's fun and free, they're pivoting all the time. And having a big test suite would really just slow down their rate of change and discovery. But eventually we get to a point where we're worried we might break things, and it's important that things stay working.
[00:01:38] So people start writing some test suites, so they have a build, so when they push new code, they know whether they just broke stuff. But if we write our tests in a haphazard, unorganized way, they tend to be slow and convoluted, and every time we want to change a thing, we spend all day just updating tests. And eventually teams get to this point where they just yearn for the good old days when they got to change stuff and move quickly.
[00:01:56] And I see this pattern repeat so much that I'm starting to believe that an ounce of prevention is worth a pound of cure in this instance. Because once you get to the end, there's not much you can do. You can say, "Hey, our test approach isn't working." And a lot of people would be like, "I guess we're just not testing hard enough."
[00:02:08] And when you see a problem over and over again, personally, I don't believe that the "work harder, comrade" approach is appropriate. You should always be inspecting your workflow and your tools and trying to make them better if you keep running into the same issue. Some other people might say, okay, let's just buckle down and remediate.
[00:02:24] Testing is job one. Let's really focus on testing for a while. But from the perspective of the people who pay us to build stuff, testing is not job one. It's at best job two. From their perspective, they want to see us shipping stuff. Shipping new features. And the longer we go with that impedance mismatch, the more friction and tension we're gonna have.
[00:02:41] So that's not sustainable. I said we're talking about prevention, but if you're working in a big legacy monolithic application, and you're not greenfield, this is not a problem at all, because I've got this cool thing to show you. There's this one weird trick to starting fresh with your test suite.
[00:02:53] That's right, you're gonna learn what the one weird trick is. Basically, just move your test into a new directory and then you make another directory and then you have two directories and you can write this thing called a shell script. Get this. That runs both test suites. And then eventually you port them over and you're able to decommission the old test suite.
[00:03:11] But, I hesitate to even give a talk about testing because I am the worst kind of expert. I have too much experience navel-gazing about testing and building and open-sourcing tools around testing. I've been on many teams as the guy who cared just a little bit more about testing than everyone else. And I've had lots of highfalutin, philosophical, nuanced Twitter arguments that really are not pertinent to anyone's life.
[00:03:30] So my advice is toxic. I am overly cynical, I'm very risk averse and if I told you what I really thought about testing it would just discourage all of you. Instead my goal here today is to distill my advice down into just a few component parts. The first part, we're gonna talk about structure, the physicality of our tests, like what the lines and files look like on disk.
[00:03:49] We're gonna talk about isolation cause I really believe that how we choose to isolate the code that we're testing is the best way to communicate the concept and the value that we hope to get out of a test. And we're going to talk about feedback. Do our tests make us happy or sad? Are they fast or are they slow?
[00:04:02] Do they make us more or less productive? And keep in mind, we're thinking about this from the perspective of prevention. Because these are all things that are much easier to do on day one than to try to shoehorn in on day 100. So at this point, in keeping with the Apple theme, my brother dug up this Apple II copy of Family Feud and it turns out it's really hard to make custom artwork in AppleWorks 6.
[00:04:23] So I just ripped off the artwork from this Family Feud board. We're gonna use that to organize our slides. It's a working board. That means if I point at the screen and say, show me potato salad I get an X. But unfortunately I didn't have a hundred people to survey. I just surveyed myself a hundred times so I know all the answers already.
[00:04:37] First round we're going to talk about test structure and I'm going to say show me too big to fail. People hate tests of big code. In fact, have you ever noticed that people who are really into testing and TDD seem to hate big objects and big functions more than normal people? We all understand big objects are harder to deal with than small objects, but one thing that I've learned over the years is that tests actually make big objects even harder to manage, which is counterintuitive.
[00:05:01] You'd expect the opposite. And I think part of the reason is that when you've got big objects, they might have many dependencies, right? Which means you have lots of test setup. They might have multiple side effects in addition to whatever they return. Which means you have lots of verifications. But what's most interesting is they have lots of logical branches.
[00:05:15] Depending on the arguments and the state, there's a lot of test cases that you have to write. And this is the one that I think is most significant. Let's take a look at some code. At this point I realized that OS 9 is not Unix. I found a new terminal. Actually, it's a cool new one.
[00:05:29] It just came out this week. So let's boot that up.
[00:05:41] Yep, here we go.
[00:05:46] Alright, we're almost there. It's a little slow. Alright, so this is a fully operational terminal. Alright, so we're gonna type in an arbitrary Unix command, that works fine. I'm gonna start a new test. It's of a validation method of a timesheet object, to see whether or not people have notes entered.
[00:06:00] And we're gonna say: whether you have notes, whether you're an admin, whether it's an invoice week or an off week, and whether you've entered time or not, all of those attributes factor into whether or not that record is considered valid. And at this point I've written the first test, but I'm like, oh, I've got a lot of other contexts to write.
[00:06:14] Let's start planning those out. I'm like, damn, this is a lot of tests that I would need to write to cover the case of just four booleans. And what I fell victim to there is a thing called the rule of product, which is a thing from the school of combinatorics in math. It's a real math thing because it has a Wikipedia page.
[00:06:29] And what it says, essentially, is that if you've got a method with four arguments, you take each of those arguments and the number of possible values of each of them, multiply them together, and that gives you the total number of potential combinations, or the upper bound on the number of test cases you might need to write.
[00:06:45] So in this case, with all Booleans, it's 2 to the 4th, so we have 16 test cases that we may have to write in this case. And if you're a team that's used to writing a lot of big objects, you're probably in the habit of thinking, oh I have some new functionality, I'll just add one little more argument, like what more harm could that do?
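(For reference, here's a minimal sketch of the kind of method being described; the Timesheet class and its validation rule are hypothetical, not the code from the slides.)

```ruby
# Four boolean inputs means 2 * 2 * 2 * 2 = 16 combinations to consider.
class Timesheet
  def initialize(notes:, admin:, invoice_week:, time_entered:)
    @notes = notes
    @admin = admin
    @invoice_week = invoice_week
    @time_entered = time_entered
  end

  # Hypothetical rule: notes are only required from non-admins who entered
  # time during an invoice week.
  def valid?
    @notes || @admin || !@invoice_week || !@time_entered
  end
end

# Rule of product: multiply the number of possible values of each input.
booleans = [true, false]
puts booleans.product(booleans, booleans, booleans).size # => 16
```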
[00:06:59] Other than double the number of test cases that I have to write. And so as a result, as somebody who trains people on testing a lot, I'm not surprised at all to see a lot of teams who are used to big objects want to get serious about testing and then they're like, wow, this is really hard, I quit. If you want to get serious about testing and have a lot of tests of some code, I encourage you: stop the bleeding.
[00:07:16] Don't keep adding on to your big objects. I try to limit new objects to one public method and at most three dependencies. Which, to that particular audience, is shocking. The first thing they all say is, but then we'll have too many small things. How will we possibly deal with all the well organized and carefully named and comprehensible small things?
[00:07:33] And, people get off on their own complexity, right? They think that's what makes them serious software developers: how hard their job is. They're like, that sounds like programming on easy mode. And I'm like, it is easy. It's actually not rocket science to build an enterprise CRUD application.
[00:07:46] But you're making it that way. Just write small stuff. It works. Next up I want to talk about how we hate when our tests go off script. Code can do anything. Our programs should be unique and creative, special unicorns of awesomeness. But tests can and should only do three things; they all follow the same script. Every test ever sets stuff up, invokes a thing, and then verifies behavior.
[00:08:07] We're writing the same program over and over again. And it has these three phases: arrange, act, and assert. A more natural, English way to say that would be given, when, then. And when I'm writing a test, I always intentionally call out those three phases really clearly and consistently. For example, if I'm writing this as a Minitest method, I always put exactly two blank lines in every single xUnit-style test that I write.
[00:08:27] One after my arrange, one after my action, and then it's really clear at a glance, what's my arrange, what's my act, what's my assert. I always make sure that they go in the correct order as well, which is something people get wrong a lot. If I'm using something like RSpec, I've got a lot of constructs available to me to specify what the intent is.
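(A minimal sketch of that layout in Minitest; the Invoice struct is invented just to have something to test.)

```ruby
require "minitest/autorun"

# Hypothetical value object, just to have something under test.
Invoice = Struct.new(:subtotal, :tax_rate) do
  def total
    subtotal + (subtotal * tax_rate)
  end
end

class InvoiceTest < Minitest::Test
  def test_total_includes_tax
    invoice = Invoice.new(100, 0.05)   # arrange

    result = invoice.total             # act

    assert_in_delta 105.0, result      # assert
  end
end
```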
[00:08:42] So I can say let, and give a value to do a setup. So let says I'm setting up a new thing. I can use before to call out like, this is an action with a side effect, this is my act. And then that allows me to split up, if I so choose, those assertions into separate blocks. And so now at a glance, if somebody knows RSpec, they'll know exactly what phase each of those lines belongs in.
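(A minimal sketch of that RSpec layout; the Signup model is invented just to have something to validate.)

```ruby
require "rspec/autorun"

# Hypothetical model, just enough code to demonstrate the layout.
class Signup
  attr_reader :errors

  def initialize(email)
    @email = email
    @errors = []
  end

  def validate
    @errors << :email unless @email.to_s.include?("@")
  end
end

RSpec.describe Signup do
  let(:signup) { Signup.new("not-an-email") }     # let => arrange

  before { signup.validate }                      # before => the act (a side effect)

  it { expect(signup.errors).to include(:email) } # each assertion in its own block
  it { expect(signup.errors.size).to eq(1) }
end
```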
[00:09:00] I also try to minimize each phase to just one action per line so that test-scoped logic doesn't sneak in. The late, great Jim Weirich wrote an awesome Ruby gem, I hope you check it out, called rspec-given. I help maintain it now. He and Mike Moore ported it to Minitest as well. I ported it a few years ago to Jasmine, and somebody else has taken it on and ported it to Mocha.
[00:09:17] It's a really cool Given/When/Then-conscious testing API. And what it does is you start from the same place with RSpec as you may have been before, and we just say Given instead of let, because that's more straightforward, and When instead of before, so it's clear. But where it really shines is the Then: it's just a little one-liner, and I don't need a custom assertions API, because it's actually interpreting the Ruby inside of that block and it's able to split it up to give you great error messages.
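(A minimal sketch of the same test in rspec-given, reusing the hypothetical Signup model from the sketch above; requires the rspec-given gem.)

```ruby
require "rspec/autorun"
require "rspec/given"

RSpec.describe Signup do
  Given(:signup) { Signup.new("not-an-email") }

  When { signup.validate }

  # Plain Ruby in the Then block; no custom assertion API needed.
  Then { signup.errors.include?(:email) }
end
```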
[00:09:40] So it's a really terse and yet successfully expressive testing API. Now you don't have to use that tool though to just write your tests in a way that's conscious of given when then. They're easier to read regardless. They point out superfluous bits of test code that don't fit one of those three phases, and they can highlight certain design smells.
[00:09:56] For instance, if you've got a lot of given steps, maybe you have too many dependencies on your subject, or too complex of arguments. If it takes more than one when step, then it's probably the case that your API is confusing or hard to invoke. There's something awkward in how you use that object. And if you've got many then steps, then your code is probably doing too much, or it's returning too complex of a type.
[00:10:16] Next up, I want to talk about hard to read, hard to skim code. Some people are fond of saying that test code is code, but test code is untested code. So I try to minimize it. I try to make it as boring as possible for that reason. Because what I find is that a good test tells me a story of what the code under test should look like.
[00:10:31] But if there's logic in the test, it confuses that story, and I'm spending most of my time reading that logic and making sure I got it right, because I know there's no test of that test. Test scoped logic, not only is it hard to read, but if there are any errors, they're very easy to miss. Maybe it's passing green for fantasy reasons.
[00:10:44] Maybe only the last item in this loop of data is actually executing over and over again. A lot of times, though, people have this impulse. They say, hey, I've got a lot of redundancy in my test. I could really DRY this up by just generating all of my test cases. For example, this person did a Roman numeral kata, and they can see very clearly, oh, I could just have a data structure and make it really much more terse.
[00:11:03] Looping over that data structure and then, using define_method, generating a new test method that will give a good message. And it's a perfectly reasonable test, and in this case it totally works fine. But I still think it's problematic. And the reason is that that person experienced test pain, and their reaction was to go and make the test cleaner. Usually when we experience test pain, the first thing I look at is whether there's something wrong with my production code that led me there. And if you look at that person's production code, you can see all that data is hiding in ifs and elses.
[00:11:33] They've got all this really dense logic in there. I would much rather take a look at the same thing and extract the same sort of data structure from it, so that instead of having all that if and else, I'm looping over the same data structure and figuring out whatever rule I have to apply.
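(A hedged sketch of that refactor; the conversion table and to_roman method here are an illustration, not the code shown on the slides.)

```ruby
# The conversion table lives in the production code instead of being
# duplicated (or generated) inside the tests.
ROMAN_VALUES = {
  1000 => "M", 900 => "CM", 500 => "D", 400 => "CD", 100 => "C",
  90 => "XC", 50 => "L", 40 => "XL", 10 => "X", 9 => "IX",
  5 => "V", 4 => "IV", 1 => "I"
}.freeze

def to_roman(number)
  ROMAN_VALUES.reduce("") do |numeral, (value, letters)|
    count, number = number.divmod(value)
    numeral + letters * count
  end
end

require "minitest/autorun"

class RomanNumeralTest < Minitest::Test
  def test_additive_numerals
    assert_equal "III", to_roman(3)
  end

  def test_subtractive_numerals
    assert_equal "XLIV", to_roman(44)
  end
end
```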
[00:11:47] So now I only need a few test cases. In fact, I can just keep adding additional keys to that hash, and now I've covered a lot of cases without needing a whole bunch of really explicit test cases. It's much cleaner this way. Sandi Metz, who's around, is she here? Sandi, where are you at? Hey Sandi. So she's got a thing called the squint test.
[00:12:02] It helps her understand and cope with really big file listings and she can draw a few conclusions. I don't have anything nearly so fancy, but when I'm reading your test suite, I really hope that I'm able to, at a glance, understand what's the thing under test. And specifically, like, where are all the methods?
[00:12:16] And are they in order? And are they symmetrical? Is it easy for me to find all the tests of just one method? And I like to use, if I'm using RSpec, for example, I like to use context to point out every logical branch and all the subordinate behavior underneath each logical branch. It's very easy to organize this way.
[00:12:30] And when you do it consistently, it's easy to read tests. Additionally, like I said, arrange, act, assert should really pop in a consistent way. Now if I'm using an xUnit-style testing tool like Minitest, at least I want to see arrange, act, assert really straightforwardly throughout every single file listing, and the names of the tests should mean something.
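(A sketch of that organization in RSpec: one describe per method, one context per logical branch. The Timesheet behavior is hypothetical and the examples are left pending.)

```ruby
require "rspec/autorun"

RSpec.describe "Timesheet" do
  describe "#valid?" do
    context "when notes are present" do
      it "is valid"
    end

    context "when notes are missing" do
      context "and the user is an admin" do
        it "is valid"
      end

      context "and the user is not an admin" do
        it "is invalid"
      end
    end
  end
end
```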
[00:12:48] All right, next up. Let's talk about tests that are too magic. A lot of people hate tests that are too magic. Or not magic enough, as it turns out. Because all software is a balancing act. And test libraries are no different. Expressiveness of our testing APIs exists along a spectrum. Smaller APIs are generally slightly less expressive than things that have a larger API.
[00:13:07] Because they have more features, but you have to learn those features. And so if you look at something like Minitest, it's very cool, because it's just classes and methods. We know that. Every test case is a class, we override setup and teardown to customize behavior, every new test is another method, assert is very easy to use. Ryan's a funny guy, so he's got some fun ones like i_suck_and_my_tests_are_order_dependent! to get some custom behavior.
[00:13:28] But when you compare that to RSpec, it's night and day. RSpec has describe and context, which are synonyms. subject and let, which are similar. before, after, and around, with :each, :all, and :suite variants for those. You've got it and specify, which are similar. You've got object.should and all of those matchers.
[00:13:42] You've got expect(...).to and the mostly similar matchers. You've got shared example groups, tagging, advanced CLI features. There's a lot to learn in RSpec. Jim tried to have it both ways when he designed Given. He wanted a terse API: Given, When, Then, with just a handful of other things like And and Invariant, plus his natural assertion API.
[00:14:00] So it's very terse, but it's also sufficiently expressive for most people's tests. Now, because it's not a standalone testing library, you're still standing on top of all of Minitest or RSpec, so it is still physically complicated. But it's really nice to live in on a day-to-day basis. And I'm not here to say that there's some right or wrong testing library or level of expressiveness.
[00:14:18] You just have to keep yourself aware of the tradeoffs, right? Smaller testing APIs are easier to learn, but they might encourage more one-off test helpers that we write, and you carry that complexity. Whereas a bigger testing API, something like RSpec, might help you yield really terse tests, but to an uninitiated person, they're just gonna look like magic.
[00:14:34] And you have to eat that onboarding cost if somebody doesn't know RSpec. Finally in this category, people hate tests that are accidentally creative, because in testing consistency is golden. If we look at a similar test to what we had before, we're gonna use let to set up an author and a blog and a comment, but it's not clear at all what the thing under test is, so I'm gonna rename it subject.
[00:14:55] I always call the thing that's under test subject, and the thing I get back that I'm gonna assert on, I always call that result. Or results. 100 percent of the time. So if I'm reading a really big, nasty test, at least I know what's being tested and what's being asserted on.
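(A minimal sketch of that naming convention; the CommentCounter class is invented for illustration.)

```ruby
require "rspec/autorun"

# Hypothetical class, just something to put under test.
class CommentCounter
  def count(comments)
    comments.reject { |comment| comment[:spam] }.size
  end
end

RSpec.describe CommentCounter do
  subject { CommentCounter.new }               # the thing under test is always `subject`

  let(:comments) { [{ spam: false }, { spam: true }, { spam: false }] }
  let(:result)   { subject.count(comments) }   # the thing asserted on is always `result`

  it { expect(result).to eq(2) }
end
```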
[00:15:07] This is a surprisingly daunting task in a lot of people's test suites. So if you learn one thing today and you just start calling the thing that you're testing subject, this will have been worth all of this preparation. And when you're consistent, inconsistency can actually carry nuanced meaning. For example, if I've got a handful of tests here, I'm going to look at them and go, Oh, wait, there's something weird about test C.
[00:15:27] That implies that there's probably something interesting about object C. I should look into that. And that's really useful. That speeds me up. But when every test is inconsistent, when every test looks way different, I have to bring that same level of scrutiny to each and every test, and I have to read very carefully to understand what's going on, to understand the story of the test.
[00:15:44] As a result, if I'm adopting your codebase and test suite, I would much rather see hundreds of very consistent tests, even if they're mediocre, even if they're crappy, than a handful of beautifully crafted, brilliant, artisanal custom tests that are all way different. Because every time I fix anything, it's just a one-off thing.
[00:16:03] Also, readers are silly, right? They've got this funny habit of assuming that all of our code has meaning but especially in testing, very often the stuff we put in our test is just plumbing to make our code execute properly. So I try to point out meaningless stuff to help my reader out.
[00:16:16] In particular, I make unimportant test code look obviously silly and meaningless to the reader. In this instance, I'm setting up a new author object and he's got a fancy name and a phone number and an email that's validatable, but none of that is necessary for this method. So here, I'll just change his name to Pants, and I'll remove his phone number because it's not necessary, and I'll change his email to PantsMail, and then I'll update my assertion.
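(A sketch of the before/after being described; the Author struct, welcome_message method, and PantsMail address are all made up for illustration.)

```ruby
require "minitest/autorun"

# Hypothetical Author and method under test.
Author = Struct.new(:name, :email, :phone, keyword_init: true)

def welcome_message(author)
  "Welcome back, #{author.name}!"
end

class WelcomeMessageTest < Minitest::Test
  def test_greets_author_by_name
    author = Author.new(name: "Pants", email: "pants@pantsmail.example") # no phone: it isn't needed

    result = welcome_message(author)

    assert_equal "Welcome back, Pants!", result
  end
end
```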
[00:16:37] And now, where before you might have assumed you needed a real author (but you didn't), everyone in the room could implement this method understanding exactly what it really needs to do. So test data should be minimal, but also minimally meaningful, right? We're already through section one.
[00:16:51] We're through test structure. Congratulations, we did it. Let's move on to round two talking about test isolation. And the first thing I want to talk about that really cheeses me off is unfocused test suites. Most teams define success in Boolean terms when it comes to testing. They have one question.
[00:17:06] Is it tested? And if the answer is yes, then they feel pretty good about themselves. But I think we can dig deeper. My question is: is the purpose of each test readily apparent, and does its test suite promote consistency? And very few teams can answer yes to this question. And when I raise the issue, a lot of people are like, consistent? But I've got tons of tests, all with different purposes, all testing all kinds of different things inside of my test suite.
[00:17:27] And I'm like, yeah, that's true, but you could probably boil it down to four or five types. And in fact, what I do is, for each type of test that I define, I create a separate test suite, each with its own set of conventions, and those conventions lovingly reinforced with their own spec helpers or test helpers to try to encourage consistency.
[00:17:43] I actually did a whole talk just on that, called Breaking Up (With) Your Test Suite. It's up on our blog, or there's a short URL. Now in Agile land, there's this illustration people like called the testing pyramid. TL;DR: stuff at the top is illustrated to be more integrated, stuff at the bottom is less integrated.
[00:17:59] And when I look at most people's test suites, they're all over the place. Some of the tests call through to other units, other tests will fake out their relationships with other units. Some of the tests might hit a database but fake third-party APIs. Some other tests might hit all those fake APIs but then operate beneath the user interface.
[00:18:14] Which means every time I open up a test I have to read it carefully and then understand, okay, what's the plan here? What's real, what's fake, what are they trying to get out of this test? And it's a huge waste of time. So instead I start with just two suites in every single application that I write.
[00:18:28] One suite I make maximally realistic, as integrated as I can possibly manage, and another suite I make as isolated as possible. Part of the reason I do this is because then, intuitively, I can answer: should I fake this, yes or no? And I land at one of those two extremes instead of landing all over the place.
[00:18:43] The bottom suite, its job is to make sure that every little thing works in your system, and the top suite is to make sure that when it's all plugged together, nothing blows up. It's pretty straightforward and very comprehensible. Now, as the need arises, it might be the case that you need to define some kind of semi-integrated test suite.
[00:18:58] And it's just important that you establish a clear set of norms and conventions. So for instance, I was on an Ember team recently and we agreed we're gonna start writing Ember component tests. But up front, we had to all get on board with the fact that we're gonna fake our APIs, we're not gonna use test doubles, we're gonna trigger actions instead of UI events, and we're gonna verify app state, not HTML templates.
[00:19:18] These were arbitrary decisions, but we relished the opportunity to lock in those arbitrary decisions because we knew it would buy us consistency. Next I want to talk about how too realistic of tests bum us out. Because when I ask somebody, Hey, how realistic do you think this test should be?
[00:19:34] They don't really have a good answer other than maximally realistic. I want to make sure my thing works, so as realistic as possible. And so they might be proud of their very realistic web test. There's a browser and it talks to the real server and a real database. And in their mind, this is as realistic as it gets.
[00:19:48] And to poke holes in it, I might ask, hey, does it talk to your production DNS server? And they're like, no. Does it talk to your CDN and verify that your cache invalidation strategy is working? And they're like, no. So it's not the case that it's a maximally realistic test at all. In fact, there were very definite boundaries here, but the boundaries were totally implicit.
[00:20:05] And that kind of implicit shakiness is a problem because now if something blows up, anyone on the team is liable to ask, why didn't we write a test for that? And it puts these teams in a trap where they write some tests. Stuff blows up in production, and then the managers come, and they all have a come to Jesus moment, and they're like, Why?
[00:20:21] And then they're like, never again, and their only reaction is to increase the realism of all of their tests, increase the integratedness. Now, that would be fine, except for the fact that realistic tests are slower: they take more time to write, to change, to debug; they require a higher cognitive load, because we have to keep more in our heads at once; and they fail for more reasons, because there are so many moving parts.
[00:20:39] They have a real cost. So instead, think of it this way: if you have really clear boundaries, then you can focus on what's being tested really clearly, and you can be consistent about how you control stuff. So take the same team, with that clarity of mind. The same thing happens. You write tests, stuff happens.
[00:20:56] Something blows up in production and then they can have a backbone. They can stand tall and have a grown-up conversation about how, up front, they all agreed that type of test was too expensive. Or, hey, they didn't intentionally break production. They were unable to anticipate that particular failure simply by having tests.
[00:21:10] It's really hard to automate something you can't predict, right? And additionally, maybe they could write a targeted test of just that one concern off to the side, without making all of their tests slower in some broad-based way. Aside from having high costs, realism in testing isn't some kind of universal ideal or virtue.
[00:21:29] In fact, less integrated tests are useful too. They offer much richer design feedback of how it is to use our objects and any failures they might have are much easier for us to understand and reason about. I just said reason about, damn, sorry, slip of the tongue. Alright, next up, let's talk about redundant code coverage.
[00:21:46] So suppose that you've got a lot of tests in your test suite. You've got browser tests, you've got view tests, you've got controller tests, and those all call through to a model. Maybe that model has relationships with other models, and everything's tested eight ways to Tuesday. So you're very proud of your very thorough test suite.
[00:22:01] In fact, you're a test-first team, so you need to make a change to that model, right? So the first thing you do is you write a failing test, and then you make that test pass, and you feel pretty good, so you push it up to your continuous integration platform. And then what happens? Well, all those things depend on all those other things.
[00:22:14] So your controller test, your view test, your browser test, they all broke. Those related models, they call through to that model, they incidentally depend on it, so those all broke. And what took you half an hour on Monday morning, you're now spending two days just cleaning up all these tests that you didn't anticipate having broken.
[00:22:27] So it was thorough, yeah. But it was redundant too. And I found that redundant coverage can really kill a team's morale. And it's the sort of thing that doesn't bite you on day one, cause everything's fast and it's easy to run all in one place. But once things get slow, having a lot of redundant coverage can really kill your productivity.
[00:22:42] So how do you detect redundant coverage? It's the same way you detect any coverage, right? You can run a coverage report and then look at... well, the only thing we ever look at in a coverage report is the easy targets, the ways that we can increase our coverage. But there are a lot of columns there.
[00:22:55] What do those other columns say? We never look at those other columns. The last column is the average number of hits per line, and I think that's pretty interesting, right? Because that top thing got hit 256 times as I ran my tests. What that tells me is that if I change that method, I'm gonna have tests breaking everywhere.
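(One rough way to see per-line hit counts yourself is Ruby's built-in Coverage module; this is only a sketch, and the required file and the spot where you'd run your suite are placeholders.)

```ruby
require "coverage"

Coverage.start
require_relative "lib/timesheet"  # hypothetical file under test
# ...run the test suite here...

# Coverage.result maps each file to an array of per-line hit counts
# (nil for non-executable lines). A huge count hints that many tests
# depend on that line.
Coverage.result.each do |file, hits_per_line|
  puts file
  hits_per_line.each_with_index do |hits, index|
    puts "  line #{index + 1}: #{hits} hits" unless hits.nil?
  end
end
```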
[00:23:12] It's an important thing to think about. So one thing we can do is identify a clear set of layers that we test through. Like for instance that same team might agree like the browser tests are valuable but these view and controller tests are mostly redundant. So we'll just test through the browser and the models and reduce the amount of redundant code coverage.
[00:23:27] Or, totally different strategy, you could try your hand at outside in test driven development, where you test from the outside in, but you isolate each thing from the stuff that it depends on underneath so that you don't have this incidental dependency on other objects in your tests.
[00:23:41] Some people call that London-school TDD. Martin Fowler called it mockist TDD (I don't love that term), or you may have heard of the book GOOS, Growing Object-Oriented Software. I realize now that I've iterated enough on it that I just call it my own thing; lately I call it discovery testing.
[00:23:56] I recently did a free screencast series on our blog about it just to explain the concept and my workflow. I'd love if you check that out if this interests you but we don't have time to talk about that today. However, I did just bring up test doubles and fake stuff, so it would only be fair to talk about how people hate in their tests careless mocking, right?
[00:24:12] So I said test double. Test double is a catch-all term for anything that fakes out another thing for the purpose of us writing our tests, like a stunt double. And a test double could be a fake object or a stub or a mock or a spy, something you get from a mocking library.
[00:24:27] And what's funny here is that I happened to co-found a company named Test Double, and I maintain several test double libraries. So when I go and talk about testing, people are normally like, oh, Justin, you're probably pretty pro-mocking, right? And it's actually a little bit more complicated than that.
[00:24:42] I have a nuanced relationship with mock objects, with test doubles. Because the way that I use them is this very careful and rigid process, right? I start with the subject that I want to write, and I think I'm going to need these three dependencies. So I start with a test of that subject and I create fakes of those three things, because they don't exist yet.
[00:24:59] And I use the test as a sounding board. Are those APIs easy to use, or are they awkward? Does the data flow, do the data contracts between those three things, all make sense? And if not, I can very easily change the fake, because the thing doesn't even exist yet.
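(A rough sketch of that workflow using rspec-mocks doubles; the ApprovesTimesheets subject and its three collaborators are invented for illustration, and the talk doesn't prescribe this particular mocking library.)

```ruby
require "rspec/autorun"

# The subject is written first; its collaborators don't exist yet, so the
# test fakes them and acts as a sounding board for their contracts.
class ApprovesTimesheets
  def initialize(finds_timesheet:, checks_policy:, notifies_admin:)
    @finds_timesheet = finds_timesheet
    @checks_policy   = checks_policy
    @notifies_admin  = notifies_admin
  end

  def approve(id)
    timesheet = @finds_timesheet.call(id)
    return false unless @checks_policy.call(timesheet)
    @notifies_admin.call(timesheet)
    true
  end
end

RSpec.describe ApprovesTimesheets do
  it "notifies the admin when the policy allows approval" do
    finds_timesheet = double("FindsTimesheet", call: :a_timesheet)
    checks_policy   = double("ChecksPolicy", call: true)
    notifies_admin  = double("NotifiesAdmin", call: nil)
    subject = ApprovesTimesheets.new(finds_timesheet: finds_timesheet,
                                     checks_policy: checks_policy,
                                     notifies_admin: notifies_admin)

    result = subject.approve(42)

    expect(result).to eq(true)
    expect(notifies_admin).to have_received(:call).with(:a_timesheet)
  end
end
```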
[00:25:12] So it's a very easy time to catch design problems. That's not how 99.9 percent of the world uses their mock objects. Most people are trying to write a realistic test and they've got dependencies. Some are easy to set up, maybe another is hard to set up, maybe one fails intermittently, and so they just use mocking frameworks as this cudgel.
[00:25:31] Like they're just shutting up those dependencies that are causing them pain, and then they just try to get their test to pass, and as soon as the test is done they're exhausted, and then they push it. But on day two and onward, we realize that those types of tests just treat symptoms of test pain, not the root cause.
[00:25:51] They greatly confuse future readers. What's the value of this test? What's real? What's fake? What's going on here? What's the point? And they make me really sad, right? Cause they give test doubles a bad name and I gotta protect my brand, y'all. If you see someone abuse a test double, say something.
[00:26:07] Hashtag Macho Mocks. Really, please. So before we wrap up on test isolation, I want to talk about application frameworks. Because frameworks are cool. They provide repeatable solutions to common problems that we have. But the most common category of problems that we deal with is, how do I get my app to talk to thing X?
[00:26:24] They're integration concerns, usually. So if we visualize our application as some juicy plain old code in the middle and some framework-coupled code around the periphery, then maybe your framework is providing you with an easy way to talk HTTP or email or other cool stuff. And the way that I visualize applications is that some have maybe a default amount of coupling to the framework.
[00:26:46] I've been on some projects where literally every single line of code is coupled to a framework-given type or asset. And then some have very intentionally framework-dodging designs, where they try to skirt away into a nice little domain-driven land off to the side.
[00:27:00] But regardless, frameworks raise this dilemma when it comes to testing because they focus mostly on integration problems. And as a result, when the framework provides you with test helpers, those test helpers assume the same level of integration, because you want to make sure that you use the framework correctly.
[00:27:14] And that's completely fair. The frameworks aren't messing up here. But, when we, as framework consumers, look at our framework as the giver of all things that we need to use, that means we're gonna end up only writing integration tests. When in fact, if some of our code doesn't rely on a framework, why should our tests?
[00:27:29] The answer is they shouldn't. You might still have a first test suite that does call through all the framework stuff. That overly integrated test suite that makes sure everything's plugged together right. But if you've got a lot of juicy domain logic, then by all means, test that without the coupling to your framework.
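(A minimal sketch of what that looks like: plain domain logic with no framework requires, tested with nothing but the test runner. The BulkDiscount rule is hypothetical.)

```ruby
require "minitest/autorun"

# Hypothetical pricing rule: plain Ruby, no framework in sight.
class BulkDiscount
  def apply(subtotal, quantity)
    return subtotal if quantity < 10
    subtotal * 0.9
  end
end

class BulkDiscountTest < Minitest::Test
  def test_discounts_large_orders
    subject = BulkDiscount.new

    result = subject.apply(200.0, 12)

    assert_in_delta 180.0, result
  end
end
```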
[00:27:43] Not only will it be faster, but you're gonna get much tighter feedback, much better messages, and a much better sense of how the test can help improve your design. So that was a little bit on test isolation. Congratulations. We got through round two. Just one round to go. We're gonna talk a little bit about test feedback.
[00:28:00] We're gonna start with another thing that people hate about their tests: bad error messages. Let's talk about error messages. But oh crap, I broke the build. So let's go pull down this gem that I wrote. This is a real gem. There's a real build failure. So naturally it's going to have an awesome error message.
[00:28:17] Let's take a look at the error message. Failed assertion, no message given. On line 25. What's my workflow here to fix this? I gotta see the failure, and then I gotta open up the test, find that line, put in a print statement or fire up a debugger to figure out what the expectation was and what the actual value was.
[00:28:34] Then I can change my code and then I can see it pass. And at that point I need a coffee break, because it's been 20 minutes. And that's my workflow. It's super wasteful every single time I see a failure in that particular project. So even if we pride ourselves on fast tests, bad failure messages provide so much friction and waste that they can easily offset how fast your test suite is.
[00:28:55] Now let's look at a good error message. So this is an rspec-given example. We're gonna assert that user.name equals "Sterling Archer". We run that test. And when we look at the failure, and Jim designed this so well, you can see the assertion right there: expected "Sterling Mallory Archer" to equal "Sterling Archer".
[00:29:10] And you can see that you tripped the failure by the whole expression evaluating to false, and then what it does is it keeps evaluating sub-expressions until it can't anymore. So the thing on the left there, user.name, evaluated to "Sterling Mallory Archer", yes. But then it knew it could just evaluate user, and it's like, oh, look, user is an ActiveRecord object, and it prints that whole ActiveRecord object there for me.
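(A sketch of that kind of assertion; this example fails on purpose so rspec-given can print the values of user.name and user. Requires the rspec-given gem, and the User struct here stands in for the ActiveRecord object on the slide.)

```ruby
require "rspec/autorun"
require "rspec/given"

User = Struct.new(:name)

RSpec.describe "user naming" do
  Given(:user) { User.new("Sterling Mallory Archer") }

  # Plain Ruby in the Then block; on failure, rspec-given reports the value
  # of each sub-expression (user.name, then user) with no custom matcher.
  Then { user.name == "Sterling Archer" }
end
```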
[00:29:28] So now, most of the time when I see a failure in rspec-given, I'm like, okay, cool, so my workflow is: see the failure, realize what I did wrong, change the code, and then earn a big juicy promotion, because I'm so much faster than that other guy who's writing bad assertions. In my opinion, judge assertion libraries, as well as how you use them. Most assertion libraries allow you to write really great assertions and we just find a way not to.
[00:29:50] Judge them on their message quality, not just how cool and snazzy their API is. I think this is really important and overlooked. Next up, since we talked about productivity a little bit, let's talk about slow feedback loops. 480 is an interesting number. 480 is the number of minutes in an 8-hour workday.
[00:30:06] And I think about this number a lot. So when I'm looking at my own feedback loops, let's say it takes me 15 seconds to change some code, 5 seconds to run a test, and 10 seconds to decide what I'm gonna do next. That's a 30-second feedback loop. That means in an 8-hour workday, I have an upper bound of 960 thoughts
[00:30:22] that I'm allowed to have. If you're like me and you have some non-coding responsibilities, though, you probably have some additional overhead. The non-code time and context switching take some time, and at a 60-second feedback loop, which obviously ties back to 480, that would allow for 2 hours of non-code time in an 8-hour workday.
[00:30:39] But pretend we've been very successful. We have a lot of tests, and running a single test now takes us about 30 seconds. Now we're looking at an 85-second loop, so just 338 actions a day, cut by almost a third. But that non-code time, that's 2 hours, fixed. That doesn't care how fast your tests are.
[00:30:54] So that has to get bumped up too. So now we're looking at a slightly slower loop. Now imagine you've got really bad error messages, like we just talked about. Instead of being able to see in 10 seconds what's going on, you gotta debug or whatever, and it takes you 60 seconds to figure out what's going on.
[00:31:07] So now your feedback loop's 155 seconds. So you only have 185 useful thoughts that you can have in a day. And that sucks. But if you've ever been on a team with a lot of integration tests and you draw the short straw for a given iteration, your job is to update all those integration tests. I was on a team once where it literally took four minutes as the baseline to run an empty Cucumber test.
[00:31:27] And that was really slow. So in that case, 422 seconds might have been my feedback loop, yielding only 68 actions in a day. Now I don't know about you, but if I'm running a four-minute-long test, what happens is I'll start the test, and then I'll go check out Twitter or Reddit or email or something, and then I'll come back and realize, oh damn, the test finished three minutes ago.
[00:31:48] So my real feedback loop is more like 660 seconds, like 11 minutes. 11 minutes! That's 43 actions a day. By the end of those six months my brain was literally rotting. I could feel my skills atrophy. I was miserable. Even though I got to spend a lot of time on Reddit. So 43, you'll note, is significantly smaller than 480.
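(The arithmetic in this section is just seconds-per-workday divided by the length of one feedback loop; a quick sketch:)

```ruby
SECONDS_PER_WORKDAY = 8 * 60 * 60  # 480 minutes

def actions_per_day(loop_seconds)
  SECONDS_PER_WORKDAY / loop_seconds
end

puts actions_per_day(30)   # => 960
puts actions_per_day(85)   # => 338
puts actions_per_day(155)  # => 185
puts actions_per_day(422)  # => 68
puts actions_per_day(660)  # => 43
```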
[00:32:11] And you may not realize it, but we just did something really significant together here today. We found it. It's the 10x developer. The mythical 10x developer in the room today. So this stuff matters. A few seconds here and there really add up. I encourage you. Use a stopwatch. Profile. Monitor your activity.
[00:32:32] Every now and then. And seriously try to optimize your own feedback loops. And if you're not able to, if your app is just too slow, then you can always just implement features off to the side. Find your happy place. And iterate quickly and then integrate later if you have to. It's really important.
[00:32:46] So next up, I want to talk about one of the contributors to slowness that we just mentioned: painful test data. Because controlling test data is really hard. Now, how much control each of our tests has depends on the test data strategy that we apply. So for example, you might use inline model creation in every test.
So you have a lot of control over how your test data is set up. Some people might use fixtures, where you have a pretty good starting point for a schema every time you run your tests. If you have a lot of complex relationships, or if you need a boatload of data to do anything interesting in your application, you might curate a SQL dump that you can prime and load at the beginning of each test run.
[00:33:21] And then other places, who either can't or choose not to control their data, have to write tests that are self-priming. If I wanna test some behavior that requires an account, I have to use the app to first create an account, and then I can run my test. None of these are good or bad per se, but it's important to note that you don't have to pick just one means of setting up data in your application, and you're allowed to change it midstream.
[00:33:41] So if we look at the testing pyramid, maybe we agree inline creation is a good way to test models, because it's very explicit and we have a lot of control. Maybe fixtures are good for integration tests, because we don't want to keep creatively creating users when we could just have a default one.
[00:33:55] Data dumps, I think, make a lot of sense for smoke tests, so that we don't see four minutes of execution inside of our factories.rb file on every single test run. And then you probably have no other option, if you're gonna write any tests against staging or production, other than self-priming, because you probably don't want direct database access.
[00:34:11] What I've found is that in slow test suites, data setup is normally the biggest contributor to the slowness. I don't have proof of that, but it feels truthy, so I made a slide. I encourage everyone, though, to profile those slow tests. And if necessary change your approach to how you control your test data.
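(A rough sketch of one way to profile that, using Ruby's Benchmark; create_account and run_report are hypothetical stand-ins for your real setup and exercise code.)

```ruby
require "benchmark"

# Hypothetical stand-ins for real data setup and exercise code.
def create_account(plan:)
  sleep 0.2 # pretend this is slow factory/database work
  { plan: plan }
end

def run_report(account)
  sleep 0.05 # pretend this is the actual behavior under test
  :ok
end

setup_time    = Benchmark.realtime { @account = create_account(plan: "enterprise") }
exercise_time = Benchmark.realtime { run_report(@account) }

puts format("data setup: %.2fs, exercise: %.2fs", setup_time, exercise_time)
```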
[00:34:26] Speaking of stuff getting slower, let's talk about one of my favorite phenomena: super-linear build slowdown. Our intuition about how long our tests are gonna take to run really betrays us. Because if we write one integration test, and it takes five seconds to run, we assume that means, if we write 25, it's gonna take 25 times as long.
[00:34:42] And if we write 50, it's gonna take 50 times as long. The reason we assume that is because we think, oh, the duration of one test means that we're spending 5 seconds in test code, because it's a 5-second test. But that's not how it really is, because we also spend some time in app code, and some time in setup and teardown.
[00:34:56] In fact, we probably spend more time in app code than in test code. The test is pretty small. And we probably spend a couple seconds setting up our database. So maybe in that 5-second test, we're only spending 1 second in test code. If we add five tests, that means that while the app is getting bigger, those features are starting to interact with one another, so the app code's gonna get marginally slower as things get bigger.
[00:35:14] And we're gonna spend more and more time in setup and teardown as our models get more complicated. And that test, that first test which we did not change at all, is now taking, what's that 6 plus 1, 7, 7. I'm not really great at math. That's 7 seconds. Where it was 5 seconds. And that's not a big deal, right?
[00:35:28] That's just a couple of seconds. You can see the deviations right there. It's not a big deal. It's not that big. Until we start to talk about more tests. If we had 25 tests, maybe instead of 3 seconds per test, it's now 4 seconds spent in app code, and maybe 6 seconds spent in setup, because we have a lot of data setup now.
[00:35:44] A lot of factories or something. And so now the same 1-second test, the first test that we wrote, is taking 11 seconds instead of 7. And we start to see this geometric curve, right? It goes way up. But now we take it to 50 tests, and things start to get really complicated, all tangled up, and now we're looking at 18 seconds per test, and we have to zoom out the graph because now it's 900 seconds.
[00:36:04] So halfway through our journey, at 25 tests, we added 150 additional seconds into our build, above and beyond what our intuition told us we should have had. And in that second half of tests, we added 500 seconds. As a consultant, it's shocking to me how often I hear from teams who are like, yeah, our build's a little too slow.
[00:36:20] And then three months later, it's Oh my god, our build is nine hours, please help us. Out of nowhere, like they don't feel it. Cause it's counterintuitive. Track this stuff. In fact, what I encourage everyone to do is avoid the urge to create a new integration test as if by rote for each and every new feature that you write.
[00:36:35] Instead, I try to just handle a couple of integration tests that zig-zag their way through all of my different features. In fact, that's a better way to test the interactions that real users are gonna do, as opposed to just having, now I've got this model and this CRUD
[00:36:48] feature, feature by feature, without any interaction between our tests. Early on, too, a fun thing that you can do as a team is make any arbitrary decision you want. You might decide, let's cap our build at 5 minutes or 10 minutes. And once we start creeping up to 9 minutes, say, we can all say, okay, now we gotta delete a test or make stuff faster before we can add this next test that people wanna write.
[00:37:08] It's really effective; it's drawing a line in the sand, and I've seen a lot of teams have a lot of success with it. On to our last topic. This is my favorite: false negatives. So what are false negatives about? What it gets at is this question: what does it mean when a build fails?
[00:37:24] And immediately someone will answer, it means the code's broken. Nope. Because the follow-up question is, what file needs to change to fix the build? Usually we have to update a test to fix it. Oh, so a test was broken, not the code. And then they scratch their head, and I have to define what a true and a false negative is.
[00:37:42] So a true negative, a red build, means that something was actually broken in a meaningful way. And the fix is that we have to fix our code and make our code work again. A false negative, a red build, means we're unfinished. We forgot to update a test somewhere. And the fix is, go update some test somewhere.
[00:38:00] True negatives are great, because they reinforce the value of our tests. When our managers pay us to write tests, they don't know false negatives exist. They think that every test failure is like a bug that doesn't escape into production, right? So they make us feel really good. But what I've found is that in practice, on big teams, they are depressingly rare.
[00:38:18] I can count on one hand, like three or four in the last several months, the failures that were really true negatives, where we really caught a bug with our gigantic test suite. Yeah, bummer. False negatives, meanwhile, erode our confidence in our tests. They're the reason why our build bums people out.
[00:38:35] Every time we see a build failure now it's Oh, chore, I gotta go update all these tests. We start to feel like we're slaves to our tests. And that's really negative and draining and demoralizing. Now, oddly enough, the top causes of false negative test failures are redundant code coverage, right?
[00:38:49] We updated that model's test and the model, and then we forgot that it was going to have a whole bunch of other incidental dependencies on other stuff. And slow tests, because if the suite is so slow that I'm not running all of my tests before I push, where I could catch it early, I'm instead pushing that up to CI, outsourcing it, and then creating a lot of work for myself later when I find out, oh, shoot, I broke a bunch of other stuff.
[00:39:09] That is, when you have a lot of integration tests, you tend to deal with a lot of cleanup from false negatives. If you've been tuning out for most of this talk, the TL;DR is: please write fewer integration tests. That's really all there is to it, and you'll be a lot happier. So I encourage you, especially if this is a new concept to you, to track whether every build failure was a true or false negative, and then how long it took you to fix it.
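(A sketch of the kind of lightweight tracking being suggested; the file name and columns are hypothetical.)

```ruby
require "csv"
require "time"

# Columns: timestamp, true or false negative, what had to change, minutes to fix.
CSV.open("build_failures.csv", "a") do |csv|
  csv << [Time.now.utc.iso8601, "false_negative", "updated a stale controller test", 35]
end
```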
[00:39:32] Because that's the kind of data that you can use to analyze root cause problems in your test suite, and then use that to justify your investment in making broad based improvements to your tests. You know what? We just talked about five things about test feedback. That means we won, we did it, we reached the end of our journey together here this morning.
[00:39:48] If this talk bummed you out and felt like it was a little bit too close to home, remember that no matter how bad your tests are, this guy right here probably hates AppleWorks more than you hate your tests. I'm here from Test Double, like I mentioned. If you're trying to hire senior developers onto your team, you're probably having a really bad time right about now.
[00:40:09] At Test Double, we've got awesome senior developer consultants and we love working with existing teams. And we can help you with this kind of stuff. So get a hold of me, I'm gonna be here all week. We've got Josh Greenwood, he's around here somewhere, as well as Jerry D'Antonio, who's giving a great talk on concurrent Ruby tomorrow afternoon, I hope you check that out.
[00:40:26] If you want to be like us, and focus with a mission on improving how the world builds software, focusing on these kinds of problems and helping others get better, consider joining us. You can hit us up at join@testdouble.com. But most importantly, and I'm going to be here all week, I hope this was valuable to you.
[00:40:41] I got lots of stickers to give out and business cards and stuff, and I would love to meet you and hear your story. But most importantly, thanks for sharing your precious time with me. I really appreciate it.