TESTS FAILED

All software must fail. All machines must fail. All people must fail. Eventually.


What a hook, right? Wowza. Anyway. This one sort of goes on. Josh goes nuts and gibbers on about software engineering principles for a time. He, uh… sort of ties it all together? Maybe relates it to novelwriting? Hopefully? If you’re short on time, or doubtful, scroll through until you see Hans Landa again.

[Image: Hans Landa]


 

The trick of engineering machines (software or hardware) is to accept a rate of failure and accommodate it. Then you have to answer important questions: How often can a thing fail? What are the consequences of failure? What is the cost of a failure? Then you have to look at your given resources and decide how best to accommodate failure.

If failure is acceptably rare (say, one time in a million iterations), then it is acceptable failure.

If the consequences of failure are minimal (can you just turn it off and on again?), then it is acceptable failure.

If the cost of failure is minimal (a D-Flange failed, but luckily, new ones cost $1.85), then it is acceptable failure.

Longevity, reliability, and safety are just ways to relatively measure the failure rate of a device. How many recoverable errors before your computer’s storage can’t recover those bits? How many cold starts can your car battery take before it gives up the ghost? How many times in a thousand doses does the radiation therapy machine kill its cancer patients?

All machines must fail. But certain failures are unacceptable. Loss of life, or personal harm, cannot be accepted. But if the function of a machine is inherently dangerous (say, providing radiation therapy to cancer patients), the possibility of harm must be addressed, and its chances must be reduced to as close to zero as possible.

Software is no different. A machine’s programming is just as important a part of a machine as its mechanical pieces. If software fails, the machine fails. The makers of the device have to accommodate, and address any inherent dangers.

Designing software for failure is tricky, though. Software is intricate, and requires the correct functioning of thousands upon millions of moving pieces in order to perform basic functions. Given an ever-changing ecosystem of moving parts and an environment completely outside of any control, even the simplest application is at risk for failure. And because programs can be run so often, and so quickly, probability starts to do weird things. The chances of one failure in a million become a near certainty when executing millions of times in a row. Failure becomes unacceptable when it is a certainty.
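To put rough numbers on that: if we assume every run fails independently with the same small probability p, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. Strictly, a million runs at one-in-a-million odds comes out to about a 63% chance of at least one failure, and the odds keep climbing toward certainty over the next few million runs. A minimal sketch (the function name is made up for illustration, and the independence assumption is mine):

```python
def chance_of_at_least_one_failure(p_fail: float, runs: int) -> float:
    """Probability of one or more failures across `runs` independent runs.

    Assumes each run fails independently with probability `p_fail`.
    """
    return 1.0 - (1.0 - p_fail) ** runs


if __name__ == "__main__":
    one_in_a_million = 1e-6
    for runs in (1, 1_000, 1_000_000, 5_000_000, 20_000_000):
        chance = chance_of_at_least_one_failure(one_in_a_million, runs)
        print(f"{runs:>12,} runs -> {chance:.6f}")
```

A program that runs a few thousand times a second gets through those millions of runs before lunch.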

That said, wrenching on code with the knowledge of certain failure makes one paranoid. And defensive. And leaves one with a combined sense of dread and wonder at how anything works at all. But you learn to accommodate. You learn what pieces are most likely to fail. You test and validate your inputs. You close your file handles as soon as you’re done with them. You flush your buffers even though you don’t need to. You write your code to fail first, and fail fast, and to die in a spectacular, dramatic fashion so that somebody notices. You expect your code to die. You make an art out of dying. Ideally, your code should be better at failing than it is at succeeding.


But nobody is putting money into software that just fails. Software, like any machine, has to perform a required function. Whether it’s booting up your computer, or simulating the metro system in Cairo, or beaming X-rays through a tumor, it has to do something. It has to perform, eventually. You can put a lot of effort and resources into making a program die well. But those resources are in competition for what was originally intended for the program to do. Restrict those resources and some degree of compromise must occur. Higher failure rates have to be accepted. How well software is constructed depends on how well its problem is understood, and how the risk of failure was addressed, and mitigated, given finite and often limited resources.

Software engineers have known this for some time. Making good software is hard, and even more difficult in dangerous situations, but it has to be done. We were building software for space flight back in the 1960s, the risks for failure being the death of astronauts and the failure of the space program. Young computer programmers learn about failure quickly, with their first fumbling attempts fraught with error codes and stack traces. They learn that the world is an evil, uncaring place, full of malicious, devious input values. And they learn that even the most well-meaning user, trained in every aspect of the software, will eventually do something dumb and destructive. They learn about Margaret Hamilton’s work on Apollo 11 on the same day they learn about the infamous Therac-25.

[record scratch, freeze frame, Baba O’Riley starts playing] Hi. You might be wondering how you got here. Don’t worry. I, too, thought that this was a blog about writing, not some regurgitated Wikipedia article about software engineering. Fret not, my friend. It’ll all make sense soon.

Let’s say, somehow, you’ve magically, no, miraculously put together software that performs well, and fails in a sane, organized fashion. It does what it’s supposed to, and when the unforeseeable occurs, it plays out a classic death scene both appropriate to the character and in direct service to the plot.

Let’s say it’s the program that controls the LEDs for a new brand of bicycle lights for Bespoke Bicycles (intended to be a fake company, sorry if this is a real one). That’s right: a program controls your bike lights now. And your program does what it needs to: turn on, wait one second, turn off, wait one second, then repeat. Cheap and easy, everyone loves it. The only way that the software can fail is if the batteries run out. If that’s the case, then they just put in new batteries.

Then the customers say, “It needs a toggle button.”

You ask, “Why?”

They say, “It’s because we don’t want to have to take out the batteries every time we get off the bike.”

You say to yourself, “Okay, that makes sense. I just thought that the light needed to blink. I wasn’t thinking from that perspective.”

The hardware guys design a button. You don’t have to make any changes to your software as the toggle just cuts the power. On, off. Easy. Your blinking software is perfect.

Then the customers say, “We want different kinds of blinking.”

Confused, you ask, “Different kinds?”

“Yeah. Not just on or off. Like… different.”

“But… a light can only be on or off. That is the nature of light. And… electricity— Do you mean you want it to be dimmable?”

“No, like… on— off— quick flash— slow flash— another quick flash then a slow flash—” They go on, detailing in a strangely rhythmic fashion the nature of the light switching. “I mean… Almost like it’s random. But not.”

“But… why? That would be annoying as hell, wouldn’t it?”

“That’s the point. It’s a bike light. It’s so other people can see you. And being annoying is the same thing as being seen. Bicycle rights.”

“Oh. That makes some sense. I mean— We’ll have to add in… Then we’ll have to.. Hmm. I’ll get back to you.”

You go back to your software. The facility for waiting in between blinks lets you wait for a variable amount of time. But based on the new requirements, the timing mechanism doesn’t let you wait for less than a second. You need sub-second timing. Plus, your beautiful blinking loop now has to be dirtied with a bunch of random timings. But that’s okay. The hardware guys give you a sub-second timer, and tell you how to use it, and now you’ve got a really annoying bike light.

“Oh, no… sorry,” the customers say. “We wanted the normal blinking, too.”

“But it’s… It’s not nearly as annoying. I made the new one considerably more annoying.”

“But that’s just for… Well… Sometimes we want to be annoying. Other times… we just want normal blinks. Can’t we have both?”

Goddamnit. You go back to the drawing board. The hardware guys laugh at you. Then tell you that it’ll be too costly to redesign the bike light chassis for an extra button. You’ll have to make do with just one button.

“How the hell will I do that without some sort of switch? Do you expect them to yell voice commands at the light?”

They humorlessly tell you that it’ll be too costly to redesign the bike light to include a microphone.

“What, then?”

They inform you that the power button can serve a dual purpose. If they do some clever wiring, and put power control in the hands of the software, then you can just click the button for a short period of time to toggle between blink modes, and hold the button for a long time to cycle the power.

You sigh. You know they’re right. But your software is getting more complicated. You have a timer at your disposal but now you have to use it to check for button presses, measure the length of those presses, and do the different styles of blinking. But you’re the pain wizard, and a damned good one. You figure it out. Under budget and on time.

Time passes. The customer requests three more blinking modes to suit whatever mood. They request a charge indicator light, activated only when you tap ‘Shave and a Haircut’ on the button. Blue for charged, red for needing a charge. Then they want rechargeable batteries. Charging over USB cables. Different color lights depending on rider profile. GPS to detect when you’re riding in a federally-mandated low annoyance level bike light zone. Twitter and Facebook integration. Everything but the kitchen sink. Your code is a bit of a mess, but the customers are happy.

Their last request throws you. “We want it to be waterproof.”

It makes a lot of sense. The light probably does strange things if water gets inside, if it works at all.

You talk to the hardware guys. “So, just some rubber gaskets and we’re good, right?”

They shake their heads solemnly. They say it will require a new chassis. And if they’re going for a new chassis, they might as well switch to the new industry-standard BikeLightWare platform rather than keep with the expensive proprietary design you’ve had so far.

“Oh. That’s fine. Will BikeLightWare run my blinker software?”

Oh, how they laugh and laugh.

The new platform is entirely foreign. You sift through the manual. You can’t even see where they’ve got a sub-second timer. Or a timer at all. The hardware guys refuse to add one, saying that they only support what comes out of the box from BikeLightWare.

Ok. Fine. “I’ll make my own,” you say. But even you hear how uncertain your voice sounds. Everything you had, every intricate piece stacked atop the next, now has to be rebuilt. Do you know all the pieces? Do you even know how many pieces there are? How will you know things work the same? You sigh. You’re looking at months, maybe years of thankless work just to get things back to ‘on and then off again’-style blinking. You curse yourself and those damned hardware guys for not thinking about waterproofing. Of course it rains when you’re biking! Of course there are puddles! How could we have been so short-sighted? And how in hell did BikeLightWare become the industry standard? They don’t even have a sub-second timer! And even if you get all of this working, you can’t just hand all of this over to the paying customer and expect them to tolerate anything less than what they’ve had before. They don’t understand the change. They just wanted the dumb light to be waterproof.

In this harrowing example, everything would have been different if our hero had done things a little differently.

Start off with a waterproof bike light chassis? Sure. But maybe one didn’t exist when Bespoke Bicycles was in its infancy. Or it was too expensive. Or maybe they were just making bike lights for people living in the desert.

What about writing software against standardized bike light hardware to begin with? Same answer: maybe one didn’t exist. But more likely, the standards were designed by committee, and just suck.

The simplest solution would be to say ‘no’ to waterproofing. You can always say ‘no’. Besides. Customers don’t know what they really want. Customers are the worst. You got into the bike light business for the art. Make something for yourself, not for your customers. Yeah, but… Successful software needs users. Customers are users. And customers need to be happy, otherwise they stop being users.


No. The one thing that our bike light codemonkey hero didn’t do is test. Granted, he made sure it worked before pushing it out the door. And he fixed whatever issue the customers presented. But he did not test, in terms of software engineering.

Software testing is as much a part of software engineering as the actual writing of code, despite the fact that it is often neglected. Software testing is the process of formally defining the specific behavior of your software, and validating that behavior for function, quality, and the absence of unintended consequences. Software testing validates not only how software performs, but also how it is intended to fail.

For instance: your Bespoke Bicycles brand bike lights need to change color if you hit the button three times in one second. To test this, you will need to ensure that the bike lights change color when you successfully depress and release the button three times in less than 1000 milliseconds. If you press it one or two times, it ignores the input. If you press it more times, it ignores the additional button presses, but still changes the color of the lights.
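That behavior is specific enough to write down as an automated check. Here is a minimal sketch in Python, assuming (hypothetically) that each press is recorded as a (down_ms, up_ms) pair and that the one-second window runs from the first press down to the third release. The function name and the data shape are made up for illustration; the rules themselves are the ones described above.

```python
def should_change_color(presses, window_ms=1000):
    """Return True if any three consecutive complete presses fit in the window.

    `presses` is a list of (down_ms, up_ms) timestamps, one per completed
    press-and-release. This shape is an assumption for the sketch.
    """
    ordered = sorted(presses)
    for i in range(len(ordered) - 2):
        first_down = ordered[i][0]
        third_up = ordered[i + 2][1]
        if third_up - first_down < window_ms:
            return True
    return False


def test_three_click_color_change():
    # One or two presses: the input is ignored.
    assert not should_change_color([(0, 50)])
    assert not should_change_color([(0, 50), (200, 250)])
    # Three presses inside one second: the color changes.
    assert should_change_color([(0, 50), (200, 250), (400, 450)])
    # Extra presses are ignored, but the color still changes.
    assert should_change_color(
        [(0, 50), (150, 200), (300, 350), (450, 500), (600, 650)]
    )
    # Three presses spread over more than a second: no change.
    assert not should_change_color([(0, 50), (600, 650), (1200, 1250)])


if __name__ == "__main__":
    test_three_click_color_change()
    print("all three-click tests passed")
```

Each assertion is one line of the story above, which is the point: if you can tell the story, you can write the test.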

A bike light, as simple as we can imagine it, can be complicated. And if something changes, or if features are added, the likelihood of failure increases. If an underlying component changes (say, the timer, or a migration to the shitty BikeLightWare platform), then someone, or something, needs to validate our color changing rules. If any of those behaviors are violated, then the product won’t be able to change colors, and is considered defective (a failure). Defects cost money. User dissatisfaction costs money. Bad reputation costs money. Failure costs money. As stated earlier, risk of failure needs to be addressed and mitigated by the software engineers, or they aren’t doing their job.

Formally describing the behavior of your software isn’t hard. Well, it is. But it’s no harder than anything else you, as the lone programmer at Bespoke Bicycles, have to put up with. It’s as difficult as being able to put down a list of instructions on how to test a feature. Or, to put it another way: it’s as easy as telling a story. For each behavior, you should be able to tell a story about how a user interacts, and what the intended result is. If you can’t, then you have to wonder how you’re supposed to write code for the behavior in the first place.

Programmers figured out that testing is invaluable to their development process. Having a battery of tests fully documenting existing behaviors means that, at any time, all behaviors can be validated. It would be tedious, but someone could sit down with the bike light testing steps and validate every single feature. Or, you can do as the rest of programmers do: automate, and have the computer do the testing for you.

Huge amounts of study in software engineering focus on automated software testing. So much so that entire development strategies are based around it. Some methods even say that you shouldn’t write any code before there is a test written for that behavior. But in my experience, practicality and deadlines will always dictate how much and how often something is tested. It’s fantastic to be able to tell someone that, “Oh, yes, I rewrote the entire stack for the BikeLightWare platform. And all of the tests are passing. Thundercats are go.” It doesn’t always happen.

But remember, friends: all machines fail eventually. If you test your software, all that means is that your software will fail less in the ways that you tested for.

For instance, let’s get back to that three-click color change. Most customers wouldn’t have a problem clicking a button three times in one second. But some small percentage of your customers have written you strongly worded (if ungrammatical) letters saying that their bike lights won’t change color. You get a hold of one of these ‘defective’ devices. You try out the three-click change. It works just fine. What the hell?

You investigate. You ask around. Eventually, someone at the office asks if they can try it. They do it, but they do something… different. They fidget with it a little. Or they don’t depress the button right away. Or, as you figure out, they press the button three times, but then they hold the last. Who does that? That’s not in the instructions. That’s just… that’s weird, right?

You find the place in your code. You figure out that the BikeLightWare platform differentiates between “button down” and “button up” triggers. If someone keeps holding, and your software hasn’t counted three “button up” triggers, the color change doesn’t occur.
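The defect is easy to reproduce once the input is written down. A rough sketch in Python, assuming (hypothetically) that the platform hands the software a stream of ("down", ms) and ("up", ms) events; BikeLightWare is fictional, so the event shape and both function names are invented here. The first function counts releases, as described above. The second counts presses instead, which is only one possible fix and not necessarily the right one for a button that also uses a long hold to cycle the power.

```python
def color_change_by_releases(events, window_ms=1000):
    """Behavior as described: count three "up" triggers inside the window."""
    ups = [t for kind, t in events if kind == "up"]
    return any(ups[i + 2] - ups[i] < window_ms for i in range(len(ups) - 2))


def color_change_by_presses(events, window_ms=1000):
    """One possible fix (an assumption): count "down" triggers instead."""
    downs = [t for kind, t in events if kind == "down"]
    return any(downs[i + 2] - downs[i] < window_ms for i in range(len(downs) - 2))


# Three quick clicks, the way the instructions describe them.
QUICK_CLICKS = [
    ("down", 0), ("up", 50),
    ("down", 200), ("up", 250),
    ("down", 400), ("up", 450),
]

# Three clicks, but the last one is held for two seconds.
HELD_THIRD_CLICK = [
    ("down", 0), ("up", 50),
    ("down", 200), ("up", 250),
    ("down", 400), ("up", 2400),
]

if __name__ == "__main__":
    print(color_change_by_releases(QUICK_CLICKS))      # works in the office
    print(color_change_by_releases(HELD_THIRD_CLICK))  # the 'defective' light
    print(color_change_by_presses(HELD_THIRD_CLICK))   # the candidate fix
```

The held-click sequence is the new test: it should fail against the release-counting version and pass against whatever fix ships.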

You’ve identified an uncommon but entirely possible series of inputs. It causes user frustration, and costs money as they send it back in the mail as defective. Learning from your past mistakes, you wrote a whole bunch of tests upon migrating to your BikeLightWare platform, but with everything else you were focused on, you didn’t think to test a long third button press. You told your boss that all tests were passing. Thundercats were go. But now no longer.

Unfortunately, software testing is not a silver bullet for eliminating software defects. Repeat it with me: all machines must fail. But automated testing is a great tool for developing better software. For each new defect, you can write a new test. For each new fix, you can verify that the test fails at first, and then succeeds after the fix. You can make sweeping, destructive changes, and observe what succeeds and what fails. You can do this continually, automating your tests to be run every time you hit the ‘save’ button on your code. You can demonstrate to bosses that everything you knew that could break was working the last time you checked.

[record scratch, freeze frame] Wouldn’t that be fucking great if it were the same for writing a book?


As I’ve boldly claimed before, each word of a novel is part of a machine with a few hundred thousand moving parts. A novel is basically software for feelings, right? And as software, and more generally, a machine, a novel must fail. And its authors must mitigate that risk. So, shouldn’t you be able to apply tests to a novel in the same way you can apply tests to software? Shouldn’t these tests make for a better novel?

A novel is a machine. But it isn’t a machine in the same way that a program or a power drill would be. It’s a subjective machine. A non-deterministic machine. For the input of each given reader, a different output is more or less guaranteed. How are you supposed to write testing rules for that? If the machine is inherently non-deterministic, how can you be expected to validate outputs?

It’s difficult, but it’s not impossible. When setting out to write, you set down your core theses, right? Good. These are your first tests. “Does your novel support your theses?” would be a great test. Unlike a scientific paper that is valuable whether the hypothesis is proven or disproven, a novel really should support your thesis. For example, what if your thesis was: “Expectations are never the reality.” But everything in your novel happens more or less as expected… FAIL. But if your theses are well supported, and aren’t just shoehorned in at the end with a final monologue from a secondary character that didn’t have anything to do with the story? SUCCESS.

Okay then. Characters? Does your book have characters? SUCCESS. Does your book have characters that are People of Color that respectfully represent their ethnic and cultural background without being insulting or culturally appropriative? OH GOSH.

Let’s stick to what we can test. Story, right? Does your book have a story? SUCCESS. Does your book have a story that is personally meaningful, universally relatable, relevant to the cultural zeitgeist, but is not derivative of other works or victim to formulaic plot and genre tropes? OH DEAR.

Mixed test results so far. Dialog. There we go. Good dialog can float everything else, right? Does your book have dialog that is not just talking heads, or people talking past each other, or have characters explaining information that other characters already know? HOPEFULLY. Do all of your characters have dialog that is strong enough to withstand a really, really bad audiobook recording, where the voice actor reads everything in an accent reminiscent of a bad Swedish soap opera? OH NO.

You scour the Top 10 Writing Tips. You Youtube the Top 10 Rookie Writing mistakes. You descend the depths of TVTropes. You gather your tests. The things that you must do. The things that you must absolutely not do. You keep them posted next to your bathroom mirror. You recite them before you go to bed.

You work your way down the list as you make your edits. Fixing some things breaks others. You fan your draft out to beta readers. They find a bunch of things wrong that never occurred to you until they brought it up. Their opinions on blinking lights differ from yours considerably. A ton of new items make it to the list. But after time and toil, finally, you’re down to your last checkbox. The last test says: “Did you write the novel you want to read?” You wrote this a long time ago. It should really say: “Have you worked on this for so long that you now hate this strange thing you have created?”

YES.

And so you publish. Thundercats are go.

And then you are told that your book is not waterproof. It should have been waterproof from the beginning. Why wasn’t that on your list?

Readers that paid money inform you that you failed more tests. That your novel has plot holes, and leaks, and seams. That characters are made less interesting for the choices you made for them. That the story is hard to follow. That the dialog is discomforting. That you failed the Bechdel test. You don’t feature enough women, or people of color.

You pull your list of literary tests out of the trash can. You look them over again. You had hundreds of tests. A battery of tests. And you passed all of them.

But you knew about the other tests. Or you knew they must be out there. And if you didn’t, some were common sense if you put your mind to it. But you didn’t. You put your mind to everything else. You put your mind to the discrete, finite list you had said would formally describe the behavior of your novel. And if all of your tests passed, then you would have a better novel.

But all machines must fail. You, as the designer of that machine, have to manage a set of limited resources in order to address and mitigate the risk of failure. You can reduce risk by testing and eliminating failures. But something will always come up.


Here is where you may start reading if you elected to read the abridged version of this godforsaken tome.

[Image: Hans Landa]


Mike and I published our first eBook back in October and the paperback in November. So far, we have had pretty good responses. We had plenty of beta readers, plenty of editing. But there are still problems. Problems that, unlike with software, we can’t go back and fix. Software is mutable, and ever changing. Software can fail, and look great doing it. Books… less so. Books are abundant, and low in value. An imperfect book can be dropped and replaced with a new and better one near instantly.

We recognize the problems. For most, we even know how we created them. A lot of our issues stem from our first big problem when we arrived at our first draft and began querying publishers: Is the book too long? FAIL. Weighing in at 300k words, nobody would have accepted something of that size from first-time authors. It was too big, and too costly to edit and polish. High risk of failure, with unacceptable consequences. So we made a hard choice. We decided to serialize. To split up our work into shorter, well-polished volumes. We decided later that we would go the self-publishing route, which confirmed our decision to split/serialize, as otherwise volumes 1 through 3 would have been too big to finish by the end of the year.

But large, sweeping changes cause problems. Like waterproofing our fictional bike light, our chassis got swapped out, and Volume 1 became its own unique machine, with its own unique problems. We had pushed out content to beta readers, but only ever included all three volumes. After the split, we never had fresh eyes on Volume 1 other than our editor’s. And that is what we published. Which is like waterproofing your bike light, but then only testing it out with people who live in a desert.

Later, we were politely informed that we did not pass the Bechdel test. We were asked, “Do you know what the Bechdel test is?” We said yes. We knew about it. Did we pass the Bechdel test when the book included Volumes 1, 2, and 3? SURE. Did we pass the Bechdel test after the split? NOPE. In software testing, this is called a ‘regression’, where changes to parts of the software cause other parts to fail. Fitting, then, that a regression defect causes our first novel to be socially regressive. Shwoops.


Okay. But that’s a problem of sample size, isn’t it? Mike and I can’t beat ourselves up over that. In the effort to make a more focused, polished product, we had to make some choices. It wasn’t a conscious choice to not have two women talk to each other about something other than a man. It wasn’t a conscious choice to not have two women talking to each other at all. But we did consciously decide on some issues. And those issues got addressed. Besides, pick a small enough part of a whole, and it’ll fail whatever test you want. Call it ‘testing gerrymandering’, if you like.

The further you walk down the list of failures, the less defensible things get. There are bigger criticisms you can’t discard or excuse because of sampling size. It’s a valid criticism that there aren’t enough women on board our fantasy airship. It’s a valid criticism that there aren’t explicitly stated people of color. And it doesn’t matter that we know about it, and thought about it, and cared about our choices. It doesn’t matter that there might be reasons in the story for it. Those reasons aren’t in the book. They aren’t in the appendix. So we failed the test. But I suppose no novel can be perfect. All machines must fail. And we keep building them anyway.

Writing software with the knowledge of certain fallibility makes one paranoid, defensive, and a little fatalistic. Writing fiction fills me with the same feeling. Every word you write might be spelled wrong, or be unintentionally alliterative, or even worse: rhyme. Every sentence you write has a good chance of missing a word, or repeating one. Every line of dialogue might be read strangely, or seem inconsistent with a character. Every paragraph may be logically inconsistent and in conflict with the other.

We have processes in place that look like testing. We have lists of feedback from readers, with TODOs next to them as placeholders, reminders for us to fix them. We have a file called TheJerkList, which is a place where we can be honest (and jerks) to each other about our writing. We do read-throughs and multiple passes, each focusing on story, pacing, or tone. Each pass involves reading the book, end to end. Our test validations are not automated. But the tests are documented somewhere. It’s just the error-prone humans that can’t be relied upon to evaluate the tests properly. I’m thankful I have Mike to read my edits day to day. We catch each other’s bullshit all the time. We have to. Otherwise, who knows what kind of stuff I would have missed.


A few weeks ago, I was at some friends’ house, getting ready to play board games. A couple was there who I had met before, but hadn’t seen in some time. They were trying to set up a game of Dominion. I was trying to discreetly give one of my friends a physical copy of Volume 1, of which she had been one of our earliest beta readers. She happily accepted the book, and showed it off to everyone.

“Did you guys know that Josh wrote a book?” she said, indiscreetly.

“Oh, cool!” the acquaintances said.

“Co-authored, but yes,” I qualified. I am not a creature that belongs in the spotlight.

“That’s awesome. How long did it take you?” they asked.

“Oh, four or so years. Most authors can crank out a book a year, but I guess we’re slow.”

“That’s fine. Wow. And how long did it take to get published?”

“Oh, well, we self-published.”

“So, on Amazon or something?”

“Right. It’s not that big of a deal. I mean, the self-publishing industry is sort of the Wild West right now. Pretty much anybody can crank out a novel these days.”

“Well,” the acquaintance said. “You shouldn’t do that.”

“What?” I asked.

“Don’t diminish your accomplishment like that. You’re selling yourself short. Don’t do that. You should be proud. Most people never write anything.”

I was sort of taken aback. Partly because I’d only met this person twice. Partly because I didn’t realize what I was doing. I don’t take compliments well. I thanked them. I continued shuffling Dominion cards.

If our books ever see much success, I’m sure we’ll find more things wrong with our books. We’ll keep writing. Jeez, we sort of have to, now. But the upshot is that we’ll become better writers. And while we can’t test our books in the same beautiful, automated way that software is tested, we can take the same SUCCESS / FAIL list from the first volume and apply it to the next. And hopefully we’ll improve our ratios.