Quality Assurance and Code Quality
When software is produced, particularly when produced by a team as part of a business, there are lots of ways in which the production of the software could, in the final analysis, be considered unsatisfactory. And un-satisfactoriness spells bad news, for the team and particularly for whoever’s managing the team, which, given you’re reading this blog, is probably you.
If you want to avoid bad news you could try hiring the most talented people you can and hoping not to get unlucky. But hoping not to get unlucky is the sort of strategy that works until it doesn’t, and often you only get the opportunity for it not to work once.
The smart approach is to put in place processes to ensure quality assurance, and that’s what this blog is about. We’re going to look at what “good” is when it comes to software, from meeting requirements to less visible aspects of quality. And we’re going to look at the processes by which software can be assessed and quantified, focusing primarily on the many ways in which software can be tested.
The hard way of Quality Assurance (QA)
The first sort of testing happens once the code has been written, while a particular feature or set of features is on the path to being classed as “done.” This testing is typically called Quality Assurance testing, or Quality Assurance (QA), and it has several significant features.
First, Quality Assurance (QA) should be an internal process. It is best performed by the team to verify that what they are producing is worthy to be seen by stakeholders outside the team (and teams will often resist letting other members of the organization see or play with software until it has passed Quality Assurance (QA) in the same way that you wouldn’t serve up your signature rustic bean casserole to your significant other’s parents without tasting it first).
Quality Assurance can be a big job. In large teams, there is often a dedicated Quality Assurance engineer who does nothing but, all day every day. In smaller teams, it typically either falls onto the product owner or project manager, or the developers themselves share the burden. The one golden rule is this:
The person who wrote the code for a piece of functionality should never do Quality Assurance on that piece of functionality.
Does it do what it says it does?
Quality Assurance (QA), if done rigorously, typically comprises three distinct activities. The first is to take a strict and literal interpretation of the spec and to check whether the software does what the spec says it should do. If the spec says something should happen when the user takes a certain action, and that thing does not happen when the user takes that action, then Houston, you have a problem.
For this to work there needs to be a spec, obviously, and the spec needs to be explicit. If there is a problem with the functionality, then the Quality Assurance process needs to make it very clear what that problem is, and this is where the Given-When-Then (GWT) approach to specs described in the last blog really comes into its own.
Each GWT, if properly written, describes an exact, repeatable test: Set up the software as per the “Given” section, take the actions described in the “When” section, and compare the results to what is listed in the “Then” section. If the expected result doesn’t match what the tester actually sees, then the GWT (which hopefully has a unique ID or code for ease of communication) already provides the documentation for the test failure. All the tester has to do is tell the developers which GWT(s) need to be fixed.
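To make that concrete, here is a minimal Python sketch of how a single GWT maps onto a repeatable test. Everything in it is invented for illustration: the GWT-042 ID, the LoginForm class, and the messages it shows are assumptions, not part of any real spec.

```python
from dataclasses import dataclass


@dataclass
class Gwt:
    """One Given-When-Then entry from a spec (hypothetical structure)."""
    gwt_id: str
    given: str
    when: str
    then: str


class LoginForm:
    """Hypothetical stand-in for the software under test."""

    def __init__(self, registered_emails):
        self.registered_emails = set(registered_emails)
        self.message = ""

    def submit(self, email):
        if email in self.registered_emails:
            self.message = "Welcome back"
        else:
            self.message = "Unknown email address"


GWT_042 = Gwt(
    gwt_id="GWT-042",
    given="a user registered as pat@example.com",
    when="they submit the login form with pat@example.com",
    then="the form shows 'Welcome back'",
)


def run_gwt_042():
    # Given: set up the software as the spec describes.
    form = LoginForm(registered_emails=["pat@example.com"])
    # When: take the action described.
    form.submit("pat@example.com")
    # Then: compare what we actually see with what the spec expects.
    expected = "Welcome back"
    if form.message == expected:
        return f"{GWT_042.gwt_id}: PASS"
    return f"{GWT_042.gwt_id}: FAIL (expected {expected!r}, saw {form.message!r})"


print(run_gwt_042())
```

If the check fails, the ID in the output is all the tester needs to pass on, since the GWT itself already documents the setup, the action, and the expected result.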
Does it do what it doesn’t say it does?
The second aspect of Quality Assurance is to mitigate the inevitable incompleteness of any spec. No document can possibly specify what should occur in every possible scenario, nor, thanks to our old friend the Imagination Problem, will it ever in practice even cover all the relatively plausible “edge cases.”
That means the coders are likely not to have considered all the edge cases, and therefore it may not be known until you do Quality Assurance (QA) how the software will behave in those cases. Most of the time, even though there is no correct behavior specified, a Quality Assurance engineer will know incorrect behavior if they see it: if the software crashes, if it displays incorrect information, and so on.
So part two of Quality Assurance (QA) is to uncover the behavior in the edge cases and document any incorrect, or possibly incorrect, behavior. Or, more informally, this is the bit where the tester tries to break the software by doing weird stuff to it.
Documentation at this stage is absolutely key. When a tester raises a bug, the developer’s first action is to try to recreate the issue themselves—if they can’t find the problem they can’t understand and fix it. But because we’re in the world of edge cases here, often the problem discovered by the tester will only occur if a very specific set of actions is taken, and the tester needs to describe those exact actions.
If the software crashes when the tester enters “Hello I am a walrus” into the email field, it’s no use them saying, “It crashes when I enter an invalid email,” because the developer might try to reproduce the bug by entering “invalid@@email.notavalidemail,” and find that the software doesn’t crash at all for them, and find themselves at an impasse.
Whereas if the tester specifies exactly what they put in the email field, the developer can put that in too, observe the crash, and through observing it, realize that, for example, the crash occurs if the contents of the email field have spaces in them—which they’d never have discovered by putting in “invalid@@email.notavalidemail.”
This isn’t a trivial problem. Hours, and I mean hours, of developer time are wasted trying to track down bugs that are poorly specified. It’s bad enough having to deal with nebulous descriptions from customers (“It doesn’t work when I log in”), so when the descriptions come from people who are being paid to write them, you really must expect better.
To help, some teams adopt quite formal structures for bug reports. Often there is a quick description of the problem at the top, followed by a detailed “repro,” i.e., the specific steps the tester took to cause the issue, culminating in a sentence describing the expected behavior, and then a sentence describing the actual behavior.
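As a rough sketch of that structure, here is one way a bug report could be modeled in Python. The BugReport class and its field names are my own assumptions for illustration, not the format of any particular bug tracker; the example report reuses the walrus scenario from earlier.

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    """Hypothetical bug report: summary, repro steps, expected vs. actual."""
    summary: str
    repro_steps: list = field(default_factory=list)
    expected: str = ""
    actual: str = ""

    def render(self):
        lines = [f"Summary: {self.summary}", "Repro steps:"]
        for number, step in enumerate(self.repro_steps, start=1):
            lines.append(f"  {number}. {step}")
        lines.append(f"Expected: {self.expected}")
        lines.append(f"Actual: {self.actual}")
        return "\n".join(lines)


report = BugReport(
    summary="App crashes on sign-up with a particular invalid email",
    repro_steps=[
        "Open the sign-up screen",
        'Type "Hello I am a walrus" into the email field',
        "Tap Submit",
    ],
    expected="A message saying the email address is invalid",
    actual="The app crashes",
)
print(report.render())
```

The important part is that the repro steps record the exact input, not a paraphrase of it, so the developer can type in precisely what the tester typed.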
To make bug reports tighter still, you can also enforce the following process on whoever does Quality Assurance (QA): When they identify a bug, they jot down what they did in the lead-up to the bug occurring, as a set of repro steps. They then start again from scratch, following the steps exactly, to see if the bug occurs again. If it does, they can then pass the bug on to the developers.
If it doesn’t, they need to keep trying different things until they recreate the bug, and keep going until they can perform the exact same set of steps twice, and get the bug both times.
I will concede, though, that this is pretty onerous, particularly if you don’t have a dedicated Quality Assurance (QA) engineer. If your developers have been lumped with doing Quality Assurance (QA) on each other’s work in between writing their own code, you will find that some have more patience for Quality Assurance (QA) than others, and it may be that they just won’t be as obsessive-compulsively precise as you’d like, because they’d rather be coding.
If this is the case, you’ll need to agree with the team a minimum level of diligence that must be applied to Quality Assurance (QA). Remember that there’s a trade-off: the less time the team spends on verifying and documenting bugs in Quality Assurance (QA), the more time they’ll spend cursing the poorly documented bugs when they then have to fix them. Hopefully, a happy medium can be found.
Does it do what it said it did?
The final part of Quality Assurance (QA) is “regression testing,” which means testing to see whether new functionality has introduced a regression, which means testing to see whether the stuff that used to work still works now that there’s new stuff. This is important because new stuff breaks old stuff all the time. And I mean, all the time. To a frightening degree.
This presents a problem because when working on a large, mature, feature-rich application the amount of existing functionality that could be broken can be vast. And a regression could be found not only in the functionality as specified in the original spec for the old features; it could also be found in some obscure edge case. So a completely thorough regression test for a new feature would actually mean re-testing every single test ever tried for any of the existing functionality.
This is obviously impossible, or at the very least so massively impractical as not to bear thinking about. So another trade-off is needed. If regression testing is to be done manually (and we’ll look at automation later on in this blog), it must be sized to fit the time available to the tester.
This may mean agreeing on a standard set of tests that cover the basic functionality and running through those every time, only adding to them when big chunks of functionality are added. Or it may mean working with the developers to make an educated guess about where, if the new functionality were to have broken the old, those broken bits would likely be found, and limiting regression tests to those areas. Or some combination of the two.
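The combined approach can be sketched in a few lines of Python. The test names, the area tags, and the plan_regression function below are all hypothetical; the point is only to show a fixed core set being topped up with tests from the areas the developers think are at risk.

```python
# Hypothetical catalogue of manual regression tests, each tagged with the
# area of the application it exercises.
REGRESSION_TESTS = {
    "Log in with a valid account": "accounts",
    "Reset a forgotten password": "accounts",
    "Add an item to the basket": "checkout",
    "Pay with a saved card": "checkout",
    "Search for a product by name": "search",
}

# The agreed standard set that is run for every release.
CORE_TESTS = ["Log in with a valid account", "Add an item to the basket"]


def plan_regression(risky_areas):
    """Return the core tests, then every test in the flagged areas."""
    plan = list(CORE_TESTS)
    for name, area in REGRESSION_TESTS.items():
        if area in risky_areas and name not in plan:
            plan.append(name)
    return plan


# The developers guess that the new feature is most likely to break checkout.
print(plan_regression({"checkout"}))
```

With no risky areas flagged, the plan falls back to just the core set, which keeps the regression pass sized to the time the tester actually has.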
Coping with failure
If in basic functionality testing, edge case testing, and regression testing, every test gets a check mark and the new software passes with flying colors, then Quality Assurance (QA) is complete and there’s nothing more to be said. However, that absolutely never happens, and so when your Quality Assurance (QA) reveals many test failures, don’t worry.
In fact, paradoxically, the more tests that fail, often the smaller the problem. If there’s one big problem, it’ll stop the tester effectively running any of the tests, so they’ll almost all fail. For example, suppose you’re making desktop software with an installer, and the installer is broken.
The tester can’t install it, so can’t pass any of the tests. Which is great, because all you have to do is fix that one problem. In this case, fixing the installer will instantly fix the majority of the tests. What you need to look out for is lots of individual test failures all dotted around because that’s an indication of lots of separate bugs to fix: remember that in software, two small bits of work take longer than a single large one.
So what happens when you have test failures? Well, the tester documents the problems and sends them back to the developers, who prioritize fixing those bugs over doing anything new, and then re-submit the software for testing once the bugs are fixed, and keep going round and round until every test passes. Simple.
Except it’s not actually that simple. The big problem is that given enough time examining a non-trivial change to a piece of software, any tester worth their salt will almost always be able to find a problem with that change. Fixing the problem will necessitate making another change, with which the tester will probably be able to find another problem.
The Quality Assurance (QA)/bug report/code fix loop is potentially infinite. But professional teams often barely have enough time to do even a single thorough round of Quality Assurance (QA), and certainly can’t do more than three or four.
So how do teams break out of the Quality Assurance (QA) loop and release, despite these inevitable test failures? Well the good news is that some errors spotted by testers aren’t really errors at all; rather, they’re matters of opinion on design and UX. Often testers will say things like: “The designs only show what the message box should look like when there’s a single line of text in it. We don’t have a design for multiple lines of text, and the software currently bunches the text really close together, and it looks pretty ugly to me.” In a situation like this, the designer can be brought in to adjudicate, and they may well say, “It looks good enough to me.” In which case the “failure” can be ignored.
Similarly, the testing process may draw out previously unspotted UX consequences, that aren’t an indication of a bug so much as an identification of a flaw in the initial design or spec. And again, the consequence of this may be that the designer and product owner confer and acknowledge the flaw but agree to live with it (or, and this has just as satisfactory an end result practically speaking, they may get defensive and argue truculently that it’s not a flaw, it’s a perfectly reasonable consequence of an entirely watertight spec, thank you very much).
Equally, upon consultation with the developers, it may turn out that some bugs are an inevitable consequence of some technical feature that is hard or impossible to remove. If the bug is so dramatic as to ruin the user experience entirely, that presents a serious problem, but often a pragmatic conversation can be had by the team where it’s decided that the bug can be lived with because there’s some workaround or mitigation, e.g., “If it happens and they get locked out of their account they can simply email us and we’ll reset their credentials at our end.” This sort of thing is seldom ideal, but it happens all the time, so don’t beat yourself (or your team) up if it happens to you.
Finally, some edge case bugs may be deemed so obscure as not to be worth fixing. A diligent tester may pick up on problems that will only happen in such rare scenarios (“If two users with the same name register their accounts on the same day in different years and one of them upgrades to the premium package while the other one is on their free trial period and we happen to be running this particular special offer at the time, the other one will get fifty ¥ of free credit”) that it’s not worth the time to fix them because you’re betting that the scenario won’t crop up in the real world. Or that when it does crop up, hopefully, it’ll be far enough into the future not to be your problem anymore.
All the above are what are often called WONTFIX scenarios (as that was the name of the label applied to them in a particular piece of popular bug tracking software), and they act as a constant reminder that we don’t live in a perfect world. The one other way in which we prevent testers from finding an infinite stream of bugs is by limiting the time in which we allow them to look for them.
Sometimes, when a piece of software has already been through a couple of rounds of Quality Assurance (QA), and the pressure is on, it’s worth gently suggesting to the testers that they don’t look too hard for bugs this time. If there are obscure problems that you probably wouldn’t fix anyway if you found them, it can be better not to find them—better, that is, for the morale of the developers, who like to maintain the illusion that it’s possible for them to produce something bug-free.
Rest assured that if there are any serious bugs remaining, you’ll almost certainly hear about them from your users eventually anyway. In the meantime, it can be better to emphasize the “good” in “good enough.”
Just accept it
Once your software has passed Quality Assurance (QA), you may well want to do some form of “user acceptance testing,” or UAT. The term makes the most sense when the end users interact directly with the software team, for example when the software is an internal tool that has been commissioned by the department that will be using it.
Once the software is built, you could get the people who will be using the software day-to-day to try it out and solicit their seal of approval. In other scenarios, such as when the end user is a customer, UAT is typically performed by either someone who is a proxy for the end user, or the stakeholder who greenlit the project in the first place, or the person whose neck is on the line if the software fails.
In all cases, the person performing UAT needs to be someone who signed off the initial spec, because the primary purpose of UAT is to verify independently that what has been built is what was asked for, and that the user, or their representative, accept the software as a satisfactory fulfillment of the initial requirement.
There is, however, a second, more sneaky purpose to UAT, which is that it transfers some of the responsibility for the quality of the software onto the stakeholders’ shoulders. If software passes UAT it is as though they have said, “We have inspected the software thoroughly and as far as we are concerned it is fit to be released. Do so with our blessing.”
If later a problem is found with the software, then the blame is shared by stakeholders, because they should have spotted the problem before the software was released. Or, at least, that holds true for certain varieties of the problem: if the initial spec was badly thought out at the start, that should now be apparent to the stakeholders, who can act on it before the software is released.
The same goes for cases where the software fails to meet the functional aspects of the spec. However, be aware that UAT-performing stakeholders are almost never experienced professional testers, and therefore they can’t reasonably be expected to do rigorous edge case or regression testing, so it is not their job to spot the non-obvious bugs, and therefore not their fault if those bugs slip through.
To minimize friction and discontent between the developers and the stakeholders, the team’s manager should attempt to adhere to the following rule:
UAT should never throw up any surprises
There should never be a bug that gets spotted in UAT that didn’t also show up in Quality Assurance (QA). It’s worth getting your Quality Assurance (QA) testers to think about how your UAT testers will interact with the software to triple-check this. Nothing erodes trust like a show-stopping bug that gets found by a stakeholder after the software team has claimed it has passed Quality Assurance (QA).
Likewise, if there are WONTFIX bugs thrown up by Quality Assurance (QA), it’s important that the UAT testers are told about them before they try the software. Send them a list of “known issues” so that if they hit one of them, it doesn’t worry them so much. If you’re not confident that the stakeholders will get through their testing without something unexpected going wrong, your software isn’t ready for UAT.
Where there’s smoke
Once your software has passed both Quality Assurance (QA) and UAT, it’s ready for release into the wild. Up until this point, your software will have been accessed in some sort of test environment. For example, it may be available at http://test.mywickedawesomesite.com rather than http://www.mywickedawesomesite.com, or the app may only be downloadable via some beta testing system rather than in the app store. That’ll change when you release it, and your software will end up “in production,” as the jargon has it.
In theory, if your deployment process is smooth, and if everything has worked well in the test environment, it should work exactly as well once released. However, releases that are successful in theory lead to congratulations, promotions, and raises only in theory, and a theoretical raise isn’t worth the paper it isn’t printed on. Successful managers live in practice, not in theory, and in practice, you’ll want a reliable way to verify that the release has been successful.
The final sort of manual testing I’m going to mention does exactly that, and is called “smoke testing.” A smoke test is a brief sanity check to make sure that a piece of software basically works. Often it will be a cut-down version of the regression tests performed as part of Quality Assurance (QA). It will be cut-down not because it’s less important than Quality Assurance (QA)—if anything, it’s more important—but because at this point you really, really shouldn’t be finding any new bugs, and endlessly repeating lots of passing tests is a waste of everyone’s time.
Equally, once software has been released it can be a bit tricky to test, because your actions may have real-world consequences: if you want to test buying something you may have to enter real credit card details and actually get charged, and if you want to test deleting a user’s account it’ll actually get deleted. So, create a list of steps for your smoke test that has a balance between thoroughness and practicality, and update it regularly as your software updates.
Finally, make sure it’s very well understood who is to perform this smoke testing. If you have a dedicated Quality Assurance (QA) engineer it could be them; it could be the person in charge of releasing the software; it could be you. Don’t do what I once did and assume someone had done the smoke test, only to find out the next day that the release had broken the login page and our users hadn’t been able to access our app for over twelve hours.
The other hard way
The stuff I’ve described above sounds quite a lot like hard work, doesn’t it? And worse, a lot of it sounds like repetitive drudge work—repeating the same tests over and over, doing the same thing every time and expecting the same result. Plenty of tools exist for the automation of tests. You can make software that “exercises” other software, putting it through its paces by acting just like a user and clicking buttons, entering text, and reading what appears on the screen.
Typically it is controlled by writing a “script” for each test that contains a sequence of actions to take followed by one or more expectations for the subsequent state of the software, where the expectations are couched in terms of what is visible to the testing software—which is what would be visible to an actual user.
So long as each test is clearly specified in advance, the testing software can zip through the scripts, often in a matter of mere seconds, and can present a count of how many tests resulted in the software meeting the expectations specified (i.e., how many of the tests passed and how many failed). If done correctly, the amount of human effort involved can be reduced by several hours per day.
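Here is a toy version of the idea in Python, to show the shape of it. Real tools drive a real browser or app; in this sketch the FakeSignUpScreen class stands in for the application, and the script format (a name, a list of actions, and the text expected on screen) is an assumption of mine, not the format of any actual testing tool.

```python
class FakeSignUpScreen:
    """Hypothetical stand-in for the application being driven."""

    def __init__(self):
        self.fields = {}
        self.visible_text = ""

    def enter_text(self, field_name, text):
        self.fields[field_name] = text

    def click(self, button):
        if button == "Submit":
            email = self.fields.get("email", "")
            if "@" in email and " " not in email:
                self.visible_text = "Thanks for signing up"
            else:
                self.visible_text = "Please enter a valid email"


# Each script: a name, a sequence of actions, and the text expected on screen.
SCRIPTS = [
    ("valid email is accepted",
     [("enter_text", "email", "pat@example.com"), ("click", "Submit")],
     "Thanks for signing up"),
    ("email with spaces is rejected",
     [("enter_text", "email", "Hello I am a walrus"), ("click", "Submit")],
     "Please enter a valid email"),
    ("empty email is rejected",
     [("click", "Submit")],
     "Please enter a valid email"),
]


def run_scripts(scripts):
    """Run every script against a fresh screen and count passes and failures."""
    passed, failed = 0, 0
    for name, actions, expected_text in scripts:
        screen = FakeSignUpScreen()  # fresh state for every script
        for action, *args in actions:
            getattr(screen, action)(*args)  # e.g. screen.click("Submit")
        if screen.visible_text == expected_text:
            passed += 1
        else:
            failed += 1
            print(f"FAIL: {name}")
    return passed, failed


passed, failed = run_scripts(SCRIPTS)
print(f"{passed} passed, {failed} failed")
```

Note that the expectations only look at visible_text, which mirrors the point above: the script can only check what an actual user would be able to see.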
This sounds glorious, verging on too good to be true, and indeed when a story surfaced on Reddit in 2016 of a Quality Assurance (QA) engineer who managed to entirely automate his job within a couple of weeks of starting, and then managed to spend the next six years playing computer games and going to the gym without his managers even noticing, commenters were quick to point out the implausibility of the tale.
The only tests that can be automated in this way are Quality Assurance (QA) regression tests and deployment smoke tests, and the only reason you’d give one person full-time responsibility for running and maintaining a set of tests would be if the software being tested was being constantly changed—otherwise there’d be no risk of any of the tests ever failing.
But if the software is constantly changing, that means the regression test suite would need constant updating to make sure it comprehensively covered the core functionality. Six years is a long enough time in the software world that it’s unlikely the software at the end would remotely resemble the software at the start, so the mere job of continually updating the scripts ought to occupy a reasonable amount of time.
Furthermore, the idea that one would give a Quality Assurance (QA) engineer responsibility for manually running just the boring repetitive regression and smoke tests is a bit unlikely. Normally the trade-off for doing the boring bits when they’re not automated is getting to do the more fun stuff as well (i.e., trying to break new functionality). So even if the engineer managed to automate part of their job, that’d just mean they’d have more time to spend on the other part.
All that being said, there is no doubt that Quality Assurance (QA) automation offers many desirable benefits. You can get away with fewer testers, who can spend more of their time hunting for exotic bugs. It can lead to more rigorous and reliable testing. Machines aren’t subject to human vices such as sloth, so won’t ever skip a few tests because they’re feeling lazy and would really like a longer lunch break so they have time to get across town to that new ramen bar.
Furthermore, you can reduce your “cycle time” with Quality Assurance (QA) automation. If a task isn’t complete until it’s deployed, and you can’t deploy until regression tests have run, then having regression tests that take two minutes to run, rather than two hours plus the wait until a tester has two hours to spare, can practically eliminate that whole step.
The trade-off, though, is that the setup of Quality Assurance (QA) automation is time-consuming. It’s also a bit of a niche skill, because it normally involves writing some software to interpret and execute the scripts, so you have to have some ability as a coder, but you also have to make sure that the scripts cover the right things, so you have to be able to think like a tester.
Normally you’ll find that anyone who has the ability to write software ends up writing the software to be tested, not the software doing the testing, because the former is a more obvious business priority than the latter, and it’s very hard to resist business priorities.
It’s why testers often don’t know how to code: if they knew how to code they’d be pressganged into becoming full-time developers. Managers of development teams can do tremendous long-term good by making the case for Quality Assurance (QA) automation to the rest of the business, so that they can carve out time as early as possible to put in the leg-work to set it up, maximizing long-term rewards.
However, I will concede that it is often the case—particularly in the ship-it-or-go-bust world of tech start-ups—that the short term priorities really are more important than setting up automation, because no one cares if a product that never made it to market was supported by a superbly efficient Quality Assurance (QA) process.
I have no hesitation in asserting that automated tests of the sort described above are A Good Thing, because it seems self-evident to me both that regression tests and smoke tests are A Good Thing, and that the ability to get a computer to do them quickly and reliably is also A Good Thing.
I say this, because the next sort of test we’re going to look at is much more controversial, and while I’m in favor of it, some very intelligent and experienced people disagree with me. We’ll get into the pros and cons in a little bit, but first let’s dive into what these tests actually are: In this next section I’m going to be talking about tests that isolate chunks of code within a piece of software and test those chunks.
When the chunks are small, the tests are often called “unit tests,” and as they get bigger they are often given names like “functional tests” or “integration tests.” These are tests written by software developers, and they’re almost always written using the same programming language that the main software is written in; they’re stored in source control alongside the main software code, are normally written at roughly the same time as the bits of code that they test, and are often subject to the same review process as the rest of the code.
For example, suppose you were building a calculator app. In the previous blog, we talked about how coding involves creating conceptual models with interacting entities that have different responsibilities and abilities. Let us suppose that in our calculator app’s code we have an entity called the Interface that is in charge of “drawing” the user interface on the screen, complete with all the buttons, and noticing whenever the user taps any of the buttons.
There is then a separate Calculation Manager whose job it is to keep track of the buttons that have been pressed and work out what calculation to perform. It outsources the actual calculation process to the appropriate one of four Operators (called Addition Operator, Subtraction Operator, Multiplication Operator, and Division Operator), passing them the numbers to act on and receiving the result, which it passes back to the Interface to display.
If we now consider one of the main requirements of our calculator, namely that it be able to perform division, we can see that each entity must work in a particular way for the division to work. For example, the Calculation Manager must know that when the Interface tells it that the “÷” button has been tapped prior to the “=” button, it must pass whatever numbers have been entered to Division Operator rather than any of the other Operators.
And when the Division Operator is passed two numbers and told to divide them, it has to, well, divide them. Getting into more detail, if the Division Operator is passed two numbers that don’t divide to give a whole number result, it needs to respond appropriately, probably rounding the number to a certain number of decimal places, depending on how you want your calculator to function.
And if the divisor passed into the Division Operator is zero, the Division Operator needs to respond sensibly so as not to crash the whole app: probably notifying the Calculation Manager that an error has occurred, and relying on the Calculation Manager to work out what to tell the Interface in response.
What’s happening here is that having defined our conceptual model, we are in turn defining the requirements for each component in the model, resulting in essentially a miniature spec for each component, saying how it should behave in each situation. The purpose of unit tests is to verify this spec for each component individually. So you might have a set of unit tests for the Division Operator that pass different pairs of numbers in and check what comes out.
These tests would include situations where the result was not a whole number and would verify that the result has the desired number of decimal places; and a situation where the divisor was zero, and would verify that the result was an appropriate error notification.
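A minimal sketch of what those unit tests might look like in Python follows. The DivisionOperator implementation, the choice of four decimal places, and the DivisionError exception are all assumptions made for the example; your calculator might round differently or signal errors in some other way.

```python
class DivisionError(Exception):
    """Tells the caller that the division could not be performed."""


class DivisionOperator:
    """Hypothetical Division Operator that rounds to a fixed number of places."""

    def __init__(self, decimal_places=4):
        self.decimal_places = decimal_places

    def divide(self, dividend, divisor):
        if divisor == 0:
            # Report the problem instead of letting the whole app crash.
            raise DivisionError("Cannot divide by zero")
        return round(dividend / divisor, self.decimal_places)


def test_whole_number_result():
    assert DivisionOperator().divide(10, 2) == 5


def test_result_is_rounded_to_configured_places():
    assert DivisionOperator(decimal_places=4).divide(1, 3) == 0.3333


def test_zero_divisor_reports_an_error():
    try:
        DivisionOperator().divide(1, 0)
    except DivisionError:
        return
    raise AssertionError("expected a DivisionError")


for test in (
    test_whole_number_result,
    test_result_is_rounded_to_configured_places,
    test_zero_divisor_reports_an_error,
):
    test()
print("all Division Operator tests passed")
```

Each test passes numbers in and checks what comes out, without involving the Interface or the Calculation Manager at all, which is what makes it a unit test.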
Higher level tests like integration tests then do something similar, but test how well individual components work together—so you might end up with some tests that check that a result returned from the Division Operator makes it back to the Interface and gets shown to the user without being modified.
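For illustration, an integration test along those lines might look like the following Python sketch. The three classes are simplified, hypothetical versions of the entities in our calculator, and the RecordingInterface simply remembers what it was asked to display rather than drawing anything.

```python
class DivisionOperator:
    """Hypothetical operator: divides and rounds to four decimal places."""

    def divide(self, dividend, divisor):
        return round(dividend / divisor, 4)


class RecordingInterface:
    """Hypothetical Interface that remembers what it was asked to display."""

    def __init__(self):
        self.displayed = None

    def show(self, value):
        self.displayed = value


class CalculationManager:
    """Hypothetical manager that wires the operator to the interface."""

    def __init__(self, interface, division_operator):
        self.interface = interface
        self.division_operator = division_operator

    def divide_pressed(self, first, second):
        result = self.division_operator.divide(first, second)
        self.interface.show(result)


def test_division_result_reaches_interface_unmodified():
    interface = RecordingInterface()
    operator = DivisionOperator()
    manager = CalculationManager(interface, operator)

    manager.divide_pressed(7, 2)

    # Whatever the operator returns must be exactly what the user sees.
    assert interface.displayed == operator.divide(7, 2)


test_division_result_reaches_interface_unmodified()
print("integration test passed")
```

Unlike a unit test, this one says nothing about whether the division itself is right; it only checks that the components hand the result along correctly.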
Some code bases, teams, and companies will shun such tests entirely. Others will absolutely insist on them as a means of ensuring software quality. They will require that for any given piece of functionality, there should be at least one high-level test documenting how the thing is supposed to work, and several unit tests, including some to cover the edge cases.
They may have a semi-automated “continuous integration” (‘CI’) pipeline set up, which ensures that when a piece of code is written it actually cannot be merged into the main code base unless every single test in the code base is passing, and there are tests in place to cover all new code in the code base.
So that’s what these internal tests are. The question is then, what’s the need for them? To which question there are three main answers. The first is that tests reduce the number of bugs in the software. A test checks whether the software works as expected, and makes it very apparent if it deviates from expectations.
There’s a counter-argument here. It’s garbage, but it’s fairly common and is often parroted by coders who have never worked with automated tests and don’t want to have to start because they think it sounds like hard work. I repeat it here so that you can recognize and refute it should the need arise.
It runs like this: Bugs are mostly found in edge cases, and only in edge cases that the person writing the code didn’t consider at the time (if they’d considered them they’d have found and fixed them). Automated tests can only test for specific edge cases that are thought of by the person writing the tests.
Since that person is the same person writing the code, the only edge cases that automated tests can test are the ones the developer could think of, which are ipso facto the ones the developer will already have made sure are bug-free. Therefore tests can only ever be redundant.
This argument is awful because it completely misunderstands the sorts of bugs that automated tests catch. The point about putting in place a bunch of tests for a piece of code you write isn’t to find bugs in that code when you first write it. No, the point of those tests is so that when you, or another developer, write a bunch of additional code that involves changing the original code and introduce new bugs in the old functionality, the tests you wrote beforehand will notify you immediately of the new bugs.
Bugs creep in when code changes, and that’s what tests protect you from. There are many arguments against writing automated tests, but if you ever hear the “But it only tests the stuff I know is working” one, dismiss it, and roundly rebuke whoever said it.
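A small sketch of that situation in Python may help. Everything here is hypothetical: divide_original stands for code as first written, divide_after_change for the same code after a later edit that adds a feature and accidentally breaks the old behavior, and rounding_test_passes for the test that was written alongside the original.

```python
def divide_original(dividend, divisor):
    """The code as first written: rounds to two decimal places."""
    return round(dividend / divisor, 2)


def divide_after_change(dividend, divisor, whole_numbers_only=False):
    """A later edit adds a whole-number mode and accidentally breaks the default."""
    result = dividend // divisor  # bug: floor division is now applied in every mode
    if whole_numbers_only:
        return int(result)
    return round(result, 2)


def rounding_test_passes(divide):
    """The test written alongside the original code, reusable against any version."""
    return divide(10, 4) == 2.5
```

The test passes against the original and fails against the changed version, even though the developer making the change was only thinking about the new whole-number mode.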
The second benefit of tests is documentation. Done correctly, tests can tell you what the software does as a whole, and the role each component part plays. This is particularly useful because tests are the only form of documentation that doesn’t go stale. By which I mean, most documentation is accurate only at the point it is written, because after that point the software changes. It’s notoriously hard to keep documentation up to date.
This is partly because software developers tend to like writing code but dislike writing essays, so they’ll allow themselves to forget to update the corresponding documentation when the software changes.
Attempts to combat this through putting the documentation next to the code itself, in the same files, through the use of “code comments” (words that a computer ignores when it’s reading the file, used to allow developers to communicate to one another) are also prone to failure, with comments being updated more slowly than the code surrounding them, leading them towards inexorable obsolescence.
Whereas a test describes a situation and what the software should do in that situation, which is basically all that software documentation needs to do. When the software’s behavior changes, if the test isn’t updated to describe the new behavior it’ll fail when run, and that failure will force the developer to update the test.
You might worry that since a test takes the form of a piece of code, the tests might be no clearer at documenting what the code does than the code itself. But fear not, because test code is (or at least should be) a breeze to read. Many test frameworks enable the developer to use a domain-specific language, or DSL, to write their tests.
A DSL is sort of like a mini programming language that is designed for a specific context, or domain, and sacrifices flexibility (it doesn’t really work outside its intended context) in order to be really expressive in that context. DSLs designed for testing enable you to write a test whose purpose is really easy for someone else to read. For example, read this bit of code:
I very much doubt you would have trouble describing the behavior this test tries to verify, and it would be fairly easy to read this as documentation of how the login screen should work. Underscores and brackets aside, the DSL this code uses (a subset of Ruby, with convenience methods provided by Capybara and RSpec, if you’re interested) lets you write almost plain English to describe actions and subsequent expectations.
The final major benefit of tests is code quality. We’re going to talk more below about what that means beyond the mere absence of bugs, but for now, I’m going to focus on one aspect of code quality, which is resilience to change. Change, which is fast becoming the villain of this whole blog, causes code to have to be rewritten to accommodate new requirements for the software’s behavior. Depending on how the code was written in the first place, it may be easier or harder to make changes without breaking everything.
Having tests, as mentioned above, makes it easier to tell if you have broken something, but there’s another benefit: writing tests forces your code to be modular. The reason is that unit tests, the ones that test individual chunks of code, can only be run if it’s possible to separate code into little chunks in the first place.
If your code is one big sprawling mess, it’s really hard to pick out an individual bit and write a series of tests for how that bit should behave and then get those tests to run correctly.
So to be able to write tests in the first place, you find yourself steered away from big sprawling messes. Which is pretty valuable, because experienced developers who should know better, even with the best of intentions, often have a tendency to veer towards big sprawling messes.
They’re easier to write, at first, because you don’t have to think through the details of a conceptual model. You just throw bits in as and when they’re needed until you end up with a Heath Robinson contraption that works for what you need it to accomplish right now, but heaven help you if you want to change something.
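To illustrate the difference, here is a rough sketch in Python of the same tiny calculator written both ways. All the function names are made up for this example. In the first version the only way to check the arithmetic is to fake the keyboard and the screen; in the second, each chunk takes plain inputs and returns plain outputs, so each can be tested on its own.

```python
# Hard to unit test: parsing, arithmetic and output are tangled together.
def run_calculator_sprawling():
    text = input("Enter a sum, e.g. 6 / 3: ")
    left, operator, right = text.split()
    if operator == "/":
        print(float(left) / float(right))
    elif operator == "+":
        print(float(left) + float(right))


# Easier to unit test: each step is a small function of its own.
def parse(text):
    """Split '6 / 3' into two numbers and an operator."""
    left, operator, right = text.split()
    return float(left), operator, float(right)


def calculate(left, operator, right):
    """Apply the operator to the two numbers."""
    if operator == "/":
        return left / right
    if operator == "+":
        return left + right
    raise ValueError(f"Unsupported operator: {operator}")


def run_calculator_modular():
    """Thin wrapper that wires the testable chunks to the keyboard and screen."""
    print(calculate(*parse(input("Enter a sum, e.g. 6 / 3: "))))
```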
If automated tests do all this, why doesn’t every developer use them all the time? The main answer is, as you probably guessed, all about time. Tests take time to write, and you can end up spending far more time worrying over how to express a particular requirement as a test than you do actually writing the code that the test tests.
Tests add more code that has to be reviewed, and more things that have to be changed if the intended behavior of the software changes. The whole process of setting up CI so that test failure is flagged up can be a non-trivial time expenditure at the start of a project when everyone is keen to make more tangible progress.
And equally, there can be a big overlap between what’s covered by the automated tests written by the developers and what’s covered by the testing done by the Quality Assurance (QA) engineers, manually or otherwise. Since developer time can cost more than Quality Assurance (QA) engineer time, sometimes it seems like nixing automated tests is the best way to avoid duplication of effort.
More than anything else, though, whether a team uses tests comes down to the preferences of the developers, and those preferences are informed largely by prior experience and area of expertise. On the one hand, the benefits of automated tests become more apparent the more used you are to working with large code bases that require updates, and therefore novice developers tend not to see the point of testing, while more experienced ones have seen the benefits firsthand and are converts.
On the other hand, different programming cultures place a varying emphasis on tests. Cultures tend to form around languages, and it’s fascinating the way that different language-cultures have varying attitudes to testing. In my experience (and I’ve yet to find any studies that confirm, refute, or even address this at all), people who use Python or Ruby, for example, tend to love tests, while C# and Java users are 50/50 on them, and the C++ and Objective-C crowds ignore tests entirely if they possibly can.
There’s one more aspect of testing that you should know about. It’s by far the most controversial, inspiring passionate love and passionate hate in equal measure. It’s something called “test-driven development,” or TDD, also known as “test first” development. The basic premise is that, rather than write some code and then write some tests that “prove” that the code works, you should do it the other way around, writing the tests first and then writing the code to make them pass.
That doesn’t sound like the sort of thing that should inspire particularly strong feelings, does it? To understand what’s going on here, let’s look a little bit deeper at the philosophy behind TDD.
TDD was popularized by an Agile founding father called Kent Beck as part of Extreme Programming (XP). You’ll recall that XP dictates that in each sprint there’s a stakeholder embedded in the team whose job it is to provide continual refinement and clarification of the spec for that sprint, to make sure that what’s built is exactly what’s wanted by making sure the spec exactly describes what’s wanted. The other half of this is ensuring that what’s built exactly matches the spec, and simply leaving that to the developers isn’t nearly extreme enough for XP.
Instead, XP dictates that between them the developers, working with the stakeholders, should translate the spec into a series of tests, which the written code must pass in order to be proven to meet the spec. Since the tests define what code needs to be written, the rule is that no code can be written until there is a test in place (i.e., a formalized requirement) that will fail until that code is written.
Furthermore, the code can only be written if writing it will cause a test to pass. This means that it is completely forbidden to write any code that does anything that is not described by a test (and therefore described by the spec). So the software isn’t allowed to have any functionality that isn’t explicit in the spec, no matter how trivial. Nor are developers allowed to try to preempt future requirements in the code that they write—their sole focus is on making the tests pass.
This is touted as a significant benefit by TDD proponents, because as noted in the previous blog, developers see software from a different angle to users, so when they strike out “off-piste” and build in extra bits and pieces in advance, they’re liable to head in the wrong direction and waste their time building unnecessary things.
A common source of confusion in TDD is what exactly the tests should test. There’s often a conceptual gap between the spec, which describes how things should behave as perceived by a user, and unit tests, which test the behavior of a chunk of code whose output may not be directly visible to the user at all. To combat this, a chap called Dan North came up with the notion of ‘behavior-driven development’ or BDD, which is essentially TDD with a few more specifics about how it works.
He advocates starting by writing a test that describes how things should look and respond to a user. When that test is in place (and it should always start off as a failing test because there is no code yet to do the things that the test is testing for) the developer should think about the first chunk of code they might write to make the test pass.
They should then write a “unit test” describing the behavior of that chunk (but only the behavior needed to make the original test pass, not any further behavior that chunk may need to exhibit), and once those tests are in place, they can write the first chunk of code. Then they think about the next chunk of code and write a test for that, and so on until they have all the chunks written to make that very first test pass, and each chunk has relevant tests of its own.
The TDD/BDD way can also be misinterpreted and lead to some pretty terrible results. Because it tells you to focus on writing code to pass one test at a time, it’s a bit like building a house one room at a time and completing each room before moving on to the next. This is great, except that there are only so many stories you can build before your ground floor, lacking appropriate reinforcements, collapses under the weight. And just think how complicated your electrical wiring is going to be.
To counter this, a mantra has evolved called “Red/Green/Refactor.” This reminds one that one should first write a test that fails, then write the code to make the test pass, and then “refactor,” or rework, the new code to make it tidy and fit in nicely with the code that was in place already.
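The three steps might look something like the following sketch in Python. The format_price function and its test are hypothetical, chosen only to show the rhythm: the test is written first, a first attempt makes it pass, and the reworked version keeps it passing.

```python
import unittest


# Step 1 (red): the test is written first. Run before any code exists,
# it fails, because format_price has not been written yet.
class FormatPriceTests(unittest.TestCase):
    def test_formats_cents_as_dollars(self):
        self.assertEqual(format_price(1250), "$12.50")


# Step 2 (green): the first code that makes the test pass, tidy or not.
def format_price_first_attempt(cents):
    return "$" + str(cents // 100) + "." + str(cents % 100).rjust(2, "0")


# Step 3 (refactor): the same behavior, reworked to be easier to read,
# with the test re-run afterwards to confirm nothing broke.
def format_price(cents):
    dollars, remainder = divmod(cents, 100)
    return f"${dollars}.{remainder:02d}"
```

In real use the first attempt would be replaced in place rather than kept alongside; both versions are shown here only so the before and after can be compared.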
Advocates of test-driven development will argue that if you combine the tenets of BDD plus Red/Green/Refactor or any of a plethora of other conventions and practices, you will end up with well-written, future-proof code that is resilient against bugs and a pleasure to work with.
Its detractors will argue that to do test-driven development and end up with code that’s worth a damn you have to combine the tenets of a plethora of conventions and practices, and you’ll waste so much time getting wrapped up in myriad processes that you’ll never get anything done. They will claim that following such a method rigorously is pointlessly difficult and time-consuming. It has even been described as “like abstinence-only sex ed: an unrealistic, ineffective morality campaign for self-loathing and shaming.”
However, whether or not TDD is harder to do, or slower, is only relevant if it leads to better code than code produced without using TDD—for example, code where the tests are written after the rest of the functionality. Unless it can make that claim, there’s no point using it at all. And while the academic studies in this area have some issues with selection bias, etc., so aren’t 100% reliable, they do fairly uniformly show no noticeable improvement in code written test-first.
The best one can really say for it is that, while it is not a magic bullet, some developers find it a very effective way of focusing them on the task at hand and helping them to design software that is fit for its purpose and flexible. But some don’t.
So far we’ve talked about software quality in terms of whether or not the software does what it’s supposed to. In this last section, I want to turn to a type of quality that is less palpable. It matters because it is a type of quality that is often in short supply, and its creeping effects can be just as lethal to a project as the functionality failures and bugs we’ve been discussing so far.
The truth is, there’s always a trade-off between speed of development and quality of work. Sometimes (always), there’s internal or external pressure to get things done quickly, quicker than it is possible to produce top-quality code. In such situations, there are several compromises you can make to speed up development. You can reduce the scope of the work, making a piece of software that simply does less than what was asked for.
This is often fairly unpalatable to bosses and customers, so it’s the option that’s most often swept off the table as soon as triage negotiations begin. Alternatively, you can lower the bar for bugginess in code (that is, the frequency and severity of situations where the software should do one thing but instead does another thing/nothing) either by spending less time hunting for bugs in the first place, or by finding and acknowledging bugs but choosing not to fix them. This is also a hard choice to sell.
Finally, you can sometimes produce work quickly that meets a set of specifications and is comparatively bug-free, by accumulating what is known as “technical debt.” Technical debt is, essentially, shoddy workmanship that’s not immediately obvious to the user. It’s the concealed flaw in the porcelain jug that means it works fine for now, but one day, just when you least suspect it, the crack will turn into a split under the weight of the water and you’ll be left with a handle in your hand and wet shards all over the floor.
So what form does technical debt actually take? First of all, you may recall from the last blog that I touched on this very question and described technical debt as a set of conceptual models that are a poor fit for the software’s functionality. This sort of technical debt is often caused by a change in functionality without enough time given to updating the conceptual model; instead, the old model is jury-rigged to meet the new requirements, and things get fiddly.
However, “bad model” technical debt can equally be incurred without any change occurring to the requirements if there is enough time pressure at the start of a project to prevent decent planning of the conceptual model in the first place.
The second form of technical debt arises from what we might call the Pascal Problem. Blaise Pascal, a fanatical devotee of the written word, wrote amongst other things a series of letters weighing in on the ecumenical beef between the Jansenists and the Jesuits in the 17th century. Realizing, when coming to the end of a particularly hefty epistle, that he really had gone on a bit this time, he wrote apologetically, “this letter is long only because I had not the leisure to write a shorter one.”
Software developers suffer the same problem. There are a million ways to write the code for a given piece of functionality, and some are more or less efficient than others, both in terms of how quickly a computer can execute the code and in terms of its brevity and ease of reading by a human. There are elegant and inelegant ways of writing the same thing, and normally coders first write code the inelegant, inefficient way and then go back and try to make it better. Shortage of time can cause developers to omit that final step.
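As a small illustration of the long letter and the short one, here are two Python functions that do the same job. The invoice data shape (a list of dictionaries with "paid" and "amount" keys) is an assumption made up for this sketch.

```python
# First draft: works, but is longer and fussier than it needs to be.
def total_of_paid_invoices_first_draft(invoices):
    total = 0
    for i in range(len(invoices)):
        invoice = invoices[i]
        if invoice["paid"] == True:
            total = total + invoice["amount"]
        else:
            continue
    return total


# After going back over it: same behavior, much easier to read.
def total_of_paid_invoices(invoices):
    return sum(invoice["amount"] for invoice in invoices if invoice["paid"])
```

Under time pressure, the first draft is the one that ships.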
Finally, there is a type of technical debt worth mentioning that has nothing to do with time pressure. This one is what’s occasionally known as “worse than failure” code, or WTF code. This sort of code is often ingenious, elegant in its own way, and does indeed do what it’s supposed to, but is still hugely problematic. It occurs when a developer gets hold of a novel idea, and applies it, completely inappropriately, to a problem best solved by a more conventional approach.
It happens more often than you’d expect because coders are a creative, ingenious bunch who are liable to fall in love with an idea and blind themselves to its faults. They’ll find a way of making it work, but sometimes the results are horrifying. Imagine popping the hood of a troublesome car to discover that the engine is of an entirely custom design, large parts of which have been intricately carved out of a single block of marble.
It’s beautiful, it’s clearly the work of a genius, and when it’s running maybe it works like a dream. But pity the poor mechanic who has to try to repair it when something goes wrong.
Technical debt matters when a bug is found and someone has to look at the code and try to understand why the bug is happening and how to stop it. If the code is hard to understand, or if changing one thing breaks something else, you’ll find that fixing bugs takes longer than it should, your deadlines will be jeopardized, and your team will be demoralized.
It also matters when you’re asked to add new functionality, and you find that once again, understanding what has gone before and adding to it without breaking other things is hard because the code is obscure or has unexpected side effects. Once again, progress will be slow, sometimes quite breathtakingly slow, deadlines will loom, and spirits will sink.
The best way to deal with technical debt is to stop it appearing in the first place. There are things that coders can do, things that automated tools can do, and things that you, the manager, can do.
What you want from the coders, of course, is for them not to write code that contains technical debt. And a major problem for them is recognizing tech debt when they see it. When they get stuck into the details of how each line of code works, it can be hard to take a step back and see the wood for the trees.
Normally tech debt is about the shape of a chunk of code rather than a problem with one particular line. This is another reason why code reviews by a second developer are particularly valuable: a fresh eye can spot awkwardnesses that the original developer, mired in the intricacies, is blind to.
Code review is, however, time-consuming and painstaking, and it would be better if there were a way of flagging up issues without requiring so much of another developer’s time. And lo, there is such a way. Static code analyzers (or “linters” as they’re occasionally known) are software programs that read the code and, rather than carrying out the instructions written, evaluate how well-written those instructions are.
They’re very good at spotting (and in some cases automatically correcting) poor formatting (lines that are too long, inconsistent use of spaces and line breaks, the sorts of things that make code marginally harder to read), but in recent years they have also been getting better at detecting more serious signs of technical debt. Some can measure the complexity of a file of code, and warn if it exceeds a particular threshold, using the reasonable premise that there’s (almost) always a simple way of expressing code, and the simpler code is better.
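To give a flavor of how a complexity check can work, here is a deliberately crude sketch using Python’s built-in ast module, which reads code without running it. It simply counts branching constructs; real linters use more careful measures, and the function names, the set of constructs counted, and the default threshold of 5 are all assumptions made for illustration.

```python
import ast

# Constructs counted as decision points. This is a simplification,
# not the exact measure any particular linter uses.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def complexity(source):
    """Return 1 plus the number of branching constructs in the source code."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


def check(source, threshold=5):
    """Return a warning message if the code is too complex, otherwise None."""
    score = complexity(source)
    if score > threshold:
        return f"complexity {score} exceeds threshold {threshold}"
    return None
```

A function with no branches scores 1, and every if, loop, or exception handler adds one more, so a high score is a hint that the code is trying to do too many things at once.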
Some linters can even spot what are called “code smells.” The term was coined by Kent Beck of XP/TDD fame and is used to describe a set of characteristics that, while not necessarily and fundamentally bad, nevertheless are generally indicative of code that could be better written. For example, there’s a code smell called “feature envy,” which is where one chunk of code makes use of lots of functions defined in one other chunk of code.
If this happens it suggests that the logic in the first chunk might belong in the second chunk or vice versa, i.e., that there’s something wrong with the way the conceptual model is divided into separate entities. A linter (or a developer) can recognize feature envy in a chunk of code and flag it up as a sign that the conceptual model needs some work.
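Here is a small, hypothetical Python sketch of feature envy and one way of resolving it. The Customer and Invoice classes are invented for this example. In the first invoice class, the label-building logic picks through the customer’s details one by one; in the second, that logic has moved to the customer, where the data lives.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    street: str
    city: str
    postcode: str

    # After the rework: the logic sits next to the data it uses.
    def address_label(self):
        return f"{self.name}\n{self.street}\n{self.city} {self.postcode}"


class InvoiceWithFeatureEnvy:
    """Reaches into Customer repeatedly to do work that is really about Customer."""

    def __init__(self, customer):
        self.customer = customer

    def address_label(self):
        c = self.customer
        return f"{c.name}\n{c.street}\n{c.city} {c.postcode}"


class Invoice:
    """Asks Customer for the label instead of picking through its details."""

    def __init__(self, customer):
        self.customer = customer

    def address_label(self):
        return self.customer.address_label()
```

Both invoice classes produce the same label, but only the second will keep working unchanged if the customer’s address format changes.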
Some teams will build linting tools into their CI pipeline, so that before the code can be committed into the main body of source control, not only do all the tests need to pass, but the linter(s) must give the code the thumbs up. This sort of constraint is helpful because without it developers, being subject to human weaknesses, will always be tempted to be lazy and ignore the warnings thrown up by the linter.
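In spirit, such a gate is just a script that runs each check and refuses the change if any of them fails. The sketch below is a bare-bones Python version, not how any particular CI product works; run_checks and gate are hypothetical names, and the commands to run are passed in by the caller.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each (name, command) pair in turn; return the names of those that failed."""
    failed = []
    for name, command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:  # by convention, non-zero means failure
            failed.append(name)
    return failed


def gate(commands):
    """Return 0 if the change may go in, 1 if any check failed."""
    failed = run_checks(commands)
    for name in failed:
        print(f"{name} failed; change rejected")
    return 1 if failed else 0
```

A team might call it with something like gate([("tests", [sys.executable, "-m", "unittest"]), ("lint", [sys.executable, "-m", "flake8"])]), assuming a linter such as flake8 is installed, and have the pipeline block the change whenever the result is not 0.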
Of course, even if in general your developers are in favor of having such requirements to save them from their own bad habits, that won’t stop them cursing said requirements vociferously every time they think they’ve finished writing a feature and the linter rejects their changes due to some trivial formatting error.
Code review and automated tools can only do so much, however. The number one tool for avoiding technical debt is time: time for your developers to try out approaches, evaluate them, rework them, and occasionally rewrite them from scratch. The more time they have, the more likely they are to find the well-judged, future-resilient, elegant, readable way of doing whatever it is they want to do.
And this is where you come in, because getting hold of time is itself a full-time occupation, and therefore something that your coders don’t have time to work on.
You, however, can work to get them that time. If you have bosses, customers, or clients pushing for things to be done quickly, you can push back. They can’t be expected to fully appreciate the long-term benefits of low technical debt compared to the short-term joy of speedy releases, and so it’s your job to fight that particular corner and make sure that those long-term considerations make it into plans and schedules.
Give your developers time and, unless they’re complete numpties, they’ll use it wisely to optimize their code in the less visible ways, and this will stand you in good stead down the line.
Of course, in reality you never manage to prevent technical debt entirely. It creeps in despite your best intentions, and at a certain point it reaches a level that is noticeably slowing down your development. How can you tell if you’re beset by tech debt rather than simply being saddled with slow developers or particularly hard-to-write features?
The best thing you can do is sit down near your team for a couple of hours and just listen: the greater the number and volume of expletives uttered at seemingly random intervals, the more tech debt your team is encountering. Nothing infuriates a developer more than having to work on lousy code. Or, if you want a more straightforward way of telling, just ask your developers.
They’ll be keenly aware of the extent to which the existing code is getting in their way. If they say there’s a lot of tech debt to wade through, believe them. At that point, you need to make some decisions about what to do about it.
Developers can normally untangle any particularly knotty code given enough time. Oh look, it’s our friend time again! It turns out time is the currency in which technical debt is both accrued and discharged, and yes, technical debt accumulates interest: it takes more time to clear it than it would have taken to avoid it in the first place. Once you’re serious about clearing your debts, you need to come up with a structured repayment plan.
There are two ways of going about this. The first is setting aside some protected time for working on technical debt. I once worked in an organization that had about ten years’ worth of tech debt in place, and we made a heavy dent in it by setting up fortnightly Tech Debt Thursdays, where the entire team devoted the whole day to making the existing code better without adding any functionality.
It reduced our output of new functionality by 10% in the short term, but we successfully convinced our non-technical boss that over time it would increase our velocity by significantly more than 10%, since the overwhelming messiness of the existing code base meant that we were going at a crawl anyway whenever we tried to deliver new features.
The developers liked it because I gave them free rein to pick any area of the code they liked to work on. They all had pet hates that were very well aligned with the most productivity-killing bits of tech debt, and they relished the opportunity to fix them up.
Sometimes, though, an explicit drop in output of even 10% in the short term is too bitter a pill to swallow for the higher-ups, in which case more covert approaches are necessary. My favorite is the Boy Scout Rule, that one should always leave the campground cleaner than one found it.
Applied to code, this means that every time writing a new feature or fixing a bug forces a developer to change a file with existing code, it is that developer’s responsibility to find a way, however small, of reducing technical debt in that file. A quick reworking (or “refactor” to use the vernacular) can often make a file clearer, or adjust the purpose of a part of a model to make it a better fit for the problems being solved by the software.
The Boy Scout Rule won’t fix the sort of tech debt where the conceptual model has gotten hopelessly tangled and large chunks need to be shifted around, but there’s a tremendous amount that can be achieved if done in small increments. Best of all, it’s a mostly surreptitious way of reducing tech debt: if questioned you can spin what you’re doing to say that you’re simply adhering to best practice in the process of writing new functionality, rather than stopping writing new functionality to fix old problems.
Sometimes, however, your developers will come to you and tell you that a particular piece of software is so bad that it can’t be fixed by reworking it; it has to be rewritten from scratch. When this happens, you need to be very careful about how you respond. It is very likely that the developers are correct, that there is a serious problem with tech debt. However, it is also very likely that the developers are incorrect when they say that the best response is a rewrite.
This is a peculiar phenomenon, and one about which coder, writer, and all-around genius Joel Spolsky waxed very lyrical nearly 20 years ago in a blog post entitled “Things You Should Never Do, Part I.” Broadly, his explanation is twofold. First, coders like building new things in general, in the way that all engineers do. It’s more fun to be the architect designing a new building than a maintenance person keeping an old building running.
Second, coders believe that code should be simple, elegant, and beautiful, and real-world code never is. Coders think they can do better. However, the reason why real-world code is never beautiful is that the real world is a complicated place, and there are exceptions to rules, obscure bugs to address, and flaws in external systems to accommodate.
For the code to work in the real world it needs some lumps and bumps. Therefore, Spolsky argues, if you rewrite from scratch you may start off with beautiful elegant code, but it’ll need the same adjustments to cope with the real world, so you’ll still end up with ugliness. The best you can hope for is less ugliness.
His conclusion is that in almost every case, since your goal is to achieve not beautiful code but slightly less ugly code than what you have now, the fastest approach is never to rip up what you have and start again but to slowly and patiently rework and improve what you already have.
Spolsky is, in general, entirely correct, and I would always recommend making his article mandatory reading for any team that has to work on old, ugly code. However, there are specific cases where it does make sense to rewrite things from scratch, and I’m going to close out the blog by looking at some of them.
The reason there was a case for a full rewrite there was that it had been written originally in a language called Java by an offshore development agency, and then the whole system was taken over by an in-house team, none of whom knew Java. Since the SSO was going to end up at the heart of our entire authentication system, that was a major problem—developers can normally busk a little bit of code in any language you care to name, but you shouldn’t be busking anything to do with security.
Similarly, the same team once had to deal with a website that was written as a client-side application (i.e., a series of scripts that ran in the user’s browser), when in fact various interactions with other components that needed to be added only made sense if the whole thing was shifted to run on a server. You can’t just move code from client-side to server-side; it’s like putting an outboard motor on a car and calling it a boat. The technology choices had to change, and that meant a full rewrite.
Second, if you have a very small application and you need to add so much functionality that the old code will end up as less than half of the final code base, at that point you may find that the old code would need so much refactoring anyway (because the conceptual model will have changed so much by the time you’re done), that it may be quicker to start from scratch.
This one is very much a judgment call to be taken with advice from your developers (and a hefty pinch of salt—remember that they will almost always be biased towards the from-scratch approach). But it is often the case that a significant repurposing of a small existing code base is slower than a from-scratch rebuild.
Finally, just because the desire many developers have to work on new, so-called “greenfield” projects is often a personal preference rather than an objective judgment of efficiency, it doesn’t follow that that desire isn’t important. We’ll look at this more in blog 9, but it’s really important to keep your coders happy. And sometimes, just sometimes, letting them loose on a from-scratch rebuild will be worth it just for the joy it gives the team.
There are many ways of assessing the quality of code, and the more you can do to set up formal (and ideally automated) tests to establish whether a piece of software gets the big green checkmark according to each different metric, the more you can be confident that your product will meet current requirements and that it will enable pain-free improvements down the line.
At every turn, however, comes a trade-off. Setting up systems for testing takes time, and holding your work to the highest standard means going more slowly than a quick-and-dirty approach. Sometimes it’s more important to take the short-term view, and simply get things done and worry about the consequences later. Hopefully, by now you’ve got a clear enough understanding of the implications of the different choices you can make to ensure you make the right trade-offs to suit your needs.
Survival in the Face of Reality
I don’t know whether you’ll have picked this up from the foregoing blogs, but I’m a pessimist. A cynic. A glass-half-empty kind of a guy. I’ve cultivated this outlook very deliberately over the last few years, because I find it to be extremely useful, professionally speaking. For one thing, it helps me combat the developer’s natural inclination towards over-optimism that makes accurate task estimation difficult.
For another, it leads me to prepare for the worst, which, from a technical project manager’s perspective, is very helpful, because the worst happens with charming regularity. But perhaps most importantly, it encourages me to look for, and therefore recognize, and therefore respond quickly to, things that aren’t working very well. A positive attitude is great, but seeing the bright side of things means deliberately not seeing the dark side, and it’s a short step from rose-tinted spectacles to full-blown denial.
In this blog, I’ve tried to point out the pitfalls that anyone working with coders may face, and to give my best advice as to how to avoid them. I’ve done my best to prepare you for the weird world that is software development, and to equip you with the tools to manage projects, products, and teams effectively.
However, things will go wrong. They just will. You should accept and embrace that fact, and look out for the signs that things are going wrong. I would be remiss if I didn’t end this blog by offering some suggestions for how you can respond when they do, and how to salvage from the jaws of defeat, if not necessarily victory, at least an honorable draw.
When your team hate each other
Have you ever had to work with people who just can’t seem to get along with each other? I have, and it can be the most painful thing in the world. Sometimes it will have started because one of them will have done some specific thing that infuriated the other, who will have responded in a way that caused equal resentment from the first party, and a cycle of escalation of enmity has ensued. Sometimes there doesn’t seem to be any particular cause; two people just rub each other the wrong way almost from first sight.
Either way, being in the middle of it is no fun. Communication has a tendency to break down, and you find yourself acting as a go-between. Or you find that one of your colleagues will respond badly to any suggestion, question, or plan that you bring to them if they suspect that it originated from a certain other colleague.
Any time you’re alone with one of them you’ll find them trying to get you to join in with a thorough defamation of the other, and you spend more time defusing quarrels than you do getting stuff done.
I don’t believe that developers are any more prone to this sort of feud than any other type of person, but it can be particularly destructive when two developers don’t get on. One reason for this is that software development is an intensely collaborative process that relies on constructive criticism: pair programming, code reviews, and so on are all designed to allow someone else to point out what’s wrong with your code, in order to make that code better.
But a forum for constructive criticism can turn into an opportunity for twisting the knife if approached with malicious intent. Similarly, a coder team depends on consensus. If two coders can’t agree on how to approach a large task, they can end up building the equivalent of a vehicle that’s got the front end of a pickup truck and the back end of a motorbike.
And there is ego in the world of software development just as there is ego everywhere else. I once hired a senior developer who, it turned out, had some serious insecurity. He was very intelligent and very experienced— much more so than me, which is why I had hired him. But therein lay the issue: since I was heading up the team, he reported to me, which put him on a par, hierarchically speaking, with the other developers who were largely less experienced than him.
I was slow to realize this at first, but he felt that his position in the organization didn’t reflect his senior status, so he tried to impress his seniority on the rest of us by other means. Specifically, he took it upon himself to point out “how it should be done” whenever anyone spoke to him about anything technical. This meant implicitly (and sometimes explicitly) criticizing how it currently was done, which was, in most cases, how his colleagues had done it.
This, as you may imagine, frayed some tempers. Ironically, it was also entirely counterproductive: far from engendering respect, his attitude had the effect of causing the others to dismiss him as a chronic whinger, meaning they were far less likely to take what he had to say seriously (even though, to give him his due, his suggestions were almost always on point!).
Things got awkward. In meetings, attitudes ranged from confrontational to sullen to passive-aggressive. Decisions couldn’t be made because any opportunity for a dispute was seized on, simply to provide an opportunity for antagonism. People sulked for days at a time and worked excruciatingly slowly during that period.
Now, very clearly it was my failure as a manager to address this problem early on that led to this situation. Mea culpa, and don’t I know it. But getting to the point where the team hated each other meant that I gained some valuable experience in how to get back from that point, and it’s that experience that I want to mention here.
My first attempt to solve the problem by talking was to introduce a sort of sprint retrospective, even though we weren’t technically working in sprints at the time. Every Friday afternoon we’d all get together and chat about what was working well in the team and what was going badly. This was, at first, an utter disaster.
There were two issues. The first was that, come Friday afternoon, everyone was tired out by a full week of work, particularly so since the fraught team dynamic made for an emotionally draining working environment. This meant that a time set aside for constructive discussion swiftly descended into venting and ranting, and everyone came away feeling worse than they had when they started.
We improved the situation by moving the meeting to a Wednesday morning when everyone was still relatively fresh and therefore much more civil. But the second, more deep-rooted issue with this meeting was, of course, that we already communicated very poorly as a team in meetings: that, indeed, was the very problem we were trying to solve. Once bad habits had been established between the parties who couldn’t get on, it was very hard for them to snap out of it.
So I tried a different tack and started taking my team out for coffees, individually, at various points during the working week. In a private and informal setting, we had frank but good-tempered discussions about what was and wasn’t working, what each person’s frustrations were, and what they and I could do about it. The new developer was receptive to the idea that throwing his weight around might not always have the desired effect of establishing respect amongst his peers.
Other team members acknowledged that always dismissing and shutting down the new developer’s contributions would make him feel insecure and might actually be a cause of his continuing criticisms. Finding a space where I could get each individual to look at the problem rationally, cool-headedly, and empathetically meant that I could take advantage of the fact that ultimately we all wanted the same thing: everyone wanted to get along, and those coffee breaks enabled constructive discussions about how to make that happen. We didn’t fix the problem entirely this way, but we did make some progress.
There’s only so far you can go by changing people’s attitudes to a situation, however. At a certain point, you’re going to have to change the situation itself, and this is what I worked on next. Part of the problem was that every conversation the team was having was about tech, and that was the topic that was grinding everyone’s gears.
It was time to find some common ground. Thankfully, one thing that everyone in this team had in common was a taste for beer, which made a good start. Despite personally being barely able to hold my drink, I started coercing everyone out of the office to the nearest pub a couple of times a week for a swift pint after work.
Somehow, evening socializing has a different flavor to a trip out for lunch. Knowing one is done with work for the day means one’s happier to forget about it and talk about other things. Guinnesses in hand, our team started chatting, and immediately found common ground: for example, it turned out almost everyone, including the new developer, was into rock climbing, and suddenly they were sharing tips about good places to go and agreeing on techniques, equipment, and so on.
This was the first time I had seen earnest agreement on anything, and for all it sounds trivial, I do think it was important for everyone to see that general agreement was possible.
One of the underlying causes of tension was that the backlog of work facing the team was so large, and at times felt insurmountable. For the existing team, it felt like they were being asked to produce more than was possible, which made it doubly irksome when the new developer started criticizing what they did manage to produce.
Their sense was that he didn’t accurately appreciate the time pressure they were under, which caused them to have to make compromises in order to deliver on time. This was particularly acute because I had assigned the new developer to work on a separate project with less time pressure for his first month, to ease him in gently.
He was thus isolated from the rest of the team and didn’t get to see the sort of context—deadlines, workloads, working practices—that framed their work. That was another mistake I had made, and I resolved to fix it.
If you want your team to act as a team it’s important to give them something to rally round. An easy but ultimately unproductive way of doing this is to find a common enemy. Tell everyone in the team that all their problems are the fault of incompetent upper management, and they’ll probably believe you, and they’ll come together in a shared hatred of the higher-ups.
Which is fine until you realize that that resentment and mistrust causes them to be less motivated to meet the targets and deadlines they’re assigned, and become more insular, communicating less and worse with the rest of the company.
So rather than find a shared enemy, I tried to find a shared goal. I moved around the roadmap to find a way of getting the whole team to work for a while on one single project, together, with an ambitious but not too ambitious deadline, to give them an achievable challenge to work towards.
Furthermore, I suggested to the team that we switch our working process from the Kanban style we were accustomed to over to two-week sprints, and I let them decide the details of what the process would look like.
This change put everyone, old team members and new, on a level footing, because we were all getting accustomed to something new, and we all had shared responsibility for making the process work. A new, shared project and a new, shared way of working helped us make good strides towards getting on better as a team.
I wish I could say that all it took was a couple of weeks of gentle tinkering with this sort of thing before everyone was getting on like a house on fire, but that’s not true. For one thing, that “gentle tinkering” was a brutal, exhausting process for me of running around being everyone’s punching bag when tempers flared and feeling like a failure every time someone on my team was in a bad mood.
And for another, while we achieved moments of real bonding and empathy, the rest of the time all we could manage in those early days was a cautious truce. It would take much longer for everyone to really settle in together, and what ultimately did it was hiring some new people. The new hires were relatively junior and inexperienced and could be mentored by the senior developer.
This finally gave him the recognition of status he had always needed, and he could divert his desire to suggest improvements towards guiding his charges, rather than criticizing the work of his peers.
I am entirely aware that I didn’t do a great job of getting my team to get along. Reading this, you can probably think of things you would have done in my place that never even occurred to me. But perhaps the point is this: When you are in charge of a team of people, it’s your job to get them to work productively together, and that means getting them over any personality clashes and squabbles.
Whether you’re any good at it or not doesn’t matter. You just have to go ahead and try anyway and keep trying until things improve. The good news is that even if you’re as cack-handed as me about it, you’ll probably eventually see some progress, so long as you keep at it.
When you’re horribly behind schedule
It’ll happen to you: for some reason, in your control or outside it, you’ll find that you’re working on a project to ship some software, and as you approach the deadline that you initially committed to with full confidence, you’ll realize that you’re going to overshoot by a country mile. Maybe the engineers’ estimates were wildly wrong.
Maybe a key stakeholder dropped in a massive new requirement halfway through the project. Maybe you forgot to account for a crucial task. Maybe all of the above and more. No matter how you get there, you will at some point find yourself horribly behind schedule.
What are you going to do about it? Well, there are two ways of interpreting that question. The first is, how are you going to get the project complete? As every project manager will know, any project is a balance of three things: time, resources, and scope. You can try to adjust any of these to change the outcome of a project.
Now, adjusting resources isn’t going to be very helpful to you. Adjusting the timescales for the project is an option, but be aware that that’s mostly a euphemistic way of saying that you’re going to miss your initial deadline and try to convince everyone else to be ok with it.
The final option is adjusting scope, which there is a good way and a bad way of doing. The good way is to reduce what the software does by removing features. The bad way is to reduce the amount of work needed to release the software, by skipping testing, ignoring bugs, and allowing technical debt to accrue. Note that sadly, most software projects cope with being off track by reducing scope the bad way.
How you approach getting the project to completion is for you to decide based on the priorities of the business as a whole. I’ve already said about as much as I can about the different factors that can inform this decision, and I won’t repeat myself. Instead, I want to look at the other interpretation of the question of what to do about it when a project is behind schedule.
Think about it for a second. You’re in the position, uniquely in your company, of knowing exactly what needs to be done and by when, knowing how much of it already has been done, and knowing how long the rest is expected to take. You realize that the math doesn’t add up, and you’re going to miss your deadline. Your developers probably don’t know it, and nor do your boss and the other stakeholders. What do you do? As in, who do you tell, and when, and how?
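To make "the math doesn’t add up" concrete, here is a minimal sketch of the sort of back-of-the-envelope projection I have in mind. Everything in it is an assumption made for illustration: the function names, the dates, and above all the naive idea that remaining effort divides evenly across developers who work Monday to Friday with no holidays or interruptions. Real projects are messier, so treat the output as a conversation starter rather than a forecast.

```python
from datetime import date, timedelta
import math


def add_working_days(start: date, working_days: int) -> date:
    """Return the date reached after `working_days` Mon-Fri days following `start`."""
    current = start
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def projected_slip(today: date, deadline: date,
                   remaining_effort_days: float, developers: int):
    """Return (projected_finish, calendar_days_late).

    Hypothetical helper: assumes effort splits evenly across developers.
    calendar_days_late is zero or negative when the project is on time.
    """
    days_needed = math.ceil(remaining_effort_days / developers)
    finish = add_working_days(today, days_needed)
    return finish, (finish - deadline).days


# Made-up example: 60 person-days of estimated work left, 3 developers,
# today is Monday 4 March 2024 and the deadline is Friday 22 March 2024.
finish, days_late = projected_slip(date(2024, 3, 4), date(2024, 3, 22), 60, 3)
print(f"Projected finish: {finish}, calendar days late: {days_late}")
```

The point of running a calculation like this early is that it gives you a single, honest number to communicate, rather than a hunch you keep revising.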
As always, it’s up to you, but here’s one suggestion: don’t be like British train station departure boards. Britain is dependent on its trains, with over 1.5 billion train journeys made each year. It’s one of the most common forms of commuting. And while 90% of trains in the UK run on time, if using a loose enough definition of “on time,” there are delays, which are most noticeable when you’re waiting on the platform for your train to pull in.
On almost every platform in the country, you’ll find electronic departure boards, listing which are the next trains expected to arrive, what routes they’re traveling, and what their expected departure time is. There’s a curious phenomenon that you’ll notice if you arrive on the platform a little early for a train that ends up being late: at first, the expected departure time displayed will only be a minute or so after the scheduled time.
Oh well, you’ll think to yourself, the train’s slightly late but I might as well stay on the platform since it’s only a delay of a few minutes. But as the minutes tick by, you’ll notice that the expected departure time has slunk back a minute. Then another minute. Then a couple of minutes more.
At every stage, the board will claim that the train is just a few minutes away, so you might as well stick around. But it will keep deferring the expected time of arrival until up to half an hour has passed.
Now, I cannot prove it, but I am convinced that this is a deliberate tactic on the part of the rail operators to try to pacify customers. Trains run on long routes, and when they’re delayed it’s very likely they’re delayed at the start of their journey, departing late. In which case, from the moment the train actually sets off, it is known how far behind schedule the service is, and that could be hours before the train arrives at a particular station.
Therefore, ten minutes before the scheduled arrival at that particular station that’s quite far down the line, the scale of the delay must be known. And yet the departure board almost always merrily proclaims that things are only a couple of minutes behind schedule.
This attempted mollification is misguided. Not only is the constant revision of predictions more enraging than an honest admission at the outset, purely for its own sake, but also, by continually promising that the train is about to arrive, travelers are prevented from making other plans.
I never think I have enough time to nip over to the other side of the station to have a cup of tea and a sit down at the cafe, but nine times out of ten, by the time the train finally arrives, it’s clear I would have had ample time so to do.
So when it comes to revealing project delays, I beg you: don’t be like British train station departure boards. Don’t try to soften bad news by hiding the extent of the problem at first. You may get a milder response at the start, but you’ll be sabotaging your stakeholders’ ability to respond to the problem if you don’t fully inform them at the outset, and you’ll draw more ire later.
When your product just isn’t very good
Sometimes you’ll ship bad software. I don’t mean software that’s riddled with bugs or plagued with technical debt. We’ve covered that sort of thing already. I mean software that does do what it’s supposed to, but what it’s supposed to do isn’t very good.
Your users don’t engage with it, because it doesn’t feel to them like it satisfies a need or desire that they have. No one raves, no one reviews it, and after a time no one uses it. What do you do when you realize you’ve built a dud?
The first thing to point out is that failure isn’t necessarily a bad thing. A particularly prevalent mantra in Silicon Valley and the world of startups is “Fail fast, fail often.” The idea is that, if you’re into lean product development, it’s only through releasing products and gathering usage data that you learn about what your target audience wants and needs. Finding out what they don’t respond well to is one of the most common and useful ways of gathering data.
So each product “failure” is in fact very valuable, so long as you can minimize the resources you spend in order to achieve that failure. If you can iterate your product rapidly enough, you’ll continually refine and refine your product proposition, through trial and error, until you eventually hit upon a formulation that does actually work.
Or, to put it more concisely, we can borrow a quote from Jake the Dog, a character in the animated TV show Adventure Time: “Dude, suckin’ at something is the first step to being sorta good at something.”
That being said, it doesn’t follow that after launching a piece of bad software, the right move is always to dust yourself off, change the software, and launch it again. Recognizing when to quit is hard, especially if you’ve already committed a bunch of money to a project and are desperate to recoup your losses. I’ve got two examples to share from my career so far of horses that were pointlessly beaten post-mortem:
One company I worked for had access to lots and lots of users and launched a piece of software—a mobile app—that offered a free trial for all those users, after which they had to start paying a subscription fee to keep using it. We ran the numbers and figured out that for every 100 people who installed the app, we only needed one of them to buy a subscription in order to make the product viable (although we were hoping for a much higher number).
In the product jargon, we needed a 1% conversion rate. We launched the app with much fanfare and got a bunch of users to try it. After a couple of weeks of sorting out initial bugs, we started tracking our conversion rate…and it wasn’t good. We were failing to get more than 0.05% of users to buy a subscription. Our conversion rate was 20 times less than the minimum we needed.
This meant that the running costs of the service the app offered massively outweighed the revenue it generated, and if we carried on the way things were going we’d run the business into the ground.
The boss’s response was to order a full redesign of the app’s user interface. The problems, in his eyes, were that the sign-up process was too long and people were getting bored and giving up before they even got started, and also that the app looked and felt clunky, and people weren’t convinced that the service was worth spending money on.
If we could redesign the user interface to make it look slicker, and change the UX design so that new users were dropped straight into their free trial without a complicated sign-up, we’d convince far more of them of the app’s merits.
So we tried it. We spent several months working on the new UX and UI designs, and with much fanfare we re-launched the new app. We fixed some initial bugs, then started tracking the conversion rate. The good news was that we massively increased it. In fact, with the new design, we even managed to double it. The bad news was that that still left us with a conversion rate of 0.1%, which was an entire order of magnitude less than what we needed.
With hindsight, we should have seen that coming. UX and UI changes, while they can significantly improve the conversion rate, can’t work miracles. For companies dealing with very high volumes of users and decent conversion rates, a marginal increase can mean millions of dollars of improved turnover. But what we needed wasn’t an increase. We needed a revolution. In our case, the problem was simply that the service we offered just wasn’t particularly valuable to the users we had access to.
It’s easy to say these things in retrospect, but in that particular instance, the correct move wasn’t to adjust the software. Based on what we learned from that first disastrous launch, either we should have cut our losses and ditched the product, or we should have invested in reaching a different set of users, who might have had more of a desire for what we had to offer. UI adjustments were a deckchairs-on-the-Titanic response.
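The arithmetic behind this story is simple enough to sketch in a few lines. The rates below come from the account above (1% needed, 0.05% at launch, 0.1% after the redesign); the cohort size of 10,000 installs and the function names are assumptions chosen purely so the numbers come out as whole subscribers.

```python
INSTALLS = 10_000  # hypothetical cohort size, chosen for round numbers


def conversion_rate_percent(subscribers: int, installs: int) -> float:
    """Percentage of installs that turned into paid subscriptions."""
    return 100 * subscribers / installs


def shortfall_factor(required_subscribers: int, actual_subscribers: int) -> float:
    """How many times more subscribers we needed than we actually got."""
    return required_subscribers / actual_subscribers


required = 100        # 1% of 10,000 installs: the break-even point
at_launch = 5         # 0.05% of 10,000 installs
after_redesign = 10   # 0.1% of 10,000 installs

for label, subscribers in [("launch", at_launch), ("redesign", after_redesign)]:
    rate = conversion_rate_percent(subscribers, INSTALLS)
    gap = shortfall_factor(required, subscribers)
    print(f"{label}: {rate}% conversion, {gap:.0f}x short of break-even")
```

Seen this way, doubling the conversion rate only moved us from 20 times short of viability to 10 times short, which is why a redesign alone was never going to be enough.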
My other example comes from a different, much larger company. Here, one of the higher-ups had decreed that users of another service that I worked on should be given a social network to allow them to interact with one another. The inclusion of a social network was seen as a way of modernizing the company, adding value to users through a cutting-edge digital platform.
Note that the social network got signed off before anyone had really thought about what the users would use it for. By the time a product manager was assigned who started researching this question, budgets had already been allocated, timelines had been agreed upon, and the advent of the network had been mentioned to the press via an interview with the boss.
Despite some initial doubts about the value of the project, the product manager decided that since he’d been asked for a social network, a social network was what he’d build. He hired a design agency to come up with a design, and then a software agency to build a prototype. The prototype was passed to a group of beta testers, who logged on enthusiastically, connected to all their contacts and then…stopped using it.
Mostly because there wasn’t much for them to do. They could post statuses, and reply to each other’s statuses, but there were far fewer options for that sort of thing than there were on Facebook, and far fewer people could see what they said than if they posted on Twitter. It was “fine,” and “looked nice,” according to feedback, but that was it.
The product manager realized this wasn’t great, so decided to go back and add a couple more features to promote user engagement. An automated feed of content from the company’s other services was included, and more ways in which people could build out their profile. But the results were the same: the product was perfectly harmless and looked pretty, but there was no real incentive for people to use it.
At this point, the product manager began to be deeply concerned that there was no actual demand for a social network, but by then serious money had been spent, and his own manager informed him in no uncertain terms that if the social network product wasn’t successful, both their heads would roll.
So the product manager went back to the drawing board and tried to come up with a reason to get users excited about the network. Maybe, he thought, it could be used as a professional networking tool, since lots of the users worked as freelancers in similar industries, so they might benefit from expanding their network of contacts.
So the product was reworked: the status updates were killed, and the system was set up to focus on making “connections.” If you connected with someone, you got their contact details so you could interact with them in the real world.
Yet more money was spent on design and development agencies. A new prototype was built and was pushed to yet another group of users to test it. And once again, after initial enthusiasm, usage dried up. This time the problem was that the only way users could find people to connect to was by searching for names, and for the most part, the only names they knew to search for were people they already knew in the real world. But because they already knew them, gaining access to their contact details was pointless: they already had those details.
Watching this poor product manager bounce from uninspired prototype to uninspired prototype was a pitiful experience. The product had literally no potential, and he knew it, but the project wasn’t allowed to die thanks to office politics and vanity. The social network was only finally killed six months later when, despairing, the product manager left the company.
His manager could then finally can the project without losing face, saying: “I still maintain that the social network could have been a winner, but the inept product manager bungled it so irretrievably that it never lived up to what it could have been.”
Sometimes a dud product is just a dud. If you can never answer the basic question of what people would want to use your software for, then every penny you spend on it is wasted money. When the product is terrible because the idea behind it is terrible, be bold, bite the bullet, and kill the product.
That’s it, then. Over the past hundred thousand or so words I’ve set down in writing more or less everything I know about how to survive and thrive in the topsy-turvy world of software development, based on my experiences over the past decade. We’ve covered everything I’ve learned about how to build software successfully, and now we’ve also looked at what to do when you’re not successful.
Thanks for sticking with me to the end, and I hope you come away from this blog slightly better informed and better armed against the traps and pitfalls that await you on your own journey through the world of code. Good luck to you! Maybe we’ll come across each other again someday. The world of software development is, after all, pretty small.