These notes are short, self-contained, articles vaguely relevant to program design. My experiences as a working programmer influence the choice of subject, but I do not attempt to justify the notes by reference to what worked or did not work in practice; instead I will try and justify what I say from first principles, or at least by references to published books and papers.
To navigate through this long single page, here is a list of the main headings with internal links to them:
Multi-threaded programs often protect their data structures from simultaneous access by defining critical sections protected by locks; this is common enough to be supported, for instance, by the Java Programming Language (e.g. section 14.18 of the second edition of the Java Language Specification). This note describes how to design a program in such a way that it can be proved not to deadlock.
A group of threads are deadlocked when none of them can ever progress because each is waiting on a lock owned by another. We can summarise such a situation by drawing a "wait for" graph. The nodes correspond to threads. The directed edges show that one thread is waiting to acquire a lock held by another. Given such a graph, we might attempt to find a thread that is not waiting by moving along the edges, from one thread to the thread owning the lock it is waiting on. If there are no cycles in the graph, this attempt must conclude by finding a thread that is not waiting on any other. That thread is free to proceed, and there is no deadlock; we presume that it will eventually release the locks it is holding and allow others to proceed. If there are cycles in the graph, the threads on those cycles can never proceed. Suppose they could and look at the first thread to proceed. Since it is the first thread to proceed, the thread it is waiting on (its successor in the cycle) can not have released its lock, so we have a contradiction, proving that all the threads on the cycle are stuck and waiting on each other.
We can prove that a program cannot deadlock by proving that cycles in the wait-for graph can never occur. Here are some ways of doing this, in order of increasing complexity.
There is a very thorough, probably definitive, treatment of locks in more complex data-structures in chapter 7 of "Transaction Processing: concepts and techniques", by Gray and Reuter.
The implicit lock ordering strategy needs help with callbacks. Code making a callback cannot control, or even anticipate, which locks the callback code will attempt to acquire. The usual practice is to use the critical section to copy the information required for the callbacks to a buffer created for just this purpose, to exit the critical section while holding the buffer, and then to make the callbacks without holding any locks at all. This rules out the possibility of deadlock, but the code inside the callback must realise that the information it receives with the callback may no longer be up to date; perhaps some other thread has entered the critical section and changed the information it contains in the mean time. The code in the callbacks might, for instance, treat the information it receives as simply a hint, or a tip-off, and enter the critical section itself to see what is really going on.
Very often a program will have threads that modify data structures reflecting the current state, and threads that view these data structures and provide information to a human user, or to another program. The viewing threads could either be provided with callbacks summarising the changes made as they happen, or they could be provided with callbacks that just say that something has changed, and that they may wish to enter a critical section and read the changes for themselves. The latter option can sometimes be more robust in several senses. Because the data structures are read directly, there is less risk of a long standing error at time t=1 affecting the interpretation or application of change notifications from then on, through time t=100 and beyond. Any change to the data structures is a new chance for the viewer to start from scratch and get it right this time. Under load, it is possible that notifications of change will be sent in more frequently than the viewing thread can deal with them. In these circumstances the information provided by the viewing thread can still be a consistent view of the most up to date information it can produce. The system cannot build up a backlog and grow steadily more out of date. A careful observer may notice that some intermediate states are not visible, but if this is known (and acceptable) this may be a worthwhile trade-off. Even if all of the changes that enter the system lead to work, a notification plus reread strategy may be more efficient, if it is easier to work on one large batch of changes than on several small batches. In this case, the shared data structure might just be a large queue of changes, and the viewer might simply respond to a notification by emptying this buffer and dealing with all the pending changes in one batch. Under load, larger queues will have built up when it finishes one batch and returns to start dealing with another, and this may make it more efficient than a design that always repeatedly pops off the oldest change not yet dealt with and deals with it in isolation.
As background, let me remind you of the big difference between network I/O and file I/O, even when the same system calls are used for both. Reads and Writes on a network socket do not necessarily transfer the number of bytes asked for, and may hang waiting for the other end to accept or produce the bytes being transferred. Forgetting this is a common cause of error in networking code at the socket error. The 'back pressure' arising when one machine refuses to provide, or, more commonly, accept data, is in fact a valuable source of co-ordination that does a useful job when it stops one machine from swamping another.
I was told an illuminating story about one of the early packet switching networks. Its architects had looked at other networks and observed that they tended to lose packets under load; sometimes they even threw them away deliberately. They decided that their system would be better than that; under no circumstances would their nodes be allowed to drop or lose packets. Soon after their network was switched on part of it became congested, and it did after all end up dropping packets, in the only way open to it. Those nodes receiving packets faster than they could dispose of them accumulated a steadily increasing backlog of packets until they ran out of memory and crashed, dropping all of the packets then in their memory. The nodes in IP networks are deliberately designed to drop packets when they must, carefully choosing which packets to drop, and the higher level protocols (especially TCP) are tuned to regard packet loss as a sign of congestion and back off.
Anybody planning to link two computers together must consider what will happen if one of them tries to demand more of the other than it is able, or willing, to give. At the level of TCP connections, the put-upon machine can delay sending data in writes, or delay receiving data from reads, and the TCP calls on the more demanding machine will hang, or return error codes, depending on whether non-blocking IO is in use or not. Ideally, this back-pressure should cause the system to make best use of such resources as it has. What it should not do is drive it into another variety of deadlock, where two machines are blocked on writes because neither is ready to read, or nothing in the system can be moved because every buffer is already full.
It is fairly easy to show that this cannot happen for the simple arrangements most systems use in practice. A pipeline of machines starting with an arbitrary source and terminating in a sink that can receive an arbitrary amount of data will not deadlock if every machine able to accept data does so, and every other machine that has data attempts to send it downstream. If no data is flowing, consider the most upstream machine that could receieve it. There must be one, because the sink can receive, if nothing else. The machine upstream of our starved recipient cannot receive, and so has data. But under our rules it should, and can, be sending it down, so we have a contradiction. In fact, back pressure ensures that simple pipelines have a chain of full buffers exerting back-pressure from the slowest system upstream to the source, throttling it down to send data no more quickly than the slowest system can process it, and a chain of empty buffers below the slowest system, as systems downstream of it pass on data as fast as they can get it.
The typical client-server request-response is a pipeline whose sink and source are located on the same machine; the client. By our argument, this will not lock up if sink and source can function independently. It can (and sometimes does) lock up if this is not the case, for instance if the client tries to write all of the body of a long request before reading the response, and server(s) are capable of transforming requests into responses piecemeal, but not of holding all of a long request. More complex systems can be quite vulnerable to full or partial gridlocks; ensuring that this does not happen is a non-trivial part of advanced routing schemes for parallel computers. As with deadlocks between processes over locks, one possible scheme is to ensure that cycles in a dependency graphs cannot arise.
I am in favour of people working alone when they can. For instance, a Sandia Labs study by George S. Davidson found that, for complex problems, individuals working alone produced better solutions than group brainstorming, using any of the currently available methods and collaboration tools (News Release, Sandia National Laboratories, November 29 2007). Nevertheless, it can help to have to explain something to somebody else; it is not unknown to answer a tricky question yourself when asking it of somebody else, without them ever saying a word. This is where the teddy bear comes in. According to "The Practice of Programming", by Kernighan and Pike, it is said that a computer science laboratory offering a help service required their callers to explain their problems to a teddy bear before approaching the duty helper. A significant number of their clients found that they needed only to talk to the teddy bear. I was reminded of this when a classical musician said that she often used a teddy bear as an audience for recording sessions. Peer review is recommended more often than it is practised. It requires two people's time, and can be traumatic for both of them. I don't really know how gifted teddy bears are at peer review or software walkthroughs - but they don't cost much; it might be worth a go.
Kernighan and Pike recommend Teddy Bears for debugging, and also describe a large number of very helpful tactics much more specific to debugging; at the risk of starting an argument, I will mention that "The Pragmatic Programmer", by Hunt and Thomas, suggests a rubber duck. Any readers of this document may consider themselves colleagues of Nimrod, who sits about 20cm tall, and was recently recruited, for 99p, from a British Heart Foundation charity shop. His main contribution, so far, is to remind me to at least mouth my words as I review them. If I reread my own writing without doing this I find that my concentration on what I am reading decreases steadily, until I am doing no more than glancing at the words as I progress through it on autopilot.
The cost of peer review, per line of code or documentation, varies with the rigor of the peer review. It also varies with the intellectual demands of the material being reviewed, but large changes in this are uncommon (and should stay so; any requirement for extraordinary gifts or concentration is a major failing in a design, not a sign of excellence). The cost of failing to find a mistake varies widely, and the amount of time spent on peer review should reflect this. Even within a single system, the costs of a mistake varies widely, depending on where and when it is made. Mistakes at the requirement stage could invalidate all subsequent work. Mistakes at the design stage may invalidate coding, but mistakes during coding may be fixed without further knock-on changes (unless, of course, they reveal mistakes made at earlier stages). So it makes sense to be more careful reviewing and checking designs than code.
"Writing Effective Use Cases", by Alistair Cockburn, is an exceptionally good book on use cases. These can be very useful even when computers are not explicitly involved. For instance, Cockburn describes use cases for a business process as a whole. I have been at least reminded of them when considering the purchase of a new car, and when planning the partial refit of my kitchen. The questions "when do you actually plan to use this feature?" and perhaps "what did you use the current one for, in practice?" should come up automatically when you prepare a set of use cases, and your final use cases should answer them.
A use case describes the behaviour of a system receiving requests from something outside it, known as the primary actor, in order that the primary actor, and perhaps others, can accomplish some goals. A simple use case might take the form of a dialog, in which the primary actor makes a series of requests, each resulting in a response from the system, More complex use cases may introduce other participants, and may document more than one possible flow of events. For instance, errors may occur during processing, which will either be corrected, restoring the situation so that it carries on as normal, or be uncorrectable, in which case a failure occurs. The usual conventions sacrifice complete generality; they do not allow you to specify the entire conditional logic of any possible program within a single use case. In return, it is quite likely that, if you stick within these conventions, what you write will be understood, and understood to mean what you meant to say.
Use cases describe part, but not all, of the requirements of a system, as a series of concrete examples. Because they are specific, they can be easily checked, and they describe things in terms familiar to those who have practical experience of similar systems. Because they are merely particular examples, they do not provide a complete description of the behaviour of any system, and there are some areas that they don't touch on at all; in "Effective Use Cases" section 1.10, Cockburn estimates that use cases cover perhaps a third of all requirements.
A requirements specification could state that a system's behaviour was to be only that specified by the use cases. This would almost certainly not be complete in theory, because an impractical number of different use cases would be necessary to list all possible events with which a real system would have to deal. That gap could be filled by 'common sense' judgements that would in fact have to draw on a large pool of shared background knowledge (there is more, for instance, to building an automatic car number plate recognition system than acquiring a large database of pictures of number plates annotated by the ASCII characters displayed by the number plate in the picture).
In practice, simply defining the behaviour of the system to be a reasonable interpolation of a set of examples given as use cases would not ensure that the design of the system was complete. This does ensure, by definition, that reasonably competent implementors will not produce an incomplete implementation; the implementation is complete if it can fulfill all the use cases in the design. It ignores the probability that the use cases do not in fact exhaust the system behaviours that are actually required in practice, and there is no theoretical consistency check that can be used to establish completeness of a set of use cases. Cockburn section 4.1 suggests that brainstorming sessions be used to generate lists of stakeholders (anybody affected in any way by the operation of the system), and, for each stakeholder, the interests of those stakeholders in the system. He also suggests (section 1.4) that writers brainstorm for failure cases, and ask how the system is to cover them, reporting that this regularly uncovers important information. Another approach is simply to generate more and more use cases until it becomes clear that the recently generated use cases are largely covered by earlier use cases, so that no progress is in fact being made. Chapter 13 (Taking stock of the specification) of "Mastering the Requirements Process" by Robertson & Robertson suggests that you look at the system boundary to find "Business Events" (abstractions of the interactions between the system and its environment) and check that the requirements prescribe the necessary behaviour to cover all of these events, and also any work required if one of these events should have happened but did not.
There are plenty of examples of systems that need to work well in cases that occur only rarely. Defence against malicious input is an obvious example. Medics need to be aware of the existence of rare but life-threating diseases that they may never see in their entire career. In "The Art of Computer Programming Volume 2" Knuth mentions (section 4.3.1) that "In the case of Program D, it is also necessary to find some test cases that cause the rarely executed parts of the program to be exercised; some portions of that program would probably never be exercised even if a million random test cases were tried. (See exercise 22.)." Program D is not some esoteric tour de force of theoretical computer science; it divides one positive multiprecision integer by another, using a version of the algorithm taught to all of us in primary school.
Normally the frequency with which a particular use case is exercised in real life is not important; it has to be implemented all the same, whether it is used once a second or once a millenium. This is just as well, because people aren't good at estimating such things. Here is a quote from "The Effective Executive", by Peter Drucker:
I sometimes ask executives who pride themselves on their memory to put down their guess as to how they spend their own time. Then I lock these guesses away for a few weeks or months. In the meantime, the executives run an actual time record on themselves. There is never much resemblance between the way these men thought they used their time and their actual records.
More recent studies have provided subjects with a bleeper that goes off at random intervals and asked them to write down what they are doing when it goes off.
Use cases are also examples. The respectable, mathematically rigorous, job for an example is as a counter-example; it proves that some theory cannot be true by acting as evidence against it. One black crow proves that "all crows are white" is false, but no amount of black crows can prove that "all crows are black". This is similar to the role of use cases in testing, except that here we are testing a theory, instead of a program.
This apparently trivial role can be very powerful. A single example might save large amounts of time spent searching for a way to do something, by showing that what is being attempted is impossible. One real life example is the problem of summarising alarm logs to calculate Service Level Agreement penalties. If the penalty depends only on the number of alarms generated and the time the system spent in the alarmed state, then a count of instances and a count of seconds in alarm state will summarise an alarm log of any desired length. What happens if the Service Level Agreement needs to know when both the primary and backup system were in alarm at the same time? Can we produce summaries of the logs from both systems, of a similarly short and predictable length, in such a way that we can combine the two to calculate the Service Level Agreement penalties?
After some time spent searching for such a scheme, I considered the case when the log from the first system contained only a single, very brief alarm. The combined SLA penalty then tells us whether the other log showed that its system was in the alarmed state at this instant. By choosing between a large number of single-alarm logs I can therefore reconstruct the entire alarm history of the second system from the second log. Therefore the 'summarised' alarm history for the second log has to contain all the information present in an ordinary log file, and no equivalent of "number of alarm instances and total time in the alarmed state" is possible.
Examples may allow you to discard possible solutions very quickly. This can be useful because sometimes there is a simple and effective solution whose merits aren't immediately obvious. One way to search for such things is to consider a list of simple solutions, quickly discarding those that the examples show to be faulty, and then spend more time concentrating on the remainder. Another way is to work on simplified questions that you can actually solve, in the hope that the solutions to these special cases will at least hint at the general solution. Ideally, this approach would work by finding the simplest problem capable of being solved only by the general solution, solving it, and then observing that this solution works for the general case as well. If the difficulty of answering questions grows rapidly with their complexity, then a strategy of answering questions in increasing order of complexity will be reasonably efficient even if the work of answering all but the final question is wasted; the work spent on answering earlier questions will amount to only a small fraction of the work spent on the final question. The strategy known as iterative deepening relies on this.
Only a fraction of the work done on a new design is justified regardless of circumstances and context; universally valid and useful components tend to become absorbed into utilities, infrastuctures, and languages, as their repeated usefulness and applicability becomes known. Most work is justified by reference to requirements, and is wasted if that requirement is incorrect. Transmitting even a single new idea demands much of both reader and writer, and a requirements document with less than a hundred requirements is small. It is difficult to review and evaluate an idea before seeing it in action, so even perfect communication of requirements would not guarantee that all work done is useful. It is therefore extremely likely that work will be wasted due to a mistake in, or a mistake in the interpretation of, at least one requirement.
Recognising this risk is a big step forward, because we can then benefit from a whole subject in its own right, the subject of risk management. Risk management is an agreed best practice of project management; it is mentioned, for instance, in "Managing successful projects with Prince2" (UK Office of Government Commerce; I cannot resist reacting to the title by pointing out that the title of the book appears to make it applicable to only a small proportion of UK government projects).
I like the short section "Defining and Managing Risk" in "Information Systems Project Management" by Jolyon Hallows. It lays out a five-step process for managing risks: identify them, categorise them, mitigate them, plan for them, and manage them. I have already identified a possible risk: "incorrect requirements," so let's move on to the next stage.
Risks are categorised by severity, and severity reflects two components; the probability of a risk occurring, and the impact if it does. According to the chapter "Software-Development Fundamentals" in "Rapid Development" by Steve McConnell, the typical project experiences a 25 percent change in requirements. According to "Facts and Fallacies in Software Engineering" by Robert L. Glass, one of the two most common causes of runaway projects is unstable requirements (Fact 23) and requirements errors are the most expensive to fix when found during development but the cheapest to fix early in development (Fact 24). Therefore the risk associated with requirements has both high probability and high impact; it should be managed as a matter of routine.
We now attempt to mitigate the risk by reducing its probability, its impact, or both.
We can reduce the probability that requirements will be misinterpreted by getting closer to the customer. Every attempt at communication assumes a certain amount of shared background knowledge on which to build; even messages beamed out to space in search of alien civilisations do this, sending descriptions of scientific facts or pure mathematics, in the hope that the message will illuminate the medium rather than vice versa. I like to try and find out the background to the customer's requirements and the customer's business; if our software will be controlling equipment, I might search the web for general information about the equipment, from sales brochures to Wikipedia articles, as well as reading and re-reading the ICDs (Interface Control Documents) provided by the customer. An organisation can get closer to the customer by literally inviting representatives over, to act as SMEs (Subject Matter Experts) so that they can provide informal explanations and advice at short notice.
One way of reducing the impact of a mistaken requirement is to find the mistake as quickly as possible. One well-known way of confirming that you have understood a message is to explain it in your own words to the sender. Projects with an early analysis phase typically allow (or, better, invite) the customer to review it. The tricky bit is writing something that both the customer and the developer care about, in language that they both understand. One idea is to start writing and agreeing the acceptance tests as soon as possible. These are good concrete examples, which may interest the customer, and should reveal any discrepancies between customer's and developer's understandings of more abstract concepts. This approach would be natural, and relatively quick, if use cases were used. Another approach is to concentrate on a small number of critical requirements; we might expect that a few of the requirements have a very large influence in the design of the system. Perhaps one or two requirements are greatly increasing the cost; does the customer really need them? Perhaps the design will be completely invalidated if a few assumptions deduced from the requirements do not hold; can the customer confirm them? If you set out to explain to the customer what you understand by their request you risk boring both parties. If you set out to tell the customer which of their requirements are driving the proposed solution, you are telling them something new; even if they are also expert implementors of computer systems, you will be telling them which of several possible approaches you are taking, as well as the consequences of that approach.
The next opportunity for mistakes to show up will come from any paper prototyping (see GUI design and the Cognitive Walkthrough), then results from real prototyping, and then the scheduled completion of particular tasks. If one task capable of validating each requirement is scheduled to finish as early as possible, the risk of subsequent failures will be lower.
A project may attempt to transfer its risks, usually back to the customer, by getting them to sign off on particular approaches or design choices. In the best case, this motivates the customer to review the developer's reasoning and detect mistakes early. In the worst case, this is a zero sum game of developer against customer. This will not attract repeat business.
Planning for risks covers contingency plans, where you decide what you will do if the risk comes to pass. Modular development, and evaluating designs against likely changes, are instances of this. See Parnas and Modular Design.
Managing risks amounts to keeping the whole thing under review. We might note down when evidence is found confirming a requirement. After a mistake is found, we might search for other similar mistakes. Reviews should be scheduled just before starting major tasks. If the requirements are inspected only at the beginning of the project and at the end, those searching them at the end will be looking not for illumination but for arguments to use against the other side.
After writing that "one well-known way of confirming that you have understood a message is to explain it in your own words to the sender" I searched for early examples of this, and found none. A more common approach is for those making a request to monitor the performance of that request. In "the effective executive" Drucker states (Chapter 6: "The elements of decision-making") that "...military organizations learned long ago that futility is the lot of most orders and organized feedback to check on the execution of the order. They learned long ago that to go oneself and look is the only reliable feedback." (Numbers 13 v. 31-33 in the Old Testament is an early illustration of the unreliability of reports; one commentary states that this certainly existed by the fourth century BC, and that many people date it much earlier). "Ship It!", by Richardson and Gwaltney, advises customers to arrange for frequent demos (Chapter 5, item marked with a triangle and the number 27, points out that "Demos are much better than documentation; demos are real, and documentation is just talk."
I once considered that designers were burdened by interfering customers badgering them for information and overloading them with requests for design documents to comply with various standards, such as the (now obsolete) ESA PSS-05 (see e.g. http://www.ess.co.at/ECOSIM/ESA.txt or search for PDF versions). That view was at best incomplete; giving customers as much visibility as possible not only complies with contract terms but also reduces the risk to developers, if done as early as possible. I would still say that demonstrations of early versions would be both more informative and less expensive to produce than the Detailed Design Documention required by PSS-05. I would direct people hoping to learn from PSS-05 to the sections on the Software User Manual; this requires both task-centred/tutorial and reference/descriptive sections, and can be produced in advance of the software it describes. If you can produce such a manual before actually writing the software, you may provoke the customer into giving you some useful early feedback. You will also ease integration testing by making it possible for all your testers and developers to at least operate all parts of the system, whether or not they understand the implementation of those parts. See I love it when a plan comes together.
Military systems are an extreme case of requirement risk; they cost large amounts of money, it may take a war to find out for sure whether the delivered product is any good or not, and even then you might get the wrong answer. Chunk Yeager's autobiography contains an example of fighter pilots who missed severe stability problems with an F-100 jet because they were entranced by its sheer speed (chapter "Flying in the Golden Age") and an example of two practice dogfights between an American Sabre jet and a captured Russian MiG 15 being determined by the skills of the pilots, not the performance of the planes (chapter "Outflying the Russians"). Yeager's evaluation in his autobiography is that the Sabre had better weapons systems and equipment, but that the MiG had greater thrust and acceleration under some conditions. Later on, John Boyd introduced a quantitative method for comparing the maneuverability of different fighter jets and found the F-4, F-104, F-105, and F-106 less maneuverable than the MiG-17, MiG-19, and MiG-21. This informed fighter design starting with the F-15 ("The Mind of War: John Boyd and American Security", by Grant T. Hammond).
The development and evaluation of military systems and procedures is now a well-documented discipline, making use of a wide variety of techniques, both low-tech (wargaming on maps) and extremely high-tech (distributed simulations) using large numbers of military personel (brigade scale war games) or none at all (simulations conducted entirely within a computer). Some of this experience may be transferrable to non-military situations where we wish to assess, or simply understand, a package of requirements before implementing it (or indeed, as the military examples show, to understand the real implications of what we have implemented, and shown satisfies its requirements). The book "The logic of war-fighting experiments", by Kass, describes the design and evaluation of campaigns of experiments. He describes campaigns of experiments because he believes that no single experiment is likely to simultaneously exercise the capability to be evaluated, achieve statistical significance, prove that this significance is indeed due to the new capability, and demonstrate something of proven relevance to the real world, all at the same time.
The idea of paper prototyping would not come as a surprise to somebody who has evaluated the impact of new equipment on military tactics by moving symbols across a map on a table-top. Kass (chapter 10) talks about surrogates, giving the example of a clipboard (standing in for an as yet undeveloped device) which displays exact enemy locations. Kass (chapter 7) makes a contrast between surrogates and prototypes, pointing out that as evaluation progresses from the use of surrogates to the use of prototype equipment the results of evaluation are likely to change from optimistic (because real equipment is not likely to provide all of the advantages of the surrogate) to pessimistic (because real equipment is not likely to be as unreliable or limited as the prototypes). Kass's book is heavily influenced by work in experimental psychology, and references "Experimental and Quasi-Experimental Designs for Generalised Causal Inference", by Shadish, Cook, and Campbell, intended for working social or psychological researchers. A more approachable book, derived from courses given at Yale, is "Statistics as Principled Argument", by Abelson.
Modelling and simulation also used to develop satellites and Spacecraft. The simulators ESA procure to train satellite operators are so detailed that they can run the operational satellite on board software, simulating the processor chips, and its support logic, and communicating via the standard telemetry formats to operator workstations that would also be capable of controlling the real satellite. The development process therefore not only creates a working simulator for operator training, but also checks out the operational on board software. Nevertheless, some concessions are made to practicallity; typically the physics of fuel flow and the electrical potentials within the power system are not simulated in detail, and neither are the details of the data returned by the on board science equipment. Fidelity to the science equipment is not within scope, and would be much more complex. More detailed simulation of fuel and power could be provided using software not much more complex than that required for the simulated orbit calculations to show the effects of commanded thruster actions, but providing a simplified view of some aspects is held to improve the operator training provided by the simulation.
One definition of dependency is that component A is dependent on component B if design changes to component B can force design changes in component A. "Large Scale C++ Software Design", by John Lakos, tells you a great deal about analysing and managing the dependencies within a system. You can draw a directed graph whose nodes are the components within a system, and whose edges show that component A is dependent on component B. If there are cycles in this graph, say linking A and B, then it is impossible to build, test, or understand any single node involved in the cycle without simultaneously building, testing, or understanding all the other nodes in the cycle. This makes everybody's life much more difficult.
"Large Scale C++ Software Design" is well worth reading, but it is less influential than it might be because there is a simple way to (pretty much) solve these problems. For each component, define one or more interfaces, as C++ abstract base classes, or C header files, or Java interfaces, or COM IDL definitions, or whatever your language happens to support (and - not by chance - modern languages will generally support something amounting to an interface). Define the behaviour that any implementor of the interface must provide, and you have finished writing the interface. Because you have only a definition and not an implementation, this interface is not dependent on anything else. Now go away and implement the component. This component is of course dependent on the interface it implements. It may also be dependent on other interfaces that it uses. but it is not dependent on the components that implement those interfaces, so a dependency cycle never arises.
The code that uses an interface will need to get hold of an implementation of it at run time. That, too, will probably be a well-known feature of whatever version of interface your language provides. COM has CoCreateInstance(), and Java (Since 1.6) has class java.util.ServiceLoader<S>, while various Java frameworks, such as NetBeans and Spring, have their own alternative methods. Typically, once you hold of one interface for a package, you will be able to call methods on it to get hold of the other related interfaces.
This drastically oversimplified description doesn't say how you test a component if implementations of the interfaces that it uses are not available. In theory you use stubs where necessary; implementations of the interface suitable only for testing. In practice stubs take time to write, and no test short of real use can ever be completely satisfactory. The well-worn phrase "test what you fly, and fly what you test," has a pretty good track record. So it is a good idea to look at the dependency structures of different design candidates when choosing between them. It is well worth making it as easy as possible for people to test their work, by removing the requirement for them to obtain other parts of the system, and learn how to use them as test drivers. Cumbersome test procedures cost you both by increasing the time it takes to test and also by discouraging people from testing.
There is some positive benefit in not implementing an interface before writing use cases, psuedocode, or even real code, that use that interface. Whether you do this or not, you will probably find that the interface you originally designed has some features missing, some features that aren't quite right, and some features that you don't actually need. It is cheaper to find this out before implementing the interface. Such early checks, and the errors that they discover, are the only justification for a separate design stage; if any part of any a design could be produced flawlessly on demand, designs could be produced "just in time", and a perfectly satisfactory methodology would be simply to keep a separate design log, answering questions from other people or coding almost at whim, and noting down parts of the design in the log as the needs of the situation forced you to come up with them. Conversely, a design that is not subject to review, and exercised with something akin to use cases, confers no added benefit; it might just as well have be written on the fly. In "The Mythical Man-Month", chapter 11, Brooks told us to "plan to throw one away; you will anyhow." In the revised 1995 edition, he adds a chapter of revisions. Part of this states that incremental development, starting with a small skeleton system, will replace the requirement to rework entire systems with a requirement to rework only parts, but possibly including the redefinition of interfaces. If design is to pay off, it must reduce the chances of such reworkings.
Lakos also comments on a suggestion originally brought up by Parnas; that is, that the boundaries of a component need not correspond with the obvious physical boundaries of the system. For example, a component accessed via a network connection could be defined in terms of the protocol carried over the connection, so that client and server were separated by the connection, or could be defined in terms of an API, with a library implementing that API provided that the client could call. In this second implementation, the boundary between client and server lies within the client machine, and the server extends across the network connection. Parnas speculated on the possiblity of language support that would allow client and server code to be connected at the finest possible level, corresponding in modern terms to inline functions followed by global optimisation. The benefit of this is that the cost of crossing the boundary between client and server becomes very low, so that the the interface definition need not be designed with these costs in mind. A modern example of fine connections is the experimental Swift system, in which a single program, annotated to include security information, is split into server-site Java code and client-side Ajax JavaScript by a compiler which takes into account both the security annotations and the costs of crossing the server-client boundary when deciding which bit of code should go where (various at Cornell, SOSP'07, October 14-17, 2007).
Lakos points out the practical benefit of being able to swap different versions of a component in and out very easily, for instance by replacing a single binary file. This allows us, for instance, to deploy a new version of a component with security patches in it very quickly, and suggests that physical boundaries should be aligned with design boundaries. This argument does not apply to the Swift system, where the client-side code is downloaded from the server, so swapping both sides at once is the only sensible procedure.
Many of David L. Parnas's classic papers are available on the Web, but I would still like to recommend "Software Fundamentals; Collected Papers by David L. Parnas". Confusingly, the authors of this book, as far as catalogues are concerned, are Hoffman and Weiss (its editors).
The paper "On the Criteria to be Used in Decomposing Systems into Modules" points out (with examples) that there are a number of different possible ways of cutting a system up into modules. It proposes one based on "information hiding". The modules, and their interfaces, are designed so that each module encapsulates an implementation decision that it hides from its clients. Parnas shows that the resulting design is easier to change if any of those decisions turn out to be wrong, or are superseded by requirements changes. In "Designing Software for Ease of Extension and Contraction", he recommends early identification of those items that are likely to change. This paper also recommends the elimination of cyclical dependencies, in what he calls the "uses" relationship.
More recent pieces on designing APIs, such as Tulach's "Practical API Design," emphasise the importance of designing interfaces to which it will be possible to make backwards-compatible changes. You could view many of the proposals there as ways for the designer of the API to prohibit clients from using the API in ways not specifically allowed by its designer in order to fulfill the use cases of the API; this is a design-time equivalence of the computer security principle of least privilege. For instance, if you declare a class or a method to be final, you prohibit the user from subclassing or overriding it. This then means that a second version of the API can modify the class without having to worry about breaking code in client classes that inherit from it.
In "Use of the Concept of Transparency in the Design of Hierarchically Structured Systems", Parnas suggests that when a module is designed to present a higher level view of an underlying mechanism, that we consider whether every possibility available to those with direct access to the underlying mechanism is available to those who only use the higher level. The module may not provide access to some possibilities for good reason (to guard against error, or to enforce safety or security properties). It may perfectly well not provide access to some possibilities to simplify its implementation. It should not do so just because its designer didn't notice what they were doing.
When UML came out, there was a great deal of confusion between modeling languages and development processes, perhaps because most previous modeling languages had been associated with a particular development process, typically with assurances that this single development process would work in all possible situations. In fact UML never claimed to be a development process in its own right; the Rational Unified Process now claims to be suitable for an enormous variety of different circumstances, but at least it claims to achieve this by allowing it to be tailored to fit different situations.
Today, even staid books like "Software Engineering: a Practioner's Approach" (European Adaptation), by Pressman, (adapted Ince) describe a variety of development processes (chapter two), and advocate tailoring within this variety. They even describe a variety of different possible team structures (chapter three) and different positions on the spectrum from dictatorial control by a single boss to academic anarchy, depending on the problem to be solved.
In keeping with the aims of this document as a whole, I will not attempt to suggest a development process even for the projects I have experience of; instead I will describe a few places where UML, or use cases, can be useful.
A large UML diagram without textual notes, especially one that has been auto-generated, is almost useless. As a checklist for the sort of information that should come with a diagram, I refer outside UML to an excellent and widely influential book on graphs, "The Elements of Graphing Data", by Cleveland. This book states (Introduction) that "Graphs allow us to explore data to see overall patterns and to see detailed information; no other approach can compete in revealing the structure of data so thoroughly." But it does not claim that a graph can function without textual annotation. It says (Chapter 2) that figure captions should do the following.
Textual documentation without UML diagrams is not necessarily useless. If UML diagrams were outlawed on a whim, I would use use cases and interface specifications. The top level use cases describe the system to the customer. Lower level use cases describe a portion of the system that may appear as one line in a higher level use case, checking that the interfaces they use are sufficient, and showing how to accomplish that single line in a higher level use case. As Cockburn points out, you can move down a level by asking "how?", and move up a level by asking "why?".
A use case tells you how the system will accomplish some goal, but we have heard that use cases do not describe all the requirements of a system. This means that, to demonstrate that the rest of the system requirements are fulfilled, we need some form of per-requirement information. As the security of a system is shown by a security case, and its safety shown by a safety case, there should be a document that shows that the system satisfies its requirements. Typically, this is a table with a row, or rows, dedicated to each requirement. Like many of the documents described here, this is a document useful not only for what it shows, but also for what it prompts its writer to do.
I use the American spelling in honour of "Data Modeling for information professionals" by Bob Schmidt, a good introduction to Data Modeling liberated from detailed consideration of relational databases. This is good, because data modeling is still useful in their absence. It is the basis of O-O class models, ontologies look an awful lot like data models, and SNMP MIBs use many of the same tricks as relational databases, in representing whatever structures the customer requires as simple tables. Even if you don't have an explicit data model in your design, you might still chose to consider the data available to people who query your system, or the data it requires of its users. When extending a system, you may start by looking around the system to find out if it already contains the data required to support your extension.
Using O-O terminology, a simple data model describes a number of classes. Each class defines a number of attributes, each of which have an associated type. The pool of data that this data model describes consists of a number of instances, each a member of at least one class. For each attribute defined in its classes, the instance can provide a value. For instance, a data model might define a Person class, with attributes name, height, and date of birth. A person instance might then be (David Beckham, 6' 0", 2 May 1975). The data model might also define relationships between classes, together with constraints on their cardinality. These turn into links between individual instances. If we have a football team instance then my data model (now revealed to be sadly out of date) might have a link between "David Beckham" and "Manchester United", entitled "plays for". A simple set of possible cardinalities is "one to one", "one to many" (or "many to one" if you look at it in the other direction), and "many to many". If the set of teams is restricted to Premier League teams then it is reasonable that this link be defined as "many to one". If the England football team is included, then it must be defined as "many to many", since at that time David Beckham played for both Manchester United and England.
I have included these details partly so that I can explain how data models are represented in tables, because this involves a neat trick. You can imagine a table with columns "name", "height", and "date of birth" containing the Person class. It is convenient to add a column guaranteed to form a unique identifier, known as the primary key. In some cases we can use one of the data fields already present, or a combination of them, but it is frequently a good idea to use a single new field with no other purpose. This shields the system from changes that might stop an existing field working as a good identifier, such as a change of name, or the appearance of two people with the same name. So we will add "Id" to both our classes, and make David Beckham number 1 within "Person" and Manchester United 14 within "Football Team".
If "plays for" is a many to one relationship, we can represent this by adding a "plays for" column to the Person team and setting this to 14 for David Beckham, to show that he plays for Manchester United. This works if we assume that each person plays for exactly one team. If we have a null id (typically 0, or -1) we can also represent many to zero or one relationships in this way, so "A. G. McDowell" has "plays for" set to -1, since I do not play for any football team. The new column must go on the class with the "many" side of "many to one" because there would be only one value available associated with Manchester United and it cannot hold the 29 different values that would be required to identify the entire Manchester United squad. You need only number one side of the relationship. We can go from David Beckham to Manchester United by looking at the "plays for" field in the David Beckham row of the table. We can go from Manchester United to a list of squad members by finding all the rows in the Person table with "plays for" set to the "Id" value of Manchester United. This simple two-table plan does not work for many-many relationships, and a counting argument shows that nothing resembling it can be made to work. If there are a possibilities on the left of the relationship and b on the right, then there are ab pairs that might or might not be related, so 2ab different possible sets of relationships. If we have a field with possible values 1..b on the a side, and one with possible values 1..a on the b side, then we have baab possible ways of filling in these fields. If a = b = 8 then we have 264 possibilities and 248 ways of filling in our table, so no convention can represent all possibilities uniquely.
This counting argument gives us a pretty good hint at the trick for many-many relationships; introduce a new table dedicated to the relationship. We might have a "plays for" table, with fields "player" and "team". We can then fill in (1, 14) to show that David Beckham plays for Manchester United and (1, 100) to show that he also plays for England. We retrieve the list of players for a particular team by looking for all rows in this table with "team" set to a particular value, and the list of teams a given player plays for likewise. Such relationship classes may also attract additional attributes, such as "goals scored by". Gathering together attributes for a class, and linking classes with one to many relationships are second nature to most people, so if you start having more trouble with a design than you expected, there may well be a many to many relationship lurking around somewhere.
A data dictionary is a textual or tabular document giving more detailed information on the classes, attributes, and relationships, within the data model. Typically it will give the type and default format of each attribute, together with a short written description of what it means. Relationships will be documented by documenting the attributes containing identifiers that implement them. What I have called the identifier is also known as a primary key. Where the value of such a key is given, typically in a different table, to implement a relationship, this is known as a foreign key.
The diagrams now commonly used for data modelling are redrawn descendants of those described by Chen in "The Entity-Relationship Model - Toward a Unified View of Data", (ACM Transactions on Database Systems, Vol 1, No. 1, March 1976 - and available on the web as a classic paper). In section 4.1.3 Chen shows how you can go from the entity-relationship diagram to a database schema and states that the resulting database schema has properties similar to one that has undergone a transformation into third normal form. If the entity relationship diagram already contains redundancy (for instance, two attributes that always have equivalent values, or one relationship that can always be constructed by putting two other relationships together) then turning it into a relational database schema will not remove this redundancy. On the other hand, if there is no structure beyond the entity relationship diagram, and if each entity has a unique primary key not determined by its attributes, then every possible way of filling in the tables has an interpretation obedient to the entity-relationship diagram, so the translation to relational tables has not introduced redundancy, and the schema is at least in third normal form.
P.S. if you see a real life data model displayed as a class diagram or entity-relationship diagram, especially one generated by an automatic tool, you will probably be staggered by a forest of boxes and lines of stunning complexity. Don't feel bad about this; what you are seeing is really hundreds of different diagrams printed on top of each other. If you are lucky, there will be some documentation that describes the data model piece by piece. If not, you could try and construct some for yourself by going back to the use cases and finding the bits of the data model that support them, one by one.
Writing is harder than reading. Once you have got an underlying data model, you can provide readers with information derived from this model in as many different forms and formats (views) as you wish; not only as web pages, PDF, or GUI screens, but also as summaries, detailed statements, customised reports, responses to queries, and so on. Adding new views is an easy way of linking to new pieces of software and keeping new customers happy. This can produce a sort of redundancy; perhaps one item in a view is always the sum of two other items, from the same or different views. The problems with redundancy include the space taken up, the increased amount of time needed to understand a more complex model and its internal connections and constraints, and the costs of keeping everything in step. Because the redundancy in a system with multiple views arises from having different products produced from a single underlying data model, not redundant in isolation, none of these things are a problem there.
Writing is harder if you have to try and accept a write request made from the viewpoint of one of the views, when that view is not a one-to-one mapping of the underlying model. If more than one state of the underlying model can produce the same view, then, when presented with a write request coming from the view, you don't know which of several possible states in the underlying model to switch to. If there are states of the view that can never be produced by any state of the underlying model, then you will be stumped if somebody asks you to change the model to produce them.
The model-view-controller pattern (presented as the Observer pattern in "Design Patterns: Elements of Resusable Object-Oriented Software", by Gamma, Helm, Johnson, and Vlissides) is therefore common. A single underlying model can be viewed in many different ways, but accepts controls in only one way, more strongly linked to the underlying model and its implementation. GUI widgets are the original example of this, but systems of multiple machines linked via a shared database can also fit this pattern, as can processes linked in a publish-subscribe system, where one client publishes information identified by a topic, and other clients can arrange to receive notifications of all publications identified by particular topics by subscribing to them.
As long as clients only have to read to use the system, you can have as many of them as you want, and adding extra clients for testing, debugging, or new development should not disturb it. This property can be lost if clients scribble over information to record the fact that they have processed it. One mechanical way of getting round this is to give each client its own area to scribble in. Very often, they don't need to scribble at all, for instance if new information contains version numbers or timestamps; then clients need only remember how far they have got. The SNMP network management protocol was designed to allow multiple network managers to work with the same network agent, so it is a good source of examples of this. For instance, where agents count the number of network packets they have seen, they do not usually provide a way of resetting that count. Instead, managers who want to record the number of network packets seen over a given interval subtract the counter value at the start from the counter value at the end.
If you do end up with a system where it is possible that more than one client will write to a given data value, it is well worth adding a means of recording the identity of that client, even if only in a log file. This reduces the embarassing delay between "I know what is going wrong; the bottle count is being reset now and then," and "and it's being done by line 4576 in bottleOpps.java."
State machines are used widely to model and to implement real time systems. See for example "Real-Time UML", by Bruce Powel Douglass. Although UML includes a version of Harel's hierarchical state charts, large systems typically consist of a large number of simple state machines connected together, rather than a small number of state machines with complex hierarchical structure; the unit of abstraction in a state machine diagram corresponding to a subroutine in a program is a single state machine, and not a portion of a state machine.
As an example of a small state machine that does significant work, consider one of Harel's examples for hierarchical statecharts: the digital watch ("Statecharts; A visual formalism for complex systems", by David Harel). Harel produces a single hierarchical state machine that describes all the functions of a model digital watch, but this is not the only way to proceed. My watch displays time, telephone number memories, alarm, stopwatch, countdown timer, and dual time. A top level state machine would have just six states, corresponding to which mode was selected. Using a state machine for this reflects the way people think about such multi-modal devices. But the individual functions (such as the alarm settings, which holds four different alarm times, and whether they are enabled or disabled) are not necessarily best modelled by state machines. We might use the model/view/controller pattern here, at most building a data model and providing pictures of the display in each mode.
In a digital watch, the top level state machine chooses between different model/view/controller modes, but switching mode does not change the underlying behaviour of the watch; switching to alarm mode does not suspend the countdown timer. In other systems, a top level mode might start and stop processes for particular modes, or set default parameters. It could do this in the entry or exit actions for its modes. Such systems could then use mode selection as a high level control action, allowing direct access to the parameters for low level control. A simple example of this would be the control of a redundant pair of systems, with the mode indicating the primary system. Switching mode would change the primary system. The new primary could run using the same set of underlying parameters as the old, with these parameters settable by user intervention. Alternatively, the new primary might use a default set of parameters until commanded otherwise. The choice between the two would probably depend on whether the parameter settings in the old primary might have contributed to the need to switch to a different one.
The Data Flow Diagram (DFD) is the only one of the three main diagrams used by structured analysis not to have a direct equivalent in UML. The other two were Entity-Relationship diagrams, which have been extended and redrawn to become class diagrams, and state transition diagrams, which have been extended to become Harel hierarchical statecharts. The other two diagrams were developed and established outside the search for program development methodologies. Entity-Relationship diagrams were introduced as a tool for database design in Steve Chen's classic paper "The entity-relationship model: toward a unified view of data", ACM Transactions on Database Systems, 1:9-36, 1976. State diagrams derive from Finite Automata, which "Finite Automata", by Mark V. Lawson, dates back to Stephen Kleene in the 1950s (also giving references to Mealy and Moore machines in 1955 and 1956 respectively). Data Flow Diagrams were invented by Larry Constantine and published in "Structured Design", Yourdon Press, 1975. A data flow diagram can be seen in the 1974 IBM Systems Journal Article "Structured Design," by Stevens, Myers, and Constantine, but its semantics are not defined, and it exists to support the arguments of the paper, rather than as an example of a major software development artifact in its own right.
A very good (and full) introduction to structured analysis, by Ed Yourdon, is now freely available on the internet. This is "Just Enough Structured Analysis" (JESA, e.g. http://yourdon.com/strucanalysis/wiki/index.php?title=Introduction). Structured analysis is a process for designing computer systems that flourished before the introduction of Object-Oriented Software. Arguably structured analysis was superseded not because of the results of a technical comparison between Object-Oriented programming and Structured Programming, but because structured analysis tended to go with a waterfall lifecycle, and one in which a lot of effort was put into the first stage, an analysis of the existing system. This consumed a lot of time, without any tangible benefit to the organisation.
A data flow diagram shows processes, data stores, and terminators. Processes are active and do the work of the system. Data stores store and retrieve data. Terminators are the system boundaries, and its interface with the outside world. Three different symbols for these (for example circles, flattened blobs, and squares) are linked by arrows showing the flow of data between them when the system is working. See chapter 9 of JESA for more details. Since many systems are deliberately built from components that fall into these classes, and communicate amongst themselves by exchanging data, this does a useful job. In particular, a top level diagram, in which the system appears as a central process, does a pretty good job of describing the scope of a system. People who have been exposed to structured analysis expect to see such a diagram, known as a context diagram (JESA chapter 18). Multiple levels of DFDs can be used, of course, with each process on one level decomposed on a lower level into a data flow diagram, to create a hierarchical system design.
CACM June 2007 published a paper by Jeyaraj and Sauter, "An Empirical Investigation of the Systems Modeling and Verification Tools." In this, they exposed students attending business school to data flow diagrams and use case diagrams, and looked to see how well the students then understood the interactions between the system described and its environment, the internal interactions of the top level components of the system, and the information transmitted within it. For users trained in methodologies, Data Flow Diagrams were more effective than Use Case Diagrams for four of the five measures recorded, and this difference was large enough to reach statistical significance (there was no statistically significantly difference for the fifth measure). Can we expect to see the rehabilitiation of DFDs, perhaps under an assumed name, within UML? On this evidence, they may be useful at least when explaining some features of the design to users who are not themselves software development specialists.
Here is a data flow diagram for a simple system, which monitors some information through local sensors, saves it in a database, and exposes it to remote queries via HTTP. This diagram was created using Dia as a simple graphics editor.
The UML use case diagram has a rather misleading title; it does not show what happens during a single use case, but shows how a group of use cases are related to each other, and to the actors who take part in them. Lines between use cases show where one invokes or extends another. Lines between actors and use cases show which actors take part in which use cases. This diagram is one way of showing a high level clustering of the system, but it is not directly equivalent to the data flow diagram.
Here is a use case diagram that at least shows many of the same components as the data flow diagram. It says more about what the system is used to do, but less about how it is constructed. This may be better or worse than a data flow diagram for particular occasions, but it shows that it does not replace a data flow diagram.
The deployment diagram is another high level UML diagram. According to "The Unified Modeling Language User Guide," by Booch, Rumbaugh, and Jacobson, these are typically used in one of three ways, "To model embedded systems," "To model client/server systems," or "To model fully distributed systems." If you break the rules slightly to include things not part of the system, such as its environment, this allows you to draw a high level diagram showing the scope of the system and its interface to its environment. You can use links to show connections between components, much as in a DFD. Again, this is a high level diagram capable (when stretched) of showing the system in its context.
Here is a corresponding deployment diagram (drawn with ArgoUML: at the moment I can't see how to make nodes look like anything other than cubes). It is a bit clunky, but much closer to the original data flow diagram.
The UML activity diagram can be (mis-)used to produce something that does almost the same job as a data flow diagram. Squat blobs can serve as processes (performing activities that never terminate), and dashed arrows between them, including rectangles to show the objects transferred, serve to show the flows of data between them. The problem here is that UML activity diagrams are intended to be variations on state diagrams, and the states represented by the squat blobs are supposed to be transient, performing an activity that takes a finite time before moving on to a different state. An activity is supposed to be drawn to show a workflow proceeding from start state to end state, with its component workers proceeding from state to state as this happens.
In "Writing Effective Use Cases", Cockburn describes three sorts of tables to be used specifically to establish the scope of the system under design, a task which he regards as crucial.
Here is a use case brief for our example system:
| Actor | Goal | Brief |
|---|---|---|
| Local Operator | Configure Poll Rates | Bring up a GUI from the local console to display and modify the sensor poll rates. These changes will be saved in the database. |
| Local Operator or Remote Analyst | Display Data on Request | Use a web page hosted by the system to view simple graphs and tabular displays of the data collected, and to request files in CSV format covering specified time periods and sensors, so that they can perform more detailed analyses on other systems |
| Time | System is fed with data | The sensors are polled at configurable rates to retrieve the data (unlike SNMP, standard Modbus does not allow slave devices to call for attention) |
There is no diagram in the pure use case tradition (which tends to use text and tables, anyway) that shows all of the communication paths between system components, or between the system and its environment. This information could be inferred from the collection of use cases, but it is not concentrated in any one place.
That everybody makes mistakes is a well-worn phrase, but it is one that has been scientifically investigated (by people with an incentive for disproving it) and found to be correct. In "A Guide to Practical Human Reliability Assessment", by Barry Kirwan, we hear that investigators working out the failure rate calculations for nuclear power plants found that, in the best possible circumstances (a control room team using exceptionally good procedures in a well designed system with independent cross-checks) a failure rate of 1E-5 must be taken as a "human performance limiting value". For a skilled operator working alone in the best possible circumstances the rate is, at best, 1E-4.
System designers must therefore accept that their system's justifications cannot assume that operators always do the right thing. The rest of us can take some comfort from that fact that, yes, everybody does make mistakes. The same book covers error reduction mechanisms. These could be of interest to anybody forced to work with a badly-designed GUI, or even a well-designed GUI where their error rate is unsatisfactory, or somebody giving advice to a less computer-familiar user. Here are some of their suggestions. They rely only on pen and paper; neither computer support nor familiarity with computers is required. The error rates quoted above assume that they have been used where necessary ("exceptionally good procedures").
These procedures are not intended to allow untrained people to operate complex systems, but to guard against human error by competent people familiar with the system. To be effective, they must be followed even when the operator thinks that they could do the job in their sleep, lest it appear that they did.
A pragmatic approach to GUI design is that you put something (almost anything) together quickly and show it to the customer, then expect to redo everything based on their feedback. The customer is always right, generally in an unpredictable way if you are not aware of how they have been influenced by experience with their current and previous systems. Once you have been round this cycle a few times with different GUIs, you may be able to anticipate their reaction better, or perhaps even agree on a GUI style guide.
The quickest way to put together a GUI to elicit customer feedback is not to build a proper GUI at all, but a paper prototype (yes, literally drawn on pieces of paper that you move around in front of the customer to mimic what a computer would do). This has been investigated, and works at least well enough to be described in books such as "Paper Prototyping" by Carolyn Snyder. I have no experience with it, myself. I do sketch out GUI layouts on paper before attempting to implement them in Java Swing, and I find this very useful. The excellent writeup Effective Layout Management: Short Course uses sketches extensively, primarily to decide how to use the different Java layout managers to provide the required GUI layout.
I believe that the first GUI to be shown to the customer should be validated with an informal cognitive walkthrough first, even if this walkthrough is performed only by the GUI implementor, in private. This technique is explained briefly in "Interaction Design: beyond human-computer interaction", by Preece, Rogers, and Sharp. You work through use cases with the GUI, asking three questions at each stage: "will users know what to do?", "will users see how to do it?", and "will users see whether they did it right?". This can't replace the customer's knowledge of the real world, but it reduces the risk of something awful surviving to the final system, or the customer being so appalled by the initial GUI that they lose all confidence in the implementors. One real-life design choice killed by the cognitive walkthrough is the punning icon. I once saw a button displaying what to me seemed to be a picture of half a tree. That, I was told, allowed the user to view the log files. I have nothing against puns or puzzles as such, but such an image neither reminds a user unfamiliar with the system that they might want to view log files, nor makes it clear how to do this if they decide to do this for themselves.
I should not mention books on user interface design without also mentioning "Designing the User Interface" by Ben Shneiderman. Its coverage of the cognitive walkthrough is even briefer than that in "Interaction Design:...", but if you want one book to tell you how to design excellent user interfaces from a standing start, especially those displaying large volumes of information, this is it. It also contains the wonderful visual information-seeking mantra, "Overview first, zoom and filter, then details on demand". Schneiderman worked on a number of different projects in which this turned out to be the necessary guiding principle.
This is another phrase from "The Pragmatic Programmer". If you find yourself writing a second copy of anything from a single number to a shovelful of code, your reflexes should immediately prompt you to rewrite things so that only one instance remains, whether by referring to a defined constant, or by calling a subroutine/method/procedure. One case in a million this is the wrong thing to do; perhaps it really is a coincidence that the same words are required in two places, and that one will shortly need to be changed in ways that should not also affect the other. But even in this case, the change is still easy to make, as long as you recognise the situation; replace the reference instead of modifying the shared definition. This case is so rare that your reflexes should be towards referencing a shared definition, not using the cut and paste function of your editor.
What we learn from data modeling, or database design, applies here as it applies everywhere else. If you use cut and paste where you should have used a single definition and two references, then any change made to one section must also be made to the other, at the very least because it is the right answer there too, and perhaps also because consistency is important in its own right. It is very difficult to ensure that this will happen with cut and paste. Furthermore, the increased volume of code will slow down human readers, make it more difficult to achieve any given level of test coverage, and probably make the completed program larger and slower. Fortunately, the need to create a definition, and make multiple references to it, is so widely recognised that almost every format in which you write will both support it and make it convenient. It occurs often enough that, once you recognise these facts, constant practice will provide you with the reflex to create definitions at, or even before, second use.
In "Large Scale C++ Software Design", Lakos points out that, just occasionally, sharing code is more trouble than it is worth. This can happen if a small section of code is required by two different modules, and placing it in any available module would introduce new dependencies between modules. I suspect that this also risks organisational problems to do with the long term maintenance of the shared code. I would not claim that duplicated code is more trouble than the alternative in absolutely every circumstance, but I will note that these exceptions are rare enough, and so rarely observed by a single programmer, working on their own section of the system, that the reflex impulse to convert second instances from copies to references of a shared definition is still immensely valuable, even if, on reflection, you decide to override it now and then.
Source code control systems, the procedures for putting code in them, and the procedures for generating software from them, can be hedged about with a bewildering number of rules and rituals. We need to remember that these things are intended to satisfy some quite simple requirements, of which the following are the most important.
Problems can arise if the build system depends on software or configurations that are not tracked or replicated. This can happen, for instance, with Microsoft build systems, which can depend on global settings, and the presence of software development kits. Typically it is only easy to get hold of the latest version of the SDK, which may not be the one originally used to build the system. One way to guard against this is to keep a separate build machine, and to archive a copy of its operating system and configuration, using a utility such as GHOST. This is not a complete solution, however, because the GHOST image may only work with particular hardware configurations. You could look to see what the customer's approarch is to replacing the target hardware, and do the same with the development hardware. If they are buying spares and looking for support guarantees to maintain the same target hardware over a twenty year period, you should probably do the same with the development hardware. If they plan to replace the target hardware after three years, you will probably want to change the software at that time to support the new hardware (and its new operating system) anyway, so adjusting to a different version of the development hardware will be no big deal.
Life gets more complicated if you want to split your development team over several sites, and if you want to support multiple versions or multiple releases of the software. Some source code control systems (e.g. Subversion) work perfectly well across a WAN. Others (e.g. Microsoft SourceSafe) can struggle. With multiple versions and multiple releases you may have a choice between putting the customisation in the source code control system, as branches and versions, or putting the customisation in the product, with configuration files. Minor changes will be equally cheap by either means, at least for a while. For large changes, and long term maintenance, I favour putting most of the work into the code; if nothing else, programming languages are designed to cope with complex situations, and people are used to using them in this way. I have seen configuration control systems that supported an almost arbitrarily complex tree of branches and versions, but I remain unconvinced that this was the most cost-effective way of solving the problem. One possible exception is when the target hardware has to be kept as cheap and simple as possible, so putting size and performance constraints on the code. The TinyC operating system and the nesC programming language (for sensor networks composed of large numbers of small computers) introduce constructs into the C language to feed a preprocessing stage that supports similar flexibility at compile time to that which object-oriented languages support at run time. Here only a few files would need to be maintained separately for different configurations. See e.g. "Software Design Patterns for TinyOS" by Gay, Levis, and Culler.
Integration testing is when the plan is supposed to come together, but usually doesn't, at least not without a lot of work. By the time every part of even a relatively small project is put together, no one person understands the whole picture, the users have realised that what they asked for isn't what they really want, there isn't enough equipment to go round so people are treading on each other's toes, and it's not clear where the bugs are, what triggers them, or even whether they are bugs at all.
The reason that you may be feeling helpless and lost is that you have just been demoted from programmer to user. That's right; for the large portion of the software with which you are not intimately familiar, you are no more than a user, and usually an untrained and unwilling user at that. A software user manual, required by some software standards (see Requirements, Mistakes, and Risk) would help a lot. Unfortunately, one is not always available until after the system is finished. Fortunately, we are all supposed to be experts in writing user friendly software. (see e.g. GUI design and the Cognitive Walkthrough. Try walking through the steps taken by a client programmer, trying to use your software).
Some aspects may be common to all user-friendly software As far back as "The Elements of Programming Style" (Kernighan and Plauger; my edition is (c) 1978, 1974) we have the following principles:
These principles directly address two of the three questions from my simplified summary of the cognitive walkthrough; they answer "will users know what to do?" and "will users see whether they did it right?" but not "will users see how to do it?", because the sort of text-based input file used in Kernighan and Plauger's heyday confronts users almost literally with a blank sheet of paper. There is no direct analogue of a GUI when one program is talking to another; I have had reasonable results from a variety of type-checked, heavily commented, interfaces, including SNMP MIBs, COM IDL, and Java interface definitions.
Simple lack of equipment is becoming less common as machines get cheaper and more powerful. More and more often, equipment is available, but only a few people know how to set up a system complete enough to test. To get round this, we need to have a simple automated install procedure, including priming the system with whatever test data is needed to make it function. If you really do have plenty of equipment, you can make life easier for the install procedure by standardising on one particular configuration. It also helps if open source components are used where possible, so that component software licenses aren't a problem.
You may not have control over the input format your module must accept, but you can still chose to validate it carefully. You very often have more control over configuration and parameter files than over the main input file; chose a format that allows you to comment the configuration and parameter files, and provide commented example files for your users, even if they will not actually be used, for example because the install script generates the real files from scratch. You may not have control over the output data you have to provide to other modules, but you usually have the freedom to output records of your choice to a debugging or tracing stream. This output, or rather the combination of this output and the required output of the module, should be self-explanatory. It will often be a good idea to include much of the required output in the trace string. If the resulting trace is indeed self-explanatory then, when a bug apears near your module, you should be able to work out whether there is a problem with the module itself, or the inputs it received. In practice, you will probably end up adding trace statements (and perhaps extra self-checking code) to throw light on tricky or hard to reproduce bugs.
You are already providing the operational version of your module. Would it be easy for you to provide a test version, perhaps driven by simple scripts, that mimics it? Your clients may find it easier to test their software with this than with the real software; scripts are predictable, consume few resources, and need little setting up.
By now, your reflexes should be set to keep you well away from the rough edges of the tools that you are using; if at all possible you will use only the most predictable and stable features of your programming language and target environment. This makes your work easier to do and to maintain, and minimises the risks when you move to a new version. Be warned that, when you write a system that is driven by another, you should not test it by keeping the driving system in its most predictable state. You need to exercise more of the envelope of driving systems than of driven systems, because your system needs to cope with whatever the driving system throws at it in the real world. You need to exercise, in test, every capability of the driving system, however obscure, that will give your system a different style, sequence, or type of input.
This section was written to be an escape plan, not a post mortem; if you were thinking about all of this before you started the project, please stop reading and start writing, or point me at existing writeups and references (designNotes at mcdowella.demon.co.uk). If you are currently working through integration and don't have this stuff, it can all be pasted on at the last minute; perhaps not well, but well enough to be useful. If you can't introduce install scripts or other new controls to make things more user-friendly see Even Astronauts use Checklists. If you are worried about whether you have the time even to write checklists, wait until you have wasted at least as much time as it would take you to write them and then see Factors of Two and the Ski Rental Problem. If you can't create some sort of log entry, you probably are in trouble. If there is a good technical reason for this (e.g. the deployment system just doesn't have the room) consider testing on a simulated system, or a prototype, in which this reason doesn't apply.
You may be faced with a component that simply fails to work with bad input, without giving any additional information. Try starting off with a valid example and then slowly altering it in small steps, testing one small step at a time, and retreating when you make a step that breaks it. One problem with this approach is that you can spend a lot of time failing to find steps that work because one of your basic assumptions is wrong. A particular case of this is modifying the wrong data file, or program, and then wondering why nothing you try seems to work. One check for this is to alternate between a known working version and a grossly broken version, perhaps reverting to a simpler configuration or test, if that is necessary for you to find a known working version. More generally, interleaving work on a number of different approaches can help; if you go down N different paths of length Li, then the time to completion is N min(Li) (see also "Universal Optimal Search", or Levin Search - for example at Universal Search). These days internet search is a pretty good way to find new ideas, especially if you keep revising your search terms to take advantage of what you have learned so far.
Once you have deployed a system you have at least two environments to deal with; the operational environment where your programs are working for real, and the development environment(s). Ideally you will have a development environment available that exactly simulates the operational environment (see the movie "Apollo 13") but there will still be differences. Mistakes in the operational environment will be more expensive, if only because they are visible to customers, and finding out what happened will be more difficult.
Development has a cycle known under a variety of aliases: (requirements, analysis, design, implement, test) in the large scale of the Rational Unified Process, (edit, compile, debug) in the small scale of the working hacker. Military strategy has a cycle that is out of phase but otherwise remarkably similar: the (Observation, Orientation, Decision, Action) cycle of John Boyd, and Wikipedia describes Deming's (Plan, Do, Check, Act) cycle, tracing it back to Bacon's discovery of the scientific method. Given this apparent universality, we can assume that this cycle survives deployment; it slows down enormously, and the steps require much more infrastructure, but the basic structure is the same.
When developing new code, you can probably stay in the development environment until it everything is ready to be redeployed in the operational environment: nothing has changed. Things really slow down when you have to fix bugs reported from the operational environment. The quickest way back into the development environment is to replicate the bug there. A nice reproducible bug, a clear and accurate bug report, and an exact copy of the operational environment will make this easy. Here are some pieces of infrastructure that help you get close to this - mostly to do with logging, making the Check-Observation-Debug phase in the operational system more efficient:
A great deal of testing flows directly from the requirements, or, better yet, the use cases. The program is supposed to do A. We ask it to do A and check that it does this. If the internal structure of the program is sufficiently complex, for instance because of concurrency, no single test will do as a check. If A is sufficiently complex, for instance because it is a statistical or mathematical calculation, it may not be obvious whether the answer provided is correct or not. Here are some observations aimed at such problems.
In "The Art of Computer Programming," Knuth defines algorithms and calculates the resources they consume, especially cpu time. One way to write a fast computer program is to create such a detailed model, and to use it to choose, combine, and tune candidate computer algorithms for a particular task. This is often feasible for programs that take up to a page of psuedo-code, especially if they have been developed with such analysis in mind.
With larger computer programs and systems, we design based on back-of-the-envelope calculations or experience of similar systems, get it working, and then try and work out where all the resources are being consumed. If you are lucky, your development environment will have a profiler, which can record information from a running computer program and tell you where it is spending its time. According to that venerable (but still useful) book, "The Elements of Programming Style," the term "profile" was coined by Knuth in an article in Software Practice and Experience, April 1971. If you cannot get hold of a profiler, your first job is to write subroutines you can call from within your program to do the same job.
Profilers surprise you; very often a large percentage of your program's time is spent in a small area of code that was never expected to be expensive, or that has been completely forgotten about. That is their value; a short time spent tuning this area will speed up your program dramatically. A large amount of time spent fine-tuning code elsewhere will leave this code burning just as much time as before, and so show little performance benefit: if 90% of your time is spent in area A, then speeding up the rest of the code so that it takes no time at all will reduce the total time from 100 seconds to 90 seconds, whereas halving the time in area A will reduce the total time from 100 seconds to 55 seconds. Furthermore, it is sometimes possible to speed systems up by moving work out of the expensive area, from inside an inner loop to code that is executed only on startup, from code that is run per-user to code that is run per-system, or from code in the server to code in the clients. This can be an easy way of getting spectacular speed-ups - if you know where the expensive area is.
The unpredictability of performance implies the following:
It is necessary to have an appreciation of the way that the costs of a particular algorithm grow with increases in the input size, to make sure that the feel you get from running it on small test data does not mislead you. Most algorithms in practical use fall into one of a small number of complexity classes, and most books on algorithms will tell you which class each one belongs to. The definitive books on algorithms and their cost (in so far as a reasonable number of books have any chance of covering a huge field) are Knuth's volumes of "The Art of Computer Programming." These contain a much more detailed analysis than most other books. In practice, most people use non-trivial algorithms via libraries and utilities. Most people's requirements for "Sorting and Searching" (Volume III of Knuth) can be satisifed by using collection classes for in-memory work, and databases for on-disk work. Your collection class documentation should contain some indication of its cost: this is in the specification of the C++ STL, and there is information in the JavaDoc for e.g. java.util.TreeMap<K, V>, which also points at another good (and comprehensive) book, Cormen, Leiserson, and Rivest's "Introduction to Algorithms".
It is usual to include a table showing how the costs of different classes of algorithm grow with the input size. I have chosen to use input sizes that reflect the size of different groups of people, so that you can get a feel for the size, and imagine having to design a system that coped with groups of people of that size. All but the first of the numbers come from Wikipedia. Here is an explanation of the rows of the following table:
I have included the following cost classes, where Lg N = log base 2(N). Some approximations and assumptions are involved here, but the gap between classes is so large that it is rare to find a case where this matters.
In the table, I have assumed that an elementary operation has a cost of 1.0E-9, so that you could interpret the rows as rather optimistic predictions for the time in seconds to do some operation on the input data, using a well tuned algorithm on a modern multi-cpu server machine, with all cores in use. I have added emphasis for everything beyond 3.6E3 seconds/one hour. The main message of this table is that algorithms which appear to take no time at all on test data might or might not acceptable when fed the volumes of date required in real life, so knowing the rough complexity of your algorithm can help you a lot when you come to work out which.
| Example | N | c*Lg(N) | c*N | c*N*Lg(N) | c*N2 | c*N3 | 2N |
|---|---|---|---|---|---|---|---|
| Test data/class size | 30 | 4.9E-9 | 3.0E-8 | 1.5E-7 | 9.0E-7 | 2.7E-5 | 1.1 |
| Portpatrick | 585 | 9.2E-9 | 5.9E-7 | 5.4E-6 | 3.4E-4 | 2.0E-1 | (infeasible: >1E167) |
| RSPB staff | 1545 | 1.1E-8 | 1.6E-6 | 1.6E-5 | 2.4E-3 | 3.7 | (infeasible) |
| Albert Hall (max ever) | 9000 | 1.3E-8 | 9.0E-6 | 1.2E-4 | 8.1E-2 | 7.3E+2 | (infeasible) |
| Villa Park | 42573 | 1.5E-8 | 4.3E-5 | 6.6E-4 | 1.8 | 7.2E+4 | (infeasible) |
| Gloucester | 123205 | 1.7E-8 | 1.2E-4 | 2.1E-3 | 1.5E1 | 1.9E+6 | (infeasible) |
| RSPB members | 1E6 | 2.0E-8 | 1.0E-3 | 2.0E-2 | 1.0E3 | 1.0E+9 | (infeasible) |
| Greater London | 7,512,400 | 2.3E-8 | 7.5E-3 | 1.7E-1 | 5.6E4 | 4.2E+11 | (infeasible) |
| UK | 60,587,300 | 2.6E-8 | 6.1E-2 | 1.6 | 3.7E6 | 2.2E+14 | (infeasible) |
| World | 6.5E9 | 3.3E-8 | 6.5 | 2.1E2 | 4.2E10 | 2.8E+20 | (infeasible) |
Standard Collection classes are now available for most modern languages, supporting Lg N search, N Lg N sorting, and so on. Since they are standard, it is reasonable for you to assume that readers and maintainers of your code will be familiar with them, so that framing your solution in terms of the collection classes makes it more readable. It usually makes it quicker to write, and more efficient too; collection classes are usually much more carefully tuned that typical user code. I recommend that you become familiar with them, and use them where you can.
Chapter Seven of "Programming Pearls," by Jon Bentley, covers the idea of doing rough "back of the envelope" calculations where possible, to check for feasibility (and, more generally, consistency, or perhaps sanity). A related idea is the idea of "sensitivity analysis." You might be deterred from performing some calculation, or even simulation, by the large number of different pieces of information you need as input. You can probably guess most of them, perhaps to within a factor of two, but what is the point of that? The point is that most of these pieces of information may not have much effect on the outcome. If you run your calculation with the "pure guesswork" inputs, and then make a second run, and perhaps a third, with one of the inputs altered, you can compare the two answers and see if that input seems to be making much of a difference. In a lot of real life cases you will find that only a few of the inputs, or only a few of the requirements, or only a few of... are really crucial. A sensitivity analysis will then tell you which of the inputs you really need to know. If, after doing your best to get good values for just those inputs, you still don't know them, you can at least say, for example, that "this system is feasible as long as we don't have to cope with more than 15 operators using it simultaneously."
There are, of course, situations where you need to know all of the inputs in detail to say anything sensible about the outputs. One example, constructed to have just this property, is a cryptographic algorithm, such as DES; you are not supposed to be able to see the underlying plaintext until you have all of the bits of the key correct. The problem here is that there are so many complex connections between the inputs that you can't really learn much from a single input until you know all the others as well. If you are lucky, your situation will be simpler than that. The sort of pattern people look for is when the output is the sum of the inputs, or the product of the inputs, or where you can at least say that, for some input, increasing that input will increase the output no matter what the other inputs are set to.
There are schemes for measuring the influence of different inputs in complex situations. These are worth bearing in mind, because the presence of even two factors, such as nature and nurture, seems to make it very hard for us to understand what is going on. One scheme is the Variance Decomposition described in section 8.5.2 of "Dynamic Models in Biology" by Ellner and Guckenheimer. Suppose that you have observations of a system with inputs Xi, so that you can see the Xi, as produced by some random distribution, and the single output Y, which is a function of the Xi. We write Xi- to represent all of the other inputs except some particular Xi. For now, pretend that Var(Y) = 1. If it's not, we scale our measure of Y so that it is. This is convenient because then the numbers we are about to define come out as values between 0 and 1. The Total Effect of parameter i is then E(Var(Y|Xi-)), by which I mean the average variance seen after fixing every input except i, or a measure of how much more you know once you find out Xi, when you already knew everything else. The Direct Effect is Var(E(Y|Xi)), which is the variance in what you expect from knowing just Xi, or a measure of how much you know given only i, and not the other inputs.
The names "Total Effect" and "Direct Effect" might lead you to conclude that the Total Effect is always greater than the Direct Effect, and this is in fact the case. I will provide only a sketch of a proof of this. Regard Xi- as a single parameter, B, and Xi as a single parameter A. Then Total Effect is E(Var(Y|B)) and Direct Effect is Var(E(Y|A)). Split up Y(A,B) as Y(A,B) = E(Y) + [E(Y|A) - E(Y)] + [E(Y|B) - E(Y)] + [Y(A,B) - E(Y|A) - E(Y|B) + E(Y)] = a + b + c + d. There is enough cancellation that E(ab) = E(ad) = E(bc) = E(cd) = 0, and all the other cross-product expectations are also zero. Then the Direct Effect is Var(E(Y|A)) = Var(b) (Var(a) = 0). and the Total Effect is Var(Y) - Var(E(Y|B)) = Var(b + c + d) - Var(c) = Var(b) + Var(d).
There is a clever way of sampling from the situation to work out these quantities. If we have two observations with exactly the same X values in all but component i, then we can take half their squared difference as one observation of E(Var(Y|Xi-)), which is one component of the Total Effect (or will be, once we have scaled Y to variance one). E(Var(Y|Xi-)) = Var(Y) - E(Var(Y|Xi)). If we have two observations where only Xi is held fixed, half their squared difference is a contribution to E(Var(Y|Xi)), from which (since we will scale to set Var(Y) = 1) we have the information we need to work out the Direct Effect. Now consider a table such as the following (for four variables).
| 1111 | 1112 | 1122 | 1222 |
| 2222 | 2223 | 2233 | 2333 |
| 3333 | 3334 | 3344 | 3444 |
Think of each digit not as a random value, but as the sequence number of a randomly chosen value. If you move along this table left to right and then top to bottom, you change only one digit at a time. This gives you the differences you need to accumulate for the Total Effect, and may also make it easier to compute the values. If you look at a single column as a whole, each cell comes from a completely different generation of values, so you can compute the total variance from that without worrying about dependencies between two Ys calculated from samples that have the same value in one variable. Finally, if you compare a cell with the cell above and to its right (or, for the cell on the far right, with the cell on the far left), you will see that they have the same digit in just one position, which gives you the differences you need to compute the Direct Effect.
Except when benchmarking, we do not search for the one best design, minimising a single measure of cost. It is as well to consider what we actually do. I don't dispute the existence of a one best design; the mathematical theory of utility and the cynical pragmatism of budgeting both suggest that all comparisons can, in theory, be reduced to a comparison between two numbers.
One practical problem with this is that we don't know enough to compare the relative worth of an hour spent rewriting the user interface with an hour spent reviewing a database schema. And even if we could calculate the effects on the system as delivered exactly, any non-trivial system will affect many people in widely different ways, making the definition of the one best system sensitive to the way we account for the different sorts of gains, and losses, it brings to its different stakeholders.
Another problem is the instability of the very best. It is possible to have two very different approaches yield comparable results. It may even be common, where competing companies offering different technologies spend just enough money on development to make sure that the other does not have a clear lead. In these circumstance, the absolute best answer might change every month, with one approach overhauling another, yielding an incremental benefit tiny in comparison with the cost of changing approach.
There is an lovely little (75-page) book in the Sage series called "Multiple Attribute Decision Making: An introduction", by Yoon and Hwang, which surveys a variety of methods for making decisions in pursuit of multiple goals. It starts by referring to Ben Franklin's letter to Priestley (1772), in which he describes his "Moral Algebra," This consists of accumulating lists of pros and cons for a particular proposition, written down over a period of days so that reasons are not likely to be overlooked, and summarised so that the entire list can be taken in at a glance. Franklin then attempts to reduce the problem by finding balancing statements on either side and striking them out. He does not claim that this is an exact science, but points out that the very act of displaying all the arguments for and against a proposition at one time is valuable in its own right.
I have summarised Franklin's letter because I think that it is a valuable procedure, likely to repay the time taken to do it abundantly in many cases, but it is not the procedure that we usually follow when designing computer systems. In the Sage book, the procedure we follow is known as "conjunctive satisficing." In other words, for each attribute that we are interested in, we have a notion of "good enough," and we will accept any solution that does "well enough" on all attributes. We use this informally, when we stop considering solutions after having noticed a serious drawback, and we use this formally, when we accept a list of requirements, with the obligation of demonstrating that our system passes those requirements. This is not as glamorous a procedure as the dedicated search for the one best system, but it solves both of the problems I mentioned above. We do not have to make comparisons between apples and oranges to make a decision; two solutions that pass both the "works with apples" and "works with oranges" test are both good enough, while any solution that fails to work with apples, or fails to work with oranges, is not good enough. We are less likely to have to switch between radically different solutions in response to small changes. If a solution passes all of its tests with some margin for error, then small changes to our knowledge, to the environment, or to the technologies we rely on, are unlikely to render it unacceptable.
One person's requirements are another's ICD (Interface Control Document): somebody who knows that you will produce a component that satisfies a set of requirements can start working on another component that depends on those requirements immediately. Somebody who knows only that you will produce a component that minimises some cost function may have to wait until you have finished to see where it turned out to be most efficient to put in a lot of effort and achieve a lot, and where it turned out to be most efficient to save time and cut losses. Even supposing that they have inside information about some extra characteristic satisified by your design, or some extra bonus that it provides, they may not wish to use it. Doing so is risky, because if that extra feature is not described in the requirements it will be the first feature to disappear, if something has to be traded away to save development time, or to save computer resources, either because of unexpected problems encountered during development, or because of unexpected success later on; success so bountiful as to invite us to port the system into a different environment, reimplementing its components, while of course ensuring that they still satisfy their requirements, so that the system will still work, perhaps on cheaper hardware, or undergoing a heavier load. In practice, of course, the requirements specification will be written with one eye on whether it is achievable or not, and may have to change as more information is gained while attempting to implement it. This is reminiscent of optimisation algorithms that reduce optimisation to satisfiability, making multiple attempts to find satisficing solutions, requiring that the solution found cost at most some value, and changing that value with each iteration.
The search for the "good enough", rather than the best, is associated both with decreased cost (the so-called law of diminishing returns) and unnecessary reduction in quality. Both are plausible. Consider a situation where it is possible to chose between a number of different solutions, each with an associated cost and benefit. As we decrease the benefit required, we increase the number of different solutions we can choose from. With a larger number of satisficing solutions, we might hope to find one of lower cost. If it is possible to chose multiple solutions, adding both cost and benefit, the effect is stronger yet. A good strategy here is to consider solutions in order of increasing cost/benefit ratio, until enough are found to produce the required benefit between them. This is optimal at its natural breakpoints; any choice in a different solution and not in this one will have a cost per benefit ratio at least that of the worst one chosen, so the alternative solution, if it delivers at least the benefit of the optimal solution, has at least its total cost. Each increase in benefit required comes with increasing cost per gain of benefit. A non-optimal solution to the problem in either case (except perhaps in terms of thinking time) is to chose solutions at random and stop as soon as we find a good enough one. It is at least possible that such a process will terminate with a solution that is both more expensive and of less benefit than many available alternatives.
Increasing returns are often associated with what programmers might call 'batching', and accountants might call 'amortization', where the cost for n items is A + Bn, so that the cost per item in a batch of n is B + A/n. When considering only the costs in time of design and development, the 'start-up cost' A might be the cost of starting work in an area, understanding the problem, and deciding on a solution. Once you have done this, it is a good idea to stick with it until you have produced a properly encapsulated solution, complete with whatever documentation, examples, and unit tests are necessary to ensure that this cost need not be incurred again.
Viewed another way, the law of diminishing returns should influence your attitude to risk in different circumstances. The law of diminishing returns suggests that, because of your ability to cherry-pick, each addition of an increment in some good produces a smaller real return. If you are taking risks that might lead to increments being lost, then what you will see is that successive decrements are increasingly more damaging. So if you been unlucky and have lost one increment, remember that losing a second increment will be more damaging still; it may pay to sacrifice more time and resources to precautions against losing that second increment than you spent in an attempt to lose the first increment.
If your requirements allow you to lose a factor of two, you can cope with a surprising amount of uncertainty. One well-known example is the "Ski Rental Problem," which is more general than it sounds. You require some equipment for an unknown period of time. You may either buy it, for a large one-off cost, or rent it, for a smaller cost per time period. Which should you do? The factor of two strategy is to rent it until you have spent as much on rent as it would cost to buy, and then to buy it. Compared to the strategy that you would use if you knew ahead of time how long you would require the equipment, this costs you at most twice as much. The worst case is when you find that you no longer need it just after buying, in which case you have spent twice as much as the optimal strategy; all other cases are better. Many such problems are analysed in mathematical detail, in "Online Analysis and Competitive Analysis", by Borodin and El-Yaniv.
A now well-known example is the storage allocation strategy behind classes that provide growable areas of contiguous store, like the Java ArrayList, and the C++ std::vector. These have to decide how much storage to allocate when the user wants the area to grow. The package needs to allocate a completely new area of storage and copy the old contents into it. Allocating more than is required is wasteful of store in the short term, but may allow a subsequent request to grow the area of store to be completed without a second allocation. One strategy is to double the size of the area when the current area is insufficient. This means that you may use almost twice as much store as is required, but that when you move to a new area of store, half of the items that you move will be moving for the first time. Half of the remainder will be moving for the second time... The total number of moves, per item, after a large number of doublings, is S = 1/2 + 2.1/4 + 3.1/8 + 4.1/16 + .... The standard trick is to split this as S = 1/2 + 1/4 + 1/8 +... + 1.1/4 + 2.1/8 + 3.1/16 + ... The first sum you can recognise (or look up on the internet) as a geometric series, and the second is a smaller version of S, so S = 1 + S/2, or S = 2. If you follow this strategy then, no matter how often the user asks you to grow the storage area, you will use at most twice the amount of store required and copy each item, on average, only twice. A factor of two is not necessarily optimal here; in fact Koenig suggests a factor of 1.5, which is just below the golden ratio, (1 + sqrt(5)) / 2. This means that the there is some chance that we will eventually wish to allocate a region of store smaller than the sum of all the store once allocated and now freed, so that it can be reused. For example, if we start off with 1 and round up, we get 1, 2, 3, 5, 8, 12, 18 < 1 + 2 + 3 + 5 + 8 = 19, 27 < 1 + 2 + 3 + 5 + 8 + 12 = 31, and so on.
Consider the problem of sizing a "text" field in a log record. Each log record contains some fixed length information (time, severity, component...) and a single fixed-length field that will be used to handle textual values of variable size. We don't know ahead of time how large our values will be, so we decide that we will use multiple log records for large values, repeating the fixed length portion. If our requirements allow us to waste a factor of two in space, we can make this work using a text field that is the same size as the sum of the sizes of the fields dedicated to fixed length data. If the variable-length text is of zero length, then the entire portion allocated for it is wasted, and we are inefficient by a factor of two. Stretches of text of steadily larger size reduce the wasteage until they are large enough to require us to use two log records, at which point almost all of the second log record is wasted. As the size of the variable length value increases, our inefficiency decreases, until we need to use three log records, of which only about one and a half records are useful; the first group of fixed-length fields, and the first two regions dedicated to the textual value that requires just over two of these regions. For very large values, we can neglect the fixed fields, and see just a means of storing very large values that uses only half of the available area.
This is well known to be a fool's game, but I include it here because it can crop up in less obvious guises. For this example, consider the game of tossing fair coins. You choose the stake. If the coin comes down heads (with probability 1/2) you win an amount equal to your stake. If the coin comes down tails, you lose your stake.
Since this is a fair game, you would expect, on average, to neither win nor lose; every coin toss has an expected profit of zero. Obvious doubling up is a strategy that purports to do better than this, by carefully chosing the stakes. Suppose we bet one pound at first. If we win, we put away the pound and start again with a one pound stake. If we lose, we raise the stakes for the next toss to two pounds. If we win there, the combination of the two tosses brings us a net gain of one pound, from a loss of one pound and a gain of two pounds. We then start again with a one pound stake. If we lose our two pound bet, we bet four pounds on the next gamble, hoping to win four pounds to pay off our previous losses and leave us, again, with a one pound profit after three coin tosses. This looks like a sure-fire way to win one pound per series of coin tosses. But we also know that every coin toss has an average gain of zero, regardless of the stakes. What is going on?
The problem for the gambler is that they only have a finite amount of money to gamble with. This restricts the number of consecutive losses that they can survive and, when they finally run out of money, they have lost big. Suppose that I have 1023 pounds when I start a new series of coin tosses. To run out of money the coin must come up tails 10 times a row, when the total loss will be 1 + 2 + 4 + ... 512 = 1023. This happens only one time in 1024, but when it does I lose all my 1023 pounds. This exactly balances the times that I win: -1023 * 1 /1024 + 1 * 1023 / 1024 = 0. So what I have is not a scheme for turning an even game into a game in which I win money, but a scheme for turning a game in which I win 1 pound half the time and lose 1 pound half the time into a game in which I win one pound 1023 times out of 1024 and lose 1023 pounds one time in 1024.
So far, so obvious; doubling up in gambling games is pretty stupid. But let's put it into different clothes, noticing that, although you lose badly - devastating badly - when you lose, you don't lose that often. Suppose that our reaction to a failure is to claim that we have learned from it, and to try again. In fact, we will not only try again, but try harder, investing more resources to capitalise on our new knowledge and reclaim our losses. After all, if we didn't, this investment would just go to waste (anybody who knows what is meant by the term 'sunk costs' should be smelling a rat now). We did this in the past, and it worked. We have learnt that determination and courage will allow us to dig ourselves out of such holes. If we really have learnt enough from the experience to make a second attempt a worthwhile risk, well and good. I'd still worry about the risk of investing more resources, and possibly expanding the scope of a project. But if the odds haven't improved, now or in the past, then we are just doubling up without realising it, and our recoveries in the past weren't the result of courage and, determination, but the result of the fact that doubling up doesn't fail very often. But when it does fail, it fails completely, wiping everything invested in it.
Here is a checklist of questions you might wish to consider when reviewing a design.