This document describes a number of examples on the idea of scheduling work to be done on a software project to reduce risk. In section 12.6 of "The Unified Software Development Process", by Jacobson, Booch, and Rumbaugh, we hear that "The use cases are prioritized in the order in which they - or scenarios of them - should be dealt with in iterations." In the next paragraph we hear that "We rank the use cases in order of the risk that they embody."
This is all very well, but what is risk? Is a use case that might take 10 or 11 months riskier than one that might take one day or two weeks? What about one that might not be possible at all, but is only a "Wouldn't It Be Nice If?" in the three-position ranking of "Must Have," "Should Have," and "WIBNI". To try and throw some light on this I have simplified everything drastically to get something that I can model on a computer.
I eliminate the various levels of importance by considering only the total time taken to arrive at a solution, settle for an alternative, negotiate a compromise, or just plain give up. I eliminate the depedencies between use cases by assuming that a preliminary design will produce enough interface definitions that people who would otherwise need to wait for previous work packages to be completed can use stubs or mock objects (dummy or toy implementations, suitable only for testing) instead.
What does this leave us with? Avoiding a problem that I assure you does occur in real life, when, at the end of the project, we are waiting for one person to complete a long one-person job that could and should have been started earlier, but was not, through poor scheduling of work, or sheer oversight. In this toy model we have a list of one-person jobs and two identical workers. Each worker takes the next available job off the list and works on it until they are finished. They then return to the list and again take the next available job. If we have two 1-day jobs and one two-day jobs, then the list 2,1,1 is better than the list 1,1,2 because the elapsed time with the first list is 2 days, with one person completing the 2-day job while the other completes two 1-day jobs, but the time for the second list is 3 days, with each worker completing a 1-day job before the first worker works alone for two days.
We can be pretty sure that the above problem, simple though it is, will be very hard to solve exactly. In fact, even the case with known durations is hard to solve, and if we could solve all cases of the probabilistic version above, we could solve the deterministic version by creating an example where all possibilities for a task's length were the same, or as near the same as would make no difference. Section 4.1.1 of "Deterministic Scheduling Theory", by R. Gary Parker, shows that this problem (deterministic parallel processor makespan, with two or more processors) is NP-Complete.
Risk comes from uncertainty. To introduce this I have given each job three possible durations, not just one. These materialise as the real duration with probabilities 0.185, 0.630, and 0.185. These rather odd probabilities are used by the extended Pearson-Tukey Method to produce discrete approximations to continuous probability distributions. For a number of common continuous probability distributions the discrete distribution is a reasonable approximation if you set the values associated with the three probabilities to the 5%, 50%, and 95% percentile values of the original distribution.
I have restricted myself to three-point discrete distributions, only two workers, and to small integer durations because, for those conditions, it is relatively straightforward, given a list of tasks in order, to work out the complete probability distribution for the final duration. If we have an idea of what sort of final probability distribution we want, we can decide which task orderings are good and which are bad. So what sort of probability distribution do we want?
I have concentrated on one of the percentiles of the probability distribution for the final distribution. I am most interested in the 95% percentile, which is to say the time by which there is at least a 95% probability that the complete list of tasks will have been completed. I don't expect customers to benefit very much from deliveries that are significantly ahead of schedule; they may very well be unable to use them until other deliveries arrive. I do expect customers to be inconvenienced by delays, so I am looking for a figure of merit that rewards schedules that are very likely to deliver on time. An alternative approach would be to set a deadline and reward the schedule most likely to deliver by this deadline. To do this sensibly you need to decide what a realistic deadline is. If you found the schedule (or just possibly schedules) with the shortest 95% percentile and then set that time for the deadline, then one of these schedules would also be the schedule most likely to deliver by the deadline, since any task with a larger 95% percentile would have a probability of less than 95% of delivering by the deadline.
My program generates between 2 and 16 tasks for each list. For each task, it produces a median duration of ((x + 4) * 10), where x is normally distributed with mean 0 and standard deviation 1, rounded to the nearest integer and set to 0 if less than 0. It then generates a uniform random number d and sets the two other possible durations as (1 - d) times this and (1 + d) times this.
For the 95% percentile duration measure, I generated 1000 task lists, and used six different strategies to rank them. The first 'strategy' is no strategy at all; it simply accepts the tasks lists in the order given which, since they are generated at random, is equivalent to shuffling them into a random order. The next three strategies are to sort the task list into decreasing order of the shortest possible, median, and longest possible time for each task. The fifth strategy is to sort them into decreasing order of the difference between the longest possible time for that task and the shortest possible time.
The final possibility is much more time-consuming; I start off with the order I get by sorting into decreasing order of longest possible time, and work out the duration according to the current target percentile (initially 95%). I then use backtrack search to work out the best ordering possible, swapping k of the tasks in the list with the first k of the tasks in the list. This wastes some time because there is a symmetry; you can swap the order of the first two tasks in the list without changing the distribution of the final duration, because this simply swaps the assignment of the first task done by each worker, and the workers are identical. I prefer simpler code to the possible time saving. I don't claim that this is the best way of solving this problem, or even the most cost-effective way; it turns out that don't need to find the absolute best answer to demonstrate that none of the simple strategies always finds the best answer. For these runs, I chose k = 2, to get 1000 tests done in a reasonable period of time.
To compare these various strategies over the 1000 tests, I looked at the best time for any of the six strategies, and expressed all the times as fractions beyond that. Here is a table of statistics for the resulting fractions. Left and right are the left and right ends of a 99% confidence region for the mean from the t statistic. I have not given the minimum, because all strategies have a minimum of 0 (there are all at least equal best at least once).
Remember that the figure given for a single schedule is an exact 95% percentile duration for that schedule; the randomness introduced here does not come from the fact that each task chooses between three possible durations at random, but from the fact that each task list is randomly generated.
| Strategy | Mean | Median | Max | Left | Right |
|---|---|---|---|---|---|
| Random | 0.0502 | 0.0338 | 0.427 | 0.0455 | 0.0549 |
| Best Case | 0.0714 | 0.0502 | 0.447 | 0.0657 | 0.0772 |
| Best Guess | 0.0140 | 0.00672 | 0.295 | 0.0120 | 0.0160 |
| Worst Case | 0.00805 | 0.00375 | 0.0984 | 0.00708 | 0.00901 |
| Range | 0.0194 | 0.0103 | 0.191 | 0.0173 | 0.0215 |
| Computer Search | 0.000770 | 0.000 | 0.0495 | 0.000503 | 0.001038 |
Computer search doesn't win every time, but it's clearly the best strategy of those considered here. The Max column shows that it is never as much as 5% worse than the best strategy. This toy situation is much simpler than real life, but there are correspondingly more sophisticated scheduling tools; this might indicate that they can pay their way, assuming that real life can also produce accurate enough 3-point schedules.
If we interpret the Unified Process suggestions as worst case, then it comes a pretty good second best, never as much as 10% behind the best answer. Note also that the 99% confidence intervals for these two best strategies are disjoint from each other, and from the other strategies, confirming that these are likely to be the two best strategies, and in that order.
Best Guess (ordering in decreasing order of median duration) is at least better than ordering in decreasing order of range between worst and best estimates. Assuming that best possible duration for each task and then ranking in decreasing order is actually worse than performing the tasks in a random order; presumably it saves the nasty surprises to the end. Performing the tasks in a random order would make providing improved estimates easier later on; the first N tasks could be treated as a statistical sample. Unfortunately, it is too dangerous, with a maximum duration of over 42% greater than the best schedule.
Here is a normal probability plot of the results (graphics by Minitab 13):
The distributions obviously aren't normal, but you can still see the following:
For the 90% percentile duration, the main points are still true, although the results are slightly more confused. I will give the table and graph as before. The program is such that I believe that the individual durations should be uniformly ≤ the corresponding ones for the previous (95% percentile) run, but because all the figures here are fractions from the best answer in each case, no such simple relation need (or does) hold.
| Strategy | Mean | Median | Max | Left | Right |
|---|---|---|---|---|---|
| Random | 0.0505 | 0.0319 | 0.427 | 0.0455 | 0.0554 |
| Best Case | 0.0740 | 0.0474 | 0.447 | 0.0677 | 0.0804 |
| Best Guess | 0.0148 | 0.00646 | 0.323 | 0.0124 | 0.0171 |
| Worst Case | 0.00791 | 0.00366 | 0.117 | 0.00689 | 0.00892 |
| Range | 0.0195 | 0.00964 | 0.214 | 0.0173 | 0.0217 |
| Computer Search | 0.000738 | 0.000 | 0.0213 | 0.000545 | 0.000930 |
The big question for the median is: does it now make sense to pay attention to the median duration of each task, instead of the worst case? The answer is yes (slightly); a paired t-test shows that ordering tasks by decreasing worst case leads to larger median total duration than ordering by decreasing task median (p = 0.009, 95% confidence interval (0.000377, 0.00265). Here are the table and graph as before.
| Strategy | Mean | Median | Max | Left | Right |
|---|---|---|---|---|---|
| Random | 0.0350 | 0.0245 | 0.362 | 0.0316 | 0.0384 |
| Best Case | 0.0361 | 0.0264 | 0.413 | 0.0329 | 0.0394 |
| Best Guess | 0.0109 | 0.00545 | 0.0938 | 0.00966 | 0.0122 |
| Worst Case | 0.0125 | 0.00495 | 0.220 | 0.0109 | 0.0140 |
| Range | 0.0280 | 0.0132 | 0.324 | 0.0247 | 0.0313 |
| Computer Search | 0.000791 | 0.000 | 0.0362 | 0.000557 | 0.00102 |
For the following task list, only computer search gives the right answer (the order given here).
| Best | Median | Worst |
|---|---|---|
| 36 | 50 | 64 |
| 9 | 34 | 59 |
| 24 | 42 | 60 |
Because there are only three tasks and two workers, the only real decision is which task to do last. For the best choice, the resulting 95% percentile duration is 96. Sorting on any of the columns leads us to do the task (9, 34, 59) last, and get a 95% percentile duration of 101. The remaining choice can be reached by deciding to do the task with the smallest range first; we then get a 95% percentile duration of 98.
Here is an optimum schedule with a 95% percentile duration of 147:
| Best | Median | Worst |
|---|---|---|
| 4 | 30 | 56 |
| 33 | 52 | 72 |
| 9 | 43 | 77 |
| 8 | 39 | 71 |
| 47 | 48 | 50 |
Here are three schedules chosen to maximise the chance of getting 147 or under:
| Best | Median | Worst |
|---|---|---|
| 4 | 4 | 4 |
| 33 | 52 | 72 |
| 9 | 43 | 77 |
| 8 | 39 | 71 |
| 47 | 48 | 50 |
| Best | Median | Worst |
|---|---|---|
| 30 | 30 | 30 |
| 33 | 52 | 72 |
| 9 | 43 | 77 |
| 47 | 48 | 50 |
| 8 | 39 | 71 |
| Best | Median | Worst |
|---|---|---|
| 56 | 56 | 56 |
| 33 | 52 | 72 |
| 9 | 43 | 77 |
| 8 | 39 | 71 |
| 47 | 48 | 50 |
What is the point of this? well, if you work out the probability distributions for these tables, and merge them together in the proportions 0.185 : 0.630 : 0.185, the resulting probability distribution has a 95% percentile of 146 - below that of the first, fixed, schedule. This corresponds to knowing ahead of time exactly how difficult the first task is. But if you look at the possibilities, you can see that you can tell enough about the difficulty of the first task by the time you need to choose between schedules. If the first task takes time either 4 or 30 then it will finish before the second, and you know for certain as soon as that happens. If it takes time 56, then you know just after time 30, before any task has finished, by ruling out the other two alternatives.
So the point of this is that we have a counter-example to an implicit assumption in the Unified Software Development Process; the assumption that the optimum behaviour is to draw up a list in priority order ahead of time and follow it through no matter what happens (of course I don't think anybody ever expected to take that completely literally, but I still like to have a reason why not). In at least this one example, there is a (very small) gain to be had in return for being more flexible.