Corporate Planning

Why Can't We Predict?

Author: Neil Duncan; Source: New Scientist, Vol. 136, Issue 1841 (03 October 1992, page 47)


Nobody is very good at prediction. Racing tipsters usually pick the wrong horse, and weather forecasters cannot say how much rain will fall, right here, tomorrow afternoon. It is even worse in some fields of study, where there is no general agreement on what is happening now, let alone what will happen tomorrow. Economists, for example, cannot tell us when the recession will end, or even what a recession is. Nutritionists disagree on whether animal fat is bad for us (what sort of fat, how much, for whom, what do you mean by bad, and how long have I got?). The trouble is, in all cases, that their model is probably wrong. It may be a computer-based simulation model, or a theoretical model, or one derived by statistical analysis of data. But, whatever sort it is, it must do three things - include all the relevant factors, cover their whole range of values, and properly reflect what they do.

The first problem is to decide which factors are relevant. You may try to throw in everything but the kitchen sink, and prune the model later, but you are stuck with the factors for which values are readily available. It would be silly, for example, to model the speed of traffic in terms of the proportion of drivers with blue eyes, if it would take an expensive survey to identify that proportion. And you have to allow for the values of factors to change: in economic statistics, for example, a deficit is sometimes 'revised' into a surplus.

When it comes to deciding just how each factor affects the output, there are four things to consider - shape, thresholds, interactions, and lag. Shape refers to the basic mathematical form of the relationship; thresholds are discontinuities in relationships; interactions are present when the effect of a factor depends on the values of one or more of the other factors; and lag means that the output is affected not by the current value of a factor but by an earlier or later value.

The shape of a relationship may vary from a simple straight line to something that needs half a page of algebra to describe. Sometimes, a nonlinear effect can be represented by using a simple linear expression and transforming the factor - by taking its reciprocal, for example - but this does not always work. Deciding the shape of the relationship is a craft, not a science.
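To make the idea of transforming a factor concrete, here is a minimal sketch (in Python, not part of the original article) that fits a straight line first to a raw factor and then to its reciprocal; the traffic numbers are invented purely for illustration.

```python
import numpy as np

# Invented illustration: average speed tails off roughly as the reciprocal of
# traffic flow, so a straight line in the raw factor fits poorly, while a
# straight line in the transformed factor (1/flow) captures the shape.
rng = np.random.default_rng(0)
flow = np.linspace(200, 2000, 50)                          # vehicles per hour
speed = 30 + 20000 / flow + rng.normal(0, 1, flow.size)    # km/h

slope_raw, intercept_raw = np.polyfit(flow, speed, 1)      # linear in flow
slope_inv, intercept_inv = np.polyfit(1 / flow, speed, 1)  # linear in 1/flow

print(f"raw fit:         speed = {slope_raw:.4f}*flow + {intercept_raw:.1f}")
print(f"transformed fit: speed = {slope_inv:.0f}*(1/flow) + {intercept_inv:.1f}")
```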

Thresholds are levels where the effect of a factor changes. They could be represented by a transformed factor: for a single threshold, for example, all values of the factor below the threshold level could be replaced by zero. A special case of the single threshold arises in 'catastrophe theory', where a system can absorb all the effects of a particular factor until a certain level of that factor is reached - when the system collapses. Thresholds appear also in 'chaotic systems', where a small perturbation of a stable or steadily evolving condition can be so greatly magnified by a feedback mechanism that the whole process suddenly goes berserk. Needless to say, systems like this are hard to model reliably.
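As a small illustration of the single-threshold transformation just described, here is a Python sketch; the factor values and the threshold are made up, and the 'hinge' variant is a common alternative rather than anything from the article.

```python
import numpy as np

# Single-threshold transformation: below the threshold the factor is treated
# as having no effect (replaced by zero); above it, it enters the model as usual.
factor = np.array([2.0, 5.0, 7.5, 9.0, 12.0])   # invented values
threshold = 6.0                                  # invented threshold

transformed = np.where(factor < threshold, 0.0, factor)
# A common variant is the "hinge": only the excess over the threshold counts.
hinge = np.maximum(factor - threshold, 0.0)

print(transformed)  # [ 0.   0.   7.5  9.  12. ]
print(hinge)        # [0.  0.  1.5  3.  6. ]
```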

A different sort of threshold effect may occur in a system in which the output is subject to several independent constraints. In traffic, for example, a speed limit might be irrelevant on a very busy road, but the level of traffic might be irrelevant where a series of bends is the main constraint on speeds. This sort of structure is easy to build into a simulation model, but harder to observe in real life. Going back to our traffic example, different drivers are likely to respond differently to the various constraints, blurring their overall effects.
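A sketch of how several independent constraints might be built into a simulation model: each constraint caps the output on its own, and whichever cap happens to be lowest is the one that binds. The function and its numbers are hypothetical, not taken from any real traffic model.

```python
def constrained_speed(free_speed, speed_limit, bend_cap, congestion_cap):
    """Speed achievable (km/h) when each constraint independently caps the output."""
    return min(free_speed, speed_limit, bend_cap, congestion_cap)

# On a very busy road the congestion cap binds and the speed limit is irrelevant...
print(constrained_speed(130, 110, 90, 45))    # -> 45
# ...while on a quiet road with a series of bends, the bends are what matter.
print(constrained_speed(130, 110, 60, 120))   # -> 60
```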

Interactions between factors change their effects. For example, an increase in traffic flow of 500 vehicles per hour may reduce the average speed of cars by 5 kilometres per hour on a flat road but by 20 kilometres per hour on a hill - even though the hill itself would have no noticeable effect on car speeds. Such an interaction can easily be represented in a simulation model or an analytical model. But in observations for an empirical model, interactive effects could be masked or diluted - just as thresholds may be blurred by variations between drivers.
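The traffic example can be written down as a simple interaction term. The sketch below uses made-up coefficients chosen only to echo the numbers in the text; it is illustrative, not a real traffic model.

```python
def average_speed(flow_vph, on_hill, base_speed=100.0):
    """Toy model: the speed loss per vehicle/hour is larger on a hill (interaction)."""
    flat_effect   = -5.0 / 500.0    # km/h lost per vehicle/hour on a flat road
    extra_on_hill = -15.0 / 500.0   # additional loss per vehicle/hour, hill only
    hill = 1.0 if on_hill else 0.0
    # The flow*hill product is the interaction: the hill changes the slope of
    # the flow effect without shifting speeds when the road is empty.
    return base_speed + flat_effect * flow_vph + extra_on_hill * flow_vph * hill

print(average_speed(500, on_hill=False))  # 95.0  -> 5 km/h slower
print(average_speed(500, on_hill=True))   # 80.0  -> 20 km/h slower
print(average_speed(0,   on_hill=True))   # 100.0 -> the hill alone has no effect
```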

So are empirical models inferior? Not necessarily. Simulation models are always more detailed, but their greater detail may be inappropriate. People adapt to circumstances, and therefore any system involving people is likely to be fairly insensitive to changes in those circumstances. In this case, it is not just that the detail is hard to observe: it may not actually be there.

Lag in the effect of a factor may be more of a problem than all the rest put together. For example, the speeds of vehicles at any point are likely to be affected by the road layout on the approach (positive lag), by conditions at the point (zero lag), and by the drivers' assessments of what lies ahead (negative lag). In some fields, lag may be crucial. It has recently been suggested, for example, that conditions in the womb can affect the unborn child's own children, 25 years later; and, in economics, the seeds of a boom or a depression can be seen (or overlooked) in various indicators for years beforehand.
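One way a modeller handles this is to build lagged copies of a factor and let the fitting process decide which of them matter. A minimal Python sketch, with an invented 'curvature' series standing in for road layout along a route:

```python
import numpy as np

def lagged(series, lag):
    """Shift a series by `lag` steps, padding the exposed end with NaN."""
    out = np.full(series.shape, np.nan)
    if lag > 0:
        out[lag:] = series[:-lag]    # positive lag: earlier values
    elif lag < 0:
        out[:lag] = series[-lag:]    # negative lag: later values
    else:
        out[:] = series
    return out

# Invented road-curvature series; speed at each point could be modelled on the
# layout just passed, the point itself, and what the driver can see coming.
curvature  = np.array([0.0, 0.2, 0.8, 0.3, 0.0, 0.1])
approach   = lagged(curvature, +1)   # positive lag: the approach
here       = lagged(curvature,  0)   # zero lag: conditions at the point
look_ahead = lagged(curvature, -1)   # negative lag: what lies ahead
print(approach, here, look_ahead, sep="\n")
```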

It is hard enough to detect simple lagged effects, when you do not know how far to look back (or forwards). To make matters worse, the lag itself may be subject to interactive or threshold effects. In a really complicated system, the possibilities are virtually unlimited. Animal fats may be bad for you but only if you eat more than 100 grams of saturated fat every day, and only after 20 years if you take no exercise or 40 years if you drink plenty of red wine, and not at all if you are Italian. Unless, of course, your mother smoked heavily while she was carrying you, in which case . . .

Real-life systems are complicated, and it is not surprising that future events are so hard to predict. Also, making predictions is more difficult in matters that involve people than in those that do not: if molecules had drivers, chemistry would still be in its infancy.

The human element probably explains why economics and nutrition are viewed as pseudosciences, but 'chaos' seems to be the most plausible excuse for meteorologists. We should not blame those who work in these fields for their inability to predict; their failing is their reluctance to admit it.

Neil Duncan worked for the Transport Research Laboratory at Crowthorne, Berkshire, for 35 years and is now retired.