Tuesday, September 12, 2006

The Wrongness or Rightness of Models (Or, “If this model is wrong, then I just don’t wanna be right!”)

The ASU MRG (Modeling Reading Group) held its second meeting last Friday to discuss a book chapter by Naomi Oreskes and Kenneth Belitz, “Philosophical Issues in Model Assessment” (in Model Validation: Perspectives in Hydrological Science, 2001, M.G. Anderson and P.D. Bates, eds.). Though aimed at hydrologists, the chapter, we found, stays true to its title, providing a short and, at times, provocative summary of basic issues that anyone using models might want to think about. Topics range from philosophy and logic to the political, cultural, and social issues that shape how we represent natural processes in simplified models.

The big thesis underlying the piece is that we (scientists? ...they seem to put the onus on scientists…) should not appeal to models for any sort of prediction about the world. Rather, “one can gain insight and test intuitions through modeling without making predictions. One can use models to help identify questions that have scientific answers.”

Why shouldn’t we use models to predict? First, and perhaps most important, is the logical impossibility of model validation. There are a number of reasons for this, such as non-uniqueness (more than one model may produce the same result), compensating errors (errors that cancel each other out under certain conditions), or the difficulty of estimating the probability of unlikely events. In other words, even when we have observational data to compare a model against, we have no way to prove that the causal structure underlying what we observe is the same as the one producing our model result.
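As an aside (this is not from the chapter), the non-uniqueness problem is easy to make concrete: two structurally different models can match the same observations about equally well and still diverge once extrapolated, so agreement with data alone cannot single out the true causal structure. A minimal sketch in Python, with made-up data and hypothetical models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "observations": a quantity that declines over ten time steps,
# measured with a little noise (all numbers made up).
t = np.arange(10)
observed = 100.0 * np.exp(-0.05 * t) + rng.normal(0, 0.2, size=t.size)

# Model A: exponential decay (one causal story).
model_a = 100.0 * np.exp(-0.05 * t)

# Model B: a quadratic trend (a structurally different causal story),
# tuned to the same observations.
coeffs = np.polyfit(t, observed, 2)
model_b = np.polyval(coeffs, t)

# Both models match the observations to within the measurement noise...
print("max |obs - A|:", np.max(np.abs(observed - model_a)))
print("max |obs - B|:", np.max(np.abs(observed - model_b)))

# ...yet they extrapolate very differently, so agreement with the data
# alone cannot "validate" either underlying structure.
print("A at t=29:", 100.0 * np.exp(-0.05 * 29))
print("B at t=29:", np.polyval(coeffs, 29))
```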

In addition, the authors cite a number of practical reasons that models may prove unhelpful for prediction: the common “illusion of validity,” in which something is accepted as true merely because it appears to work; cultural and political pressure to predict the “right” outcome; and temporal and spatial divergence, in which the future may differ sharply from the historical data used to “validate” a model.

In general, I think our group found the concepts pertaining to model validity to be fairly straightforward.

Model Wrongness

Our most spirited discussion centered on what we saw as an inconsistency in the authors’ interpretations of the examples they present to demonstrate the difficulty of model validation. If a model predicts a certain outcome based on continued trends, and as a result human behavior changes, was that model wrong? Indeed, was its use appropriate at all?

The authors would have us believe that yes, the model was incorrect, and no, it should not have been used. But here is where we think Oreskes and Belitz fall into their own trap: just as it is impossible to tell whether the model in question is correct, it is also impossible to tell whether it is wrong, especially if the model result itself induces a change in (human) behavior that feeds back into the very system being modeled.

What is missing from all this talk of model validation is a discussion and characterization of model “wrongness” (perhaps something resembling the many layers of uncertainty in a model). It is perfectly plausible (though unknowable) that the groundwater-level model captured the physical processes accurately. It was wrong about human behavior, but then again, that was precisely the point: to understand the consequences of sustained human behavior.
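To make the feedback loop concrete, here is a toy sketch with entirely hypothetical numbers (this is not the chapter’s model): a water level is projected under sustained pumping, then re-simulated assuming the alarming projection itself prompts users to cut pumping. The “physics” is identical in both runs; only the assumed behavior differs, and the forecast ends up looking wrong anyway.

```python
# Toy water-balance model: the level drops by (pumping - recharge) each year.
# All numbers are hypothetical; this is an illustration, not the chapter's model.

def simulate(years, initial_level, pumping, recharge=1.0,
             respond_to_forecast=False, alarm_level=40.0, reduced_pumping=1.2):
    level = initial_level
    levels = []
    for _ in range(years):
        # If the published forecast alarms water users, they cut back pumping
        # once the level falls below the alarm threshold.
        rate = reduced_pumping if (respond_to_forecast and level < alarm_level) else pumping
        level -= rate - recharge   # simple mass balance
        levels.append(level)
    return levels

# The forecast assumes behavior stays the same (sustained pumping):
forecast = simulate(30, initial_level=50.0, pumping=2.0)

# What actually happens if the forecast itself changes behavior:
actual = simulate(30, initial_level=50.0, pumping=2.0, respond_to_forecast=True)

print("Forecast level after 30 yr:", forecast[-1])          # looks dire
print("Actual level after 30 yr:  ", round(actual[-1], 1))  # much less dire
# The forecast turned out "wrong", yet the physical relationship it encodes
# was never at fault; only the assumed human behavior changed.
```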

This is not to argue against the main conclusions of the chapter, with which I believe we all agreed, but to point out that discussing models in terms of being “wrong” misses the chapter’s more nuanced conclusion: models can be useful but dangerous tools precisely because we cannot know whether they are right or wrong. The water model examples suggest that there can be an excruciatingly fine line between using models to “gain insight and test intuitions” (as the authors advocate) and using them to predict the future.

A few other ideas and issues that came out of our discussion:

In the section on systematic bias, Oreskes and Belitz make the interesting argument that models will tend to be optimistic. They note that very rare events are nearly impossible to characterize in terms of their probability and may even be unknowable. The logic, then, is that “in highly structured, settled human societies, unanticipated events are almost always costly…” and hence the optimistic bias. We felt this was an interesting point, surely valid in many cases, but one that should probably not be generalized. Even the examples in the chapter show that models can be quite pessimistic, especially when modelers are worried that a change in behavior is needed. It is important to recognize that models have a social dimension: they are tools used to reveal certain types of information in certain ways. So, yes, models (especially long-term ones) may miss major unpredictable events, but they may also be structured to yield doom-and-gloom scenarios that direct human behavior.
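Coming back to the rare-event point, it has a simple quantitative core: if an event has a probability of, say, 1 in 10,000 per year and a model is calibrated against a few centuries of records, the most likely calibration outcome is that the event never appears in the record at all, so it effectively drops out of the model. A rough Monte Carlo sketch, with hypothetical numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

p_true = 1e-4        # true per-year probability of a damaging event (hypothetical)
record_years = 300   # length of the historical record used to calibrate the model

# Repeat the "calibration" many times to see what typical records look like.
trials = 10_000
events_in_record = rng.binomial(record_years, p_true, size=trials)

print("Fraction of records with zero events:",
      np.mean(events_in_record == 0))           # roughly exp(-0.03), about 0.97
print("Mean estimated probability:",
      np.mean(events_in_record / record_years)) # unbiased on average...
# ...but ~97% of individual records imply p = 0, so a model built from any
# single record will usually leave the costly event out entirely.
```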

The authors touch on this when they mention that successful modeling may partly involve producing results that please a modeler’s constituents. This certainly seems plausible in the case of the modeler concerned about dropping groundwater levels. Can we think of models as just a set of goals? Matt suggested that a starting point for this line of thinking might be an entrepreneur developing a five-year business model. In terms of goals or desired outcomes, how is that similar to or different from modeling the water table?

Finally, picking up again on the idea of model wrongness, what are the implications of taking one’s model to a policy maker? I ask because, in my mind, doing so may actually change the model itself. By using a model to inform policy, are we by definition forcing social dynamics into a numerical model that had previously been concerned only with water flow (for example)?

There are many other appetizing tidbits in this chapter that would generate interesting discussion (perhaps a comment from Zach, and maybe even from the authors themselves, is forthcoming?), but I will stop here with a final recommendation: this chapter is a great starting point for anyone looking to sink their teeth into some of the fleshier, juicier issues in modeling for both the natural and social sciences. Enjoy!

1 comment:

Anonymous said...

As Ryan describes, Oreskes offers numerous reasons for caution in using model predictions in decision-making. Does our discussion about the misuse of models for prediction apply to areas of science and engineering that use much more specific models focused on particular aspects of the world? Engineering uses models quite often for predictive decision-making, and it works well. At first blush, this could refute Oreskes's cautions against prediction. However, I think Oreskes's arguments apply to "simple" engineering models just as much as to "complex" climate or societal models.

I'll try to illustrate the point with my own experience, which is probably common in engineering. I used to work on finite element analyses (FEA, using ANSYS) of environmental control valves for a new airplane. FEA takes a three-dimensional model and partitions its volume into small elements, then uses a set of mathematical relations from stress analysis to propagate the stresses caused by loads on one element through the entire body. I would partition a 3-D valve model, specify its material properties, and apply the forces and stresses that the valve was projected to undergo. The output would indicate the maximum stress in the valve and predict its ability to bear the load. We would "validate" the essentials of the model through vibration in qualification testing, but failure in those tests was so costly (as stipulated in customer contracts) that we wouldn't test until we had every confidence in the FEA prediction. In deciding whether to test, the model prediction was critical, but the decision was always made by my boss (who was seemingly an artist when it came to relating minute model characteristics to actual behavior on the vibration table).
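For anyone who has never seen FEA, here is a deliberately tiny sketch of the idea, assuming a 1-D elastic bar split into a handful of elements; it is nothing like a real ANSYS valve model, and every number is made up. The structure, though, is the same: assemble element stiffnesses, apply a load, solve for displacements, and read off the peak stress to compare against an allowable.

```python
import numpy as np

# Toy 1-D finite element model of an axially loaded bar (hypothetical numbers).
E = 70e9        # Young's modulus, Pa (roughly aluminum)
A = 1e-4        # cross-sectional area, m^2
L = 0.5         # bar length, m
n_elem = 10     # number of elements
le = L / n_elem # element length

# Assemble the global stiffness matrix from identical 2-node bar elements.
k = E * A / le
K = np.zeros((n_elem + 1, n_elem + 1))
for e in range(n_elem):
    K[e:e + 2, e:e + 2] += k * np.array([[1.0, -1.0],
                                         [-1.0, 1.0]])

# Boundary conditions: node 0 fixed, axial force applied at the free end.
F = np.zeros(n_elem + 1)
F[-1] = 5000.0  # N

# Solve K u = F on the free degrees of freedom.
u = np.zeros(n_elem + 1)
u[1:] = np.linalg.solve(K[1:, 1:], F[1:])

# Element stresses: sigma = E * strain = E * (u_right - u_left) / le
stress = E * np.diff(u) / le
print("Max stress (Pa):", stress.max())  # compare against the material's allowable stress
```

In a real 3-D analysis the elements, loads, and material models are far richer, but the decision still hinges on that final comparison between predicted peak stress and what the part is allowed to carry.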

I don't think the successful predictive use of "simple" science and engineering models does anything to weaken Oreskes's argument. Why does prediction work in engineering but not in Oreskes's examples? Perhaps she focuses on seemingly more "complex" models that deal with phenomena less well understood than stress analysis. Although system complexity is an important factor, I suspect it is less important than most would think. Perhaps engineering models are fundamentally different because Oreskes's examples deal with human behavior and policy-relevant decisions, which are never incorporated in mechanical engineering models. These are definitely issues to keep in mind.

Unlike someone named Ryan (!), I'm highly disinclined to attribute different modeling practices to a fundamental difference between science and engineering. Many of Oreskes's claims about the limitations of scientific models apply to engineering: there were surprises our models didn't anticipate, my boss always knew more about the valve's behavior than the model could show, and we were often conscious of errors in our models that we didn't quite know how to fix. Although we could greatly increase the likelihood of predictive success in our models, model fallibility always remained. Given that Oreskes is right about engineering on so many points, any attempt to attribute different model usage to a disciplinary difference between science and engineering runs a severe risk of over-generalizing. I bet there are examples from "science" where models are used just as effectively, and engineering examples where model usage never works.

There may be a more relevant cause: for some engineers, the decision-making structure and institutional culture are well attuned to incorporating model predictions properly. At the end of the day, the people who make the decision (should we put this valve into qualification testing, etc.) are all experienced engineers who are not afraid to complement the model with their own knowledge. Because the model predictions are used by the responsible "knowledge makers" themselves (and not by someone looking for a quick justification), the decision-making role of models may be fundamentally different than in Oreskes's examples. I also think the process for valve qualification incorporates some of Oreskes's advice about continually monitoring the predictive output of models. For example, the hierarchical process of valve qualification always balances testing against prediction, and the methods for "validating" models have been well developed in the lab. Critically, engineers are institutionally encouraged to learn from past decisions ("we lost X money on that bad test: if you ever have another failure, you're fired!").

The successful role that model predictions play in engineering (and elsewhere in science) is surely a huge part of why people want to use GCMs, so identifying exactly what the difference is should be worthwhile. We should resist making general claims about how different disciplines use models and should instead explore the ways an epistemic culture incorporates predictions into decision-making. Oreskes's arguments seem sound for engineering models, and the success of some mechanical engineering models seems to correlate with their adherence to some of her epistemic prescriptions.