There’s a scene in the movie Monty Python and the Holy Grail where the stalwart heroes first behold the castle of Camelot. Their awe at its size and beauty abruptly ends when one of them points out, “It’s only a model.”
Patrick Carter loves that scene. The Washington State University biologist hears the same thing from his students all the time.
“Ninety-nine percent of undergrads and beginning grad students will pooh-pooh models,” he says, “when in fact, virtually everything that they know as facts in biology is model-based.”
From the structure of the atom to how memory works, models permeate science. Carter says even he didn’t fully appreciate that until a few years ago, when he started working with mathematician Richard Gomulkiewicz on a model of how exercise behavior is inherited. Now he enthusiastically points out models that are so deeply embedded in our minds that most of us don’t realize they’re there.
He offers an example. Our DNA has a double-helix structure that looks something like a twisted ladder. Right?
“The evidence is overwhelming that that’s true,” says Carter. “But that understanding is all model-based. No one’s ever seen it.”
Gomulkiewicz says modeling is built into the way science works.
“You don’t just blindly gather data, ever,” he says. “You’ve got some model in your mind that you’re investigating. We’re all modelers.”
A conceptual model, such as our mental image of the DNA spiral, helps frame research questions and make general predictions. A numerical model uses math or statistics to describe the image and make quantitative predictions about it.
That sounds abstract, but models are judged by a ruthlessly practical standard. A model that consistently makes bad predictions or, worse, no testable predictions at all, gets pitched.
Only a model that consistently makes good predictions becomes part of our way of looking at the world. The model of DNA as a double helix, for instance, matches experimental evidence gathered over decades.
“That’s why we have so much confidence in the model,” says Gomulkiewicz. “It’s done such a great job for us.”
Some scientists make their models themselves. Others, like Carter, team up with someone who understands the questions they’re trying to answer and has the math chops to put the whole thing down in numbers.
Crop scientist R.C. Johnson called on statistician Marc Evans. One of four members of WSU’s Department of Statistics who consults with researchers throughout the University, Evans has a background in biology and a fondness for challenging problems.
“This one was fun, because it was a non-standard question,” says Evans.
What Johnson was doing was essentially plant breeding in reverse. As lead scientist at the Western Regional Plant Introduction Station, a USDA-sponsored seed bank, he and his staff are charged with maintaining more than 73,000 samples of seeds, sprigs, roots, or bulbs from more than 2,700 species of plants, with as little change from their original genetic make-up as possible.
“A breeder wants to encourage genetic change through selection,” says Johnson. “We want to do just the opposite of what breeding does. We want to keep the population as diverse as possible.”
His seeds and other material, gathered from sites all over the world, can be preserved in cold storage for a few years or a few decades, but eventually must be grown to produce new seeds that can be returned to storage.
Johnson faced a problem shared by all seed-bankers: every time he grew his plants, genetic diversity was lost. In some cases, important genetic changes can occur within a single generation.
Some loss of diversity is unavoidable because of the way parental genes get divvied up among the offspring. Just as with humans, not all of mom and dad’s traits show up in the kids.
The main culprit, though, was the way the seeds were collected. The standard method was to cut all the heads from each plant and harvest all the seeds. But plants of the same kind produce different numbers of seed heads, even when they’re grown side by side. A small plant putting out one-tenth as many seeds as its neighbors won’t contribute equally to the seed pool. Then, when researchers pull out a handful of seeds from that batch to grow another generation, odds are the handful will contain very few seeds from the low-producing plants. After a few generations of growing and harvesting like this, the smaller plants, and their genes, can be lost completely.
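A toy calculation shows how fast a bulk harvest erodes diversity. The yields below are hypothetical round numbers chosen for illustration, and this sketch is not the station’s actual model; it only tracks what fraction of the harvested seed pool comes from low-yield plants under each collection method:

```python
def harvested_fraction(f, equalized=False):
    """Fraction of the harvested seed pool coming from low-yield plants,
    given that a fraction f of the plants in the field are low-yield.

    Hypothetical yields: a low-yield plant sets 10 seeds, the rest 100.
    With equalized sampling, the same number of heads is taken from
    every plant, so each plant contributes equally to the pool.
    """
    if equalized:
        return f  # equal heads per plant preserve each plant's share
    return 10 * f / (10 * f + 100 * (1 - f))  # bulk harvest: share scales with yield

# Start with 10% low-yield plants and replant from the pool for three generations.
f = 0.10  # bulk harvest
g = 0.10  # equalized sampling
for _ in range(3):
    f = harvested_fraction(f)
    g = harvested_fraction(g, equalized=True)

print(f"bulk harvest after 3 generations:   {f:.6f}")
print(f"equal sampling after 3 generations: {g:.6f}")
```

Under the bulk harvest, the low-yield plants’ share of the pool collapses by roughly a factor of ten each generation, so their genes are effectively gone within a few cycles; equalized sampling holds their share steady, leaving only ordinary genetic drift.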
For the purposes of a seed bank, that’s very bad news.
“Gene banks aren’t so much a thing that you have for a problem right now,” Johnson explains. “It’s like an insurance policy. You don’t know what’s coming.” A scrawny, low-producing plant might turn out to be the only one in the group with the genes to fight a new disease or provide a breakthrough drug.
The only way to avoid losing the little guys was to hand-count the same number of seeds from each plant. For small operations that grow a few dozen plants, that might be workable; but Johnson and his colleagues grow thousands of individual plants every year. He needed to find a way to collect a similar number of seeds from each plant without ruining his budget, his schedule, or his employees’ eyesight.
Enter Evans. Using statistical theory and Johnson’s data on the number of seeds per plant, he came up with a simple and elegant sampling model. It predicted that taking just four seed heads from each plant would secure up to 95 percent of the genetic diversity of the parent generation. Taking more-even dozens more-would gain little additional diversity.
Johnson was thrilled. Four heads per plant is doable, even in his large-scale operation. He tested the model on plots of several species of grass. It worked. As a bonus, it can also be used by the people who collect the wild materials, to ensure they bring home samples that represent each plant equally.
Johnson is now encouraging colleagues around the world to institute the four-head sampling model in their operations.
Keeping It Simple
Evans says he tries to keep his models as simple as possible, while still answering the questions that need to be answered. A more complicated model may be more realistic, but a simple model is easier for the researcher to understand, apply, and write about.
“Simple is almost always best,” he says. “Simple usually does exactly what you want.”
Mathematician V.S. Manoranjan explains, “A model is not going to capture every little detail you have in your process, because nobody knows all the little details. What it’s going to do is give you a caricature of the process which captures the essential components.”
The trick, says Gomulkiewicz, is identifying which details are important enough to include.
“For students who are just starting out in modeling, their biggest mistake (not mistake, but inclination) is to include absolutely everything they can think of. . . . There’s an art to figuring out just how much detail is understandable, but also captures the important features of the system you’re looking at.”
While simple is best most of the time, in some cases it’s just not possible.
Brian Lamb and Joe Vaughan of WSU’s Atmospheric Research Laboratory run one of the nation’s largest air-quality tracking systems. Local and state agencies rely on their predictions to issue pollution alerts and check for compliance with air-quality standards.
Their primary model, AIRPACT (Air Indicator Report for Public Access and Community Tracking), is a gargantuan beast ravenous for data. Every day, it gobbles up predictions of temperature, wind speed, and other weather conditions from a meteorological model run by the University of Washington.
Vaughan and Lamb also feed it information on about 100 different chemicals and 150 chemical reactions that occur in the air over western Washington and Oregon. Some of the pollutants are emissions from cars, industrial plants, or other sources. Others, such as ozone, form when emitted chemicals react with each other in the air.
The model digests all that information and calculates how each contaminant spreads when the air is still, blows steadily, changes strength, or swirls around obstacles such as trees or buildings.
“And this is still a pretty large simplification of what actually occurs in the atmosphere,” says Lamb. “Even today, we don’t have the computer horsepower to try to be more explicit than that.”
AIRPACT started with an atmospheric chemistry model developed by researchers at many institutions. Vaughan, Lamb, and their WSU team customized the model to fit conditions in the Pacific Northwest. They also modified it to include selected “air toxics,” such as 1,3-butadiene, a carcinogen, and nitrogen oxides, which contribute to acid rain.
At first they used the model to analyze historical episodes of pollution, but their emphasis soon shifted to making daily predictions of air quality. Now, computers work overnight to provide the next day’s predictions, which are posted on the AIRPACT Web site by 6 a.m. A visitor to the site (airpact.wsu.edu) can see the model’s predictions of where each pollutant will be, at what concentration, at each hour of the day.
And the AIRPACT beast is still growing. Lamb and Vaughan are expanding it to include all of Washington, Oregon, and Idaho. They’ve also begun work, in collaboration with the University of Washington, the U.S. Forest Service, and the National Center for Atmospheric Research, on a project designed to predict what air quality in the region might be like in 50 years.
One influence they’re watching for is pollution from Asia. The Pacific Ocean is a cleaner neighbor than, say, Detroit, but it doesn’t block atmospheric contamination. Air gets around. Monitors in the region have already detected airborne pollutants traced to China’s burgeoning industrial base.
“Everybody has an interest in being able to look upstream in terms of the airflow,” says Vaughan. “It’s all one atmosphere.”
How Do They Know It’s Right?
One puzzle modelers face is what it means when real-life observations don’t match the model’s predictions. Sometimes the model simply hasn’t been fed the right information. For a while, AIRPACT’s maps showed no emissions at all along a segment of Interstate 5. Vaughan and Lamb knew that couldn’t be right. They checked the model, and sure enough, it was missing the emissions data from that area.
Sometimes the model is wrong, and figuring out how to correct it can lead to a whole new understanding of the subject.
Other times, the model is right, and the discrepancy alerts researchers to something about the real situation that they just haven’t found yet. Lamb recalls that when they first started AIRPACT, it consistently predicted high ozone concentrations in an area south-southwest of Puget Sound. Ozone usually peaks downwind of major urban centers. There were no ozone monitors nearby, so his team didn’t know what the predicted hotspot meant.
“We couldn’t say there was an obvious flaw in the model. So [the Washington Department of Ecology] put a monitor out there, and in fact they started to see some elevated ozone levels,” says Lamb.
“It was a nice example of where the model suggested there might be something going on that people hadn’t realized, and then they went out and looked for it and found it.”
The key in that case was having the right kind of monitor in the right place. Lamb says monitors have become a tough sell with some agencies. “They really love this modeling stuff, and one of the things that we hear is, ‘Oh, we can just do the modeling. We can cut back on the monitoring.’
“And we keep saying, don’t do that. We need more monitoring, not less.”
While Vaughan and Lamb model what happens in the region’s air every day, Carter and Gomulkiewicz are trying to model a very different kind of complexity.
They head a team of eight other researchers and modelers from around North America that won a $2.1 million grant from the National Science Foundation to develop models of how organisms inherit traits that change as another trait, or some factor in the environment, changes. These “function-valued traits,” as they are called, include such features as an animal’s growth rate relative to environmental temperature and its activity level relative to its age.
These traits are much harder to analyze than the either/or, pink pea/white pea kinds of traits addressed by classical genetics. Without a good model, understanding how they are inherited has been nearly impossible.
Carter studies wheel-running behavior of lab mice. In mice, as in humans, the tendency and ability to exercise varies over the course of an individual’s life.
“Your physical activity today is influenced by your physical activity yesterday and maybe when you were 20 years old, and maybe even when you were 40 years old,” he explains. “If you want to understand evolution of a trait like that, you can’t be thinking in terms of specific ages. We already have an idea of how single points like that evolve. We don’t understand very well how an entire trajectory across a lifetime evolves.”
In addition to creating their own models, the team plans to produce a software program other researchers can use to customize a basic model to suit their own experimental system, whether it involves slime molds, dabbling ducks, or redwood trees.
It’s a tough task. Making a model is always a process of trial and error, repeatedly checking the model against reality and adjusting it as needed.
Modelers don’t expect a model to be “right” from the start. They may even make a model wrong on purpose. Gomulkiewicz says he sometimes uses assumptions that are clearly wrong, in order to keep the model simple enough to be workable.
His attempt a few years ago to model the co-evolution of plants and their pollinators bogged down in a welter of genetic detail-until he had his model assume that both the plant and the insect had just one copy of each chromosome, rather than the usual two.
“Of course, you’re always nervous, because you know you made these assumptions that aren’t right. But maybe whether or not you have one or two copies of the gene doesn’t really matter in the end,” he says.
The proof, as always, is in the predictions; and models with intentionally incorrect assumptions “often make pretty good predictions,” says Gomulkiewicz. “They’re never going to be great predictions, but there’s a trade-off between how sharp your predictions are and how easy your model is to handle.”
Ultimately, the value of a model rests on whether it illuminates what we see in the lab or in the field.
Statistician Richard Alldredge says a colleague from another university summed it up best: “All models are wrong, but some are useful.”
“They’re not capturing all of the truth,” he explains. “They’re only approximations. But some are useful. And that’s where my philosophy is: OK, let’s try this model, let’s see if it is useful. Does it help us understand something about the real world?”