The Markus Covert group’s Cell paper from last week represents a tremendous achievement toward a more systematic understanding of how biological cells work. For decades, much of molecular and cellular biology has focused on single genes and single pathways. This was partly out of necessity, given the cell’s astronomical complexity. It was also due to evolutionary dogma, which viewed the biological world as so many organic contraptions strapped together one way or another. The result was a rather limited perspective on cellular biology. As Bruce Alberts explained in 1998:
We have always underestimated cells. Undoubtedly we still do today. But at least we are no longer as naive as we were when I was a graduate student in the 1960s. Then, most of us viewed cells as containing a giant set of second-order reactions: molecules A and B were thought to diffuse freely, randomly colliding with each other to produce molecule AB—and likewise for the many other molecules that interact with each other inside a cell. This seemed reasonable because, as we had learned from studying physical chemistry, motions at the scale of molecules are incredibly rapid. … But, as it turns out, we can walk and we can talk because the chemistry that makes life possible is much more elaborate and sophisticated than anything we students had ever considered.
Covert’s work is another step on the way toward not underestimating cells. Science, rather than dogma, has a way of doing that. As Alexis Madrigal at The Atlantic put it, “the depth and breadth of cellular complexity has turned out to be nearly unbelievable, and difficult to manage, even given Moore's Law.”
From the 128 computers running in parallel to the 500 megabytes of state data generated for a single cell cycle, Covert’s simulation of Mycoplasma genitalium, the smallest known free-living organism with a mere 525 genes, is immensely complex.
Nonetheless, the simulation is not particularly detailed. It is a so-called mesoscale simulation, meaning it takes a broader view at the cost of omitting the fine-grained details. Consider a mesoscale simulation of an automobile, for instance.
It might model the engine by accounting for the rate at which fuel is burned and the level of torque that is produced. That would omit the stresses and strains of the engine block, the temperature of the metal, the action of the valves allowing oxygen to enter the cylinder, the electrical signal that ignites the spark plug, and a million other details.
The high-level engine model, accounting only for the fuel burn and torque, would be worthless to anyone interested in designing engines. But it is appropriate for, say, the problem of modeling the economics of surface transportation.
So it is with Covert’s simulation. A tremendous wealth of data are omitted, mostly out of necessity. The data are either unavailable, too costly in compute resources to include, or beyond current modeling and simulation technology.
And while omitting these data is appropriate for mesoscale cell simulations, there is a large gray area. Exactly which details are needed and which can safely be ignored in a mesoscale cell simulation?
The automobile simulation could ignore the details of the engine operation because there is a decoupling between phenomena at different time scales. Sub-millisecond dynamics, for example, wash out and are irrelevant when studying annual trends.
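To make the time-scale decoupling concrete, here is a toy sketch in Python. It has nothing to do with Covert’s code; the signal, its rates, and the function names are all hypothetical, chosen only to show a fast oscillation averaging away over a window much longer than its period while still dominating over a short window.

```python
import math

# Hypothetical toy signal: a slow linear trend plus a fast 1 kHz oscillation.
SLOW_RATE = 0.5      # units per second (assumed value)
FAST_FREQ = 1000.0   # oscillations per second (assumed value)
FAST_AMP = 1.0       # amplitude of the fast term (assumed value)

def signal(t):
    """Slow trend plus fast oscillation at time t (seconds)."""
    return SLOW_RATE * t + FAST_AMP * math.sin(2 * math.pi * FAST_FREQ * t)

def window_average(t_start, width, samples=10000):
    """Average the signal over [t_start, t_start + width) by midpoint sampling."""
    dt = width / samples
    total = sum(signal(t_start + (i + 0.5) * dt) for i in range(samples))
    return total / samples

def slow_only_average(t_start, width):
    """Average of the slow trend alone over the same window."""
    return SLOW_RATE * (t_start + width / 2)

# Long window (1 s = 1000 fast periods): the fast term washes out.
long_gap = abs(window_average(0.0, 1.0) - slow_only_average(0.0, 1.0))

# Short window (a quarter of one fast period): the fast term dominates.
short_gap = abs(window_average(0.0, 0.00025) - slow_only_average(0.0, 0.00025))

print(long_gap, short_gap)
```

Over the long window a model that ignores the fast term loses essentially nothing, which is the situation the automobile example relies on. Over the short window the same omission gives a very different answer.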
This decoupling is well understood in machines that we build. It is less well understood in biology. In fact the lack of such decoupling can be important in molecular simulations, where phenomena on different time scales can interact.
Exploring such issues will be one of many scientific uses of whole-cell simulations. But don’t expect results too soon. It is a big problem, and even simulating the standard E. coli bacterium, with roughly 10 times as many genes, is far beyond today’s state of the art.
This is to say nothing of populations of unicellular organisms, or the more complex multicellular organisms. If Covert’s simulation needs half a gigabyte of data for a single cycle of M. genitalium, then imagine where the compute requirements go for, say, structures with millions of the vastly more complex mammalian cells.
Of course the high-and-going-higher compute requirements imply something about the ever-increasing evolutionary requirements as well.
Both the simulation state data and, perhaps more importantly, the models that read and write those data must all have been created by random mutations and the like. Unlikely events must have conspired to design and assemble everything represented in Covert’s simulation, and much more. Care to explore the simulation’s Kolmogorov complexity (the code is available here)?
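For anyone who does care to explore, one caveat: Kolmogorov complexity itself is not computable, so in practice the best one can do is an upper bound, and a common crude proxy is the size of the data after compression. The Python sketch below assumes that proxy and uses the standard zlib module. The two byte strings are made-up stand-ins, not data from the simulation.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level).

    This is only a rough upper-bound proxy for Kolmogorov complexity,
    not the quantity itself.
    """
    return len(zlib.compress(data, 9))

# Hypothetical inputs, 100,000 bytes each.
repetitive = b"ACGT" * 25000          # highly regular, so it compresses well
random_like = os.urandom(100000)      # no exploitable structure

size_repetitive = compressed_size(repetitive)
size_random = compressed_size(random_like)

print(size_repetitive, size_random)
```

Running the same function over the bytes of a source tree or a state dump would give a comparable rough figure, bearing in mind that a general-purpose compressor can badly overestimate the true complexity.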
The more we learn, the more unlikely the “fact” of evolution becomes.