Ten Thousand Bits?
Did you know Mars is going backwards? For the past few weeks, and for several weeks to come, Mars is in its retrograde motion phase. If you chart its position each night against the background stars, you will see it pause, reverse direction, pause again, and then get going again in its normal direction. And did you further know that retrograde motion helped to cause a revolution? Two millennia ago, Aristotelian physics dictated that the Earth was at the center of the universe. Aristarchus’ heliocentric model, which put the Sun at the center, fell out of favor. But what Aristotle’s geocentrism failed to explain was retrograde motion. If the planets are revolving about the Earth, then why do they sometimes pause, and reverse direction? That problem fell to Ptolemy, and the lessons learned are still important today.
Ptolemy explained anomalies such as retrograde motion with additional mechanisms, such as epicycles, while maintaining the circular motion that, as everyone knew, must be the basis of all motion in the cosmos. With less than a hundred epicycles, he was able to model, and predict accurately the motions of the cosmos. But that accuracy came at a cost—a highly complicated model.
In the Middle Ages William of Occam pointed out that scientific theories ought to strive for simplicity, or parsimony. This may have been one of the factors that drove Copernicus to resurrect Aristarchus’ heliocentric model. Copernicus preserved the required circular motion, but by switching to a sun-centered model, he was able to reduce greatly the number of additional mechanisms, such as epicycles.
Both Ptolemy’s and Copernicus’ models accurately forecast celestial motion. But Copernicus was more parsimonious. A better model had been found.
Kepler proposed ellipses, and showed that the heliocentric model could become even simpler. It was not well accepted though because, as everyone knew, celestial bodies travel in circles. How foolish to think they would travel along elliptical paths. That next step toward greater parsimony would have to wait for the likes of Newton, who showed that Kepler’s ellipses were dictated by his new, highly parsimonious, physics. Newton described a simple, universal, gravitational law. Newton’s gravitational force would produce an acceleration, which could maintain orbital motion in the cosmos.
But was there really a gravitational force? It was proportional to the mass of the object which was then cancelled out to compute the acceleration. Why not have gravity cause an acceleration straightaway?
Centuries later Einstein reported on a man in Berlin who fell out of a window. The man didn’t feel anything until he hit the ground! Einstein removed the gravitational force and made the physics even simpler yet.
The point here is that the accuracy of a scientific theory, by itself, means very little. It must be considered along with parsimony. This lesson is important today in this age of Big Data. Analysts know that a model can always be made more accurate by adding more terms. But are those additional terms meaningful, or are they merely epicycles? It looks good to drive the modeling error down to zero by adding terms, but when used to make future forecasts, such models perform worse.
There is a very real penalty for adding terms and violating Occam’s Razor, and today advanced algorithms are available for weighing the tradeoff between model accuracy and model parsimony.
This brings us to common descent, a popular theory for modeling relationships between the species. As we have discussed many times here, common descent fails to model the species, and a great many additional mechanisms—biological epicycles—are required to fit the data.
And just as cosmology has seen a stream of ever improving models, the biological models can also improve. This week a very important model has been proposed in a
new paper, authored by Winston Ewert, in the
Bio-Complexity journal.
Inspired by computer software, Ewert’s approach models the species as sharing modules which are related by a dependency graph. This useful model in computer science also works well in modeling the species. To evaluate this hypothesis, Ewert uses three types of data, and evaluates how probable they are (accounting for parsimony as well as fit accuracy) using three models.
Ewert’s three types of data are: (i) Sample computer software, (ii) simulated species data generated from evolutionary / common descent computer algorithms, and (iii) actual, real species data.
Ewert’s three models are: (i) A null model in which entails no relationships between
any species, (ii) an evolutionary / common descent model, and (iii) a dependency graph model.
Ewert’s results are a Copernican Revolution moment. First, for the sample computer software data, not surprisingly the null model performed poorly. Computer software is highly organized, and there are relationships between different computer programs, and how they draw from foundational software libraries. But comparing the common descent and dependency graph models, the latter performs far better at modeling the software “species.” In other words, the design and development of computer software is far better described and modeled by a dependency graph than by a common descent tree.
Second, for the simulated species data generated with a common descent algorithm, it is not surprising that the common descent model was far superior to the dependency graph. That would be true by definition, and serves to validate Ewert’s approach. Common descent is the best model for the data generated by a common descent process.
Third, for the actual, real species data, the dependency graph model is astronomically superior compared to the common descent model.
Let me repeat that in case the point did not sink in. Where it counted, common descent failed compared to the dependency graph model. The other data types served as useful checks, but for the data that mattered—the actual, real, biological species data—the results were unambiguous.
Ewert amassed a total of nine massive genetic databases. In every single one, without exception, the dependency graph model surpassed common descent.
Darwin could never have even dreamt of a test on such a massive scale.
Darwin also could never have dreamt of the sheer magnitude of the failure of his theory. Because you see, Ewert’s results do not reveal two competitive models with one model edging out the other.
We are not talking about a few decimal points difference. For one of the data sets (HomoloGene), the dependency graph model was superior to common descent by a factor of 10,064. The comparison of the two models yielded a preference for the dependency graph model of greater than ten thousand.
Ten thousand is a big number.
But it gets worse, much worse.
Ewert used Bayesian model selection which compares the probability of the data set given the hypothetical models. In other words, given the model (dependency graph or common descent), what is the probability of this particular data set? Bayesian model selection compares the two models by dividing these two conditional probabilities. The so-called Bayes factor is the quotient yielded by this division.
The problem is that the common descent model is so incredibly inferior to the dependency graph model that the Bayes factor cannot be typed out. In other words, the probability of the data set given the dependency graph model, is so much greater than the probability of the data set given the common descent model, that we cannot type the quotient of their division.
Instead, Ewert reports the
logarithm of the number. Remember logarithms? Remember how 2 really means 100, 3 means 1,000, and so forth?
Unbelievably, the 10,064 value is the
logarithm (base value of 2) of the quotient! In other words, the probability of the data on the dependency graph model is so much greater than that given the common descent model, we need logarithms even to type it out. If you tried to type out the plain number, you would have to type a 1 followed by more than 3,000 zeros!
That’s the ratio of how probable the data are on these two models!
By using a base value of 2 in the logarithm we express the Bayes factor in bits. So the conditional probability for the dependency graph model has a 10,064 advantage of that of common descent.
10,064 bits is far, far from the range in which one might actually consider the lesser model. See, for example, the Bayes factor
Wikipedia page, which explains that a Bayes factor of 3.3 bits provides “substantial” evidence for a model, 5.0 bits provides “strong” evidence, and 6.6 bits provides “decisive” evidence.
This is ridiculous. 6.6 bits is considered to provide “decisive” evidence, and when the dependency graph model case is compared to comment descent case, we get 10,064 bits.
But it gets worse.
The problem with all of this is that the Bayes factor of 10,064 bits for the HomoloGene data set is the
very best case for common descent. For the other eight data sets, the Bayes factors range from 40,967 to 515,450.
In other words, while 6.6 bits would be considered to provide “decisive” evidence for the dependency graph model, the actual, real, biological data provide Bayes factors of 10,064 on up to 515,450.
We have known for a long time that common descent has failed hard. In Ewert’s new paper, we now have detailed, quantitative results demonstrating this. And Ewert provides a new model, with a far superior fit to the data.