Correcting for the Effect of Missing Data in Measures of Disparity

Michael Rosario, Duke University


One goal of evolutionary biology is to understand how the diversity of organisms has changed over time. The fossil record provides information about how morphological diversity, or disparity, has changed across geological timescales, but our study shows that measures of disparity can be biased when the groups being compared differ in their amounts of missing data. To investigate the effect of missing data on disparity, we applied a genetic algorithm to a morphologically rich data set to simulate realistic data loss. This algorithm, which can be likened to a spinning carnival wheel game, simulated data loss by incorporating patterns of missing data from real fossil data sets. We also applied the algorithm to previously analyzed data sets to test whether variation in the amount of missing data was responsible for statistical differences in disparity across groups. We were able to remove only 30 percent of the morphological data (less than the average amount of missing data detected in fossil data sets) before significant differences from the original data set were detected. Additionally, reanalysis of two published data sets showed that variation in the amount of missing data can artificially strengthen or weaken statistical differences between groups. These findings suggest that differences in the amount of missing data should be taken into account before drawing conclusions about disparity in sparse morphological data sets.
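
One way to read the carnival wheel analogy is as weighted ("roulette-wheel") sampling, in which characters that are more often missing in real fossils occupy a larger slice of the wheel and are therefore removed more often. The Python sketch below illustrates that idea only; it is not the authors' implementation. The function name `simulate_data_loss`, the per-character `weights`, and the use of `None` to mark a missing value are all assumptions made for illustration.

```python
import random


def simulate_data_loss(matrix, weights, fraction, seed=None):
    """Return a copy of `matrix` with about `fraction` of its cells set to None.

    matrix   : list of rows (taxa), each a list of character values
    weights  : one non-negative weight per column (character); a hypothetical
               stand-in for how often that character is missing in real fossils
    fraction : share of all cells to remove, between 0 and 1
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    result = [row[:] for row in matrix]   # leave the input untouched
    wheel = list(weights)                 # working copy of the wheel slices
    target = int(round(fraction * n_rows * n_cols))
    removed = 0

    while removed < target and any(wheel):
        # Spin the wheel: larger weights are landed on more often.
        col = rng.choices(range(n_cols), weights=wheel, k=1)[0]
        candidates = [r for r in range(n_rows) if result[r][col] is not None]
        if not candidates:
            wheel[col] = 0                # column is empty; drop its slice
            continue
        row = rng.choice(candidates)
        result[row][col] = None
        removed += 1

    return result


# Hypothetical example: 4 taxa x 3 characters; the third character is never lost.
example = [[1.0, 2.0, 3.0],
           [1.5, 2.5, 3.5],
           [0.5, 1.0, 2.0],
           [2.0, 3.0, 4.0]]
degraded = simulate_data_loss(example, weights=[5, 1, 0], fraction=0.25, seed=1)
```

A disparity measure could then be computed on both `example` and `degraded` to see how far the two values diverge as `fraction` increases.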

Abstract Author(s): M.V. Rosario, A.J. Smith, T.P. Eiting, E.R. Dumont