On October 16, 124 statisticians and data scientists gathered to hear Professor Di Cook, of Monash University, deliver the Victorian Branch’s 2018 Belz Lecture. In a talk titled “Human vs Computer: when visualising data, who wins?”, Di urged us to ponder whether we can be replaced by machines when it comes to assessing the fit of models using data plots.
Statisticians are often required to assess the fit of statistical models, a task which frequently involves viewing residual plots or other types of data plots and judging whether the assumptions of the model appear to be valid. This task, although necessary, is often seen as a form of data analysis drudgery. To relieve the burden on statisticians, could these visual inspection tasks be performed by machines? Di noted that we first need to get some idea of how well humans perform such tasks. Di described how we can use the “line-up protocol” to do this: a human is shown a set of plots, with one plot corresponding to the true (non-null) model, and the rest to null cases. In much the same way as a suspect is picked out of a police line-up, the human picks the plot that they believe corresponds to the true model. In this way, the plot itself becomes a data point, and by getting a large number of humans to perform this task (for example, using Amazon’s Mechanical Turk), hypotheses relating to plots can be tested. In the example Di showed, humans, when aggregated, did better than the usual t-test. Power to the people!
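For readers who like to see the idea in code, the line-up protocol can be sketched roughly as follows. This is a minimal illustration and not the code used in the lecture: the function names `make_lineup` and `lineup_p_value`, the choice of 20 panels, and the use of permuted responses as the null plots are all assumptions made for the example. The p-value assumes that, under the null, each observer independently picks the true plot with probability 1/(number of plots).

```python
import random
from math import comb


def make_lineup(x, y, n_plots=20, seed=None):
    """Hide the real data among null panels (hypothetical helper).

    Null panels are made by permuting y, which breaks any
    relationship between x and y. Returns the list of panels and
    the position of the real data.
    """
    rng = random.Random(seed)
    true_pos = rng.randrange(n_plots)
    panels = []
    for i in range(n_plots):
        if i == true_pos:
            panels.append((list(x), list(y)))
        else:
            y_null = list(y)
            rng.shuffle(y_null)  # null case: x and y unrelated
            panels.append((list(x), y_null))
    return panels, true_pos


def lineup_p_value(n_observers, n_correct, n_plots=20):
    """P(at least n_correct of n_observers pick the true plot by chance).

    Assumes independent observers, each guessing with probability
    1 / n_plots under the null (a binomial tail probability).
    """
    p = 1.0 / n_plots
    return sum(
        comb(n_observers, k) * p**k * (1 - p) ** (n_observers - k)
        for k in range(n_correct, n_observers + 1)
    )


if __name__ == "__main__":
    x = list(range(30))
    y = [2 * v + random.gauss(0, 5) for v in x]
    panels, true_pos = make_lineup(x, y, seed=1)
    # Suppose 6 of 10 observers picked the panel at true_pos.
    print(lineup_p_value(n_observers=10, n_correct=6))
```

With a single observer and 20 panels, a correct pick corresponds to a p-value of 0.05, which is why 20 is a common choice for the number of panels in this kind of sketch.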
But how does one train a computer to perform this same task? The approach that Di described involved converting the data plots into images, and then training convolutional neural networks to classify these images into “good” or “bad” plots, in much the same way as neural networks are trained to classify whether or not a given image includes a cat. Although these neural networks did not perform as well as humans for the example presented, Di noted that we are perhaps not too far away from delegating some of the usual checks of residual plots to the machines, saving us from some of that drudgery!
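The first step of that approach, turning a plot into an image, can be illustrated with a small sketch. This is only an assumption about what such a step might look like, not the pipeline from the lecture: the helper name `rasterize_scatter` and the 32 by 32 pixel size are hypothetical, and a real workflow would more likely save rendered plots from a graphics library before passing them to a neural network.

```python
def rasterize_scatter(xs, ys, size=32):
    """Convert a scatter plot into a size x size grid of 0/1 pixels.

    Hypothetical helper: each point switches on the pixel it falls
    in. Row 0 is the top of the image, so larger y values land in
    smaller row indices, as in an ordinary plot.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1.0  # avoid dividing by zero
    y_span = (y_max - y_min) or 1.0
    image = [[0] * size for _ in range(size)]
    for x, y in zip(xs, ys):
        col = min(int((x - x_min) / x_span * size), size - 1)
        row = min(int((y_max - y) / y_span * size), size - 1)
        image[row][col] = 1
    return image


if __name__ == "__main__":
    fitted = [0.0, 1.0, 2.0, 3.0, 4.0]
    residuals = [0.5, -0.3, 0.1, -0.6, 0.4]
    for row in rasterize_scatter(fitted, residuals, size=8):
        print("".join("#" if pixel else "." for pixel in row))
```

Grids like this, each labelled as coming from a null or non-null case, are the kind of input an image classifier could be trained on.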
We were honoured to have Professor Di Cook as our 50th Belz Lecturer: her cutting-edge research into statistical graphics and her leadership in the statistical and R communities made her the ideal choice to celebrate 50 years of Belz Lectures. The annual Belz Lecture is named in honour of Professor Maurice Belz, the Foundation Professor of Statistics at the University of Melbourne, who died in 1975 aged 78. Maurice Belz was active in service not only to the statistical community, but also to the broader University and Victorian communities: he was described as “very much the cultured cosmopolitan academic” (1).
Di’s slides are available on her website at http://www.dicook.org/, and a recording of her talk is available on the Victorian Branch’s meetup page: look under past events at https://www.meetup.com/Statistical-Society-of-Australia-Victorian-Branch/.
1. J Gani. Some aspects in the development of statistics in Australia. Australian Journal of Statistics. 1976;18(1,2):1-20.