After a successful 2018 AGM for the Victorian Branch, 160 enthusiastic statisticians and data scientists gathered at the University of Melbourne to listen to Professor Howard Bondell talk about data and decision making. Howard focussed on two topics: informative missingness in recommender systems, and selecting sets of patient characteristics to inform optimal treatment decisions.
Howard opened his discussion of recommender systems with the now-infamous Netflix competition: Netflix offered a US$1 million prize to anyone who could develop an algorithm to predict user ratings of movies with a lower error rate than Netflix’s own algorithm. The key issue with the dataset provided to develop the algorithm was its sparseness: although each of roughly 480,000 users could potentially rate each of roughly 18,000 movies, more than 95% of the ratings were missing. Although several different approaches have been applied to this and similar datasets, the idea of informative missingness – that users tend to watch movies they expect to like, rather than selecting movies at random – had not previously been exploited. Howard pointed out that including the missing data mechanism in the prediction model improves accuracy, and discussed his approach.
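To make the idea of informative missingness concrete, the following is a minimal simulation sketch. It is not Howard's model: the rating distribution, the rule linking a rating to its chance of being observed, and the names `simulate_ratings` and `p_obs` are all assumptions made for illustration. It shows that when users are more likely to rate movies they like, the average of the observed ratings overstates the average across all user-movie pairs.

```python
import numpy as np

def simulate_ratings(n_users=2000, n_movies=50, seed=0):
    """Simulate a full ratings matrix and an informative observation mask.

    Assumed for illustration only: ratings are rounded normal draws on a
    1-5 scale, and the chance a rating is observed is 2% per rating point.
    """
    rng = np.random.default_rng(seed)
    # Ratings every user *would* give to every movie (mostly never seen).
    true = np.clip(np.rint(rng.normal(3.0, 1.0, size=(n_users, n_movies))), 1, 5)
    # Informative missingness: better-liked movies are more likely to be rated.
    p_obs = 0.02 * true  # 2% for a 1-star rating up to 10% for a 5-star rating
    observed = rng.random(true.shape) < p_obs
    return true, observed

true, observed = simulate_ratings()

mean_all = true.mean()                  # average over every user-movie pair
mean_observed = true[observed].mean()   # average over the ratings we get to see
missing_fraction = 1 - observed.mean()  # share of the matrix that is missing

print(f"missing fraction:       {missing_fraction:.3f}")
print(f"mean of all ratings:    {mean_all:.3f}")
print(f"mean of observed only:  {mean_observed:.3f}")
```

In this toy setting a method that treats the missing entries as missing at random would inherit the upward bias of the observed ratings, which is one way to see why modelling the missingness mechanism can help prediction.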
In the second half of his talk, Howard discussed optimal treatment decisions in personalised medicine, where the aim is to select the best treatment for a patient given their individual characteristics. Howard’s focus was on identifying the particular set of characteristics that are important in determining which treatment is best for a patient. He highlighted that a characteristic can be a good predictor of an outcome without being useful for treatment selection. Howard described his “no regret” approach to selecting sets of patient characteristics for treatment selection: the aim is to choose the set of characteristics that minimises the “regret” incurred when a treatment decision is based on only that subset, with the remaining characteristics excluded.
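The distinction between predicting an outcome and choosing a treatment can be illustrated with a small worked example. This is a sketch under stated assumptions, not Howard's method: the outcome model, the probabilities `P_X1` and `P_X2`, and the helper functions are hypothetical. Regret is taken here to mean the loss in expected outcome from using the best rule that sees only a subset of characteristics, compared with the best rule that sees all of them. In the toy model, `x1` strongly predicts the outcome but does not change which treatment is better, while `x2` barely predicts the outcome but decides the better treatment.

```python
from itertools import product

# Hypothetical toy model: two binary characteristics and a binary treatment.
P_X1 = {0: 0.5, 1: 0.5}
P_X2 = {0: 0.6, 1: 0.4}
NAMES = ("x1", "x2")

def mean_outcome(x1, x2, a):
    """Expected outcome (higher is better) for a patient given treatment a."""
    # x1 shifts the outcome under both treatments (prognostic only);
    # x2 changes the sign of the treatment effect (useful for selection).
    return 2.0 * x1 + a * (x2 - 0.5)

def value(rule):
    """Expected outcome in the population if treatment follows rule(x1, x2)."""
    return sum(
        P_X1[x1] * P_X2[x2] * mean_outcome(x1, x2, rule(x1, x2))
        for x1, x2 in product((0, 1), repeat=2)
    )

def best_value(subset):
    """Value of the best rule allowed to look only at the named characteristics."""
    idx = [NAMES.index(name) for name in subset]
    profiles = list(product((0, 1), repeat=len(idx)))
    best = float("-inf")
    # Enumerate every possible mapping from visible profile to treatment.
    for assignment in product((0, 1), repeat=len(profiles)):
        table = dict(zip(profiles, assignment))
        rule = lambda x1, x2, t=table: t[tuple((x1, x2)[i] for i in idx)]
        best = max(best, value(rule))
    return best

def regret(subset):
    """Expected outcome lost by restricting the rule to the given subset."""
    return best_value(NAMES) - best_value(subset)

for subset in [(), ("x1",), ("x2",), ("x1", "x2")]:
    print(subset, round(regret(subset), 6))
```

Under these assumptions, a rule based on `x2` alone has zero regret, whereas a rule based on `x1` alone does no better than ignoring the characteristics entirely, even though `x1` is by far the stronger predictor of the outcome.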
Through these two topics, Howard provided excellent demonstrations of the benefits to decision making that can arise through the combination of data science technical know-how and statistical concepts. His talk will soon be available online at https://www.meetup.com/Statistical-Society-of-Australia-Victorian-Branch/events/248217741/.