Researcher Education and Development (RED) are pleased to invite you to attend a two-part workshop:
Bayesian Logistic Regression in Practice, using R or AutoStat
Who should attend? Anyone interested in analysing how several explanatory variables (aka predictors, independent variables, or inputs) are related to a dichotomous response variable (aka outcome, dependent variable, or output). For example, what affects your chance of surviving a shipping disaster: whether you are male or female, your age, or your passenger class? Or what kinds of habitat and climate are associated with species presence?
What knowledge is assumed? Ideally you have conducted some form of regression before and are familiar with the basic ideas of statistical modelling, in any statistical paradigm. With classical training you would be familiar with p-values to assess the significance of effects, reporting of confidence intervals, and model diagnostics such as R-squared, AIC and residuals. With training in machine learning you may be familiar with cross-validation, and with variable selection through stepwise selection or the lasso. No knowledge of Bayesian statistical modelling is needed.
What preparation is needed?
Participants will use either AutoStat or R. To streamline the provision of data and notes, those using R will be invited to do so through the AutoStat GUI. Please bring your laptop. An email with account details and login instructions will be sent before the workshop. You will not need to install any software on your computer.
What will be learned?
Part 1: Meaning and Eliciting Knowledge. We start by considering the role of variables in statistical modelling, and in particular how they form the basic building blocks of a linear regression model. Importantly, we address the logic behind regression, and how it affects model-building and interpretation, when establishing either association (e.g. correlation) between factors, or causality, where some factors influence others. The first case study, on shipping disasters, examines how the survival of people on board relates to factors such as demographics (age, gender) and passenger class. A second case study, on koalas, examines how reported sightings relate to habitat and climate factors.
In this way, by examining linear regression through a model-based lens, we lay the groundwork for introducing logistic regression. The major difference between the two models is the kind of outcome each suits: linear regression handles continuous outcomes (like age, ticket cost, or habitat and climate measurements), whereas logistic regression handles dichotomous outcomes (like survival/death or presence/absence). We show how this major difference has subtle impacts on implementation.
We show how explanatory factors, which may be continuous, dichotomous or categorical, contribute to explaining the expected outcome. In this way, we build up your understanding of both linear and logistic regression by introducing equations gradually: first through diagrams, then through words and real numbers. Only briefly do we refer to the Greek notation of classical statistics. We include a pictorial introduction to the logit transformation (aka “log odds”) at the heart of logistic regression.
We then show how to use this understanding to elicit relevant information from experts or the literature, including several ways of talking with experts to elicit such information. For illustration, we analyse one shipping disaster and then use this information to inform analysis of the “next” disaster; similarly, we analyse koala presence in one region and then use this information to help map presence in another.
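To make the logit transformation concrete, here is a minimal Python sketch. It is our own illustration, not workshop material (the workshop itself uses R or AutoStat). The helper names `logit` and `inv_logit` and the intercept and slope values are hypothetical. A linear predictor is built on the log-odds scale, just as in linear regression, and then mapped back to a probability; exponentiating a slope gives an odds ratio.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability p, which must lie strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Map a log-odds value back to a probability (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical intercept and slope on the log-odds scale (illustrative values only).
intercept, slope = -1.0, 0.5

for x in (0.0, 1.0, 2.0):
    eta = intercept + slope * x   # linear predictor, built as in linear regression
    p = inv_logit(eta)            # expected outcome: probability that y = 1
    print(f"x={x:.0f}  log odds={eta:+.2f}  odds={math.exp(eta):.3f}  p={p:.3f}")

# exp(slope) is the odds ratio for a one-unit increase in x.
print(f"odds ratio per unit of x: {math.exp(slope):.3f}")
```

Because the logit stretches probabilities in (0, 1) onto the whole real line, the same linear building blocks used in linear regression can describe a dichotomous outcome.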
Part 2: Inference and computation for Bayesian logistic regression, including informative priors.
This second workshop in the two-part series focuses on intermediate-level concepts. We assume that you attended Part 1 (or have equivalent prior learning), which focused on how to understand and communicate the meaning of the logistic regression model, and on eliciting priors for its parameters. In Part 2, we progress from understanding the results of logistic regression to how these results are obtained.
We contrast classical maximum likelihood and Bayesian approaches to inference for logistic regression. Importantly, this means exposing limitations of the frequentist approach, which can to some degree be addressed via Bayesian computation and/or inference. In particular, we consider the interpretation of p-values, properties of the Wald statistic, and other model fit criteria in the classical logistic regression framework. We explore the practical assessment of interactions, and strategies to counter common pitfalls such as separation and perfect prediction. These are contrasted with the Bayesian interpretation of credible intervals and highest posterior density intervals, Bayes factors, posterior predictive checks to evaluate fit, and convergence diagnostics to check computation. We demonstrate how a Bayesian approach, with suitable priors, circumvents issues of separation.
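Separation can be illustrated with a short Python sketch. It assumes hypothetical toy data in which a single predictor perfectly splits the two outcomes, a slope-only model with no intercept, and an assumed normal prior with standard deviation 2.5 on the slope (chosen only for illustration, not a recommendation from the workshop). The log-likelihood keeps increasing as the slope grows, so no finite maximum likelihood estimate exists, whereas adding the prior gives a log-posterior with a finite maximum.

```python
import math

# Hypothetical toy data showing complete separation:
# every negative x has y = 0 and every positive x has y = 1.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

PRIOR_SD = 2.5  # assumed normal(0, 2.5^2) prior on the slope, for illustration only


def log_lik(beta: float) -> float:
    """Bernoulli log-likelihood of a slope-only logistic model, P(y = 1) = inv_logit(beta * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta * x
        # log P(y=1) = -log(1 + exp(-eta));  log P(y=0) = -log(1 + exp(eta))
        total -= math.log1p(math.exp(-eta)) if y == 1 else math.log1p(math.exp(eta))
    return total


def log_post(beta: float) -> float:
    """Unnormalised log-posterior: log-likelihood plus the log of the normal prior density."""
    return log_lik(beta) - 0.5 * (beta / PRIOR_SD) ** 2


# Evaluate both criteria on a grid of slopes from 0 to 20 in steps of 0.001.
grid = [i / 1000 for i in range(20001)]
mle_on_grid = max(grid, key=log_lik)
mode_on_grid = max(grid, key=log_post)

# The likelihood keeps rising with beta, so its grid maximum sits at the upper boundary:
# the maximum likelihood estimate does not exist (it diverges to infinity).
print("log-likelihood at beta = 1, 5, 10, 20:",
      [round(log_lik(b), 4) for b in (1, 5, 10, 20)])
print("likelihood maximised on grid at:", mle_on_grid)

# With the prior added, the log-posterior has a finite interior maximum.
print("posterior mode on grid at:", mode_on_grid)
```

A grid search keeps the sketch dependency-free. In practice, Bayesian software summarises the whole posterior (for example by MCMC sampling, checked with convergence diagnostics) rather than reporting only its mode.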
What is the main objective?
In this two-part series, we aim to develop your ability to critically understand and evaluate the results of a linear or logistic regression, produced in either a classical or Bayesian setting, and hence to interpret output from standard statistical software and in published studies. Although you will gain hands-on experience doing logistic regression in your preferred software package (with support here for either R or AutoStat), the emphasis will be on interpreting the outputs, which can be obtained using many different packages.
In this way, we guide you to develop basic statistical literacy skills in explanatory or predictive modelling using linear or logistic regression. In addition, we share techniques for eliciting prior information and encoding it as statistical distributions. These techniques not only consolidate and test your understanding of the regression models, but also lay a foundation for many useful skills: capturing the current state of knowledge before data are collected, using an expert model (a prior predictive distribution); preparing for the next study via modern design techniques (e.g. simulations for sample size analysis); updating the current state of knowledge about effects (via priors in a Bayesian regression); or consolidating multiple sources of information via a classical meta-analysis (of effect sizes).
What form of teaching can I expect?
We emphasise an active learning approach, and encourage you to “try things out” as you go. Both workshops will therefore alternate between short presentations of concepts and activities that give you time to practise, probe and discuss those skills. You may wish to work in pairs or threes, to overcome minor stumbling blocks quickly and to deepen your learning through critical debate and reflection. However, you are welcome to work alone if you prefer.
We motivate and illustrate new concepts using a case study about shipping disasters.
Beginners will be encouraged to use the easy workflow and menus of the AutoStat environment. Those with experience (or interest) in R may choose either to use AutoStat, writing R scripts that are executed from within AutoStat, or to work directly in RStudio.
Assoc Prof Sama Low-Choy enjoys working with motivated investigators to answer questions that require statistical analysis. She is the Senior Statistician in the Office of the Pro-Vice Chancellor, Arts, Education & Law, Mt Gravatt campus, Griffith University. She takes a flexible and pragmatic approach, matching the problem, skills and resources to an appropriate paradigm: frequentist or Bayesian, parametric or non-parametric, or machine learning.
Dr Clair Alston-Knox is a Senior Statistician with Pacific Analytics Group, Melbourne, and an Adjunct in the Office of the Pro-Vice Chancellor, Arts, Education & Law, Mt Gravatt campus, Griffith University. Her recent transition from academia to a commercial environment has enabled her to become part of a large team of specialist analysts working on the development of AutoStat®, new cloud-based software providing access to a suite of statistical, ML and AI algorithms.
Daniela Vasco is a Ph.D. candidate at Griffith University. Her thesis in Applied Statistics is aligned with the ARC Discovery Project “Learning for Teaching in Disadvantaged Schools”, led by Prof Parlo Singh, and her co-principal supervisor is Assoc Prof Low-Choy. Her interests are statistical modelling of complex problems, model diagnostics, visualisation, and Bayesian inference and decision theory.
November 19, 2019
Part 1: Understanding and elicitation
November 28, 2019
Part 2: Inference and implementation
Gold Coast campus
10:00 AM - 4:00 PM
HDR candidates (external to Griffith University): $150.00 per day
Members of either SSA or ASBA: $200.00 per day
Other registrants external to Griffith University (not listed above): $250.00 per day
Please indicate your attendance by November 15, 2019