Queensland Branch Meeting, December


TIME:          4:45 PM for refreshments, followed by the talk at
                     5:00 PM, Tuesday 5 December 2017

COST:         Free

VENUE:      Queensland University of Technology, Gardens Point Campus, S Block, Room 307

       Members and guests are welcome to join the speaker afterwards at a nearby restaurant.

TITLE:          A DRY approach to efficient data analysis workflow and reproducible research

SPEAKER:  Dr Peter Baker, School of Public Health, The University of Queensland, Australia

ABSTRACT: The data analysis cycle starts a lot earlier than most researchers appreciate. Planning and organising the workflow before collecting the first data value, and remaining flexible as a project develops, are both important. Complex data analysis projects often consist of many steps that may be repeated any number of times. Like many statistical consultants, I have often found myself repeating the same steps when analysing data for different projects. Standardising our approach, reusing R syntax, writing our own functions and even incorporating them into R packages all improve efficiency and save time. So does using R Markdown and Sweave for reports and presentations. Substantial gains can also be made by modularising code and employing computing tools like ‘GNU make’ to regenerate output when syntax or data files change. My Don’t Repeat Yourself (DRY) approach also aids reproducible research, especially when combined with a version control system like git.

Since the early 90s, I’ve employed version control systems and ‘GNU make’ to “project manage” data analysis using GENSTAT, BUGS, SAS, R and other statistical packages. I was also an early adopter of Sweave, and later R Markdown, for reporting because they aid reproducibility and fit nicely with this approach. My overall strategy will be briefly described and illustrated. For ‘GNU make’ pattern rules, preliminary R packages and examples, see https://github.com/petebaker

BIOGRAPHY: Peter has worked for thirty years as a statistical consultant and researcher in areas such as agricultural research, Bayesian methods for genetics, and health, medical and epidemiological studies. He is a Senior Lecturer in Biostatistics at the School of Public Health, UQ, where he also acts as a senior statistical collaborator and adviser to several research projects in the Faculty of Medicine.


