The Statistical Computing and Visualisation section of the Statistical Society of Australia proudly presents the Di Cook Award webinar, promoting open-source statistical software development. This event will showcase outstanding work submitted to the 2025 Di Cook Award.
The award winner will present their winning project in a 30-minute talk, highlighting the motivation, methods, and key insights behind their work. Two highly commended applicants will also share their submissions, each delivering a 15-minute presentation that outlines the ideas, approaches, and contributions of their projects.
Together, these talks will provide an engaging overview of the innovative software development represented in this year’s award submissions, offering attendees the opportunity to learn from and discuss exemplary work in the field.
Date: Wednesday 8th April 2026
Time: 7:00pm - 8:00pm AEST
Online via Zoom: details provided upon registration
Key learnings from attending the webinar
- Understand innovative software approaches: Learn about the design, development, and purpose of the award-winning and highly commended software.
- Explore practical implementation and techniques: Examine the tools, methods, and workflows used by the presenters to build and deliver their software solutions.
- Identify ideas for future development: Gain insights and inspiration that can inform participants’ own software development, research, or practice.
2025 Di Cook Award Winner: Harriet Mason
Title: ggdibbler: Add Uncertainty to Data Visualisations
For information to be passed into ggplot2, or any visualisation software, it usually needs to be expressed as data. This restriction prevents us from visualising inputs that are too uncertain to be expressed as single data values. This can include things like estimates, model predictions, bounded values or observations with large measurement error. While there is a wealth of visualisation software designed to address this problem, these solutions are often limited to specific plot or data types with bespoke syntax. The lack of generality has significantly restricted the adoption of uncertainty visualisation by the broader statistics community, and established uncertainty as something to be ignored. This problem is alleviated by ggdibbler. Now, users can simply replace a vector of data with a vector of random variables created using the distributional package and visualise them with ggdibbler. The power of ggdibbler is in its simplicity and flexibility, as the software allows you to visualise any combination of uncertain variables using the familiar syntax of ggplot2.
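As a rough illustration of the underlying idea only, the Python sketch below treats each input as a random variable and draws repeated samples from it, so that the spread of the draws can be plotted. It is not ggdibbler's API or its internal method. The region names, the normal-distribution assumption, and the `sample_draws` helper are all hypothetical.

```python
import random
import statistics

# Hypothetical estimates: each input is a (mean, standard error) pair
# rather than a single data value. Normality is an assumption made here.
estimates = {"region_a": (10.0, 0.5), "region_b": (12.0, 3.0)}


def sample_draws(uncertain, n_draws, seed=0):
    """Draw n_draws plausible values for every uncertain input."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return {
        name: [rng.gauss(mean, sd) for _ in range(n_draws)]
        for name, (mean, sd) in uncertain.items()
    }


draws = sample_draws(estimates, n_draws=200)

# The spread of the draws is what an uncertainty-aware plot would display.
spread = {name: statistics.stdev(values) for name, values in draws.items()}
```

Plotting every draw, rather than only the mean, is one way to keep a noisy estimate such as `region_b` from looking as certain as a precise one.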
Bio: Harriet Mason

Harriet Mason is a final-year PhD student in computational statistics at Monash University. She is currently completing her thesis, which focuses on developing new theory and tools for uncertainty visualisation. Her research spans statistics, visualisation, and computation, and often results in research software, including the cassowaryr and ggdibbler R packages.

Highly Commended: Cynthia Huang
Title: xmap: Transforming Data Between Statistical Classifications
Social science research often involves harmonising data from multiple sources. For example, analysts often must resolve differences between country-specific occupation classification standards to compare labour statistics from multiple countries. Producing harmonised datasets requires both domain expertise and technical data-wrangling skills. Unfortunately, details of the harmonisation logic are often lost in the idiosyncrasies of bespoke data preparation scripts and ad-hoc documentation, making it difficult for others to validate or reuse harmonisation efforts. The {xmap} package addresses these challenges with a new framework and tools for data harmonisation using 'crossmap' tables. The crossmap framework unifies and simplifies the specification, implementation, validation, and documentation of recoding, aggregating and splitting operations. Crossmaps extend existing crosswalk/look-up table approaches to support one-to-many and many-to-many relationships between alternative classification standards, in addition to one-to-one and many-to-one recoding. The package also provides built-in safeguards to avoid data leakage and graph-based methods for standardised documentation.
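The crossmap idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the {xmap} API. It assumes a crossmap is a table of (source code, target code, weight) rows, and that the safeguard is a check that the weights leaving each source code sum to one. The codes, counts, and function names are hypothetical.

```python
from collections import defaultdict
from math import isclose

# Hypothetical crossmap rows: (source_code, target_code, weight).
crossmap = [
    ("A1", "X", 1.0),   # one-to-one recode
    ("A2", "Y", 1.0),   # many-to-one: A2 and A3 both aggregate into Y
    ("A3", "Y", 1.0),
    ("A4", "Y", 0.25),  # one-to-many: A4 is split between Y and Z
    ("A4", "Z", 0.75),
]


def validate_crossmap(rows):
    """Check that the weights leaving each source code sum to one."""
    totals = defaultdict(float)
    for source, _target, weight in rows:
        if weight < 0:
            raise ValueError(f"negative weight for source {source!r}")
        totals[source] += weight
    bad = {s: t for s, t in totals.items() if not isclose(t, 1.0)}
    if bad:
        raise ValueError(f"weights do not sum to 1 for: {bad}")


def apply_crossmap(values, rows):
    """Redistribute source values onto target codes using the weights."""
    validate_crossmap(rows)
    missing = set(values) - {source for source, _t, _w in rows}
    if missing:
        raise ValueError(f"no mapping for source codes: {sorted(missing)}")
    out = defaultdict(float)
    for source, target, weight in rows:
        if source in values:
            out[target] += values[source] * weight
    return dict(out)


source_counts = {"A1": 100.0, "A2": 40.0, "A3": 60.0, "A4": 80.0}
target_counts = apply_crossmap(source_counts, crossmap)
```

Because every source code passes on exactly its full weight, the target totals add up to the source totals, so nothing is lost or double-counted in the transformation.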
Bio: Cynthia Huang

Cynthia Huang is a Post-Doctoral Research Fellow at LMU Munich working at the intersection of social data science, statistical programming and human-centered computing, with a particular focus on data preprocessing and visualisation principles and tools. She completed her PhD at Monash University, with a thesis entitled: "Unified Principles and Tools for Complex Datasets and Data-Driven Workflows".

Highly Commended: Danyang Dai
Title: metaextractoR: An R Package for Streamlined Data Extraction in Systematic Reviews and Meta-Analyses
Systematic reviews and meta-analyses are pillars of evidence-based medicine, yet data extraction remains labour-intensive and prone to error. Although recent advances have demonstrated the potential of Large Language Models (LLMs) to assist in data extraction, their implementation has largely been limited to proof-of-concept studies, with barriers to broader adoption among researchers. To address these challenges, we present metaextractoR, an open-source R package designed to facilitate and streamline data extraction in systematic reviews and meta-analyses by integrating open-source LLMs via the Ollama framework. metaextractoR enables automated, structured, and reproducible extraction workflows within interactive Shiny applications, in alignment with Cochrane guidelines and veridical data science principles. Our package incorporates three modular Shiny apps to implement a human-in-the-loop framework: (1) a manual extraction interface for abstracts, (2) an interface for refining prompts and selecting models, and (3) an interface for validating LLM-generated outputs. These apps enable researchers to iteratively collaborate with LLMs, with each abstract subjected to double extraction – once by a human and once by an LLM – to emulate the double extraction recommended by the Cochrane Handbook. Notably, the package runs fully on local machines, with no need for API setup or external data transfer, maximising data privacy and accessibility. Robust logging features further enhance transparency and reproducibility by recording all prompt iterations and outputs. By providing a principled and user-friendly framework, metaextractoR lowers technical barriers and empowers the evidence synthesis community to conduct more efficient and transparent reviews.
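The double-extraction step described above can be pictured with a small Python sketch that compares a human extraction with an LLM extraction and flags the fields that need review. It is an illustration only and does not reproduce metaextractoR's functions or its Shiny apps. The field names, values, and the `compare_extractions` helper are hypothetical.

```python
def compare_extractions(human, llm):
    """Return the fields where the two extractions disagree.

    Each disagreement maps the field name to a (human, llm) pair so a
    reviewer can resolve it; a field missing on one side appears as None.
    """
    fields = sorted(set(human) | set(llm))
    return {
        field: (human.get(field), llm.get(field))
        for field in fields
        if human.get(field) != llm.get(field)
    }


# Hypothetical values extracted from a single abstract.
human = {"sample_size": 120, "mean_age": 54.2, "design": "RCT"}
llm = {"sample_size": 120, "mean_age": 45.2, "design": "RCT"}

disagreements = compare_extractions(human, llm)
```

Here only `mean_age` would be sent back for human validation, while the fields on which both extractions agree pass through unchanged.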
Bio: Danyang Dai

Danyang Dai is a final-year PhD student at the Queensland Digital Health Centre. She is an applied statistician and infectious-disease epidemiology researcher specialising in large-scale health data analysis, disease surveillance, and advanced statistical modelling. Her work spans multinational COVID-19 studies, health-economic modelling, and the development of reproducible analytic tools to support evidence-based public health decision-making.