Dr James Chipperfield of the Australian Bureau of Statistics (ABS) delivered the 2017 Ken Foreman Lecture on 29 August. James has 20 years' experience as an official statistician and is an Adjunct Associate Professor at the University of Wollongong. This annual lecture honours Ken Foreman, a pioneer who was instrumental in introducing sampling and time series analysis to Australian official statistics.
James gave a thought-provoking presentation reviewing ABS data integration developments, an arena with many tensions. For example, the Australian Commonwealth Government’s position (as per its Public Data Policy Statement) is to release non-sensitive data by default. The ABS, however, is governed by the Census and Statistics Act, which has strict provisions on respondent confidentiality.
James spoke on the possibilities opened up by administrative datasets. These are cheaper for both the statistics agency and the respondent, and provide a high level of detail and large sample sizes. However, these datasets are primarily compiled for operational, not statistical, purposes. They may therefore lack quality control for some fields, and their concepts and population may differ from the statistical target. They often lack breadth of content because they are typically used for a narrow purpose.
Integration of multiple datasets, including surveys with administrative datasets, can increase the breadth of content available. For example, data matching can bring family situation and household income together in the same dataset. Data matching may be subject to linkage error, and James expounded on this problem and some solutions, including deterministic linkage and probabilistic linkage. James cited the D-MAC SAS macros, developed by the ABS, as world-leading infrastructure for deterministic linkage.
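To make the distinction between the two linkage approaches concrete, here is a minimal Python sketch. It is not the D-MAC macros or any ABS method. The records, field names, agreement probabilities and threshold are all invented for illustration, and the probabilistic part uses textbook Fellegi-Sunter style agreement weights as an assumed stand-in for whatever was presented in the lecture.

```python
import math

# Hypothetical toy records; every name and value here is invented.
survey = [
    {"id": "s1", "surname": "nguyen", "birth_year": 1980, "postcode": "2600",
     "family": "couple with children"},
    {"id": "s2", "surname": "smith", "birth_year": 1975, "postcode": "2601",
     "family": "lone person"},
]
admin = [
    {"id": "a1", "surname": "nguyen", "birth_year": 1980, "postcode": "2600",
     "income": 91000},
    {"id": "a2", "surname": "smyth", "birth_year": 1975, "postcode": "2601",
     "income": 64000},
]

FIELDS = ("surname", "birth_year", "postcode")


def deterministic_link(left, right, fields=FIELDS):
    """Link two records only when every linking field agrees exactly."""
    index = {tuple(r[f] for f in fields): r for r in right}
    links = []
    for rec in left:
        key = tuple(rec[f] for f in fields)
        if key in index:
            links.append((rec["id"], index[key]["id"]))
    return links


# Assumed (m, u) pairs per field: m is the chance the field agrees for a
# true match, u the chance it agrees for a non-match. Illustrative only.
M_U = {
    "surname": (0.90, 0.01),
    "birth_year": (0.95, 0.02),
    "postcode": (0.90, 0.05),
}


def match_weight(a, b):
    """Sum of log2 agreement/disagreement weights across linking fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def probabilistic_link(left, right, threshold=5.0):
    """Link each left record to its highest-weight candidate if the weight
    clears the (assumed) threshold. One-to-one linking is not enforced."""
    links = []
    for rec in left:
        best = max(right, key=lambda r: match_weight(rec, r))
        if match_weight(rec, best) >= threshold:
            links.append((rec["id"], best["id"]))
    return links


print(deterministic_link(survey, admin))
print(probabilistic_link(survey, admin))
```

In this toy data the surname variant "smith"/"smyth" defeats the exact match, so the deterministic pass misses a pair that the weighted comparison still accepts. A lower threshold recovers more such pairs but raises the chance of false links, which is one way linkage error arises.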
James noted that confidentiality is a key consideration in data linkage. Disclosure risk must be assessed in the context of:
- people (i.e. those accessing data);
- projects (and the consequent data requirements);
- settings (where and how data is accessed);
- data (the actual data made available); and
- output (whether confidential data could be inferred from the results released).
James discussed the methodology behind the ABS’ remote analysis server, which allows remote access to datasets that would not be released to users on CD-ROM because of the unacceptable disclosure risk.
The well-attended presentation was followed by dinner at a nearby restaurant, where there was lively conversation among statisticians from different sectors.
Dr Robert Clark
Canberra Branch Vice-President