

CPD169 - Introduction to Big Data & Machine Learning

  • 6 Feb 2024 (AEDT) to 27 Feb 2024 (AEDT)
  • Online

Registration


Registration is closed

The Social Research Centre and the Statistical Society of Australia (SSA) are very proud to offer statistical training from the International Program in Survey and Data Science (IPSDS), a joint program of the University of Mannheim and the Joint Program in Survey Methodology at the University of Maryland. Places are limited; please register early to take advantage of early bird discounts and secure a place.

About the course:

The amount of data generated as a by-product in society is growing fast, including data from satellites, sensors, transactions, social media and smartphones, just to name a few. Such data are often referred to as "big data" and can be used to create value in different areas such as health and crime prevention, commerce and fraud detection. Big Data are often used for prediction and classification tasks, both of which can be tackled with machine learning techniques. In this course, we explore how Big Data concepts, processes and methods can be used within the context of Survey Research. Throughout this course we will illustrate key concepts using specific survey research examples, including tailored survey designs and nonresponse adjustments and evaluation.

Presenters: Prof. Frauke Kreuter and Prof. Trent Buskirk

Frauke Kreuter holds the Chair of Statistics and Data Science at LMU Munich, Germany. At the University of Maryland, USA, she is Co-director of the Social Data Science Center (SoDa) and a faculty member in the Joint Program in Survey Methodology (JPSM). She is an elected fellow of the American Statistical Association and received the Warren Mitofsky Innovators Award of the American Association for Public Opinion Research in 2020.

In addition to her academic work, Professor Kreuter is the Founder of the International Program in Survey and Data Science (IPSDS), which was developed in response to the increasing demand from researchers and practitioners for the appropriate methods and right tools to face a changing data environment. She has extensive experience with online instruction.

Trent Buskirk is Novak Family Professor of Data Science and Chair of the Applied Statistics and Operations Research Department at Bowling Green State University as well as an Adjunct Research Professor at the University of Michigan and a Fellow of the American Statistical Association. His research includes the areas of Mobile and Smartphone Survey Designs, methods for calibrating and weighting nonprobability samples, and the use of big data and machine learning methods for health, social and survey science design and analysis. His research has been published in leading journals such as Cancer, Social Science Computer Review, Journal of Official Statistics, and the Journal of Survey Statistics and Methodology.

Timeframe:

Course duration: February 6 – February 27, 2024

Weekly meetings: Tuesdays 10 AM AEDT – 11 AM AEDT.

Course objectives:

The course will cover:
•    An overview of key Big Data terminology and concepts
•    An introduction to common data generating processes
•    A discussion of some primary issues with linking Big Data with Survey Data
•    Issues of coverage and measurement errors within the Big Data context
•    Inference versus prediction
•    General concepts from machine learning including signal detection and information extraction
•    Potential pitfalls for inference from Big Data
•    Key analytic techniques (e.g. classification trees, random forests, conditional forests) to process Big Data using R, with example code provided

Weekly topics:

1.    Overview of Big Data; Working with Big Data, Classical Statistical Approaches vs. Statistical Machine Learning
2.    Model Evaluation/Validation, K-Means Clustering
3.    Nearest Neighbours, CART (Classification and Regression Trees)
4.    Random Forests
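
As a rough illustration of two of the weekly topics (nearest neighbours, and model evaluation on held-out data), here is a minimal sketch. It is not taken from the course materials, which use R; it is written in Python with only the standard library. The toy data and the function names knn_predict and holdout_accuracy are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query point.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def holdout_accuracy(train_X, train_y, test_X, test_y, k=3):
    """Share of held-out points whose predicted label matches the true label."""
    correct = sum(knn_predict(train_X, train_y, x, k) == y
                  for x, y in zip(test_X, test_y))
    return correct / len(test_y)


# Hypothetical toy data: two well-separated groups of points.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

# Held-out points that were not used for training.
test_X = [(1.1, 0.9), (5.1, 5.0)]
test_y = ["a", "b"]

accuracy = holdout_accuracy(train_X, train_y, test_X, test_y, k=3)
print(f"Hold-out accuracy: {accuracy:.2f}")
```

Scoring the classifier only on points it has not seen during training is the basic idea behind model validation; with real data the split would normally be random and repeated (for example, cross-validation).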


Software:

Example code in R will be provided. R can be downloaded for free from http://cran.r-project.org/. Participants may also find https://www.rstudio.com/ a helpful interface to execute program code. For those new to R, many MarinStatsLectures videos are available at https://www.youtube.com/playlist?list=PLqzoL9-eJTNBDdKgJgJzaQcY6OXmsXAHU

Prerequisites

We recommend a good understanding of the material typically taught in undergraduate statistics courses and some familiarity with regression techniques and the fundamentals of survey and data science. While not a prerequisite, familiarity with the R software package (base R or R using RStudio) is strongly encouraged. If you are unsure whether you meet the prerequisites, please email events@statsoc.org.au describing your background and experience with sampling.

Reading:

There is no required textbook. Useful recommended resources and reading will be provided to participants as part of course materials. 

Grading:
•    4 online quizzes (worth 5% each)
•    Participation in discussion during the weekly online meetings and submission of questions to the weekly discussion forums demonstrating understanding of the required readings and video lectures (20% of grade)
•    3 homework assignments (worth 20% each)


Early Bird Deadline:
Please book before 15 December 2023 to take advantage of the Early Bird discount.

Disclaimer:
Participants will receive access data for the online course, in particular for any learning platform that may be used. The rights of use connected to the access data are assigned personally to the participant. Passing on the access data is not allowed, and temporary transfer to third parties is also not permitted.
The right to use the transmitted access data, in particular with regard to any materials or video recordings provided, can only be exercised up to a maximum of 2 months after the program end. After expiration of this 2-month period, the access data will be deleted by Mannheim Business School (MBS). Before the expiration of this period, the participant may view the respective recorded course as often as desired and without time restriction.
If we have reason to believe that the participant is abusing the right of use granted to them or that there is a violation of the terms of use, MBS reserves the right to change the participant’s access data as well as to partially or completely block access or to prohibit further use of the digital content.

Group bookings

For group bookings, please email events@statsoc.org.au with the names, email addresses, and telephone numbers of the participants in the group.

Cancellation Policy
Occasionally courses have to be cancelled due to insufficient registrations. Early registration helps to prevent this.

Cancellations received more than two weeks before the event will be refunded, minus a $20 administration fee. After that, no part of the registration fee will be refunded. However, registrations are transferable within the same organisation. Please advise events@statsoc.org.au of any changes.

For any questions, please email events@statsoc.org.au
