The Social Research Centre and the Statistical Society of Australia (SSA) are very proud to offer statistical training from the International Program in Survey and Data Science (IPSDS), a joint program of the University of Mannheim and the Joint Program in Survey Methodology at the University of Maryland.
Places are limited. Please register early to secure a place and take advantage of early bird discounts.
Short Course Description
Social scientists and survey researchers are confronted with an increasing number of new data sources such as apps and sensors that often result in (para)data structures that are difficult to handle with traditional modeling methods. At the same time, advances in the field of machine learning (ML) have created an array of flexible methods and tools that can be used to tackle a variety of modeling problems. Against this background, this course discusses advanced ML concepts such as cross-validation, class imbalance, boosting and stacking, as well as key approaches for facilitating model tuning and performing feature selection. The course also introduces additional machine learning methods, including Support Vector Machines, Extra-Trees and LASSO, among others. It aims to illustrate these concepts, methods and approaches from a social science perspective. Furthermore, the course covers techniques for extracting patterns from unstructured data as well as interpreting and presenting results from machine learning algorithms. Code examples will be provided using the statistical programming language R.
Timeframe:
September 24 – November 11, 2024. Weekly meetings at the following times:
▪ Week 1: Tuesday, September 24, 8:00-9:00 am AEST
▪ Week 2: Tuesday, October 1, 5:00-6:00 pm AEST
▪ Week 3: Tuesday, October 8, 5:00-6:00 pm AEDT
▪ Week 4: Tuesday, October 15, 5:00-6:00 pm AEDT
▪ Week 5: Tuesday, October 22, 10:00-11:00 am AEDT
▪ Week 6: Tuesday, October 29, 10:00-11:00 am AEDT
▪ Week 7: Tuesday, November 5, 10:00-11:00 am AEDT
▪ Week 8: Tuesday, November 12, 8:00-9:00 am AEDT
Course Objectives
By the end of the course, students will…
▪ have a profound understanding of advanced (ensemble) prediction methods
▪ have built up a comprehensive ML toolkit to tackle various learning problems
▪ know how to (critically) evaluate and interpret results from "black-box" models
Topics
1. Intro: Bias-variance trade-off, cross-validation (stratified splits, temporal cv) and model tuning (grid and random search)
2. Classification: Performance metrics (ROC, PR curves, precision at K) and class imbalance (over- and undersampling, SMOTE)
3. Ensemble methods I: Bagging and Extra-Trees
4. Ensemble methods II: Boosting (AdaBoost, GBM, XGBoost) and Stacking
5. Variable selection: Lasso, elastic net and fuzzy/recursive random forests
6. Support Vector Machines
7. Advanced unsupervised learning: Hierarchical clustering and LDA
8. Interpreting (Variable Importance, PDP, ...) and reporting ML results
Your instructor: Prof. Christoph Kern
Christoph Kern is Junior Professor of Social Data Science and Statistical Learning at the Ludwig-Maximilians-University of Munich and Project Director at the Mannheim Centre for European Social Research (MZES). He received his PhD in social science (Dr. rer. pol.) from the University of Duisburg-Essen in 2016. Before joining LMU Munich, he was a Post-Doctoral Researcher at the Professorship for Statistics and Methodology at the University of Mannheim and Research Assistant Professor at the Joint Program in Survey Methodology (JPSM) at the University of Maryland. His work focuses on the reliable use of machine learning methods and new data sources in social science, survey research, and algorithmic fairness.
Your instructor: Prof. Trent Buskirk
Current positions: ▪ Professor and Provost Data Science Fellow at Old Dominion University ▪ Novak Family Professor of Data Science, Chair and Director at Bowling Green State University ▪ Adjunct Research Professor at the University of Michigan
Dr. Buskirk is a Fellow of the American Statistical Association. His research areas include mobile and smartphone survey designs, methods for calibrating and weighting nonprobability samples, and the use of big data and machine learning methods for health, social and survey science design and analysis. His research has been published in leading journals such as Cancer, Social Science Computer Review, Journal of Official Statistics, and the Journal of Survey Statistics and Methodology.
Prerequisites
Topics covered in Introduction to Machine Learning and Big Data (ML I), i.e.:
▪ Conceptual basics of machine learning (training vs. test data, model evaluation basics)
▪ Decision trees with CART
▪ Random forests
Familiarity with the statistical programming language R is strongly recommended.
Participants are encouraged to work through one or more R tutorials prior to the first class meeting. Some resources can be found here:
▪ https://rstudio.cloud/learn/primers
▪ http://www.statmethods.net/
▪ https://swirlstats.com/
▪ https://www.rcommander.com
Grading will be based on:
▪ 4 homework assignments (10% each)
▪ 8 online quizzes (5% each)
▪ Participation in discussion during the weekly online meetings (20% of grade)
Early Bird Deadline
Please book before 5 July 2024 to take advantage of the early bird discount.
Disclaimer
Participants will receive access data for the online course, in particular to any learning platform that may be used. The rights of use connected to the access data are personally assigned to the participant. Passing on the access data is not allowed. Also, the temporary transfer to third parties is not permitted.
The right to use the transmitted access data, in particular with regard to any materials or video recordings provided, can only be exercised up to a maximum of 2 months after the program end. After expiration of this 2-month period, the access data will be deleted by Mannheim Business School (MBS). Before the expiration of this period, the participant may view the respective recorded course as often as desired and without time restriction.
If we have reason to believe that the participant is abusing the right of use granted to them or that there is a violation of the terms of use, MBS reserves the right to change the participant’s access data as well as to partially or completely block the access or to prohibit the further use of the digital content.
Group bookings
For group bookings, please email events@statsoc.org.au with the names, email addresses, and telephone numbers of the participants in the group.
Cancellation Policy
Occasionally, courses have to be cancelled due to insufficient registrations. Registering early helps reduce the chance of this happening.
Cancellations received more than two weeks before the event will be refunded, minus the Stripe processing fee (1.75% + $0.30 per transaction) and an SSA administration fee of $20.
From then onward no part of the registration fee will be refunded. However, registrations are transferable within the same organisation. Please advise any changes to events@statsoc.org.au.
For any questions, please email events@statsoc.org.au.