

CPD 143 - Machine learning with Python

  • 2 sessions
  • 13 Nov 2021, 9:00 AM to 12:00 PM (AEDT)
  • 14 Nov 2021, 9:00 AM to 12:00 PM (AEDT)
  • Zoom link available upon registration

Registration

  • Discounted registration for both (half) days for an SSA member.
  • Discounted registration for both (half) days for a non-member.
  • Discounted registration for both (half) days for an SSA student member.
  • Discounted registration for one (half) day for an SSA member.
  • Discounted registration for one (half) day for a non-member.
  • Discounted registration for one (half) day for an SSA student member.

Registration is closed

The Statistical Society of Australia (SSA) warmly invites you to a workshop on machine learning with Python, presented by Patrick Robotham from Linktree. The workshop consists of two sessions, on Saturday 13 and Sunday 14 November 2021.

Patrick is a Staff Machine Learning Engineer at Linktree. He builds production-ready machine learning and statistical models and has 7 years of industry experience.

 

WORKSHOP ABSTRACT

This two-day workshop aims to help data scientists incrementally incorporate Python into their workflow. After an introduction to Python basics, the workshop focuses on developing Python models in a workflow framework of the kind most commonly seen in production environments. Participants get a gentle introduction to Python on the first day before learning some powerful modelling concepts and tools on the second day.


WORKSHOP CONTENT

Day 1 Getting Started with Python and Pandas

This is a hands-on course for learning the basics of Python and data manipulation with the Pandas library. 

We will begin this course with a gentle introduction to the basics of Python, such as variable assignment and data type conversion. We will then dive into Pandas, the most popular package for manipulating tabular data in Python. We will end this session by making some basic plots of our data. Throughout the workshop you will work through a sequence of Jupyter notebooks and gain experience in working with data in Python.
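As a taste of the Python basics mentioned above, here is a minimal sketch of variable assignment and type conversion. The variable names and values are made up for illustration and are not taken from the workshop notebooks.

```python
# Variable assignment: a name is bound to a value.
weight_kg = 60.5      # float
patient_id = "001"    # str
n_visits = 3          # int

# Explicit type conversion with the built-in constructors.
id_as_int = int(patient_id)         # the string "001" becomes the integer 1
visits_as_float = float(n_visits)   # 3 becomes 3.0
weight_as_text = str(weight_kg)     # 60.5 becomes the string "60.5"
weight_truncated = int(weight_kg)   # int() truncates toward zero, giving 60

# type() reports the type of each value.
print(type(patient_id).__name__, "->", type(id_as_int).__name__)
print(weight_as_text, weight_truncated, visits_as_float)
```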

At the end of this module you will be able to:

  • Understand the basic data types in Python and how to convert between them. 

  • Use the Python library pandas to import and manipulate data.

  • Use matplotlib to make basic visualisations of data. 
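The pandas and matplotlib outcomes listed above might look like the following sketch. The table is invented and inlined as text so the example is self-contained; the column names and the output file name `mean_rainfall.png` are assumptions, not workshop material.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a Jupyter notebook
import pandas as pd

# Made-up data, inlined so no external file is required.
csv_text = """city,year,rainfall_mm
Melbourne,2019,400
Melbourne,2020,600
Sydney,2019,800
Sydney,2020,1000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Manipulate: add a derived column, then summarise by group.
df["rainfall_m"] = df["rainfall_mm"] / 1000
mean_by_city = df.groupby("city")["rainfall_mm"].mean()

# Plot the summary as a bar chart and save it to disk.
ax = mean_by_city.plot(kind="bar")
ax.set_ylabel("Mean rainfall (mm)")
fig = ax.get_figure()
fig.savefig("mean_rainfall.png")
```

With a real dataset, `pd.read_csv` would be given a file path instead of the in-memory text.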

Day 2 Introduction to Machine Learning

This workshop will teach you how to use the scikit-learn library to construct regression and classification models, tune model parameters and evaluate model performance. 

The scikit-learn library supports most of the standard classification, regression and clustering models that statisticians and data scientists use every day. In addition, scikit-learn offers a “workflow” framework (pipelines) that can wrap most data manipulation, scaling, imputation, tuning and evaluation steps together, which provides a consistent standard for machine learning model deployment.
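To make the "workflow" idea concrete, here is a minimal sketch of a scikit-learn `Pipeline` that chains imputation, scaling and a model into one object. The data is synthetic and the choice of steps (median imputation, standard scaling, logistic regression) is an assumption for illustration, not necessarily what the workshop uses.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 100 rows, 3 numeric features, a binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Knock out roughly 5% of the values to mimic missing data.
X[rng.random(X.shape) < 0.05] = np.nan

# Each step is (name, transformer); the final step is the model.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

# fit() runs every step in order; predict() reuses the fitted steps.
pipe.fit(X, y)
preds = pipe.predict(X)
print("Training accuracy:", pipe.score(X, y))
```

Because the preprocessing is fitted inside the pipeline, the same object can be applied to new data without repeating the steps by hand.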

At the end of this module you will be able to:

  • Use the Python libraries pandas and numpy to import and manipulate data.

  • Use scikit-learn to construct linear and tree-based models.

  • Explain the difference between classification and regression.

  • Evaluate a predictive model with appropriate metrics and plots.

  • Improve a machine learning model using hyperparameter tuning.

  • Perform the necessary scaling and imputation on the data. 

  • Standardise model deployment using pipelines.
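Several of the items above can be sketched in a few lines. The example below fits a linear regression and a tree-based classifier on synthetic data, evaluates each on a held-out test set, and tunes one hyperparameter with a grid search. The datasets, the metrics (R² and accuracy) and the `max_depth` grid are illustrative assumptions.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Regression: the target is a continuous number.
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, random_state=0)
reg = LinearRegression().fit(Xr_train, yr_train)
r2 = r2_score(yr_test, reg.predict(Xr_test))

# Classification: the target is a class label (here 0 or 1).
X_clf, y_clf = make_classification(n_samples=200, n_features=5, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_clf, y_clf, random_state=0)

# Hyperparameter tuning: try several tree depths with 5-fold cross validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8]},
    cv=5,
)
search.fit(Xc_train, yc_train)
acc = accuracy_score(yc_test, search.predict(Xc_test))

print("Regression R^2 on test data:", r2)
print("Best max_depth:", search.best_params_["max_depth"])
print("Classification accuracy on test data:", acc)
```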

Timetable

Day 1

Time | Task | Outcome
09:00 | 1. Running and Quitting | How can I run Python programs?
09:15 | 2. Variables and Assignment | How can I store data in programs?
09:35 | 3. Data Types and Type Conversion | What kinds of data do programs store? How can I convert one type to another?
09:55 | 4. Built-in Functions and Help | How can I use built-in functions? How can I find out what they do? What kind of errors can occur in programs?
10:20 | 5. Morning Coffee | Break
10:35 | 6. Libraries | How can I use software that other people have written? How can I find out what that software does?
10:55 | 7. Reading Tabular Data into DataFrames | How can I read tabular data?
11:15 | 8. Pandas DataFrames | How can I do statistical analysis of tabular data?
11:45 | 9. Plotting | How can I plot my data? How can I save my plot for publishing?
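The questions in sessions 4 and 6 (built-in functions, help, errors and libraries) can be previewed with a short sketch like the one below. The values are arbitrary, and the specific functions shown are examples chosen here rather than a list from the workshop.

```python
import math  # a standard-library module, imported by name

values = [3.14159, 2.71828, 1.41421]

# Built-in functions are always available without an import.
print(len(values), max(values), round(values[0], 2))

# Every function carries documentation; help(round) shows it interactively.
print(round.__doc__)

# Library functions are accessed through the module name.
print(math.sqrt(16))

# Errors are raised as exceptions; here len() is given a value with no length.
try:
    len(5)
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```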

 

 

 

Day 2

Time | Task | Outcome
09:00 | 1. Quick revision and set up | A quick recap of Day 1
09:10 | 2. Regression Models | What is a regression model and how can we fit one using scikit-learn?
09:35 | 3. Classification Models | What is a classification model and how can we fit one using scikit-learn?
09:55 | 4. Dummy encoding, scaling and imputation | What kind of manipulations should we apply to our data before we can fit a model?
10:20 | 5. Morning Coffee | Break
10:35 | 6. Cross Validation | How is cross validation used to evaluate model performance?
10:55 | 7. Hyperparameter Tuning | How can we make our model more accurate and flexible?
11:15 | 8. Pipelines | How can we wrap all preprocessing steps, model tuning and evaluation under a consistent framework?
11:45 | 9. Revision | Q&A and reserved time for participants
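Sessions 4, 6 and 8 fit together naturally. The sketch below dummy-encodes a categorical column, imputes and scales a numeric column, and evaluates the whole pipeline with 5-fold cross validation. The data frame, its column names (`age`, `state`) and the target rule are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data: one numeric and one categorical column.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n).astype(float),
    "state": rng.choice(["VIC", "NSW", "QLD"], size=n),
})
y = (df["age"] > 45).astype(int)  # target defined before values go missing

# Set 10% of the ages to missing so the imputer has work to do.
df.loc[df.sample(frac=0.1, random_state=0).index, "age"] = np.nan

# Numeric columns: impute, then scale. Categorical columns: dummy (one-hot) encode.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["state"]),
])
model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])

# Cross validation refits the whole pipeline on each training fold,
# so preprocessing never sees the held-out fold.
scores = cross_val_score(model, df, y, cv=5)
print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```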

 

 

Expenses:

Occasionally workshops have to be cancelled due to insufficient registrations. Registering early helps prevent this. Please note that the Society will not be held responsible for any financial loss incurred due to a workshop cancellation.

Financial Support:

SSA Vic members can apply for financial support. For further information, please see https://statsoc.org.au/News-and-media-releases/10424132.

Contact:

Please contact the organisers: Patrick Robotham (patrick.robotham2@gmail.com) and Kevin Wang (kevinwangstats@gmail.com) for further details.
