CPD145 - Time Series Data Mining with Python

  • 2 sessions
  • 18 Nov 2021, 1:30 PM to 6:00 PM (AWST)
  • 19 Nov 2021, 1:30 PM to 6:00 PM (AWST)
  • Online
  • 15

Registration

  • Discounted registration for SSA Members.
  • Discounted, prioritised registration for members of WA Branch of SSA.
  • Registrations for those who are not members of SSA.
  • Discounted registration for Retired Members of SSA.
  • Discounted registration for Retired Members of the WA Branch of SSA.
  • Discounted registration for Student Members of SSA.

Registration is closed
The WA Branch of the Statistical Society of Australia warmly invites you to a workshop on Time Series Data Mining with Python, presented by Dr Manuel Herrera of the University of Cambridge (UK). The workshop consists of two sessions, on Thursday 18 and Friday 19 November 2021.

Manuel is a Research Associate in distributed intelligent systems at the Engineering Department of the University of Cambridge and a Royal Statistical Society Fellow. He works on engineering statistics and predictive analytics for smart and resilient critical infrastructure, and has many years of experience in the management and maintenance of UK national infrastructure.

Workshop Abstract

This two-day workshop aims to enable students and practitioners in data science to add time-series data mining methods to their skill set for future academic and industry projects. After an introduction to Python for time series analysis, the workshop explores data mining techniques for pattern extraction in time series, ranging from dimensionality reduction to anomaly detection. The first day covers data wrangling for time series analysis with Python, and the second day gives a practical overview of time-series data mining tools.

Prerequisites

An internet connection and basic knowledge of coding (e.g. in R) and of time series concepts. A Google Drive account is recommended but not compulsory.

Workshop content

Day 1 - Fundamentals of time series analysis with Python

This is a hands-on course for learning the basics of data wrangling and time series analysis with Python.

We will begin the course with a quick introduction to Python and the Google Colab environment, a Jupyter notebook service that runs Python code in a web browser with no setup requirements. We will then explore the use of libraries such as pandas, numpy and matplotlib for data acquisition, timestamping, preprocessing and visualisation. We will continue the session by introducing the fundamentals of time series analysis. Throughout the workshop you will gain experience implementing these analyses in Python on real-life case studies.

At the end of this module you will be able to:
  • Work with Python in the Google Colab environment.
  • Use the Python libraries pandas and matplotlib to import, preprocess and visualise data.
  • Carry out time series data analysis with the Python libraries pandas and statsmodels.
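As an illustration of the kind of wrangling described above, here is a minimal sketch using pandas and numpy. The data are synthetic and the column names (`timestamp`, `value`) are assumptions made for this example; they are not taken from the workshop material.

```python
import numpy as np
import pandas as pd

# Synthetic daily readings, for illustration only (not workshop data).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=60, freq="D").astype(str),
    "value": 10 + 0.1 * np.arange(60) + rng.normal(0, 0.5, 60),
})
raw.loc[[5, 6, 20], "value"] = np.nan  # simulate missing readings

# Timestamping: parse the text column and use it as the index.
raw["timestamp"] = pd.to_datetime(raw["timestamp"])
series = raw.set_index("timestamp")["value"]

# Preprocessing: fill the gaps by time-weighted linear interpolation.
clean = series.interpolate(method="time")

# Aggregate to weekly means and smooth with a 7-day rolling mean.
weekly = clean.resample("W").mean()
smooth = clean.rolling(window=7, min_periods=1).mean()

print(weekly.head())
```

With matplotlib installed, `clean.plot()` would draw the cleaned series directly from the pandas object.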

Day 2 - Introduction to time-series data mining

This workshop will introduce time-series data mining techniques using Symbolic Aggregate approXimation (SAX) with the dedicated Python library saxpy, as well as with tslearn, which provides more general machine learning tools for the analysis of time series data. We will see the benefits of dimension reduction using SAX, as well as the possibilities it opens for the subsequent application of clustering and classification techniques.
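To make the idea of SAX concrete, the following is a minimal from-scratch sketch in numpy. It does not use saxpy or tslearn, and the function name `sax_word` and its parameters are hypothetical, chosen for this example. The steps are: z-normalise the series, average it over equal segments (Piecewise Aggregate Approximation), then map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions.

```python
from statistics import NormalDist

import numpy as np


def sax_word(series, n_segments, alphabet_size):
    """Return a SAX word (string) for a 1-D, non-constant series."""
    x = np.asarray(series, dtype=float)
    # 1. z-normalise so the values are on a standard scale.
    x = (x - x.mean()) / x.std()
    # 2. Piecewise Aggregate Approximation: mean of each segment.
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    # 3. Breakpoints cutting N(0, 1) into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    # 4. Map each segment mean to a letter ('a' is the lowest region).
    symbols = np.searchsorted(breakpoints, paa)
    return "".join(chr(ord("a") + int(s)) for s in symbols)


# A 64-point sine wave reduced to an 8-letter word over a 4-letter alphabet.
t = np.linspace(0, 2 * np.pi, 64)
word = sax_word(np.sin(t), n_segments=8, alphabet_size=4)
print(word)
```

The 64 original values are reduced to 8 symbols, which is the dimension reduction that later clustering and classification steps can build on.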

Matrix profile is a more advanced technique than SAX for time-series data mining. The workshop will introduce its theoretical basics while using the Python library matrixprofile for motif and novelty/discord discovery. Motif discovery extracts the most common patterns in a time series, while discord discovery detects points and subsequences that are potential anomalies. Other data mining problems, such as clustering and shapelet discovery for time series classification, will also be explored.
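The matrix profile stores, for every subsequence of a chosen length, the distance to its nearest non-overlapping neighbour in the same series. The sketch below is a brute-force numpy version written only to illustrate the definition. It is not the algorithm or the API of the matrixprofile library, which uses much faster methods, and the names `naive_matrix_profile` and `znorm` are hypothetical.

```python
import numpy as np


def znorm(x):
    """z-normalise a window; a constant window maps to zeros."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else np.zeros_like(x)


def naive_matrix_profile(series, m):
    """Brute-force matrix profile for window length m (quadratic cost)."""
    x = np.asarray(series, dtype=float)
    n = len(x) - m + 1
    subs = np.array([znorm(x[i:i + m]) for i in range(n)])
    profile = np.full(n, np.inf)
    nearest = np.full(n, -1)
    excl = m // 2  # exclusion zone: ignore trivial matches near i
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf
        j = int(np.argmin(d))
        profile[i], nearest[i] = d[j], j
    return profile, nearest


# Synthetic example: a periodic signal with one flattened stretch.
t = np.arange(400)
x = np.sin(2 * np.pi * t / 50)
x[200:220] = 0.0

profile, nearest = naive_matrix_profile(x, m=50)
motif = int(np.argmin(profile))    # most repeated pattern
discord = int(np.argmax(profile))  # most unusual subsequence
print("motif starts at", motif, "and matches", int(nearest[motif]))
print("discord starts at", discord)
```

Low values of the profile mark motifs (subsequences with a close match elsewhere) and high values mark discords. Whether a discord is a genuine anomaly remains a judgement that depends on the application.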

The workshop will cover:
  • Using the Python library saxpy to apply SAX to time-series dimension reduction, clustering and classification.
  • Exploring the Python library tslearn for basic analysis based on SAX as well as for other machine learning techniques for time series.
  • Time-series data mining using the matrix profile and the Python library matrixprofile.
  • Discovering time series discords with the matrix profile, which opens new possibilities for anomaly detection.

Timetable

All times are in Australian Western Standard Time (AWST, UTC+8).

Day 1

| Time  | Task | Outcome |
|-------|------|---------|
| 13:30 | 1. Working environment | What is Google Colab about? |
| 13:45 | 2. Basics of Python | How can I import/export time series in Python? |
| 14:00 | 3. Basics of Python | How can I make preprocessing of time series data? |
| 14:45 | 4. Basics of Python | How can I plot time series data? |
| 15:30 | 5. Afternoon Coffee | Break |
| 16:00 | 6. Basic patterns in time series | How can a time series be split into its main components? |
| 16:15 | 7. Stationarity | How to identify if a series is stationary or not? How to make a time series stationary? |
| 16:45 | 8. Missing data | How to treat missing values in a time series? |
| 17:15 | 9. Basic analysis and forecasting | How to compute partial autocorrelation function? How to build a forecasting model using ARIMA? |
| 17:45 | 10. Revision | Q&A and reserved time for participants |
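
As a small illustration of the stationarity question in the Day 1 timetable, the sketch below builds a synthetic random walk with drift, which is non-stationary by construction, and shows how first differencing changes its lag-1 autocorrelation. The data and variable names are assumptions for this example. A formal check would normally use a unit-root test such as `adfuller` from statsmodels, which is not shown here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range("2021-01-01", periods=200, freq="D")

# Random walk with drift: each value is the previous one plus noise.
walk = pd.Series(np.cumsum(0.2 + rng.normal(0, 1, 200)), index=idx)

# First differencing recovers the (stationary) increments.
diffed = walk.diff().dropna()

# Lag-1 autocorrelation before and after differencing.
print("walk:  ", round(walk.autocorr(lag=1), 2))
print("diffed:", round(diffed.autocorr(lag=1), 2))
```

A lag-1 autocorrelation close to 1 is a typical symptom of a stochastic trend, while the differenced series should show little remaining autocorrelation.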

Day 2

| Time  | Task | Outcome |
|-------|------|---------|
| 13:30 | 1. Intro to SAX | What is SAX about? |
| 13:45 | 2. SAX representation | How can I reduce the dimension of a time series? |
| 14:30 | 3. SAX for time series clustering | How can I use SAX for time series clustering? |
| 15:00 | 4. SAX for time series classification | How can I use SAX for time series classification? |
| 15:30 | 5. Afternoon Coffee | Break |
| 16:00 | 6. Intro to matrix profile | What is matrix profile about? |
| 16:30 | 7. Matrix profile for pattern discovery | How can I discover motifs and discords in a time series? Are those discords anomalies? |
| 17:00 | 8. Other data mining tools | What are shapelets and how can I discover them in a time series? How can I make clustering of multiple time series? |
| 17:30 | 9. Revision | Q&A and reserved time for participants |

Registration Information

Members of the WA Branch of SSA will have priority access to registration for one week before it opens to participants outside WA. Please contact ssa.wa.secretary@gmail.com for your registration code.

This workshop will be conducted via Zoom and Slack. An invitation to the Slack workspace will be sent to participants a few days prior.
