NSW branch: Recent meetings

Modelling dependence in log-normal asset returns using the Gaussian copula
Dr Zdravko Botev (UNSW), NSW branch meeting 31 May 2017

The NSW branch held its May meeting in direct competition with the first State of Origin match, but still attracted a good turnout of interested attendees, who were treated to an engaging and informative presentation by Dr Zdravko Botev. Dr Botev is a DECRA research fellow at UNSW, with an interest in adaptive Monte Carlo methods for rare-event probability estimation.

Zdravko’s talk focused on modelling dependence of log-normal random variables, a problem that is particularly important in financial applications due to their central role in the widely used Black–Scholes model for asset pricing. Specifically, he presented examples of applications in which simulating or estimating the probability of a rare event may be of interest, but is made difficult by the dependence between assets in a portfolio.

Although a Gaussian copula may appear to be a sensible tool to use here (as long as we are comfortable with its symmetry), the dependence disappears in the extremes at an exponential rate, which would imply that it may not provide accurate results for rare events. However, Zdravko showed that this asymptotic behaviour only kicks in slowly, with dependence languishing in the tails for a very long time: an approximation that assumes independence only becomes accurate for probabilities of the order of 10⁻²⁵⁰! By refuting this common criticism of the Gaussian copula, Zdravko suggested that it may be reasonable to use in this context after all.
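The persistence of dependence at moderately extreme thresholds can be checked with a small simulation. The sketch below is only an illustration and is not the method presented in the talk: the correlation of 0.5, the thresholds and the sample size are arbitrary assumptions. It works on the normal scale, which is enough because the exponential map to log-normal variables is monotone and so leaves exceedance events unchanged.

```python
import random
from statistics import NormalDist


def conditional_exceedance(rho, threshold, n_samples=200_000, seed=0):
    """Estimate P(X2 > t | X1 > t) for standard normals with correlation rho."""
    rng = random.Random(seed)
    scale = (1.0 - rho * rho) ** 0.5
    exceed_first = 0
    exceed_both = 0
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1 = z1
        x2 = rho * z1 + scale * z2  # correlated with x1 by construction
        if x1 > threshold:
            exceed_first += 1
            if x2 > threshold:
                exceed_both += 1
    return exceed_both / exceed_first if exceed_first else float("nan")


def marginal_exceedance(threshold):
    """P(X2 > t), the value the conditional probability would take under independence."""
    return 1.0 - NormalDist().cdf(threshold)


if __name__ == "__main__":
    for t in (1.0, 2.0, 2.5):
        print(t, conditional_exceedance(0.5, t), marginal_exceedance(t))
```

If the two variables were independent in the tail, the two printed probabilities would agree; at thresholds that crude simulation can reach, the conditional probability remains much larger.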

When considering rare event simulation, the speaker identified three factors that influence the relative error of our estimates: the dimension of the problem, the sample size, and the parameter that determines the rarity of the event of interest. He highlighted that it is particularly important that the relative error grows slowly with this parameter, and a good method will behave well with respect to at least two of these factors.
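For the simplest estimator, crude Monte Carlo, the effect of sample size and rarity on the relative error can be written down exactly: the proportion of n independent samples that hit an event of probability p has relative error sqrt((1 - p) / (n p)). The sketch below only illustrates that baseline; the function names are ours, and it says nothing about dimension or about the more refined estimators discussed in the talk.

```python
import math


def crude_mc_relative_error(p, n):
    """Relative error (standard deviation / mean) of the crude Monte Carlo
    estimator of an event probability p based on n independent samples."""
    return math.sqrt((1.0 - p) / (n * p))


def samples_needed(p, target_relative_error):
    """Smallest n for which crude Monte Carlo reaches the target relative error."""
    return math.ceil((1.0 - p) / (p * target_relative_error ** 2))


if __name__ == "__main__":
    for p in (1e-2, 1e-4, 1e-6):
        print(p, crude_mc_relative_error(p, 10**6), samples_needed(p, 0.1))
```

The required sample size grows like 1/p, which is why a method whose relative error grows only slowly with the rarity parameter is so valuable.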

However, this is not enough: not only do we require the estimate itself to be efficient, we also want the estimate of its variance to be efficient. An example was provided of a method that overestimates its own accuracy, a failure that should be avoided. For this, we need to consider second-order efficiency.

The speaker tied these ideas together with the example of the Bayesian LASSO. By leveraging the mathematical details of the model to identify an efficient posterior sampling method, the number of steps required to keep the second-order error low grows polynomially rather than exponentially as the event of interest becomes rarer.

Clearly the work behind this talk was impressive in its depth and technicality, but Dr Botev was able to make it accessible to the diverse audience and it was very interesting to get an insight into the complex problems (and solutions) in this area.

Mark Donoghoe
on behalf of the SSA NSW Branch Council

 

Quality metrics for ongoing sample data collections
Dr Margo Barr (Sax Institute), NSW branch meeting 20 June 2017

The June meeting of the SSA NSW branch saw a very eager audience gather at the University of Technology Sydney. At this meeting we heard from Dr Margo Barr, who is currently Study Director for the 45 and Up Study at the Sax Institute. We were very fortunate to be able to hear lessons learned from 30 years of experience across epidemiology, public health and survey methodology.

A focus was placed on quality metrics for survey collections through the lens of a total survey error framework. Margo presented a generalised approach to assist in making quality assessments between surveys differing in methodology, key domains and/or geography. An example included a comparison of public health studies between different states and territories in Australia.

Discussion then shifted towards industrialising the quality assurance mechanism in a setting where many complex weight adjustments are applied to unit record data. Margo’s team has made use of an end-to-end cycle from data acquisition (interviews) through cleansing, editing, weighting, estimation and finally reporting. Quality checks at each stage in the cycle ensure that sources of error are identified early and can be addressed.

The formal presentation was followed by a very engaging Q&A, in which Margo reflected on a variety of emerging considerations within the field of statistical collections and survey sampling. These included: haphazard samples derived from administrative by-products or ‘big data’ sources; difficulties in compiling population frames with diminishing usage of landline telephones (and increased online presence); and the benefits of ongoing collaboration with academia.

Finally, many attendees took Margo to dinner at a nearby restaurant — everyone was very well-behaved. On behalf of the Statistical Society of Australia’s NSW branch, I’d like to thank Dr Barr for her time and fascinating insights.

Ryan Defina
on behalf of the SSA NSW Branch Council

 

Generalised linear latent variable models
Dr Sara Taskinen (University of Jyväskylä), NSW branch meeting 12 July 2017

The speaker at our July meeting was Dr Sara Taskinen, visiting UNSW from the University of Jyväskylä, Finland. She spoke on the use of Generalised Linear Latent Variable Models (GLLVMs) for modelling ecological data. These models are appropriate when abundance data on different species at each site may be well-modelled using a Generalised Linear Model (GLM) but such data is available at a large number of different sites, and abundances are correlated across species.

A low-dimensional vector of latent (standard multivariate normal) random effects for each site is introduced into the linear predictor. The latent variables, as well as providing a mechanism for explaining the observed correlation structure in the data, provide an interesting dimension-reduction feature: instead of representing each site by its (possibly high-dimensional) vector of covariates, the lower-dimensional vector of “estimated” latent variables provides an easier-to-visualise representation of the sites.
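The role of the latent variables can be illustrated by simulating from such a model. The sketch below is a simplification made for illustration only: it assumes Poisson counts with a log link, drops the covariates, and uses made-up intercepts and loadings. It simulates data and does not fit the model, since fitting requires integrating over the latent variables.

```python
import math
import random


def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_gllvm(n_sites, intercepts, loadings, seed=0):
    """Simulate counts y[i][j] ~ Poisson(exp(intercepts[j] + z_i . loadings[j])),
    where z_i is a standard normal latent vector shared by all species at site i."""
    rng = random.Random(seed)
    n_latent = len(loadings[0])
    latent, counts = [], []
    for _ in range(n_sites):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_latent)]
        row = []
        for beta0, lam in zip(intercepts, loadings):
            eta = beta0 + sum(l * zk for l, zk in zip(lam, z))  # linear predictor
            row.append(sample_poisson(rng, math.exp(eta)))
        latent.append(z)
        counts.append(row)
    return latent, counts


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values: three species, two latent variables.
    intercepts = [1.0, 1.0, 1.0]
    loadings = [[0.8, 0.0], [0.7, 0.1], [-0.8, 0.0]]
    latent, counts = simulate_gllvm(1000, intercepts, loadings)
    species = list(zip(*counts))
    print(correlation(species[0], species[1]))  # similar loadings
    print(correlation(species[0], species[2]))  # opposite loadings
```

Species with similar loadings come out positively correlated across sites and species with opposite loadings negatively correlated, even though counts are conditionally independent given the latent vector. The two-dimensional latent vectors are what would be plotted to visualise the sites.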

Indeed, for the Finnish peatlands amoeba composition example, the estimated latent effects did a very good job of separating, in a two-dimensional scatterplot, the three types of peatland found: natural peatlands, peatlands used for forestry and “restored” peatlands that were formerly used for forestry.

The talk as a whole was a very good balance between a detailed discussion of the application and an in-depth, but accessible, explanation of the technical properties of the various methods involved in the work. In particular, Sara gave an excellent comparison of various computational strategies for dealing with the integrals one encounters when such latent variables are introduced. The usefulness of certain variational approximation methods, pioneered by local researchers, was highlighted.

Michael Stewart, Branch President
on behalf of the SSA NSW Branch Council

 

Automated vehicles, big data and road safety
Prof Ann Williamson (Transport and Road Safety Research Centre, UNSW), NSW branch meeting 23 August 2017

On 23 August 2017, the NSW Branch had the privilege of hosting Prof. Ann Williamson, Director of the Transport and Road Safety (TARS) Research Centre and Professor of Aviation Safety at UNSW Sydney, who gave a talk at the University of Sydney entitled “Automated vehicles, big data and road safety”.

With automated vehicles only a few years away and driver-assistance technologies already present in the most recent cars, many questions arise, and opinions often diverge on the benefits of automation. Ann generously shared some of her experiences and insights with the audience on the challenges that come with this type of innovation.

Some issues in human interaction with automation are well known: for example, the more advanced the control system is, the more crucial the operation required of the human becomes. Other issues originate in the automated driver-assist system itself, which can misinterpret some roadside furniture (some such situations have already been reported).

Altogether, many factors need to be understood, and evaluation studies of driver assistance are strongly needed. Ann listed the important issues on which such studies should provide insight if automation is to improve road safety. The increased amount of passive control required of the driver is challenging, and the question of who drives, and when, needs to be asked. Often a driver will have to regain full control of the vehicle when the situation is critical, which necessitates a quick reaction and strong driving skills as well as knowledge of the vehicle. The design of the vehicle (parabolic mirrors, A-pillar) and of its in-vehicle systems (warnings provided to the driver) is also critical.

To test new technologies, data are collected via naturalistic driving studies. In Australia, such a study is led by UNSW Sydney in collaboration with other Australian universities and several government and industry partners. The data collected cover about 360 drivers and amount to a few tens of terabytes. More information can be found at http://www.ands.unsw.edu.au.

Ann concluded her brilliant talk by saying that it is still too early to conclude that autonomous vehicles will improve safety. In her opinion, the focus should be on automating the driving task, not the vehicle, and drivers need to understand their role immediately or be able to operate the assistive technology.

Ann’s talk was very well attended, and the audience was very responsive; it finished with a discussion session.

Boris Beranger
on behalf of the SSA NSW Branch Council

 

Genevera Allen Public Lecture and Workshop
28 & 29 September 2017

The September meeting for the NSW Branch doubled as a public lecture by Genevera Allen of Rice University, Houston. Genevera, a former student of Rob Tibshirani, is an active researcher with a high public profile, assisting scientists from diverse fields in making sense of big data. She spoke about some of her recent work in the field of hierarchical clustering. In particular, some recent methods combine superior statistical performance, computational speed and interpretability (via a “dendrogram”). Furthermore, adjusting a certain tuning parameter gives a “solution path” which allows researchers to “watch” how their data form into clusters.

However, the best thing about the talk was Genevera’s use of examples. The centrepiece of these was an analysis of vocabulary from every US President’s inauguration speech, motivated in part by the claim made in certain quarters that President Trump’s inauguration speech was the “worst ever”. Unsurprisingly, the Presidents formed into quite well-defined clusters, mainly determined by time: the Founding Fathers formed a cluster, the civil war-era Presidents formed a cluster and in particular the post-war Presidents formed a cluster (in part due to the use of words such as “Soviet”, “nuclear”, etc.). President Trump did fall in the cluster of post-war Presidents, although when following the “solution path”, he was last to join!
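The idea of watching observations join clusters one at a time can be illustrated with classical agglomerative clustering. The sketch below uses plain single-linkage clustering on made-up one-dimensional data as a stand-in; it is not the method Genevera presented, and the numbers have nothing to do with the inauguration speeches.

```python
def agglomerate(points):
    """Single-linkage agglomerative clustering of 1-D points.

    Returns the merges in order as (distance, cluster_a, cluster_b), where each
    cluster is a list of indices into `points`. Read top to bottom, the list is
    the sequence of joins that a dendrogram would display.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


if __name__ == "__main__":
    # Hypothetical data: two tight groups and one outlying point (index 5).
    points = [1.0, 1.2, 1.1, 5.0, 5.3, 20.0]
    for distance, left, right in agglomerate(points):
        print(round(distance, 2), left, right)
```

In this toy data the outlying point is the last to join, in the same way that an unusual observation joins its cluster late along a solution path.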

The talk was a great success, prompting much discussion.

Genevera backed up early the following morning to conduct an Introduction to Unsupervised Learning workshop for the Society, which had over 30 attendees. We were very lucky to have Genevera contribute to the Society’s activities in this way.

Michael Stewart, Branch President
on behalf of the SSA NSW Branch Council
