Statistical Society of Australia - Is k-fold cross-validation worth teaching?

Is k-fold cross-validation worth teaching?

15 May 2026 9:22 AM

Reply # 13632013 on 13628615

Chris Lloyd

Patrick,

I had not heard Neyman’s quote about induction. I guess he and Popper would not have had a beer together!

I think my main problem is that CV, in the context I have described it, does not answer the question I care about. I think I can see what its use is now; see my response to John.

Interestingly, AI warned me off looic – not that I was considering it. It claimed that it increases the variance of the predictive accuracy estimate.

15 May 2026 9:21 AM

Reply # 13632012 on 13628615

Chris Lloyd

John,

Yes, I agree with you. Training + validation for selection of model complexity and then testing data to estimate final accuracy. That is exactly what I have taught in the past.

After chatting with colleagues, it seems the best use of CV is to select a model from the training data, for instance the complexity of a tree. So instead of training + validation you use CV on a larger training set. But once you select the tree, you need testing data to evaluate its final accuracy.
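A minimal sketch of that workflow, under stated assumptions: the data are simulated, a k-nearest-neighbour regressor stands in for the tree (its neighbourhood size plays the role of tree complexity), and all function names are hypothetical. CV on the training set picks the complexity; the held-out test set is touched once, at the end, to estimate the accuracy of the model actually selected.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the y-values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, held_out, k):
    """Mean squared error on held_out of a k-NN model fitted to train."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in held_out) / len(held_out)

def cv_mse(data, k, n_folds=5):
    """n_folds-fold CV estimate of the MSE for neighbourhood size k."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(mse(train, held_out, k))
    return sum(scores) / n_folds

rng = random.Random(0)

def simulate(n):
    """Simulated data (an assumption): y = sin(x) + Gaussian noise."""
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]

train_data = simulate(200)   # used for model selection via CV
test_data = simulate(100)    # set aside; used once for the final estimate

candidates = [1, 3, 5, 10, 20, 40]
cv_scores = {k: cv_mse(train_data, k) for k in candidates}
best_k = min(cv_scores, key=cv_scores.get)

# Final model: fitted to all training data with the selected complexity,
# then scored on data it has never seen.
test_mse = mse(train_data, test_data, best_k)
print("selected k:", best_k)
print("CV MSE at selected k:", round(cv_scores[best_k], 4))
print("test MSE of final model:", round(test_mse, 4))
```

The CV score at the selected setting is the minimum over several candidates, so it tends to be optimistic; the test-set figure is the one that describes the model you will actually use.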

15 May 2026 9:20 AM

Reply # 13632011 on 13628615

Chris Lloyd

Thank you all for your interesting replies. I will respond in order, in separate posts.

Andrew,

I think that many questions can be answered from a frequentist perspective. What is the coverage of this confidence interval? What is the standard error of this estimate? There are Bayesian answers to these questions, but you can get a sensible and relevant answer without the need to invoke priors.

It seems to me that CV does answer a frequentist question, but it is not a question I want to ask. I want to know the out-of-sample predictive accuracy of this model, because I am going to use this model in future.

A proper conditional frequentist answer would be conditional on the model but no other aspects of the data. If you do not care about model mis-specification then the bootstrap can give a good approximation to this. But we should care about model mis-specification. Bayesians do this automatically, because they condition on the entire data, which includes the selected model that it resulted in.

So, if you want to know out-of-sample accuracy, I think you need some out-of-sample data. Just no way out of it.

14 May 2026 10:52 PM

Reply # 13631725 on 13628615

Patrick Graham

Hi Chris,

That is an interesting post. I enjoy your posts and musings. Please keep them coming.

You may well be becoming more Bayesian - you certainly seem to be having a few doubts about an important piece of frequentist logic. What seems to be troubling you is the leap required to apply the results of a CV exercise, which evaluates your modelling strategy applied to a set of subsets of your data, to your actual fitted model, i.e. your modelling strategy fitted to the full dataset. This seems to require a principle similar to Neyman's idea of inductive behaviour (applied to confidence intervals this is roughly "if we know that under repeated sampling 95% of intervals obtained using our method for constructing 95% confidence intervals contain the true value, we may as well act as though the one 95% confidence interval we get to compute from the data we actually observe includes the true parameter value"). If the leap from average performance of a modelling strategy in a CV exercise to the performance of the actual fitted model is problematic, then it seems so is the leap from the average performance of interval estimators in hypothetical repeated sampling to the properties of the one interval that can be computed from the actual observed data. Hence your internal conflict - perhaps. Or is there a difference?
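The repeated-sampling claim behind that principle can be checked by simulation. The sketch below is an illustration only: it assumes normal data with a known standard deviation, and the true mean, sample size and function name are arbitrary choices made for the example. It constructs the usual 95% interval for the mean many times and counts how often the interval contains the true value.

```python
import random
import statistics

def interval_covers(rng, true_mean=10.0, sigma=2.0, n=30):
    """Draw one sample, build a 95% z-interval for the mean (sigma known),
    and report whether the interval contains the true mean."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    centre = statistics.fmean(sample)
    half_width = 1.96 * sigma / n ** 0.5
    return centre - half_width <= true_mean <= centre + half_width

rng = random.Random(1)
n_reps = 10_000
coverage = sum(interval_covers(rng) for _ in range(n_reps)) / n_reps
print("fraction of intervals containing the true mean:", coverage)
```

The fraction should land close to 0.95. That is a statement about the procedure across repetitions; any single interval either contains the true mean or does not, which is the same gap as the one between a CV estimate and the one fitted model.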

An interesting wrinkle in this discussion is the use of leave-one-out cross-validation in the Bayesian world to approximate the expected log predictive density, as implemented in the loo package (which reports looic) and in various papers by Vehtari and Gelman et al. You could write a paper for Bayesian Analysis criticising looic for not being Bayesian enough!

On the other hand, life is hard. If there is no test dataset available to evaluate models, are we better to stay with measures like WAIC that use all the data and try to adjust approximately for model complexity?

Patrick

12 May 2026 7:06 AM

Reply # 13630614 on 13628615

John Maindonald

The benefits of k-fold cross-validation are surely 1) that it can be used to account for the selection effects of the strategy used to select the model and 2) that it uses all the data. Obviously, it does not account for any difference between the data used to train the model and the data to which it will be applied. For that a training/test approach is needed, or if one is being very careful, a training/validate/test approach. These sorts of approaches are crucial if AI is used in any substantial way, e.g., to select a model.

11 May 2026 5:16 PM

Reply # 13630323 on 13628615

Andrew Robinson

Hi Chris,

That's a characteristically interesting question. I think that your claim is correct: the leave-k-out cross-validation estimates the uncertainty that arises from modeling strategy.

But I don't see how you, as an avowed hard-core Frequentist, could fail to find that attractive! I've always been attracted to the idea of paying an honest uncertainty price for an honest modeling effort. I recall that Paul Kabaila has done considerable elegant work on confidence intervals with coverage that allows for model selection.

I think it's also an appealing way to get the students to think about what they are doing.

What would you do instead, if anything?

Andrew

6 May 2026 5:23 PM

Message # 13628615

Chris Lloyd

I have been interrogating AI about k-fold cross-validation for a course I am writing. I thought I understood it before, but I understand it better now. AI is really brilliant for this kind of thing. It is like talking to an enthusiastic RA who is smart and knowledgeable, makes some mistakes, but always sucks up to you because you hold the research grant. ;)

Anyway, it seems to me that cross-validation is b-s, at least in any application I can think of. So I am intending not to teach it, or at least give it a couple of slides and say not to bother.

Hear me out.

CV gives you the accuracy of the modelling strategy, not the model you actually use. It estimates accuracy averaged over different possible models from different random subsets of the data. And then at the end, you apply the modelling strategy to all the data and say that the CV estimate describes it. But it doesn’t describe the model; it describes the modelling strategy.
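To make the distinction concrete, here is a small sketch on simulated data, assuming a straight-line least-squares fit as the modelling strategy (the data, the noise level and the function names are all made up for the illustration). Each fold produces its own fitted line, the CV estimate averages the errors of those five lines, and the model that actually gets used is a sixth line that was never scored.

```python
import random

def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mse(model, points):
    """Mean squared prediction error of a fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in points) / len(points)

# Simulated data (an assumption): y = 1 + 2x + standard normal noise.
rng = random.Random(42)
data = []
for _ in range(50):
    x = rng.uniform(0, 5)
    data.append((x, 1.0 + 2.0 * x + rng.gauss(0, 1)))

n_folds = 5
folds = [data[i::n_folds] for i in range(n_folds)]
fold_models, fold_errors = [], []
for i, held_out in enumerate(folds):
    train = [p for j, fold in enumerate(folds) if j != i for p in fold]
    model = fit_line(train)              # a different fitted model per fold
    fold_models.append(model)
    fold_errors.append(mse(model, held_out))

cv_estimate = sum(fold_errors) / n_folds  # average over the five fold models
final_model = fit_line(data)              # the model actually used; never scored

for i, (a, b) in enumerate(fold_models):
    print(f"fold {i}: intercept={a:.3f} slope={b:.3f} held-out MSE={fold_errors[i]:.3f}")
print(f"CV estimate of MSE: {cv_estimate:.3f}")
print(f"final model: intercept={final_model[0]:.3f} slope={final_model[1]:.3f}")
```

With a stable strategy such as this one the fold models sit close to the final model, so the CV figure is a reasonable stand-in. The gap between "strategy" and "model" matters more when the strategy includes unstable steps such as variable selection or tree growing.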

Now I am a pretty hard core frequentist. But in this case, I want to estimate future prediction accuracy conditional on all the (training) data that led to the model. I do not want to average over models that might have been!

It's like estimating my life outcome averaging over different decisions I could have made over the last 68 years.

That’s why Kaggle just uses partition – leaving some data out and judging the winner based on this test data.

Help me out here! Have I completely missed the point? Am I becoming a closet Bayesian in my old age?!

It could happen! Fred Hoyle became a Catholic after all...
