

Variation Explained

  • 12 Aug 2021 3:56 PM
    Reply # 10926393 on 10781234

    Hi Chris,

    To be clear: I advocate reporting and discussing both SD_y and SD_y|x, rather than their ratio (or 1 minus the ratio). In many of my applications, these quantities have units that stakeholders can interpret directly, so I find converting the model fit statistic into a unitless metric to be a backward step.
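    If it helps to make this concrete, here is a minimal sketch in Python with numpy (the data are simulated and the numbers purely illustrative; think of the response as measured in tonnes, say):

        import numpy as np

        rng = np.random.default_rng(1)
        x = rng.uniform(0, 10, 200)
        y = 3.0 + 2.0 * x + rng.normal(0, 4.0, 200)  # response in tonnes, say

        b1, b0 = np.polyfit(x, y, 1)                 # simple OLS fit
        resid = y - (b0 + b1 * x)

        sd_y  = np.std(y, ddof=1)                    # SD of the raw response
        sd_yx = np.std(resid, ddof=2)                # residual SD; two parameters fitted

        print(f"SD_y   = {sd_y:.2f} tonnes")         # spread before conditioning on x
        print(f"SD_y|x = {sd_yx:.2f} tonnes")        # spread after conditioning on x

    Both quantities are in tonnes, so a stakeholder can read off directly how much the model shrinks the spread, with no unitless ratio in sight.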

    Cheers,

    Andrew

  • 11 Aug 2021 8:48 AM
    Reply # 10923526 on 10781234

    Thanks Andrew. Your measure sd(y|x)/sd(y) (or 1 minus that) is what I was suggesting. The reason for my post is that I co-wrote an Excel package, and I am planning to change its standard summary of model accuracy to (a) adjusted r, not squared, (b) sd(y|x), and (c) 1 - sd(y|x)/sd(y).
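    In code, the summary I have in mind looks something like the sketch below (Python rather than Excel for brevity; the function name is mine, and I am reading "adjusted r, not squared" as the square root of the adjusted r^2):

        import numpy as np

        def accuracy_summary(y, fitted, n_params):
            """Proposed summary: adjusted r, sd(y|x), and 1 - sd(y|x)/sd(y)."""
            y, fitted = np.asarray(y), np.asarray(fitted)
            n = len(y)
            resid = y - fitted
            sd_y  = np.std(y, ddof=1)
            sd_yx = np.std(resid, ddof=n_params)              # residual SD
            r2 = 1 - np.mean(resid**2) / np.var(y)            # training-data r^2
            adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_params)
            return {"adjusted r": np.sqrt(max(adj_r2, 0.0)),
                    "sd(y|x)": sd_yx,
                    "1 - sd(y|x)/sd(y)": 1 - sd_yx / sd_y}

        # e.g. for a straight-line fit: accuracy_summary(y, intercept + slope * x, n_params=2)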

    Another point of confusion for students is that for non-regression models the standard relation does NOT hold: the squared correlation between fitted and observed values no longer equals 1 - MSE/VAR(Y). Yet standard packages report the r^2 statistic calculated as 1 - MSE/VAR(Y). I find that good students calculate the correlation between fitted and observed values and get a different answer. I then have to explain to them that the software calculates the statistic using an alternative formula that is known to be wrong for the non-regression model!
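    Here is the discrepancy in miniature, using least squares through the origin as a stand-in for any model where the usual identity breaks (Python, simulated data):

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.uniform(1, 10, 100)
        y = 5.0 + 2.0 * x + rng.normal(0, 2.0, 100)

        b = (x @ y) / (x @ x)                           # least squares through the origin
        fitted = b * x
        resid = y - fitted

        r2_package = 1 - np.mean(resid**2) / np.var(y)  # what the software reports
        r2_student = np.corrcoef(fitted, y)[0, 1]**2    # what the student computes

        print(r2_package, r2_student)                   # noticeably different numbers

    For an ordinary regression with an intercept the two calculations agree; here they do not, and the student is right to be puzzled.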

    It makes our profession look stupid and quixotic. At the very least, we could label the statistic "variation explained" rather than r^2.

  • 10 Aug 2021 6:54 AM
    Reply # 10920812 on 10905377

    Andrew Robinson wrote (9 Aug 2021, quoted in full below):

    Well said. Too often, analysis proceeds with what is mathematically elegant rather than what is useful.
  • 9 Aug 2021 4:36 PM
    Reply # 10905377 on 10781234

    Hi Chris,

    I imagine that this practice comes from the irrational passion for the additivity of the sums of squares associated with model components, and its triumphant but irrelevant invocation of Pythagoras (the orthogonal decomposition SST = SSR + SSE).

    R^2 is a sad example of preferring the elegant answer to the wrong question over the messy answer to the right question. I have found it more useful to report and compare sd_y and sd_y|x.

    In its sacrifice of interpretability in favour of geometry, R^2 is little more than theory virtue-signalling, not unlike the worship of the canonical link (or, indeed, the conjugate prior): canonised for algebraic elegance at the expense of statistical utility.

    Warm wishes,

    Andrew 

  • 29 Jul 2021 2:09 PM
    Message # 10781234

    We often say that r^2 measures variation explained. This is because for regression (on training data, at least) MSE is equal to VAR(Y) multiplied by (1 - r^2). But variance does not measure variation: its units are the square of the units of Y, which makes it pretty meaningless on its own. It is the standard deviation that measures variation.

    Why do we not use 1 - RMSE/STDEV(Y) as variation explained*?

    *which equals 1 - sqrt(1 - r^2), if you really insist on connecting the two.
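    The algebra is easy to check numerically. A quick sanity check in Python, with simulated data and 1/n conventions throughout so that MSE and VAR(Y) are on the same footing:

        import numpy as np

        rng = np.random.default_rng(2)
        x = rng.normal(size=500)
        y = 1.0 + 0.8 * x + rng.normal(size=500)

        b1, b0 = np.polyfit(x, y, 1)                   # OLS on the training data
        resid = y - (b0 + b1 * x)

        mse   = np.mean(resid**2)                      # MSE, 1/n convention
        var_y = np.var(y)                              # VAR(Y), 1/n convention
        r2    = np.corrcoef(x, y)[0, 1]**2

        print(np.isclose(mse, var_y * (1 - r2)))       # MSE = VAR(Y) * (1 - r^2)
        print(np.isclose(1 - np.sqrt(mse / var_y),     # 1 - RMSE/STDEV(Y) ...
                         1 - np.sqrt(1 - r2)))         # ... = 1 - sqrt(1 - r^2)

    Both lines print True.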
