Lewandowsky’s Fake Correlation
Lewandowsky’s most recent blog post really makes one wonder about the qualifications at the University of Western Australia.
Lewandowsky commenced his post as follows:
The science of statistics is all about differentiating signal from noise. This exercise is far from trivial: Although there is enough computing power in today’s laptops to churn out very sophisticated analyses, it is easily overlooked that data analysis is also a cognitive activity.
Numerical skills alone are often insufficient to understand a data set—indeed, number-crunching ability that’s unaccompanied by informed judgment can often do more harm than good.
This fact frequently becomes apparent in the climate arena, where the ability to use pivot tables in Excel or to do a simple linear regressions is often over-interpreted as deep statistical competence.
I mostly agree with this part of Lewandowsky’s comment, though I would not characterize statistics as merely “differentiating signal from noise”. In respect to his comment about regarding the ability to do a linear regression as deep competence, I presume that he was thinking here of his cousin institute, the University of East Anglia (UEA), where, in a Climategate email, Phil Jones was baffled as to how to calculate a linear trend on his own – with or without Excel. At Phil Jones’ UEA, someone who could carry out a linear regression must have seemed like a deity. Perhaps the situation is similar at Lewandowsky’s UWA. However, this is obviously not the case at Climate Audit, where many readers are accomplished and professional statisticians.
Actually, I’d be inclined to take Lewandowsky’s comment even further – adding that the ability to insert data into canned factor analysis or SEM algorithms (without understanding the mathematics of the underlying programs) is often “over-interpreted as deep statistical competence” – here Lewandowsky should look in the mirror.
Lewandowsky continued:
Two related problems and misconceptions appear to be pervasive: first, blog analysts have failed to differentiate between signal and noise, and second, no one who has toyed with our data has thus far exhibited any knowledge of the crucial notion of a latent construct or latent variable.
In today’s post, I’m going to comment on Lewandowsky’s first claim, while disputing his second claim. (Principal components, a frequent topic at this blog, are a form of latent variable analysis. Factor analysis is a somewhat different but related algorithm. Anyone familiar with principal components – as many CA readers are by now – can readily grasp the style of algorithm, though not necessarily sharing Lewandowsky’s apparent reification.)
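For readers who have not met principal components as a latent variable method, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the four survey items are simulated as one hidden score plus independent noise, and the first principal component is extracted by plain power iteration on the sample covariance matrix. It is not Lewandowsky's analysis and uses none of his data.

```python
import random

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(k)]
            for i in range(k)]

def first_principal_component(cov, iterations=200):
    """Leading eigenvector of cov by power iteration, scaled to unit length."""
    k = len(cov)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Simulated data (an assumption, not survey responses): 500 respondents,
# 4 items, each item equal to one shared latent score plus its own noise.
random.seed(0)
rows = []
for _ in range(500):
    latent = random.gauss(0, 1)
    rows.append([latent + random.gauss(0, 0.5) for _ in range(4)])

cov = covariance_matrix(rows)
loadings = first_principal_component(cov)

# Share of total variance carried by the first component:
# its eigenvalue (v' C v) divided by the trace of the covariance matrix.
eigenvalue = sum(loadings[i] * cov[i][j] * loadings[j]
                 for i in range(4) for j in range(4))
share = eigenvalue / sum(cov[i][i] for i in range(4))

print("loadings:", [round(x, 2) for x in loadings])
print("variance share of PC1:", round(share, 2))
```

Because every simulated item is driven by the same hidden score, the four loadings come out roughly equal and the first component carries most of the variance. Whether such a component corresponds to anything real in an actual survey is a separate question, which is where the reification concern comes in.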
In respect to “signal vs noise”, Lewandowsky continued:
We use the item in our title, viz. that NASA faked the moon landing, for illustration. Several commentators have argued that the title was misleading because if one only considers level X of climate “skepticism” and level Y of moon endorsement, then there were none or only very few data points in that cell in the Excel spreadsheet.
Perhaps.
But that is drilling into the noise and ignoring the signal.
The signal turns out to be there and it is quite unambiguous: computing a Pearson correlation across all data points between the moon-landing item and HIV denial reveals a correlation of -.25. Likewise, for lung cancer, the correlation is -.23. Both are highly significant at p < .0000…0001 (the exact value is 10⁻¹⁶, which is another way of saying that the probability of those correlations arising by chance is infinitesimally small).
These paragraphs are about as wrongheaded as anything you’ll ever read.
I agree that a simple “Pearson correlation” between CYMoon and CauseHIV in Lewandowsky’s dataset is -0.25. However, Lewandowsky is COMPLETELY wrong in his suggestion that this “signal” can be separated from outliers. In the Lewandowsky dataset, there were two respondents who purported to believe in CYMoon and disagree with CauseHIV (both were in Tom Curtis’ group of two super-scammers). I’ll show that these two super-scammers make major contributions to the supposed “correlation”. Like Lewandowsky, I don’t believe that these two respondents are present “by chance”: I believe that they are present as intentionally fraudulent responses.
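The general point, that a couple of extreme responses can manufacture a Pearson correlation when nearly everyone else sits in one corner of the scale, can be sketched with a toy example in Python. The response counts below are invented purely for illustration and are NOT the survey data; the 1-4 agreement scale and the variable names are likewise assumptions.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def correlation(pairs):
    """Correlation between the first and second entries of (moon, hiv) pairs."""
    moon = [p[0] for p in pairs]
    hiv = [p[1] for p in pairs]
    return pearson(moon, hiv)

# Hypothetical (moon, hiv) answers on a 1-4 scale, 4 = strongly agree.
# Almost everyone rejects the moon hoax and accepts that HIV causes AIDS.
bulk = [(1, 4)] * 900 + [(1, 3)] * 90 + [(2, 4)] * 8
# Two invented respondents who endorse the hoax and reject HIV causation.
outliers = [(4, 1)] * 2

r_all = correlation(bulk + outliers)
r_trimmed = correlation(bulk)

print("with the two outliers:   ", round(r_all, 3))
print("without the two outliers:", round(r_trimmed, 3))
```

In this made-up sample the correlation is clearly negative with the two extreme respondents included and close to zero without them, even though they are 2 out of 1,000 rows. A p-value computed on the full sample says nothing about whether those two rows are genuine.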

Comments
If you have to try that hard, maybe it just ain't so... or would that be expecting too much?