More maths - Archaeology Blogs

Last time I finished with this matrix of scatter-plots, ordered by the magnitude of correlation. But what does it actually mean? Let’s take a step back and look at those derived variables. I ask R to describe the table of variables that I created previously, which include the notional ludic.interest variable and the Hard, Serious, Easy and People fun preference variables. Handily, R added these as extra columns on the end of the table of original data, so I ask it to describe just those columns:

> describe(newdata[90:94])

This gives me a little table describing the variables. It’s where the mean values I quoted last week came from. Looking at it again this week, it’s interesting to note the ranges of some of the scores, but the first thing I notice is that the Standard Deviation (SD) of the ludic.interest variable is noticeably lower than those of the fun preference variables. Those range between 15.31 for the Hard fun variable and 16.75 for the Serious fun variable, while the SD of ludic.interest is 11 (actually 0.11, but remember that the fun preference variables are scored 0-100 and ludic.interest 0-1). The range of scores for ludic.interest is tighter too:

VARIABLE	RANGE
ludic.interest	51
H	66
E	76
S	89
P	76
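For anyone who wants to reproduce this sort of comparison, here is a minimal sketch in base R. The data frame `toy` and every value in it are invented purely for illustration (they are not the survey responses); the point is only that ludic.interest needs multiplying by 100 before its spread can be compared with the 0-100 fun preference scores.

```r
# Toy data only: these values are invented for illustration, not survey results.
toy <- data.frame(
  ludic.interest = c(0.30, 0.45, 0.60, 0.75),  # scored 0-1
  H = c(40, 55, 70, 85),                       # scored 0-100
  S = c(10, 35, 60, 90)                        # scored 0-100
)

# Put ludic.interest on the same 0-100 scale before comparing spread.
toy$ludic.interest <- toy$ludic.interest * 100

# Standard deviation and range (maximum minus minimum) for each column.
spread <- data.frame(
  SD    = sapply(toy, sd),
  RANGE = sapply(toy, function(x) diff(range(x)))
)
print(spread)
```

Here `diff(range(x))` is simply the highest score minus the lowest, which is what the RANGE column above reports.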

The Serious fun preference questions thus showed the most division among gamers. What’s particularly interesting is that the lowest score in that range is zero, so at least one respondent vehemently disagreed with all the statements associated with that preference. The same is true of the People fun variable.

That matrix at the top of the post suggests that despite (or because of?) the wide range of the Serious fun variable, it’s the one that shows some correlation with all the other variables. Stronger correlation, in fact, than the People fun variable, which correlates poorly with all the variables except Serious fun.

Let’s look at that in more detail. The Serious fun variable correlates most with the Easy fun variable, with a correlation coefficient (r) of 0.52. Plot the two variables with a regression line and it looks like this:

Not a bad, shall we say “moderate”, relationship. For every point up the Easy fun preference scale somebody scores, they score on average 0.54 higher on the Serious fun scale. With a standard error of 0.09, the t value for this relationship is 5.9, and the corresponding p value is very low at 0.00000006. So this appears to be a statistically significant relationship.
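The slope, standard error, t value and p value all come out of one fitted model, and the t value is just the slope divided by its standard error. Below is a rough sketch of how that could be done in base R. It runs on synthetic data generated on the spot (the sample size, intercept and noise level are all arbitrary assumptions of mine), so its numbers will not match the ones quoted above.

```r
# Synthetic data for illustration only; not the survey responses.
set.seed(1)
n <- 100                                # assumed sample size, chosen arbitrarily
E <- runif(n, 0, 100)                   # stand-in for Easy fun scores
S <- 20 + 0.5 * E + rnorm(n, sd = 15)   # stand-in for Serious fun scores

r   <- cor(E, S)                # Pearson correlation coefficient
fit <- lm(S ~ E)                # regress Serious fun on Easy fun
est <- coef(summary(fit))       # estimate, standard error, t value, p value

slope <- est["E", "Estimate"]
se    <- est["E", "Std. Error"]
t_val <- slope / se                                   # t is the slope over its SE
p_val <- 2 * pt(-abs(t_val), df = df.residual(fit))   # two-sided p value

# Scatter-plot of the two variables with the fitted regression line.
plot(E, S, xlab = "Easy fun", ylab = "Serious fun")
abline(fit)
```

Using the rounded figures quoted above, 0.54 / 0.09 gives 6, close to the reported 5.9; the small gap is presumably down to rounding of the slope and standard error.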

(You can see that respondent who disagreed with all the Serious fun statements in the bottom left. They weren’t that keen on Easy fun either, but at least scored 22 for that. Looking at the table of data I find it’s the same respondent who also disagreed with all the People fun statements, and scored 43.5 for Hard fun and 33 (0.33) for ludic.interest.)

Let’s compare that with the plot for the relationship between preferences for Hard fun and People fun, where the correlation coefficient is just 0.02:

Hardly any relationship at all then.
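To put a number on “hardly any”, base R’s `cor.test` reports the correlation coefficient together with a p value. The sketch below uses two synthetic variables generated independently of each other (again, not the survey data, and the sample size is an arbitrary assumption), so a correlation near zero is what we would expect to see.

```r
# Synthetic, independently generated stand-ins; not the survey responses.
set.seed(2)
H <- runif(100, 0, 100)    # stand-in for Hard fun scores
P <- runif(100, 0, 100)    # stand-in for People fun scores

res <- cor.test(H, P)      # Pearson correlation with a significance test
print(res$estimate)        # the correlation coefficient r
print(res$p.value)         # a large p value means no evidence of a relationship
```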