tutorial outline


"there are three kinds of lies: lies, damned lies,
and statistics" - Disraeli

Many people working in CSCW have had some formal statistics training or have used statistics in practice. They may perfom experiments or gather data themselves, read or review other's work or teach others about CSCW. This tutorial aims to complement their experience, concentrating instead on the understanding of randomness and common statitstical concepts.


It's easy to forget just how random the world is. We know that things won't come out exactly at their average value, but think they won't be far out. Even with a long standing knowledge of random phenomena, I still get surprised sometimes at how far from uniform things are. In this first part of the tutorial, we'll try some experiments to see random phenomena at - raindrops falling on the plains of Gheisra and coin tossing races.

We are very good at finding patterns in data ... so good, we even see patterns when none are there. Often experimental results are misinterpreted because randomly occurring patterns are regarded as indications of real underlying phenomena.

Over uniform results have usually not occurred by chance, but instead because of some systematic effect or human intervention. Statisticians have re-analysed Mendel's results which established genetic inheritance and also the experiment which established the fixed electron charge. In both cases the results were too good to be true. A systematic process had been at work - the experimenters had discarded those results which disagreed with their hypothesis. In fact, the results they discarded would have been simply the results of randomness making some experiments run counter to the general trend. This is quite normal and to be expected. So, don't try to fiddle your results - you will be found out!

pdf slides (128K)

finding things out

Randomness causes us two problems. First, as we discussed in the last section, we may see patterns that aren't there. But also, if there are patterns in the world, we may fail to see them. The job of statistics is to help us see through this randomness to the patterns that are really in the world

The primary way this is done in statistics is to use large numbers of things (or experimental trials) and different forms of averaging. As one deals with more and more items the randomness of each one tends to cancel out with the randomness of others, thus reducing the variability of the average.

The advantages of averaging are based on the cancelling out of randomness, but this only works if each thing is independent of the others. This condition of independence is central to much of statistics and we'll see what happens when it is violated in different ways.

Averaging on its own is not sufficient. We have to know what is the right kind of averaging for a particular problem. Then, having found patterns in the data, we need to be sure whether these are patterns in the real world, or simply the results of random occurrence. We'll look at these issues in more details in the rest of the tutorial.

pdf slides (129K)

measures of average and variation

Even for straightforward data there are several different common forms of averaging used. Indeed, when you read the word 'average' in a newspaper (e.g. average income) this is as likely to refer to the median of the data as the mean. Actually, the reason the arithmetic mean is so heavily used is due as much to its theoretical and practical tractability as its felicity!

Statistics does a strange sort of backwards reasoning, we look at data derived from the real world, then try and extract patterns from the data in order to work out what the real world is like. In the case of means, we hope that the mean of our collected data is sufficiently close to the 'real' mean to be a useful estimate. We know that bigger samples tend to give better estimates, but in what sense 'better'.

In statistics, better usually means 'less variation'. Again there is no single best measure 'variation', but the most common solutions are the inter-quartile range, the variance and standard deviation (). Of these the first is useful, but not very tractable, the second is very tractable (you can basically add up variances), but is hard to interpret, and the last both reasonably tractable and reasonably comprehensible - that's why is normally quoted!

We'll have a look at the square root rule for how averages get 'better' as estimates and also at the problem of how to estimate variation - a different sort of averaging.

pdf slides (115K)

proving things

Many reported experiments in HCI and CSCW journals (and other disciplines) end up with a statistical significance test at 5%: if it passes the result is 'proved', if it fails - well ... we'll come back to that!

Proof in statistics is about induction: reasoning from effects back to causes. In logic this is the source of many fallacies, but is essential in real life. The best one can say in a statistical proof is that what we see is unlikely to have happened by chance. Although you can never be entirely certain of anything, you can at least know how likely you are to be wrong. A 5% significance means that you are wrong one time in twenty - good enough?

Significance tests only tell you whether things are different. They don't tell you whether the difference is important or not. Some experiments may reveal a very slight difference others may have such high variability that even a huge difference would not be statistically significant. Understanding the relationship between variability, real underlying differences and statistical significance is crucial to both understanding and designing experiments.

Disturbingly often academic papers in HCI and CSCW use a lack of significance to imply that there is no underlying effect. In fact, you can never (statistically) prove that things are identical. Statistical insignificance does NOT prove equality!! The proper way to deal with equality is a confidence interval which puts bounds on how different things are.

pdf slides (143K)

design and test

There are two enemies to statistical proof. First is variability - the results may be lost in the randomness. The second is aliasing - the results you measure (and check to be statistically significant) are actually due to some other cause.

The first problem, variability, is to some extent intrinsic and in the final analysis can only be dealt with by increasing the number and size of experiments as we have discussed in previous parts. However, some of the variability may be due to factors not intrinsic to the thing being measured: in HCI and CSCW experiments principally differences between people. If such factors are randomly allocated, they may not affect the overall result, but will certainly increase the effective variability.

The second problem, aliasing, is even worse. These additional factors may give rise to spurious results if, for example, all the most expert users try out one design of software.

Careful experimental design can fix, randomise or cancel out these additional factors. Hence reducing the likelihood of aliasing and making it more likely that real differences will show up as statistically significant.

Finally, having run an experiment, if you then use the wrong statistical test, then at best real differences may be missed or at worst apparently significant results may in fact be spurious.

pdf slides (57K)

experiments in HCI and CSCW

Experiments in HCI and CSCW involve that most variable of all phenomena, people. The great danger in any experiment in HCI and CSCW is that the results are analysed at the end and no statistical conclusion can be drawn. We'll discuss how to avoid this disaster situation. This influences the choice of what to measure as well as the way experiments and constructed and analysed. Furthermore, it is often impossible to use sufficient subjects to obtain statistical results. We'll discuss how to combine different kinds of experimental data - quantitative, qualitative and anecdotal - in order to make sure that even small experiments have a useful (and publishable!) output.

pdf slides (27K)

tutorial outline

tutorial notes

7 common errors

book list

rainfall in Gheisra

two-horse races

more coin tossing!

HyperCard demo

other tutorials by Alan

about the stats tutorial

links elsewhere



Alan Dix 23/4/2001