The average Australian household is worth $1 million. The number of crimes recorded in Australia has reached a record high. Polls predict a comfortable Labor victory.
Headlines such as these are not wrong but … drill into the information behind them and it becomes clear they're not quite what they seem.
Average does not necessarily mean typical, for instance, and the number of crimes may well have risen in line with population growth, as you'd expect. And political polls? Well, their fallibility is well documented.
What these headlines have in common is that they seek to tell stories based on statistics, which are drawn from data. Yes, data, that buzziest of buzzwords in an online age where it seems everything we do produces information.
Being able to interpret data and statistics is a handy skill for anyone who wants to be well-informed. Most of us weigh up information every day and think about statistics a lot more than we probably give ourselves credit for.
Here are some common confusions and pitfalls in the way statistics are expressed – as well as some things to keep in mind whenever you stumble on a headline that causes you to raise an eyebrow.
Mean or median?
“Australians, on average, are now millionaires.”
Sounds good, doesn’t it?
But “on average” is a phrase ripe for misuse, especially if it’s not qualified by one of these two words: mean or median.
The headline above came from Australian Bureau of Statistics research into economic indicators of wealth. There are several different types of indicators and they lead to quite different results. To calculate the mean, the bureau pooled the net worth of Australian households and divided the result by the number of households counted – and came up with $1,022,200.
Household net worth includes the total amount of property, assets, shares and savings owned by all the people who live under one roof. It’s not right, though, to call a household net worth of $1.02 million typical.
Take, say, five households of varying wealth all living in a cul-de-sac, including one that’s worth heaps more than the rest because it's headed up by a multi-billionaire.
To calculate the mean household worth in this cul-de-sac, pool the net worths and distribute the total evenly among the total number of households, as the ABS did with its calculation.
Now, the average net worth of all five households in our example surges to $3 billion. In other words, one rich person’s presence makes a billionaires’ row out of an otherwise humble cul-de-sac – on paper. But the typical household in that cul-de-sac is not worth $3 billion.
So talking about the mean doesn’t help much when there are big differences in what is being compared – four modest households plus a fabulously wealthy one. Talking about the mean does help if we are referring to mean life expectancy or the mean weight of Australians because the variations are less drastic – you can't be 200 years old or weigh 10 tonnes.
So how would you come up with the typical net worth of an Australian household?
Look at the median*. The median is another economic indicator of wealth, which also appears in the bureau's data tables. It's what you get if you take all the things you've counted and put them in a line from smallest to biggest. The one at the exact midpoint is the median value.
So, while the mean household net worth in Australia in 2017-18 was $1,022,200, the median household net worth was $558,900.
That means that half of Australian households are worth more than $558,900 and the other half are worth less.
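The cul-de-sac example can be sketched in a few lines of Python. The five net worths below are invented for illustration (the article does not list them); they are chosen only so that the mean comes out at $3 billion, as in the example above.

```python
from statistics import mean, median

# Hypothetical net worths for the five cul-de-sac households (assumed values,
# picked so that the mean works out to $3 billion).
net_worths = [400_000, 550_000, 600_000, 750_000, 14_997_700_000]

mean_worth = mean(net_worths)      # pool everything, then share it out evenly
median_worth = median(net_worths)  # line them up and take the middle one

print(f"Mean:   ${mean_worth:,.0f}")
print(f"Median: ${median_worth:,.0f}")
```

With these made-up figures the single very wealthy household drags the mean into the billions, while the median stays at the value of the middle household.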
As for that headline? If you dig into the data further, it turns out that about 30 per cent of Australian households have a net worth of more than $1 million. But “roughly a third of Australians are now millionaires” wouldn’t have quite the same ring to it.
Bottom line: when we tell stories based on numbers, we can choose our own adventure depending on which measure we look at. Often, taking several measures into account will give a fuller picture.
For example, the economic indicators of wealth that the bureau provides also include the gap between rich and poor – indicated by the Gini coefficient, a measure of inequality named after its creator, statistician Corrado Gini. The bureau's figures show that the gap has widened over the past decade. And yet another indicator shows that a higher proportion of households have taken on a large amount of debt.
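For the curious, here is a rough sketch of how a Gini coefficient can be calculated. It uses one common textbook definition (the average gap between every pair of households, scaled by the mean); the bureau's own method may well differ in its details, and the function name `gini` and the sample values are assumptions made for illustration.

```python
def gini(values):
    """Gini coefficient via the mean absolute difference between all pairs.

    0 means everyone holds the same amount; values near 1 mean almost
    everything is held by one household.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    # Sum of the gaps between every ordered pair of households.
    pair_gaps = sum(abs(a - b) for a in values for b in values)
    # Equivalent to pair_gaps / (2 * n**2 * mean), since mean = total / n.
    return pair_gaps / (2 * n * total)

# Hypothetical streets: one perfectly equal, one where a single household
# holds all the wealth.
equal_street = gini([1, 1, 1, 1])
lopsided_street = gini([0, 0, 0, 1])
print(equal_street, lopsided_street)
```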
* A story on data wouldn't be complete without at least one cameo appearance from an asterisked footnote but, as with anything involving statistics, it's worth reading the fine print. Looking at the median isn’t always the best rule of thumb. If you’ve ever looked through a list of median house prices in each suburb and wondered why some areas are missing, it’s because the median figure is not published unless a certain number of properties changed hands. Otherwise, the median figure might not be representative of the area.
Cause or correlation?
Smoking causes lung cancer – that’s an uncontroversial statement, nowadays.
But for a long time, while doctors had a hunch that people who smoked cigarettes were more likely to end up with breathing difficulties in later life, proving it was another matter.
The most confident statement a doctor could make went like this: people who smoke have also exhibited a higher rate of lung cancer than the rest of the population.
In other words, there was a correlation between smoking and lung cancer – they seemed to happen together an awful lot – but until more research was done, it wasn't possible to know whether smoking actually caused lung cancer.
This lack of a causal link left plenty of wriggle room to suggest that “confounding factors” (another term found in statistics and health studies) were really behind cancer in smokers.
After all, it might be the chemicals in cigarette lighters that made smokers ill (that theory was put forward). Or, as an Age editorial suggested, smog and bad-quality tobacco – both found in other countries – could make Australia a better place to be a smoker.
It’s easy to snicker with the benefit of hindsight. But it’s by testing a theory through experiments (and ruling out other possibilities) that it gains scientific acceptance and it becomes clear that what was considered a correlation is actually a cause.
The type of medical study designed to find causation is a randomised controlled trial, in which people randomly receive one of several interventions, which may include a placebo. Other studies, such as retrospective or observational studies, can only show correlation.
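To see how a confounding factor can produce a correlation on its own, consider this small simulation. Everything in it is made up: a hidden factor drives two outcomes that have no influence on each other, and the names `hidden`, `outcome_a` and `outcome_b` are hypothetical. It is a sketch of the idea, not a model of smoking or any real study.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical hidden factor that drives both outcomes.
hidden = [rng.gauss(0, 1) for _ in range(n)]
# Neither outcome influences the other; each is the hidden factor plus noise.
outcome_a = [h + rng.gauss(0, 0.5) for h in hidden]
outcome_b = [h + rng.gauss(0, 0.5) for h in hidden]

raw = pearson(outcome_a, outcome_b)
# Remove the hidden factor's contribution and compare what is left.
adjusted = pearson(
    [a - h for a, h in zip(outcome_a, hidden)],
    [b - h for b, h in zip(outcome_b, hidden)],
)
print(f"Correlation before adjusting: {raw:.2f}")
print(f"Correlation after adjusting:  {adjusted:.2f}")
```

The two outcomes move together strongly until the hidden factor is accounted for, at which point the apparent link largely disappears. In real studies the hidden factor is rarely known this neatly, which is part of why randomised trials are used.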
And it goes the other way too. Sometimes researchers (and humans in general) are too hasty to assert that two things are connected when, in fact, one has no bearing on the other at all.
For example, earlier this decade data scientists were adamant that real-time Google search data could be used to predict where flu outbreaks were happening. They reasoned that if people started to fall ill, they would search for flu symptoms on the internet. Google engineers published a much-lauded scientific paper on the topic. But when the theory was put to the test, the Google search data didn’t match up with the recorded flu cases at all.
Scientists are well aware of this. They will often devote sizeable sections of their papers to unpicking evidence of causal links and will tend to include a refrain that more research is needed to say that x causes y.
But questions of correlation and causation are not restricted to science. Every day, countless social media posts and opinion articles argue that there is a direct connection between two disparate things. A popular genre is how Millennials are responsible for everything that is wrong with the world.
When is a poll not as precise as it might seem?
If you see a poll result that says Prime Minister Scott Morrison has a 50 per cent approval rating, what it is really saying is that his approval rating most likely lies between about 47 per cent and 53 per cent.
All polls and surveys (and a lot of studies) come with a degree of uncertainty baked in. It’s just that they often end up sounding so very precise.
Consider this population projection for Australia from the Bureau of Statistics: 49,226,089 in 2066.
That’s very precise.
But this number is not presented as a crystal clear prediction (it would be madness if it were). It has caveats and qualifiers built in.
The figure sits somewhere in a range of values of what the population could be in 2066, based on current trends. The figures are also, by the bureau's own admission, based on assumptions about high rates of fertility and immigration – other projections put estimates considerably lower, at about 38 million.
By their nature, predictions are fraught – in life, in politics and in statistics. So is dealing with uncertainty.
Federal political polls typically survey about 1500 people from all backgrounds, age groups and walks of life. The aim is that they mirror the general population – but when about 1500 people are supposed to be a microcosm of Australia’s population of 23 million, it's easy to see how the poll’s findings will not be exact.
The uncertainty of a number (such as a poll result) is expressed in what’s called its confidence interval. By convention this is usually set at 95 per cent, which tells us there is a 95 per cent chance that the result lies within a range specified by the researchers or, in this case, pollsters.
So, back to Scott Morrison. In federal polls, approval ratings have a margin of error of about three percentage points either side of the reported figure. That’s why if a poll says the Prime Minister has a 50 per cent approval rating, it means there is a 95 per cent chance his approval rate lies somewhere between 47 per cent and 53 per cent.
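Where does a figure like three percentage points come from? A minimal sketch, assuming a simple random sample and the standard textbook formula for a proportion, is below. The function name `margin_of_error` is hypothetical, and real pollsters weight and adjust their samples, which may be one reason a quoted figure can be somewhat wider than this calculation suggests.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sample_size, proportion=0.5, confidence=0.95):
    """Half-width of the confidence interval for a simple random sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(proportion * (1 - proportion) / sample_size)

# 1500 respondents and a reported approval rating of 50 per cent.
moe = margin_of_error(1500)
low, high = 0.5 - moe, 0.5 + moe

print(f"Margin of error: +/- {moe * 100:.1f} points")
print(f"Likely range: {low * 100:.1f}% to {high * 100:.1f}%")
```

Under these assumptions the calculation lands at roughly two and a half points either side of 50 per cent, in the same ballpark as the three points pollsters typically quote.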
There are lots of other reasons why poll results come into question, but that's a whole other story.
Rates or raw numbers?
Every city has its exceptional suburbs. In Melbourne it's Werribee, in Sydney it's Liverpool.
Werribee is the most multicultural place in Victoria (as in, it’s home to people from more countries than anywhere else). It is home to the most child care workers, the most Richmond Football Club members and the most online shoppers. It’s also home to the most people living on welfare and the most bankrupts.
Liverpool is the busiest place in New South Wales for online purchases, the number one spot for first home buyers and among the areas with the most gun owners, people on welfare, disabled parking cheats and homeowners defaulting on their mortgages.
What makes Werribee and Liverpool stand out so much? Both are now among the most populous suburbs in Australia.
That means that whenever anybody puts together a list of the top 10 suburbs with the “most” of something, Werribee and Liverpool tend to feature.
But this doesn’t tell us anything meaningful until we put things in perspective by adjusting the figures for population.
There are several ways we can do this. One is by calculating the rate.
By expressing what is being counted as a rate we level the playing field. We can see how many Richmond supporters there are for every 1000 people in Werribee against how many Richmond supporters there are for every 1000 people in a different suburb.
Once we adjust for population size we find that there isn’t anything unusual about Werribee when it comes to its spread of Richmond supporters – or debtors or welfare recipients. But it still has one of the highest percentages of child care workers, owing to the area being home to so many young, working families.
Rates are also handy if the population is changing. When we report on crime statistics, we don’t tend to focus on whether there has been an increase in the number of crimes reported because the number of crimes recorded tends to increase each year in line with population growth. Once we adjust for population growth, we get a much clearer picture of whether crime is getting worse or better.
For example, the number of crimes recorded in Victoria increased by 5000 last year (or about 1.3 per cent). That might sound like a “crime wave” but when we adjust for the increasing population, the actual rate of crime decreased slightly (about 1 per cent) over the same period.
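Here is a rough sketch of that adjustment. The crime counts and population figures are hypothetical, chosen only to be in line with the percentages quoted above; they are not actual Victorian data.

```python
def rate_per_1000(count, population):
    """How many of something there are for every 1000 people."""
    return count / population * 1000

def percent_change(old, new):
    return (new - old) / old * 100

# Hypothetical figures (assumptions, not official statistics).
crimes_last_year, crimes_this_year = 385_000, 390_000
population_last_year, population_this_year = 6_400_000, 6_550_000

count_change = percent_change(crimes_last_year, crimes_this_year)
rate_change = percent_change(
    rate_per_1000(crimes_last_year, population_last_year),
    rate_per_1000(crimes_this_year, population_this_year),
)

print(f"Change in number of crimes: {count_change:+.1f}%")
print(f"Change in crime rate:       {rate_change:+.1f}%")
```

The raw count goes up while the rate goes down, simply because the assumed population grew faster than the number of recorded crimes.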
However, rates can be deceptive, particularly if the population size is really small. For example, in an area with a population of eight people, all it would take is for a few more crimes to be reported and, on a rate per 1000 basis, it could become the crime capital of a state.
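The small-population trap is easy to demonstrate. In this sketch the locality of eight people comes from the example above, while the crime counts and the larger suburb of 40,000 are assumptions added for comparison.

```python
def rate_per_1000(count, population):
    """How many of something there are for every 1000 people."""
    return count / population * 1000

# The same three extra crimes, in two very different places (made-up counts).
tiny_before = rate_per_1000(0, 8)
tiny_after = rate_per_1000(3, 8)
big_before = rate_per_1000(2_000, 40_000)
big_after = rate_per_1000(2_003, 40_000)

print(f"Locality of 8:    {tiny_before:.1f} -> {tiny_after:.1f} per 1000")
print(f"Suburb of 40,000: {big_before:.1f} -> {big_after:.1f} per 1000")
```

Three extra reports barely move the rate in the larger suburb but send the tiny locality's rate soaring, which is why rates for very small populations deserve extra caution.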
Feeling lucky? Risk and percentage rises
Let’s say I had a vegan coffee cart.
If I told you that sales at my vegan coffee cart had surged by 400 per cent in the past month you might think I’d tapped into the next big thing in refreshments.
After seeing this information in a slick-looking graph, you might even consider investing in my burgeoning coffee cart franchise.
But you would need these two crucial pieces of information before you pitched in: how many cups of coffee I sold last month, and how many cups of coffee I sold this month.
Because, while the percentage change leads to vegan coffee sales skyrocketing on the chart, it doesn’t tell you the actual numbers behind that – the scale of the operation.
If it turned out that my coffee cart’s turnover was actually one cup in September and five cups in October, well … pass the full-milk latte.
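The arithmetic behind that surge takes only a few lines to check. The one-cup and five-cup figures come from the example above; the second, larger cart is a hypothetical comparison added to show that the same percentage can describe very different businesses.

```python
def percent_change(old, new):
    return (new - old) / old * 100

# The coffee cart from the example: one cup, then five cups.
small_cart = percent_change(1, 5)
# A hypothetical busier cart: 1000 cups, then 5000 cups.
busy_cart = percent_change(1_000, 5_000)

print(f"Small cart: {small_cart:.0f}% rise, {5 - 1} extra cups")
print(f"Busy cart:  {busy_cart:.0f}% rise, {5_000 - 1_000} extra cups")
```

Both carts can claim a 400 per cent rise, but only one of them sold thousands of extra coffees.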
Percentage changes can be even more slippery when it comes to risk.
Consider this entirely fictional statement: “People who drink lots of coffee risk being eight times more likely than a typical person to have their hair turn pink.”
"Eight times more likely" is a relative risk that applies to coffee drinkers: relative to a typical person, the risk is eight times higher. But this doesn’t mean much on its own. We’re missing a crucial piece of information – what is the risk of hair turning pink for a typical person, whether they drink coffee or not?
It might be that the typical risk of this happening is one in 1000. This is known as the absolute risk. That means that for every 1000 people, one person will undergo a dramatic shift in their hair colour at some stage of their lifetime.
It follows, then, that a heavy coffee drinker has an eight in 1000 chance of their hair turning pink.
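That step from relative risk to absolute risk can be written out as a short calculation. The figures are the fictional ones from the pink hair example, and the function name `cases_per_1000` is made up for this sketch.

```python
def cases_per_1000(baseline_per_1000, relative_risk):
    """Expected cases per 1000 people, given a baseline and a relative risk."""
    return baseline_per_1000 * relative_risk

typical_person = cases_per_1000(1, 1)   # baseline: one in 1000
heavy_drinker = cases_per_1000(1, 8)    # eight times the baseline

extra_cases = heavy_drinker - typical_person
unaffected_drinkers = 1000 - heavy_drinker

print(f"Typical person: {typical_person} in 1000")
print(f"Heavy drinker:  {heavy_drinker} in 1000")
print(f"Extra cases per 1000 drinkers: {extra_cases}")
print(f"Drinkers unaffected per 1000:  {unaffected_drinkers}")
```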
Here’s a visual way to think about it. Each pink square is a person whose hair turned pink, while the white squares are people who didn’t. If the grid was completely pink that would mean your chances of getting this condition would be certain, while if the grid was blank there would be no chance of it happening.
Here’s what the pink hair risk looks like for a typical person and a coffee addict side by side, based on one in 1000 and eight in 1000:
If you’re reading this article on mobile, you might have had to squint to see the difference between the two grids. Even though, in our faux example, coffee drinkers have an eight times higher risk of their hair changing colour, most of the squares in the grid are blank – irrespective of whether you drink coffee or not, you are far more likely to go your entire life without your hair turning pink.
If an absolute risk is very low, as in our example here, big increases in relative risk don’t tend to make that much difference. But if an absolute risk is very high (such as the risk of catching the flu during winter) then any increase in relative risk (say, because you're not washing your hands) can be very serious indeed because many more people will be affected.
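A quick sketch makes the contrast concrete. The pink hair figures are the fictional ones used above; the flu figures (a baseline of 200 in 1000 and a relative risk of 1.5) are assumptions invented purely for illustration, not real health data.

```python
def extra_cases_per_1000(baseline_per_1000, relative_risk):
    """Additional cases per 1000 people caused by the raised risk."""
    return baseline_per_1000 * (relative_risk - 1)

# Low absolute risk, large relative risk (fictional pink hair example).
pink_hair_extra = extra_cases_per_1000(1, 8)
# High absolute risk, modest relative risk (assumed flu figures).
flu_extra = extra_cases_per_1000(200, 1.5)

print(f"Pink hair: {pink_hair_extra} extra cases per 1000 people")
print(f"Flu:       {flu_extra:.0f} extra cases per 1000 people")
```

Under these assumptions, a risk that is "eight times higher" adds a handful of cases, while a risk that is only "50 per cent higher" adds a hundred.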