Narrative description of skewed data - Statistics/quantification (Feb/13/2009)
I am trying to write up some results regarding changes to non-Gaussian distributed data. With normally distributed data it is easy; one can just write the mean +/- SEM. What is the equivalent for non-Gaussian data? I imagine one uses the median along with some measure of the spread (interquartile range? some percentile?). I've looked at a few other papers, but there doesn't seem to be a consensus (some are just wrong; e.g. median +/- SEM). Any ideas?
Thanks.
I think that it depends on the group size (n).
For example, if it's too small you can't do a t-test.
We once did a Wilcoxon test to compare arbitrary samples.
But the best thing to do is to go to a statistician for help.
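In case it helps, here is a minimal sketch of the unpaired Wilcoxon test (the rank-sum test, also known as Mann-Whitney U) in plain Python. The post above does not say which Wilcoxon variant was used, so the unpaired one is an assumption. The sketch also assumes the two groups are independent and large enough for the normal approximation, and it skips the tie correction. The function name `rank_sum_test` is made up for illustration.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation.

    Returns (U statistic for x, two-sided p-value).
    Assumption: both groups are fairly large; no tie correction is applied.
    """
    # Pool both samples, remembering which group each value came from.
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Under the null hypothesis U is roughly normal with this mean and SD.
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided
```

For paired measurements (e.g. the same cell before and after the drug) the signed-rank variant would be the one to look at instead.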
who_throws_a_shoe on Feb 13 2009, 04:35 PM said:
Thanks.
Normally the median should do the job. I'd use box plots showing the median, lower quartile, upper quartile, smallest and largest observation, and outliers. Sometimes the mean is also included. A box plot combines the most important descriptive statistics and gives an impression of the distribution of the data.
"Statistics are like a drunk with a lamppost: used more for support than illumination."
Sir Winston Churchill
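As a rough illustration of the numbers a box plot is built from, here is a sketch using only Python's standard library. It assumes the common 1.5 x IQR rule for flagging outliers (plotting programs differ on this, and on how quartiles are interpolated); the function name `box_plot_summary` is hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Return the statistics a typical box plot displays.

    Assumption: outliers are points beyond 1.5 * IQR from the quartiles.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "whisker_low": inside[0],    # smallest non-outlier
        "whisker_high": inside[-1],  # largest non-outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

Called on made-up values such as `[1, 2, 3, 4, 5, 6, 7, 8, 100]`, the single extreme value ends up in `outliers` while the median and quartiles barely notice it, which is the point of using them for skewed data.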
hobglobin on Feb 18 2009, 11:30 AM said:
Normally the median should do the job. I'd use box plots [...]
Thanks for your help. For my figures, I think I am going to plot my raw values as a scatter plot: I have an n of about 50 per treatment, it takes up the same amount of space, and it shows all the data (which, in most cases, should be encouraged!). My main issue is how to describe the data in the text. For example, for Gaussian-distributed data I would most likely use the mean and the SEM, and the text would therefore include a statement like:
"the mean amplitude of the response changed from 3.2 +/- 0.2 nA in the control to 4.5 +/- 0.3 nA in the presence of the drug"
For my skewed data, using the mean +/- SEM would be inappropriate. Is there some kind of equivalent? I could use the median, the lower quartile (Q1), and the upper quartile (Q3), so the equivalent statement in the results text would read something like:
"the median amplitude of the response changed from 3.4 (Q1 3.0, Q3 3.6) nA in the control to 4.5 (Q1 4.3, Q3 4.8) nA in the presence of the drug"
but this seems a bit clumsy.
Any thoughts?
Thanks so much
who_throws_a_shoe on Feb 18 2009, 06:58 PM said:
My main issue is how to describe the data in the text. [...] I could use the median, the lower quartile (Q1), and upper quartile (Q3) [...] but this seems a bit clumsy.
You could give the interval in which e.g. 95% of the measurements fall (the 2.5th to 97.5th percentile; strictly speaking this is a percentile range rather than a confidence interval); it's used frequently for such skewed data.
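Two different intervals tend to get mixed up here, so a sketch of both may be useful: the range that holds the central 95% of the measurements, and a confidence interval for the median itself. The second is estimated below with a simple percentile bootstrap, which is a technique brought in for illustration and not something proposed earlier in this thread. All function names, the resample count, and the seed are hypothetical choices.

```python
import math
import random
import statistics

def percentile(sorted_values, p):
    """Linearly interpolated percentile of pre-sorted data, p in [0, 100]."""
    pos = (len(sorted_values) - 1) * p / 100
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac

def central_range(values, coverage=0.95):
    """Range containing the central `coverage` share of the measurements."""
    data = sorted(values)
    tail = (1 - coverage) / 2 * 100
    return percentile(data, tail), percentile(data, 100 - tail)

def bootstrap_median_ci(values, level=0.95, n_boot=2000, seed=0):
    """Percentile-bootstrap confidence interval for the median.

    Assumption: observations are independent; the seed is fixed only so
    that the result is reproducible.
    """
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2 * 100
    return percentile(medians, tail), percentile(medians, 100 - tail)
```

The central range describes the spread of the data (like an IQR, only wider), whereas the bootstrap interval describes the uncertainty of the median (the role the SEM plays for the mean), so it is worth saying in the text which one is being reported.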