Narrative description of skewed data - Statistics/quantification (Feb/13/2009)

I am trying to write up some results regarding changes to non-Gaussian distributed data. With normally distributed data it is easy; one can just write the mean +/- SEM. What is the equivalent for non-Gaussian data? I imagine one uses the median along with some measure of the spread (interquartile range? some percentile?). I've looked at a few other papers, but there doesn't seem to be a consensus (some are just wrong; e.g. median +/- SEM). Any ideas?

Thanks.

-who_throws_a_shoe-

I think that it depends on the group size (n).
For example, if it's too small you can't do a t-test.
We once used a Wilcoxon test to compare arbitrary samples.

But the best thing to do is to go to a statistician for help.

-molgen-


who_throws_a_shoe on Feb 13 2009, 04:35 PM said:

I am trying to write up some results regarding changes to non-Gaussian distributed data. With normally distributed data it is easy; one can just write the mean +/- SEM. What is the equivalent for non-Gaussian data? I imagine one uses the median along with some measure of the spread (interquartile range? some percentile?). I've looked at a few other papers, but there doesn't seem to be a consensus (some are just wrong; e.g. median +/- SEM). Any ideas?

Thanks.

Normally the median should do the job. I'd use box plots showing the median, lower quartile, upper quartile, smallest and largest observations, and outliers. Sometimes the mean is also included. A box plot combines the most important descriptive statistics and gives an impression of the distribution of the data.
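As a rough illustration of the numbers a box plot displays, here is a minimal Python sketch using only the standard library. The function name, the sample values, and the 1.5 x IQR outlier rule (Tukey's convention) are assumptions made for illustration; nothing here comes from the original data.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the statistics a typical box plot shows for one group."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between the observed values.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier rule (Tukey): points more than 1.5 * IQR beyond a quartile.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }


# Hypothetical response amplitudes in nA (made-up values with one high outlier).
amplitudes = [2.8, 3.0, 3.1, 3.3, 3.4, 3.5, 3.6, 3.9, 7.2]
summary = box_plot_summary(amplitudes)
print(summary)
```

Different plotting packages use slightly different quartile and outlier conventions, so the exact numbers may differ a little from what a given box plot draws.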


"Statistics are like a drunk with a lamppost: used more for support than illumination."
Sir Winston Churchill
:)

-hobglobin-

hobglobin on Feb 18 2009, 11:30 AM said:

Normally the median should do the job. I'd use box plots showing the median, lower quartile, upper quartile, smallest and largest observations, and outliers. Sometimes the mean is also included. A box plot combines the most important descriptive statistics and gives an impression of the distribution of the data.
Thanks for your help. For my figures, I think I am going to plot my raw values as a scatter plot - I have an n of about 50 per treatment, it takes up the same amount of space and it shows all the data (which, in most cases, should be encouraged!). My main issue is how to describe the data in the text. For example, for Gaussian-distributed data I would most likely use the mean and the SEM, and therefore the text would include a statement like:

"the mean amplitude of the response changed from 3.2 +/- 0.2 nA in the control to 4.5 +/- 0.3 nA in the presence of the drug"

For my skewed data, using the mean +/- SEM would be inappropriate. Is there some kind of equivalent? I could use the median, the lower quartile (Q1), and the upper quartile (Q3), so the equivalent statement in the results text would read something like:

"the median amplitude of the response changed from 3.4 (Q1 3.0, Q3 3.6) nA in the control to 4.5 (Q1 4.3, Q3 4.8) nA in the presence of the drug"

but this seems a bit clumsy.
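Both reporting styles can be generated directly from the raw values. The following is a minimal Python sketch using only the standard library; the helper names and the sample values are made up for illustration, and the quartile method ("inclusive") is one of several conventions.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev


def describe_gaussian(values, unit="nA"):
    """Format 'mean +/- SEM' for roughly normally distributed data."""
    sem = stdev(values) / sqrt(len(values))  # standard error of the mean
    return f"{mean(values):.1f} +/- {sem:.1f} {unit}"


def describe_skewed(values, unit="nA"):
    """Format 'median (Q1, Q3)' for skewed data."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return f"{median(values):.1f} (Q1 {q1:.1f}, Q3 {q3:.1f}) {unit}"


# Hypothetical control amplitudes in nA (made-up, right-skewed).
control = [2.8, 3.0, 3.1, 3.3, 3.4, 3.5, 3.6, 3.9, 7.2]
print("mean +/- SEM:    ", describe_gaussian(control))
print("median (Q1, Q3): ", describe_skewed(control))
```

With these made-up values the single large observation pulls the mean above most of the data, while the median and quartiles stay close to where the bulk of the values lie.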

Any thoughts?

Thanks so much

-who_throws_a_shoe-

who_throws_a_shoe on Feb 18 2009, 06:58 PM said:

Thanks for your help. For my figures, I think I am going to plot my raw values as a scatter plot - I have an n of about 50 per treatment, it takes up the same amount of space and it shows all the data (which, in most cases, should be encouraged!). My main issue is how to describe the data in the text. For example, for Gaussian-distributed data I would most likely use the mean and the SEM, and therefore the text would include a statement like:

"the mean amplitude of the response changed from 3.2 +/- 0.2 nA in the control to 4.5 +/- 0.3 nA in the presence of the drug"

For my skewed data, using the mean +/- SEM would be inappropriate. Is there some kind of equivalent? I could use the median, the lower quartile (Q1), and the upper quartile (Q3), so the equivalent statement in the results text would read something like:

"the median amplitude of the response changed from 3.4 (Q1 3.0, Q3 3.6) nA in the control to 4.5 (Q1 4.3, Q3 4.8) nA in the presence of the drug"

but this seems a bit clumsy.

Any thoughts?

Thanks so much


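You could give the interval in which e.g. 95% of the measurements are included (the 2.5th to 97.5th percentiles); it's used frequently for such skewed data. Strictly speaking, this is a percentile range rather than a confidence interval, which instead describes the uncertainty of an estimate such as the median.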
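The range that contains 95% of the measurements and a 95% confidence interval for the median are different quantities: the first describes the spread of the data, the second how precisely the median is estimated. A minimal Python sketch of both, assuming made-up lognormal data and a simple percentile bootstrap (neither is from the thread, and the function names are hypothetical):

```python
import random
from statistics import median, quantiles


def central_95_range(values):
    """2.5th and 97.5th percentiles: the range holding about 95% of the values."""
    # n=40 yields cut points every 2.5%; keep the first and the last.
    cuts = quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def bootstrap_median_ci(values, n_resamples=2000, seed=0):
    """Percentile-bootstrap 95% confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    medians = [
        median(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    ]
    return central_95_range(medians)


# Made-up skewed sample: 50 draws from a lognormal distribution.
rng = random.Random(1)
sample = [rng.lognormvariate(1.2, 0.4) for _ in range(50)]

data_low, data_high = central_95_range(sample)
ci_low, ci_high = bootstrap_median_ci(sample)
print(f"median: {median(sample):.2f}")
print(f"95% of measurements lie in [{data_low:.2f}, {data_high:.2f}]")
print(f"95% CI for the median:     [{ci_low:.2f}, {ci_high:.2f}]")
```

The percentile range is much wider than the confidence interval, so the text should say explicitly which of the two is being reported.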

-hobglobin-




Have you tried taking the logarithm of the data? It doesn't always work, but if it makes the distribution look "normal" your problems are solved; just report the log(data) and you can use all the standard statistical methods.
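A minimal Python sketch of that log-transform approach, using only the standard library. The function name and sample values are made up for illustration, and the sketch assumes all values are strictly positive; whether the transformed data actually look normal still has to be checked separately (for example with a histogram or a normality test).

```python
from math import exp, log, sqrt
from statistics import mean, stdev


def log_scale_summary(values):
    """Summarise strictly positive data on the natural-log scale."""
    logs = [log(x) for x in values]  # log() raises ValueError for values <= 0
    log_mean = mean(logs)
    log_sem = stdev(logs) / sqrt(len(logs))  # SEM of the log-transformed values
    return {
        "log_mean": log_mean,
        "log_sem": log_sem,
        # Back-transforming the log-scale mean gives the geometric mean.
        "geometric_mean": exp(log_mean),
    }


# Hypothetical right-skewed amplitudes (made-up values).
amplitudes = [1.0, 2.0, 4.0, 8.0, 16.0]
summary = log_scale_summary(amplitudes)
print(summary)
```

Note that the mean +/- SEM then refers to log(data); back-transforming the mean gives the geometric mean, not the arithmetic mean of the original values.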
Failing that, I would recommend reporting the median, SEM, and the "skewness" for each data set. Skewness is similar to the standard deviation, except that instead of squaring the difference between each sample and the mean you cube it. For the statistical tests, I'd go with molgen's suggestion and look at nonparametric tests, though I suspect that the Mann-Whitney test would be more suitable for your data than the Wilcoxon (signed-rank) test, which is more analogous to a paired t-test (though I'm pushing the limits of my statistical skill here, so it would pay to check with a professional).
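To make the skewness description and the unpaired nonparametric test concrete, here is a minimal Python sketch using only the standard library. The skewness shown is the usual third standardised moment (mean cubed deviation divided by the cubed standard deviation). The Mann-Whitney function is a simplified version that relies on the normal approximation without continuity or tie corrections, so it is an illustration rather than a replacement for a statistics package (for example `scipy.stats.mannwhitneyu`). All sample values and function names are made up.

```python
from math import sqrt
from statistics import NormalDist, mean


def skewness(values):
    """Third standardised moment: mean cubed deviation / (SD cubed)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # population variance
    m3 = sum((x - m) ** 3 for x in values) / n  # mean cubed deviation
    return m3 / m2 ** 1.5


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test for two independent (unpaired) groups.

    Simplified: normal approximation, no continuity or tie correction.
    """
    # U counts the pairs in which the value from `a` exceeds the value from `b`
    # (ties count as half a pair).
    u_a = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    n_a, n_b = len(a), len(b)
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Hypothetical amplitudes in nA for two independent groups (made-up values).
control = [3.0, 3.1, 3.3, 3.4, 3.6, 3.9, 5.0]
drug = [4.3, 4.4, 4.5, 4.6, 4.8, 5.2, 6.9]

print("skewness (control):", round(skewness(control), 2))
u, p = mann_whitney_u(control, drug)
print("U =", u, "p =", round(p, 4))
```

If the same cells were measured before and after the drug, the measurements are paired and a signed-rank test would be the analogous choice instead.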

Good luck

-DRT-