Statistics - Doing the right thing - (Nov/12/2012 )
Hi, first post in a long time, so I hope this goes in the correct subforum. Otherwise, please move it.
I'm currently working on my first first-author manuscript in plant molecular biology with a focus on immunity, and for a single set of data I'm having trouble with the statistics. The experimental setup is as follows:
Infiltrate bacteria at a known concentration into the tissues of a wild type plant and an extremely susceptible control mutant, but also into my 4 mutant genotypes of interest. Wait 3 days, then count the bacteria to measure growth. The susceptible mutant is included as a control to show that there is (potentially extreme) growth over the wild type. Extremely simple, but effective and widely used in the field of plant immunity.
These data are always presented on a graph with log10-transformed numbers, since the counts reach up to 10^8 and the transformation makes for nice visual separation.
Now, when I do an ANOVA test and Tukey's test on the raw numbers of several replicates of each genotype, I get no statistical difference when you compare the wild type to my four mutants of interest. The susceptible mutant is significantly different to all other genotypes.
If I then do the tests on the log10 transformed numbers, most of my mutants come out statistically significant compared to the wild type. I assume this is wrong, as log10 is not a linear transformation and the result is thus skewed. Is that correct? The problem is, when you look at the literature, they all seem to do the statistics on the log10 transformed numbers.
So, do I follow the herd and do what everybody else does, and do it wrongly, or do I do it right and lose out on good arguments/citations? Any comments are welcome, especially on the statistical part.
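To make the raw-versus-log10 discrepancy concrete, here is a minimal sketch of the studentized range statistic q that Tukey's test is built on, computed for wild type versus one mutant on both scales. The colony counts are invented for illustration (they are not the data from this experiment), the group names are hypothetical, and equal group sizes are assumed. The point is only that on the raw scale the pooled error term is dominated by the huge spread of the susceptible mutant, which swamps the difference between wild type and mutant.

```python
import math
from statistics import mean

# Hypothetical colony counts (CFU), invented for illustration only.
counts = {
    "wild_type":   [8e4, 1e5, 1.2e5, 1.5e5],
    "mutant":      [4e5, 5e5, 6e5, 8e5],
    "susceptible": [5e7, 1e8, 2e8, 4e8],
}

def pooled_mse(groups):
    """Within-group mean square, i.e. the ANOVA error term."""
    ss_within = 0.0
    df_within = 0
    for values in groups.values():
        m = mean(values)
        ss_within += sum((v - m) ** 2 for v in values)
        df_within += len(values) - 1
    return ss_within / df_within

def tukey_q(groups, a, b):
    """Studentized range statistic for groups a and b (equal n assumed)."""
    n = len(groups[a])
    diff = abs(mean(groups[a]) - mean(groups[b]))
    return diff / math.sqrt(pooled_mse(groups) / n)

# Same data on the log10 scale.
log_counts = {g: [math.log10(v) for v in vals] for g, vals in counts.items()}

q_raw = tukey_q(counts, "wild_type", "mutant")
q_log = tukey_q(log_counts, "wild_type", "mutant")

print(f"q on raw counts:   {q_raw:.4f}")
print(f"q on log10 counts: {q_log:.2f}")
```

The q values would still have to be compared against the studentized range critical value for the given number of groups and error degrees of freedom, which a statistics package does for you. The sketch only shows why the two scales can lead to opposite conclusions for the same data.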
Well, you should try a box plot to see how the data look (skewed or not) and what distribution they have (you could also test for normality, e.g. with the Kolmogorov–Smirnov test). Do the same with the transformed data, to see whether they then fit the assumptions of an ANOVA more or less. Anyway, since you have count data you will most likely have to transform them, because count data usually don't follow a normal distribution but something closer to a Poisson distribution. Whether the log is the right transformation, you'll see then (there are other possible transformations as well).
And if you end up using another transformation because it gives better results, you can easily justify it by saying that you tested the log transformation and the results were not good enough.
Btw. Zar recommends a square root transformation for count data, so you might try that too.
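As a rough numeric stand-in for the box-plot check, the sketch below compares raw, square root and log10 versions of the same data on two things an ANOVA cares about: how different the group variances are (largest divided by smallest) and how skewed the residuals around the group means are. The counts and group names are made up for illustration, and the skewness used is the simple moment-based one. It does not replace a proper normality test or a look at the plots.

```python
import math
from statistics import mean, variance

# Hypothetical colony counts (CFU), invented for illustration only.
counts = {
    "wild_type":   [8e4, 1e5, 1.2e5, 1.5e5],
    "mutant":      [4e5, 5e5, 6e5, 8e5],
    "susceptible": [5e7, 1e8, 2e8, 4e8],
}

# Candidate transformations mentioned in the thread.
transforms = {
    "raw": lambda v: v,
    "sqrt": math.sqrt,
    "log10": math.log10,
}

def variance_ratio(groups):
    """Largest group variance divided by the smallest (1.0 = perfectly equal)."""
    variances = [variance(vals) for vals in groups.values()]
    return max(variances) / min(variances)

def residual_skewness(groups):
    """Moment-based skewness of the residuals around each group mean."""
    residuals = []
    for vals in groups.values():
        m = mean(vals)
        residuals.extend(v - m for v in vals)
    n = len(residuals)
    m2 = sum(r ** 2 for r in residuals) / n
    m3 = sum(r ** 3 for r in residuals) / n
    return m3 / m2 ** 1.5

summary = {}
for name, func in transforms.items():
    transformed = {g: [func(v) for v in vals] for g, vals in counts.items()}
    summary[name] = (variance_ratio(transformed), residual_skewness(transformed))

for name, (ratio, skew) in summary.items():
    print(f"{name:>5}: variance ratio = {ratio:.3g}, residual skewness = {skew:.3f}")
```

With these made-up numbers, which were chosen to spread multiplicatively like bacterial growth, the variance ratio shrinks from raw to square root to log10. Real data may behave differently, so it is worth running the same comparison on the actual counts before settling on a transformation.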
and this is the right section: http://www.protocol-online.org/forums/forum/55-bioinformatics-and-biostatistics/
(but I'm not a mod to move it and then delete this post)