Replace samples with mean? - (Jul/05/2015)
Hi all,
I would really appreciate some suggestions concerning an issue.
Suppose I have a dataset with 3 groups, originally n=6 per group. Let's say I delete 1 sample from each group because they are outliers. Once I am down to n=5 per group, may I replace each deleted value with the mean of the remaining 5 samples in that group, so that I have n=6 again? I have a marginally non-significant result (p=0.053), and if I do this it reaches significance. I have seen this approach used several times by experimental researchers working with a low number of individuals or animals per group, but I could not tell whether it's correct.
Thank you very much for your kind help in advance!
No; a desire to reach significance is not a good reason to substitute data. It’s bad enough deleting one sample from each group in the first place.
The only occasion I can think of when it is appropriate to substitute a data point with a mean is when a single data point is missing, for instance an animal died unexpectedly, and the experimental design requires absolute balance between treatments.
I agree with DRT. Deleting a value and replacing it with the mean of the remaining values will skew your data, especially with a small n, and gives you a false result that does not reflect the actual data. For instance, how do you know that the one you removed isn't the first of a pool of individuals (1/6 = 17% of the total population) that are non-responders to the treatment? If you remove these and replace them with the mean, your data will look very different from the actual situation!
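To make the problem concrete, here is a minimal sketch of what mean imputation does to the test statistic. It assumes a one-way ANOVA across the 3 groups (the original question does not say which test was used), and the numbers are made up purely for illustration. Adding each group's own mean back as a sixth "observation" leaves the group means and the within-group sum of squares unchanged, but it raises the group size and the error degrees of freedom, so the F statistic goes up without any new information.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def impute_group_mean(group):
    """Append the group's own mean as if it were a real measurement."""
    return group + [sum(group) / len(group)]


# Hypothetical data: 3 groups with n=5 each after outlier removal
observed = [
    [4.1, 5.0, 5.6, 6.2, 7.1],
    [5.2, 6.1, 6.8, 7.4, 8.5],
    [6.0, 7.2, 7.9, 8.3, 9.6],
]
padded = [impute_group_mean(g) for g in observed]

f_obs, _, dfw_obs = one_way_anova_f(observed)
f_pad, _, dfw_pad = one_way_anova_f(padded)

print(f"observed: F = {f_obs:.3f}, error df = {dfw_obs}")
print(f"padded:   F = {f_pad:.3f}, error df = {dfw_pad}")
print(f"inflation factor: {f_pad / f_obs:.3f}")
```

With equal group sizes going from 5 to 6, the between-group sum of squares is scaled by 6/5 and the error degrees of freedom go from 12 to 15, so F is inflated by (6/5) x (15/12) = 1.5 regardless of the actual values. The larger error df also lowers the critical value, so the p value drops even though no new animal was measured.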
You can't increase your confidence in a result by making things up, which is essentially what you propose. This is absolutely a bad idea, and it just shows how badly skewed our idea of experiments is when people will go to these lengths to hit a magic p=.05 number, when most of those numbers mean very little. If you have good reason to throw out the outliers, fine, but tell us why. I'd be happy to see a p value before and after they are discarded. I think there is little shame in a p value near .05 (as you have), and it would be far, far better to report it that way than to make up ways of artificially getting a lower number.
Thank you very much. Actually, the p value does not change much if I don't discard: p=0.055 before and p=0.053 after. This is a technically complex procedure where you can get some postoperative complications even if the procedure itself goes well. I have excluded individuals where some complication might have influenced the reproducibility of the measurement.
Thank you, then I will avoid this mean imputation procedure and report the result as it is. I have heard of and seen researchers using this technique multiple times, but I never got the opportunity to discuss it with others.