Ratio Statistics - (May/05/2010 )
Hi Everyone,
I need some advice about ratio statistics. I am working with worms and doing a certain treatment which increases their chance to survive an environmental stress. I can create the same stress in the lab. The worms which do not get the treatment barely survive the stress while those who get the treatment are quite fine.
So, I work in 4 replicates, and in each replicate I have 100-200 worms randomly chosen from a bigger homogeneous population (~2000-3000 worms). After the whole procedure (treatment/no treatment followed by stress) I count the number of survivors and divide this number by the total number of worms in that replicate. This gives me a percentage survival ratio.
What I'd like to do is to compare two survival ratios (treated vs. non-treated) in order to determine if the treatment has an effect. However, since my variables are proportions, they are not normally distributed, so I cannot do a t-test directly. I tried applying the arcsine square-root transformation (y = arcsin(sqrt(x))), also for determining the 95% CI, and then using the transformed values to do the statistics, but I'm not sure if this is the proper method, either.
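For concreteness, the transformation y = arcsin(sqrt(x)) and its inverse can be sketched in a few lines of Python. This is only an illustration: the function names are made up here and the four replicate proportions are hypothetical, not data from the experiment.

```python
import math

def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion p in [0, 1] (result in radians)."""
    return math.asin(math.sqrt(p))

def back_transform(y):
    """Inverse of arcsine_sqrt for y in [0, pi/2]: returns a proportion."""
    return math.sin(y) ** 2

# Hypothetical survival proportions for four replicates (illustrative only).
survival = [0.62, 0.55, 0.70, 0.58]

# Statistics are computed on the transformed scale ...
transformed = [arcsine_sqrt(p) for p in survival]
mean_y = sum(transformed) / len(transformed)

# ... and back-transformed to a proportion for reporting.
mean_back = back_transform(mean_y)
print(f"back-transformed mean survival: {mean_back:.3f}")
```

Any interval computed on the transformed scale has to be back-transformed the same way, endpoint by endpoint, before it is reported as a survival proportion.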
I would appreciate it if you could give me some suggestions. Just the mathematics would be OK since I can implement it in R or Excel.
Thanks a lot.
Not sure where the arcsine square-root transformation fits into your question, but I think a simple chi-square test would be appropriate for the data you describe.
For Excel you can use the CHITEST function, but you need to arrange your data correctly to get an intelligible result. Here's how:
Let's say you observe two groups of 100 worms (200 total); one group is treated, the other is not. After the stress, you have 50 survivors (0.5) in the 'control' group and 60 survivors (0.6) in the treated group.
1) Your null hypothesis would be that treatment does not improve survival; the alternative would be that it improves survival.
2) You have the number of survivors per total worms after the stress is applied; this is your 'control' or 'expected' population.
You also have the number of survivors per total worms after stress + treatment; this is your 'experimental' or 'actual' population.
3) Arrange your possible outcomes (control/live, control/dead, treated/live, treated/dead) into a 2x2 contingency table in Excel:
        | Live | Dead |
Control | 50   | 50   | (this row is the 'Expected range')
Treated | 60   | 40   | (this row is the 'Actual range')
4) In a cell, type "=CHITEST('Actual Range', 'Expected Range')". The value returned (in this case 0.0455) is the p-value: the probability of seeing a difference at least this large if the two groups came from the same population. If you accept the difference as significant when p<0.05, then you can reject the null hypothesis.
In other words, the treatment significantly improved survival.
Make sense? If not, I've attached an Excel template using the above example; you can plug your data into it. Good luck!
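One thing worth checking in the recipe above: passing the control row to CHITEST as the 'Expected range' treats the control counts as exactly known expected values (a goodness-of-fit test). A chi-square test of independence on the full 2x2 table instead estimates the expected counts from both rows, and it is more conservative. The following Python sketch compares the two on the example counts; the function names are hypothetical, and the p-value formula relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def chitest_style(actual, expected):
    """Goodness-of-fit with given expected counts; two cells, so 1 degree of freedom."""
    stat = sum((o - e) ** 2 / e for o, e in zip(actual, expected))
    return stat, chi2_sf_1df(stat)

def independence_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]],
    without continuity correction."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, chi2_sf_1df(stat)

# Example counts from the post: control 50 live / 50 dead, treated 60 live / 40 dead.
gof_stat, gof_p = chitest_style([60, 40], [50, 50])
ind_stat, ind_p = independence_2x2(50, 50, 60, 40)

print(f"control row as expected: chi2 = {gof_stat:.3f}, p = {gof_p:.4f}")
print(f"2x2 independence test:   chi2 = {ind_stat:.3f}, p = {ind_p:.4f}")
```

With these counts the first call reproduces the 0.0455 quoted above, while the independence test gives a chi-square of about 2.02 and p of about 0.155, so the choice of test changes the conclusion for this example.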
Hi Jah,
First, thank you for your reply. It totally makes sense to prove the effect of the treatment under normal conditions. However, I don't know if I can apply chi-square to this problem because these worms are not individually handled. Instead, in groups of 100-200, 4 groups are simultaneously treated and stressed. Doesn't this violate the requirement for the independence of data?
At the end, what I have is a slightly different survival ratio for each technical replicate. When I calculate the mean and the standard error, I can get an estimate of this survival ratio. This is actually what I'd like to compare. Do you think it's possible?
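If the replicate (not the individual worm) is taken as the unit of analysis, one option that does not assume normality is an exact permutation test on the per-replicate survival ratios. This is a different technique from the chi-square test suggested above, shown only as a sketch: the function names are invented and the eight proportions below are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev

def mean_sem(values):
    """Mean and standard error of the mean of a list of replicate values."""
    return mean(values), stdev(values) / math.sqrt(len(values))

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the absolute difference of group means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    extreme = 0
    # Enumerate every way of relabelling the pooled replicates into two groups.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical survival proportions, four replicates per group (illustrative only).
control = [0.08, 0.12, 0.05, 0.10]
treated = [0.81, 0.77, 0.85, 0.79]

print("control mean, SEM:", mean_sem(control))
print("treated mean, SEM:", mean_sem(treated))
p_value = permutation_p_value(control, treated)
print(f"permutation p-value: {p_value:.4f}")
```

With 4 replicates per group there are only 70 possible relabellings, so the smallest two-sided p-value this test can return is 2/70 (about 0.029), which is what completely separated groups like these produce.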
I think you may be getting hung up on terminology. Indeed, your data can be expressed as a 'ratio' (e.g. % survival); however, in statistical terminology, 'ratio measurement' statistics would not be appropriate for your design. Ratio-measurement statistics are used to compare ratios that are based on tangible, measurable quantities (e.g. miles per hour). Your data do not satisfy this criterion.
As your outcome data are binary (survival or death), the data cannot have a central tendency, and cannot be described by statistics that assume such a distribution (mean/median). By that same logic, mean, SEM, or SD are not really appropriate descriptive statistics for your design.
Rather, your data relate to the frequency of observing a qualitative outcome; this type of data is called categorical or nominal data. Your dependent variable is the frequency of an event (survival vs. death); the independent variable is the treatment (Stress Y +/- Compound X).
With nominal data, the most appropriate descriptive statistic is the mode; for binary data such as yours, the mode within each treatment group is simply the more frequent outcome. To assess whether the two levels of the independent variable result in different frequencies of the dependent outcome, the chi-square test is the most appropriate.
Independence vs. internal replicates... That's a point of philosophical argument that I have seen argued both ways. *Caveat* I do not work with worms. In my practice I would consider your design, as described, as using internal replicates (i.e. a larger sample n) rather than independent replicates from repeated experiments. It is expected that you have some variability from plate to plate (imprecision), which can be expressed as a coefficient of variability (CV%) within the experimental/internal replicates. But this speaks more to the error innate in the experimental procedure than to the difference between treatments. Ideally the CVs of the normal and treated groups would be similar and small; if they are dissimilar, you need a larger n; if they are large, you need both a larger n and more consistent experimental conditions.
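As a small illustration of the CV% mentioned here (sample SD divided by the mean, times 100), a Python sketch follows. The function name is hypothetical and so are the replicate proportions.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return stdev(values) / mean(values) * 100.0

# Hypothetical per-replicate survival proportions (illustrative only).
control = [0.08, 0.12, 0.05, 0.10]
treated = [0.81, 0.77, 0.85, 0.79]

print(f"control CV: {cv_percent(control):.1f}%")
print(f"treated CV: {cv_percent(treated):.1f}%")
```

Because the CV divides by the mean, a group with very low survival can show a large CV even when its absolute plate-to-plate spread is about the same as in the other group, which is worth keeping in mind when comparing the two values.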
Hope that helps!
Well, the terminology is really complicated in statistics. You're right that I'm confused.
If I got it correctly, you say that what I have as survival ratios are actually frequencies. So, instead of turning them into percentage survival rates, I can use the number of dead vs. alive worms with and without the treatment to apply a chi-square test. I already tried this and it looks like it works.
What I want to ask now is, while applying the chi-square, shall I choose one of these 4 replicates in the control group and compare it with one of the replicates in the treatment group or am I allowed to sum all the 4 controls to make a single group (now that I don't need the variance in the group anymore) and compare it to the sum of the treatments?
On the other hand, I also have to show (visually) these ratios somehow, with a measure of variability if possible. I was reporting them like mean survival ratio (%) +/- SEM (%) but now, do you suggest that I should use CV % (SD / mean * 100%) instead of the SEM? Then, if the mean and the SD do not make any sense, how will this help me? Can I simply make a box-plot using the % survival ratios of the 4 replicates?
Thanks again and again
I think that you have it right. It seems to me that it would be appropriate to sum the 4 internal replicates; however, a graphical presentation with error bars would not be relevant because you'd only be looking at one sample. If you had independent replicates, that would be another story; if it is possible to do independent replicates in the future, that would be ideal.
For example, look at a large clinical trial where the chi-square analysis is used; they almost never use a graphical presentation of nominal data, and do not mention variance. This is largely because it would be impossible or impractical to sample multiple populations and compare them as replicates. But in these types of studies, the statistical power lies in the number of subjects - in your case individual worms.
So, does it mean that I also cannot use binomial proportion confidence intervals?
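For reference, a binomial proportion confidence interval can be computed directly from pooled counts. The sketch below uses one common choice, the Wilson score interval; the function name and the default z = 1.96 (roughly 95%) are assumptions made for illustration, and the counts are the 60-of-100 example from earlier in the thread.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# Example counts from the thread: 60 survivors out of 100 treated worms.
low, high = wilson_interval(60, 100)
print(f"95% Wilson interval: {low:.3f} to {high:.3f}")
```

Like the pooled chi-square test, this interval treats every worm as an independent observation, so it does not reflect plate-to-plate variability between replicates.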
Maybe the OR (odds ratio) and RR (relative risk) would be useful for your problem.
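As a rough sketch of what these two measures look like for the example counts used earlier in the thread (treated: 60 live, 40 dead; control: 50 live, 50 dead), here is a short Python version. The function names are hypothetical, and the interval uses the usual normal approximation on the log odds ratio.

```python
import math

def relative_risk(a, b, c, d):
    """Ratio of survival proportions: treated (a live, b dead) vs. control (c live, d dead)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of survival in the treated group divided by odds in the control group."""
    return (a * d) / (b * c)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval for the odds ratio, computed on the log scale."""
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio(a, b, c, d))
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

rr = relative_risk(60, 40, 50, 50)
odds = odds_ratio(60, 40, 50, 50)
ci_low, ci_high = odds_ratio_ci(60, 40, 50, 50)
print(f"RR = {rr:.2f}, OR = {odds:.2f}, 95% CI for OR: {ci_low:.2f} to {ci_high:.2f}")
```

For these counts the relative risk is 1.2 and the odds ratio is 1.5; an interval for the odds ratio that contains 1 means the data are compatible with no treatment effect.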
If you want to combine 4 independent experiments, you can use a meta-analysis method. If you need more help, I am happy to be your collaborator.