
I have a dataset that holds the number of people in a group who used a particular type of word in a survey at three different data collection points:

People in group = 158

People who used a word at T0 = 123

People who used a word at T1 = 65

People who used a word at T2 = 54

Does anyone know what type of statistical analysis I can use to assess the statistical significance of the drop from T0 to T2?

(T0 is before an intervention, T1 is directly after it, and T2 is 3 months later.)

  • Missing context: What does it mean for a 'group to use a type of word' at a particular time? At $T_1$ did 65 people use such a word, or did this count arise from a transcript of a group discussion or from a group report? Are the three time periods comparable as to the opportunity to use such words? Do you believe enough time elapses between sessions that the sessions could be considered independent? I guess what I'd most like to know is how many individual subjects of the 158 used such a word at the first session, but not at the last. – BruceET Feb 05 '18 at 22:04
  • ... Your data show three observations on one group. A more powerful dataset would describe behavior across time of 158 individual people. // Also, what do you mean by 'intervention'? Are you trying to change word use? Or is word use thought to be an indicator of something else? Why are you keeping track of word use and not, say, the kind of shoes they are wearing? – BruceET Feb 05 '18 at 22:11
  • I surveyed a group of children, asking them to use 6 words to describe a scientist. Each wrote their own answers down on an individual survey at the beginning of a project. Then their teachers used a 5-week teaching approach that introduced a different set of words to describe scientists, one that included positive attributes. At the end of the five weeks, a second survey was completed (the same as the first). Then, 3 months after the teaching approach, the survey was conducted again. I took all the words and categorised them by type {stereotypes, target attributes, positive sentiments, etc.}. – llewmihs Feb 06 '18 at 06:53
  • My data above show what happened to a particular category of words over the 3 data collection points. The category was stereotypes. I can see that there was a large drop in the use of these types of words but want to know how to demonstrate its significance. – llewmihs Feb 06 '18 at 06:58

1 Answer

As I mentioned in a Comment, it would be best to have results for individual subjects in order to do an analysis that uses all the information available. (If you intend to publish your results in a journal of quality, referees may insist on that.)

However, it seems that an analysis of your data in its present form, based on fairly simple confidence intervals, is sufficient to show that the initial time period differs from the later ones.

Initial survey. The fraction of subjects using stereotype words on the first survey is $p_1 = 123/158 = 0.778.$ Based on that, a 95% Agresti confidence interval for the true population proportion is $(0.707, 0.836).$

Final survey. Similarly, a 95% Agresti confidence interval for the true population proportion based on $p_3 = 54/158 = 0.342$ is $(0.272, 0.419).$ The two confidence intervals are very far from overlapping. So there is good evidence that the fraction of subjects using stereotype words has decreased.

Agresti confidence intervals. In general, here is how to make an Agresti confidence interval (CI) if there are $X$ events in $n$ trials: Let $\tilde p = \frac{X+2}{n+4}.$ Then the CI is of the form $$\tilde p \pm 1.96\sqrt{\tilde p(1-\tilde p)/(n+4)}.$$
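As a rough illustration, the formula above can be evaluated in a few lines of Python. This is only a sketch: the function name `agresti_ci_95` is made up for this example, and the counts are the three totals from the question with $n = 158$.

```python
from math import sqrt

def agresti_ci_95(x, n):
    """95% Agresti ('plus four') confidence interval for x events in n trials."""
    p_tilde = (x + 2) / (n + 4)                       # adjusted point estimate
    half_width = 1.96 * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - half_width, p_tilde + half_width

n = 158  # group size from the question
for label, x in [("T0", 123), ("T1", 65), ("T2", 54)]:
    lo, hi = agresti_ci_95(x, n)
    print(f"{label}: ({lo:.3f}, {hi:.3f})")
```

For the T0 and T2 counts, the printed intervals should be approximately $(0.707, 0.836)$ and $(0.272, 0.419)$.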

Bonferroni comparisons. For a 98.3% Agresti CI, the two-sided normal critical value is $z = 2.394,$ so the numerator of $\tilde p$ is $X + z^2/2 = X + 2.87$ and the denominator is $n + z^2 = n + 5.73.$ Then the interval is of the form $$\tilde p \pm 2.39\sqrt{\tilde p(1 - \tilde p)/(n+5.73)}.$$

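If you are going to compare three confidence intervals and want an overall error probability of 5%, then you should use 98.3% CIs, because $1 - 0.05/3 = 0.983.$ (This is known as the Bonferroni method.) Of course, 98.3% CIs will be a little longer than 95% CIs, but not enough longer to make the interval for the initial survey overlap the intervals for the two later surveys. Note that the intervals for the two later surveys overlap each other even at the 95% level, so these intervals do not show a difference between T1 and T2.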
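A minimal sketch of the Bonferroni-adjusted intervals follows. It assumes the general form of the Agresti interval, $\tilde p = (X + z^2/2)/(n + z^2),$ with $z$ computed as the two-sided normal quantile for confidence level $1 - 0.05/3$ (about 2.39) rather than hard-coded. The names `agresti_ci` and `overlaps` are hypothetical helpers for this example only.

```python
from math import sqrt
from statistics import NormalDist

def agresti_ci(x, n, conf):
    """Agresti-type CI at confidence level conf for x events in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)      # two-sided critical value
    p_tilde = (x + z**2 / 2) / (n + z**2)
    half_width = z * sqrt(p_tilde * (1 - p_tilde) / (n + z**2))
    return p_tilde - half_width, p_tilde + half_width

def overlaps(a, b):
    """True if intervals a and b (each a (low, high) pair) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

n, m = 158, 3                      # group size; number of intervals compared
conf_each = 1 - 0.05 / m           # Bonferroni: about 0.983 per interval
counts = {"T0": 123, "T1": 65, "T2": 54}
cis = {t: agresti_ci(x, n, conf_each) for t, x in counts.items()}

for t, (lo, hi) in cis.items():
    print(f"{t}: ({lo:.3f}, {hi:.3f})")
for a, b in [("T0", "T1"), ("T0", "T2"), ("T1", "T2")]:
    print(f"{a} vs {b}: overlap = {overlaps(cis[a], cis[b])}")
```

Computing $z$ from the normal distribution keeps the sketch valid if the number of intervals being compared changes.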

If you want more on the 'Agresti' and 'Bonferroni' methods, you can look in a recent intermediate-level applied statistics text or google the names. Also, for more on Agresti CIs see this page.

BruceET
  • Thanks for that, it's very useful. I'll do as directed. I do have the data for each individual, but have little idea about how to go about processing it. Any suggestions would be a great help. For each participant I have 6 words that have been categorised into 1 of 12 groups. I have this data for each time period. – llewmihs Feb 06 '18 at 11:37