As I mentioned in a Comment, it would be best to have results for individual
subjects in order to do an analysis that uses all the information available. (If you intend
to publish your results in a journal of quality, referees may insist on that.)
However, it seems that an analysis of your data in its present form, based on fairly simple confidence intervals,
is sufficient to show differences among your three time periods.
Initial survey. The fraction of users using stereotype words on the first test is $p_1 = 123/158 = 0.778.$ Based on that, a 95% Agresti confidence interval for the true
population proportion is $(0.707, 0.836).$
Final survey. Similarly, a 95% Agresti confidence interval for the true population
proportion based on $p_3 = 54/158 = 0.342$ is $(0.272, 0.419).$
The two confidence intervals are very far from overlapping. So there is
good evidence that the fraction of subjects using stereotype
words has decreased.
Agresti confidence intervals. In general, here is how to make an Agresti confidence interval (CI) if
there are $X$ events in $n$ trials:
Let $\tilde p = \frac{X+2}{n+4}.$ Then the CI is of the form
$$\tilde p \pm 1.96\sqrt{\tilde p(1-\tilde p)/(n+4)}.$$
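As a check on the arithmetic, here is a minimal Python sketch of that formula applied to the two surveys. The function name `agresti_ci_95` is my own, not from any library, and the counts are the ones quoted above.

```python
from math import sqrt

def agresti_ci_95(x, n):
    """95% Agresti ('plus four') CI for x events in n trials."""
    p_tilde = (x + 2) / (n + 4)                       # shrunken point estimate
    half = 1.96 * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - half, p_tilde + half

initial = agresti_ci_95(123, 158)   # first survey
final = agresti_ci_95(54, 158)      # final survey

print("initial: (%.3f, %.3f)" % initial)   # initial: (0.707, 0.836)
print("final:   (%.3f, %.3f)" % final)     # final:   (0.272, 0.419)
```

The upper end of the final interval is well below the lower end of the initial one, which is what "far from overlapping" means here.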
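Bonferroni comparisons. For a 98.3% Agresti CI, replace 1.96 by the two-sided normal quantile $z = 2.394$ (upper-tail probability $0.05/6$). The numerator of $\tilde p$ is then $X + z^2/2 = X + 2.866$
and the denominator is $n + z^2 = n + 5.73.$
Then the interval is of the form
$$\tilde p \pm 2.394\sqrt{\tilde p(1 - \tilde p)/(n+5.73)}.$$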
If you are going to compare three confidence intervals and want an overall error probability of at most 5%,
then you should use 98.3% CIs, because $1 - 0.05/3 = 0.983.$ (This is known as the Bonferroni method.)
Of course, 98.3% CIs will be a little wider than
95% CIs, but I think not enough wider to cause intervals for your
three time periods to overlap.
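If it helps, the Bonferroni version can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `agresti_ci` is mine, the adjustment adds $z^2/2$ to the count and $z^2$ to the sample size, and $z$ is taken as the two-sided standard normal quantile for the requested level (about 2.394 for 98.3%). Only the first and final surveys are used, since those are the counts quoted above.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

def agresti_ci(x, n, conf=0.95):
    """Agresti-style CI at an arbitrary confidence level.

    Adds z^2/2 pseudo-successes and z^2 pseudo-trials, where z is the
    two-sided normal quantile. At conf=0.95, z^2 is about 3.84, so this is
    close to (not identical to) the 'add 2 and 4' rule.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_adj = n + z**2
    p_tilde = (x + z**2 / 2) / n_adj
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_adj)
    return p_tilde - half, p_tilde + half

# Bonferroni: three intervals with overall error at most 5%
conf = 1 - 0.05 / 3
lo1, hi1 = agresti_ci(123, 158, conf)   # first survey
lo3, hi3 = agresti_ci(54, 158, conf)    # final survey

print("initial: (%.3f, %.3f)" % (lo1, hi1))
print("final:   (%.3f, %.3f)" % (lo3, hi3))
print("overlap:", hi3 >= lo1)
```

The intervals come out somewhat wider than the 95% ones, but the first and final periods still do not overlap.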
If you want more on the 'Agresti' and 'Bonferroni' methods, you can look
in a recent intermediate level applied statistics text or google the names.
Also, for more on Agresti CIs see this page.