1. Factors that threaten the validity of research findings
Material for this presentation has been taken from the seminal work by Don Campbell and Julian Stanley, "Experimental and quasi-experimental designs for research on teaching," which was first published as Chapter 5 in N. L. Gage (Ed.) (1963), Handbook of Research on Teaching.
2. Two classes of factors that jeopardize
the validity of research findings
• Factors concerned with internal validity.
– Do the research conditions warrant the
conclusions?
– Without internal validity results are
uninterpretable.
• Factors concerned with external validity.
– To what extent can the results be
generalized?
– To what populations, settings, treatment
variables, and measurement variables?
3. Factors affecting Internal Validity
Internal validity is threatened whenever there exists the possibility of uncontrolled extraneous variables that might otherwise account for the results of a study.
Eight classes of extraneous variables can be identified.
• History
• Maturation
• Testing
• Instrumentation
• Statistical regression
• Selection
• Research mortality
• Interactions with selection
4. History
Specific events, in addition to the
treatment, that occur between the first
and second measurement.
The longer the interval between the pretest and posttest, the more viable this threat becomes.
5. Maturation
Changes in physical, intellectual, or emotional characteristics that occur naturally over time and that influence the results of a research study.
In longitudinal studies, for instance, individuals grow older, become more sophisticated, and may become more set in their ways.
6. Testing
Also called “pretest sensitization,” this
refers to the effects of taking a test
upon performance on a second testing.
Merely having been exposed to the
pretest may influence performance on a
posttest.
Testing becomes a more viable threat to
internal validity as the time between
pretest and posttest is shortened.
7. Instrumentation
Changes in the way a test or other
measuring instrument is calibrated that
could account for results of a research
study (different forms of a test can
have different levels of difficulty).
This threat typically arises from
unreliability in the measuring
instrument.
It can also be present when observers or raters are used.
8. Statistical Regression
Occurs when individuals are selected for
an intervention or treatment on the
basis of extreme scores on a pretest.
Extreme scores are more likely to reflect
larger (positive or negative) errors in
measurement (chance factors).
Such extreme measurement errors are NOT likely to occur on a second testing, so the group's scores tend to move back toward the mean on retesting, even without any treatment.
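The regression effect can be illustrated with a small simulation. This is a minimal sketch under assumed values: each person has a fixed true score, each test adds independent normal measurement error, and the lowest-scoring 10% on the pretest are selected. No treatment is applied, so any rise in the selected group's mean on the posttest comes from measurement error alone. The function name `simulate_regression` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_regression(n=10_000, true_mean=50.0, true_sd=10.0,
                        error_sd=5.0, cutoff_fraction=0.10, seed=1):
    """Return (pretest mean, posttest mean) for the lowest pretest scorers.

    Hypothetical simulation: no treatment occurs between the two tests.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    # Observed score = true score + independent measurement error on each test
    pretest = [t + rng.gauss(0, error_sd) for t in true_scores]
    posttest = [t + rng.gauss(0, error_sd) for t in true_scores]
    # Select individuals on the basis of extreme (low) pretest scores
    k = int(n * cutoff_fraction)
    lowest = sorted(range(n), key=lambda i: pretest[i])[:k]
    pre_mean = statistics.mean(pretest[i] for i in lowest)
    post_mean = statistics.mean(posttest[i] for i in lowest)
    return pre_mean, post_mean


pre, post = simulate_regression()
print(f"selected group: pretest mean {pre:.1f}, posttest mean {post:.1f}")
```

With these settings the selected group's posttest mean should fall between its pretest mean and the overall mean of 50; the shift shrinks as `error_sd` is reduced, that is, as the test becomes more reliable.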
9. Differential Selection
This can occur when intact groups are
compared.
The groups may have been different to
begin with.
If three different classrooms are each
exposed to a different intervention, the
classroom performances may differ only
because the groups were different to begin
with.
10. Selection-Maturation Interaction
Occurs when differential selection is
confounded with maturational effects.
The treatment group might be composed of higher-aptitude students, or…
The treatment group might have more students who were born during the summer months.
11. Research Mortality
The differential loss of individuals from
treatment and/or comparison groups.
This is often a problem when research
participants are volunteers.
Volunteers may drop out of the study if they find it is consuming too much of their time.
Others may drop out if they find the task to be too arduous.
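Because mortality threatens validity when losses differ between groups, it helps to compute attrition separately for each group. A minimal sketch with hypothetical head counts; the helper name `attrition_rates` is illustrative and not from the source.

```python
def attrition_rates(enrolled, completed):
    """Proportion of participants lost from each group (hypothetical helper).

    enrolled and completed map a group name to a head count.
    """
    return {group: (enrolled[group] - completed[group]) / enrolled[group]
            for group in enrolled}


# Hypothetical counts for illustration only
rates = attrition_rates(enrolled={"treatment": 40, "comparison": 40},
                        completed={"treatment": 28, "comparison": 37})
differential = rates["treatment"] - rates["comparison"]
print(rates)                    # {'treatment': 0.3, 'comparison': 0.075}
print(round(differential, 3))   # 0.225
```

Here the treatment group loses 22.5 percentage points more of its members than the comparison group, so the groups that remain at posttest may no longer be comparable.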
12. Interaction of Selection with the Other
Factors Affecting Internal Validity
Occurs when intact groups, which may not
be equivalent, are selected to
participate in research interventions.
As in the earlier example, three different classrooms may be exposed to different treatments, but one of the classrooms might be composed of students having higher achievement trajectories.
13. External Validity
Concerned with whether the results of a study
can be generalized beyond the study itself:
1. Population validity (threatened when the sample does not adequately represent the population).
2. Personological validity (threatened when personal or psychological characteristics interact with the treatment).
3. Ecological validity (threatened when the situational characteristics of the study are not representative of the settings to which the results are to be generalized).
14. Factors affecting External Validity
External validity is threatened whenever conditions inherent in the research design are such that the generalizability of the results is limited.
Four classes of threats to external validity can be identified.
• Reactive or interactive effects of testing
• Interaction effect of selection bias and the intervention
• Reactive effects of treatment arrangements
• Multiple-treatment interference
15. Reactive effect of testing
Occurs whenever a pretest increases or
decreases the respondents’ sensitivity
to the treatment.
Studies involving self-report measures of
attitude and interest are very
susceptible to this threat.
16. Selection x Treatment
Interaction
This can occur when selected treatment
or comparison groups are more or less
sensitive to the treatment prior to
initiating the treatment (or
intervention).
Most likely to occur when the treatment
and comparison groups are not randomly
selected.
17. Reactive Effects of Experimental
Arrangements
These can occur when the conditions of
the study are such that the results are
not likely to be replicated in non-
experimental situations.
– Hawthorne effects
– John Henry effects
– Placebo effects
– Novelty effects
18. Multiple-treatment Interference
This is likely to occur whenever the same research participants are exposed to multiple treatments.
– Sequence effects
– Carry-over effects
19. Research Designs
We will examine the operative threats to
internal and external validity in twelve
specific types of research designs.
Some symbols to be used:
R = Random Assignment
X = Treatment Intervention
O = Observation or Measurement
20. Design 1: One-shot Case Study
This is a widely used research design in education.
– A single group receives a treatment or intervention.
– Following the treatment, individuals are measured on some outcome variable.
– It can be diagrammed as follows:
X O
21. Design 1:
One-shot Case Study, Continued
• This design is typical of a case study
• Inferences typically are based upon
expectations of what the results would have
been had X not occurred.
• These designs often are subject to the error
of misplaced precision, since they often
involve tedious collection of specific detail
and careful observations.
• The problem is that there usually are
numerous rival, plausible sources of effect on
the outcome other than X.
22. Design 2:
One-group Pretest-Posttest Design
This is also a widely used research design in education (see the diagram).
A pretest is given, followed by a treatment or
intervention, followed by a posttest.
– The difference between O1 and O2 is used to infer
an effect due to X.
– This design is subject to five of the eight threats to internal validity and one of the threats to external validity. Can you name them?
O1 X O2
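The inference in this design rests on a single number: the mean change from O1 to O2. A minimal sketch with hypothetical scores; the function name `mean_gain` is illustrative. Nothing in the calculation separates the effect of X from history, maturation, testing, instrumentation, or regression, which is exactly the weakness of this design.

```python
from statistics import mean


def mean_gain(pretest, posttest):
    """Mean of the individual O2 - O1 differences for a single group."""
    if len(pretest) != len(posttest):
        raise ValueError("each participant needs both a pretest and a posttest")
    return mean(o2 - o1 for o1, o2 in zip(pretest, posttest))


# Hypothetical scores for four participants
o1 = [10, 12, 14, 11]
o2 = [13, 15, 14, 14]
print(mean_gain(o1, o2))  # 2.25
```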
23. One-group Pretest-Posttest Design (Continued)
Threats to internal validity
1. History
Many change-producing events may have occurred between O1 and O2.
History becomes more viable the longer the interval between the pretest and posttest.
2. Maturation
During the time between O1 and O2 the individuals may
have grown older, wiser, more tired, more wary, or more
cynical.
3. Testing
The fact that the participants in the study were
exposed to a pretest may, by itself, influence
performance on the posttest.
24. One-group Pretest-Posttest Design (Continued)
Threats to internal validity (continued)
4. Instrumentation
If O1 and O2 are obtained from judges (or raters), for example, then the judges may become more skillful between the two sets of observations.
Standardized achievement tests might be re-normed between pretesting and posttesting.
5. Statistical regression
For example, if students are selected to participate in a remedial intervention because of extremely low scores on a pretest, they are very likely, as a group, to score higher upon receiving the same (or similar) test as a posttest.
This results mainly from errors in measurement (or unreliability in the tests).
25. Design 3:
Static-group Comparison
In this design (diagrammed below), a non-random treatment group is compared to a non-random comparison group.
Problems associated with this design stem from the fact that there is no way to substantiate that the treatment and comparison groups were equivalent to begin with.
X O1
O2
26. Static-group Comparison (Continued)
Threats to internal validity
1. Selection
Here, intact groups are being compared. It is possible that the treatment group was already prepared to do better (or worse) than the comparison group on O; hence the treatment group might have performed differently from the comparison group even in the absence of X.
2. Mortality
Differences between O1 and O2 may arise because the nature of the treatment causes participants to drop out at higher rates than participants in the comparison group.
27. Static-group Comparison (Continued)
Threats to internal validity (continued)
3. Interactive effects (e.g., selection and maturation).
It may be that one of the groups being
compared has a higher (or lower) achievement
trajectory (e.g., when a more advanced class is
compared with a lesser-advanced class).
The three designs discussed so far are usually
referred to as pre-experimental designs.
We will now turn to a consideration of three
true experimental designs.
28. True Experiments
• True experiments are characterized by
random assignment:
– Random assignment of individuals to
treatment conditions.
– Random assignment of treatment conditions to individuals.
• When comparison groups are large enough (usually, n > 20) and individuals are selected at random, then representativeness can be assumed.
29. Design 4.
Pretest-posttest Control Group Design
• Here, individuals are randomly assigned to one of two
groups: the treatment group and a comparison group.
• The treatment group receives the intervention.
• The groups are compared in terms of their mean difference (gain) scores:
$(M_{O_3} - M_{O_1})$ vs. $(M_{O_4} - M_{O_2})$, where $M$ denotes a group mean.
R O1 X O3
R O2 O4
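The comparison of mean gains can be written out directly. A minimal sketch with hypothetical scores; the function name `gain_difference` is illustrative, and a test of statistical significance would still be needed, which this sketch omits.

```python
from statistics import mean


def gain_difference(o1, o3, o2, o4):
    """Treatment-group mean gain minus control-group mean gain.

    o1, o3: pretest and posttest scores of the treatment group (R O1 X O3)
    o2, o4: pretest and posttest scores of the control group   (R O2   O4)
    """
    treatment_gain = mean(o3) - mean(o1)
    control_gain = mean(o4) - mean(o2)
    return treatment_gain - control_gain


# Hypothetical scores
print(gain_difference(o1=[50, 52, 48], o3=[60, 62, 58],
                      o2=[51, 49, 50], o4=[53, 51, 52]))  # 8
```

Because both groups experience the same history, maturation, and testing, those influences appear in both gains and cancel in the difference.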
30. Pretest-posttest Control Group Design (Continued)
• This design, and the next two true-
experimental designs, control for all eight of
the threats to internal validity.
• Any differences between groups that might
have existed prior to X are (assumed to be)
controlled through random assignment.
• Any effects due to history, maturation, testing, instrumentation, regression, and so on would be expected to occur with equal frequency in both groups.
31. Pretest-posttest Control Group Design (Continued)
Factors affecting external validity:
1. Interactions between the treatment and testing.
This occurs whenever the pretest sensitizes the treatment group to the effects of the treatment.
2. Interactions between the treatment and group selection.
This can happen when the population from which the treatment and comparison groups were sampled is not the same as the target population.
3. Reactive arrangements
Sometimes the setting for the study is artificially
restrictive. When this occurs generalizability suffers.
32. Design 5.
Solomon Four-group Design
This design enjoys several
advantages.
1. Both the main effect of
testing and the interaction of
testing and treatment are
testable.
2. There are multiple tests of
the effect of X:
O2 > O1; O2 > O4; O5 > O6; O5 > O3
R O1 X O2
R O3 O4
R X O5
R O6
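The advantages of the Solomon design can be made concrete by treating the four posttests as a 2 × 2 table (pretested or not, treated or not). A minimal sketch working from hypothetical group means; the function name `solomon_contrasts` is illustrative, and a full analysis would typically run a two-way analysis of variance on individual posttest scores, which this sketch does not attempt.

```python
def solomon_contrasts(o1, o2, o3, o4, o5, o6):
    """Contrasts among group means in a Solomon four-group design.

    Rows: R O1 X O2 / R O3 O4 / R X O5 / R O6 (all arguments are group means).
    """
    return {
        # The four tests of the effect of X
        "O2 > O1": o2 > o1,
        "O2 > O4": o2 > o4,
        "O5 > O6": o5 > o6,
        "O5 > O3": o5 > o3,
        # Posttests as a 2 x 2 table: pretested (O2, O4) vs. not (O5, O6),
        # treated (O2, O5) vs. not (O4, O6)
        "testing main effect": (o2 + o4) / 2 - (o5 + o6) / 2,
        "testing x treatment interaction": (o2 - o4) - (o5 - o6),
    }


# Hypothetical group means
results = solomon_contrasts(o1=50, o2=60, o3=50, o4=52, o5=57, o6=51)
for name, value in results.items():
    print(name, value)
```

A non-zero interaction means the effect of X differs depending on whether a pretest was given, which is the reactive effect of testing that the other designs cannot detect.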
33. Design 6:
Posttest-only Design
Pretests are not always necessary. Given random assignment of subjects to treatment conditions, we can assume that the groups were equivalent prior to the treatment intervention.
In this design, all the threats to internal validity are controlled for.
As far as external validity is concerned, we might still question whether there are reactive effects.
R X O1
R O2
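With only posttests, the analysis reduces to comparing O1 with O2. A minimal sketch using Welch's t statistic, which is one common choice that the source does not specify; the data and the function name `welch_t` are hypothetical, and a p-value would still have to be obtained from the t distribution.

```python
import math
from statistics import mean, variance


def welch_t(treatment, control):
    """Welch's t statistic for the difference between two posttest means."""
    # Standard error of the difference, without assuming equal variances
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    return (mean(treatment) - mean(control)) / se


# Hypothetical posttest scores: O1 (treatment) and O2 (control)
o1 = [12, 14, 16, 18]
o2 = [10, 12, 14, 16]
print(round(welch_t(o1, o2), 3))
```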
34. Design 8:
Non-equivalent Pretest-Posttest
The most widely used quasi-experimental design in educational research.
O1 X O2
______________________________
O3 O4
The pretests (O1 and O3) are used to determine whether the groups were equivalent before the onset of treatment, and to adjust for differences where necessary.
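Checking whether the groups were equivalent can start from a standardized difference between the pretest means, O1 and O3. A minimal sketch; the pooled-SD standardization is one common convention, and the 0.25 tolerance and both function names are hypothetical choices, not values from the source.

```python
import math
from statistics import mean, variance


def standardized_pretest_gap(o1, o3):
    """Difference between pretest means (O1 - O3) in pooled-SD units."""
    n1, n3 = len(o1), len(o3)
    pooled_var = ((n1 - 1) * variance(o1) + (n3 - 1) * variance(o3)) / (n1 + n3 - 2)
    return (mean(o1) - mean(o3)) / math.sqrt(pooled_var)


def groups_look_equivalent(o1, o3, tolerance=0.25):
    """True if the standardized pretest gap lies within +/- tolerance.

    The 0.25 default is a hypothetical cut-off chosen for illustration.
    """
    return abs(standardized_pretest_gap(o1, o3)) <= tolerance


# Hypothetical pretest scores for the two intact groups
o1 = [48, 50, 52, 54]   # treatment group pretest
o3 = [40, 42, 44, 46]   # comparison group pretest
gap = standardized_pretest_gap(o1, o3)
print(round(gap, 2), groups_look_equivalent(o1, o3))
```

A gap this large means the posttests cannot be compared directly; the analysis would need to adjust for the pretest difference (for example, through gain scores or analysis of covariance), and selection-related threats would still remain.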
39. Single (or few) Subject Designs
In certain types of situations, these designs are very appropriate:
When the target population is very small.
Particularly applicable to clinical settings.
When studying specific behaviors of unique
individuals.
Individuals serve as their own controls.
When we want to show that an intervention
can have an effect.
40. Requirements of Single-Subject
Designs
External validity is often difficult to establish.
Internal validity requires four things:
Repeated and reliable measurement.
Valid and reliable measuring instruments (or techniques).
Baseline stability.
Single variable rule (manipulate only one variable at a time).
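Baseline stability is often judged by eye, but a simple numerical check can make the judgment explicit. A minimal sketch assuming one rule of thumb: the baseline counts as stable if at least 80% of the points fall within 20% of the baseline median. Both percentages and the function name `baseline_is_stable` are assumptions for illustration, not criteria stated in the source.

```python
from statistics import median


def baseline_is_stable(observations, band=0.20, required=0.80):
    """Return True if enough baseline points fall within +/- band of the median.

    band and required are assumed rule-of-thumb values, not fixed standards.
    The percentage band is not meaningful when the median is near zero.
    """
    centre = median(observations)
    low, high = centre * (1 - band), centre * (1 + band)
    within = sum(low <= x <= high for x in observations)
    return within / len(observations) >= required


print(baseline_is_stable([10, 11, 9, 10]))   # True
print(baseline_is_stable([4, 10, 16, 10]))   # False
```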
41. Design 8:
A-B-A Withdrawal Design
This design involves alternating phases of
baseline observation and treatment
intervention, X:
O O O O | X O X O X O
Baseline phase | Treatment phase
During the treatment phase the
intervention is turned on and off.
42. Design 9:
A-B-A Single Subject Design
O O O O | X X X X | O O O O
Baseline phase | Treatment phase | Post-treatment phase
One problem with this design is that it is
sometimes considered unethical to
discontinue treatment when the
treatment has been shown to be
effective.
43. Design 10:
A-B-A-B Single Subject Design
O O O O | X X X X | O O O O | X X X X
Baseline | Treatment | Baseline | Treatment
The advantage is that it leaves an
effective treatment in place.
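A common first look at A-B-A-B data is to summarize each phase and check that the behavior changes each time the treatment is introduced or withdrawn. A minimal sketch with hypothetical data; the phase labels A1, B1, A2, B2 and the `reversal_pattern` rule are assumptions, and visual inspection of trend and variability within each phase would still be needed.

```python
from statistics import mean


def phase_means(phases):
    """Mean of the observations in each phase, keyed by phase label."""
    return {label: mean(scores) for label, scores in phases}


def reversal_pattern(means):
    """True if the behavior rises with each treatment phase and falls on withdrawal.

    Assumes the treatment is meant to increase the target behavior.
    """
    a1, b1, a2, b2 = means["A1"], means["B1"], means["A2"], means["B2"]
    return b1 > a1 and a2 < b1 and b2 > a2


# Hypothetical observations for one participant
phases = [("A1", [2, 3, 2, 3]), ("B1", [7, 8, 8, 9]),
          ("A2", [3, 4, 3, 2]), ("B2", [8, 9, 9, 8])]
means = phase_means(phases)
print(means, reversal_pattern(means))
```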
44. Other Single-Subject Designs
There is a wide variety of single-subject designs:
Multiple baseline designs.
Alternating treatment designs.
Increasing/decreasing treatment
intervention designs.
Replicated single-subject designs.