Internal vs. External Validity In Psychology

Internal validity centers on demonstrating clear causal relationships within the bounds of a specific study, whereas external validity concerns how well the findings apply beyond the original study situation or population.

Researchers have to weigh these considerations in designing methodologically rigorous and generalizable studies.

| | Internal Validity | External Validity |
|---|---|---|
| Definition | Whether conclusions about cause-and-effect relationships within a study are valid | The extent to which study results apply to contexts beyond the original study |
| Main Concern | Were the observed effects really caused by the independent variable, or did flaws in the study design or conduct lead to that result? | Can results be expected to apply to other settings, populations, and times? |
| Key Factors | Randomization, control conditions, elimination of confounding variables | Having a sample representative of the population of interest, testing variability in contexts |
| Example Threats | Selection bias, attrition, history effects | Interaction effects of setting and treatment, limited participant sample |
| How to Improve | Use control groups, randomization, and blinding; account for confounders | Draw from heterogeneous, more representative samples; replicate across a range of contexts |
| Balance Consideration | Controlling internal validity often means a more artificial research context | Broader generalizability requires flexible, real-world applicable paradigms |

Internal validity relates to how well a study is conducted (its experimental design and methods), while external validity relates to how applicable and generalizable the findings are to the world at large.

Internal Validity 

Internal validity refers to the degree of confidence that the causal relationship being tested exists and is trustworthy.

It reflects how likely it is that your treatment caused the differences in results that you observe. Internal validity is largely determined by the study’s experimental design and methods.

Studies with a high degree of internal validity provide strong evidence of causality because they make it possible to eliminate alternative explanations for a finding.

Studies with low internal validity provide weak evidence of causality. The less opportunity there is for confounding or extraneous variables to influence the results, the higher the internal validity and the more confident we can be in our findings.

To infer cause and effect in a research study, three criteria must be met: the cause must precede the effect in time, the cause and effect must vary together, and there must be no other plausible explanation for the relationship observed. If these three criteria are met, you can be confident that a study is internally valid.

Example

Suppose you want to run an experiment to see whether a particular weight-loss pill helps people lose weight. A study with high internal validity might be designed as follows.

To test this hypothesis, you would randomly assign a sample of participants to one of two groups: those who will take the weight-loss pill and those who will take a placebo pill.

You can reduce bias in how participants are treated and assessed by blinding the research assistants, so they don’t know which participants are in which groups during the experiment. The participants are also blinded, so they do not know whether they are receiving the intervention or not.

If participants drop out of the study, their characteristics are examined to ensure there is no systematic bias regarding who left.

It is important to have a well-thought-out research procedure to mitigate the threats to internal validity.

External Validity

External validity refers to the extent to which the results of a research study can be applied or generalized to another context.

This is important because if external validity is established, the study’s findings can be generalized to a larger population rather than only to the relatively few subjects who participated in the study. Unlike internal validity, external validity doesn’t assess causality or rule out confounders.

There are two types of external validity: ecological validity and population validity.

  • Ecological validity refers to whether a study’s findings can be generalized to other situations or settings. A high ecological validity means that there is a high degree of similarity between the experimental setting and another setting, and thus we can be confident that the results will generalize to that other setting.
  • Population validity refers to how well the experimental sample represents other populations or groups. Using random sampling techniques, such as stratified sampling or cluster sampling, significantly helps increase population validity. 
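To make the sampling idea concrete, here is a minimal Python sketch of proportionate stratified sampling. The population, the `age_band` field, and the function name `stratified_sample` are all hypothetical and chosen only for illustration; a real study would draw from a proper sampling frame.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction of people at random from every stratum."""
    rng = random.Random(seed)

    # Group the population by stratum (for example, by age band).
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)

    # Sample each stratum separately so its share of the sample
    # matches its share of the population.
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 600 people aged 18-29 and 400 aged 30 or older.
population = (
    [{"id": i, "age_band": "18-29"} for i in range(600)]
    + [{"id": i, "age_band": "30+"} for i in range(600, 1000)]
)

sample = stratified_sample(population, lambda p: p["age_band"], fraction=0.1, seed=1)
counts = defaultdict(int)
for person in sample:
    counts[person["age_band"]] += 1
print(dict(counts))  # {'18-29': 60, '30+': 40}
```

Because each stratum is sampled at the same rate, the sample keeps the population's 60/40 split regardless of which individuals happen to be drawn.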

Example

As an example of a study with high external validity, suppose you hypothesize that practicing mindfulness two times per week will improve the mental health of people diagnosed with depression.

You recruit people who have been diagnosed with depression for at least a year and are between 18 and 29 years old. Clearly defining the population of interest and recruiting a sample that represents it helps support external validity.

You give participants a pre-test and a post-test measuring how often they experienced symptoms of depression in the past week.

During the study, all participants are given individual mindfulness training and asked to practice mindfulness daily for 15 minutes as part of their morning routine.

To strengthen external validity further, you can replicate the study using different methods of mindfulness or different samples of participants.

Trade-off Between Internal and External Validity

There tends to be a negative correlation between internal and external validity in experimental research. This means that experiments that have high internal validity will likely have low external validity and vice versa. 

This happens because experimental conditions that produce higher degrees of internal validity (e.g., artificial labs) are unlikely to match real-world conditions. So, the external validity is weaker because a lab environment is very different from the real world.

On the other hand, to produce higher degrees of external validity, you want experimental conditions that match a real-world setting (e.g., observational studies).

However, this comes at the expense of internal validity because these types of studies increase the likelihood of confounding variables and alternative explanations for differences in outcomes. 

A solution to this trade-off is replication! You want to conduct the research in multiple environments and settings: first in a controlled, artificial environment to establish the existence of a causal relationship, and then in a “real-world” setting to test whether the results generalize.

Threats to Internal Validity

Attrition

Attrition refers to the loss of study participants over time. Participants might drop out or leave the study, which means that the results are based only on the people who chose to stay, and that remaining sample may be biased.

Differential rates of attrition between treatment and control groups can skew results by affecting the relationship between your independent and dependent variables and thus affect the internal validity of a study. 
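A quick way to check for differential attrition is to compare dropout rates across groups. The sketch below is illustrative only: the function name `attrition_rate` and all of the counts are made up.

```python
def attrition_rate(enrolled, completed):
    """Return the proportion of enrolled participants who did not finish."""
    if enrolled <= 0 or not 0 <= completed <= enrolled:
        raise ValueError("need enrolled > 0 and 0 <= completed <= enrolled")
    return (enrolled - completed) / enrolled


# Made-up counts for illustration only.
treatment_rate = attrition_rate(enrolled=50, completed=35)
control_rate = attrition_rate(enrolled=50, completed=45)
differential = treatment_rate - control_rate

print(f"treatment: {treatment_rate:.0%}")              # treatment: 30%
print(f"control: {control_rate:.0%}")                  # control: 10%
print(f"differential attrition: {differential:.0%}")   # differential attrition: 20%
```

A large gap between the two rates suggests that the people remaining in each group may no longer be comparable, so any difference in outcomes is harder to attribute to the treatment.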

Confounders

A confounding variable is a third variable, often unmeasured, that influences both the independent and the dependent variable. It “confounds” the relationship between them by producing a spurious correlation, one that looks causal but is not.

Confounders are threats to internal validity because you can’t tell whether the predicted independent variable causes the outcome or if the confounding variable causes it.
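The small simulation below sketches how a confounder can create a spurious correlation. Everything in it is assumed for illustration: `z` is a hypothetical binary confounder that raises both `x` and `y`, while `x` has no effect on `y` at all.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)

# z is the confounder (0 or 1). It raises both x and y,
# but x has no effect on y anywhere in this simulation.
z = [rng.randint(0, 1) for _ in range(4000)]
x = [2 * zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]

overall_r = pearson_r(x, y)

# Hold the confounder fixed: look only at cases where z == 1.
x_fixed = [xi for xi, zi in zip(x, z) if zi == 1]
y_fixed = [yi for yi, zi in zip(y, z) if zi == 1]
within_r = pearson_r(x_fixed, y_fixed)

print(f"overall r = {overall_r:.2f}")
print(f"r with z held fixed = {within_r:.2f}")
```

Across the whole sample, `x` and `y` appear clearly related. Once `z` is held constant, the correlation should drop to roughly zero, which shows that the apparent relationship came from the confounder.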

Participant Selection Bias

This is a bias that may result from the selection or assignment of study groups in such a way that proper randomization is not achieved.

If participants are not randomly assigned to groups, the groups might differ systematically before the study even begins. For example, participants who differ in motivation, willingness to take part in the study, or demographics might end up concentrated in one group, so any difference in outcomes cannot be attributed to the treatment alone.

Experimenter Bias

Experimenter bias occurs when an experimenter behaves differently with different groups in a study, impacting the results and threatening internal validity. This can be reduced through blinding.

Social Interaction (Diffusion)

Diffusion refers to when the treatment in research spreads within or between treatment and control groups. This can happen when there is interaction or observation among the groups.

Diffusion poses a threat to internal validity because it can lead to resentful demoralization. This is when members of the control group become less motivated because they resent the group they were assigned to.

Historical Events

Historical events might influence the outcome of studies that occur over longer periods of time. For example, changes in political leadership, natural disasters, or other unanticipated events might change the conditions of the study and influence the outcomes.

Instrumentation

Instrumentation refers to any change in the dependent variable in a study that arises from changes in the measuring instrument used. This can happen, for example, when different measures are used in the pre-test and post-test phases.

Maturation

Maturation refers to the impact of time on a study. If the outcomes of the study vary as a natural result of time, it might not be possible to determine whether the effects seen in the study were due to the study treatment or simply due to the impact of time. 

Statistical Regression

Regression to the mean refers to the fact that if one sample of a random variable is extreme, the next sample of the same random variable is likely to be closer to its mean.

This is a threat to internal validity because participants selected for their extreme scores will tend to score closer to the average when measured again, regardless of any intervention. Such a shift can be mistaken for a treatment effect.
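Regression to the mean is easy to see in a simulation. The sketch below assumes a simple, hypothetical model in which each observed score is a stable true ability plus random measurement noise, and no intervention takes place between the two tests.

```python
import random

rng = random.Random(7)

# Each observed score = stable true ability + random measurement noise.
abilities = [rng.gauss(100, 10) for _ in range(5000)]
first_test = [a + rng.gauss(0, 10) for a in abilities]
second_test = [a + rng.gauss(0, 10) for a in abilities]

# Select the 500 people with the highest scores on the first test.
top = sorted(range(5000), key=lambda i: first_test[i], reverse=True)[:500]

mean_first = sum(first_test[i] for i in top) / len(top)
mean_second = sum(second_test[i] for i in top) / len(top)

print(f"top group, first test: {mean_first:.1f}")
print(f"top group, second test: {mean_second:.1f}")
```

The top scorers should average noticeably lower on the second test, although still above the overall mean of 100, even though nothing was done to them. A study that selects extreme scorers and lacks a control group could misread this drop as a treatment effect.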

Repeated Testing

Testing your research participants repeatedly with the same measures will influence your research findings because participants will become more accustomed to the testing. Due to familiarity, or awareness of the study’s purpose, many participants might achieve better results over time.

Threats to External Validity 

Sample Features

If some feature(s) of the sample used were responsible for the effect, this could lead to limited generalizability of the findings.

Historical Events

Historical events might influence the outcome of studies that occur over longer periods of time. For example, changes in political leadership, natural disasters, or other unanticipated events might change the conditions of the study and influence the outcomes.

Participant Selection Bias

This is a bias that may result from selecting participants in such a way that proper randomization is not achieved. If participants are not randomly selected, the sample obtained might not be representative of the population intended to be studied.

For example, some members of a population might be less likely to be included than others due to motivation, willingness to take part in the study, or demographics. 

Situational Factors

Factors such as the setting, time of day, location, researchers’ characteristics, noise, or the number of measures might affect the generalizability of the findings.

Repeated Testing

Testing your research participants repeatedly with the same measures will influence your research findings because participants will become more accustomed to the testing. Due to familiarity, or awareness of the study’s purpose, many participants might achieve better results over time.

Aptitude-Treatment Interaction

Aptitude-treatment interaction refers to the concept that some treatments are more or less effective for particular individuals depending upon their specific abilities or characteristics.

Hawthorne Effect

The Hawthorne Effect refers to the tendency for participants to change their behaviors simply because they know they are being studied.

Experimenter Effect

Experimenter bias occurs when an experimenter behaves in a different way with different groups in a study, impacting the results and threatening the external validity.

John Henry Effect

The John Henry Effect refers to the tendency for participants in a control group to actively work harder because they know they are in an experiment and want to overcome the “disadvantage” of being in the control group.

Factors that Improve Internal Validity

Blinding

Blinding refers to a practice where the participants (and sometimes the researchers) are unaware of which intervention each participant is receiving.

This reduces the influence of extraneous factors and minimizes bias, as any differences in outcome can thus be linked to the intervention and not to the participant’s knowledge of whether they were receiving a new treatment or not. 

Random Sampling

Using random sampling to obtain a sample that represents the population you wish to study reduces systematic bias in who enters the study, although its main benefit is to population validity.

Random Assignment

Using random assignment to place participants in control and treatment groups helps ensure that the groups do not differ systematically at the start of the study.
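As a minimal sketch, random assignment can be done by shuffling the participant list and splitting it in half. The function name `randomly_assign` and the participant IDs below are hypothetical.

```python
import random


def randomly_assign(participant_ids, seed=None):
    """Shuffle participants and split them into two groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {"treatment": shuffled[:midpoint], "control": shuffled[midpoint:]}


# Hypothetical study with 20 participants, identified by the numbers 1 to 20.
groups = randomly_assign(range(1, 21), seed=42)
print(len(groups["treatment"]), len(groups["control"]))  # 10 10
```

Fixing the seed makes the allocation reproducible for auditing. Because chance alone decides group membership, neither the researcher nor the participant can influence who ends up in which group.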

Strict Study Protocol

Highly controlled experiments tend to improve internal validity. Experiments that occur in lab settings tend to have higher internal validity because the controlled environment reduces variability from sources other than the treatment.

Experimental Manipulation

Manipulating an independent variable in a study as opposed to just observing an association without conducting an intervention improves internal validity. 

Factors that Improve External Validity

Replication

Conducting a study more than once with a different sample or in a different setting to see if the results will replicate can help improve external validity.

If multiple studies have been conducted on the same topic, a meta-analysis can be used to determine if the effect of an independent variable can be replicated, thus making it more reliable.

Replication is the strongest method to counter threats to external validity by enhancing generalizability to other settings, populations, and conditions.
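To illustrate how a meta-analysis combines replications, here is a sketch of one common approach, fixed-effect inverse-variance pooling. The choice of a fixed-effect model is an assumption made for simplicity (random-effects models are often preferred when studies differ in setting or population), and the effect sizes and standard errors are made up.

```python
import math


def fixed_effect_pool(effects, standard_errors):
    """Inverse-variance weighted average of study effect sizes."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Made-up effect sizes and standard errors from three hypothetical studies.
effects = [0.30, 0.50, 0.40]
standard_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = fixed_effect_pool(effects, standard_errors)
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")
# pooled effect = 0.367, standard error = 0.067
```

The pooled standard error is smaller than that of any single study, which is one reason a consistent effect across several replications is more convincing than one result on its own.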

Field Experiments

Conducting a study outside the laboratory, in a natural, real-world setting, will improve external validity. However, this can threaten internal validity.

Probability Sampling

Using probability sampling will counter selection bias by making sure everyone in a population has a known, non-zero chance of being selected for a study sample (an equal chance, in the case of simple random sampling).

Recalibration

Recalibration is the use of statistical methods to maintain accuracy, standardization, and repeatability in measurements to assure reliable results.

Reweighting groups when a study's sample is uneven on a particular characteristic (such as age) is an example of recalibration.
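One simple form of reweighting is post-stratification, where each group is weighted by its population share divided by its sample share. The sketch below uses that technique as an assumption about what reweighting might look like in practice. The group labels, counts, population shares, and outcome means are all made up.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight each group by (population share) / (sample share)."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


# Made-up numbers: the sample over-represents younger participants.
sample_counts = {"18-29": 70, "30+": 30}
population_shares = {"18-29": 0.5, "30+": 0.5}
group_means = {"18-29": 10.0, "30+": 20.0}  # hypothetical outcome means

weights = poststratification_weights(sample_counts, population_shares)

unweighted = sum(sample_counts[g] * group_means[g] for g in sample_counts) / sum(
    sample_counts.values()
)
weighted = sum(
    weights[g] * sample_counts[g] * group_means[g] for g in sample_counts
) / sum(weights[g] * sample_counts[g] for g in sample_counts)

print(f"unweighted mean = {unweighted:.1f}")  # unweighted mean = 13.0
print(f"weighted mean = {weighted:.1f}")      # weighted mean = 15.0
```

The unweighted mean is pulled toward the over-sampled younger group, while the weighted mean reflects the 50/50 split assumed for the population.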

Inclusion and Exclusion Criteria

Setting criteria as to who can be involved in the research and who cannot be involved will ensure that the population being studied is clearly defined and that the sample is representative of the population.

Psychological Realism

Psychological realism refers to the process of making sure participants perceive the experimental manipulations as real events so as to not reveal the purpose of the study and so participants don’t behave differently than they would in real life based on knowing the study’s goal.


Saul Mcleod, PhD

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Educator, Researcher

Saul Mcleod, Ph.D., is a qualified psychology teacher with over 18 years experience of working in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Julia Simkus

Editor at Simply Psychology

BA (Hons) Psychology, Princeton University

Julia Simkus is a graduate of Princeton University with a Bachelor of Arts in Psychology. She is currently studying for a Master's Degree in Counseling for Mental Health and Wellness in September 2023. Julia's research has been published in peer reviewed journals.