After the planned hypothesis testing for an experiment is finished, exploratory data analysis can look for patterns in these data that may have been missed by the original hypothesis tests.
Successful exploratory analyses help the researcher revise theories and modify existing experiments, or design novel ones, with focussed hypothesis tests. A second use of exploratory data analysis is in diagnostics for hypothesis tests.
This posting describes the difference between Exploratory Data Analysis (EDA) and Confirmatory Data Analysis (CDA), a distinction drawn by Tukey (1977). Confirmatory Data Analysis tests hypotheses and produces estimates with a specified precision. Regression analysis, analysis of variance, and hypothesis tests are examples of Confirmatory Data Analysis. Confirmatory Data Analysis requires hypotheses or assumptions to consider and evaluate.
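As a rough illustration of what "tests hypotheses and produces estimates with a specified precision" can look like in practice, the sketch below runs a two-proportion z-test and reports a 95% confidence interval for the difference in defect rates between two process settings. The counts, the function name `two_proportion_z`, and the choice of a z-test are assumptions made for illustration; they are not taken from Tukey or from any analysis described in this posting.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2, plus a 95% CI for p1 - p2.

    x1, x2 are defect counts; n1, n2 are the numbers of units inspected.
    Uses the normal approximation, so it assumes reasonably large counts.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion is used for the test statistic under H0.
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Unpooled standard error is used for the confidence interval.
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    ci = (diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled)
    return z, p_value, ci

# Hypothetical counts: 46 defects in 200,000 units before a change,
# 8 defects in 200,000 units after it.
z, p_value, ci = two_proportion_z(46, 200_000, 8, 200_000)
print(f"z = {z:.2f}, p-value = {p_value:.2g}")
print(f"95% CI for the difference: ({ci[0]:.6f}, {ci[1]:.6f})")
```

The hypothesis (the two settings have the same defect rate) has to be stated before the test is run, which is the sense in which confirmatory analysis requires hypotheses to evaluate.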
Exploratory Data Analysis makes few assumptions, and its purpose is to suggest hypotheses and assumptions. Consider the OEM described in the posting on 1/30/2008.
The company was experiencing customer complaints, and a team wanted to identify and remove their causes. The team asked customers for usage data so that it could calculate defect rates. This started an Exploratory Data Analysis. The team plotted control charts, which identified a high defect rate in October 1991. The investigation established that a supplier had used the wrong raw material. Discussions with the supplier and team members motivated further analysis of the raw material and its composition. This decision to analyze the raw material completed the Exploratory Data Analysis, which had drawn on both data analysis and the process knowledge possessed by team members.
The supplier and company then conducted a series of designed experiments, which identified an improved raw material composition. Using this composition, the defect rate improved from 0.023% to 0.004%. The experimental design and its analysis were Confirmatory Data Analysis. Note that the experimental design required a hypothesis generated by the Exploratory Data Analysis.
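The control-chart step in this example can be sketched as a p-chart, a common chart for defect proportions. The posting does not say which type of control chart the team used, so the p-chart, the 3-sigma limits, the function name `p_chart`, and all of the monthly figures below are assumptions chosen only to show how one month can stand out from the rest.

```python
import math

def p_chart(defects, units):
    """Compute p-chart statistics for per-period defect counts.

    Returns the center line (overall defect proportion) and, for each
    period, a tuple (proportion, lower limit, upper limit, out_of_control).
    """
    p_bar = sum(defects) / sum(units)
    rows = []
    for d, n in zip(defects, units):
        # 3-sigma limits based on the binomial standard error for this period.
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - 3 * sigma)
        ucl = p_bar + 3 * sigma
        p = d / n
        rows.append((p, lcl, ucl, p < lcl or p > ucl))
    return p_bar, rows

# Hypothetical monthly data (not the figures from the case study).
months = ["Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
units_in_use = [100_000] * 6
defects_reported = [20, 25, 22, 60, 24, 21]

center, rows = p_chart(defects_reported, units_in_use)
flagged = [m for m, row in zip(months, rows) if row[3]]

for m, (p, lcl, ucl, out) in zip(months, rows):
    marker = "  <-- outside limits" if out else ""
    print(f"{m}: p = {p:.4%}, limits = ({lcl:.4%}, {ucl:.4%}){marker}")
print("Months to investigate:", flagged)
```

A chart like this only points to when something changed. Finding out why, for instance that a supplier used the wrong raw material, still depends on the process knowledge of the people involved.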
Tukey states that EDA is detective work. He uses the criminal justice process as an analogue to illustrate the roles of EDA and CDA. A detective investigating a crime needs both tools and understanding. The detectives and other investigative units search for and produce evidence. The juries and judges evaluate the evidence’s strength. Exploratory Data Analysis uncovers statements or hypotheses for Confirmatory Data Analysis to consider. Experimental design and regression modeling are more effective if Exploratory Data Analysis uncovers precise statements or hypotheses. Admittedly, one can conduct experiments searching for hypotheses; however, our viewpoint is that preliminary Exploratory Data Analyses may reduce the costs of these experiments.
Hypothesis testing takes the next step from a scientific theory that has already withstood the rigors of examination. Exploratory research, meanwhile, examines unknown areas with little or no established theory to back it, so it is perceived as a riskier bet.
The article offers two explanations for why the NIH prefers hypothesis testing: the research is driven by best practices (how to do and test science), and it is easy for peer reviewers to separate good science from bad based on the research methods.
Hypothesis-driven research is based on scientific theories, while exploration is based on a search for discovery backed by few theories or none at all.
But after the hypothesis test either does or doesn't reject a null hypothesis, where does the idea for the next experiment come from? Exploratory data analysis completes this research cycle by helping to form new theories and revise existing ones.
Figure 1. A diagram showing processes underlying the age-based hypothesis of . See the blog post text for further explanation. From ““. By Karl Herrup. Volume 30, Number 50, December 15, 2010.