New Method for Reducing Subjectivity in Analysis of Observational Data: Interventional Probability of Causation

published June 26, 2024

The Center for Truth in Science is excited to announce the publication of the first article from an important project that began in 2020. Featured in this month’s Critical Reviews in Toxicology, the paper, authored by Drs. Tony Cox, Kenneth Mundt, and William Thompson, is entitled “Interventional Probability of Causation with Epidemiologic and Partial Mechanistic Evidence: Benzene vs. Formaldehyde and Acute Myeloid Leukemia (AML).”

While the primary audiences for this publication are researchers and regulators who work in health and environmental policy, the Center believes the novel methodology described and illustrated in this case study can be useful in many situations where causation must be determined from observational data. In this worked example, the authors use the method to answer whether benzene in one instance, and inhaled formaldehyde in another, causes acute myeloid leukemia (AML) in humans. Because benzene exposure is well established to increase the risk of AML, it serves as a positive control against which to compare the answer to the more controversial question of whether formaldehyde exposure causes AML.

Interventional Probability of Causation (IPoC), as the project and method are named, might be best understood through a brief look at the history of epidemiology and how it has been used to identify the causes of diseases in populations when controlled experiments in humans can’t be conducted for ethical or practical reasons. This new tool has the potential to shift debates from what is fair in law and efficient in regulation toward how probable it is that a decision achieves a fair outcome, or one whose likely benefits exceed its costs.

A Look Back in History

Scientific methods to understand the spread of disease go back to ancient Greece, when Hippocrates first posited that environmental and other factors, such as behaviors, might influence the development and spread of disease. Since then, the field of epidemiology has made countless critical contributions to human health. In the case of infectious diseases, discovering a common source of exposure can often lead to quick prevention and eradication by eliminating that source. The famous example is John Snow, considered the father of epidemiology, who traced cholera outbreaks to a water pump in London in the mid-1800s.

Causal Inference in Modern Times

But is determining the source (or cause) of harm and getting rid of it always this simple? Unfortunately, no. Consider the thousands, perhaps millions, of chemical and biological exposures humans and animals encounter every day. Consider, also, the variability among animal species and especially the differences among human beings in genetics, environment, personal history, and a host of other factors, known to researchers as confounders. Current wisdom links smoking cigarettes to lung cancer and a variety of other serious diseases and harms, but not everyone who smokes will succumb. It is thought that individuals who do not experience these harms have genetic differences that allow their bodies to repair DNA damage more efficiently and to detoxify cigarette smoke. Smoking is well understood to cause cancer in far too many smokers, but it does not do so in every smoker. This begins to show some of the complexities that arise when trying to determine cause and effect for health impacts from environmental or behavioral exposures.

Consider the case of exposure to the chemical formaldehyde, which is found in many everyday items such as wood, particleboard, and laminate products (e.g., kitchen cabinets and flooring) and cosmetics, and which is emitted by gas stoves and tobacco products. The human body also produces formaldehyde on its own (endogenously). Over the time that formaldehyde has been used in manufacturing and as a preservative, much has been learned about its potential harms. Has it been found to be a cause of cancers? Yes, with caveats. It is currently listed by the U.S. Environmental Protection Agency (EPA) as a probable human carcinogen (Group B1), but the science is undergoing new analysis to determine whether that classification should be changed. The level of exposure and the route of exposure also matter. The EPA currently states on its website:

Exposure to formaldehyde may occur by breathing contaminated indoor air, tobacco smoke, or ambient urban air. Acute (short-term) and chronic (long-term) inhalation exposure to formaldehyde in humans can result in respiratory symptoms, and eye, nose, and throat irritation. Limited human studies have reported an association between formaldehyde exposure and lung and nasopharyngeal cancer. Animal inhalation studies have reported an increased incidence of nasal squamous cell cancer.

If an individual who works in a job or profession where there is exposure to formaldehyde (e.g., an embalmer) comes down with cancer, is it a given that the exposure caused the cancer? It would depend on the type of cancer (e.g., lung cancer), the level of exposure, and how formaldehyde entered the body (inhaled, through skin, etc.). For inhalation, the Agency for Toxic Substances and Disease Registry (ATSDR) has established a minimal level of exposure (0.003 ppm) below which noncancer health effects are not expected to occur.

For lung and nasopharyngeal cancer from exposure to formaldehyde, EPA’s website says that there are “statistically significant associations between exposure to formaldehyde and increased incidence of lung and nasopharyngeal cancer.” It goes on to say that it cannot rule out that the statistical association is due to “chance, bias, or confounding.” Thus, EPA considers this evidence “limited.”

However, there are other cancers of concern with formaldehyde, such as leukemias (cancers of the blood and bone marrow). Two newly published scientific studies (Cox et al., 2024, see above; Vincent et al., 2024, https://doi.org/10.1093/toxsci/kfae039) have shown that formaldehyde does not cause blood cancers such as leukemia because, when it is inhaled, no mechanism (or way) for it to enter the bloodstream and bone marrow has been found.

So, in the case of one type of cancer, we have statistically significant associations that may not be causal because of chance, bias, or confounding. In the other case, there is no known biological mechanism by which the exposure could cause the cancer.

One can imagine how the complexities of our daily exposures multiply with more complex examples or combinations of exposures. It is important to keep in mind that cancer is not a single disease, and that establishing causation of cancer in one organ or tissue type within an organ does not translate into a finding that the agent can cause cancer in all organs or tissues.

All of this is to illustrate the intricacies involved in inferring from available evidence whether exposure to something in the environment causes a particular harm to humans. How can it be proven from observations whether an exposure actually causes harm, or was only associated with it? These are the complex questions that public health practitioners, regulators, and the legal system confront daily. Making determinations of a product’s safety is essential to protect consumers and the environment. But short-sighted decisions based on inadequate knowledge or outdated research techniques can upend human health, disrupt the economy, and impact the livelihoods of workers (in the case where a substance or product is banned, and manufacturing is shut down). While epidemiology is important, and the observational studies it relies on contribute to our knowledge when trying to infer causality of harm, they are not enough to accurately demonstrate cause in many cases, especially in individuals.

Therefore, the Center decided in 2021 to invest in the development of advanced analytical methods that better determine causality from observational data. The field of epidemiology has been working on this problem since the early days of understanding the spread of infectious diseases in the mid-to-late 1800s, and methods such as the Bradford Hill Criteria have been employed more often in recent times. However, many scientists feel that more can be done, especially using techniques that come from ongoing advances in mathematics, economics, statistics, and computer science (artificial intelligence).

In The Book of Why, Professor Judea Pearl introduces ideas in causal inference that improve on ways to work with and think about the data that come from observational studies, and how they might be helpful when trying to determine cause and effect. He illustrates this with a three-rung ladder, the Ladder of Causation, on which association (observation) is the bottom rung. He uses questions to illustrate: knowing that a person has a headache does not tell you it was caused by stress; it could be an infection, hormones, or medications. The next rung up is deliberate intervention. What if John Snow had erroneously banned eating raw oysters, a common cause of cholera? On the top rung, Pearl places counterfactuals, where imagination, retrospection, and understanding are used to think about what would have happened if things had been done differently: What would happen if people continued to get their drinking water out of the offending pump? Even if they had stopped eating raw oysters, cholera would have continued. [Pearl and Mackenzie, 2018, p. 28 (Figure 1.2)]
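The gap between the bottom two rungs can be made concrete with a small simulation. The sketch below is only an illustration under invented assumptions, not an analysis from the paper or from Pearl's book: a hidden factor `z` raises both the chance of exposure and the chance of disease, while the exposure itself has no effect at all. All probabilities and names (`simulate`, `z`, `x`, `y`) are hypothetical.

```python
import random

def simulate(n, intervene, seed=0):
    """Return risk(exposed) - risk(unexposed) in a toy population.

    Assumption: the exposure x has NO causal effect on the outcome y;
    a hidden common cause z drives both of them.
    """
    rng = random.Random(seed)
    counts = {0: [0, 0], 1: [0, 0]}  # exposure -> [people, cases]
    for _ in range(n):
        z = rng.random() < 0.5                      # hidden common cause
        if intervene:
            x = rng.random() < 0.5                  # rung 2: exposure set by coin flip
        else:
            x = rng.random() < (0.8 if z else 0.2)  # rung 1: z drives exposure
        y = rng.random() < (0.30 if z else 0.10)    # outcome depends on z only
        counts[int(x)][0] += 1
        counts[int(x)][1] += int(y)
    risk = {k: cases / people for k, (people, cases) in counts.items()}
    return risk[1] - risk[0]

observed_gap = simulate(200_000, intervene=False)
interventional_gap = simulate(200_000, intervene=True)
print(f"observed risk difference:       {observed_gap:.3f}")
print(f"interventional risk difference: {interventional_gap:.3f}")
```

In this toy setup the observed data show a clearly higher risk among the exposed, yet assigning exposure independently of the hidden factor makes the difference shrink to roughly zero. That is the sense in which an association on the bottom rung does not answer a question on the intervention rung.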

The Center has worked for the past three years with a committee made up of experienced risk analysts and epidemiologists to explore and test methods that are on the third rung of Pearl’s Ladder of Causation. The recently published paper describes the methods they have developed and applied in Interventional Probability of Causation (IPoC).

The authors of the article in Critical Reviews in Toxicology point out the limitations of traditional approaches in epidemiology when trying to determine cause and quantify risk from incomplete causal knowledge, partial evidence, and observational data. They present IPoC methods as possible solutions to each of these limitations. Using existing data and knowledge, IPoC takes advantage of modern causal artificial intelligence (AI) and machine learning techniques to address the uncertainties. Relying on these more formal, mathematical approaches leaves less room for subjective reasoning and bias in determining relationships between exposures and outcomes, and it helps establish more precise targets for prevention or health interventions. IPoC methods can answer questions such as the following:

What is the likelihood that an individual’s risk would change if exposure levels were changed or if (counterfactually) they had been changed in the past to be different from the real exposure?
What is the likelihood that a target population’s risk would change, if at all, in the future if exposure levels were changed?
What direct effects have past changes in exposure had on individual and population risks?
How sure can we be about the answers to these questions given realistically limited evidence and knowledge currently available?
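To give a sense of what a population-level interventional question looks like numerically, the sketch below applies the standard back-door adjustment from the causal inference literature to a made-up table of counts. It is not the IPoC procedure from the paper; the strata, counts, and function names (`crude_risk`, `adjusted_risk`) are hypothetical, and the adjustment is only valid under the assumption that the single stratifying factor captures all confounding.

```python
# Hypothetical counts: (stratum z, exposure x) -> (people, cases).
# Within each stratum the exposed and unexposed have the same risk.
counts = {
    ("low", 0): (800, 80),   ("low", 1): (200, 20),
    ("high", 0): (200, 60),  ("high", 1): (800, 240),
}

def crude_risk(x):
    """Observed risk among people with exposure x, ignoring strata."""
    people = sum(n for (_, xx), (n, _) in counts.items() if xx == x)
    cases = sum(c for (_, xx), (_, c) in counts.items() if xx == x)
    return cases / people

def adjusted_risk(x):
    """Back-door adjustment: sum over z of P(case | x, z) * P(z)."""
    total = sum(n for n, _ in counts.values())
    risk = 0.0
    for z in {z for z, _ in counts}:
        n_z = sum(n for (zz, _), (n, _) in counts.items() if zz == z)
        people, cases = counts[(z, x)]
        risk += (cases / people) * (n_z / total)
    return risk

print(f"crude risk difference:    {crude_risk(1) - crude_risk(0):.2f}")
print(f"adjusted risk difference: {adjusted_risk(1) - adjusted_risk(0):.2f}")
```

With these invented numbers the crude comparison suggests that exposure raises risk, while the adjusted comparison, which estimates what would happen if everyone's exposure were set to the same level, shows no change. Real analyses must also quantify how uncertain such estimates are, which is the point of the last question above.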

Case Study – Formaldehyde or Benzene Exposure and Increased Risk for Leukemia

The case study described in the published article, which looked at increased risk for leukemia from exposure to inhaled formaldehyde or benzene, answered the following question:

To what extent do available knowledge and data support conclusions that reducing formaldehyde/benzene exposure reduces cases of acute myeloid leukemia (AML) by amounts that can be quantified?

After applying IPoC methods, the authors concluded:

Prolonged, high-intensity exposure to benzene can increase risk of AML.
No causal pathway leading from formaldehyde exposure to increased risk of AML was found; thus, exposure to formaldehyde will not increase risk of AML.

Conclusions

The published article shows that the IPoC approach can differentiate between likely and unlikely causal factors and can provide useful upper bounds for the likelihood of different exposures as a cause of disease. For causal factors, IPoC can help quantitatively estimate the results of health or behavioral interventions that reduce exposures (such as limiting exposures to certain chemicals), even when there is incomplete evidence of the way those exposures cause harm, or it is uncertain what the individual level of response is to the exposure.

For details on the explanation of IPoC and how it provides a useful evolutionary step in our interpretation of observational data and incomplete causal knowledge, the full paper can be found via this link: https://doi.org/10.1080/10408444.2024.2337435.

The Center will continue to disseminate these findings and support ongoing development of these ideas and their application to epidemiology, toxicology, and risk analysis, as well as their use in regulation and litigation.

The top takeaways and full summary of the case study are available here.
