WhyNot provides a large number of simulated environments from fields ranging from economics to epidemiology. Each simulator comes equipped with a representative set of causal inference experiments and exports a uniform Python interface that makes it easy to construct new causal inference experiments in these environments.

The simulators in WhyNot can be split into two main categories: system dynamics models, which use differential equations to model interactions between different components of a system over time, and agent-based models, which represent a system as a collection of interacting, heterogeneous agents. Below we describe the set of simulators currently available in WhyNot.

Disclaimer: The simulators in our work are not intended to be realistic models of the world, but rather realistic models of the technical difficulties that the real world poses for causal inference. In many cases, the simulators present in WhyNot have been disputed and criticized as models of the real world.


Dynamical systems based simulators:

Agent-based simulators:


Adding more simulators to WhyNot is easy and allows one to take advantage of WhyNot tools for rapidly creating and running new causal inference experiments. For more information, see Adding a Simulator. For additional API reference, including specification of simulator states, configurations, and supported interventions, see Simulator API.

HIV Treatment Simulator

The HIV treatment simulator is a differential equation simulator of HIV treatment based on

Adams, Brian Michael, et al. Dynamic multidrug therapies for HIV: Optimal and STI control approaches. North Carolina State University. Center for Research in Scientific Computation, 2004.

The Adams HIV model has 6 state variables and 20 simulation parameters that parameterize the dynamics. We omit listing all of them here and refer the reader to the API documentation (HIV) for more details.

Dynamic Integrated Climate Economy Model (DICE)

The Dynamic Integrated Climate Economy Model (DICE) is a computer-based integrated assessment model developed by 2018 Nobel Laureate William Nordhaus that “integrates in an end-to-end fashion the economics, carbon cycle, climate science, and impacts in a highly aggregated model that allows a weighing of the costs and benefits of taking steps to slow greenhouse warming.”

The DICE model has 26 state variables and 54 simulation parameters that parameterize the dynamics. We omit listing all of them here and refer the reader to the API documentation (DICE) for more details.

Zika Epidemic Prevention Simulator

The Zika simulator is a dynamical systems model of transmission and control of the Zika virus disease. The simulator is based on:

Momoh, Abdulfatai A., and Armin Fügenschuh. “Optimal control of intervention strategies and cost effectiveness analysis for a Zika virus model.” Operations Research for Health Care 18 (2018): 99-111.

The intended purpose of the model was to study the efficacy of various strategies for controlling the spread of the Zika virus: the use of treated bed nets, the use of condoms, a medical treatment of infected persons, and the use of indoor residual spray (IRS). The dynamics of the Zika model govern the evolution of 9 state variables, and the simulator has four control variables and 20 simulation parameters. See Zika for precise details.

Opioid Epidemic Simulator

The opioid epidemic simulator is a system dynamics model of the US opioid epidemic developed by Chen et al. (JAMA, 2019). The model is calibrated on past opioid use data from the Centers for Disease Control and Prevention and was developed to simulate the effect of interventions, such as reducing the number of new non-medical users of opioids, on future opioid overdose deaths in the United States. The simulator is a time-varying differential equations model in 3 variables. For a complete description of the model, please refer to the appendix of Chen et al., and see Opioid Epidemic for more details about the state variables and simulation parameters.


World3 Simulator

World3 is a system dynamics model commissioned by the Club of Rome in the early 1970s to illustrate the interactions between population growth, industrial development, and the limitations of the natural environment over time.

The model is a differential equation model with 12 state variables and 245 algebraic equations governing their evolution over time. See World3 for a complete description of the state variables and parameters.


World2 Simulator

World2 is a system dynamics model developed by Jay Forrester to demonstrate the tension between industrial growth and natural resource limitations. The model is a precursor to the World3 model and, although it was used to study similar questions, it represents different dynamics.

The model is a system of differential equations in 5 state variables, with 43 algebraic equations governing their evolution over time. See World2 for a complete description of the state variables and parameters.

Civil Violence Simulator

Civil Violence is an agent-based model of civil violence introduced by Joshua Epstein in 2002. The model was originally used to study the complex dynamics of decentralized rebellion and revolution and to examine the state’s efforts to counter these dynamics. The model consists of two types of actors: agents and cops. Agents are heterogeneous, and their varied features make them more or less likely to actively rebel against the state. The rich dynamics of the model emerge from the interactions among agents and between agents and cops: agents are more likely to begin rebelling if other agents start to rebel, and the cops attempt to arrest rebelling agents. See Civil Violence for a complete description of the agents and the simulation parameters.

The implementation of this simulator is taken from the examples of the mesa library.

Incarceration Simulator

The incarceration simulator is based on the paper:

Lum K, Swarup S, Eubank S, Hawdon J. The contagious nature of imprisonment: an agent-based model to explain racial disparities in incarceration rates. J R Soc Interface. 2014;11(98):20140409. doi:10.1098/rsif.2014.0409

The paper proposes an agent-based model that treats incarceration as “contagious” in the sense that social ties to incarcerated individuals lead to a higher risk of being imprisoned. The simulation runs on a fixed set of agents with a fixed set of social ties. What varies is the randomness with which incarceration is passed on and the randomness in sentence length. Transition probabilities and the sentence length distribution are based on real data. The paper shows that higher-on-average sentence lengths for black individuals than for whites lead to a disparity in incarceration rates that resembles the one observed in the United States.

Lotka-Volterra Model

Lotka-Volterra is a classical differential equation model of the interactions between predator and prey in a single ecosystem. The model was originally developed to understand and explain perplexing fishery statistics during World War I, namely why the hiatus of fishing during the war led to an observed increase in the number of predators.

For more details, see Scholl 2012.

The simulator is a system of ordinary differential equations in two variables. For a complete description of the dynamics, see here. The state and simulator parameters are detailed in Lotka Volterra.
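To give a rough sense of what a two-variable system of this kind looks like in code, the sketch below integrates the textbook Lotka-Volterra equations with a fixed-step fourth-order Runge-Kutta scheme. It is not the WhyNot implementation: the function names, the parameter names (`alpha`, `beta`, `delta`, `gamma`), and their values are assumptions chosen for illustration.

```python
import math


def lotka_volterra(state, alpha=1.1, beta=0.4, delta=0.1, gamma=0.4):
    """Textbook predator-prey dynamics; parameter values are illustrative."""
    prey, predator = state
    d_prey = alpha * prey - beta * prey * predator
    d_predator = delta * prey * predator - gamma * predator
    return (d_prey, d_predator)


def rk4_step(f, state, dt):
    """Advance `state` by one classical Runge-Kutta step of size `dt`."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial_state, num_steps=2000, dt=0.01):
    """Return the trajectory as a list of (prey, predator) tuples."""
    trajectory = [tuple(initial_state)]
    for _ in range(num_steps):
        trajectory.append(rk4_step(lotka_volterra, trajectory[-1], dt))
    return trajectory


def conserved_quantity(state, alpha=1.1, beta=0.4, delta=0.1, gamma=0.4):
    """Quantity that the exact dynamics keep constant along a trajectory."""
    prey, predator = state
    return (delta * prey - gamma * math.log(prey)
            + beta * predator - alpha * math.log(predator))


trajectory = simulate((10.0, 5.0))
```

Because the exact dynamics conserve `conserved_quantity`, comparing its value at the start and end of `trajectory` gives a quick check on integration accuracy.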

Delayed Impact Simulator

The Delayed Impact simulator is a lending simulator based on the paper:

Liu, L., Dean, S., Rolf, E., Simchowitz, M., & Hardt, M. (2018, July). Delayed Impact of Fair Machine Learning. In International Conference on Machine Learning (pp. 3156-3164).

The paper proposes a simple lending model in which individuals apply for loans, a lending institution approves or denies each loan on the basis of the individual’s credit score, and subsequent loan repayment or default in turn changes the individual’s credit score. Credit scores and repayment probabilities are based on real FICO data. In this dynamic setting, the paper shows that static fairness criteria do not necessarily promote improvement over time and can indeed cause active harm.
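The feedback loop described above can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the simulator's code: the linear `repay_probability`, the score range, and the score changes on repayment and default (`gain`, `loss`) are hypothetical stand-ins for the FICO-derived quantities.

```python
def repay_probability(score, low=300.0, high=850.0):
    """Hypothetical repayment probability: linear in score, clipped to [0, 1]."""
    return min(1.0, max(0.0, (score - low) / (high - low)))


def expected_score_change(score, threshold, gain=75.0, loss=150.0):
    """Expected one-step score change for an applicant under a threshold policy.

    Applicants below the threshold are denied, so their score is unchanged.
    Approved applicants gain `gain` on repayment and lose `loss` on default.
    """
    if score < threshold:
        return 0.0
    p = repay_probability(score)
    return p * gain - (1.0 - p) * loss


def mean_score_change(scores, threshold):
    """Average expected score change across a group of applicants."""
    return sum(expected_score_change(s, threshold) for s in scores) / len(scores)


# Illustrative group of applicants (assumed values).
group_scores = [450.0, 500.0, 550.0, 600.0, 650.0, 700.0, 750.0]

# A strict policy lends only to high scorers; a lenient one lends to everyone.
strict = mean_score_change(group_scores, threshold=700.0)
lenient = mean_score_change(group_scores, threshold=450.0)
```

With these assumed numbers, the lenient policy approves applicants who are likely to default, so the group's expected score falls (`lenient` is negative) while the strict policy raises it (`strict` is positive). This is the kind of delayed effect that a static criterion applied at decision time does not capture.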

Credit Simulator

The credit simulator is based on the paper:

Perdomo, Juan C., Tijana Zrnic, Celestine Mendler-Dünner, and Moritz Hardt. “Performative Prediction.” arXiv preprint arXiv:2002.06673 (2020).

The simulator builds on a model of strategic classification in which an institution classifies the creditworthiness of loan applicants, and agents react to the institution’s classifier by manipulating their features to increase the likelihood that they receive a favorable classification. The underlying data comes from a Kaggle credit scoring dataset, though the agent response model is synthetic. The model was originally used to qualitatively analyze the long-run properties of repeated retraining of classifiers in the face of strategic adaptation.
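A minimal sketch of this kind of strategic response is shown below. It assumes a logistic classifier that predicts default risk and a simple response rule in which each manipulable feature is shifted against its weight by a step `epsilon`. The function names, weights, and feature values are hypothetical, and the rule is an illustrative assumption rather than the simulator's exact agent model.

```python
import math


def default_probability(features, weights, bias=0.0):
    """Logistic model of the probability that an applicant defaults."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def best_response(features, weights, strategic_indices, epsilon):
    """Shift each strategic feature against its weight to lower predicted risk.

    Features outside `strategic_indices` cannot be manipulated and stay fixed.
    """
    adjusted = list(features)
    for i in strategic_indices:
        adjusted[i] -= epsilon * weights[i]
    return adjusted


# Illustrative classifier and applicant (assumed values).
weights = [0.8, -0.5, 0.3]
applicant = [1.0, 2.0, -1.0]

adapted = best_response(applicant, weights, strategic_indices=[0, 1], epsilon=0.5)
risk_before = default_probability(applicant, weights)
risk_after = default_probability(adapted, weights)
```

Under this rule the classifier's logit drops by `epsilon` times the sum of squared strategic weights, so `risk_after` is never larger than `risk_before`. Retraining the classifier on the adapted features and repeating the response gives the retraining loop that the model was used to study.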

Schelling Model

The Schelling model is a classic agent-based model originally used to illustrate how weak individual preferences regarding one’s neighbors can lead to global segregation of entire cities. In the model, individuals prefer to live where at least some fraction of their neighbors are the same race as they are and will move if this constraint is not met. As this process is iterated, an originally well-mixed city rapidly becomes segregated by group.

Agents

The agents in Schelling’s model are labeled either Type 0 or Type 1, corresponding to members of the majority or minority class.

Simulation Parameters

  • Grid size (height, width)
  • Agent density
  • Percentage of minority agents
  • Homophily
  • Education boost (how much receiving education decreases homophily)
  • Percentage of agents receiving education

The implementation of this simulator is taken from the examples of the mesa library.
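For readers who want to see the core dynamics in isolation, here is a small self-contained sketch of a Schelling-style model in plain Python. It is not the mesa-based implementation: the function names are hypothetical, homophily is treated as a minimum fraction of same-type neighbors (an assumption), and the education parameters are omitted.

```python
import random


def make_grid(height, width, density, minority_fraction, rng):
    """Fill a grid with Type 0 (majority), Type 1 (minority), or None (empty)."""
    grid = [[None] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            if rng.random() < density:
                grid[row][col] = 1 if rng.random() < minority_fraction else 0
    return grid


def neighbor_types(grid, row, col):
    """Return the types of the occupied cells among the 8 surrounding cells."""
    height, width = len(grid), len(grid[0])
    types = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            r, c = row + d_row, col + d_col
            if (d_row, d_col) != (0, 0) and 0 <= r < height and 0 <= c < width:
                if grid[r][c] is not None:
                    types.append(grid[r][c])
    return types


def similar_fraction(grid, row, col):
    """Fraction of an agent's neighbors sharing its type (None if no neighbors)."""
    types = neighbor_types(grid, row, col)
    if not types:
        return None
    return sum(t == grid[row][col] for t in types) / len(types)


def step(grid, homophily, rng):
    """Visit cells in random order and move any unhappy agent found there
    to a random empty cell. Returns the number of moves made."""
    height, width = len(grid), len(grid[0])
    cells = [(r, c) for r in range(height) for c in range(width)]
    rng.shuffle(cells)
    moved = 0
    for r, c in cells:
        if grid[r][c] is None:
            continue
        fraction = similar_fraction(grid, r, c)
        if fraction is None or fraction >= homophily:
            continue  # isolated or satisfied agents stay put
        empty = [(i, j) for i, j in cells if grid[i][j] is None]
        if not empty:
            break
        i, j = rng.choice(empty)
        grid[i][j], grid[r][c] = grid[r][c], None
        moved += 1
    return moved


def mean_similarity(grid):
    """Average same-type neighbor fraction over agents that have neighbors."""
    fractions = [
        similar_fraction(grid, r, c)
        for r in range(len(grid))
        for c in range(len(grid[0]))
        if grid[r][c] is not None
    ]
    fractions = [f for f in fractions if f is not None]
    return sum(fractions) / len(fractions) if fractions else 0.0


rng = random.Random(0)
grid = make_grid(20, 20, density=0.8, minority_fraction=0.4, rng=rng)
before = mean_similarity(grid)
for _ in range(50):
    if step(grid, homophily=0.5, rng=rng) == 0:
        break  # every agent is satisfied
after = mean_similarity(grid)
```

Comparing `before` and `after` gives a rough measure of how much the grid has sorted by type. In runs of models like this, one would typically expect `after` to exceed `before`, although the exact values depend on the seed and parameters.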

LaLonde Synthetic Outcome Model

The LaLonde simulator is based on data from Robert LaLonde’s 1986 study evaluating the impact of the National Supported Work Demonstration, a labor training program, on post-intervention income levels. Since the actual function mapping the measured covariates to the observed outcomes is unknown, we instead simulate random functions of varying complexity on the data to generate synthetic outputs. This procedure allows us to generate causal inference problems with response surfaces of varying but known complexity.

In the LaLonde data, the function mapping covariates \(X\) to outcome \(Y\) is unknown, and it is impossible to simulate ground truth. Therefore, following Hill et al., we replace the true outcome \(Y\) with one generated by functions \(f_0, f_1\), corresponding to control and treatment, as follows. Let \(W\) denote treatment assignment. Then,

\[f_0(X) = Y(0), f_1(X) = Y(1), Y = Y(W).\]
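The construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulator's code: the covariates here are random placeholders rather than the LaLonde data, treatment is assigned by a coin flip, and `random_linear_function` is a hypothetical stand-in for the random response surfaces.

```python
import random


def random_linear_function(num_features, rng):
    """Draw a random linear response surface; a stand-in for f_0 or f_1."""
    coefficients = [rng.gauss(0.0, 1.0) for _ in range(num_features)]

    def f(x):
        return sum(c * v for c, v in zip(coefficients, x))

    return f


rng = random.Random(0)
num_units, num_features = 200, 3

# Placeholder covariates X and treatment assignment W (assumed for illustration).
X = [[rng.gauss(0.0, 1.0) for _ in range(num_features)] for _ in range(num_units)]
W = [rng.randint(0, 1) for _ in range(num_units)]

# Random response surfaces for control (f0) and treatment (f1).
f0 = random_linear_function(num_features, rng)
f1 = random_linear_function(num_features, rng)

# Potential outcomes Y(0) = f0(X), Y(1) = f1(X), and observed outcome Y = Y(W).
Y0 = [f0(x) for x in X]
Y1 = [f1(x) for x in X]
Y = [y1 if w == 1 else y0 for w, y0, y1 in zip(W, Y0, Y1)]

# Both potential outcomes are known, so the true average effect is known too.
true_ate = sum(y1 - y0 for y0, y1 in zip(Y0, Y1)) / num_units
```

Because `f0` and `f1` are generated rather than estimated, the ground-truth effect for every unit is available, which is what makes it possible to score a causal estimator against the synthetic outcomes.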