Sequential Decision Making

WhyNot is also a test bed for sequential decision making and reinforcement learning in diverse dynamic environments. WhyNot offers RL environments compatible with the OpenAI Gym API, so existing code written for OpenAI Gym can be adapted to WhyNot with minimal changes.
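For instance, code written against OpenAI Gym usually needs little more than its import changed to run on WhyNot's environments; a minimal sketch:

# import gym               # original OpenAI Gym import
import whynot.gym as gym   # WhyNot's Gym-compatible replacement

The calls to gym.make, reset, and step in the sections below then work exactly as they would with OpenAI Gym.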

Using Existing WhyNot Environments

To see all available environments,

import whynot.gym as gym
for env in gym.envs.registry.all():
    print(env.id)

To create an environment, set the random seed, and get an initial observation,

env = gym.make('HIV-v0')
env.seed(1)
observation = env.reset()

To sample a random action from the action space and apply it, use the step function. The step function returns the next observation, the reward, a flag indicating whether the environment has reached a terminal state, and a dict of additional debugging information.

action = env.action_space.sample()
observation, reward, done, info = env.step(action)

Actions, observations, and rewards in WhyNot Gym environments are all represented as NumPy arrays, so the environments work with algorithms implemented in any Python numerical computation library, such as PyTorch or TensorFlow.
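As a concrete illustration, the sketch below rolls out one episode of the HIV environment with a randomly initialized PyTorch policy. The single linear layer is purely illustrative and not part of WhyNot; the sketch only assumes, consistent with the environment definition later in this section, that observations form a Box space and actions a Discrete space.

import torch
import whynot.gym as gym

env = gym.make('HIV-v0')
env.seed(1)

# Illustrative policy: one linear layer mapping observations to action scores.
policy = torch.nn.Linear(env.observation_space.shape[0], env.action_space.n)

observation = env.reset()
done = False
total_reward = 0.0
while not done:
    # Observations are NumPy arrays, so they convert directly to torch tensors.
    obs = torch.as_tensor(observation, dtype=torch.float32)
    action = int(policy(obs).argmax().item())
    observation, reward, done, info = env.step(action)
    total_reward += reward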

See this notebook for an example of training policies on the HIV environment.

Defining a New Custom Environment

To define a new custom environment on top of a WhyNot simulator, implement 1) the reward function, 2) a mapping from numerical actions to system interventions, and, optionally, 3) a mapping from state to observation. The class ODEEnvBuilder then wraps an arbitrary dynamical system simulator into a Gym environment for reinforcement learning.

For example, we defined the HIV environment by

from whynot.gym import spaces
from whynot.gym.envs import ODEEnvBuilder
from whynot.simulators.hiv import Config, Intervention, State, simulate

def reward_fn(intervention, state):
    reward = ...
    return reward

def intervention_fn(action, time):
    action_to_intervention_map = ...
    return action_to_intervention_map[action]

HivEnv = ODEEnvBuilder(
    # Specify the dynamical system
    simulate_fn=simulate,
    # Simulator configuration
    config=Config(),
    # Initial state from which the simulation starts
    initial_state=State(),
    # Define the action space
    action_space=spaces.Discrete(...),
    # Define the observation space
    observation_space=spaces.Box(...),
    # Convert numerical actions to simulator interventions
    intervention_fn=intervention_fn,
    # Define the reward function
    reward_fn=reward_fn,
)
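
The resulting HivEnv can then be used like any other Gym environment. The sketch below assumes HivEnv is called once to instantiate the environment, as with a standard Gym environment class; registering it under an id so that it can also be created through gym.make follows the usual Gym registration pattern.

env = HivEnv()
observation = env.reset()
action = env.action_space.sample()
observation, reward, done, info = env.step(action)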