Sequential Decision Making
WhyNot is also an excellent test bed for sequential decision making and reinforcement learning in diverse dynamic environments. WhyNot offers RL environments compatible with the OpenAI Gym API, so existing code written for OpenAI Gym can be adapted to WhyNot with minimal changes.
Using Existing WhyNot Environments
To see all available environments,
import whynot.gym as gym

for env in gym.envs.registry.all():
    print(env.id)
To create an environment, set the random seed, and get an initial observation,
env = gym.make('HIV-v0')
env.seed(1)
observation = env.reset()
To sample a random action and perform it, use the step function. The step
function returns the next observation, the reward, whether the environment has
reached a terminal state, and a dict of additional debugging info.
action = env.action_space.sample()
observation, reward, done, info = env.step(action)
Actions, observations, and rewards in WhyNot Gym environments are all represented as NumPy arrays. The environments therefore work with algorithms implemented in any Python numerical computing library, such as PyTorch or TensorFlow.
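Since every WhyNot environment follows the Gym interface, a full episode can be collected with the usual reset/step loop. The sketch below illustrates the loop with a minimal stand-in environment so the snippet runs on its own; in practice `env` would come from `gym.make('HIV-v0')`, and the loop body would be unchanged:

```python
import numpy as np

class StubEnv:
    """Minimal Gym-style environment standing in for a WhyNot env."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(3)  # initial observation

    def step(self, action):
        self.t += 1
        observation = np.full(3, float(self.t))
        reward = 1.0
        done = self.t >= self.horizon
        return observation, reward, done, {}

def run_episode(env, policy):
    """Roll out one episode, returning the total reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward

# A trivial policy that ignores the observation and always picks action 0.
total = run_episode(StubEnv(), policy=lambda obs: 0)
print(total)  # prints 5.0
```

The same `run_episode` function works unmodified for any environment exposing `reset` and `step` with this signature.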
See this notebook for an example of training policies on the HIV environment.
Defining a New Custom Environment
To define a new custom environment on top of a WhyNot simulator, implement 1)
the reward function, 2) a mapping from numerical actions to system
interventions, and, optionally, 3) a mapping from state to observation. The
class ODEEnvBuilder
then wraps an arbitrary dynamical
system simulator into a Gym environment for reinforcement learning.
For example, the HIV environment is defined as follows:
from whynot.gym import spaces
from whynot.gym.envs import ODEEnvBuilder
from whynot.simulators.hiv import Config, Intervention, State
from whynot.simulators.hiv import simulate
def reward_fn(intervention, state):
    reward = ...
    return reward

def intervention_fn(action, time):
    action_to_intervention_map = ...
    return action_to_intervention_map[action]
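To make these two hooks concrete, here is a hedged sketch of what they might look like for a treatment-style simulator. The state field names (`healthy_cells`, `virus`) and the drug-efficacy parameters (`epsilon_1`, `epsilon_2`) are illustrative assumptions, not the actual HIV simulator API, and plain dicts stand in for the simulator's Intervention and State objects:

```python
# Hypothetical mapping: discrete actions index combinations of two treatments.
ACTION_TO_TREATMENT = {
    0: {"epsilon_1": 0.0, "epsilon_2": 0.0},  # no treatment
    1: {"epsilon_1": 0.7, "epsilon_2": 0.0},  # drug 1 only
    2: {"epsilon_1": 0.0, "epsilon_2": 0.3},  # drug 2 only
    3: {"epsilon_1": 0.7, "epsilon_2": 0.3},  # both drugs
}

def intervention_fn(action, time):
    """Map a discrete action to a (stand-in) simulator intervention."""
    treatment = ACTION_TO_TREATMENT[action]
    return {"time": time, **treatment}

def reward_fn(intervention, state):
    """Reward healthy cells, penalize virus load and (quadratic) drug cost."""
    drug_cost = intervention["epsilon_1"] ** 2 + intervention["epsilon_2"] ** 2
    return state["healthy_cells"] - 0.1 * state["virus"] - 100.0 * drug_cost
```

The quadratic drug cost is a common modeling choice that discourages sustained high-dose treatment while leaving brief interventions cheap.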
HivEnv = ODEEnvBuilder(
    # Specify the dynamical system
    simulate_fn=simulate,
    # Simulator configuration
    config=Config(),
    # Initial state to begin the simulation
    initial_state=State(),
    # Define the action space
    action_space=spaces.Discrete(...),
    # Define the observation space
    observation_space=spaces.Box(...),
    # Convert numerical actions to simulator interventions
    intervention_fn=intervention_fn,
    # Define the reward function
    reward_fn=reward_fn,
)
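ODEEnvBuilder's internals are not shown here, but the pattern it implements — bundling a simulator step, an action-to-intervention mapping, and a reward function into a single Gym-style class — can be sketched in a few lines. The following toy builder is an illustrative analogue with a made-up one-dimensional simulator, not WhyNot's actual implementation:

```python
import numpy as np

def build_ode_env(simulate_fn, initial_state, intervention_fn, reward_fn,
                  horizon=100):
    """Toy analogue of ODEEnvBuilder: combine the pieces into one env class."""

    class Env:
        def reset(self):
            self.state = np.array(initial_state, dtype=float)
            self.time = 0
            return self.state.copy()

        def step(self, action):
            # Map the numerical action to an intervention, advance the
            # simulator one step, then score the resulting state.
            intervention = intervention_fn(action, self.time)
            self.state = simulate_fn(self.state, intervention)
            self.time += 1
            reward = reward_fn(intervention, self.state)
            done = self.time >= horizon
            return self.state.copy(), reward, done, {}

    return Env

# Hypothetical scalar "simulator": the state grows unless treated.
GrowthEnv = build_ode_env(
    simulate_fn=lambda state, u: state * (1.1 - u),
    initial_state=[1.0],
    intervention_fn=lambda action, time: 0.2 * action,  # action in {0, 1}
    reward_fn=lambda u, state: float(state[0]),
    horizon=10,
)
env = GrowthEnv()
obs = env.reset()
obs, reward, done, info = env.step(1)  # treated step: 1.0 * (1.1 - 0.2) = 0.9
```

The returned class is used exactly like the built-in environments: instantiate it, call `reset`, and drive it with `step`.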