econirl.environments.random_mdp
- econirl.environments.random_mdp(num_states=30, num_actions=3, num_features=3, branching=4, discount_factor=0.95, reward_scale=1.0, self_loop=0.05, action_dependent=True, scale_parameter=1.0, seed=0)
Construct a Garnet-style random ArrayMDP.
- Parameters:
num_states (int) – Number of discrete states S.
num_actions (int) – Number of discrete actions A.
num_features (int) – Number of linear reward features K.
branching (int) – Number of reachable next-states per state-action pair.
discount_factor (float) – Time discount beta in [0, 1).
reward_scale (float) – Scale of the random linear reward parameters.
self_loop (float) – Probability mass placed on staying in the current state (guarantees aperiodicity). Set to 0 to disable.
action_dependent (bool) – Whether features vary across actions.
scale_parameter (float) – Logit scale sigma.
seed (int) – Random seed. The same seed reproduces the same MDP exactly.
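To make the roles of branching, self_loop, action_dependent, and reward_scale concrete, here is a minimal NumPy sketch of one way a Garnet-style MDP could be assembled. It is not the econirl implementation: the function name garnet_sketch, the (action, state, next-state) array layout, the Dirichlet transition weights, and the Gaussian features and reward parameters are all assumptions made for illustration.

```python
import numpy as np


def garnet_sketch(num_states=30, num_actions=3, num_features=3, branching=4,
                  reward_scale=1.0, self_loop=0.05, action_dependent=True, seed=0):
    """Illustrative Garnet-style construction; NOT the econirl implementation."""
    rng = np.random.default_rng(seed)

    # Assumed layout P[a, s, s']: each (s, a) pair reaches `branching` random states.
    P = np.zeros((num_actions, num_states, num_states))
    for a in range(num_actions):
        for s in range(num_states):
            targets = rng.choice(num_states, size=branching, replace=False)
            # Assumed Dirichlet weights, rescaled to leave room for the self-loop.
            P[a, s, targets] = (1.0 - self_loop) * rng.dirichlet(np.ones(branching))
            P[a, s, s] += self_loop  # extra mass on staying in the current state

    # Assumed Gaussian features phi[s, a, k]; shared across actions if requested.
    if action_dependent:
        phi = rng.normal(size=(num_states, num_actions, num_features))
    else:
        base = rng.normal(size=(num_states, 1, num_features))
        phi = np.repeat(base, num_actions, axis=1)

    # Linear reward r(s, a) = phi(s, a) . theta, with theta scaled by reward_scale.
    theta = reward_scale * rng.normal(size=num_features)
    reward = phi @ theta  # shape (num_states, num_actions)
    return P, phi, theta, reward
```

In this sketch every row of P sums to one and has at most branching + 1 nonzero entries, which is what makes the transitions sparse. Calling it twice with the same seed returns identical arrays, mirroring the reproducibility described for seed.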
- Returns:
An ArrayMDP with sparse transitions and linear reward.
- Return type: