econirl.environments.random_mdp

econirl.environments.random_mdp(num_states=30, num_actions=3, num_features=3, branching=4, discount_factor=0.95, reward_scale=1.0, self_loop=0.05, action_dependent=True, scale_parameter=1.0, seed=0)

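Construct a Garnet-style random MDP and return it as an ArrayMDP. Each state-action pair can move to only `branching` randomly chosen next states, and the reward is linear in K random features.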
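The page does not document the exact sampling scheme. As a minimal sketch, assuming the classic Garnet recipe (each state-action pair picks `branching` distinct successors uniformly at random and spreads probability over them with random weights) plus the `self_loop` mixing listed under Parameters, a dense NumPy version could look like this. `garnet_transitions` is a hypothetical helper, not part of econirl, and it returns a dense (A, S, S) array rather than the sparse storage an ArrayMDP uses.

```python
import numpy as np


def garnet_transitions(num_states, num_actions, branching, self_loop, seed=0):
    """Hypothetical helper: Garnet-style transitions as a dense (A, S, S) array.

    P[a, s, t] is the probability of moving from state s to state t under action a.
    """
    if not 1 <= branching <= num_states:
        raise ValueError("branching must lie in [1, num_states]")
    if not 0.0 <= self_loop < 1.0:
        raise ValueError("self_loop must lie in [0, 1)")
    rng = np.random.default_rng(seed)  # fixed seed -> identical array on every call
    P = np.zeros((num_actions, num_states, num_states))
    for a in range(num_actions):
        for s in range(num_states):
            # Pick `branching` distinct successor states uniformly at random.
            succ = rng.choice(num_states, size=branching, replace=False)
            # Dirichlet(1, ..., 1) weights, i.e. uniformly random spacings of [0, 1].
            weights = rng.dirichlet(np.ones(branching))
            P[a, s, succ] = (1.0 - self_loop) * weights
            # Extra mass on staying put: P[a, s, s] >= self_loop makes s aperiodic.
            P[a, s, s] += self_loop
    return P


P = garnet_transitions(num_states=30, num_actions=3, branching=4, self_loop=0.05)
print(P.shape)                    # (3, 30, 30)
print((P > 0).sum(axis=2).max())  # at most branching + 1 nonzeros per row
```

Each row keeps at most `branching + 1` nonzero entries (the successors plus the self-loop), which is why a sparse representation pays off as `num_states` grows.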

Parameters:

num_states (int) – Number of discrete states S.
num_actions (int) – Number of discrete actions A.
num_features (int) – Number of linear reward features K.
branching (int) – Number of reachable next-states per state-action pair.
discount_factor (float) – Time discount beta in [0, 1).
reward_scale (float) – Scale of the random linear reward parameters.
self_loop (float) – Probability mass placed on staying in the current state; any positive value guarantees aperiodicity. Set to 0 to disable.
action_dependent (bool) – Whether the reward features vary across actions.
scale_parameter (float) – Scale sigma of the logit choice probabilities.
seed (int) – Random seed. The same seed, together with the same other arguments, reproduces the same MDP exactly.
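The parameters point to a reward that is linear in features, r(s, a) = φ(s, a)·θ, and a logit choice model with discount beta and scale sigma. The sketch below shows how these pieces conventionally fit together in a dynamic logit model. The helpers `linear_reward` and `logit_policy`, the Gaussian draws for φ and θ, the stand-in transition array, and the omission of Euler's constant are all assumptions for illustration, not econirl's implementation.

```python
import numpy as np


def linear_reward(num_states, num_actions, num_features, reward_scale=1.0,
                  action_dependent=True, seed=0):
    """Hypothetical helper: features phi (S, A, K), weights theta (K,), reward (S, A)."""
    rng = np.random.default_rng(seed)
    if action_dependent:
        phi = rng.normal(size=(num_states, num_actions, num_features))
    else:
        # One feature vector per state, copied to every action.
        phi = np.repeat(rng.normal(size=(num_states, 1, num_features)), num_actions, axis=1)
    theta = reward_scale * rng.normal(size=num_features)
    reward = phi @ theta  # r(s, a) = phi(s, a) . theta
    return phi, theta, reward


def logit_policy(P, reward, beta, sigma, tol=1e-10, max_iter=10_000):
    """Soft value iteration for a dynamic logit model with discount beta and scale sigma.

    P has shape (A, S, S); reward has shape (S, A).
    Returns choice probabilities (S, A) and the value function V (S,).
    """
    V = np.zeros(reward.shape[0])
    for _ in range(max_iter):
        # Q(s, a) = r(s, a) + beta * sum_t P(t | s, a) V(t)
        Q = reward + beta * np.einsum("ast,t->sa", P, V)
        # V(s) = sigma * log sum_a exp(Q(s, a) / sigma), shifted by the row max for stability.
        # Euler's constant is omitted: it shifts V by a constant but leaves the policy unchanged.
        m = Q.max(axis=1)
        V_new = m + sigma * np.log(np.exp((Q - m[:, None]) / sigma).sum(axis=1))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = reward + beta * np.einsum("ast,t->sa", P, V)
    expQ = np.exp((Q - Q.max(axis=1, keepdims=True)) / sigma)
    return expQ / expQ.sum(axis=1, keepdims=True), V


phi, theta, r = linear_reward(30, 3, 3)
P = np.random.default_rng(1).dirichlet(np.ones(30), size=(3, 30))  # stand-in transitions
pi, V = logit_policy(P, r, beta=0.95, sigma=1.0)
print(pi.shape)  # (30, 3)
```

Raising sigma flattens the choice probabilities toward uniform; as sigma approaches 0 they concentrate on the action with the largest Q.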

Returns:

An ArrayMDP with sparse transitions and linear reward.

Return type:

ArrayMDP