econirl.environments.random_mdp

econirl.environments.random_mdp(num_states=30, num_actions=3, num_features=3, branching=4, discount_factor=0.95, reward_scale=1.0, self_loop=0.05, action_dependent=True, scale_parameter=1.0, seed=0)

Construct a Garnet-style random ArrayMDP.

Parameters:
  • num_states (int) – Number of discrete states S.

  • num_actions (int) – Number of discrete actions A.

  • num_features (int) – Number of linear reward features K.

  • branching (int) – Number of reachable next-states per state-action pair.

  • discount_factor (float) – Time discount beta in [0, 1).

  • reward_scale (float) – Scale of the random linear reward parameters.

  • self_loop (float) – Probability mass placed on staying in the current state. Any positive value guarantees aperiodicity. Set to 0 to disable.

  • action_dependent (bool) – Whether features vary across actions.

  • scale_parameter (float) – Logit scale sigma.

  • seed (int) – Random seed. Same seed reproduces the same MDP exactly.
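
To make the roles of branching, self_loop, and seed concrete, the following is a minimal NumPy sketch of a Garnet-style transition kernel. It is an illustration, not the library's implementation: the helper name garnet_transitions, the dense (A, S, S) array layout, and the Dirichlet draw for successor probabilities are all assumptions made here, and the actual ArrayMDP stores its transitions sparsely.

```python
import numpy as np


def garnet_transitions(num_states=30, num_actions=3, branching=4,
                       self_loop=0.05, seed=0):
    """Hypothetical helper: dense (A, S, S) Garnet-style transition kernel."""
    rng = np.random.default_rng(seed)
    P = np.zeros((num_actions, num_states, num_states))
    for a in range(num_actions):
        for s in range(num_states):
            # Pick `branching` distinct successor states uniformly at random.
            successors = rng.choice(num_states, size=branching, replace=False)
            # Random probabilities over those successors (assumed Dirichlet).
            weights = rng.dirichlet(np.ones(branching))
            P[a, s, successors] = (1.0 - self_loop) * weights
            # Reserve `self_loop` mass for staying in the current state.
            P[a, s, s] += self_loop
    return P


P = garnet_transitions()
# Each (action, state) row is a probability distribution over next states.
row_sums = P.sum(axis=2)
```

In this sketch each row has at most branching + 1 nonzero entries (the sampled successors plus the self-loop), and calling the helper twice with the same seed yields the same array.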

Returns:

An ArrayMDP with sparse transitions and linear reward.

Return type:

ArrayMDP
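
The sketch below illustrates how a linear reward, the discount beta, and the logit scale sigma typically interact in a logit (soft) Bellman recursion. It rests on several assumptions that the documentation above does not confirm: features are stored as an (S, A, K) array, reward parameters are Gaussian draws multiplied by reward_scale, sigma divides the choice-specific values inside a softmax, and the transitions are a uniform placeholder. The function soft_value_iteration is hypothetical and is not part of econirl.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, K = 30, 3, 3                      # states, actions, reward features
beta, sigma, reward_scale = 0.95, 1.0, 1.0

# Assumed layout: one K-vector of features per state-action pair.
features = rng.normal(size=(S, A, K))
theta = reward_scale * rng.normal(size=K)
reward = features @ theta               # linear reward, shape (S, A)

# Placeholder kernel: uniform over next states, shape (A, S, S).
P = np.full((A, S, S), 1.0 / S)


def soft_value_iteration(reward, P, beta, sigma, tol=1e-10, max_iter=5000):
    """Hypothetical helper: iterate the logit Bellman operator to a fixed point."""
    V = np.zeros(reward.shape[0])
    for _ in range(max_iter):
        # Choice-specific values Q[s, a] = r[s, a] + beta * E[V(s') | s, a].
        Q = reward + beta * np.einsum("ast,t->sa", P, V)
        # Numerically stable sigma * logsumexp(Q / sigma) over actions.
        m = Q.max(axis=1)
        V_new = m + sigma * np.log(np.exp((Q - m[:, None]) / sigma).sum(axis=1))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = reward + beta * np.einsum("ast,t->sa", P, V)
    logits = (Q - Q.max(axis=1, keepdims=True)) / sigma
    policy = np.exp(logits)
    policy /= policy.sum(axis=1, keepdims=True)   # logit choice probabilities
    return V, policy


V, policy = soft_value_iteration(reward, P, beta, sigma)
```

Under these assumptions, a larger sigma flattens the choice probabilities toward uniform, while a smaller sigma concentrates them on the action with the highest value.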