econirl.environments.ArrayMDP
- class econirl.environments.ArrayMDP(transitions, features, theta, discount_factor=0.95, scale_parameter=1.0, parameter_names=None, initial_distribution=None, seed=None)[source]
Bases: DDCEnvironment
A DDC environment defined by explicit transition, feature, and reward arrays.
- Parameters:
transitions (np.ndarray | jnp.ndarray) – (A, S, S) transition probabilities P(s'|s,a).
features (np.ndarray | jnp.ndarray) – (S, A, K) feature tensor phi(s, a).
theta (Sequence[float] | np.ndarray | jnp.ndarray | Mapping[str, float]) – Linear reward parameters. Either a length-K array, or a mapping {name: value} of length K (its keys become the parameter names, preserving insertion order).
discount_factor (float) – Time discount beta in [0, 1).
scale_parameter (float) – Logit scale sigma > 0.
parameter_names (Sequence[str] | None) – Optional names for the K parameters. Ignored when theta is a mapping (the mapping keys are used instead). Defaults to ["theta_0", ..., "theta_{K-1}"].
initial_distribution (np.ndarray | jnp.ndarray | None) – Optional (S,) initial-state distribution. Defaults to uniform.
seed (int | None) – Random seed for the environment RNG used by reset/step.
Example
>>> import numpy as np
>>> from econirl.environments import ArrayMDP
>>> S, A, K = 5, 2, 2
>>> # Identity transitions: every action leaves the state unchanged.
>>> T = np.zeros((A, S, S))
>>> T[:, np.arange(S), np.arange(S)] = 1.0
>>> phi = np.random.default_rng(0).normal(size=(S, A, K))
>>> env = ArrayMDP(T, phi, theta={"cost": -1.0, "value": 0.5})
>>> from econirl.simulation.synthetic import simulate_panel
>>> panel = simulate_panel(env, n_individuals=10, n_periods=20, seed=1)
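The shape conventions in the parameter list are easy to mix up, since transitions are action-first while features are state-first. The following is a minimal sketch of a pre-flight check using only NumPy. The helper name check_array_mdp_inputs is hypothetical and is not part of econirl; whether ArrayMDP performs equivalent validation internally is not stated on this page.

```python
import numpy as np


def check_array_mdp_inputs(transitions, features, theta, atol=1e-8):
    """Hypothetical helper (not an econirl API): check the documented shapes.

    Returns (S, A, K) if the arrays are mutually consistent, else raises ValueError.
    """
    transitions = np.asarray(transitions, dtype=float)
    features = np.asarray(features, dtype=float)

    if transitions.ndim != 3 or transitions.shape[1] != transitions.shape[2]:
        raise ValueError("transitions must have shape (A, S, S)")
    A, S, _ = transitions.shape

    if features.ndim != 3 or features.shape[:2] != (S, A):
        raise ValueError("features must have shape (S, A, K)")
    K = features.shape[2]

    # len() works for sequences, arrays, and mappings alike.
    if len(theta) != K:
        raise ValueError("theta must have length K")

    # Each P(. | s, a) must be a probability distribution over next states.
    if np.any(transitions < 0) or not np.allclose(
        transitions.sum(axis=-1), 1.0, atol=atol
    ):
        raise ValueError("each transitions[a, s, :] must be non-negative and sum to 1")

    return S, A, K
```

Running such a check before constructing the environment surfaces a transposed (S, A, S) transition tensor as an explicit error rather than as a wrong simulation.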
- __init__(transitions, features, theta, discount_factor=0.95, scale_parameter=1.0, parameter_names=None, initial_distribution=None, seed=None)[source]
Initialize the environment.
- Parameters:
discount_factor (float) – Time discount factor β ∈ [0, 1)
scale_parameter (float) – Logit scale parameter σ > 0
seed (int | None) – Random seed for reproducibility
transitions (ndarray | Array)
features (ndarray | Array)
theta (Sequence[float] | ndarray | Array | Mapping[str, float])
initial_distribution (ndarray | Array | None)
- Return type:
None
- property transition_matrices: Array
Return transition probability matrices.
- Returns:
Tensor of shape (num_actions, num_states, num_states) where result[a, s, s'] = P(s' | s, a)
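As an illustration of the action-first indexing convention, the sketch below builds a toy tensor with plain NumPy (it does not call econirl) and pushes a state distribution forward one step under a fixed action. The variable names are illustrative only.

```python
import numpy as np

S = 3
# Action 0 keeps the state; action 1 moves to the next state, wrapping around.
P = np.stack([np.eye(S), np.roll(np.eye(S), 1, axis=1)])  # shape (A, S, S)

# P[a, s, s_next] is the probability of reaching s_next from s under action a.
prob_0_to_1_under_action_1 = P[1, 0, 1]

# A row vector of state probabilities is propagated by right-multiplying with P[a].
mu = np.array([1.0, 0.0, 0.0])
mu_next = mu @ P[1]
```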
- property feature_matrix: Array
Return feature matrix for utility computation.
Features are the observable characteristics that enter the utility function: U(s,a;θ) = θ · φ(s,a)
- Returns:
Tensor of shape (num_states, num_actions, num_features)
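The linear utility U(s,a;θ) = θ · φ(s,a) amounts to contracting the last axis of the (S, A, K) feature tensor with a length-K parameter vector. A minimal NumPy sketch, independent of econirl and using made-up feature values:

```python
import numpy as np

S, A, K = 5, 2, 2
phi = np.random.default_rng(0).normal(size=(S, A, K))  # stand-in feature tensor
theta = np.array([-1.0, 0.5])                           # stand-in parameters

# Contract the feature axis k: U[s, a] = sum_k phi[s, a, k] * theta[k].
U = np.einsum("sak,k->sa", phi, theta)
```

The same result is obtained with phi @ theta; the einsum form just makes the contracted axis explicit.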
- property true_parameters: dict[str, float]
Return the true utility parameters (for simulation studies).
- Returns:
Dictionary mapping parameter names to values
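The naming rules given in the class parameters (mapping keys win, otherwise parameter_names, otherwise theta_0 through theta_{K-1}) can be sketched in plain Python as follows. The function resolve_parameters is a hypothetical illustration of those rules, not the library's implementation.

```python
from collections.abc import Mapping


def resolve_parameters(theta, parameter_names=None):
    """Hypothetical sketch of the documented name-resolution rules."""
    if isinstance(theta, Mapping):
        # Mapping keys become the names, in insertion order;
        # parameter_names is ignored in this case.
        return {str(name): float(value) for name, value in theta.items()}

    values = [float(v) for v in theta]
    if parameter_names is None:
        parameter_names = [f"theta_{i}" for i in range(len(values))]
    if len(parameter_names) != len(values):
        raise ValueError("parameter_names must have one entry per parameter")
    return dict(zip(parameter_names, values))
```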
- property true_reward_matrix: Array
Ground-truth flow reward R(s, a) = theta · phi(s, a), shape (S, A).
- compute_utility_matrix(parameters=None)
Compute the full utility matrix for all state-action pairs.
- Parameters:
parameters (Array | None) – Optional parameter vector. If None, uses true_parameters.
- Returns:
Tensor of shape (num_states, num_actions) with flow utilities
- Return type:
Array
- encode_states(states)
Encode flat state indices to continuous features.
Default: normalized scalar s/(S-1) with shape (batch, 1). Override for multi-dimensional environments.
- Parameters:
states (Array)
- Return type:
Array
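The default encoding described above, s/(S-1) returned with shape (batch, 1), can be written in NumPy as below. This is a sketch of the stated formula only; the standalone function name and the explicit num_states argument are assumptions made for illustration.

```python
import numpy as np


def encode_states_default(states, num_states):
    """Sketch of the documented default: map index s to s / (S - 1)."""
    # Assumes num_states >= 2; with a single state the divisor would be zero.
    states = np.asarray(states, dtype=float)
    return (states / (num_states - 1)).reshape(-1, 1)


encoded = encode_states_default([0, 2, 4], num_states=5)
```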
- generate_panel(n_individuals=1000, n_periods=100, seed=42, as_dataframe=False)
Generate synthetic panel data from this environment.
Computes the optimal policy from the true parameters and simulates trajectories for multiple individuals.
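- Parameters:
n_individuals (int) – Number of individuals to simulate. Defaults to 1000.
n_periods (int) – Number of periods simulated per individual. Defaults to 100.
seed (int) – Random seed for the simulation. Defaults to 42.
as_dataframe (bool) – If True, return a DataFrame instead of a Panel object. Defaults to False.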
- Returns:
Panel object, or DataFrame if as_dataframe=True.
- Return type:
Panel | DataFrame
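To make the two steps above concrete (solve for the choice probabilities, then simulate), here is a self-contained NumPy sketch of one standard approach for logit models: soft value iteration followed by forward sampling. It is not the library's implementation. The function names, the omission of the Euler-constant term, and the uniform initial distribution are assumptions made for this illustration.

```python
import numpy as np


def logit_policy(P, U, beta=0.95, sigma=1.0, tol=1e-10, max_iter=10_000):
    """Soft value iteration (illustrative). P: (A, S, S), U: (S, A) -> pi: (S, A)."""
    V = np.zeros(U.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = U[s, a] + beta * E[V(s') | s, a]
        Q = U + beta * np.einsum("ast,t->sa", P, V)
        # Numerically stable sigma * log-sum-exp over actions.
        m = Q.max(axis=1, keepdims=True)
        V_new = (m + sigma * np.log(np.exp((Q - m) / sigma).sum(axis=1, keepdims=True))).ravel()
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = U + beta * np.einsum("ast,t->sa", P, V)
    z = (Q - Q.max(axis=1, keepdims=True)) / sigma
    pi = np.exp(z)
    return pi / pi.sum(axis=1, keepdims=True)


def simulate(P, pi, n_individuals, n_periods, seed=0):
    """Sample (state, action) paths; initial states are drawn uniformly (assumption)."""
    rng = np.random.default_rng(seed)
    A, S, _ = P.shape
    states = np.empty((n_individuals, n_periods), dtype=int)
    actions = np.empty_like(states)
    for i in range(n_individuals):
        s = rng.integers(S)
        for t in range(n_periods):
            a = rng.choice(A, p=pi[s])
            states[i, t], actions[i, t] = s, a
            s = rng.choice(S, p=P[a, s])
    return states, actions


# Toy problem matching the class-level example: identity transitions.
S, A, K = 5, 2, 2
P = np.zeros((A, S, S))
P[:, np.arange(S), np.arange(S)] = 1.0
phi = np.random.default_rng(0).normal(size=(S, A, K))
U = phi @ np.array([-1.0, 0.5])

pi = logit_policy(P, U)
states, actions = simulate(P, pi, n_individuals=10, n_periods=20, seed=1)
```

With identity transitions the continuation value is the same for every action, so the resulting policy reduces to a softmax of U(s, ·)/sigma, which gives a quick sanity check on the solver.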
- get_true_parameter_vector()
Return true parameters as a tensor in canonical order.
- Return type:
Array
- classmethod info()[source]
Return metadata about this environment.
Subclasses should override this to provide name, description, source, n_states, n_actions, parameter details, etc.
- property problem_spec: DDCProblem
Return the DDCProblem specification for this environment.