econirl.environments.ArrayMDP

class econirl.environments.ArrayMDP(transitions, features, theta, discount_factor=0.95, scale_parameter=1.0, parameter_names=None, initial_distribution=None, seed=None)[source]

Bases: DDCEnvironment

A dynamic discrete choice (DDC) environment defined by explicit transition and feature arrays together with a linear reward parameter vector.

Parameters:
  • transitions (np.ndarray | jnp.ndarray) – (A, S, S) transition probabilities P(s'|s,a).

  • features (np.ndarray | jnp.ndarray) – (S, A, K) feature tensor phi(s, a).

  • theta (Sequence[float] | np.ndarray | jnp.ndarray | Mapping[str, float]) – Linear reward parameters. Either a length-K array, or a mapping {name: value} of length K (its keys become the parameter names, preserving insertion order).

  • discount_factor (float) – Time discount beta in [0, 1).

  • scale_parameter (float) – Logit scale sigma > 0.

  • parameter_names (Sequence[str] | None) – Optional names for the K parameters. Ignored when theta is a mapping (the mapping keys are used instead). Defaults to ["theta_0", ..., "theta_{K-1}"].

  • initial_distribution (np.ndarray | jnp.ndarray | None) – Optional (S,) initial-state distribution. Defaults to uniform.

  • seed (int | None) – Random seed for the environment RNG used by reset/step.
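The shape and range requirements listed above can be checked before constructing the environment. The following is a minimal NumPy sketch; `check_inputs` is a hypothetical helper written for illustration, not part of econirl, and whether ArrayMDP performs the same checks internally is not stated here.

```python
import numpy as np

def check_inputs(transitions, features, theta,
                 discount_factor=0.95, scale_parameter=1.0):
    """Hypothetical helper: verify the documented shapes and ranges."""
    P = np.asarray(transitions, dtype=float)
    phi = np.asarray(features, dtype=float)
    # transitions must be (A, S, S)
    if P.ndim != 3 or P.shape[1] != P.shape[2]:
        raise ValueError("transitions must have shape (A, S, S)")
    A, S, _ = P.shape
    # features must be (S, A, K)
    if phi.ndim != 3 or phi.shape[:2] != (S, A):
        raise ValueError("features must have shape (S, A, K)")
    K = phi.shape[2]
    # theta (array or mapping) must have K entries
    if len(theta) != K:
        raise ValueError("theta must have length K")
    # each P[a, s, :] must be a probability distribution over next states
    if (P < 0).any() or not np.allclose(P.sum(axis=2), 1.0):
        raise ValueError("each row P[a, s, :] must be non-negative and sum to 1")
    if not 0.0 <= discount_factor < 1.0:
        raise ValueError("discount_factor must lie in [0, 1)")
    if scale_parameter <= 0.0:
        raise ValueError("scale_parameter must be positive")
    return S, A, K

S, A, K = 5, 2, 2
T = np.zeros((A, S, S))
T[:, np.arange(S), np.arange(S)] = 1.0          # identity transitions
phi = np.random.default_rng(0).normal(size=(S, A, K))
dims = check_inputs(T, phi, {"cost": -1.0, "value": 0.5})

# The documented default initial distribution is uniform over the S states.
uniform_init = np.full(S, 1.0 / S)
```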

Example

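>>> import numpy as np
>>> from econirl.environments import ArrayMDP
>>> from econirl.simulation.synthetic import simulate_panel
>>> S, A, K = 5, 2, 2
>>> # Identity transitions: every action keeps the agent in its current state.
>>> T = np.zeros((A, S, S))
>>> T[:, np.arange(S), np.arange(S)] = 1.0
>>> phi = np.random.default_rng(0).normal(size=(S, A, K))
>>> # Passing theta as a mapping also sets the parameter names.
>>> env = ArrayMDP(T, phi, theta={"cost": -1.0, "value": 0.5})
>>> panel = simulate_panel(env, n_individuals=10, n_periods=20, seed=1)
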
__init__(transitions, features, theta, discount_factor=0.95, scale_parameter=1.0, parameter_names=None, initial_distribution=None, seed=None)[source]

Initialize the environment.

Return type:

None

property num_states: int

Number of discrete states in the environment.

property num_actions: int

Number of discrete actions available.

property num_features: int

Number of reward features K (the last dimension of the feature tensor).

property transition_matrices: Array

Return transition probability matrices.

Returns:

Tensor of shape (num_actions, num_states, num_states), where result[a, s, s'] = P(s' | s, a).
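
Because the action axis comes first, P[a, s] is the distribution over next states after taking action a in state s. As an illustration of this layout, the sketch below builds a random tensor with the documented shape and collapses it into the state-to-state matrix induced by a policy. The arrays `P` and `pi` are made up for the example and do not come from econirl.

```python
import numpy as np

A, S = 2, 3
rng = np.random.default_rng(0)

# Random row-stochastic tensor with the documented layout (A, S, S).
P = rng.dirichlet(np.ones(S), size=(A, S))

# A policy pi[s, a] = Pr(a | s); here uniform over actions.
pi = np.full((S, A), 1.0 / A)

# P_pi[s, s'] = sum_a pi[s, a] * P[a, s, s']
P_pi = np.einsum("sa,ast->st", pi, P)
```

Each row of `P_pi` is again a probability distribution, which is a quick check that the axes were contracted in the intended order.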

property feature_matrix: Array

Return feature matrix for utility computation.

Features are the observable characteristics that enter the utility function: U(s,a;θ) = θ · φ(s,a)

Returns:

Tensor of shape (num_states, num_actions, num_features)

property true_parameters: dict[str, float]

Return the true utility parameters (for simulation studies).

Returns:

Dictionary mapping parameter names to values

property parameter_names: list[str]

Return names of utility parameters in order.
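
The naming rules described for the constructor (mapping keys take precedence, then parameter_names, then the theta_i defaults) can be sketched as follows. `resolve_theta` is a hypothetical helper used only to illustrate those rules; the library's own implementation may differ.

```python
from collections.abc import Mapping

def resolve_theta(theta, parameter_names=None):
    """Hypothetical helper: return (names, values) under the documented rules."""
    if isinstance(theta, Mapping):
        # Mapping keys become the names, in insertion order;
        # parameter_names is ignored in this case.
        names = list(theta.keys())
        values = [float(v) for v in theta.values()]
    else:
        values = [float(v) for v in theta]
        if parameter_names is not None:
            names = list(parameter_names)
        else:
            # Documented default: theta_0, ..., theta_{K-1}
            names = [f"theta_{i}" for i in range(len(values))]
    if len(names) != len(values):
        raise ValueError("need exactly one name per parameter")
    return names, values

names_from_mapping, values_from_mapping = resolve_theta(
    {"cost": -1.0, "value": 0.5}, parameter_names=["ignored_a", "ignored_b"]
)
default_names, default_values = resolve_theta([1, 2])
```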

property true_reward_matrix: Array

Ground-truth flow reward R(s, a) = θ · φ(s, a), shape (S, A).
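
The linear reward is a contraction of the feature tensor with the parameter vector over the last axis. A minimal NumPy sketch, using made-up values for the features and parameters:

```python
import numpy as np

S, A, K = 5, 2, 2
phi = np.random.default_rng(0).normal(size=(S, A, K))   # features, (S, A, K)
theta = np.array([-1.0, 0.5])                            # parameters, (K,)

# R[s, a] = sum_k theta[k] * phi[s, a, k]
R = phi @ theta                                  # shape (S, A)
R_einsum = np.einsum("sak,k->sa", phi, theta)    # same contraction, spelled out
```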

compute_utility_matrix(parameters=None)

Compute the full utility matrix for all state-action pairs.

Parameters:

parameters (Array | None) – Optional parameter vector. If None, uses true_parameters.

Returns:

Tensor of shape (num_states, num_actions) with flow utilities

Return type:

Array

property current_state: int | None

Return the current state index.

encode_states(states)

Encode flat state indices to continuous features.

By default, each index s is mapped to the normalized scalar s / (S - 1) and the result has shape (batch, 1). Override this method for multi-dimensional environments.

Parameters:

states (Array)

Return type:

Array
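
The default encoding described above can be sketched in NumPy as follows. `encode_states_default` is a hypothetical stand-in for the method, and it assumes S > 1 so that the division is defined.

```python
import numpy as np

def encode_states_default(states, num_states):
    """Hypothetical stand-in: map indices s to s / (S - 1), shape (batch, 1)."""
    states = np.asarray(states, dtype=float)
    return (states / (num_states - 1)).reshape(-1, 1)

encoded = encode_states_default([0, 2, 4], num_states=5)
# encoded is [[0.0], [0.5], [1.0]]
```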

generate_panel(n_individuals=1000, n_periods=100, seed=42, as_dataframe=False)

Generate synthetic panel data from this environment.

Computes the optimal policy from the true parameters and simulates trajectories for multiple individuals.

Parameters:
  • n_individuals (int) – Number of individuals to simulate.

  • n_periods (int) – Number of time periods per individual.

  • seed (int) – Random seed for reproducibility.

  • as_dataframe (bool) – If True, return a DataFrame with human-readable columns via _state_to_record().

Returns:

Panel object, or DataFrame if as_dataframe=True.

Return type:

TrajectoryPanel | DataFrame
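
This page does not show how the policy is computed. As a rough illustration of the two steps described above, the sketch below assumes the standard dynamic logit model, with V(s) = σ log Σ_a exp(Q(s, a) / σ) and choice probabilities proportional to exp(Q(s, a) / σ). The functions `soft_value_iteration` and `simulate` are hypothetical and are not the library's implementation.

```python
import numpy as np

def soft_value_iteration(P, R, beta=0.95, sigma=1.0, tol=1e-10, max_iter=10_000):
    """Assumed logit DDC solver: returns (V, policy) with policy[s, a] = Pr(a | s)."""
    S = R.shape[0]
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + beta * sum_s' P[a, s, s'] * V[s']
        Q = R + beta * np.einsum("ast,t->sa", P, V)
        m = Q.max(axis=1)
        # Numerically stable sigma * logsumexp(Q / sigma)
        V_new = m + sigma * np.log(np.exp((Q - m[:, None]) / sigma).sum(axis=1))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + beta * np.einsum("ast,t->sa", P, V)
    policy = np.exp((Q - Q.max(axis=1, keepdims=True)) / sigma)
    return V, policy / policy.sum(axis=1, keepdims=True)

def simulate(P, policy, n_individuals, n_periods, rng):
    """Draw state and action paths; initial states are uniform (the documented default)."""
    A, S, _ = P.shape
    states = np.empty((n_individuals, n_periods), dtype=int)
    actions = np.empty_like(states)
    for i in range(n_individuals):
        s = rng.integers(S)
        for t in range(n_periods):
            a = rng.choice(A, p=policy[s])
            states[i, t], actions[i, t] = s, a
            s = rng.choice(S, p=P[a, s])
    return states, actions

S, A, K = 5, 2, 2
P = np.zeros((A, S, S))
P[:, np.arange(S), np.arange(S)] = 1.0                  # identity transitions
phi = np.random.default_rng(0).normal(size=(S, A, K))
R = phi @ np.array([-1.0, 0.5])
V, policy = soft_value_iteration(P, R)
states, actions = simulate(P, policy, n_individuals=10, n_periods=20,
                           rng=np.random.default_rng(1))
```

With identity transitions each simulated individual stays in its initial state, so only the action column varies over time.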

get_true_parameter_vector()

Return true parameters as a tensor in canonical order.

Return type:

Array

classmethod info()[source]

Return metadata about this environment.

Subclasses should override this to provide name, description, source, n_states, n_actions, parameter details, etc.

Return type:

dict[str, Any]

property problem_spec: DDCProblem

Return the DDCProblem specification for this environment.

property state_dim: int

Dimensionality of the continuous state representation.