econirl.MCEIRL

class econirl.MCEIRL(n_states=90, n_actions=2, discount=0.99, feature_matrix=None, feature_names=None, se_method='bootstrap', n_bootstrap=100, inner_max_iter=10000, verbose=False)

Bases: object

Sklearn-style Maximum Causal Entropy IRL estimator.

Maximum Causal Entropy IRL (Ziebart 2010) recovers reward function parameters from demonstrated behavior, properly accounting for the causal structure of sequential decisions.

Parameters:
  • n_states (int, default=90) – Number of discrete states.

  • n_actions (int, default=2) – Number of discrete actions.

  • discount (float, default=0.99) – Time discount factor (beta). Values below 0.999 are recommended for numerical stability.

  • feature_matrix (numpy.ndarray, optional) – Feature matrix. State-only features have shape (n_states, n_features). Action-dependent features have shape (n_states, n_actions, n_features). For multi-action models, fit raises an error if neither feature_matrix nor reward is supplied, because the old implicit state-index fallback is not a validated structural specification.

  • feature_names (list[str], optional) – Names for each feature.

  • se_method (str, default="bootstrap") – Method for standard errors: "bootstrap", "asymptotic", or "hessian".

  • n_bootstrap (int, default=100) – Number of bootstrap samples for SE computation.

  • verbose (bool, default=False) – Print progress messages.

  • inner_max_iter (int, default=10000) – Maximum number of iterations for the inner loop.

Variables:
  • params (dict) – Estimated reward parameters {name: value}.

  • se (dict) – Standard errors for each parameter.

  • coef (numpy.ndarray) – Coefficients as array.

  • reward (numpy.ndarray) – Recovered reward R(s) for each state.

  • policy (numpy.ndarray) – Learned policy π(a|s), shape (n_states, n_actions).

  • value_function (numpy.ndarray) – Value function V(s) for each state.

  • state_visitation (numpy.ndarray) – Expected state visitation frequencies.

  • log_likelihood (float) – Log-likelihood of the data under learned model.

  • converged (bool) – Whether optimization converged.
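The policy and value_function attributes correspond to the soft (entropy-regularized) Bellman equations of Maximum Causal Entropy IRL. The following is a minimal sketch of that recursion in NumPy, not the library's implementation. The function name soft_value_iteration, the per-action transition layout (n_actions, n_states, n_states), and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def soft_value_iteration(reward, transitions, discount=0.99, tol=1e-8, max_iter=10000):
    """Hypothetical helper: soft Bellman recursion.

    reward:      array of shape (n_states, n_actions)
    transitions: array of shape (n_actions, n_states, n_states), assumed layout
    """
    n_states, _ = reward.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Q(s, a) = R(s, a) + discount * sum_s' P(s' | s, a) V(s')
        q = reward + discount * np.einsum("ast,t->sa", transitions, v)
        # V(s) = log sum_a exp Q(s, a), computed with the max trick for stability
        q_max = q.max(axis=1)
        v_new = q_max + np.log(np.exp(q - q_max[:, None]).sum(axis=1))
        done = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if done:
            break
    # pi(a | s) is proportional to exp(Q(s, a) - V(s)); renormalize for safety
    q = reward + discount * np.einsum("ast,t->sa", transitions, v)
    policy = np.exp(q - v[:, None])
    policy /= policy.sum(axis=1, keepdims=True)
    return v, policy

# Toy problem with made-up numbers: 3 states, 2 actions
rng = np.random.default_rng(0)
P = rng.random((2, 3, 3))
P /= P.sum(axis=2, keepdims=True)  # make each row a probability distribution
R = rng.normal(size=(3, 2))
v, pi = soft_value_iteration(R, P, discount=0.9)
```

Each row of pi is a probability distribution over actions, matching the documented shape (n_states, n_actions) of policy.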

Examples

>>> import numpy as np
>>> from econirl.estimators import MCEIRL
>>> from econirl.datasets import load_rust_bus
>>>
>>> df = load_rust_bus()
>>>
>>> # State features: linear and quadratic mileage cost
>>> n_states = 90
>>> s = np.arange(n_states)
>>> features = np.column_stack([s / 100, (s / 100) ** 2])
>>>
>>> model = MCEIRL(
...     n_states=n_states,
...     discount=0.99,
...     feature_matrix=features,
...     feature_names=["linear", "quadratic"],
...     verbose=True,
... )
>>> model.fit(df, state="mileage_bin", action="replaced", id="bus_id")
>>> print(model.summary())
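The two feature_matrix shapes described under Parameters can be built as follows. This is a sketch using NumPy only; the three-feature layout (mileage costs on action 0, a constant cost on action 1) is a hypothetical specification chosen for illustration, not one prescribed by the library.

```python
import numpy as np

n_states, n_actions = 90, 2
s = np.arange(n_states)

# State-only features: shape (n_states, n_features)
state_features = np.column_stack([s / 100, (s / 100) ** 2])

# Action-dependent features: shape (n_states, n_actions, n_features).
# Hypothetical layout: action 0 (keep) carries the mileage costs,
# action 1 (replace) carries a constant replacement cost.
n_features = 3
action_features = np.zeros((n_states, n_actions, n_features))
action_features[:, 0, 0] = -s / 100          # linear mileage cost when keeping
action_features[:, 0, 1] = -(s / 100) ** 2   # quadratic mileage cost when keeping
action_features[:, 1, 2] = -1.0              # replacement cost indicator

print(state_features.shape)   # (90, 2)
print(action_features.shape)  # (90, 2, 3)
```

Either array could then be passed as feature_matrix, with feature_names listing one name per entry along the last axis.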

References

Ziebart, B. D. (2010). Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, CMU.

__init__(n_states=90, n_actions=2, discount=0.99, feature_matrix=None, feature_names=None, se_method='bootstrap', n_bootstrap=100, inner_max_iter=10000, verbose=False)
Parameters:
  • n_states (int)

  • n_actions (int)

  • discount (float)

  • feature_matrix (ndarray | None)

  • feature_names (list[str] | None)

  • se_method (Literal['bootstrap', 'asymptotic', 'hessian'])

  • n_bootstrap (int)

  • inner_max_iter (int)

  • verbose (bool)

fit(data, state=None, action=None, id=None, transitions=None, reward=None)

Fit the MCE IRL estimator.

Parameters:
  • data (pandas.DataFrame or Panel or TrajectoryPanel) – Panel data with demonstrations. When a DataFrame is passed, state, action, and id column names are required. When a Panel/TrajectoryPanel is passed, column names are ignored.

  • state (str, optional) – Column name for state variable (required for DataFrame input).

  • action (str, optional) – Column name for action variable (required for DataFrame input).

  • id (str, optional) – Column name for individual/trajectory identifier (required for DataFrame input).

  • transitions (numpy.ndarray, optional) – Pre-estimated transition matrix of shape (n_states, n_states). If None, the transitions are estimated from the data.

  • reward (RewardSpec, optional) – Reward/utility specification. If provided, overrides the feature_matrix and feature_names parameters passed at construction time.

Returns:

self – Fitted estimator.

Return type:

MCEIRL
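A pre-estimated transitions matrix of shape (n_states, n_states) can be obtained by counting observed state-to-state moves within each trajectory and normalizing each row. The sketch below illustrates that idea with NumPy. It is not necessarily how fit estimates transitions internally; the helper name estimate_transitions and the self-loop treatment of unvisited states are assumptions.

```python
import numpy as np

def estimate_transitions(ids, states, n_states):
    """Hypothetical helper: frequency estimate of P(s' | s).

    Assumes rows are sorted by id and then by time, so consecutive rows
    with the same id form one observed transition.
    """
    ids = np.asarray(ids)
    states = np.asarray(states)
    counts = np.zeros((n_states, n_states))
    same_unit = ids[1:] == ids[:-1]  # skip pairs that cross trajectories
    np.add.at(counts, (states[:-1][same_unit], states[1:][same_unit]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    visited = row_sums > 0
    # Assumption: never-visited states get a self-loop so every row sums to 1
    return np.where(visited, counts / np.where(visited, row_sums, 1), np.eye(n_states))

# Made-up data: two trajectories over three states
P = estimate_transitions(ids=[1, 1, 1, 2, 2], states=[0, 1, 1, 0, 2], n_states=3)
```

The resulting row-stochastic array has the documented shape and could be supplied through the transitions argument.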

property reward_matrix_: ndarray | None

Structural reward matrix R(s,a) of shape (n_states, n_actions).

Computes the reward matrix from the fitted parameters and the reward function. Returns None if the model has not been fitted.

predict_proba(states)

Predict choice probabilities.

Parameters:

states (numpy.ndarray) – Array of state indices.

Returns:

proba – Choice probabilities, shape (len(states), n_actions).

Return type:

numpy.ndarray

conf_int(alpha=0.05)

Compute confidence intervals for parameters.

Parameters:

alpha (float, default=0.05) – Significance level. The returned intervals have nominal coverage 1 - alpha (95% for the default).

Returns:

Confidence intervals as {param_name: (lower, upper)}.

Return type:

dict

Raises:

RuntimeError – If the model has not been fitted yet.
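For intuition, a (1 - alpha) interval built from point estimates and standard errors under a normal approximation looks like the sketch below. Whether conf_int uses this normal approximation or, for example, bootstrap percentiles is not stated here, so treat the formula as an assumption. The helper name normal_conf_int and the numbers are made up.

```python
from statistics import NormalDist

def normal_conf_int(params, se, alpha=0.05):
    """Hypothetical helper: estimate +/- z * se for each parameter."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return {
        name: (params[name] - z * se[name], params[name] + z * se[name])
        for name in params
    }

# Made-up estimates in the same {name: value} layout as params and se
params = {"linear": -1.2, "quadratic": 0.3}
se = {"linear": 0.1, "quadratic": 0.05}
ci = normal_conf_int(params, se, alpha=0.05)
```

The result has the same {param_name: (lower, upper)} layout as the documented return value.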

summary()

Generate formatted summary of results.

Return type:

str