econirl.MCEIRL

class econirl.MCEIRL(n_states=90, n_actions=2, discount=0.99, feature_matrix=None, feature_names=None, se_method='bootstrap', n_bootstrap=100, inner_max_iter=10000, verbose=False)[source]

Bases: object

Sklearn-style Maximum Causal Entropy IRL estimator.

Maximum Causal Entropy IRL (Ziebart 2010) recovers reward function parameters from demonstrated behavior, properly accounting for the causal structure of sequential decisions.

Parameters:

n_states (int, default=90) – Number of discrete states.
n_actions (int, default=2) – Number of discrete actions.
discount (float, default=0.99) – Time discount factor (beta). Use a value below 0.999 for numerical stability.
feature_matrix (numpy.ndarray, optional) – Feature matrix. State-only features have shape (n_states, n_features). Action-dependent features have shape (n_states, n_actions, n_features). For multi-action models, fit raises an error if neither feature_matrix nor reward is supplied, because the old implicit state-index fallback is not a validated structural specification.
feature_names (list[str], optional) – Names for each feature.
se_method (str, default="bootstrap") – Method for standard errors: "bootstrap", "asymptotic", or "hessian".
n_bootstrap (int, default=100) – Number of bootstrap samples for SE computation.
inner_max_iter (int, default=10000) – Maximum number of inner-loop iterations.
verbose (bool, default=False) – Print progress messages.
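
The two feature_matrix layouts described above can be sketched with plain NumPy. This is only an illustration of the expected shapes: the three-feature, action-dependent layout (mileage costs on action 0, a constant cost on action 1) is a hypothetical specification chosen for the example, not one shipped with the library.

```python
import numpy as np

n_states, n_actions = 90, 2
s = np.arange(n_states)

# State-only features: shape (n_states, n_features).
state_features = np.column_stack([s / 100, (s / 100) ** 2])

# Action-dependent features: shape (n_states, n_actions, n_features).
# Hypothetical layout (an assumption for illustration):
#   feature 0, 1 -> linear and quadratic mileage cost, active for action 0
#   feature 2    -> constant cost, active for action 1
n_features = 3
action_features = np.zeros((n_states, n_actions, n_features))
action_features[:, 0, 0] = -s / 100
action_features[:, 0, 1] = -(s / 100) ** 2
action_features[:, 1, 2] = -1.0

print(state_features.shape)   # (90, 2)
print(action_features.shape)  # (90, 2, 3)
```

Either array could then be passed as feature_matrix, with feature_names of matching length.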

Variables:

params (dict) – Estimated reward parameters {name: value}.
se (dict) – Standard errors for each parameter.
coef (numpy.ndarray) – Coefficients as array.
reward (numpy.ndarray) – Recovered reward R(s) for each state.
policy (numpy.ndarray) – Learned policy π(a|s), shape (n_states, n_actions).
value_function (numpy.ndarray) – Value function V(s) for each state.
state_visitation (numpy.ndarray) – Expected state visitation frequencies.
log_likelihood (float) – Log-likelihood of the data under the learned model.
converged (bool) – Whether optimization converged.
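
To make the relationship between reward, value_function, and policy concrete, the following is a minimal sketch of soft (maximum causal entropy) value iteration on a toy problem. It is not the library's solver: the function name soft_value_iteration, the (n_actions, n_states, n_states) transition layout, and the three-state toy problem are all assumptions made for illustration.

```python
import numpy as np


def soft_value_iteration(reward, transitions, discount, max_iter=10000, tol=1e-8):
    """Soft value iteration (illustrative sketch, not the econirl solver).

    reward:      array (n_states, n_actions), R(s, a)
    transitions: array (n_actions, n_states, n_states), P(s' | s, a)
    Returns (policy, V) with policy of shape (n_states, n_actions).
    """
    n_states, _ = reward.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q(s, a) = R(s, a) + discount * sum_{s'} P(s' | s, a) V(s')
        Q = reward + discount * np.einsum("ast,t->sa", transitions, V)
        # Soft maximum over actions (log-sum-exp), computed stably.
        Q_max = Q.max(axis=1)
        V_new = Q_max + np.log(np.exp(Q - Q_max[:, None]).sum(axis=1))
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < tol:
            break
    # Policy pi(a | s) = exp(Q(s, a) - V(s)), renormalized to absorb the
    # small residual left by the stopping tolerance.
    Q = reward + discount * np.einsum("ast,t->sa", transitions, V)
    policy = np.exp(Q - V[:, None])
    policy /= policy.sum(axis=1, keepdims=True)
    return policy, V


# Toy problem (hypothetical): action 0 advances the state, action 1 resets it.
n_states, n_actions = 3, 2
transitions = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    transitions[0, s, min(s + 1, n_states - 1)] = 1.0
    transitions[1, s, 0] = 1.0
reward = np.column_stack(
    [-np.arange(n_states, dtype=float), np.full(n_states, -2.0)]
)

policy, value_function = soft_value_iteration(reward, transitions, discount=0.9)
print(policy.round(3))
```

In this toy problem the reset action becomes more likely as the state index grows, which mirrors the replacement decision in the bus example.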

Examples

>>> import numpy as np
>>> from econirl.estimators import MCEIRL
>>> from econirl.datasets import load_rust_bus
>>>
>>> df = load_rust_bus()
>>>
>>> # State features: linear and quadratic mileage cost
>>> n_states = 90
>>> s = np.arange(n_states)
>>> features = np.column_stack([s / 100, (s / 100) ** 2])
>>>
>>> model = MCEIRL(
...     n_states=n_states,
...     discount=0.99,
...     feature_matrix=features,
...     feature_names=["linear", "quadratic"],
...     verbose=True,
... )
>>> model.fit(df, state="mileage_bin", action="replaced", id="bus_id")
>>> print(model.summary())

References

Ziebart, B. D. (2010). Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University.

__init__(n_states=90, n_actions=2, discount=0.99, feature_matrix=None, feature_names=None, se_method='bootstrap', n_bootstrap=100, inner_max_iter=10000, verbose=False)[source]

Parameters:

n_states (int)
n_actions (int)
discount (float)
feature_matrix (ndarray | None)
feature_names (list[str] | None)
se_method (Literal['bootstrap', 'asymptotic', 'hessian'])
n_bootstrap (int)
inner_max_iter (int)
verbose (bool)

fit(data, state=None, action=None, id=None, transitions=None, reward=None)[source]

Fit the MCE IRL estimator.

Parameters:

data (pandas.DataFrame or Panel or TrajectoryPanel) – Panel data with demonstrations. When a DataFrame is passed, state, action, and id column names are required. When a Panel/TrajectoryPanel is passed, column names are ignored.
state (str, optional) – Column name for state variable (required for DataFrame input).
action (str, optional) – Column name for action variable (required for DataFrame input).
id (str, optional) – Column name for individual/trajectory identifier (required for DataFrame input).
transitions (numpy.ndarray, optional) – Pre-estimated transition matrix (n_states, n_states). If None, estimated from data.
reward (RewardSpec, optional) – Reward/utility specification. If provided, overrides the feature_matrix and feature_names parameters passed at construction time.

Returns:

self – Fitted estimator.

Return type:

MCEIRL
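
When transitions is None, the matrix is estimated from the data. How the library does this is not documented here; one common approach, sketched below under stated assumptions, is to count consecutive state pairs within each trajectory and normalize each row. The helper name estimate_transitions, the pooling over actions, the requirement that rows are sorted by id and then by time, and the self-transition fallback for unvisited states are all assumptions of this sketch.

```python
import numpy as np


def estimate_transitions(states, ids, n_states):
    """Frequency estimate of P(s' | s), pooled over actions (sketch only).

    Assumes rows are sorted by trajectory id and then by time.
    """
    states = np.asarray(states)
    ids = np.asarray(ids)
    counts = np.zeros((n_states, n_states))

    # Only count pairs whose two rows belong to the same trajectory.
    same_unit = ids[1:] == ids[:-1]
    np.add.at(counts, (states[:-1][same_unit], states[1:][same_unit]), 1.0)

    row_sums = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

    # States never observed as an origin: fall back to a self-transition
    # so that every row is a valid probability distribution.
    unvisited = np.flatnonzero(row_sums.ravel() == 0)
    P[unvisited, unvisited] = 1.0
    return P


# Two short trajectories over three states (made-up data).
states = [0, 1, 1, 2, 0, 0, 1]
ids = [1, 1, 1, 1, 2, 2, 2]
P = estimate_transitions(states, ids, n_states=3)
print(P)
```

A matrix built this way has shape (n_states, n_states) and could be passed to fit through the transitions argument.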

property reward_matrix_: ndarray | None

Structural reward matrix R(s,a) of shape (n_states, n_actions).

Computes the reward matrix from the fitted parameters and the reward function. Returns None if the model has not been fitted.
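
For a reward that is linear in the parameters, the computation can be sketched as a contraction of the feature matrix with the coefficient vector. This is an assumption about the functional form, not a description of the library's code: a RewardSpec may define the reward differently, and the helper name linear_reward_matrix is hypothetical. State-only features are assumed here to give the same reward for every action.

```python
import numpy as np


def linear_reward_matrix(feature_matrix, coef, n_actions):
    """R(s, a) = features(s, a) . coef for a linear reward (sketch only)."""
    feature_matrix = np.asarray(feature_matrix, dtype=float)
    coef = np.asarray(coef, dtype=float)
    if feature_matrix.ndim == 2:
        # State-only features (n_states, n_features): same reward for all actions.
        r_state = feature_matrix @ coef
        return np.repeat(r_state[:, None], n_actions, axis=1)
    # Action-dependent features (n_states, n_actions, n_features).
    return feature_matrix @ coef


# Made-up features for 2 states, 2 actions, 2 features.
features = np.array(
    [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]],
    ]
)
coef = np.array([1.0, 2.0])
R = linear_reward_matrix(features, coef, n_actions=2)
print(R)
```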

predict_proba(states)[source]

Predict choice probabilities.

Parameters: states (numpy.ndarray) – Array of state indices.
Returns: proba – Choice probabilities, shape (len(states), n_actions).
Return type: numpy.ndarray

conf_int(alpha=0.05)[source]

Compute confidence intervals for parameters.

Parameters: alpha (float, default=0.05) – Significance level. Returns (1 - alpha) confidence intervals.
Returns: {param_name: (lower, upper)} confidence intervals.
Return type: dict
Raises: RuntimeError – If the model has not been fitted yet.
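
As an illustration of the returned structure, the sketch below builds (1 - alpha) intervals from parameter estimates and standard errors using a normal approximation. Whether conf_int uses this approximation or, for example, bootstrap percentiles is not stated in this reference, so treat the formula as an assumption. The helper name normal_conf_int and the numeric values are made up for the example.

```python
from statistics import NormalDist


def normal_conf_int(params, se, alpha=0.05):
    """Normal-approximation intervals: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        name: (params[name] - z * se[name], params[name] + z * se[name])
        for name in params
    }


# Hypothetical estimates and standard errors, shaped like params and se.
params = {"linear": -1.0, "quadratic": 0.5}
se = {"linear": 0.1, "quadratic": 0.2}

ci = normal_conf_int(params, se, alpha=0.05)
for name, (lower, upper) in ci.items():
    print(f"{name}: [{lower:.3f}, {upper:.3f}]")
```

A smaller alpha widens every interval, since the critical value z grows as the confidence level increases.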

summary()[source]

Generate formatted summary of results.

Return type: str