econirl.MaxEntIRL

class econirl.MaxEntIRL(n_states=90, n_actions=2, discount=0.99, se_method='asymptotic', verbose=False, feature_matrix=None, feature_names=None)

Bases: object

Sklearn-style MaxEnt IRL estimator for inverse reinforcement learning.

Maximum Entropy Inverse Reinforcement Learning (Ziebart et al., 2008) recovers reward function parameters from demonstrated behavior. It selects the maximum-entropy distribution over behavior whose expected feature counts match those observed in the demonstrations.

Unlike NFXP/CCP, which estimate utility parameters in a known model, MaxEnt IRL recovers the reward function that explains observed behavior, assuming the demonstrator is approximately optimal.
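The feature expectations being matched can be illustrated with a short sketch. It is not the library's implementation: the helper name `empirical_feature_expectations`, the toy feature matrix, and the undiscounted per-trajectory sum are all assumptions made for illustration.

```python
import numpy as np

def empirical_feature_expectations(trajectories, feature_matrix):
    """Average, over trajectories, of the state features summed along each path."""
    totals = np.zeros(feature_matrix.shape[1])
    for states in trajectories:
        # Indexing with the visited states picks one feature row per step.
        totals += feature_matrix[np.asarray(states)].sum(axis=0)
    return totals / len(trajectories)

# Hypothetical toy problem: 3 states, 2 features per state.
feature_matrix = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [1.0, 1.0]])
trajectories = [[0, 1, 2], [0, 0, 2]]  # state indices visited by two demonstrators

mu_hat = empirical_feature_expectations(trajectories, feature_matrix)
```

In a MaxEnt IRL fit, the reward weights are adjusted until the feature expectations implied by the model agree with a vector like `mu_hat`.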

Parameters:

n_states (int, default=90) – Number of discrete states.
n_actions (int, default=2) – Number of discrete actions.
discount (float, default=0.99) – Time discount factor (beta).
se_method (str, default="asymptotic") – Method for computing standard errors. Options: “robust”, “asymptotic”.
verbose (bool, default=False) – Whether to print progress messages during estimation.
feature_matrix (numpy.ndarray, optional) – State feature matrix of shape (n_states, n_features). If None, uses one-hot state encoding (n_features = n_states).
feature_names (list[str], optional) – Names for each feature. If None, uses “f0”, “f1”, etc.
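As a rough illustration of the `feature_matrix=None` default, the sketch below builds a one-hot encoding and the default feature names described above. The weights `theta` and the linear form R(s) = features @ theta are assumptions for illustration, not output of the library.

```python
import numpy as np

n_states = 4

# One-hot state encoding is the identity matrix, so n_features == n_states
# and every state carries its own weight.
one_hot = np.eye(n_states)

# Default names follow the documented pattern "f0", "f1", ...
names = [f"f{k}" for k in range(one_hot.shape[1])]

# Hypothetical weights, one per feature (not produced by the estimator).
theta = np.array([0.5, -1.0, 0.0, 2.0])

# Assumed linear reward: R(s) = sum_k theta_k * phi_k(s).
reward = one_hot @ theta
```

With one-hot features, the reward vector equals the weight vector, which is why the default amounts to estimating one free parameter per state.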

Variables:

params_ (dict) – Estimated reward parameters after fitting. Keys are feature names and values are point estimates (weights on features).
se_ (dict) – Standard errors for each parameter.
coef_ (numpy.ndarray) – Coefficients as a numpy array (sklearn convention).
reward_ (numpy.ndarray) – Recovered reward R(s) for each state, shape (n_states,).
log_likelihood_ (float) – Maximized log-likelihood value.
value_function_ (numpy.ndarray) – Estimated value function V(s) for each state.
transitions_ (numpy.ndarray) – Transition probability matrix (n_states x n_states).
converged_ (bool) – Whether the optimization converged.

Examples

>>> from econirl.estimators import MaxEntIRL
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create state features (e.g., distance to goal, obstacles)
>>> n_states = 100
>>> features = np.column_stack([
...     np.arange(n_states) / n_states,  # normalized state index
...     (np.arange(n_states) > 50).astype(float),  # high state indicator
... ])
>>>
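>>> # df is assumed to be a pandas.DataFrame of demonstrations with
>>> # "state", "action", and "agent_id" columns.
>>> model = MaxEntIRL(n_states=n_states, feature_matrix=features,
...                   feature_names=["distance", "high_state"])
>>> model.fit(df, state="state", action="action", id="agent_id")
>>> print(model.params_)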

References

Ziebart, B. D., Maas, A. L., Bagnell, J. A., & Dey, A. K. (2008). “Maximum entropy inverse reinforcement learning.” AAAI Conference on Artificial Intelligence.

__init__(n_states=90, n_actions=2, discount=0.99, se_method='asymptotic', verbose=False, feature_matrix=None, feature_names=None)

Initialize the MaxEnt IRL estimator.

Parameters:

n_states (int, default=90) – Number of discrete states.
n_actions (int, default=2) – Number of discrete actions.
discount (float, default=0.99) – Time discount factor (beta).
se_method (str, default="asymptotic") – Method for computing standard errors.
verbose (bool, default=False) – Whether to print progress messages.
feature_matrix (numpy.ndarray, optional) – State feature matrix of shape (n_states, n_features).
feature_names (list[str], optional) – Names for each feature.

fit(data, state, action, id, transitions=None)

Fit the MaxEnt IRL estimator to demonstration data.

Parameters:

data (pandas.DataFrame) – Panel data with expert demonstrations. Must contain columns for state, action, and individual id.
state (str) – Column name for the state variable.
action (str) – Column name for the action variable.
id (str) – Column name for the individual/trajectory identifier.
transitions (numpy.ndarray, optional) – Pre-estimated transition matrix of shape (n_states, n_states). If None, transitions are estimated from the data.

Returns:

self – Returns self for method chaining.

Return type:

MaxEntIRL
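When `transitions` is None, the transition matrix is estimated from the data. How the library does this is not documented here; one common approach, sketched below under that caveat, is to count observed state-to-state moves within each individual and normalize the rows. The helper `estimate_transitions`, its argument names, and the uniform fallback for unvisited states are hypothetical, and the rows are assumed to be in time order within each id.

```python
import numpy as np
import pandas as pd

def estimate_transitions(data, state_col, id_col, n_states):
    """Row-normalized counts of observed state-to-state moves."""
    # Next state within each individual's trajectory; the final row of
    # each trajectory has no successor and becomes NaN.
    next_state = data.groupby(id_col)[state_col].shift(-1)
    observed = next_state.notna()
    src = data.loc[observed, state_col].to_numpy(dtype=int)
    dst = next_state[observed].to_numpy(dtype=int)

    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (src, dst), 1.0)  # accumulates repeated (src, dst) pairs

    # Normalize each row; states never left in the data get a uniform row.
    row_sums = counts.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.where(row_sums > 0, counts / safe_sums, uniform)

# Hypothetical panel: two individuals, three states.
df = pd.DataFrame({
    "agent_id": [1, 1, 1, 2, 2],
    "state":    [0, 1, 1, 0, 2],
})
P = estimate_transitions(df, state_col="state", id_col="agent_id", n_states=3)
```

A matrix built this way has shape (n_states, n_states), the shape `fit` expects for a pre-estimated `transitions` argument.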

property value_: ndarray | None

Value function V(s) of shape (n_states,).

property reward_matrix_: ndarray | None

Reward matrix R(s,a) of shape (n_states, n_actions).

MaxEntIRL learns a state-only reward R(s). This property broadcasts it to all actions so that the shape matches the protocol requirement.
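The broadcast described above can be reproduced in plain NumPy. This is a sketch with hypothetical reward values, not the property's actual code.

```python
import numpy as np

n_states, n_actions = 3, 2
reward = np.array([0.0, 1.0, -0.5])  # hypothetical state-only reward R(s)

# Copy the state reward across the action axis: R(s, a) = R(s) for every a.
reward_matrix = np.tile(reward[:, None], (1, n_actions))
```

Every column of `reward_matrix` is identical, so the recovered reward never distinguishes between actions taken in the same state.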

conf_int(alpha=0.05)

Compute confidence intervals for parameters.

Parameters:

alpha (float, default=0.05) – Significance level. Returns (1 - alpha) confidence intervals.

Returns:

{param_name: (lower, upper)} confidence intervals.

Return type:

dict

Raises:

RuntimeError – If the model has not been fitted yet.
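For intuition about the returned dictionary, the sketch below builds normal-approximation (Wald) intervals from point estimates and standard errors. Whether `conf_int` uses exactly this formula is an assumption; the function `wald_conf_int` and the numbers are hypothetical.

```python
from statistics import NormalDist

def wald_conf_int(params, se, alpha=0.05):
    """Return {name: (lower, upper)} using estimate +/- z * standard error."""
    # Two-sided critical value, about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {
        name: (est - z * se[name], est + z * se[name])
        for name, est in params.items()
    }

# Hypothetical point estimates and standard errors keyed by feature name.
params = {"distance": -0.8, "high_state": 1.2}
se = {"distance": 0.1, "high_state": 0.25}

intervals = wald_conf_int(params, se, alpha=0.05)
```

A smaller `alpha` gives a larger critical value and therefore wider intervals.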

summary()

Generate a formatted summary of estimation results.

Returns:

Human-readable summary of the estimation.

Return type:

str

predict_proba(states)

Predict choice probabilities for given states.

Parameters:

states (numpy.ndarray) – Array of state indices.

Returns:

Choice probabilities of shape (len(states), n_actions). Each row sums to 1.

Return type:

numpy.ndarray
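The output shape and the row-sum property can be seen in a small sketch that turns action values into choice probabilities with a softmax. The softmax form is the standard MaxEnt policy, but the helper `softmax_choice_probs` and the `q_values` array are hypothetical and are not taken from the library.

```python
import numpy as np

def softmax_choice_probs(q_values, states):
    """Softmax over actions for each requested state index."""
    q = q_values[np.asarray(states)]
    # Subtract the row maximum before exponentiating for numerical stability.
    q = q - q.max(axis=1, keepdims=True)
    expq = np.exp(q)
    return expq / expq.sum(axis=1, keepdims=True)

# Hypothetical action values Q(s, a) for 3 states and 2 actions.
q_values = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 3.0]])

probs = softmax_choice_probs(q_values, states=[0, 2, 2])
```

Repeated state indices are allowed and yield repeated rows, so the result always has one row per entry of `states`.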