econirl.MaxMarginIRL

class econirl.MaxMarginIRL(n_states=90, n_actions=2, discount=0.99, n_features=None, features=None, feature_names=None, max_iterations=50, margin_tol=0.0001, se_method='asymptotic', verbose=False)[source]

Bases: object

Sklearn-style Max Margin IRL estimator (Abbeel & Ng 2004).

Maximum Margin Inverse Reinforcement Learning finds reward weights under which the expert policy outperforms every alternative policy by the largest possible margin. It solves the resulting quadratic program by iterative constraint generation.

This is useful when you have expert demonstrations and want to recover the underlying reward function that explains the behavior.
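As a rough illustration of the quantity being maximized, the sketch below computes discounted feature expectations for two fixed policies on a toy three-state chain and evaluates the margin between them for a given weight vector. The helper `feature_expectations`, the toy transition matrices, and the weights are all hypothetical and are not part of econirl; the estimator's internal computation may differ.

```python
import numpy as np

def feature_expectations(P_pi, features, start_dist, discount):
    """Discounted feature expectations: sum over t of discount**t * E[phi(s_t)]."""
    n_states = P_pi.shape[0]
    # The discounted state occupancy d solves d = start_dist + discount * P_pi.T @ d.
    occupancy = np.linalg.solve(np.eye(n_states) - discount * P_pi.T, start_dist)
    return features.T @ occupancy

# Toy setup (illustrative only): one-hot features, every episode starts in state 0.
features = np.eye(3)
start_dist = np.array([1.0, 0.0, 0.0])
discount = 0.9

# State-to-state transitions induced by each policy.
P_expert = np.array([[0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0],
                     [0.0, 0.0, 1.0]])  # walks right, then stays in state 2
P_other = np.eye(3)                     # never leaves its current state

mu_expert = feature_expectations(P_expert, features, start_dist, discount)
mu_other = feature_expectations(P_other, features, start_dist, discount)

# Candidate reward weights: only state 2 is rewarding.
w = np.array([0.0, 0.0, 1.0])
margin = w @ (mu_expert - mu_other)
```

In a constraint generation scheme, each newly found alternative policy contributes one such difference of feature expectations as a constraint, and the weights are re-solved until the margin stops improving.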

Parameters:

n_states (int, default=90) – Number of discrete states.
n_actions (int, default=2) – Number of discrete actions.
discount (float, default=0.99) – Time discount factor (gamma).
n_features (int, optional) – Number of reward features. If None, defaults to n_states (one-hot encoding).
features (numpy.ndarray, optional) – State feature matrix of shape (n_states, n_features). If None, uses identity matrix (one-hot state features).
feature_names (list[str], optional) – Names for each feature. If None, uses “feature_0”, “feature_1”, etc.
max_iterations (int, default=50) – Maximum constraint generation iterations.
margin_tol (float, default=1e-4) – Convergence tolerance on margin improvement.
se_method (str, default="asymptotic") – Method for computing standard errors.
verbose (bool, default=False) – Whether to print progress messages during estimation.
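To make the features argument concrete, here is a minimal sketch of two feature matrices with the documented shape (n_states, n_features): the identity default and a hypothetical two-column design. It assumes the reward is linear in the features, as in Abbeel & Ng (2004); the column choices and weights are made up for illustration.

```python
import numpy as np

n_states = 5

# Default described above: identity matrix, one feature per state.
one_hot = np.eye(n_states)

# Hypothetical custom design: an intercept and a normalized state index.
custom = np.column_stack([
    np.ones(n_states),
    np.arange(n_states) / (n_states - 1),
])

# Under a linear reward, R(s) is the feature row for s times the weights.
weights = np.array([0.5, -2.0])  # illustrative values, not estimates
reward = custom @ weights
```

A matrix like `custom` would be passed as `features=custom`, optionally with `feature_names=["intercept", "state_index"]`.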

Variables:

params (dict) – Estimated reward weights after fitting. Keys are feature names and values are point estimates.
se (dict) – Standard errors for each parameter.
coef (numpy.ndarray) – Coefficients as a numpy array (sklearn convention).
reward (numpy.ndarray) – Recovered reward R(s) for each state, shape (n_states,).
margin (float) – Achieved margin between expert and best alternative policy.
value_function (numpy.ndarray) – Estimated value function V(s) for each state.
transitions (numpy.ndarray) – Transition probability matrix (n_states x n_states).
converged (bool) – Whether the optimization converged.

Examples

>>> from econirl.estimators import MaxMarginIRL
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({
...     "agent_id": [0, 0, 1, 1],
...     "state": [10, 20, 15, 30],
...     "action": [0, 0, 0, 1],
... })
>>>
>>> model = MaxMarginIRL(n_states=90, n_actions=2)
>>> model.fit(df, state="state", action="action", id="agent_id")
>>> print(model.reward_)  # Recovered reward for each state

References

Abbeel, P., & Ng, A. Y. (2004). “Apprenticeship learning via inverse reinforcement learning.” In Proceedings of ICML.

__init__(n_states=90, n_actions=2, discount=0.99, n_features=None, features=None, feature_names=None, max_iterations=50, margin_tol=0.0001, se_method='asymptotic', verbose=False)[source]

Initialize the MaxMarginIRL estimator.

Parameters:

n_states (int, default=90) – Number of discrete states.
n_actions (int, default=2) – Number of discrete actions.
discount (float, default=0.99) – Time discount factor.
n_features (int, optional) – Number of reward features. If None, defaults to n_states.
features (numpy.ndarray, optional) – State feature matrix (n_states, n_features). If None, uses identity.
feature_names (list[str], optional) – Names for each feature.
max_iterations (int, default=50) – Maximum constraint generation iterations.
margin_tol (float, default=1e-4) – Convergence tolerance on margin.
se_method (str, default="asymptotic") – Method for computing standard errors.
verbose (bool, default=False) – Whether to print progress messages.

fit(data, state, action, id, transitions=None)[source]

Fit the MaxMarginIRL estimator to expert demonstration data.

Parameters:

data (pandas.DataFrame) – Panel data with expert demonstrations. Must contain columns for state, action, and individual id.
state (str) – Column name for the state variable.
action (str) – Column name for the action variable.
id (str) – Column name for the individual/agent identifier.
transitions (numpy.ndarray, optional) – Pre-estimated transition matrix of shape (n_states, n_states). If None, transitions are estimated from the data.

Returns:

self – Returns self for method chaining.

Return type:

MaxMarginIRL
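When transitions is None, the matrix is estimated from the data. One plausible way to do that is a frequency count over consecutive observations of the same agent, sketched below. The helper `estimate_transitions`, the requirement that rows are sorted by agent and then by time, and the uniform fallback for unvisited states are assumptions of this sketch, not documented econirl behavior.

```python
import numpy as np

def estimate_transitions(agent_ids, states, n_states):
    """Frequency estimate of P(s' | s) from panel data sorted by agent, then time."""
    counts = np.zeros((n_states, n_states))
    for i in range(len(states) - 1):
        # Only count a transition when both rows belong to the same agent.
        if agent_ids[i] == agent_ids[i + 1]:
            counts[states[i], states[i + 1]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    safe_totals = np.where(row_totals > 0, row_totals, 1.0)
    # States never observed as an origin fall back to a uniform row (an assumption).
    return np.where(row_totals > 0, counts / safe_totals, 1.0 / n_states)

agent_ids = [0, 0, 0, 1, 1]
states = [0, 1, 1, 0, 2]
P = estimate_transitions(agent_ids, states, n_states=3)
```

A matrix built this way has the (n_states, n_states) shape expected by the transitions argument, so it could be passed to fit directly.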

property value_: ndarray | None

Value function V(s) of shape (n_states,).

property reward_matrix_: ndarray | None

Reward matrix R(s,a) of shape (n_states, n_actions).

MaxMarginIRL learns a state-only reward R(s). This property broadcasts it to all actions so that the shape matches the protocol requirement.
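The broadcast described above amounts to repeating the state reward across the action axis. A minimal NumPy sketch, with made-up reward values:

```python
import numpy as np

reward = np.array([0.0, -1.0, 2.5])  # illustrative R(s) for three states
n_actions = 2

# Copy the state reward into every action column: shape (n_states, n_actions).
reward_matrix = np.repeat(reward[:, None], n_actions, axis=1)
```

Every column of `reward_matrix` is identical, which reflects that the recovered reward does not depend on the action.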

conf_int(alpha=0.05)[source]

Compute confidence intervals for parameters.

Parameters:

alpha (float, default=0.05) – Significance level. Returns (1 - alpha) confidence intervals.

Returns:

{param_name: (lower, upper)} confidence intervals.

Return type:

dict

Raises:

RuntimeError – If the model has not been fitted yet.
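For intuition, the sketch below builds intervals of the documented {param_name: (lower, upper)} form from point estimates and standard errors using a normal approximation. That conf_int uses normal quantiles is an assumption here, and `normal_conf_int` is a hypothetical helper, not an econirl function.

```python
from statistics import NormalDist

def normal_conf_int(params, se, alpha=0.05):
    """Two-sided (1 - alpha) intervals: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return {
        name: (params[name] - z * se[name], params[name] + z * se[name])
        for name in params
    }

# Illustrative inputs shaped like the params and se dictionaries described above.
intervals = normal_conf_int({"feature_0": 1.0}, {"feature_0": 0.5})
lower, upper = intervals["feature_0"]
```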

summary()[source]

Generate a formatted summary of estimation results.

Returns:

Human-readable summary of the estimation.

Return type:

str

predict_proba(states)[source]

Predict choice probabilities for given states.

Parameters:

states (numpy.ndarray) – Array of state indices.

Returns:

Choice probabilities of shape (len(states), n_actions). Each row sums to 1.

Return type:

numpy.ndarray
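This page does not say how action values are turned into probabilities. One common choice is a softmax over per-action values, sketched below purely as an assumption; `softmax_choice_probs` and the `q_values` array are hypothetical. The sketch shows the documented output contract: one row per requested state, each row summing to 1.

```python
import numpy as np

def softmax_choice_probs(q_values, states):
    """Softmax over action values for the requested state indices."""
    q = q_values[np.asarray(states)]
    # Subtract the row maximum before exponentiating for numerical stability.
    q = q - q.max(axis=1, keepdims=True)
    exp_q = np.exp(q)
    return exp_q / exp_q.sum(axis=1, keepdims=True)

# Illustrative action values for two states and two actions.
q_values = np.array([[0.0, 0.0],
                     [1.0, 0.0]])
probs = softmax_choice_probs(q_values, [0, 1, 1])
```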