econirl.MaxMarginIRL

class econirl.MaxMarginIRL(n_states=90, n_actions=2, discount=0.99, n_features=None, features=None, feature_names=None, max_iterations=50, margin_tol=0.0001, se_method='asymptotic', verbose=False)[source]

Bases: object

Sklearn-style Max Margin IRL estimator (Abbeel & Ng 2004).

Maximum Margin Inverse Reinforcement Learning finds reward weights under which the expert policy outperforms every alternative policy by as large a margin as possible. It solves the resulting quadratic program by iterative constraint generation, adding one competing policy per iteration.

This is useful when you have expert demonstrations and want to recover the underlying reward function that explains the behavior.
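
For orientation, the max-margin step in the cited Abbeel and Ng (2004) paper can be written as the program below. Here $\mu_E$ is the expert's discounted feature expectation, $\mu^{(j)}$ are the feature expectations of the policies generated so far, $\phi$ is the state feature map, and $t$ is the margin. Whether econirl solves exactly this form is an assumption; the symbols are illustrative and not part of the library's API.

```latex
\begin{aligned}
\max_{t,\; w} \quad & t \\
\text{s.t.} \quad & w^\top \mu_E \;\ge\; w^\top \mu^{(j)} + t, \qquad j = 0, \dots, i-1, \\
& \lVert w \rVert_2 \le 1,
\end{aligned}
\qquad\text{where}\qquad
\mu(\pi) = \mathbb{E}\!\left[\, \sum_{\tau=0}^{\infty} \gamma^{\tau}\, \phi(s_\tau) \;\middle|\; \pi \right].
```

Each constraint-generation iteration contributes one such policy $j$, and max_iterations caps how many are added.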

Parameters:
  • n_states (int, default=90) – Number of discrete states.

  • n_actions (int, default=2) – Number of discrete actions.

  • discount (float, default=0.99) – Time discount factor (gamma).

  • n_features (int, optional) – Number of reward features. If None, uses n_states (one-hot encoding).

  • features (numpy.ndarray, optional) – State feature matrix of shape (n_states, n_features). If None, uses identity matrix (one-hot state features).

  • feature_names (list[str], optional) – Names for each feature. If None, uses “feature_0”, “feature_1”, etc.

  • max_iterations (int, default=50) – Maximum constraint generation iterations.

  • margin_tol (float, default=1e-4) – Convergence tolerance on margin improvement.

  • se_method (str, default="asymptotic") – Method for computing standard errors.

  • verbose (bool, default=False) – Whether to print progress messages during estimation.
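
The features and feature_names parameters above can be illustrated with a small NumPy sketch. It assumes the reward is linear in the features, R(s) = features[s] · weights, as in the cited paper; the two hand-built features and the weight values are hypothetical and chosen only for illustration.

```python
import numpy as np

n_states = 90

# Default case described above: one-hot features, i.e. the identity matrix,
# so every state gets its own reward weight.
one_hot = np.eye(n_states)

# Hypothetical alternative: two hand-built features per state
# (a normalized state index and its square).
s = np.arange(n_states) / (n_states - 1)
features = np.column_stack([s, s ** 2])   # shape (n_states, 2)
feature_names = ["index", "index_sq"]

# Assumed linear reward: R(s) is the feature row of s times the weights.
weights = np.array([-1.0, 0.5])           # illustrative values, not estimates
reward = features @ weights               # shape (n_states,)
```

A matrix built this way would presumably be passed as MaxMarginIRL(n_states=90, n_actions=2, features=features, feature_names=feature_names).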

Variables:
  • params (dict) – Estimated reward weights after fitting. Keys are feature names and values are point estimates.

  • se (dict) – Standard errors for each parameter.

  • coef (numpy.ndarray) – Coefficients as a numpy array (sklearn convention).

  • reward (numpy.ndarray) – Recovered reward R(s) for each state, shape (n_states,).

  • margin (float) – Achieved margin between expert and best alternative policy.

  • value_function (numpy.ndarray) – Estimated value function V(s) for each state.

  • transitions (numpy.ndarray) – Transition probability matrix of shape (n_states, n_states).

  • converged (bool) – Whether the optimization converged.

Examples

>>> from econirl.estimators import MaxMarginIRL
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({
...     "agent_id": [0, 0, 1, 1],
...     "state": [10, 20, 15, 30],
...     "action": [0, 0, 0, 1],
... })
>>>
>>> model = MaxMarginIRL(n_states=90, n_actions=2)
>>> model.fit(df, state="state", action="action", id="agent_id")
>>> print(model.reward_)  # Recovered reward for each state

References

Abbeel, P., & Ng, A. Y. (2004). “Apprenticeship learning via inverse reinforcement learning.” In Proceedings of ICML.

__init__(n_states=90, n_actions=2, discount=0.99, n_features=None, features=None, feature_names=None, max_iterations=50, margin_tol=0.0001, se_method='asymptotic', verbose=False)[source]

Initialize the MaxMarginIRL estimator.

Parameters:
  • n_states (int, default=90) – Number of discrete states.

  • n_actions (int, default=2) – Number of discrete actions.

  • discount (float, default=0.99) – Time discount factor.

  • n_features (int, optional) – Number of reward features. If None, defaults to n_states.

  • features (numpy.ndarray, optional) – State feature matrix (n_states, n_features). If None, uses identity.

  • feature_names (list[str], optional) – Names for each feature.

  • max_iterations (int, default=50) – Maximum constraint generation iterations.

  • margin_tol (float, default=1e-4) – Convergence tolerance on margin.

  • se_method (str, default="asymptotic") – Method for computing standard errors.

  • verbose (bool, default=False) – Whether to print progress messages.

fit(data, state, action, id, transitions=None)[source]

Fit the MaxMarginIRL estimator to expert demonstration data.

Parameters:
  • data (pandas.DataFrame) – Panel data with expert demonstrations. Must contain columns for state, action, and individual id.

  • state (str) – Column name for the state variable.

  • action (str) – Column name for the action variable.

  • id (str) – Column name for the individual/agent identifier.

  • transitions (numpy.ndarray, optional) – Pre-estimated transition matrix of shape (n_states, n_states). If None, transitions are estimated from the data.

Returns:

self – The fitted estimator, returned to allow method chaining.

Return type:

MaxMarginIRL
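
When transitions is None, the matrix is estimated from the data. The documentation does not say how, so the following is only a minimal sketch of one common approach, a count-based estimate of P(s' | s) from consecutive rows within each agent. The helper name estimate_transitions, the assumption that rows are time-ordered within each agent, and the uniform fallback for unvisited states are all hypothetical.

```python
import numpy as np
import pandas as pd

def estimate_transitions(df, state, id_col, n_states):
    """Count-based estimate of P(s' | s), shape (n_states, n_states)."""
    # Next state within each agent's own trajectory (rows assumed time-ordered).
    nxt = df.groupby(id_col)[state].shift(-1)
    pairs = pd.DataFrame({"s": df[state], "s_next": nxt}).dropna().astype(int)

    # Tally observed (s, s') pairs.
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (pairs["s"].to_numpy(), pairs["s_next"].to_numpy()), 1.0)

    # Normalize rows; states never observed as an origin get a uniform row.
    row_sums = counts.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.where(row_sums > 0, counts / safe_sums, uniform)

df = pd.DataFrame({
    "agent_id": [0, 0, 0, 1, 1],
    "state": [0, 1, 2, 0, 2],
})
P = estimate_transitions(df, state="state", id_col="agent_id", n_states=3)
```

A matrix of this shape could then be supplied through the transitions argument of fit instead of relying on the built-in estimate.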

property value_: ndarray | None

Value function V(s) of shape (n_states,).
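
As a rough illustration of what a value function over states means here, the sketch below evaluates V for a state-only reward under a fixed (n_states, n_states) transition matrix by solving the linear Bellman equation V = R + discount · P V. This is an assumption about the setup, not a description of how the library computes value_; the function name evaluate_value is hypothetical.

```python
import numpy as np

def evaluate_value(reward, transitions, discount=0.99):
    """Solve V = R + discount * P @ V for V, shape (n_states,)."""
    n_states = reward.shape[0]
    return np.linalg.solve(np.eye(n_states) - discount * transitions, reward)

reward = np.array([0.0, 1.0])
transitions = np.array([[0.0, 1.0],   # state 0 always moves to state 1
                        [0.0, 1.0]])  # state 1 is absorbing
V = evaluate_value(reward, transitions, discount=0.5)
```

In this toy case state 1 earns 1 per period forever, so V[1] = 1 / (1 - 0.5) = 2, and V[0] = 0 + 0.5 · 2 = 1.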

property reward_matrix_: ndarray | None

Reward matrix R(s,a) of shape (n_states, n_actions).

MaxMarginIRL learns a state-only reward R(s). This property repeats that reward across all actions, so that R(s, a) = R(s) for every a and the result has the (n_states, n_actions) shape the protocol requires.
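
The broadcast described above amounts to copying the state reward into every action column. A minimal NumPy sketch, with illustrative values and no claim about the library's internal implementation:

```python
import numpy as np

n_states, n_actions = 4, 2
reward = np.array([0.0, -1.0, 0.5, 2.0])   # state-only reward R(s)

# Add an action axis and copy the same value across every action.
reward_matrix = np.repeat(reward[:, None], n_actions, axis=1)
```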

conf_int(alpha=0.05)[source]

Compute confidence intervals for parameters.

Parameters:

alpha (float, default=0.05) – Significance level. Returns (1 - alpha) confidence intervals.

Returns:

Dictionary mapping each parameter name to its (lower, upper) confidence bounds.

Return type:

dict

Raises:

RuntimeError – If the model has not been fitted yet.
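
To make the role of alpha concrete, the sketch below builds normal-approximation intervals, estimate ± z · se with z the (1 - alpha/2) standard normal quantile, from dictionaries shaped like the documented params and se. That the library uses this exact formula is an assumption suggested by se_method="asymptotic"; the helper name normal_conf_int and the numbers are hypothetical.

```python
from statistics import NormalDist

def normal_conf_int(params, se, alpha=0.05):
    """Return {name: (lower, upper)} using a normal approximation."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha=0.05
    return {
        name: (est - z * se[name], est + z * se[name])
        for name, est in params.items()
    }

params = {"feature_0": 1.0, "feature_1": -0.5}    # illustrative estimates
se = {"feature_0": 0.1, "feature_1": 0.2}         # illustrative standard errors
ci = normal_conf_int(params, se, alpha=0.05)
```

With alpha=0.05 these are 95% intervals; a smaller alpha widens them.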

summary()[source]

Generate a formatted summary of estimation results.

Returns:

Human-readable summary of the estimation.

Return type:

str

predict_proba(states)[source]

Predict choice probabilities for given states.

Parameters:

states (numpy.ndarray) – Array of state indices.

Returns:

Choice probabilities of shape (len(states), n_actions). Each row sums to 1.

Return type:

numpy.ndarray
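
The documentation does not state how the probabilities are derived from the fitted model. One common choice, shown here purely as an assumption, is a softmax (logit) over per-action values; the array q_values and the function softmax_choice_probs are hypothetical and only demonstrate the documented output shape and the rows-sum-to-1 property.

```python
import numpy as np

def softmax_choice_probs(q_values, states):
    """Softmax over actions for the requested state indices."""
    q = q_values[np.asarray(states)]           # (len(states), n_actions)
    q = q - q.max(axis=1, keepdims=True)       # stabilize the exponentials
    expq = np.exp(q)
    return expq / expq.sum(axis=1, keepdims=True)

# Hypothetical action values for 3 states and 2 actions.
q_values = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 2.0]])
probs = softmax_choice_probs(q_values, states=[0, 2])
```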