econirl.MaxMarginIRL
- class econirl.MaxMarginIRL(n_states=90, n_actions=2, discount=0.99, n_features=None, features=None, feature_names=None, max_iterations=50, margin_tol=0.0001, se_method='asymptotic', verbose=False)
Bases: object
Sklearn-style Max Margin IRL estimator (Abbeel & Ng 2004).
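Maximum Margin Inverse Reinforcement Learning finds reward weights under which the expert policy's expected discounted reward exceeds that of every other policy by as large a margin as possible. It solves this quadratic program by iterative constraint generation: each iteration computes the best policy under the current reward weights, adds it as a new constraint, and re-solves.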
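For reference, the max-margin program of Abbeel & Ng (2004) can be written as below, with φ(s) the feature vector of state s (a row of the `features` matrix), γ the `discount`, μ_E the expert's discounted feature expectations, and μ^(j) those of the policy generated at iteration j. Whether this class uses exactly this normalization (‖w‖₂ ≤ 1) is an assumption.

```latex
\begin{aligned}
\mu(\pi) &= \mathbb{E}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t) \,\Big|\, \pi\Big],
\qquad R(s) = w^{\top}\phi(s), \\
\max_{t,\,w}\;& t
\quad \text{s.t.}\quad
w^{\top}\mu_E \ge w^{\top}\mu^{(j)} + t \;\; (j = 0,\dots,i-1),
\qquad \lVert w \rVert_2 \le 1.
\end{aligned}
```

In the paper, each iteration solves this program for (t, w), computes an optimal policy π^(i) for the reward R = wᵀφ, appends μ^(i) = μ(π^(i)) as a new constraint, and stops once t ≤ ε. Here `margin_tol` plays the role of the tolerance, although it is described as a tolerance on margin improvement.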
Use it when you have expert demonstrations and want to recover a reward function that explains the observed behavior.
- Parameters:
n_states (int, default=90) – Number of discrete states.
n_actions (int, default=2) – Number of discrete actions.
discount (float, default=0.99) – Time discount factor (gamma).
n_features (int, default=None) – Number of reward features. If None, uses n_states (one-hot encoding).
features (numpy.ndarray, optional) – State feature matrix of shape (n_states, n_features). If None, uses identity matrix (one-hot state features).
feature_names (list[str], optional) – Names for each feature. If None, uses “feature_0”, “feature_1”, etc.
max_iterations (int, default=50) – Maximum constraint generation iterations.
margin_tol (float, default=1e-4) – Convergence tolerance on margin improvement.
se_method (str, default="asymptotic") – Method for computing standard errors.
verbose (bool, default=False) – Whether to print progress messages during estimation.
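The documented defaults for `n_features`, `features` and `feature_names` can be summarized in a short sketch. `resolve_features` is a hypothetical helper, not part of econirl; it infers `n_features` from the column count of the feature matrix rather than validating a separately passed value.

```python
import numpy as np


def resolve_features(n_states, features=None, feature_names=None):
    # Hypothetical helper (not part of econirl): applies the documented defaults.
    if features is None:
        features = np.eye(n_states)  # one-hot state features, one column per state
    features = np.asarray(features, dtype=float)
    if features.shape[0] != n_states:
        raise ValueError("features must have shape (n_states, n_features)")
    n_features = features.shape[1]
    if feature_names is None:
        feature_names = [f"feature_{k}" for k in range(n_features)]
    if len(feature_names) != n_features:
        raise ValueError("need exactly one name per feature column")
    return features, list(feature_names)


phi, names = resolve_features(n_states=90)
print(phi.shape, names[:2])  # (90, 90) ['feature_0', 'feature_1']
```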
- Variables:
params_ (dict) – Estimated reward weights after fitting. Keys are feature names and values are point estimates.
se_ (dict) – Standard errors for each parameter.
coef_ (numpy.ndarray) – Coefficients as a numpy array (sklearn convention).
reward_ (numpy.ndarray) – Recovered reward R(s) for each state, shape (n_states,).
margin_ (float) – Achieved margin between the expert policy and the best alternative policy.
value_function_ (numpy.ndarray) – Estimated value function V(s) for each state.
transitions_ (numpy.ndarray) – Transition probability matrix, shape (n_states, n_states).
converged_ (bool) – Whether the optimization converged.
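Under the linear reward R(s) = φ(s)ᵀw of Abbeel & Ng (2004), the fitted reward, coefficient array and parameter dictionary should be related as sketched below. This relationship is inferred from the descriptions above rather than from econirl's source, and the numbers are made up.

```python
import numpy as np

# Made-up example: 3 states described by 2 features.
features = np.array([[1.0, 0.0],
                     [0.5, 0.5],
                     [0.0, 1.0]])
feature_names = ["feature_0", "feature_1"]
coef = np.array([0.6, -0.8])  # stands in for the coefficient array (unit L2 norm here)

reward = features @ coef  # R(s) = phi(s)^T w, one value per state
params = dict(zip(feature_names, coef.tolist()))

print(params)  # {'feature_0': 0.6, 'feature_1': -0.8}
```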
Examples
>>> from econirl.estimators import MaxMarginIRL
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({
...     "agent_id": [0, 0, 1, 1],
...     "state": [10, 20, 15, 30],
...     "action": [0, 0, 0, 1],
... })
>>>
>>> model = MaxMarginIRL(n_states=90, n_actions=2)
>>> model.fit(df, state="state", action="action", id="agent_id")
>>> print(model.reward_)  # Recovered reward for each state
References
- Abbeel, P., & Ng, A. Y. (2004). “Apprenticeship learning via inverse reinforcement learning.” In Proceedings of the 21st International Conference on Machine Learning (ICML).
- __init__(n_states=90, n_actions=2, discount=0.99, n_features=None, features=None, feature_names=None, max_iterations=50, margin_tol=0.0001, se_method='asymptotic', verbose=False)
Initialize the MaxMarginIRL estimator.
- Parameters:
n_states (int, default=90) – Number of discrete states.
n_actions (int, default=2) – Number of discrete actions.
discount (float, default=0.99) – Time discount factor.
n_features (int, optional) – Number of reward features. If None, defaults to n_states.
features (numpy.ndarray, optional) – State feature matrix (n_states, n_features). If None, uses identity.
feature_names (list[str], optional) – Names for each feature.
max_iterations (int, default=50) – Maximum constraint generation iterations.
margin_tol (float, default=1e-4) – Convergence tolerance on margin.
se_method (str, default="asymptotic") – Method for computing standard errors.
verbose (bool, default=False) – Whether to print progress messages.
- fit(data, state, action, id, transitions=None)
Fit the MaxMarginIRL estimator to expert demonstration data.
- Parameters:
data (pandas.DataFrame) – Panel data with expert demonstrations. Must contain columns for state, action, and individual id.
state (str) – Column name for the state variable.
action (str) – Column name for the action variable.
id (str) – Column name for the individual/agent identifier.
transitions (numpy.ndarray, optional) – Pre-estimated transition matrix of shape (n_states, n_states). If None, transitions are estimated from the data.
- Returns:
self – Returns self for method chaining.
- Return type:
MaxMarginIRL
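When `transitions` is None, `fit` estimates them from the data. One common approach, assumed here rather than taken from econirl's source, is to count observed moves between consecutive rows of the same agent and normalize each row. The same panel also yields the expert's discounted feature expectations μ_E used in the margin constraints. Both helpers are hypothetical, and they assume rows are sorted by agent and then by time.

```python
import numpy as np


def estimate_transitions(ids, states, n_states):
    # Hypothetical helper: empirical state-to-state transition matrix.
    counts = np.zeros((n_states, n_states))
    same_agent = ids[1:] == ids[:-1]  # only count moves within one agent
    np.add.at(counts, (states[:-1][same_agent], states[1:][same_agent]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # States never left in the data get a self-loop (an arbitrary choice).
    unseen = np.flatnonzero(row_sums[:, 0] == 0)
    P[unseen, unseen] = 1.0
    return P


def expert_feature_expectations(ids, states, features, discount):
    # Hypothetical helper: average over agents of sum_t discount**t * phi(s_t).
    mus = []
    for agent in np.unique(ids):
        s = states[ids == agent]
        weights = discount ** np.arange(len(s))
        mus.append(weights @ features[s])
    return np.mean(mus, axis=0)


# Same toy panel as in the Examples section.
ids = np.array([0, 0, 1, 1])
states = np.array([10, 20, 15, 30])
P = estimate_transitions(ids, states, n_states=90)
mu_E = expert_feature_expectations(ids, states, np.eye(90), discount=0.99)
print(P[10, 20], mu_E[20])  # 1.0 0.495
```

Ignoring actions when counting matches the documented (n_states, n_states) shape; an action-specific estimate would group counts by action as well.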
- property reward_matrix_: ndarray | None
Reward matrix R(s,a) of shape (n_states, n_actions).
MaxMarginIRL learns a state-only reward R(s). This property broadcasts it across actions, R(s, a) = R(s) for every action a, so that the result has the (n_states, n_actions) shape that the estimator protocol requires.
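The broadcasting step amounts to repeating the state reward along a new action axis. The reward values below are made up for illustration.

```python
import numpy as np

reward = np.array([0.0, -1.0, -2.0])  # made-up R(s) for 3 states
n_actions = 2

# R(s, a) = R(s) for every action a: copy the column once per action.
reward_matrix = np.repeat(reward[:, None], n_actions, axis=1)

print(reward_matrix.shape)  # (3, 2)
```

`np.broadcast_to(reward[:, None], (3, n_actions))` gives the same values without copying, but returns a read-only view.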
- conf_int(alpha=0.05)
Compute confidence intervals for parameters.
- Parameters:
alpha (float, default=0.05) – Significance level. Returns (1 - alpha) confidence intervals.
- Returns:
Dictionary mapping each parameter name to its (lower, upper) confidence interval: {param_name: (lower, upper)}.
- Return type:
dict
- Raises:
RuntimeError – If the model has not been fitted yet.
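With `se_method="asymptotic"`, a natural reading is a Wald interval, estimate ± z₁₋α/₂ · SE, built from the estimated parameters and their standard errors. This is an assumption about how `conf_int` works, sketched with the standard library only; `wald_conf_int` is a hypothetical name.

```python
from statistics import NormalDist


def wald_conf_int(params, se, alpha=0.05):
    # Two-sided standard-normal critical value (about 1.96 for alpha = 0.05).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return {name: (est - z * se[name], est + z * se[name])
            for name, est in params.items()}


ci = wald_conf_int({"feature_0": 0.6}, {"feature_0": 0.1})
lo, hi = ci["feature_0"]
print(round(lo, 3), round(hi, 3))  # 0.404 0.796
```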
- summary()
Generate a formatted summary of estimation results.
- Returns:
Human-readable summary of the estimation.
- Return type:
str
- predict_proba(states)
Predict choice probabilities for given states.
- Parameters:
states (numpy.ndarray) – Array of state indices.
- Returns:
Choice probabilities of shape (len(states), n_actions). Each row sums to 1.
- Return type:
numpy.ndarray
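How `predict_proba` maps the recovered reward to choice probabilities is not documented here. One common approach, assumed for this sketch, is a logit (softmax) over Q(s, a) obtained from a soft (log-sum-exp) Bellman iteration, as in dynamic discrete choice models with i.i.d. Gumbel shocks. Note that the sketch uses action-specific transitions of shape (n_actions, n_states, n_states), unlike the (n_states, n_states) matrix documented above; `logit_choice_probs` is a hypothetical helper.

```python
import numpy as np


def logit_choice_probs(reward, P, discount, states, n_iter=10_000, tol=1e-10):
    # Hypothetical sketch; P has assumed shape (n_actions, n_states, n_states).
    def q_values(V):
        # Q(s, a) = R(s) + discount * sum_{s'} P[a, s, s'] * V(s')
        return reward[:, None] + discount * (P @ V).T

    V = np.zeros(P.shape[1])
    for _ in range(n_iter):
        Q = q_values(V)
        m = Q.max(axis=1)
        V_new = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))  # stable log-sum-exp
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    Q = q_values(V)[states]
    expQ = np.exp(Q - Q.max(axis=1, keepdims=True))
    return expQ / expQ.sum(axis=1, keepdims=True)  # each row sums to 1


# Toy problem: action 0 stays put, action 1 jumps to state 0.
P = np.zeros((2, 3, 3))
P[0] = np.eye(3)
P[1][:, 0] = 1.0
reward = np.array([1.0, 0.0, -1.0])
probs = logit_choice_probs(reward, P, discount=0.9, states=np.array([0, 2]))
print(probs.shape)  # (2, 2)
```

In state 0 both actions lead to the same place, so the sketch assigns them equal probability; in state 2 jumping to the rewarding state 0 is the more likely choice.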