econirl.MCEIRL
- class econirl.MCEIRL(n_states=90, n_actions=2, discount=0.99, feature_matrix=None, feature_names=None, se_method='bootstrap', n_bootstrap=100, inner_max_iter=10000, verbose=False)[source]
Bases: object
Sklearn-style Maximum Causal Entropy IRL estimator.
Maximum Causal Entropy IRL (Ziebart 2010) recovers reward function parameters from demonstrated behavior, properly accounting for the causal structure of sequential decisions.
- Parameters:
n_states (int, default=90) – Number of discrete states.
n_actions (int, default=2) – Number of discrete actions.
discount (float, default=0.99) – Time discount factor (beta). Use <0.999 for numerical stability.
feature_matrix (numpy.ndarray, optional) – Feature matrix. State-only features have shape (n_states, n_features). Action-dependent features have shape (n_states, n_actions, n_features). For multi-action models, fit raises if neither feature_matrix nor reward is supplied; the old implicit state-index fallback is not a validated structural specification.
feature_names (list[str], optional) – Names for each feature.
se_method (str, default="bootstrap") – Method for standard errors: “bootstrap”, “asymptotic”, or “hessian”.
n_bootstrap (int, default=100) – Number of bootstrap samples for SE computation.
verbose (bool, default=False) – Print progress messages.
inner_max_iter (int, default=10000)
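The two feature_matrix layouts described above can be sketched with plain NumPy. This is only an illustration of the shapes: the particular features (a mileage cost for keeping, a constant for replacing), the values in theta, and the assumption that the reward is linear in the features are all hypothetical and not taken from the library.

```python
import numpy as np

n_states, n_actions = 90, 2
s = np.arange(n_states)

# State-only features: shape (n_states, n_features).
state_features = np.column_stack([s / 100, (s / 100) ** 2])

# Action-dependent features: shape (n_states, n_actions, n_features).
# Hypothetical layout: action 0 (keep) pays a mileage cost,
# action 1 (replace) pays a constant replacement cost.
action_features = np.zeros((n_states, n_actions, 2))
action_features[:, 0, 0] = -s / 100  # mileage cost when keeping
action_features[:, 1, 1] = -1.0      # replacement cost indicator

# Assuming a reward that is linear in the features, a parameter vector
# theta maps the feature tensor to a reward table R(s, a).
theta = np.array([1.0, 3.0])  # made-up values for illustration
reward_sa = action_features @ theta  # shape (n_states, n_actions)
```

A state-only matrix gives every action the same features in a given state, so an action-dependent tensor is the layout to reach for when the actions differ in cost.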
- Variables:
params (dict) – Estimated reward parameters {name: value}.
se (dict) – Standard errors for each parameter.
coef (numpy.ndarray) – Coefficients as array.
reward (numpy.ndarray) – Recovered reward R(s) for each state.
policy (numpy.ndarray) – Learned policy π(a|s), shape (n_states, n_actions).
value_function (numpy.ndarray) – Value function V(s) for each state.
state_visitation (numpy.ndarray) – Expected state visitation frequencies.
log_likelihood (float) – Log-likelihood of the data under learned model.
converged (bool) – Whether optimization converged.
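To make the relationship between reward, value_function, and policy concrete, the following is a minimal sketch of soft value iteration, the standard construction in maximum causal entropy models. It is not necessarily how econirl computes these attributes internally. The function name, the (n_actions, n_states, n_states) transition layout, and the tiny three-state example are assumptions made for illustration.

```python
import numpy as np

def soft_value_iteration(reward, transitions, discount, tol=1e-10, max_iter=10_000):
    """reward: (n_states, n_actions); transitions: (n_actions, n_states, n_states)."""
    V = np.zeros(reward.shape[0])
    for _ in range(max_iter):
        # Q(s, a) = R(s, a) + discount * sum_s' P(s' | s, a) V(s')
        Q = reward + discount * np.einsum("ast,t->sa", transitions, V)
        # Soft maximum over actions: a numerically stabilized log-sum-exp.
        Q_max = Q.max(axis=1)
        V_new = Q_max + np.log(np.exp(Q - Q_max[:, None]).sum(axis=1))
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    # The policy is a softmax over Q, so each row sums to one.
    Q = reward + discount * np.einsum("ast,t->sa", transitions, V)
    policy = np.exp(Q - Q.max(axis=1, keepdims=True))
    policy /= policy.sum(axis=1, keepdims=True)
    return V, policy

# Tiny made-up example: 3 states, 2 actions (0 = keep, 1 = replace).
reward = np.array([[0.0, -1.0], [-0.5, -1.0], [-2.0, -1.0]])
transitions = np.zeros((2, 3, 3))
transitions[0] = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]  # keep: drift upward
transitions[1] = [[1.0, 0.0, 0.0]] * 3                                # replace: reset to state 0
V, policy = soft_value_iteration(reward, transitions, discount=0.9)
```

In this toy problem the policy puts more weight on replacing in the last state, where keeping is costly and the state cannot improve on its own.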
Examples
>>> import numpy as np
>>> from econirl.estimators import MCEIRL
>>> from econirl.datasets import load_rust_bus
>>>
>>> df = load_rust_bus()
>>>
>>> # State features: linear and quadratic mileage cost
>>> n_states = 90
>>> s = np.arange(n_states)
>>> features = np.column_stack([s / 100, (s / 100) ** 2])
>>>
>>> model = MCEIRL(
...     n_states=n_states,
...     discount=0.99,
...     feature_matrix=features,
...     feature_names=["linear", "quadratic"],
...     verbose=True,
... )
>>> model.fit(df, state="mileage_bin", action="replaced", id="bus_id")
>>> print(model.summary())
References
- Ziebart, B. D. (2010). Modeling purposeful adaptive behavior with the
principle of maximum causal entropy. PhD thesis, CMU.
- __init__(n_states=90, n_actions=2, discount=0.99, feature_matrix=None, feature_names=None, se_method='bootstrap', n_bootstrap=100, inner_max_iter=10000, verbose=False)[source]
- fit(data, state=None, action=None, id=None, transitions=None, reward=None)[source]
Fit the MCE IRL estimator.
- Parameters:
data (pandas.DataFrame or Panel or TrajectoryPanel) – Panel data with demonstrations. When a DataFrame is passed, state, action, and id column names are required. When a Panel/TrajectoryPanel is passed, column names are ignored.
state (str, optional) – Column name for state variable (required for DataFrame input).
action (str, optional) – Column name for action variable (required for DataFrame input).
id (str, optional) – Column name for individual/trajectory identifier (required for DataFrame input).
transitions (numpy.ndarray, optional) – Pre-estimated transition matrix (n_states, n_states). If None, estimated from data.
reward (RewardSpec, optional) – Reward/utility specification. If provided, overrides the feature_matrix and feature_names parameters passed at construction time.
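When transitions is None the matrix is estimated from data. The estimator's exact procedure is not documented here, so the following is only a sketch of one common approach: count observed state-to-state moves within each individual and normalize each row. The helper name estimate_transitions, the uniform fallback for unvisited states, and the requirement that rows are sorted by id and then by time are all assumptions.

```python
import numpy as np

def estimate_transitions(ids, states, n_states):
    """Frequency estimate of P(s' | s) from a panel sorted by id, then time."""
    ids = np.asarray(ids)
    states = np.asarray(states)
    counts = np.zeros((n_states, n_states))
    # Only count consecutive rows that belong to the same individual.
    same_unit = ids[1:] == ids[:-1]
    np.add.at(counts, (states[:-1][same_unit], states[1:][same_unit]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Unvisited states get a uniform row so every row is a valid distribution.
    return np.where(row_sums > 0, counts / safe_sums, 1.0 / n_states)

# Made-up panel: two individuals, three states.
ids = [1, 1, 1, 2, 2]
states = [0, 1, 1, 0, 2]
P = estimate_transitions(ids, states, n_states=3)
```

The result has the documented (n_states, n_states) shape and could be passed as transitions. Note that this sketch pools all actions into one matrix, which is a simplification when an action such as replacement resets the state.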
- Returns:
self – Fitted estimator.
- Return type:
- property reward_matrix_: ndarray | None
Structural reward matrix R(s,a) of shape (n_states, n_actions).
Computes the reward matrix from the fitted parameters and the reward function. Returns None if the model has not been fitted.
- predict_proba(states)[source]
Predict choice probabilities.
- Parameters:
states (numpy.ndarray) – Array of state indices.
- Returns:
proba – Choice probabilities, shape (len(states), n_actions).
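One natural reading of the documented shapes is that predict_proba looks up rows of the fitted policy π(a|s) for the requested state indices. The sketch below shows that lookup with a made-up policy table; whether the method is implemented this way is an assumption.

```python
import numpy as np

# Made-up policy table with shape (n_states, n_actions) = (3, 2).
policy = np.array([[0.9, 0.1],
                   [0.6, 0.4],
                   [0.2, 0.8]])

# Integer state indices; repeats are allowed.
states = np.array([2, 0, 2])

# Row lookup gives one probability vector per requested state.
proba = policy[states]  # shape (len(states), n_actions)
```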
- Return type:
- conf_int(alpha=0.05)[source]
Compute confidence intervals for parameters.
- Parameters:
alpha (float, default=0.05) – Significance level. Returns (1 - alpha) confidence intervals.
- Returns:
{param_name: (lower, upper)} confidence intervals.
- Return type:
- Raises:
RuntimeError – If the model has not been fitted yet.
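For intuition about how alpha maps to interval width, here is a sketch of a normal-approximation interval built from point estimates and standard errors. The helper normal_conf_int and the parameter values are hypothetical, and the library may construct its intervals differently (for example, from bootstrap percentiles when se_method="bootstrap").

```python
from statistics import NormalDist

def normal_conf_int(params, se, alpha=0.05):
    """Return {name: (lower, upper)} using estimate +/- z * standard error."""
    # Two-sided critical value, about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        name: (params[name] - z * se[name], params[name] + z * se[name])
        for name in params
    }

# Made-up estimates and standard errors for illustration.
params = {"linear": 1.0, "quadratic": 3.0}
se = {"linear": 0.1, "quadratic": 0.5}
ci = normal_conf_int(params, se, alpha=0.05)
```

A smaller alpha gives a larger critical value and therefore wider intervals.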