econirl.MCEIRLNeural
- class econirl.MCEIRLNeural(n_states=None, n_actions=None, discount=0.95, reward_type='state_action', reward_hidden_dim=64, reward_num_layers=2, max_epochs=200, lr=0.001, inner_solver='hybrid', inner_tol=1e-08, inner_max_iter=5000, state_encoder=None, state_dim=None, feature_names=None, anchor_action=None, absorbing_state=None, seed=0, verbose=False)[source]
Bases: NeuralEstimatorMixin
Neural Maximum Causal Entropy IRL.
Learns a neural reward function using the MCE-IRL objective: maximize E_expert[R] - log Z(R)
where Z(R) is the partition function (soft value at initial state).
Supports two reward types:
reward_type="state_action" (default): R(s,a) via a network that takes [state_features, action_onehot]. This is more general and correctly handles action-dependent rewards.
reward_type="state": R(s) broadcast to all actions (original).
For v1, transitions must be available so that exact soft value iteration and state visitation frequencies can be computed.
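The exact soft value iteration mentioned above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's inner solver: the function names `logsumexp` and `soft_value_iteration` are hypothetical, and only the plain value-iteration variant is shown (the "hybrid" solver is not reproduced here).

```python
import numpy as np

def logsumexp(x, axis):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def soft_value_iteration(reward, transitions, discount=0.95, tol=1e-8, max_iter=5000):
    """Hypothetical helper, not part of econirl.

    reward: array of shape (n_states, n_actions)
    transitions: array of shape (n_actions, n_states, n_states), P(s'|s,a)
    Returns the soft value V(s) and the softmax policy P(a|s).
    """
    n_states, _ = reward.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q(s,a) = R(s,a) + discount * sum_s' P(s'|s,a) V(s')
        Q = reward + discount * (transitions @ V).T
        V_new = logsumexp(Q, axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    Q = reward + discount * (transitions @ V).T
    # Normalize with the log-sum-exp of the final Q so rows sum to one.
    policy = np.exp(Q - logsumexp(Q, axis=1)[:, None])
    return V, policy
```

With a zero reward and two actions, the soft value should approach log(2) / (1 - discount) in every state, which is a convenient sanity check for the recursion.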
- Parameters:
n_states (int, optional) – Number of discrete states. Inferred from data if None.
n_actions (int, optional) – Number of discrete actions. Inferred from data if None.
discount (float, default=0.95) – Time discount factor beta.
reward_type (str, default="state_action") – Type of reward function: "state_action" for R(s,a) or "state" for R(s) broadcast to all actions.
reward_hidden_dim (int, default=64) – Hidden dimension for the reward MLP.
reward_num_layers (int, default=2) – Number of hidden layers in the reward MLP.
max_epochs (int, default=200) – Maximum number of training epochs.
lr (float, default=1e-3) – Learning rate for Adam optimizer.
inner_solver (str, default="hybrid") – Solver for soft value iteration: "hybrid" or "value".
inner_tol (float, default=1e-8) – Convergence tolerance for inner solver.
inner_max_iter (int, default=5000) – Maximum iterations for inner solver.
state_encoder (callable, optional) – Function mapping state indices (int array) to feature vectors. Receives shape (B,) and should return shape (B, state_dim). If None, a default normalizing encoder is created.
state_dim (int, optional) – Dimension of state features. Required if state_encoder is provided.
feature_names (list of str, optional) – Names for features when projecting rewards onto linear features.
anchor_action (int, optional) – Action whose reward is fixed to zero. This is useful for identified action-dependent IRL designs with a normalized outside/exit action.
absorbing_state (int, optional) – State whose reward row is fixed to zero.
seed (int, default=0) – Random seed for network initialization.
verbose (bool, default=False) – Whether to print progress during training.
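As an illustration of the state_encoder contract described above (state indices of shape (B,) in, features of shape (B, state_dim) out), here is a minimal sketch for a 5x5 grid world. The factory `make_grid_encoder` and the grid layout are assumptions made for this example. The array type the library actually passes to the encoder is not stated here, so a real encoder may need to accept and return that framework's arrays rather than NumPy arrays.

```python
import numpy as np

def make_grid_encoder(n_rows, n_cols):
    """Hypothetical factory: encode a flat state index as normalized (row, col)."""
    def encoder(states):
        states = np.asarray(states)
        rows, cols = np.divmod(states, n_cols)
        # Scale both coordinates into [0, 1]; guard against a 1-wide grid.
        row_feat = rows / max(n_rows - 1, 1)
        col_feat = cols / max(n_cols - 1, 1)
        return np.stack([row_feat, col_feat], axis=1).astype(np.float32)
    return encoder

encoder = make_grid_encoder(5, 5)
features = encoder(np.arange(25))  # shape (25, 2), so state_dim would be 2
```

Under these assumptions the encoder would be passed as state_encoder=encoder together with state_dim=2.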
- Variables:
params (dict or None) – Projected structural parameters after fitting. None if no features were provided for projection.
se (dict or None) – Pseudo standard errors from the projection regression.
pvalues (dict or None) – P-values from Wald t-test on pseudo SEs.
coef (numpy.ndarray or None) – Coefficient array (same values as params_ in array form).
policy (numpy.ndarray or None) – Estimated choice probabilities P(a|s) of shape (n_states, n_actions).
value (numpy.ndarray or None) – Estimated value function V(s) of shape (n_states,).
reward (numpy.ndarray or None) – Neural reward. Shape (n_states,) for reward_type="state" or (n_states, n_actions) for reward_type="state_action".
projection_r2 (float or None) – R-squared of the feature projection.
converged (bool or None) – Whether training converged.
n_epochs (int or None) – Number of training epochs completed.
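The projected parameters and the projection R-squared listed above come from regressing the neural reward on linear features. One plausible way to do this is an ordinary least-squares fit, sketched below. The helper `project_reward`, the feature layout (n_states, n_actions, n_features), and the absence of an intercept are assumptions for illustration; the library's actual projection may differ.

```python
import numpy as np

def project_reward(reward, features):
    """Hypothetical helper: least-squares projection of R(s,a) onto features.

    reward: array of shape (n_states, n_actions)
    features: array of shape (n_states, n_actions, n_features)
    Returns (theta, r2).
    """
    X = features.reshape(-1, features.shape[-1])
    y = reward.reshape(-1)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return theta, 1.0 - ss_res / ss_tot
```

An R-squared close to one indicates that the neural reward is almost linear in the supplied features, so the projected coefficients summarize it well.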
Examples
>>> from econirl.estimators import MCEIRLNeural
>>> import numpy as np
>>>
>>> # R(s,a) -- default, more general
>>> model = MCEIRLNeural(n_states=25, n_actions=4, discount=0.95)
>>> model.fit(data=df, state="state", action="action", id="agent_id",
...           transitions=T)
>>> print(model.reward_.shape)  # (25, 4)
>>> print(model.policy_.shape)  # (25, 4)
>>>
>>> # R(s) -- state-only, backward compatible
>>> model = MCEIRLNeural(n_states=25, n_actions=4, reward_type="state")
>>> model.fit(...)
>>> print(model.reward_.shape)  # (25,)
- __init__(n_states=None, n_actions=None, discount=0.95, reward_type='state_action', reward_hidden_dim=64, reward_num_layers=2, max_epochs=200, lr=0.001, inner_solver='hybrid', inner_tol=1e-08, inner_max_iter=5000, state_encoder=None, state_dim=None, feature_names=None, anchor_action=None, absorbing_state=None, seed=0, verbose=False)[source]
- Parameters:
n_states (int | None)
n_actions (int | None)
discount (float)
reward_type (str)
reward_hidden_dim (int)
reward_num_layers (int)
max_epochs (int)
lr (float)
inner_solver (str)
inner_tol (float)
inner_max_iter (int)
state_encoder (Callable | None)
state_dim (int | None)
anchor_action (int | None)
absorbing_state (int | None)
seed (int)
verbose (bool)
- fit(data, state=None, action=None, id=None, features=None, transitions=None, context=None)[source]
Fit the MCEIRLNeural estimator to data.
- Parameters:
data (pandas.DataFrame or Panel or TrajectoryPanel) – Panel data with demonstrations. When a DataFrame is passed, the state, action, and id column names are required.
state (str, optional) – Column name for the state variable (required for DataFrame).
action (str, optional) – Column name for the action variable (required for DataFrame).
id (str, optional) – Column name for the individual identifier (required for DataFrame).
features (RewardSpec or numpy.ndarray, optional) – Feature specification for parameter projection. If provided, the neural reward is projected onto these features to extract interpretable theta.
transitions (numpy.ndarray) – Transition matrices P(s'|s,a), shape (n_actions, n_states, n_states). Required for v1 (exact soft value iteration).
context (ignored) – Accepted for API compatibility but not used.
- Returns:
self – Returns self for method chaining.
- Return type:
MCEIRLNeural
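Because fit requires a transitions array of shape (n_actions, n_states, n_states), it can help to see one way such an array might be assembled from observed (state, action, next state) triples. The helper `empirical_transitions` below is hypothetical and not part of the library. The uniform fallback for unvisited state-action pairs is an assumption, and a model-based transition matrix is preferable when one is known.

```python
import numpy as np

def empirical_transitions(states, actions, next_states, n_states, n_actions):
    """Hypothetical helper: frequency estimate of P(s'|s,a)."""
    counts = np.zeros((n_actions, n_states, n_states))
    # Accumulate one count per observed (a, s, s') triple.
    np.add.at(counts, (actions, states, next_states), 1.0)
    row_sums = counts.sum(axis=2, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Unvisited (s, a) pairs fall back to a uniform distribution (assumption).
    return np.where(row_sums > 0, counts / safe_sums, 1.0 / n_states)

T = empirical_transitions(
    states=np.array([0, 0, 1]),
    actions=np.array([0, 0, 1]),
    next_states=np.array([1, 0, 0]),
    n_states=2,
    n_actions=2,
)
```

Each row T[a, s, :] is a probability distribution over next states, which is the layout the transitions parameter describes.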
- property reward_matrix_: ndarray | None
Reward matrix R(s,a) of shape (n_states, n_actions).
For reward_type="state_action", self.reward_ already has shape (n_states, n_actions) and is returned directly. For reward_type="state", the state-only reward is broadcast to all actions.
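The broadcasting rule described above can be sketched as follows. The function `as_reward_matrix` is a hypothetical stand-in for illustration, not the property's actual implementation.

```python
import numpy as np

def as_reward_matrix(reward, n_actions):
    """Hypothetical helper: return R(s,a) of shape (n_states, n_actions)."""
    reward = np.asarray(reward)
    if reward.ndim == 2:
        # Already action-dependent: return as is.
        return reward
    # State-only reward: repeat the same column for every action.
    return np.repeat(reward[:, None], n_actions, axis=1)

state_only = np.array([0.0, 1.0, -0.5])
matrix = as_reward_matrix(state_only, n_actions=4)  # shape (3, 4)
```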
- predict_proba(states)[source]
Predict choice probabilities for given states.
- Parameters:
states (numpy.ndarray) – Array of state indices.
- Returns:
Choice probabilities of shape (len(states), n_actions).
- Return type:
numpy.ndarray
- Raises:
RuntimeError – If the model has not been fitted yet.
- conf_int(alpha=0.05)[source]
Compute confidence intervals for projected parameters.
- Parameters:
alpha (float, default=0.05) – Significance level. Returns (1 - alpha) confidence intervals.
- Returns:
{param_name: (lower, upper)} confidence intervals.
- Return type:
dict
- Raises:
RuntimeError – If no projected parameters are available.
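To make the {param_name: (lower, upper)} output concrete, here is a minimal sketch of a (1 - alpha) interval built from projected coefficients and their pseudo standard errors. It assumes a normal critical value; whether the library uses a normal or a t distribution is not stated here, and `normal_conf_int` is a hypothetical helper.

```python
from statistics import NormalDist

def normal_conf_int(params, se, alpha=0.05):
    """Hypothetical helper: normal-approximation confidence intervals.

    params, se: dicts keyed by parameter name.
    """
    # Two-sided critical value, about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {
        name: (params[name] - z * se[name], params[name] + z * se[name])
        for name in params
    }

intervals = normal_conf_int({"cost": 1.0}, {"cost": 0.5}, alpha=0.05)
```

Since the standard errors come from the projection regression rather than from the full estimation problem, intervals built this way are best read as descriptive rather than as exact inference.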