econirl.MCEIRLNeural

class econirl.MCEIRLNeural(n_states=None, n_actions=None, discount=0.95, reward_type='state_action', reward_hidden_dim=64, reward_num_layers=2, max_epochs=200, lr=0.001, inner_solver='hybrid', inner_tol=1e-08, inner_max_iter=5000, state_encoder=None, state_dim=None, feature_names=None, anchor_action=None, absorbing_state=None, seed=0, verbose=False)[source]

Bases: NeuralEstimatorMixin

Neural Maximum Causal Entropy IRL.

Learns a neural reward function using the MCE-IRL objective: maximize E_expert[R] - log Z(R)

where Z(R) is the partition function, so that log Z(R) is the soft value at the initial state.
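The log Z(R) term can be obtained from soft value iteration. The following is a minimal NumPy sketch of that recursion, not the library's inner solver: the function name `soft_value_iteration` is hypothetical, and the transition layout (n_actions, n_states, n_states) follows the `transitions` argument of `fit`.

```python
import numpy as np

def soft_value_iteration(reward, transitions, discount=0.95, tol=1e-8, max_iter=5000):
    """Hypothetical helper. reward: (n_states, n_actions);
    transitions: (n_actions, n_states, n_states) holding P(s'|s,a)."""
    n_states, _ = reward.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q(s,a) = R(s,a) + discount * sum_{s'} P(s'|s,a) V(s')
        Q = reward + discount * (transitions @ V).T
        # Soft maximum over actions via a numerically stable log-sum-exp
        m = Q.max(axis=1)
        V_new = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    # Policy is the softmax of Q at the final value function
    Q = reward + discount * (transitions @ V).T
    expQ = np.exp(Q - Q.max(axis=1, keepdims=True))
    policy = expQ / expQ.sum(axis=1, keepdims=True)
    return V, policy

# Small random MDP used only to exercise the sketch
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
transitions = rng.random((n_actions, n_states, n_states))
transitions /= transitions.sum(axis=2, keepdims=True)
reward = rng.normal(size=(n_states, n_actions))
V, policy = soft_value_iteration(reward, transitions)
```

Under these assumptions, `V` evaluated at the initial state plays the role of log Z(R) in the objective, and `policy` has the same (n_states, n_actions) layout as the fitted choice probabilities.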

Supports two reward types:

reward_type="state_action" (default): R(s,a) via a network that takes [state_features, action_onehot]. This is more general and correctly handles action-dependent rewards.
reward_type="state": R(s) broadcast to all actions (the original behavior, kept for backward compatibility).

For v1, transitions must be available so that exact soft value iteration and state visitation frequencies can be computed.

Parameters:

n_states (int, optional) – Number of discrete states. Inferred from data if None.
n_actions (int, optional) – Number of discrete actions. Inferred from data if None.
discount (float, default=0.95) – Time discount factor beta.
reward_type (str, default="state_action") – Type of reward function: "state_action" for R(s,a) or "state" for R(s) broadcast to all actions.
reward_hidden_dim (int, default=64) – Hidden dimension for the reward MLP.
reward_num_layers (int, default=2) – Number of hidden layers in the reward MLP.
max_epochs (int, default=200) – Maximum number of training epochs.
lr (float, default=1e-3) – Learning rate for Adam optimizer.
inner_solver (str, default="hybrid") – Solver for soft value iteration: "hybrid" or "value".
inner_tol (float, default=1e-8) – Convergence tolerance for inner solver.
inner_max_iter (int, default=5000) – Maximum iterations for inner solver.
state_encoder (callable, optional) – Function mapping state indices (int array) to feature vectors. Receives shape (B,) and should return shape (B, state_dim). If None, a default normalizing encoder is created.
state_dim (int, optional) – Dimension of state features. Required if state_encoder is provided.
feature_names (list of str, optional) – Names for features when projecting rewards onto linear features.
anchor_action (int, optional) – Action whose reward is fixed to zero. This is useful for identifying action-dependent rewards in designs that normalize an outside (exit) action.
absorbing_state (int, optional) – State whose reward row is fixed to zero.
seed (int, default=0) – Random seed for network initialization.
verbose (bool, default=False) – Whether to print progress during training.
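As an illustration of the state_encoder contract described above (input of shape (B,), output of shape (B, state_dim)), here is a sketch of a custom encoder for a hypothetical 5x5 grid world. The name `grid_encoder` and the grid layout are assumptions made for the example, and NumPy arrays are used because this page does not state which array or tensor type the estimator passes to the encoder.

```python
import numpy as np

GRID_SIZE = 5  # hypothetical 5x5 grid world, so n_states = 25

def grid_encoder(states):
    """Map state indices of shape (B,) to (row, col) features of shape (B, 2)."""
    states = np.asarray(states)
    rows = states // GRID_SIZE
    cols = states % GRID_SIZE
    feats = np.stack([rows, cols], axis=1).astype(np.float32)
    # Scale both coordinates into [0, 1]
    return feats / (GRID_SIZE - 1)

features = grid_encoder(np.array([0, 7, 24]))
# With the documented constructor arguments, such an encoder would be passed as
# MCEIRLNeural(n_states=25, n_actions=4, state_encoder=grid_encoder, state_dim=2)
```

Here state_dim is 2 because the encoder returns two features per state.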

Variables:

params_ (dict or None) – Projected structural parameters after fitting. None if no features were provided for projection.
se_ (dict or None) – Pseudo standard errors from the projection regression.
pvalues_ (dict or None) – P-values from a Wald t-test based on the pseudo standard errors.
coef_ (numpy.ndarray or None) – Coefficient array (same values as params_ in array form).
policy_ (numpy.ndarray or None) – Estimated choice probabilities P(a|s) of shape (n_states, n_actions).
value_ (numpy.ndarray or None) – Estimated value function V(s) of shape (n_states,).
reward_ (numpy.ndarray or None) – Neural reward. Shape (n_states,) for reward_type="state" or (n_states, n_actions) for reward_type="state_action".
projection_r2_ (float or None) – R-squared of the feature projection.
converged_ (bool or None) – Whether training converged.
n_epochs_ (int or None) – Number of training epochs completed.
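The projected parameters and the projection R-squared come from regressing the learned reward on user-supplied features. The sketch below shows one way such a projection could work, assuming ordinary least squares on a flattened reward vector. The helper `project_reward` is hypothetical, and the library's actual procedure (for example, whether it adds an intercept) is not specified on this page.

```python
import numpy as np

def project_reward(reward, features):
    """Hypothetical helper. reward: (n,), features: (n, k). Returns (theta, r2)."""
    theta, *_ = np.linalg.lstsq(features, reward, rcond=None)
    resid = reward - features @ theta
    ss_res = float(resid @ resid)
    centered = reward - reward.mean()
    ss_tot = float(centered @ centered)
    r2 = 1.0 - ss_res / ss_tot
    return theta, r2

# Synthetic check: a reward that is exactly linear in the features
rng = np.random.default_rng(0)
features = np.column_stack([np.ones(25), rng.normal(size=(25, 2))])
true_theta = np.array([0.5, -1.0, 2.0])
reward = features @ true_theta
theta, r2 = project_reward(reward, features)
```

A low R-squared in such a projection would indicate that the neural reward is poorly described by the chosen linear features.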

Examples

>>> from econirl.estimators import MCEIRLNeural
>>> import numpy as np
>>>
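>>> # df: DataFrame with "state", "action", and "agent_id" columns (assumed to exist)
>>> # T: transition array of shape (n_actions, n_states, n_states), here (4, 25, 25)
>>> # R(s,a) -- default, more general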
>>> model = MCEIRLNeural(n_states=25, n_actions=4, discount=0.95)
>>> model.fit(data=df, state="state", action="action", id="agent_id",
...           transitions=T)
>>> print(model.reward_.shape)  # (25, 4)
>>> print(model.policy_.shape)  # (25, 4)
>>>
>>> # R(s) -- state-only, backward compatible
>>> model = MCEIRLNeural(n_states=25, n_actions=4, reward_type="state")
>>> model.fit(...)
>>> print(model.reward_.shape)  # (25,)

__init__(n_states=None, n_actions=None, discount=0.95, reward_type='state_action', reward_hidden_dim=64, reward_num_layers=2, max_epochs=200, lr=0.001, inner_solver='hybrid', inner_tol=1e-08, inner_max_iter=5000, state_encoder=None, state_dim=None, feature_names=None, anchor_action=None, absorbing_state=None, seed=0, verbose=False)[source]

Parameters:

n_states (int | None)
n_actions (int | None)
discount (float)
reward_type (str)
reward_hidden_dim (int)
reward_num_layers (int)
max_epochs (int)
lr (float)
inner_solver (str)
inner_tol (float)
inner_max_iter (int)
state_encoder (Callable | None)
state_dim (int | None)
feature_names (list[str] | None)
anchor_action (int | None)
absorbing_state (int | None)
seed (int)
verbose (bool)

fit(data, state=None, action=None, id=None, features=None, transitions=None, context=None)[source]

Fit the MCEIRLNeural estimator to data.

Parameters:

data (pandas.DataFrame or Panel or TrajectoryPanel) – Panel data with demonstrations. When a DataFrame is passed, state, action, and id column names are required.
state (str, optional) – Column name for the state variable (required for DataFrame).
action (str, optional) – Column name for the action variable (required for DataFrame).
id (str, optional) – Column name for the individual identifier (required for DataFrame).
features (RewardSpec or numpy.ndarray, optional) – Feature specification for parameter projection. If provided, the neural reward is projected onto these features to extract interpretable theta.
transitions (numpy.ndarray) – Transition matrices P(s'|s,a), shape (n_actions, n_states, n_states). Required for v1 (exact soft value iteration).
context (ignored) – Accepted for API compatibility but not used.
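Because transitions must have shape (n_actions, n_states, n_states), with the action axis first, it can help to validate the array before calling fit. The following sketch builds a toy deterministic chain and checks it. The helper `check_transitions` is hypothetical and not part of the library.

```python
import numpy as np

def check_transitions(T, n_states, n_actions):
    """Hypothetical validator for P(s'|s,a) stored as T[a, s, s']."""
    T = np.asarray(T, dtype=float)
    if T.shape != (n_actions, n_states, n_states):
        raise ValueError(f"expected shape {(n_actions, n_states, n_states)}, got {T.shape}")
    if (T < 0).any():
        raise ValueError("transition probabilities must be non-negative")
    if not np.allclose(T.sum(axis=2), 1.0):
        raise ValueError("each row T[a, s, :] must sum to 1")
    return T

n_states, n_actions = 4, 2
T = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    T[0, s, s] = 1.0                          # action 0: stay in place
    T[1, s, min(s + 1, n_states - 1)] = 1.0   # action 1: move right, last state stays put
T = check_transitions(T, n_states, n_actions)
```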

Returns:

self – Returns self for method chaining.

Return type:

MCEIRLNeural

property reward_matrix_: ndarray | None

Reward matrix R(s,a) of shape (n_states, n_actions).

For reward_type="state_action", self.reward_ already has shape (n_states, n_actions) and is returned directly. For reward_type="state", the state-only reward is broadcast to all actions.
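The broadcasting described above can be sketched in NumPy as follows. The function `as_reward_matrix` is a hypothetical stand-in for illustration, not the property's actual implementation.

```python
import numpy as np

def as_reward_matrix(reward, n_actions):
    """Return a reward of shape (n_states, n_actions)."""
    reward = np.asarray(reward)
    if reward.ndim == 2:
        # Already R(s,a): returned unchanged
        return reward
    # R(s): copy the same value into every action column
    return np.repeat(reward[:, None], n_actions, axis=1)

state_reward = np.array([0.0, 1.0, -0.5])
matrix = as_reward_matrix(state_reward, n_actions=2)
```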

predict_proba(states)[source]

Predict choice probabilities for given states.

Parameters: states (numpy.ndarray) – Array of state indices.
Returns: Choice probabilities of shape (len(states), n_actions).
Return type: numpy.ndarray
Raises: RuntimeError – If the model has not been fitted yet.
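Given the documented shapes, the returned array is consistent with a row lookup into the fitted policy of shape (n_states, n_actions). The sketch below assumes that behavior. `lookup_choice_probs` is a hypothetical helper and the policy values are made up.

```python
import numpy as np

def lookup_choice_probs(policy, states):
    """Select rows of a (n_states, n_actions) policy for the given state indices."""
    if policy is None:
        raise RuntimeError("model has not been fitted yet")
    return policy[np.asarray(states, dtype=int)]

policy = np.array([[0.9, 0.1],
                   [0.5, 0.5],
                   [0.2, 0.8]])
probs = lookup_choice_probs(policy, [2, 0, 2])
```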

conf_int(alpha=0.05)[source]

Compute confidence intervals for projected parameters.

Parameters: alpha (float, default=0.05) – Significance level. Returns (1 - alpha) confidence intervals.
Returns: {param_name: (lower, upper)} confidence intervals.
Return type: dict
Raises: RuntimeError – If no projected parameters are available.
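To show how such intervals relate to the projected parameters and their pseudo standard errors, here is a sketch using a normal approximation. This is an assumption: the p-values on this page come from a Wald t-test, so the library may use t quantiles instead. The helper `normal_conf_int` and the example numbers are hypothetical.

```python
from statistics import NormalDist

def normal_conf_int(params, se, alpha=0.05):
    """Return {name: (lower, upper)} using a two-sided normal quantile."""
    if not params:
        raise RuntimeError("no projected parameters are available")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {
        name: (params[name] - z * se[name], params[name] + z * se[name])
        for name in params
    }

# Made-up estimate and standard error for a single parameter
intervals = normal_conf_int({"cost": -1.0}, {"cost": 0.5})
```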

summary()[source]

Generate a formatted summary of estimation results.

Returns: Human-readable summary including neural reward info, parameter estimates, and projection R-squared.
Return type: str