econirl.MCEIRLNeural

class econirl.MCEIRLNeural(n_states=None, n_actions=None, discount=0.95, reward_type='state_action', reward_hidden_dim=64, reward_num_layers=2, max_epochs=200, lr=0.001, inner_solver='hybrid', inner_tol=1e-08, inner_max_iter=5000, state_encoder=None, state_dim=None, feature_names=None, anchor_action=None, absorbing_state=None, seed=0, verbose=False)[source]

Bases: NeuralEstimatorMixin

Neural Maximum Causal Entropy IRL.

Learns a neural reward function by maximizing the MCE-IRL objective E_expert[R] - log Z(R), where Z(R) is the partition function and log Z(R) equals the soft value at the initial state.

Supports two reward types:

  • reward_type="state_action" (default): R(s,a) via a network that takes [state_features, action_onehot]. This is more general and correctly handles action-dependent rewards.

  • reward_type="state": R(s), a state-only reward broadcast to all actions (the original formulation, kept for backward compatibility).

For v1, transitions must be available so that exact soft value iteration and state visitation frequencies can be computed.
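To make the role of the transitions concrete, the following is a minimal NumPy sketch of exact soft value iteration for a fixed reward matrix. It is an illustration under stated assumptions, not the library's inner solver: the function names `logsumexp` and `soft_value_iteration` are hypothetical, and the transition array follows the (n_actions, n_states, n_states) layout described for fit().

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def soft_value_iteration(reward, transitions, discount=0.95, tol=1e-8, max_iter=5000):
    """Hypothetical helper: soft Bellman iteration for a fixed reward.

    reward:      array of shape (n_states, n_actions)
    transitions: array of shape (n_actions, n_states, n_states), P(s'|s,a)
    Returns the soft value V(s) and the soft-optimal policy P(a|s).
    """
    n_states, _ = reward.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Q(s,a) = R(s,a) + beta * sum_s' P(s'|s,a) V(s')
        q = reward + discount * (transitions @ v).T
        v_new = logsumexp(q, axis=1)
        done = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if done:
            break
    q = reward + discount * (transitions @ v).T
    # Normalize explicitly so each policy row sums to one.
    policy = np.exp(q - logsumexp(q, axis=1)[:, None])
    return v, policy
```

With this helper, the log Z(R) term of the objective would correspond to the returned value at the initial state (or its average under an initial-state distribution).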

Parameters:
  • n_states (int, optional) – Number of discrete states. Inferred from data if None.

  • n_actions (int, optional) – Number of discrete actions. Inferred from data if None.

  • discount (float, default=0.95) – Time discount factor beta.

  • reward_type (str, default="state_action") – Type of reward function: "state_action" for R(s,a) or "state" for R(s) broadcast to all actions.

  • reward_hidden_dim (int, default=64) – Hidden dimension for the reward MLP.

  • reward_num_layers (int, default=2) – Number of hidden layers in the reward MLP.

  • max_epochs (int, default=200) – Maximum number of training epochs.

  • lr (float, default=1e-3) – Learning rate for Adam optimizer.

  • inner_solver (str, default="hybrid") – Solver for soft value iteration: "hybrid" or "value".

  • inner_tol (float, default=1e-8) – Convergence tolerance for inner solver.

  • inner_max_iter (int, default=5000) – Maximum iterations for inner solver.

  • state_encoder (callable, optional) – Function mapping state indices (int array) to feature vectors. Receives shape (B,) and should return shape (B, state_dim). If None, a default normalizing encoder is created.

  • state_dim (int, optional) – Dimension of state features. Required if state_encoder is provided.

  • feature_names (list of str, optional) – Names for features when projecting rewards onto linear features.

  • anchor_action (int, optional) – Action whose reward is fixed to zero. This is useful for identified action-dependent IRL designs with a normalized outside/exit action.

  • absorbing_state (int, optional) – State whose reward row is fixed to zero.

  • seed (int, default=0) – Random seed for network initialization.

  • verbose (bool, default=False) – Whether to print progress during training.
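As an illustration of the state_encoder contract described above (input of shape (B,), output of shape (B, state_dim)), here is a sketch of an encoder for a hypothetical 5x5 grid world with 25 states. The name `grid_encoder` and the grid layout are assumptions made for this example. Whether the library expects a NumPy array or a framework tensor as the return value is not stated here, so treat the return type as an assumption too.

```python
import numpy as np

GRID_SIZE = 5  # hypothetical 5x5 grid, so n_states = 25


def grid_encoder(states):
    """Map state indices of shape (B,) to (row, col) features of shape (B, 2)."""
    states = np.asarray(states)
    rows, cols = np.divmod(states, GRID_SIZE)
    # Scale coordinates to [0, 1] so the network inputs are on a comparable scale.
    return np.stack([rows, cols], axis=1).astype(np.float32) / (GRID_SIZE - 1)


features = grid_encoder(np.arange(GRID_SIZE * GRID_SIZE))
```

Such an encoder would be passed as state_encoder=grid_encoder together with state_dim=2, since state_dim is required whenever a custom encoder is supplied.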

Variables:
  • params (dict or None) – Projected structural parameters after fitting. None if no features were provided for projection.

  • se (dict or None) – Pseudo standard errors from the projection regression.

  • pvalues (dict or None) – P-values from Wald t-test on pseudo SEs.

  • coef (numpy.ndarray or None) – Coefficient array (same values as params_ in array form).

  • policy (numpy.ndarray or None) – Estimated choice probabilities P(a|s) of shape (n_states, n_actions).

  • value (numpy.ndarray or None) – Estimated value function V(s) of shape (n_states,).

  • reward (numpy.ndarray or None) – Neural reward. Shape (n_states,) for reward_type="state" or (n_states, n_actions) for reward_type="state_action".

  • projection_r2 (float or None) – R-squared of the feature projection.

  • converged (bool or None) – Whether training converged.

  • n_epochs (int or None) – Number of training epochs completed.
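The projected parameters, pseudo standard errors, and projection R-squared listed above come from regressing the neural reward on user-supplied features. The sketch below shows one plain way such a projection can be computed with ordinary least squares. It is not the library's implementation: `project_reward` is a hypothetical name, and the classical OLS standard-error formula used here is an assumption about what "pseudo standard errors" means.

```python
import numpy as np


def project_reward(reward, features):
    """Hypothetical helper: least-squares projection of a reward vector.

    reward:   array of shape (n,), flattened neural reward
    features: array of shape (n, k), one row of features per reward entry
    Returns (theta, se, r2).
    """
    theta, *_ = np.linalg.lstsq(features, reward, rcond=None)
    resid = reward - features @ theta
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((reward - reward.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # Classical OLS covariance: sigma^2 * (X'X)^{-1}
    n, k = features.shape
    sigma2 = ss_res / (n - k)
    cov = sigma2 * np.linalg.inv(features.T @ features)
    se = np.sqrt(np.diag(cov))
    return theta, se, r2
```

A low R-squared in such a projection indicates that the chosen features explain little of the learned reward, so the projected parameters should then be read with caution.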

Examples

>>> from econirl.estimators import MCEIRLNeural
>>> import numpy as np
>>>
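>>> # Assumes df is a pandas DataFrame with "state", "action", and "agent_id"
>>> # columns, and T is a transition array of shape (4, 25, 25).
>>> # R(s,a) -- default, more general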
>>> model = MCEIRLNeural(n_states=25, n_actions=4, discount=0.95)
>>> model.fit(data=df, state="state", action="action", id="agent_id",
...           transitions=T)
>>> print(model.reward_.shape)  # (25, 4)
>>> print(model.policy_.shape)  # (25, 4)
>>>
>>> # R(s) -- state-only, backward compatible
>>> model = MCEIRLNeural(n_states=25, n_actions=4, reward_type="state")
>>> model.fit(...)
>>> print(model.reward_.shape)  # (25,)

__init__(n_states=None, n_actions=None, discount=0.95, reward_type='state_action', reward_hidden_dim=64, reward_num_layers=2, max_epochs=200, lr=0.001, inner_solver='hybrid', inner_tol=1e-08, inner_max_iter=5000, state_encoder=None, state_dim=None, feature_names=None, anchor_action=None, absorbing_state=None, seed=0, verbose=False)[source]
Parameters:
  • n_states (int | None)

  • n_actions (int | None)

  • discount (float)

  • reward_type (str)

  • reward_hidden_dim (int)

  • reward_num_layers (int)

  • max_epochs (int)

  • lr (float)

  • inner_solver (str)

  • inner_tol (float)

  • inner_max_iter (int)

  • state_encoder (Callable | None)

  • state_dim (int | None)

  • feature_names (list[str] | None)

  • anchor_action (int | None)

  • absorbing_state (int | None)

  • seed (int)

  • verbose (bool)

fit(data, state=None, action=None, id=None, features=None, transitions=None, context=None)[source]

Fit the MCEIRLNeural estimator to data.

Parameters:
  • data (pandas.DataFrame or Panel or TrajectoryPanel) – Panel data with demonstrations. When a DataFrame is passed, state, action, and id column names are required.

  • state (str, optional) – Column name for the state variable (required for DataFrame).

  • action (str, optional) – Column name for the action variable (required for DataFrame).

  • id (str, optional) – Column name for the individual identifier (required for DataFrame).

  • features (RewardSpec or numpy.ndarray, optional) – Feature specification for parameter projection. If provided, the neural reward is projected onto these features to extract interpretable theta.

  • transitions (numpy.ndarray) – Transition matrices P(s'|s,a), shape (n_actions, n_states, n_states). Required for v1 (exact soft value iteration).

  • context (ignored) – Accepted for API compatibility but not used.

Returns:

self – Returns self for method chaining.

Return type:

MCEIRLNeural
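Because transitions must follow the (n_actions, n_states, n_states) layout, it can help to build and check the array before calling fit(). The sketch below constructs transitions for a hypothetical 5-state chain with two actions ("stay" and "move right") and verifies that every row is a probability distribution. The helper name `check_transitions` and the toy dynamics are assumptions for illustration only.

```python
import numpy as np


def check_transitions(transitions, n_states, n_actions):
    """Hypothetical validator for a P(s'|s,a) array indexed as [a, s, s']."""
    if transitions.shape != (n_actions, n_states, n_states):
        raise ValueError(f"expected shape {(n_actions, n_states, n_states)}, "
                         f"got {transitions.shape}")
    if np.any(transitions < 0) or not np.allclose(transitions.sum(axis=2), 1.0):
        raise ValueError("each transitions[a, s, :] must be a probability distribution")
    return True


n_states, n_actions = 5, 2
T = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    T[0, s, s] = 1.0                         # action 0: stay in place
    T[1, s, min(s + 1, n_states - 1)] = 1.0  # action 1: move right, last state stays put

is_valid = check_transitions(T, n_states, n_actions)
```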

property reward_matrix_: ndarray | None

Reward matrix R(s,a) of shape (n_states, n_actions).

For reward_type="state_action", self.reward_ already has shape (n_states, n_actions) and is returned directly. For reward_type="state", the state-only reward is broadcast to all actions.
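The broadcasting described above can be sketched in a few lines of NumPy. This is an illustration of the stated behavior, not the property's actual source, and `as_reward_matrix` is a hypothetical name.

```python
import numpy as np


def as_reward_matrix(reward, n_actions):
    """Return R(s,a) of shape (n_states, n_actions) from either reward layout."""
    reward = np.asarray(reward)
    if reward.ndim == 2:
        # Already action-dependent: returned as-is.
        return reward
    # State-only reward: copy the same column for every action.
    return np.repeat(reward[:, None], n_actions, axis=1)


matrix = as_reward_matrix([1.0, 2.0, 3.0], n_actions=2)
```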

predict_proba(states)[source]

Predict choice probabilities for given states.

Parameters:

states (numpy.ndarray) – Array of state indices.

Returns:

Choice probabilities of shape (len(states), n_actions).

Return type:

numpy.ndarray

Raises:

RuntimeError – If the model has not been fitted yet.
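Conceptually, the documented behavior amounts to looking up rows of the fitted (n_states, n_actions) policy matrix. The sketch below shows that lookup under the assumption that the policy is stored as a NumPy array. `predict_proba_from_policy` is a hypothetical stand-in, not the method's source.

```python
import numpy as np


def predict_proba_from_policy(policy, states):
    """Row lookup into a policy matrix of shape (n_states, n_actions)."""
    if policy is None:
        raise RuntimeError("Model has not been fitted yet.")
    states = np.asarray(states, dtype=int)
    return policy[states]


toy_policy = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
probs = predict_proba_from_policy(toy_policy, [2, 0, 2])
```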

conf_int(alpha=0.05)[source]

Compute confidence intervals for projected parameters.

Parameters:

alpha (float, default=0.05) – Significance level. Returns (1 - alpha) confidence intervals.

Returns:

{param_name: (lower, upper)} confidence intervals.

Return type:

dict

Raises:

RuntimeError – If no projected parameters are available.
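For intuition about the returned dictionary, the sketch below builds (1 - alpha) intervals from parameter and standard-error dictionaries using a normal critical value. This is an assumption for illustration: the library may use a t distribution or another rule, and `normal_conf_int` is a hypothetical name.

```python
from statistics import NormalDist


def normal_conf_int(params, se, alpha=0.05):
    """Return {name: (lower, upper)} using a two-sided normal critical value."""
    if params is None or se is None:
        raise RuntimeError("No projected parameters are available.")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {name: (params[name] - z * se[name], params[name] + z * se[name])
            for name in params}


intervals = normal_conf_int({"cost": 1.0}, {"cost": 0.5}, alpha=0.05)
```

Since the standard errors are described as pseudo standard errors from the projection regression, intervals built this way inherit that caveat.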

summary()[source]

Generate a formatted summary of estimation results.

Returns:

Human-readable summary including neural reward info, parameter estimates, and projection R-squared.

Return type:

str