econirl.NeuralUFXP

class econirl.NeuralUFXP(n_states=None, n_actions=None, discount=0.95, scale=1.0, num_projections=64, reward_hidden_dim=64, reward_num_layers=2, max_epochs=2000, lr=0.01, gradient_clip=10.0, ccp_min_count=1, ccp_smoothing=1e-06, seed=0, verbose=False)

Bases: NeuralEstimatorMixin

Neural-utility UFXP estimator (Oguz and Bray 2026).

Trains a neural utility u_w(s, a) by minimizing the UFXP random-projection objective, reusing the linear estimator’s precomputed dual so no Bellman equation is solved during training.

Parameters:
  • n_states (int, optional) – Number of states. Inferred from the data if None.

  • n_actions (int, optional) – Number of actions. Inferred from the data if None.

  • discount (float, default=0.95) – Discount factor beta.

  • scale (float, default=1.0) – Logit scale sigma.

  • num_projections (int, default=64) – Number of random projections m.

  • reward_hidden_dim (int, default=64) – Hidden width of the utility network.

  • reward_num_layers (int, default=2) – Hidden depth of the utility network.

  • max_epochs (int, default=2000) – Adam steps over the projection objective.

  • lr (float, default=1e-2) – Adam learning rate.

  • gradient_clip (float, default=10.0) – Global-norm gradient clip (<=0 disables).

  • ccp_min_count (int, default=1) – Minimum visits for a state’s first-order conditions to be scored.

  • ccp_smoothing (float, default=1e-6) – Additive smoothing for the frequency CCPs.

  • seed (int, default=0) – Seed for the projections and the network initialization.

  • verbose (bool, default=False) – Print the objective during training.
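The ccp_min_count and ccp_smoothing parameters act on frequency estimates of the conditional choice probabilities (CCPs). The exact formula the estimator uses is not documented here, so the block below is only a sketch of one common form of additive smoothing, with a visit-count mask standing in for the ccp_min_count rule. The function name frequency_ccps and the toy data are hypothetical.

```python
import numpy as np


def frequency_ccps(states, actions, n_states, n_actions,
                   ccp_smoothing=1e-6, ccp_min_count=1):
    """Smoothed frequency CCPs plus a mask of sufficiently visited states.

    Hypothetical helper: an assumed form of additive smoothing, not the
    library's own code.
    """
    counts = np.zeros((n_states, n_actions))
    # Count how often each action was chosen in each state.
    np.add.at(counts, (states, actions), 1.0)
    visits = counts.sum(axis=1)
    # Additive smoothing keeps every probability strictly positive,
    # including in states that were never visited.
    ccps = (counts + ccp_smoothing) / (
        visits[:, None] + ccp_smoothing * n_actions
    )
    # Assumed rule: a state is scored only if it has enough visits.
    scored = visits >= ccp_min_count
    return ccps, scored


# Toy panel: state 0 is visited three times, state 1 twice, state 2 never.
states = np.array([0, 0, 0, 1, 1])
actions = np.array([0, 1, 1, 0, 0])
ccps, scored = frequency_ccps(states, actions, n_states=3, n_actions=2)
```

In this toy panel the unvisited state receives uniform probabilities from the smoothing term alone and is excluded by the mask, which is the situation ccp_min_count is meant to guard against.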

Variables:
  • policy (numpy.ndarray) – Estimated choice probabilities, shape (n_states, n_actions).

  • value (numpy.ndarray) – Estimated value function, shape (n_states,).

  • reward (numpy.ndarray) – Learned utility u_w(s, a), shape (n_states, n_actions).

  • params (dict) – The learned utility projected onto the features. The objective constrains the choice-relevant utility, not the utility level, so this is a best-effort linear summary of a partially identified function; a low projection_r2_ flags that the utility is not linear in the features.

  • se (dict) – Projection pseudo standard errors (not the efficient UFXP variance).

  • coef (numpy.ndarray) – Projected coefficients in array form.

  • projection_r2 (float) – R-squared of the feature projection.

  • converged (bool) – Whether the objective decreased to a finite value.
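The params, coef and projection_r2 entries describe a projection of the learned utility onto the features. A minimal sketch of such a projection follows, assuming unweighted ordinary least squares without an intercept and an R-squared computed against the centered total sum of squares; the estimator may weight or center differently. The function name project_utility and the synthetic arrays are hypothetical.

```python
import numpy as np


def project_utility(reward, features):
    """Least-squares projection of a utility table onto features.

    Hypothetical helper: unweighted OLS is an assumption, not the
    library's documented procedure.
    """
    n_states, n_actions, n_features = features.shape
    # Stack every (state, action) pair into one regression row.
    X = features.reshape(n_states * n_actions, n_features)
    y = reward.reshape(n_states * n_actions)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return coef, r2


rng = np.random.default_rng(0)
features = rng.normal(size=(5, 2, 3))
theta = np.array([1.0, -0.5, 2.0])

# A utility that is exactly linear in the features is recovered exactly.
linear_reward = features @ theta
coef, r2 = project_utility(linear_reward, features)

# A nonlinear utility leaves a residual, which lowers the R-squared.
nonlinear_reward = np.sin(3.0 * linear_reward)
_, r2_nonlinear = project_utility(nonlinear_reward, features)
```

Comparing the two fits shows why a low projection R-squared is a warning sign: the coefficients are still computed, but they summarize a function that the features cannot represent linearly.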

__init__(n_states=None, n_actions=None, discount=0.95, scale=1.0, num_projections=64, reward_hidden_dim=64, reward_num_layers=2, max_epochs=2000, lr=0.01, gradient_clip=10.0, ccp_min_count=1, ccp_smoothing=1e-06, seed=0, verbose=False)
Parameters:
  • n_states (int | None)

  • n_actions (int | None)

  • discount (float)

  • scale (float)

  • num_projections (int)

  • reward_hidden_dim (int)

  • reward_num_layers (int)

  • max_epochs (int)

  • lr (float)

  • gradient_clip (float)

  • ccp_min_count (int)

  • ccp_smoothing (float)

  • seed (int)

  • verbose (bool)

fit(data, state=None, action=None, id=None, features=None, transitions=None)

Fit the neural utility to data.

Parameters:
  • data (pandas.DataFrame or Panel or TrajectoryPanel) – Panel of observed choices.

  • state (str, optional) – Name of the state column (required when data is a DataFrame).

  • action (str, optional) – Name of the action column (required when data is a DataFrame).

  • id (str, optional) – Name of the column identifying each unit in the panel (required when data is a DataFrame).

  • features (numpy.ndarray) – Reward features phi(s, a) of shape (n_states, n_actions, K). The utility network maps each feature vector to a scalar utility, so the features set the inputs the network can combine.

  • transitions (numpy.ndarray) – Transition matrices P(s'|s,a) of shape (n_actions, n_states, n_states).

Return type:

NeuralUFXP
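The two array arguments of fit use different axis orders: features is indexed (state, action, feature), while transitions is indexed (action, state, next state). The sketch below builds synthetic arrays with those shapes and checks them before a fit call. The helpers make_inputs and check_inputs, the random data and the column names in the commented call are all hypothetical; the call itself is left commented so the block runs without econirl installed.

```python
import numpy as np


def make_inputs(n_states=4, n_actions=2, n_features=3, seed=0):
    """Synthetic features and transitions with the documented shapes."""
    rng = np.random.default_rng(seed)
    # features[s, a] is the feature vector phi(s, a).
    features = rng.normal(size=(n_states, n_actions, n_features))
    # transitions[a, s] is the distribution over next states s'.
    transitions = rng.random(size=(n_actions, n_states, n_states))
    transitions /= transitions.sum(axis=2, keepdims=True)
    return features, transitions


def check_inputs(features, transitions):
    """Raise ValueError if the arrays disagree on shapes or rows."""
    n_states, n_actions, _ = features.shape
    if transitions.shape != (n_actions, n_states, n_states):
        raise ValueError("transitions must be (n_actions, n_states, n_states)")
    if np.any(transitions < 0.0):
        raise ValueError("transition probabilities must be non-negative")
    if not np.allclose(transitions.sum(axis=2), 1.0):
        raise ValueError("each transitions[a, s] row must sum to 1")
    return n_states, n_actions


features, transitions = make_inputs()
n_states, n_actions = check_inputs(features, transitions)

# With econirl installed and a DataFrame `df` holding the panel, a call
# following the documented signature might look like this:
# from econirl import NeuralUFXP
# model = NeuralUFXP(discount=0.95, seed=0)
# model.fit(df, state="state", action="action", id="id",
#           features=features, transitions=transitions)
```

Validating the arrays up front catches the most likely mistake, passing transitions in (state, action, next state) order.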