econirl.NeuralUFXP

class econirl.NeuralUFXP(n_states=None, n_actions=None, discount=0.95, scale=1.0, num_projections=64, reward_hidden_dim=64, reward_num_layers=2, max_epochs=2000, lr=0.01, gradient_clip=10.0, ccp_min_count=1, ccp_smoothing=1e-06, seed=0, verbose=False)

Bases: NeuralEstimatorMixin

Neural-utility UFXP estimator (Oguz and Bray 2026).

Trains a neural utility u_w(s, a) by minimizing the UFXP random-projection objective, reusing the linear estimator’s precomputed dual so no Bellman equation is solved during training.

Parameters:

n_states (int, optional) – Size of the state space. Inferred from the data if None.
n_actions (int, optional) – Size of the action space. Inferred from the data if None.
discount (float, default=0.95) – Discount factor beta.
scale (float, default=1.0) – Logit scale sigma.
num_projections (int, default=64) – Number of random projections m.
reward_hidden_dim (int, default=64) – Hidden width of the utility network.
reward_num_layers (int, default=2) – Hidden depth of the utility network.
max_epochs (int, default=2000) – Number of Adam steps taken on the projection objective.
lr (float, default=1e-2) – Adam learning rate.
gradient_clip (float, default=10.0) – Global-norm gradient clipping threshold. A value <= 0 disables clipping.
ccp_min_count (int, default=1) – Minimum visits for a state’s first-order conditions to be scored.
ccp_smoothing (float, default=1e-6) – Additive smoothing for the frequency CCPs.
seed (int, default=0) – Seed for the projections and the network initialization.
verbose (bool, default=False) – Print the objective during training.
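The roles of ccp_min_count and ccp_smoothing can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper name frequency_ccps is hypothetical, and it assumes that smoothing is added to every (state, action) count before normalizing and that a state is scored when its visit count is at least the minimum.

```python
import numpy as np

def frequency_ccps(states, actions, n_states, n_actions, smoothing=1e-6, min_count=1):
    """Hypothetical helper: smoothed frequency CCPs plus a mask of scored states."""
    counts = np.zeros((n_states, n_actions))
    # Accumulate one visit for each observed (state, action) pair.
    np.add.at(counts, (states, actions), 1.0)
    visits = counts.sum(axis=1)
    # Assumed form of additive smoothing: keeps every probability strictly positive.
    smoothed = counts + smoothing
    ccps = smoothed / smoothed.sum(axis=1, keepdims=True)
    # Assumed rule: only states visited at least min_count times are scored.
    scored = visits >= min_count
    return ccps, scored

states = np.array([0, 0, 0, 1, 1])
actions = np.array([0, 0, 1, 1, 1])
ccps, scored = frequency_ccps(states, actions, n_states=3, n_actions=2)
```

Under these assumptions, state 2 is never visited, so its row falls back to uniform probabilities and it is excluded from scoring.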

Variables:

policy (numpy.ndarray) – Estimated choice probabilities, shape (n_states, n_actions).
value (numpy.ndarray) – Estimated value function, shape (n_states,).
reward (numpy.ndarray) – Learned utility u_w(s, a), shape (n_states, n_actions).
params (dict) – The learned utility projected onto the features. The objective constrains the choice-relevant utility, not the utility level, so this is a best-effort linear summary of a partially identified function. A low projection_r2 indicates that the utility is not linear in the features.
se (dict) – Projection pseudo standard errors (not the efficient UFXP variance).
coef (numpy.ndarray) – Projected coefficients in array form.
projection_r2 (float) – R-squared of the feature projection.
converged (bool) – Whether the objective decreased to a finite value.
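To make the relationship between reward, coef and projection_r2 concrete, here is a minimal sketch of a feature projection by ordinary least squares. The helper name project_utility is hypothetical, and the details (no intercept, unweighted fit, R-squared measured against the mean) are assumptions rather than the estimator's documented procedure.

```python
import numpy as np

def project_utility(reward, features):
    """Hypothetical helper: least-squares projection of u(s, a) onto phi(s, a)."""
    n_states, n_actions, k = features.shape
    X = features.reshape(n_states * n_actions, k)  # one row per (state, action)
    y = reward.reshape(n_states * n_actions)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    # Without an intercept this R-squared can be negative for a poor fit.
    r2 = 1.0 - (resid @ resid) / ss_tot
    return coef, r2

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2, 3))
true_coef = np.array([1.0, -2.0, 0.5])

# A utility that is exactly linear in the features projects without loss.
coef_lin, r2_lin = project_utility(features @ true_coef, features)

# A utility that is quadratic in one feature is summarized poorly.
coef_nl, r2_nl = project_utility(features[..., 0] ** 2, features)
```

In the linear case the projection recovers the coefficients and the R-squared is essentially one. In the quadratic case the coefficients are still returned, but the low R-squared signals that they are a weak summary.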

__init__(n_states=None, n_actions=None, discount=0.95, scale=1.0, num_projections=64, reward_hidden_dim=64, reward_num_layers=2, max_epochs=2000, lr=0.01, gradient_clip=10.0, ccp_min_count=1, ccp_smoothing=1e-06, seed=0, verbose=False)

Parameters:

n_states (int | None)
n_actions (int | None)
discount (float)
scale (float)
num_projections (int)
reward_hidden_dim (int)
reward_num_layers (int)
max_epochs (int)
lr (float)
gradient_clip (float)
ccp_min_count (int)
ccp_smoothing (float)
seed (int)
verbose (bool)

fit(data, state=None, action=None, id=None, features=None, transitions=None)

Fit the neural utility to data.

Parameters:

data (pandas.DataFrame or Panel or TrajectoryPanel) – Panel of observed choices.
state (str, optional) – Name of the state column (required when data is a DataFrame).
action (str, optional) – Name of the action column (required when data is a DataFrame).
id (str, optional) – Name of the individual identifier column (required when data is a DataFrame).
features (numpy.ndarray) – Reward features phi(s, a) of shape (n_states, n_actions, K). The utility network maps each feature vector to a scalar utility, so the features set the inputs the network can combine.
transitions (numpy.ndarray) – Transition matrices P(s'|s,a) of shape (n_actions, n_states, n_states).

Return type:

NeuralUFXP
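The array shapes expected by fit can be sketched on a toy problem. Everything about the problem is an illustrative assumption: the four-state "move up or reset" dynamics, the two cost features, and the column names "id", "state" and "action". The wrapper fit_neural_ufxp is hypothetical and is defined but not called, so the block runs without econirl installed. It uses only the constructor and fit arguments documented above.

```python
import numpy as np

n_states, n_actions, n_features = 4, 2, 2

# features[s, a, :] is phi(s, a), shape (n_states, n_actions, K).
features = np.zeros((n_states, n_actions, n_features))
features[:, 0, 0] = -np.arange(n_states)  # action 0: cost grows with the state
features[:, 1, 1] = -1.0                  # action 1: fixed cost

# transitions[a, s, s'] is P(s' | s, a), shape (n_actions, n_states, n_states).
transitions = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    transitions[0, s, min(s + 1, n_states - 1)] = 1.0  # action 0 moves up one state
    transitions[1, s, 0] = 1.0                         # action 1 resets to state 0

# Toy panel: two individuals whose paths follow the dynamics above.
panel = {
    "id":     [0, 0, 0, 1, 1, 1],
    "state":  [0, 1, 2, 0, 1, 0],
    "action": [0, 0, 1, 0, 1, 0],
}

def fit_neural_ufxp(panel, features, transitions):
    """Hypothetical wrapper; imports are deferred so the sketch runs standalone."""
    import pandas as pd
    from econirl import NeuralUFXP

    df = pd.DataFrame(panel)
    est = NeuralUFXP(discount=0.95, max_epochs=200, seed=0)
    # Column names must be passed explicitly when data is a DataFrame.
    return est.fit(df, state="state", action="action", id="id",
                   features=features, transitions=transitions)
```

Note that the leading axis differs between the two arrays: features is indexed by state first, while transitions is indexed by action first. Each row transitions[a, s, :] should sum to one.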