Under the Hood

Model

The data are state, action, next-state triples \((s, a, s')\) from a stationary infinite-horizon dynamic discrete choice model with linear flow utility \(u_\theta(s, a) = \varphi(s, a)^\top \theta\), discount factor \(\beta \in [0, 1)\), known transition kernels \(F_a(s' \mid s)\), and i.i.d. type-I extreme value (logit) shocks. The integrated value function satisfies the soft Bellman fixed point

\[ V_\theta(s) = \log \sum_a \exp\!\bigl(u_\theta(s,a) + \beta \sum_{s'} F_a(s'\mid s)\,V_\theta(s')\bigr). \]

MPEC treats the value vector \(V\) as an explicit optimization variable alongside \(\theta\) and enforces the Bellman constraint \(V = T_\theta(V)\) directly, where \(T_\theta\) denotes the right-hand side of the fixed point above. The full model derivation, standard-error identity, and consistency argument are on the MPEC overview page.
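As a concrete reading of the fixed point, here is a minimal JAX sketch of the soft Bellman operator and a plain fixed-point iteration. The array layout (phi with shape (S, A, K), F with shape (A, S, S)), the function names bellman_operator and solve_value, and the toy numbers are assumptions made for illustration; they are not taken from econirl.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

jax.config.update("jax_enable_x64", True)  # double precision for a tight tolerance


def bellman_operator(theta, V, phi, F, beta):
    """T_theta(V)(s) = log sum_a exp(u(s,a) + beta * sum_s' F_a(s'|s) V(s'))."""
    u = phi @ theta                         # (S, A) linear flow utility
    EV = jnp.einsum("ast,t->sa", F, V)      # (S, A) expected continuation value
    return logsumexp(u + beta * EV, axis=1)


def solve_value(theta, phi, F, beta, tol=1e-10, max_iter=5000):
    """Iterate V <- T_theta(V); T_theta is a contraction with modulus beta."""
    V = jnp.zeros(phi.shape[0])
    for _ in range(max_iter):
        V_new = bellman_operator(theta, V, phi, F, beta)
        if jnp.max(jnp.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


# Toy model (hypothetical numbers): 2 states, 2 actions, 1 utility parameter.
phi = jnp.array([[[0.0], [1.0]],
                 [[0.5], [-1.0]]])          # phi[s, a, k]
F = jnp.array([[[0.9, 0.1], [0.2, 0.8]],    # F[a, s, s'] for a = 0
               [[0.5, 0.5], [0.6, 0.4]]])   # and for a = 1
theta = jnp.array([0.7])
beta = 0.9

V = solve_value(theta, phi, F, beta)
residual = float(jnp.max(jnp.abs(V - bellman_operator(theta, V, phi, F, beta))))
```

The residual is exactly the quantity MPEC constrains to zero; here it is driven to zero by iteration, whereas MPEC leaves that job to the constrained optimizer.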

Pseudocode

initialize theta; solve V = T_theta(V) for the starting value vector
define the constraint c(theta, V) = V - T_theta(V)
while the constrained optimizer has not stopped:
    compute Q(s, a) from theta, V, transitions, and beta
    compute log pi(a | s) by the log-softmax rule
    evaluate the conditional log likelihood
    evaluate c(theta, V) and its Jacobian via JAX
    update theta and V with SLSQP
return theta, V, policy, standard errors, and constraint diagnostics

The joint variable is \(x = (\theta, V)\) and the equality constraint has one row per state. Objective gradients and constraint Jacobians come from JAX, not finite differences. The value vector is initialized at the Bellman fixed point of the starting \(\theta\), so the optimizer begins feasible and the SQP steps stay near the constraint surface. No Bellman fixed point is solved inside the objective; the constraint carries it.
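The loop above can be sketched end to end with scipy.optimize.minimize(method="SLSQP") and JAX derivatives. This is an illustration under stated assumptions, not the econirl implementation: the toy model, the simulated data (states drawn uniformly rather than along trajectories), and every helper name are hypothetical.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp
from scipy.optimize import minimize

jax.config.update("jax_enable_x64", True)  # SLSQP works in double precision

# Hypothetical toy model: 3 states, 2 actions, 2 utility parameters.
S, A, K, beta = 3, 2, 2, 0.9
rng = np.random.default_rng(0)
phi = jnp.asarray(rng.normal(size=(S, A, K)))        # phi[s, a, k]
F = rng.random(size=(A, S, S))
F = jnp.asarray(F / F.sum(axis=2, keepdims=True))    # F[a, s, s'], rows sum to 1


def q_values(theta, V):
    return phi @ theta + beta * jnp.einsum("ast,t->sa", F, V)


def solve_value(theta, n_iter=500):
    # Used only to simulate data and to build a feasible starting point.
    V = jnp.zeros(S)
    for _ in range(n_iter):
        V = logsumexp(q_values(theta, V), axis=1)
    return V


# Simulate choices from the model at an assumed true parameter.
theta_true = jnp.array([1.0, -0.5])
pi_true = np.asarray(jax.nn.softmax(q_values(theta_true, solve_value(theta_true)), axis=1))
s_obs = rng.integers(0, S, size=2000)
a_obs = np.array([rng.choice(A, p=pi_true[s]) for s in s_obs])


def neg_loglik(x):
    # x = (theta, V); no fixed point is solved here.
    theta, V = x[:K], x[K:]
    log_pi = jax.nn.log_softmax(q_values(theta, V), axis=1)
    return -jnp.mean(log_pi[s_obs, a_obs])


def constraint(x):
    # c(theta, V) = V - T_theta(V), one row per state.
    theta, V = x[:K], x[K:]
    return V - logsumexp(q_values(theta, V), axis=1)


def as_np(f):
    # scipy expects NumPy float64 arrays back from callbacks.
    return lambda x: np.array(f(jnp.asarray(x)), dtype=np.float64)


obj = jax.jit(neg_loglik)
obj_grad = as_np(jax.jit(jax.grad(neg_loglik)))
con = as_np(jax.jit(constraint))
con_jac = as_np(jax.jit(jax.jacfwd(constraint)))

# Feasible start: V0 is the Bellman fixed point of theta0.
theta0 = jnp.zeros(K)
x0 = np.concatenate([np.asarray(theta0), np.asarray(solve_value(theta0))])

res = minimize(
    lambda x: float(obj(jnp.asarray(x))), x0, jac=obj_grad, method="SLSQP",
    constraints=[{"type": "eq", "fun": con, "jac": con_jac}],
    options={"ftol": 1e-8, "maxiter": 200},
)
theta_hat, V_hat = res.x[:K], res.x[K:]
violation = float(np.max(np.abs(con(res.x))))
```

Passing the constraint Jacobian explicitly keeps SLSQP from falling back to finite differences, which matches the JAX-derivative choice described above.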

Implementation Notes

Standard errors use the implicit-score identity stated on the overview page: at the constrained optimum, per-observation score contributions are computed from \(\partial V / \partial \theta\) after convergence, and the robust covariance is built from their outer products. The fitted summary exposes metadata["final_constraint_violation"]; gate on it alongside the convergence flag, since a high likelihood with a violated constraint is not a solution.
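One way to read the implicit-score step is through the implicit function theorem: differentiating \(c(\theta, V) = V - T_\theta(V) = 0\) gives \(\partial V / \partial \theta = -(\partial c / \partial V)^{-1}\, \partial c / \partial \theta\), and each observation's score is the total derivative of \(\log \pi(a \mid s)\) along that surface. The sketch below follows that reading on a hypothetical toy model; the exact covariance formula is on the overview page, so the code stops at the summed outer product and does not assert how it enters the final covariance. All names, and the arbitrary evaluation point, are assumptions for illustration.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

jax.config.update("jax_enable_x64", True)

# Hypothetical toy model: 3 states, 2 actions, 2 utility parameters.
S, A, K, beta = 3, 2, 2, 0.9
rng = np.random.default_rng(1)
phi = jnp.asarray(rng.normal(size=(S, A, K)))
F = rng.random(size=(A, S, S))
F = jnp.asarray(F / F.sum(axis=2, keepdims=True))


def q_values(theta, V):
    return phi @ theta + beta * jnp.einsum("ast,t->sa", F, V)


def constraint(theta, V):
    return V - logsumexp(q_values(theta, V), axis=1)


def solve_value(theta, n_iter=500):
    V = jnp.zeros(S)
    for _ in range(n_iter):
        V = logsumexp(q_values(theta, V), axis=1)
    return V


def dV_dtheta(theta, V):
    # Implicit function theorem applied to c(theta, V) = 0.
    c_theta = jax.jacfwd(constraint, argnums=0)(theta, V)   # (S, K)
    c_V = jax.jacfwd(constraint, argnums=1)(theta, V)       # (S, S)
    return -jnp.linalg.solve(c_V, c_theta)                  # (S, K)


def log_pi_all(theta, V):
    return jax.nn.log_softmax(q_values(theta, V), axis=1)   # (S, A)


def scores(theta, V, s_obs, a_obs):
    # Total derivative: direct effect of theta plus the effect through V.
    d_theta = jax.jacfwd(log_pi_all, argnums=0)(theta, V)   # (S, A, K)
    d_V = jax.jacfwd(log_pi_all, argnums=1)(theta, V)       # (S, A, S)
    total = d_theta + jnp.einsum("sat,tk->sak", d_V, dV_dtheta(theta, V))
    return total[s_obs, a_obs]                              # (N, K)


# Arbitrary feasible point and made-up observations, for illustration only.
theta = jnp.array([0.5, -0.3])
V = solve_value(theta)
s_obs = np.array([0, 1, 2, 2, 1, 0])
a_obs = np.array([1, 0, 1, 0, 1, 0])

J = dV_dtheta(theta, V)
sc = scores(theta, V, s_obs, a_obs)
opg = sc.T @ sc   # summed outer product of per-observation scores, (K, K)
```

Because the derivative is taken through the constraint rather than through an iterated solver, it costs one linear solve with an \(S \times S\) matrix regardless of how many iterations a fixed-point solver would have needed.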

The estimator lives in econirl.estimation.mpec. Use MPECConfig(solver="sqp") for the recommended SLSQP path. The augmented_lagrangian solver is retained for comparison but is less reliable at high discount factors.
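Whichever solver is chosen, the gating advice under Implementation Notes can be captured in a small helper. This is a sketch only: fit_is_acceptable is a hypothetical name, the tolerance is an assumed default rather than a library setting, and the inputs are a plain boolean and a plain dict so that no econirl call signatures are assumed beyond the metadata key mentioned above.

```python
def fit_is_acceptable(converged, metadata, tol=1e-6):
    """Accept a fit only if the optimizer converged AND the Bellman constraint holds.

    `tol` is an assumed threshold; pick one that suits the scale of V.
    """
    violation = metadata["final_constraint_violation"]
    # A NaN violation fails the comparison and is therefore rejected.
    return bool(converged) and abs(violation) <= tol


# Hypothetical summaries: the second has converged but violates the constraint.
ok = fit_is_acceptable(True, {"final_constraint_violation": 3e-9})
bad = fit_is_acceptable(True, {"final_constraint_violation": 2e-2})
```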