Counterfactuals
IQ-Learn recovers a Bellman-implied reward as a diagnostic alongside the imitation policy. Using that reward for counterfactual analysis requires understanding what the current evidence does and does not support.
What the Evidence Says
On the primary synthetic cell (canonical_low_action), the three counterfactual
regret checks pass:
| Family | Intervention | Regret | Gate |
|---|---|---|---|
| Type A | Reward shift (a payoff component changes). | 0.011518059257921435 | pass |
| Type B | Transition change (the dynamics change). | 0.02155897051624924 | pass |
| Type C | Action removal (one action is penalized away). | 0.009886935172236239 | pass |
Low regret on this cell means the Q-induced policy happens to produce near-oracle welfare under the applied interventions. It does not mean the reward, value, or Q objects are structurally accurate; all three fail their NRMSE checks by a wide margin on this same cell.
On the high-dimensional neural cell, the Type A and Type B regret checks fail as well.
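The gap between low regret and structural accuracy can be illustrated with a toy example. The sketch below is not econirl's gate implementation. It assumes a logit (soft) decision model, defines welfare as expected discounted reward plus policy entropy, defines regret as oracle welfare minus the welfare of the policy induced by the estimated reward, and normalizes RMSE by the standard deviation of the true reward. All function names and array layouts are hypothetical. The estimated reward is the true reward plus a constant, which leaves the soft policy unchanged under a Type B style transition change while the reward error stays large.

```python
import numpy as np

def soft_q(reward, P, beta, n_iter=500):
    """Soft (logit) Q by fixed-point iteration.

    reward: (S, A); P: (A, S, S) with P[a, s, s'] (assumed layout).
    """
    Q = np.zeros_like(reward)
    for _ in range(n_iter):
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
        Q = reward + beta * np.einsum("ast,t->sa", P, V)
    return Q

def softmax_policy(Q):
    """Choice probabilities implied by a Q table."""
    z = np.exp(Q - Q.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def welfare(policy, reward, P, beta, start):
    """Expected discounted (reward + entropy) of `policy` from `start`."""
    n_states = reward.shape[0]
    log_pi = np.log(np.clip(policy, 1e-300, None))
    r_pi = (policy * (reward - log_pi)).sum(axis=1)
    P_pi = np.einsum("sa,ast->st", policy, P)
    v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
    return float(start @ v)

rng = np.random.default_rng(0)
S, A, beta = 5, 3, 0.9
start = np.full(S, 1.0 / S)

reward_true = rng.normal(size=(S, A))
reward_hat = reward_true + 5.0                 # badly wrong in levels

# Counterfactual dynamics (a Type B style intervention in this toy setup)
P_cf = rng.dirichlet(np.ones(S), size=(A, S))

pi_oracle = softmax_policy(soft_q(reward_true, P_cf, beta))
pi_hat = softmax_policy(soft_q(reward_hat, P_cf, beta))

# Both policies are scored against the TRUE reward
regret = (welfare(pi_oracle, reward_true, P_cf, beta, start)
          - welfare(pi_hat, reward_true, P_cf, beta, start))
nrmse = np.sqrt(np.mean((reward_hat - reward_true) ** 2)) / reward_true.std()

print(f"regret = {regret:.2e}, reward NRMSE = {nrmse:.2f}")
```

In this construction the regret is zero up to floating-point error even though the reward NRMSE is large, which is the pattern the table and the NRMSE failures describe together.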
How to Use the Reward Diagnostic
The Bellman-implied reward r_IB(s, a) = Q(s, a) - beta * E[V(s') | s, a]
is available from summary.metadata["raw_bellman_reward_table"]. A
least-squares projection of this table into the utility feature basis is
stored in summary.metadata["projected_reward_matrix"] and
summary.metadata["reward_params"].
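As an illustration of what these two objects contain, the following NumPy sketch computes a Bellman-implied reward from a Q table and projects it onto a feature basis. It is a standalone toy, not the library's code: it assumes V(s) is the soft value log-sum-exp of Q(s, ·), a transition array laid out as P[a, s, s'], and a feature array of shape (S, A, K). The function names are hypothetical.

```python
import numpy as np

def soft_value(Q):
    """V(s) = log sum_a exp Q(s, a), computed stably (assumed definition)."""
    m = Q.max(axis=1)
    return m + np.log(np.exp(Q - m[:, None]).sum(axis=1))

def bellman_implied_reward(Q, P, beta):
    """r(s, a) = Q(s, a) - beta * E[V(s') | s, a].

    Q: (S, A) table; P: (A, S, S) with P[a, s, s'] (assumed layout).
    """
    expected_next_value = np.einsum("ast,t->sa", P, soft_value(Q))
    return Q - beta * expected_next_value

def project_reward(reward, features):
    """Least-squares fit of reward(s, a) ~ features[s, a] @ theta."""
    S, A, K = features.shape
    X = features.reshape(S * A, K)
    theta, *_ = np.linalg.lstsq(X, reward.reshape(S * A), rcond=None)
    return theta, (X @ theta).reshape(S, A)

# Toy check: build Q from a known linear reward, then recover it.
rng = np.random.default_rng(0)
S, A, K, beta = 4, 2, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(A, S))      # shape (A, S, S)
features = rng.normal(size=(S, A, K))
theta_true = np.array([1.0, -0.5])
reward_true = features @ theta_true

Q = np.zeros((S, A))
for _ in range(500):                            # soft Bellman iteration
    Q = reward_true + beta * np.einsum("ast,t->sa", P, soft_value(Q))

reward_rec = bellman_implied_reward(Q, P, beta)
theta_rec, reward_proj = project_reward(reward_rec, features)
```

Here the Q table satisfies the soft Bellman equation exactly, so the recovered reward and the projected parameters match the truth. A learned Q generally will not satisfy it, and the implied reward need not be linear in the features, so the projection residual is worth inspecting alongside the parameters.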
These are diagnostic objects. They may be useful for qualitative comparison or for initializing a structural estimator, but they should not be reported as structurally recovered parameters without first verifying that the structural gates pass.
Structural Counterfactual Use
To run a counterfactual with structural guarantees, use the structural family (NFXP, CCP, MPEC, or UFXP). Those estimators enforce the Bellman fixed point as a hard constraint and report standard errors; IQ-Learn does neither.
If you want to use the IQ-Learn projected theta as a starting point for a structural estimator:
from econirl.estimation import NFXPEstimator

# Extract projected parameters from IQ-Learn.
# `estimator` is the IQ-Learn estimator configured earlier.
iq_summary = estimator.estimate(panel, utility, problem, transitions)
theta_init = iq_summary.metadata.get("reward_params")

# Pass as starting point to NFXP or another structural estimator
nfxp_summary = NFXPEstimator().estimate(
    panel, utility, problem, transitions,
    initial_params=theta_init,
)
Check coverage gates before relying on theta_init; off-support reward values
can mislead the structural search.
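To see why off-support values matter, the sketch below counts visits per state-action cell and fits the projection only on cells the data actually cover. It is a hand-rolled illustration in NumPy, not the package's coverage gate: the helper names, the (S, A, K) feature layout, and the `min_count` threshold are all assumptions.

```python
import numpy as np

def support_mask(states, actions, n_states, n_actions, min_count=1):
    """Boolean (S, A) mask of cells visited at least `min_count` times."""
    counts = np.zeros((n_states, n_actions), dtype=int)
    np.add.at(counts, (states, actions), 1)     # unbuffered accumulation
    return counts >= min_count

def project_on_support(reward, features, mask):
    """Least-squares theta using only the cells selected by `mask`."""
    X = features[mask]                          # (n_selected, K)
    y = reward[mask]                            # (n_selected,)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

rng = np.random.default_rng(1)
S, A, K = 6, 2, 2
features = rng.normal(size=(S, A, K))
theta_true = np.array([1.0, -0.5])
reward = features @ theta_true

# Simulated observations that never visit states 4 and 5
states = rng.integers(0, 4, size=200)
actions = rng.integers(0, A, size=200)
mask = support_mask(states, actions, S, A)

# Off-support cells are unconstrained by the data; corrupt them on purpose
reward_table = reward.copy()
reward_table[~mask] += 10.0

theta_all = project_on_support(reward_table, features, np.ones_like(mask))
theta_supported = project_on_support(reward_table, features, mask)

print("all cells:      ", theta_all)
print("supported cells:", theta_supported)
```

In this toy setup the projection over all cells is pulled away from the true parameters by the unvisited cells, while the support-restricted fit is not. If the stored reward_params were fitted over the full table, a comparison of this kind is one way to judge how far theta_init can be trusted.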