Counterfactuals

IQ-Learn recovers a Bellman-implied reward as a diagnostic alongside the imitation policy. Using that reward for counterfactual analysis requires knowing what the current evidence does and does not support.

What the Evidence Says

On the primary synthetic cell (canonical_low_action) the three counterfactual regret checks pass:

| Family | Intervention | Regret | Gate |
| --- | --- | --- | --- |
| Type A | Reward shift (a payoff component changes). | 0.011518059257921435 | pass |
| Type B | Transition change (the dynamics change). | 0.02155897051624924 | pass |
| Type C | Action removal (one action is penalized away). | 0.009886935172236239 | pass |

Low regret on this cell means the Q-induced policy happens to produce near-oracle welfare under the applied interventions. It does not mean the reward, value, or Q objects are structurally accurate; all three fail their NRMSE checks by a wide margin on this same cell.

On the high-dimensional neural cell, even the regret checks do not hold up: Type A and Type B regret both fail.
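To see how low regret can coexist with a structurally wrong reward, consider a minimal NumPy sketch. It is not econirl code: the toy tabular MDP, the hard-max value iteration (used here instead of a logit or soft-max model, purely for simplicity), the range-normalized NRMSE, and every function name below are assumptions made for illustration. The estimated reward is a positive affine transform of the true one, so it induces the same optimal policy under a changed transition kernel while its entries are far from the truth.

```python
import numpy as np


def optimal_policy(reward, transitions, beta, tol=1e-10, max_iter=10_000):
    """Hard-max value iteration. reward: (S, A); transitions: (A, S, S')."""
    v = np.zeros(reward.shape[0])
    for _ in range(max_iter):
        q = reward + beta * np.einsum("ast,t->sa", transitions, v)
        v_new = q.max(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    q = reward + beta * np.einsum("ast,t->sa", transitions, v)
    return q.argmax(axis=1)


def policy_value(policy, reward, transitions, beta):
    """Exact value of a deterministic policy, evaluated under `reward`."""
    n_states = reward.shape[0]
    idx = np.arange(n_states)
    p_pi = transitions[policy, idx, :]  # (S, S') rows for the chosen actions
    r_pi = reward[idx, policy]          # (S,) rewards for the chosen actions
    return np.linalg.solve(np.eye(n_states) - beta * p_pi, r_pi)


def nrmse(estimate, truth):
    """RMSE divided by the range of the truth (one common normalization)."""
    rmse = np.sqrt(np.mean((estimate - truth) ** 2))
    return float(rmse / (truth.max() - truth.min()))


rng = np.random.default_rng(0)
n_states, n_actions, beta = 5, 3, 0.9
true_reward = rng.normal(size=(n_states, n_actions))
est_reward = 3.0 * true_reward + 10.0  # structurally wrong, same ranking

# Counterfactual in the spirit of Type B: a new transition kernel.
cf_transitions = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

oracle = optimal_policy(true_reward, cf_transitions, beta)
induced = optimal_policy(est_reward, cf_transitions, beta)

# Regret: welfare lost, under the TRUE reward, by following the induced policy.
regret = float(np.mean(
    policy_value(oracle, true_reward, cf_transitions, beta)
    - policy_value(induced, true_reward, cf_transitions, beta)
))
reward_nrmse = nrmse(est_reward, true_reward)
print(f"regret={regret:.3g}, reward NRMSE={reward_nrmse:.3g}")
```

In this toy case the regret is zero by construction while the reward NRMSE is large, which is the same qualitative pattern as the gates above: a regret gate tests the induced policy, not the reward table.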

How to Use the Reward Diagnostic

The Bellman-implied reward r_IB(s, a) = Q(s, a) - beta * E[V(s') | s, a] is available from summary.metadata["raw_bellman_reward_table"]. A least-squares projection of this table into the utility feature basis is stored in summary.metadata["projected_reward_matrix"] and summary.metadata["reward_params"].

These are diagnostic objects. They may be useful for qualitative comparison or for initializing a structural estimator, but they should not be reported as structurally recovered parameters without first verifying that the structural gates pass.
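The reward identity and the least-squares projection described above can be reproduced on a toy problem. The sketch below is illustrative only and does not reflect the library's internals: it assumes a soft (log-sum-exp) state value V(s) = log sum_a exp Q(s, a), a transition array of shape (actions, states, next states), and a linear feature basis of shape (states, actions, features). All function and variable names are hypothetical.

```python
import numpy as np


def soft_value(q):
    """Assumed soft value: V(s) = log sum_a exp Q(s, a), computed stably."""
    q_max = q.max(axis=1)
    return q_max + np.log(np.exp(q - q_max[:, None]).sum(axis=1))


def bellman_implied_reward(q, transitions, beta):
    """r_IB(s, a) = Q(s, a) - beta * E[V(s') | s, a]."""
    expected_next_v = np.einsum("ast,t->sa", transitions, soft_value(q))
    return q - beta * expected_next_v


def project_reward(reward_table, features):
    """Least-squares projection of an (S, A) table onto (S, A, K) features."""
    n_states, n_actions, n_features = features.shape
    design = features.reshape(n_states * n_actions, n_features)
    target = reward_table.reshape(n_states * n_actions)
    theta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return theta, (design @ theta).reshape(n_states, n_actions)


# Toy round trip: start from a reward that is exactly linear in the features.
rng = np.random.default_rng(1)
n_states, n_actions, beta = 4, 2, 0.9
features = rng.normal(size=(n_states, n_actions, 2))
theta_true = np.array([1.0, -0.5])
reward_true = features @ theta_true
transitions = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

# Solve the soft Bellman equation for Q by fixed-point iteration.
q = np.zeros((n_states, n_actions))
for _ in range(2000):
    q = reward_true + beta * np.einsum("ast,t->sa", transitions, soft_value(q))

r_ib = bellman_implied_reward(q, transitions, beta)
theta_hat, projected = project_reward(r_ib, features)
print(theta_hat)
```

The round trip recovers the parameters here only because Q solves the Bellman equation exactly and the reward lies in the span of the features. With a learned Q neither condition is guaranteed, which is why the projected values are treated as diagnostics.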

Structural Counterfactual Use

To run a counterfactual with structural guarantees, use the structural family (NFXP, CCP, MPEC, or UFXP). Those estimators enforce the Bellman fixed point as a hard constraint and report standard errors; IQ-Learn does neither.

If you want to use the IQ-Learn projected theta as a starting point for a structural estimator:

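from econirl.estimation import NFXPEstimator

# `estimator` is an already-configured IQ-Learn estimator; panel, utility,
# problem, and transitions are the same inputs used for the IQ-Learn fit.
iq_summary = estimator.estimate(panel, utility, problem, transitions)

# Extract the projected parameters; .get() returns None if the key is absent.
theta_init = iq_summary.metadata.get("reward_params")
if theta_init is None:
    raise KeyError("reward_params missing from IQ-Learn summary metadata")

# Pass them as the starting point to NFXP or another structural estimator.
nfxp_summary = NFXPEstimator().estimate(
    panel, utility, problem, transitions,
    initial_params=theta_init,
)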

Check coverage gates before relying on theta_init; off-support reward values can mislead the structural search.
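One simple way to see which reward entries are off-support is to count visits per state-action pair in the observed data. The helper below is a hypothetical sketch and is not the library's coverage gate: the function name, the `min_count` threshold, and the flat integer arrays of observed states and actions are all assumptions.

```python
import numpy as np


def coverage_report(states, actions, n_states, n_actions, min_count=1):
    """Count visits per (state, action) and flag cells with enough data."""
    counts = np.zeros((n_states, n_actions), dtype=int)
    np.add.at(counts, (states, actions), 1)  # handles repeated index pairs
    on_support = counts >= min_count
    return counts, on_support, float(on_support.mean())


# Toy observations: seven (state, action) pairs over 3 states and 2 actions.
states = np.array([0, 0, 1, 1, 2, 2, 2])
actions = np.array([0, 1, 0, 0, 1, 1, 1])
counts, on_support, share = coverage_report(states, actions, 3, 2)

print(counts)
print(f"share of cells on support: {share:.2f}")
```

Reward entries in cells where `on_support` is False are extrapolations from the learned Q, so a low share is a reason to be cautious about using theta_init.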