# Counterfactuals

IQ-Learn recovers a Bellman-implied reward as a diagnostic alongside the
imitation policy. Using that reward for counterfactual analysis requires
understanding what is and is not grounded by the current evidence.

## What the Evidence Says

On the primary synthetic cell (`canonical_low_action`) the three counterfactual
regret checks pass:

| Family | Intervention | Regret | Gate |
| --- | --- | --- | --- |
| Type A | Reward shift (a payoff component changes). | 0.011518059257921435 | pass |
| Type B | Transition change (the dynamics change). | 0.02155897051624924 | pass |
| Type C | Action removal (one action is penalized away). | 0.009886935172236239 | pass |

Low regret on this cell means the Q-induced policy happens to produce
near-oracle welfare under the applied interventions. It does not mean the
reward, value, or Q objects are structurally accurate; all three fail their
NRMSE checks by a wide margin on this same cell.

On the high-dimensional neural cell, even the policy-level evidence does not hold up: the Type A and Type B regret checks fail there.
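
A toy example shows how low regret can coexist with a badly wrong reward. The sketch assumes a small tabular problem with an entropy-regularized (soft) Bellman equation, and it normalizes NRMSE by the range of the true reward, which may differ from the normalization the gates actually use. Adding a potential-based shaping term `h(s) - beta * E[h(s') | s, a]` to the reward shifts Q by `h(s)`, so under the original dynamics the choice probabilities are unchanged and policy regret cannot rise, yet the reward moves far from the truth. All function names are illustrative.

```python
import numpy as np

def soft_q(reward, P, beta, tol=1e-10, max_iter=10_000):
    """Iterate Q(s, a) = r(s, a) + beta * E[V(s') | s, a] with soft V(s) = log sum_a exp Q(s, a)."""
    Q = np.zeros_like(reward)
    for _ in range(max_iter):
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
        Q_new = reward + beta * P @ V
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q

def softmax_policy(Q):
    """Logit choice probabilities pi(a | s) implied by Q."""
    z = np.exp(Q - Q.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def nrmse(estimate, truth):
    """Assumption: RMSE normalized by the range of the true values."""
    rmse = np.sqrt(np.mean((estimate - truth) ** 2))
    return rmse / (truth.max() - truth.min())

rng = np.random.default_rng(0)
S, A, beta = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))      # P[s, a, s'] = Pr(s' | s, a)
r_true = rng.uniform(size=(S, A))

h = np.array([0.0, 20.0, -20.0, 40.0])          # arbitrary state potential
r_shaped = r_true + h[:, None] - beta * P @ h   # potential-based shaping term

pi_true = softmax_policy(soft_q(r_true, P, beta))
pi_shaped = softmax_policy(soft_q(r_shaped, P, beta))

print(np.abs(pi_true - pi_shaped).max())  # near zero: the same policy
print(nrmse(r_shaped, r_true))            # large: the reward is far from the truth
```

Because any such `h` is invisible to the policy, policy-level checks cannot rule it out; this is one way the NRMSE checks can fail while regret passes. A transition change (Type B) breaks the invariance, since the shaping term was built with the original `P`.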

## How to Use the Reward Diagnostic

The Bellman-implied reward `r_IB(s, a) = Q(s, a) - beta * E[V(s') | s, a]`
is available from `summary.metadata["raw_bellman_reward_table"]`. A
least-squares projection of this table into the utility feature basis is
stored in `summary.metadata["projected_reward_matrix"]` and
`summary.metadata["reward_params"]`.
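
The formula and the projection can be sketched on tabular arrays. This is a minimal illustration, not econirl's implementation: it assumes `Q` is an `(S, A)` array, `P[s, a, s']` holds transition probabilities, `V` is the soft (log-sum-exp, unit temperature) value used by entropy-regularized IQ-Learn, and the utility features form an `(S, A, K)` array. The function names are hypothetical.

```python
import numpy as np

def soft_value(Q):
    """Assumed soft value with unit temperature: V(s) = log sum_a exp Q(s, a)."""
    m = Q.max(axis=1)
    return m + np.log(np.exp(Q - m[:, None]).sum(axis=1))

def bellman_implied_reward(Q, P, beta):
    """r_IB(s, a) = Q(s, a) - beta * E[V(s') | s, a], where P[s, a, s'] = Pr(s' | s, a)."""
    return Q - beta * P @ soft_value(Q)

def project_reward(r_table, features):
    """Ordinary least-squares fit of an (S, A) reward table onto (S, A, K) features."""
    S, A, K = features.shape
    X = features.reshape(S * A, K)
    theta, *_ = np.linalg.lstsq(X, r_table.reshape(S * A), rcond=None)
    return theta, (X @ theta).reshape(S, A)

# Usage with a random Q standing in for a fitted IQ-Learn critic
rng = np.random.default_rng(1)
S, A, K, beta = 5, 3, 2, 0.95
Q = rng.normal(size=(S, A))
P = rng.dirichlet(np.ones(S), size=(S, A))
phi = rng.normal(size=(S, A, K))

r_ib = bellman_implied_reward(Q, P, beta)   # analogue of the raw reward table
theta, r_proj = project_reward(r_ib, phi)   # analogues of reward_params and the projected matrix
```

Swapping the random `Q` for a fitted one gives objects analogous to the two metadata entries above; the library may weight or restrict the fit differently.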

These are diagnostic objects. They may be useful for qualitative comparison
or for initializing a structural estimator, but they should not be reported as
structurally recovered parameters without first verifying that the structural
gates pass.

## Structural Counterfactual Use

To run a counterfactual with structural guarantees, use the structural family
(NFXP, CCP, MPEC, or UFXP). Those estimators enforce the Bellman fixed point
as a hard constraint and report standard errors; IQ-Learn does neither.
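
For contrast, an NFXP-style estimator solves the Bellman fixed point exactly for every trial parameter vector before it evaluates the likelihood. The sketch below assumes linear flow utility `phi @ theta`, logit (soft) choice probabilities, and tabular transitions; the names are hypothetical and do not describe the internals of econirl's `NFXPEstimator`.

```python
import numpy as np

def logsumexp_rows(Q):
    """Numerically stable log sum_a exp Q(s, a) for each state s."""
    m = Q.max(axis=1)
    return m + np.log(np.exp(Q - m[:, None]).sum(axis=1))

def solve_soft_bellman(u, P, beta, tol=1e-10, max_iter=10_000):
    """Inner loop: iterate V <- logsumexp_a(u + beta * P @ V) to its fixed point."""
    V = np.zeros(u.shape[0])
    for _ in range(max_iter):
        V_new = logsumexp_rows(u + beta * P @ V)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return u + beta * P @ V  # choice-specific values Q(s, a)

def nfxp_loglik(theta, phi, P, beta, counts):
    """Outer-loop objective: log-likelihood of (s, a) counts at parameters theta."""
    u = phi @ theta                            # assumed linear utility phi(s, a) . theta
    Q = solve_soft_bellman(u, P, beta)         # fixed point solved for this theta
    log_pi = Q - logsumexp_rows(Q)[:, None]    # logit log pi(a | s)
    return float((counts * log_pi).sum())

# Usage on a toy problem
rng = np.random.default_rng(2)
S, A, K, beta = 4, 2, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))
phi = rng.normal(size=(S, A, K))
theta_true = np.array([1.0, -0.5])

Q_true = solve_soft_bellman(phi @ theta_true, P, beta)
counts = np.exp(Q_true - logsumexp_rows(Q_true)[:, None])  # expected counts, one visit per state
ll_true = nfxp_loglik(theta_true, phi, P, beta, counts)
ll_zero = nfxp_loglik(np.zeros(K), phi, P, beta, counts)
```

An outer optimizer such as `scipy.optimize.minimize` applied to `-nfxp_loglik` would then search over `theta`; its starting vector `x0` is the slot where a projected IQ-Learn theta could be supplied.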

If you want to use the IQ-Learn projected theta as a starting point for a
structural estimator:

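```python
from econirl.estimation import NFXPEstimator

# `estimator` is an already-configured IQ-Learn estimator; `panel`, `utility`,
# `problem`, and `transitions` are the same inputs used to fit it.
iq_summary = estimator.estimate(panel, utility, problem, transitions)

# Extract the projected (diagnostic) reward parameters from IQ-Learn
theta_init = iq_summary.metadata.get("reward_params")
if theta_init is None:
    raise ValueError("IQ-Learn summary has no 'reward_params' entry")

# Use them only as a starting point: NFXP re-estimates theta under the
# Bellman fixed-point constraint and reports standard errors.
nfxp_summary = NFXPEstimator().estimate(
    panel, utility, problem, transitions,
    initial_params=theta_init,
)
```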

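Check the coverage gates before relying on `theta_init`. Reward values at state-action pairs the data rarely or never visit are off-support extrapolations, and they can steer the structural search toward a poor starting region.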
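
One way to act on a coverage check is to count visits per state-action pair and fit the projection only on pairs the data actually reach. This sketch assumes the panel can be reduced to integer arrays of observed state and action indices; the helper names and the `min_visits` threshold are hypothetical, and econirl's own coverage gates may be defined differently.

```python
import numpy as np

def visit_counts(states, actions, S, A):
    """Count how often each (s, a) pair appears in the (hypothetical) flattened panel."""
    counts = np.zeros((S, A), dtype=int)
    np.add.at(counts, (states, actions), 1)
    return counts

def project_on_support(r_table, features, counts, min_visits=1):
    """Least-squares fit of r_table onto features using only well-visited (s, a) pairs."""
    mask = counts >= min_visits
    theta, *_ = np.linalg.lstsq(features[mask], r_table[mask], rcond=None)
    return theta, mask

# Toy panel: some (s, a) pairs are never observed
S, A, K = 4, 2, 2
states = np.array([0, 0, 1, 1, 2, 3])
actions = np.array([0, 1, 0, 0, 1, 0])
rng = np.random.default_rng(3)
phi = rng.normal(size=(S, A, K))
theta_true = np.array([0.5, -1.0])

counts = visit_counts(states, actions, S, A)
r_table = phi @ theta_true
r_table[counts == 0] = 1e3  # stand-in for wild off-support extrapolation

theta_support, mask = project_on_support(r_table, phi, counts)
theta_full, *_ = np.linalg.lstsq(phi.reshape(-1, K), r_table.reshape(-1), rcond=None)
```

In this toy case the on-support fit recovers `theta_true`, while the unrestricted fit is pulled away by the three unvisited cells.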