# Pre-Estimation Checks

IQ-Learn runs the same general data-quality checks as the other estimators,
plus coverage checks that are specific to its Q-based reward recovery.

| Check | Why it matters for IQ-Learn |
| --- | --- |
| Expert state coverage | The objective only scores expert (s, a) pairs; states not in the panel receive no direct Q signal. |
| Expert state-action coverage | Off-support state-action pairs get no gradient; their implied reward is extrapolated, not fitted. |
| Q parameterization and divergence | With a tabular Q and the simple divergence, the objective has no upper bound; the chi-squared divergence is required for a bounded optimization problem. |
| Feature rank (linear head) | A rank-deficient feature matrix leaves directions of theta undetermined. |
| Feature condition number | Ill-conditioning inflates the variance of the linear Q solve. |
| Transition row sums | Transitions must be row-stochastic in the (n_actions, n_states, n_states) orientation for the inverse Bellman reward to be valid. |
| Discount and scale | Misspecified beta or sigma shift the implied reward by a constant factor. |
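
The feature and transition rows of the table can be checked directly with NumPy. The following is a minimal sketch, not the library's implementation: the function names `feature_checks` and `transition_rows_ok` and the `cond_max` threshold are hypothetical, and only the `(n_actions, n_states, n_states)` orientation is taken from the table above.

```python
import numpy as np


def feature_checks(features, cond_max=1e6):
    """Rank and conditioning of the linear-head feature matrix.

    `features` has shape (n_rows, n_features). `cond_max` is an
    illustrative threshold, not a value used by the library.
    """
    X = np.asarray(features, dtype=float)
    rank = int(np.linalg.matrix_rank(X))
    cond = float(np.linalg.cond(X))  # 2-norm condition number via SVD
    return {
        "rank": rank,
        "n_features": X.shape[1],
        "full_rank": rank == X.shape[1],
        "condition_number": cond,
        "well_conditioned": cond <= cond_max,
    }


def transition_rows_ok(transitions, atol=1e-8):
    """True if P[a, s, :] is a probability vector for every (a, s)."""
    P = np.asarray(transitions, dtype=float)
    if P.ndim != 3 or P.shape[1] != P.shape[2]:
        return False  # wrong orientation or non-square state axes
    nonnegative = np.all(P >= -atol)
    rows_sum_to_one = np.allclose(P.sum(axis=2), 1.0, atol=atol)
    return bool(nonnegative and rows_sum_to_one)
```

A transition array stored as `(n_states, n_actions, n_states)` fails the shape test whenever `n_actions != n_states`, which catches the most common orientation mistake before the row sums are even inspected.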

## Coverage Gates

IQ-Learn output is suitable for reward and counterfactual diagnostics only
when both of the following hold:

- `expert_state_coverage == 1.0` (every state in the MDP was visited),
- `expert_state_action_coverage >= 0.95` (at least 95 percent of
  state-action pairs were visited).

If either gate fails, the Q table and implied reward are valid only on the
visited support; off-support values are extrapolations.
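
As an illustration, both coverage figures can be recomputed from the raw expert panel. This sketch assumes integer-coded `states` and `actions` arrays and defines coverage as the fraction of cells visited at least once, which matches the descriptions above; the function name `coverage_report` and the extra keys `min_action_share` and `passes_gates` are hypothetical, and the library may compute its metadata differently.

```python
import numpy as np


def coverage_report(states, actions, n_states, n_actions,
                    state_gate=1.0, state_action_gate=0.95):
    """Coverage figures for an expert panel of (state, action) indices."""
    s = np.asarray(states, dtype=int)
    a = np.asarray(actions, dtype=int)
    if s.size == 0:
        raise ValueError("expert panel is empty")

    # Visit counts per (state, action) cell; add.at handles repeated pairs.
    counts = np.zeros((n_states, n_actions), dtype=int)
    np.add.at(counts, (s, a), 1)

    state_cov = float((counts.sum(axis=1) > 0).mean())
    state_action_cov = float((counts > 0).mean())
    action_share = counts.sum(axis=0) / counts.sum()

    return {
        "expert_state_coverage": state_cov,
        "expert_state_action_coverage": state_action_cov,
        "min_action_share": float(action_share.min()),
        "passes_gates": state_cov >= state_gate
        and state_action_cov >= state_action_gate,
    }
```

For example, a panel that visits only state 0 in a two-state, two-action MDP gives `expert_state_coverage` of 0.5 and fails the gate, regardless of how well the fitted policy matches the expert on that state.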

## Canonical Simulation Checks

Values from the primary synthetic cell run (see [Simulation Study](validation.md)):

| Check | Value | Status |
| --- | --- | --- |
| Feature rank | 4 / 4 | pass |
| Feature condition number | 4.51 | pass |
| Observed states | 21 / 21 | pass |
| State-action coverage | 1.000 | pass |
| Minimum action share | 0.325 | pass |

## Common Risk Patterns

Sparse expert panels are the main risk. When the expert panel covers only a
subset of states, the Q table on unvisited states is unconstrained and the
inverse Bellman reward at those cells is unreliable. The linear Q head
mitigates this by constraining Q to propagate through features, but it
requires a well-specified feature matrix. Always check both coverage figures
from `summary.metadata` before interpreting the reward output.
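
To make the off-support problem concrete, the sketch below computes an inverse Bellman reward from a Q table and blanks out cells the expert never visited. It assumes the soft (log-sum-exp) value function with scale `sigma`, i.e. `V(s) = sigma * log(sum_a exp(Q(s, a) / sigma))` and `r(s, a) = Q(s, a) - beta * E[V(s')]`; how the library applies `sigma` is an assumption here, and `inverse_bellman_reward` and its `visited` argument are hypothetical names.

```python
import numpy as np


def inverse_bellman_reward(Q, transitions, beta, sigma=1.0, visited=None):
    """Implied reward r(s, a) = Q(s, a) - beta * sum_s' P(s'|s, a) V(s').

    Q:           (n_states, n_actions)
    transitions: (n_actions, n_states, n_states), rows sum to one
    visited:     optional boolean (n_states, n_actions) support mask
    """
    Q = np.asarray(Q, dtype=float)
    P = np.asarray(transitions, dtype=float)

    # Numerically stable soft value: sigma * logsumexp(Q / sigma) per state.
    z = Q / sigma
    z_max = z.max(axis=1, keepdims=True)
    V = sigma * (z_max[:, 0] + np.log(np.exp(z - z_max).sum(axis=1)))

    # Expected next-state value for each (state, action) pair.
    expected_V = np.einsum("ast,t->sa", P, V)
    reward = Q - beta * expected_V

    if visited is not None:
        # Off-support cells are extrapolation, so report them as NaN.
        reward = np.where(np.asarray(visited, dtype=bool), reward, np.nan)
    return reward
```

Masking rather than reporting the extrapolated numbers keeps off-support rewards from being read as fitted values in downstream tables.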

The sparse-support guard in
[`sparse_support_guard.py`](https://github.com/rawatpranjal/EconIRL/blob/main/validation/estimators/iq_learn/sparse_support_guard.py)
demonstrates that a small policy total-variation (TV) distance and low
counterfactual regret are not sufficient evidence when coverage is below the
gates; the support checks must pass first.