Pre-Estimation Checks

IQ-Learn shares the general data-quality checks used by other estimators and adds coverage checks that are specific to its Q-based reward recovery.

Check	Why it matters for IQ-Learn
Expert state coverage	The objective only scores expert (s, a) pairs; states not in the panel receive no direct Q signal.
Expert state-action coverage	Off-support state-action pairs get no gradient; their implied reward is extrapolated, not fitted.
Q parameterization and divergence	With a tabular Q and the simple divergence, the objective has no upper bound; the chi-squared divergence is required for a bounded optimization.
Feature rank (linear head)	A rank-deficient feature matrix leaves directions of theta undetermined.
Feature condition number	Ill-conditioning inflates the variance of the linear Q solve.
Transition row sums	Transitions must be row-stochastic in the (n_actions, n_states, n_states) orientation for the inverse Bellman reward to be valid.
Discount and scale	A misspecified beta (discount) or sigma (scale) shifts the implied reward by a constant factor.
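
The feature and transition rows above can be checked numerically before estimation. The following is a minimal sketch, not the package's own implementation: the function names `feature_checks` and `transition_row_sum_error` are hypothetical, the feature matrix is assumed to have one row per cell and one column per feature, and the toy arrays are illustrative only.

```python
import numpy as np


def feature_checks(features):
    """Return (rank, n_columns, condition_number) for a 2-D feature matrix."""
    features = np.asarray(features, dtype=float)
    rank = int(np.linalg.matrix_rank(features))
    cond = float(np.linalg.cond(features))  # 2-norm condition number via SVD
    return rank, features.shape[1], cond


def transition_row_sum_error(transitions):
    """Return the largest deviation of any row sum from 1.

    Expects shape (n_actions, n_states, n_states), where
    transitions[a, s, :] is the next-state distribution.
    """
    transitions = np.asarray(transitions, dtype=float)
    if transitions.ndim != 3 or transitions.shape[1] != transitions.shape[2]:
        raise ValueError("expected shape (n_actions, n_states, n_states)")
    if (transitions < 0).any():
        raise ValueError("transition probabilities must be non-negative")
    return float(np.abs(transitions.sum(axis=2) - 1.0).max())


# Toy inputs (assumed for illustration): 3 states, 2 features, 2 actions.
features = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
transitions = np.stack([np.eye(3), np.full((3, 3), 1.0 / 3.0)])

rank, n_cols, cond = feature_checks(features)
row_error = transition_row_sum_error(transitions)
print(f"feature rank {rank} / {n_cols}, condition number {cond:.2f}")
print(f"max row-sum error {row_error:.2e}")
```

A rank below the number of columns, or a row-sum error well above floating-point tolerance, would indicate that the corresponding check fails. The shape test also catches a transition array passed in a different axis order when the number of actions differs from the number of states.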

Coverage Gates

IQ-Learn output is suitable for reward and counterfactual diagnostics only when both of the following hold:

expert_state_coverage == 1.0 (every state in the MDP was visited),
expert_state_action_coverage >= 0.95 (at least 95 percent of state-action pairs were visited).

If either gate fails, the Q table and implied reward are valid only on the visited support; off-support values are extrapolation.
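
As an illustration of how the two coverage figures and the gates relate, here is a minimal sketch that computes them from a list of expert (state, action) pairs. The helper names `coverage` and `passes_coverage_gates` are hypothetical, and the sketch assumes every action is feasible in every state, so the state-action denominator is n_states * n_actions.

```python
def coverage(panel, n_states, n_actions):
    """Return (state coverage, state-action coverage) for (state, action) pairs."""
    visited_pairs = set(panel)
    visited_states = {state for state, _ in visited_pairs}
    state_cov = len(visited_states) / n_states
    # Assumption: all n_states * n_actions pairs are feasible.
    pair_cov = len(visited_pairs) / (n_states * n_actions)
    return state_cov, pair_cov


def passes_coverage_gates(state_cov, pair_cov, pair_threshold=0.95):
    """Both gates must hold: full state coverage and enough visited pairs."""
    return state_cov == 1.0 and pair_cov >= pair_threshold


# Toy panel: 3 states, 2 actions; the pair (2, 1) is never observed.
panel = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 0)]
state_cov, pair_cov = coverage(panel, n_states=3, n_actions=2)
gates_ok = passes_coverage_gates(state_cov, pair_cov)
print(f"state coverage {state_cov:.3f}, state-action coverage {pair_cov:.3f}")
print("gates pass" if gates_ok else "gates fail")
```

In this toy panel every state is visited, but only five of six state-action pairs are, so the second gate fails even though the first passes.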

Canonical Simulation Checks

Values from the primary synthetic cell run (see Simulation Study):

Check	Value	Status
Feature rank	4 / 4	pass
Feature condition number	4.51	pass
Observed states	21 / 21	pass
State-action coverage	1.000	pass
Minimum action share	0.325	pass
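
The minimum action share row can be reproduced from raw expert actions. The sketch below assumes that "minimum action share" means the smallest empirical frequency of any action in the expert panel; that reading, the helper name `minimum_action_share`, and the toy data are assumptions rather than the definition used in the simulation study.

```python
from collections import Counter


def minimum_action_share(actions, n_actions):
    """Return the smallest empirical frequency over actions 0..n_actions-1."""
    if not actions:
        raise ValueError("the expert panel contains no actions")
    counts = Counter(actions)
    total = len(actions)
    # An action that never appears contributes a share of 0.
    return min(counts.get(action, 0) / total for action in range(n_actions))


# Toy panel of 8 observed actions over 3 possible actions.
actions = [0, 0, 1, 2, 1, 0, 2, 0]
share = minimum_action_share(actions, n_actions=3)
print(f"minimum action share {share:.3f}")
```

A share near zero signals that at least one action is rarely or never chosen, which usually goes together with low state-action coverage.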

Common Risk Patterns

Sparse expert panels are the main risk. When the expert panel covers only a subset of states, the Q table on unvisited states is unconstrained and the inverse Bellman reward at those cells is unreliable. The linear Q head mitigates this by constraining Q to propagate through features, but it requires a well-specified feature matrix. Always check both coverage figures from summary.metadata before interpreting the reward output.

The sparse-support guard in sparse_support_guard.py demonstrates that a small policy total variation (TV) distance and low counterfactual regret are not sufficient evidence when coverage is below the gates; support must pass first.
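
The ordering "support must pass first" can be sketched as a short gate function. This is not the contents of sparse_support_guard.py: the function `diagnostics_verdict` and the `tv_tolerance` default are hypothetical, and the sketch assumes the coverage figures are available as a plain mapping with the keys expert_state_coverage and expert_state_action_coverage.

```python
def diagnostics_verdict(metadata, policy_tv, tv_tolerance=0.05):
    """Check the coverage gates before looking at any fit metric."""
    state_cov = metadata["expert_state_coverage"]
    pair_cov = metadata["expert_state_action_coverage"]
    # Support is checked first: a small policy TV cannot rescue a sparse panel.
    if state_cov < 1.0 or pair_cov < 0.95:
        return "fail: coverage below gates"
    if policy_tv > tv_tolerance:  # tv_tolerance is an illustrative value
        return "fail: policy TV above tolerance"
    return "pass"


# Hypothetical metadata mappings for a sparse and a fully covered panel.
sparse = {"expert_state_coverage": 0.6, "expert_state_action_coverage": 0.4}
full = {"expert_state_coverage": 1.0, "expert_state_action_coverage": 1.0}

sparse_verdict = diagnostics_verdict(sparse, policy_tv=0.01)
full_verdict = diagnostics_verdict(full, policy_tv=0.01)
print(sparse_verdict)
print(full_verdict)
```

Both calls use the same small policy TV, yet only the fully covered panel passes, which is the point the guard makes.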