# Quick Start

```python
from econirl.estimation import IQLearnConfig, IQLearnEstimator

config = IQLearnConfig(
    q_type="tabular",
    divergence="chi2",
    alpha=1.0,
)
estimator = IQLearnEstimator(config=config)
summary = estimator.estimate(
    panel=panel,
    utility=utility,
    problem=problem,
    transitions=transitions,
)

print(summary.parameters)
print(summary.converged)
print(summary.metadata["expert_state_coverage"])
print(summary.metadata["expert_state_action_coverage"])
```

Fitted attributes follow the same convention as other estimators:

| Attribute | Meaning |
| --- | --- |
| `parameters` | Q parameters: feature coefficients for `q_type="linear"`, per-cell rewards for tabular. |
| `converged` | Whether L-BFGS-B reported convergence. |
| `log_likelihood` | Log-likelihood of expert actions under the Q-induced policy. |
| `policy` | Choice probabilities pi(a given s), shape (n_states, n_actions). |
| `value_function` | V(s) = sigma * logsumexp(Q(s, :) / sigma), shape (n_states,). |
| `metadata["expert_state_coverage"]` | Fraction of MDP states visited in the expert panel. |
| `metadata["expert_state_action_coverage"]` | Fraction of state-action pairs visited. |

## Q Parameterizations

```python
# Tabular: free Q(s, a) for each cell -- no feature structure
config = IQLearnConfig(q_type="tabular", divergence="chi2", alpha=1.0)

# Linear: Q(s, a) = phi(s, a)^T theta, matches the utility feature spec
config = IQLearnConfig(q_type="linear", divergence="chi2", alpha=1.0)

# Neural: small feedforward net mapping state features to Q(s, :)
config = IQLearnConfig(
    q_type="neural",
    divergence="chi2",
    alpha=1.0,
    hidden_dim=64,
    num_layers=2,
)
```

The linear parameterization is the most useful for structural interpretation: it constrains Q to live in the same feature space as the utility, so the recovered theta is directly comparable to the data-generating parameters and propagates to unvisited state-action pairs.

## Common Pitfall

Do not pair `q_type="tabular"` with `divergence="simple"`.
The simple objective has no upper bound on a free Q table, so the optimizer drives Q to numerical overflow (value RMSE can exceed 1e21). The chi-squared objective with `alpha >= 1` keeps the problem bounded. See the internal notes for a detailed account of this failure mode.
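
The boundedness argument can be illustrated with a toy one-dimensional sketch. It assumes the regularizer forms commonly used for IQ-Learn, the identity for the simple objective and `x - x**2 / (4 * alpha)` for chi-squared; these forms, and the helper names `phi_simple` and `phi_chi2`, are illustrative assumptions and are not taken from the econirl source. The sketch is not the estimator's actual objective; it only shows why a linear term can be pushed arbitrarily high while the quadratic penalty caps it.

```python
import numpy as np


def phi_simple(x):
    # Assumed "simple" regularizer: identity, so it grows without bound in x.
    return x


def phi_chi2(x, alpha=1.0):
    # Assumed chi-squared regularizer: concave quadratic, peaks at x = 2 * alpha
    # with maximum value alpha.
    return x - x**2 / (4.0 * alpha)


alpha = 1.0
results = {}
for bound in (1e1, 1e3, 1e6):
    # Widen the search interval to mimic an optimizer free to scale Q.
    x = np.linspace(-bound, bound, 2001)
    results[bound] = (phi_simple(x).max(), phi_chi2(x, alpha).max())

for bound, (simple_max, chi2_max) in results.items():
    print(f"bound={bound:g}  simple max={simple_max:g}  chi2 max={chi2_max:g}")
```

On every interval the simple term reaches its maximum at the edge of the interval, so widening the interval raises it without limit, whereas the chi-squared term never exceeds `alpha`. A free Q table gives the optimizer exactly this kind of unconstrained direction to scale along.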