Quick Start

```python
from econirl.estimation import IQLearnConfig, IQLearnEstimator

# Assumes `panel`, `utility`, `problem`, and `transitions` have already been constructed.
config = IQLearnConfig(
    q_type="tabular",
    divergence="chi2",
    alpha=1.0,
)
estimator = IQLearnEstimator(config=config)

summary = estimator.estimate(
    panel=panel,
    utility=utility,
    problem=problem,
    transitions=transitions,
)

print(summary.parameters)
print(summary.converged)
# Fractions of MDP states / state-action pairs observed in the expert panel
print(summary.metadata["expert_state_coverage"])
print(summary.metadata["expert_state_action_coverage"])
```
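The two coverage diagnostics are plain fractions. The following is a minimal NumPy sketch of how they could be computed. It assumes the expert panel has been reduced to integer arrays of state and action indices. The helper `coverage` and its arguments are hypothetical and are not part of econirl.

```python
import numpy as np


def coverage(states: np.ndarray, actions: np.ndarray,
             n_states: int, n_actions: int) -> tuple[float, float]:
    """Hypothetical helper: fraction of states and of (state, action)
    pairs that appear at least once in the expert panel."""
    visited = np.zeros((n_states, n_actions), dtype=bool)
    visited[states, actions] = True            # mark every observed pair
    state_cov = visited.any(axis=1).mean()     # states with at least one observed action
    pair_cov = visited.mean()                  # observed pairs / all pairs
    return float(state_cov), float(pair_cov)


# Toy panel: 5 states, 2 actions, 6 observations
states = np.array([0, 0, 1, 3, 3, 3])
actions = np.array([0, 1, 0, 1, 1, 0])
print(coverage(states, actions, n_states=5, n_actions=2))  # (0.6, 0.5)
```

Repeated observations of the same pair count once. Three of the five states and five of the ten pairs are covered in this toy panel.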

The returned `summary` exposes the same fitted attributes as the other estimators:

| Attribute | Meaning |
|---|---|
| `parameters` | Q parameters: feature coefficients for `q_type="linear"`, per-cell rewards for `q_type="tabular"`. |
| `converged` | Whether L-BFGS-B reported convergence. |
| `log_likelihood` | Log-likelihood of expert actions under the Q-induced policy. |
| `policy` | Choice probabilities `pi(a \| s)`, shape `(n_states, n_actions)`. |
| `value_function` | `V(s) = sigma * logsumexp(Q(s, :) / sigma)`, shape `(n_states,)`. |
| `metadata["expert_state_coverage"]` | Fraction of MDP states visited in the expert panel. |
| `metadata["expert_state_action_coverage"]` | Fraction of state-action pairs visited in the expert panel. |
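The `value_function`, `policy`, and `log_likelihood` entries are all functions of the fitted Q table. The following is a minimal NumPy/SciPy sketch of those relationships. It assumes Q is available as an `(n_states, n_actions)` array and that `sigma` is a known positive scalar. It also assumes the policy is the standard logit (softmax) policy consistent with the `V(s)` formula above, and that the log-likelihood is summed over observations. The helper `q_to_outputs` is hypothetical.

```python
import numpy as np
from scipy.special import logsumexp


def q_to_outputs(Q: np.ndarray, sigma: float,
                 states: np.ndarray, actions: np.ndarray):
    """Hypothetical helper: derive V, pi and the expert log-likelihood from Q."""
    # V(s) = sigma * logsumexp(Q(s, :) / sigma), shape (n_states,)
    V = sigma * logsumexp(Q / sigma, axis=1)
    # Logit policy: log pi(a | s) = (Q(s, a) - V(s)) / sigma, so each row sums to 1
    log_pi = (Q - V[:, None]) / sigma
    policy = np.exp(log_pi)
    # Log-likelihood of the observed expert (state, action) pairs, summed
    log_likelihood = log_pi[states, actions].sum()
    return V, policy, log_likelihood


Q = np.array([[0.0, 0.0],
              [1.0, 0.0]])
V, policy, ll = q_to_outputs(Q, sigma=1.0,
                             states=np.array([0, 1]), actions=np.array([0, 0]))
print(V, policy, ll)
```

Working in log space, with `log_pi` computed from `logsumexp`, avoids overflow when Q values are large relative to `sigma`.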

Q Parameterizations

```python
from econirl.estimation import IQLearnConfig

# Tabular: free Q(s, a) for each cell -- no feature structure
config = IQLearnConfig(q_type="tabular", divergence="chi2", alpha=1.0)

# Linear: Q(s, a) = phi(s, a)^T theta, matches the utility feature spec
config = IQLearnConfig(q_type="linear", divergence="chi2", alpha=1.0)

# Neural: small feedforward net mapping state features to Q(s, :)
config = IQLearnConfig(q_type="neural", divergence="chi2", alpha=1.0,
                       hidden_dim=64, num_layers=2)
```
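The neural variant differs in shape from the linear one. It maps a state's features to one Q value per action, rather than scoring a separate feature vector for each (s, a) pair. The following NumPy forward-pass sketch illustrates this. It assumes ReLU hidden layers and that `num_layers` counts hidden layers; neither detail is confirmed here. The helpers `init_mlp` and `q_values` are hypothetical, and the initialization is arbitrary.

```python
import numpy as np


def init_mlp(in_dim: int, n_actions: int, hidden_dim: int = 64,
             num_layers: int = 2, seed: int = 0):
    """Random (W, b) pairs for in_dim -> hidden_dim x num_layers -> n_actions."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden_dim] * num_layers + [n_actions]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_out)), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def q_values(params, state_features: np.ndarray) -> np.ndarray:
    """Map state features (n_states, in_dim) to Q(s, :) of shape (n_states, n_actions)."""
    h = state_features
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # hidden layer with (assumed) ReLU activation
    W, b = params[-1]
    return h @ W + b  # linear output head: one Q value per action


params = init_mlp(in_dim=3, n_actions=2)
Q = q_values(params, np.random.default_rng(1).normal(size=(10, 3)))
print(Q.shape)  # (10, 2)
```

The action dimension appears only in the output layer. The network therefore never sees action-specific features, unlike the linear form, which scores each `phi(s, a)`.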

The linear parameterization is the most useful for structural interpretation. It constrains Q to the same feature space as the utility, so the recovered theta is directly comparable to the data-generating parameters. Through those features, Q also extends to state-action pairs that never appear in the expert panel.
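The following toy sketch shows the parameter sharing behind that statement. Under Q(s, a) = phi(s, a)^T theta, a single short theta defines Q for every cell at once. A free table instead carries one parameter per cell. The feature tensor and coefficients are made up for illustration and do not come from econirl.

```python
import numpy as np

n_states, n_actions, n_features = 100, 3, 4
rng = np.random.default_rng(0)

# Hypothetical features phi(s, a), shape (n_states, n_actions, n_features);
# in practice these would come from the utility feature specification.
phi = rng.normal(size=(n_states, n_actions, n_features))
theta = np.array([1.0, -0.5, 0.25, 2.0])  # made-up coefficients

# Linear Q: one shared theta gives Q(s, a) = phi(s, a)^T theta for every cell,
# including cells never observed in the expert panel.
Q_linear = phi @ theta
print(Q_linear.shape)        # (100, 3)

# Number of free parameters in each parameterization
print(theta.size)            # 4 for linear
print(n_states * n_actions)  # 300 for tabular: one per cell
```

Because theta is shared across all cells, what is learned about it from visited pairs carries over to unvisited ones. A tabular Q has no shared parameter of this kind.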

Common Pitfall

Do not pair `q_type="tabular"` with `divergence="simple"`. The simple objective has no upper bound on a free Q table, and the optimizer drives Q to numerical overflow (value RMSE can exceed 1e21). The chi-squared objective (`divergence="chi2"`) with `alpha >= 1` keeps the problem bounded. See the internal notes for a detailed account of this failure mode.
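The following one-dimensional sketch contrasts the two objectives. It assumes the per-sample term follows the convention of the original IQ-Learn work. Under that convention, the simple objective is linear in the implied reward r = Q(s, a) - gamma * V(s'), and the chi-squared objective subtracts a quadratic penalty r^2 / (4 * alpha). Whether econirl uses exactly this form is an assumption. The function names are hypothetical.

```python
import numpy as np


def simple_term(r: np.ndarray) -> np.ndarray:
    """Assumed 'simple' per-sample term: linear in the implied reward r."""
    return r


def chi2_term(r: np.ndarray, alpha: float) -> np.ndarray:
    """Assumed chi-squared per-sample term: r - r**2 / (4 * alpha).
    Concave in r, maximized at r = 2 * alpha, where it equals alpha."""
    return r - r**2 / (4.0 * alpha)


# Implied rewards the optimizer could push a free Q cell toward
r = np.array([0.0, 2.0, 10.0, 1e3, 1e6])
alpha = 1.0
print(simple_term(r))       # keeps increasing with r
print(chi2_term(r, alpha))  # peaks at r = 2 * alpha, then decreases
```

Under these assumptions, a free Q table lets the optimizer raise r cell by cell. With the linear term, each increase still improves the objective, which matches the overflow described above. The quadratic penalty caps what any single cell can contribute.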