Quick Start
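```python
from econirl.estimation import IQLearnConfig, IQLearnEstimator

# `panel`, `utility`, `problem`, and `transitions` are assumed to have been
# built beforehand (expert panel, utility specification, problem definition,
# and transition model).
config = IQLearnConfig(
    q_type="tabular",
    divergence="chi2",
    alpha=1.0,
)
estimator = IQLearnEstimator(config=config)
summary = estimator.estimate(
    panel=panel,
    utility=utility,
    problem=problem,
    transitions=transitions,
)

print(summary.parameters)  # fitted Q parameters
print(summary.converged)   # whether the optimizer reported convergence
print(summary.metadata["expert_state_coverage"])
print(summary.metadata["expert_state_action_coverage"])
```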
Fitted attributes follow the same convention as other estimators:
| Attribute | Meaning |
|---|---|
| | Q parameters: feature coefficients for … |
| | Whether L-BFGS-B reported convergence. |
| | Log-likelihood of expert actions under the Q-induced policy. |
| | Choice probabilities pi(a \| s), shape (n_states, n_actions). |
| | V(s) = sigma * logsumexp(Q(s, :) / sigma), shape (n_states,). |
| | Fraction of MDP states visited in the expert panel. |
| | Fraction of MDP state-action pairs visited in the expert panel. |
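The quantities in the table can be reproduced by hand from a Q table. The sketch below is a minimal NumPy illustration, not econirl's implementation: the helper names `soft_value_and_policy`, `expert_log_likelihood`, and `coverage` are hypothetical, and it assumes integer-coded states and actions and that coverage means the number of distinct visited states (or state-action pairs) divided by the total number.

```python
import numpy as np

def soft_value_and_policy(q, sigma=1.0):
    """Return V(s) = sigma * logsumexp(Q(s, :) / sigma) and pi(a | s).

    Hypothetical helper; q has shape (n_states, n_actions).
    """
    z = q / sigma
    z_max = z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    v = sigma * (z_max[:, 0] + np.log(np.exp(z - z_max).sum(axis=1)))
    policy = np.exp((q - v[:, None]) / sigma)  # softmax of Q / sigma; rows sum to 1
    return v, policy

def expert_log_likelihood(policy, states, actions):
    """Sum of log pi(a | s) over the observed expert (state, action) pairs."""
    return float(np.log(policy[states, actions]).sum())

def coverage(states, actions, n_states, n_actions):
    """Fractions of distinct states and state-action pairs seen in the data."""
    state_cov = np.unique(states).size / n_states
    pair_ids = states * n_actions + actions  # one integer id per (s, a) pair
    pair_cov = np.unique(pair_ids).size / (n_states * n_actions)
    return state_cov, pair_cov

# Toy example: 2 states, 2 actions, 3 expert observations (made-up data).
q = np.array([[1.0, 1.0], [0.0, np.log(3.0)]])
states = np.array([0, 1, 1])
actions = np.array([0, 1, 1])

v, pi = soft_value_and_policy(q)
print(v)   # values 1 + log(2) and log(4)
print(pi)  # rows [0.5, 0.5] and [0.25, 0.75]
print(expert_log_likelihood(pi, states, actions))
print(coverage(states, actions, n_states=2, n_actions=2))  # (1.0, 0.5)
```

Because pi(a | s) = exp((Q(s, a) - V(s)) / sigma), the policy and log-likelihood rows of the table follow directly from the V(s) formula.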
Q Parameterizations
```python
from econirl.estimation import IQLearnConfig

# Tabular: a free Q(s, a) for each cell; no feature structure
config = IQLearnConfig(q_type="tabular", divergence="chi2", alpha=1.0)

# Linear: Q(s, a) = phi(s, a)^T theta, matching the utility feature spec
config = IQLearnConfig(q_type="linear", divergence="chi2", alpha=1.0)

# Neural: a small feedforward net mapping state features to Q(s, :)
config = IQLearnConfig(q_type="neural", divergence="chi2", alpha=1.0,
                       hidden_dim=64, num_layers=2)
```
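For intuition about the neural option, here is a minimal NumPy sketch of a feedforward network that maps per-state features to a row of Q values. It is illustrative only: the ReLU activations, the initialization, and reading num_layers as the number of hidden layers of width hidden_dim are assumptions rather than a description of econirl's network, and `init_q_net` and `q_net_forward` are hypothetical names.

```python
import numpy as np

def init_q_net(n_features, n_actions, hidden_dim=64, num_layers=2, seed=0):
    """Random (weight, bias) pairs for a feedforward Q network (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Assumption: num_layers hidden layers, each of width hidden_dim.
    sizes = [n_features] + [hidden_dim] * num_layers + [n_actions]
    return [(rng.normal(scale=1 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def q_net_forward(params, state_features):
    """Map state features (n_states, n_features) to Q values (n_states, n_actions)."""
    h = state_features
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layers
    w, b = params[-1]
    return h @ w + b  # linear output layer: one Q value per action

rng = np.random.default_rng(1)
state_features = rng.normal(size=(10, 3))  # 10 states, 3 made-up features each
params = init_q_net(n_features=3, n_actions=4, hidden_dim=64, num_layers=2)
q = q_net_forward(params, state_features)
print(q.shape)  # (10, 4)
```

Unlike the tabular case, the weights are shared across states, so the network also produces Q values for states that never appear in the expert panel.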
The linear parameterization is the most useful for structural interpretation: it constrains Q to live in the same feature space as the utility, so the recovered theta is directly comparable to the data-generating parameters and propagates to unvisited state-action pairs.
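A small NumPy sketch makes the propagation point concrete. The feature tensor phi, its two made-up cost features, and the value of theta are hypothetical; only the form Q(s, a) = phi(s, a)^T theta comes from the text. Even if the expert panel only visited a few states, the same theta assigns Q values to every state.

```python
import numpy as np

n_states, n_actions, n_features = 5, 2, 2
s = np.arange(n_states, dtype=float)

# Hypothetical features phi(s, a) with shape (n_states, n_actions, n_features).
phi = np.zeros((n_states, n_actions, n_features))
phi[:, 0, 0] = -s    # action 0: cost that grows with the state index
phi[:, 1, 1] = -1.0  # action 1: constant cost

theta = np.array([0.5, 2.0])  # illustrative coefficients
q = phi @ theta               # Q(s, a) = phi(s, a)^T theta, shape (n_states, n_actions)

print(q[:, 0])  # equals -0.5 * s
print(q[:, 1])  # constant -2.0 in every state
# Only n_features numbers describe the whole table, versus
# n_states * n_actions free cells in the tabular case.
print(theta.size, n_states * n_actions)  # 2 10
```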
Common Pitfall
Do not pair q_type="tabular" with divergence="simple". On a free Q table the simple objective has no upper bound, so the optimizer drives Q toward numerical overflow (value RMSE can exceed 1e21). The chi-squared objective with alpha >= 1 keeps the problem bounded. See the internal notes for a detailed account of this failure mode.
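One way to see the role of the divergence is through the transform it applies to the implicit reward r(s, a) = Q(s, a) - gamma * E[V(s')]. The sketch assumes the common IQ-Learn form, in which the simple objective credits r linearly while the chi-squared objective uses the concave transform r - r^2 / (4 * alpha); how econirl scales alpha internally is not confirmed here, and `phi_simple` and `phi_chi2` are hypothetical names. Under that assumption the chi-squared term peaks at r = 2 * alpha with value alpha, whereas the linear term keeps growing as Q is pushed up.

```python
import numpy as np

def phi_simple(r):
    """'Simple' objective (assumed form): implicit reward enters linearly, unbounded above."""
    return r

def phi_chi2(r, alpha=1.0):
    """Chi-squared form (assumed): r - r**2 / (4 * alpha), maximized at r = 2 * alpha."""
    return r - r ** 2 / (4.0 * alpha)

r = np.array([0.0, 1.0, 2.0, 10.0, 1e6])
print(phi_simple(r))           # keeps growing with r
print(phi_chi2(r, alpha=1.0))  # peaks at 1.0 when r = 2.0, then falls

# The peak value equals alpha for any alpha > 0.
for alpha in (0.5, 1.0, 2.0):
    print(alpha, phi_chi2(2.0 * alpha, alpha=alpha))
```

This isolates the per-transition reward term only; the full objective also involves V, which the sketch leaves out.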