MCE-IRL
Reference PDF: papers/econirl_package/primers/mce_irl/mce_irl.pdf.
MCE-IRL learns reward parameters by matching expert feature expectations under the maximum causal entropy policy. It is part of the 12-estimator known-truth validation suite.
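The feature-matching idea can be illustrated with a small NumPy sketch. This is not the package's implementation: the function names, the discounted infinite-horizon setting, the `(actions, states, states)` transition layout, and the toy sizes are all assumptions made for illustration. It computes the soft (maximum causal entropy) policy for a linear reward, then the discounted feature expectations that policy induces. The difference between expert and model feature expectations is the quantity MCE-IRL drives toward zero.

```python
import numpy as np


def soft_policy(reward, P, beta=0.95, tol=1e-10, max_iter=10_000):
    """Soft value iteration. reward: (S, A); P: (A, S, S) with P[a, s, s']."""
    n_states, _ = reward.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = r[s, a] + beta * sum_{s'} P[a, s, s'] * V[s']
        Q = reward + beta * np.einsum("ast,t->sa", P, V)
        q_max = Q.max(axis=1)
        # Numerically stable log-sum-exp over actions
        V_new = q_max + np.log(np.exp(Q - q_max[:, None]).sum(axis=1))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = reward + beta * np.einsum("ast,t->sa", P, V)
    Q -= Q.max(axis=1, keepdims=True)
    policy = np.exp(Q)
    return policy / policy.sum(axis=1, keepdims=True)


def feature_expectations(policy, P, phi, d0, beta=0.95):
    """Discounted, normalized feature expectations. phi: (S, A, K)."""
    n_states = policy.shape[0]
    # State-to-state transitions under the policy
    P_pi = np.einsum("sa,ast->st", policy, P)
    # Normalized discounted occupancy: d = (1 - beta) (I - beta P_pi^T)^{-1} d0
    d = (1 - beta) * np.linalg.solve(np.eye(n_states) - beta * P_pi.T, d0)
    return np.einsum("s,sa,sak->k", d, policy, phi)


# Toy problem (hypothetical sizes, not the validation cell)
rng = np.random.default_rng(0)
n_states, n_actions, n_features = 5, 3, 4
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
phi = rng.normal(size=(n_states, n_actions, n_features))
d0 = np.full(n_states, 1.0 / n_states)
theta_true = rng.normal(size=n_features)

expert_policy = soft_policy(phi @ theta_true, P)
expert_fe = feature_expectations(expert_policy, P, phi, d0)


def residual(theta):
    """Expert minus model feature expectations at a candidate theta."""
    model_policy = soft_policy(phi @ theta, P)
    return expert_fe - feature_expectations(model_policy, P, phi, d0)


residual_true = residual(theta_true)
residual_zero = residual(np.zeros(n_features))
print(np.linalg.norm(residual_true), np.linalg.norm(residual_zero))
```

In this sketch the residual vanishes at the true parameters and is nonzero at a wrong guess, which is the property a gradient-based fit exploits.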
Validation Status
Pass. The gated artifact is generated by
papers/econirl_package/primers/mce_irl/mce_irl_run.py from the shared
known-truth harness. It validates the low-level MCEIRLEstimator directly with
known transitions and known action-dependent reward features.
The primary cell is mce_low_high_reward: 25 states, 3 actions, 8
action-dependent reward features, 3,000 individuals, and 100 periods. It passes
10/10 gates: feature residual, occupancy moment residual, normalized reward
RMSE, policy TV, normalized value RMSE, normalized Q RMSE, and Type A/B/C
counterfactual regret.
Reward, value, and Q metrics use the standard IRL location-and-scale normalization before RMSE is computed. Policy and counterfactual gates are not normalized. Raw parameter cosine is not used as an MCE-IRL validation gate.
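The normalization step can be sketched as follows. The exact form used by the harness is not given here, so this example assumes the common choice of centering each array and scaling it to unit standard deviation; the helper names `normalize`, `normalized_rmse`, and `policy_tv` are hypothetical, and averaging total variation over states is likewise an assumption.

```python
import numpy as np


def normalize(x):
    """Center to zero mean and scale to unit standard deviation (assumed form)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    scale = centered.std()
    if scale == 0:
        raise ValueError("cannot normalize a constant array")
    return centered / scale


def normalized_rmse(estimate, truth):
    """RMSE after location-and-scale normalization of both arrays."""
    diff = normalize(estimate) - normalize(truth)
    return float(np.sqrt(np.mean(diff ** 2)))


def policy_tv(pi_est, pi_true):
    """Total variation between action distributions, averaged over states.

    No normalization is applied: policies are already comparable.
    """
    return float(np.mean(0.5 * np.abs(pi_est - pi_true).sum(axis=1)))


rng = np.random.default_rng(0)
true_reward = rng.normal(size=(25, 3))
# An estimate that differs from the truth only by a shift and a positive scale
shifted_scaled = 2.5 * true_reward + 7.0

raw_rmse = float(np.sqrt(np.mean((shifted_scaled - true_reward) ** 2)))
norm_rmse = normalized_rmse(shifted_scaled, true_reward)

pi_true = rng.dirichlet(np.ones(3), size=25)
tv_same = policy_tv(pi_true, pi_true)
print(raw_rmse, norm_rmse, tv_same)
```

The shifted and rescaled reward has a large raw RMSE but a normalized RMSE of essentially zero, which is why reward, value, and Q gates compare normalized quantities while policy gates do not need to.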
Usage Scope
Use MCE-IRL when transitions are known or supplied and the reward features are
explicit. For multi-action structural recovery, pass a RewardSpec to fit()
or pass a feature_matrix at construction time. The wrapper no longer silently
treats feature_matrix=None as a validated structural default for multi-action
models.
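As a rough illustration of what an explicit, action-dependent feature array looks like, the sketch below builds one with NumPy and checks that it actually varies across actions. The `(n_states, n_actions, n_features)` layout and both helper names are assumptions for illustration, not the package's documented interface, and the call into the estimator itself is omitted because its signature is not given here.

```python
import numpy as np

# Sizes of the primary validation cell described above
n_states, n_actions, n_features = 25, 3, 8


def build_feature_tensor(n_states, n_actions, n_features, seed=0):
    """Hypothetical helper: random features indexed as [state, action, feature]."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_states, n_actions, n_features))


def is_action_dependent(features, atol=1e-12):
    """True if at least one feature differs across actions in some state."""
    spread = features.max(axis=1) - features.min(axis=1)  # shape (S, K)
    return bool((spread > atol).any())


features = build_feature_tensor(n_states, n_actions, n_features)

# A state-only basis copied across actions carries no information
# about which action is preferred in a given state.
state_only = np.repeat(features[:, :1, :], n_actions, axis=1)

print(features.shape, is_action_dependent(features), is_action_dependent(state_only))
```

A check of this kind is one way to catch a feature basis that cannot distinguish actions before it is handed to a multi-action model.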
For the neural reward variant, see docs/estimators/deep_mce_irl.md. Its
primary validation artifact is the anchored recovered reward matrix; projected
finite parameters are diagnostic unless the supplied feature basis is
well-conditioned.
Artifacts
PDF source: papers/econirl_package/primers/mce_irl/mce_irl.tex
Result generator: papers/econirl_package/primers/mce_irl/mce_irl_run.py
Shared DGP harness: experiments/known_truth.py
Results: papers/econirl_package/primers/mce_irl/mce_irl_results.json