MCE-IRL

Reference PDF: papers/econirl_package/primers/mce_irl/mce_irl.pdf.

MCE-IRL learns reward parameters by matching expert feature expectations under the maximum causal entropy policy. It is part of the 12-estimator known-truth validation suite.
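The matching condition can be illustrated with a minimal NumPy sketch. It is not the MCEIRLEstimator implementation: the function names, the discounted infinite-horizon setup, the `(A, S, S)` transition layout, and the `(S, A, K)` feature layout are all assumptions made for illustration. The sketch computes the maximum causal entropy policy by soft value iteration, then the gap between expert and model feature expectations that estimation drives toward zero.

```python
import numpy as np

def soft_value_iteration(reward, transitions, beta=0.95, tol=1e-10, max_iter=10_000):
    """Return the soft-optimal (maximum causal entropy) policy, shape (S, A).

    reward: (S, A) array.
    transitions: (A, S, S) array with transitions[a, s, t] = P(t | s, a).
    Both layouts are assumptions of this sketch.
    """
    v = np.zeros(reward.shape[0])
    for _ in range(max_iter):
        # Q(s, a) = r(s, a) + beta * E[V(s') | s, a]
        q = reward + beta * np.einsum("ast,t->sa", transitions, v)
        # V(s) = log sum_a exp Q(s, a), computed stably
        q_max = q.max(axis=1)
        v_new = q_max + np.log(np.exp(q - q_max[:, None]).sum(axis=1))
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    q = reward + beta * np.einsum("ast,t->sa", transitions, v)
    policy = np.exp(q - v[:, None])
    return policy / policy.sum(axis=1, keepdims=True)

def feature_expectations(policy, transitions, features, initial_dist, beta=0.95):
    """Discounted feature expectations under a policy, shape (K,)."""
    n_states = policy.shape[0]
    # State-to-state transition matrix induced by the policy
    p_pi = np.einsum("sa,ast->st", policy, transitions)
    # Normalized discounted state occupancy: d = (1 - beta) (I - beta P_pi^T)^{-1} mu0
    occupancy = (1 - beta) * np.linalg.solve(
        np.eye(n_states) - beta * p_pi.T, initial_dist
    )
    return np.einsum("s,sa,sak->k", occupancy, policy, features)

# Toy problem (hypothetical sizes, much smaller than the validation cell)
rng = np.random.default_rng(0)
S, A, K = 5, 3, 2
features = rng.normal(size=(S, A, K))
transitions = rng.dirichlet(np.ones(S), size=(A, S))  # shape (A, S, S)
mu0 = np.full(S, 1.0 / S)
theta_true = np.array([1.0, -0.5])

expert_policy = soft_value_iteration(features @ theta_true, transitions)
expert_fe = feature_expectations(expert_policy, transitions, features, mu0)

def feature_residual(theta):
    """Expert minus model feature expectations at parameter theta."""
    policy = soft_value_iteration(features @ theta, transitions)
    return expert_fe - feature_expectations(policy, transitions, features, mu0)

print("residual at theta_true:", feature_residual(theta_true))
print("residual at zero:      ", feature_residual(np.zeros(K)))
```

In this sketch the expert expectations are computed at the population level from the true policy, so the residual vanishes at `theta_true`. With sampled panels, as in the validation harness, the expert side would be an empirical average instead.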

Validation Status

Pass. The gated artifact is generated by papers/econirl_package/primers/mce_irl/mce_irl_run.py from the shared known-truth harness. It validates the low-level MCEIRLEstimator directly with known transitions and known action-dependent reward features.

The primary cell is mce_low_high_reward: 25 states, 3 actions, 8 action-dependent reward features, 3,000 individuals, and 100 periods. It passes 10/10 gates, covering feature residual, occupancy moment residual, normalized reward RMSE, policy total variation (TV), normalized value RMSE, normalized Q RMSE, and Type A/B/C counterfactual regret.

Reward, value, and Q metrics use the standard IRL location-and-scale normalization before RMSE is computed. Policy and counterfactual gates are not normalized. Raw parameter cosine is not used as an MCE-IRL validation gate.
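As a rough sketch of why normalization matters here, the snippet below assumes that location-and-scale normalization means subtracting the mean and dividing by the standard deviation; the harness in experiments/known_truth.py may define it differently. The helper names, and the choice to average the policy TV distance over states, are likewise assumptions of this example.

```python
import numpy as np

def normalize(x):
    """Center to zero mean and rescale to unit standard deviation (assumed convention)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    scale = centered.std()
    return centered / scale if scale > 0 else centered

def normalized_rmse(estimate, truth):
    """RMSE after location-and-scale normalization of both arrays."""
    diff = normalize(estimate) - normalize(truth)
    return float(np.sqrt(np.mean(diff ** 2)))

def policy_tv(policy_a, policy_b):
    """Mean over states of the total variation distance between action distributions.

    No normalization is applied: policies are already probabilities.
    """
    return float(0.5 * np.abs(policy_a - policy_b).sum(axis=1).mean())

rng = np.random.default_rng(1)
true_reward = rng.normal(size=(25, 3))
# A recovered reward that differs only by a positive scale and a shift
recovered = 2.5 * true_reward + 7.0

raw_rmse = float(np.sqrt(np.mean((recovered - true_reward) ** 2)))
print("raw RMSE:       ", raw_rmse)
print("normalized RMSE:", normalized_rmse(recovered, true_reward))
```

A reward that matches the truth up to a shift and a positive rescaling scores a large raw RMSE but a normalized RMSE of essentially zero, which is the invariance the normalized gates are meant to respect.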

Usage Scope

Use MCE-IRL when transitions are known or supplied and the reward features are explicit. For multi-action structural recovery, pass a RewardSpec to fit() or pass a feature_matrix at construction time. The wrapper no longer silently treats feature_matrix=None as a validated structural default for multi-action models.
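To make "explicit, action-dependent reward features" concrete, here is a small NumPy sketch that builds a feature array with the dimensions of the primary validation cell and checks that it actually varies across actions. It deliberately makes no econirl calls: the `(states, actions, features)` layout and both helper names are hypothetical, so check the estimator's documentation for the layout it expects.

```python
import numpy as np

# Dimensions of the primary validation cell
n_states, n_actions, n_features = 25, 3, 8

def build_feature_matrix(n_states, n_actions, n_features, seed=0):
    """Random action-dependent features, shape (S, A, K). Layout is an assumption."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_states, n_actions, n_features))

def is_action_dependent(feature_matrix, atol=1e-12):
    """True if some feature takes different values across actions in some state."""
    spread = feature_matrix.max(axis=1) - feature_matrix.min(axis=1)
    return bool((spread > atol).any())

feature_matrix = build_feature_matrix(n_states, n_actions, n_features)

# A state-only feature set: every action sees the same features in each state
state_only = np.repeat(feature_matrix[:, :1, :], n_actions, axis=1)

print(feature_matrix.shape)
print(is_action_dependent(feature_matrix), is_action_dependent(state_only))
```

A check like this is a cheap guard to run before fitting a multi-action model, since the wrapper no longer supplies a validated default when no features are given.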

For the neural reward variant, see docs/estimators/deep_mce_irl.md. Its primary validation artifact is the anchored recovered reward matrix; projected finite parameters are diagnostic unless the supplied feature basis is well-conditioned.

Artifacts

  • PDF source: papers/econirl_package/primers/mce_irl/mce_irl.tex

  • Result generator: papers/econirl_package/primers/mce_irl/mce_irl_run.py

  • Shared DGP harness: experiments/known_truth.py

  • Results: papers/econirl_package/primers/mce_irl/mce_irl_results.json