Simulation Study

MPEC runs on the canonical_low_action synthetic cell, the same low-dimensional, action-dependent structural benchmark used for NFXP and CCP. The cell has 21 states, 3 actions, a known linear reward, known transitions, and exact oracle objects for the policy, the value function, the Q function, and the Type A, Type B, and Type C counterfactuals, so every recovery claim is checked against a fully specified truth.

The full result generator is mpec_run.py. It writes the machine-readable results file mpec_results.json.

cd /path/to/econirl
PYTHONPATH=src:. python validation/estimators/mpec/run.py

Design

Quantity	Value
Regular states	20
Absorbing states	1
Total states	21
Actions	3
Exit action	2
Discount factor	0.95
Shock scale	1.0
Simulated individuals	2,000
Periods per individual	80
Observations	160,000

The simulation DGP has action-dependent reward features and an exit action that anchors the reward level.

Fit Summary

Quantity	Value
Converged	true
SQP iterations	19
Log likelihood	-174875.7719
Estimation time	1.92 seconds
Solver	slsqp
SciPy success	true
Final Bellman constraint violation	7.72e-12
Standard errors finite	true

The final Bellman constraint violation of 7.72e-12 is about five orders of magnitude below the 0.000001 threshold used in the numerical checks. Passing this diagnostic alone does not establish recovery: the simulation result rests on the recovery checks, not on the constraint diagnostic.
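
As a rough illustration of what a Bellman constraint violation measures, the sketch below computes the max-norm gap between a value vector and its Bellman update. It assumes a logit shock with scale sigma and omits the Euler constant; the source does not state the exact operator, and the two-state problem, `bellman_update`, and `bellman_violation` are hypothetical names for this example, not part of econirl. The fixed-point loop is only a convenient way to produce a value vector that satisfies the constraint; MPEC itself would let the optimizer choose the values subject to the equality.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def bellman_update(V, reward, trans, beta, sigma):
    """Apply the assumed logit Bellman operator once to the value vector V."""
    new_V = []
    for s in range(len(V)):
        q = [
            reward[s][a] + beta * sum(p * v for p, v in zip(trans[s][a], V))
            for a in range(len(reward[s]))
        ]
        new_V.append(sigma * logsumexp([x / sigma for x in q]))
    return new_V


def bellman_violation(V, reward, trans, beta, sigma):
    """Max-norm gap between V and its Bellman update."""
    updated = bellman_update(V, reward, trans, beta, sigma)
    return max(abs(v - u) for v, u in zip(V, updated))


# Hypothetical 2-state, 2-action toy problem (not the canonical_low_action cell).
reward = [[0.1, 0.0], [0.5, -0.2]]
trans = [
    [[0.9, 0.1], [0.2, 0.8]],  # trans[s][a] is a distribution over next states
    [[0.5, 0.5], [0.0, 1.0]],
]
beta, sigma = 0.95, 1.0  # discount factor and shock scale from the Design table

V = [0.0, 0.0]
for _ in range(2000):  # contraction with modulus beta, so this converges
    V = bellman_update(V, reward, trans, beta, sigma)

violation = bellman_violation(V, reward, trans, beta, sigma)
print(f"violation = {violation:.2e}, within 1e-6: {violation <= 1e-6}")
```

A value vector that has not converged, such as all zeros, fails the same check by a wide margin, which is why the diagnostic is useful as a necessary condition even though it says nothing about parameter recovery.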

Parameter Recovery

Parameter	Truth	Estimate	SE	Error
`action_0_intercept`	0.100000	0.083894	0.029336	-0.016106
`action_0_progress`	0.500000	0.528522	0.035890	0.028522
`action_1_intercept`	0.000000	-0.014461	0.036733	-0.014461
`action_1_progress`	-0.200000	-0.200511	0.052502	-0.000511

Recovery Metrics

Metric	Value
Parameter RMSE	0.017905
Parameter relative RMSE	0.065378
Parameter cosine similarity	0.998867
Reward RMSE	0.009694
Value RMSE	0.019445
Q RMSE	0.022437
Policy KL	9.21e-5
Policy total variation	0.005697
Policy max state L1	0.018905
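
The three parameter-level metrics can be approximately reproduced from the Parameter Recovery table. The sketch below assumes that parameter RMSE is the root mean square of the four errors, that relative RMSE is the ratio of the Euclidean norm of the error vector to that of the truth vector, and that cosine similarity is taken between the estimate and truth vectors. These definitions are not stated in the source; they are assumptions that happen to match the reported values up to rounding.

```python
import math

# Truth and estimate columns copied from the Parameter Recovery table.
truth = [0.100000, 0.500000, 0.000000, -0.200000]
estimate = [0.083894, 0.528522, -0.014461, -0.200511]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


errors = [e - t for e, t in zip(estimate, truth)]

# Assumed definitions (not confirmed by the source).
rmse = norm(errors) / math.sqrt(len(errors))
relative_rmse = norm(errors) / norm(truth)
cosine = sum(e * t for e, t in zip(estimate, truth)) / (norm(estimate) * norm(truth))

print(f"Parameter RMSE              {rmse:.6f}")
print(f"Parameter relative RMSE     {relative_rmse:.6f}")
print(f"Parameter cosine similarity {cosine:.6f}")
```

The reward, value, Q, and policy metrics cannot be recomputed this way because they depend on the oracle objects, which are stored in the results file rather than shown here.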

Numerical Checks

Check	Threshold	Value	Status
converged	true	true	pass
Bellman constraint violation	at most 0.000001	7.72e-12	pass
Standard errors finite	true	true	pass
Parameter cosine	at least 0.98	0.998867	pass
Parameter relative RMSE	at most 0.15	0.065378	pass
Policy total variation	at most 0.03	0.005697	pass
Value RMSE	at most 0.10	0.019445	pass
Q RMSE	at most 0.10	0.022437	pass
Type A regret	at most 0.05	0.000213	pass
Type B regret	at most 0.05	0.000362	pass
Type C regret	at most 0.05	0.000086	pass

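The estimates differ from the truth because the panel is finite; each parameter error in the Parameter Recovery table is smaller in magnitude than its reported standard error. The study reports recovery within the listed tolerances on this synthetic cell.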
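
The pass or fail logic of the table can be sketched as a small threshold comparison. The values and thresholds below are copied from the table and the Fit Summary; the `checks` list and the `passes` helper are hypothetical names for this illustration and are not taken from the validation code.

```python
# (name, kind, threshold, value) with values copied from the tables above.
checks = [
    ("Bellman constraint violation", "at_most", 1e-6, 7.72e-12),
    ("Parameter cosine", "at_least", 0.98, 0.998867),
    ("Parameter relative RMSE", "at_most", 0.15, 0.065378),
    ("Policy total variation", "at_most", 0.03, 0.005697),
    ("Value RMSE", "at_most", 0.10, 0.019445),
    ("Q RMSE", "at_most", 0.10, 0.022437),
    ("Type A regret", "at_most", 0.05, 0.000213),
    ("Type B regret", "at_most", 0.05, 0.000362),
    ("Type C regret", "at_most", 0.05, 0.000086),
]


def passes(kind, threshold, value):
    """Return True when the value satisfies an upper or lower bound."""
    if kind == "at_most":
        return value <= threshold
    if kind == "at_least":
        return value >= threshold
    raise ValueError(f"unknown check kind: {kind}")


status = {
    name: "pass" if passes(kind, threshold, value) else "fail"
    for name, kind, threshold, value in checks
}

for name, result in status.items():
    print(f"{name}: {result}")
```

The boolean checks in the table (converged, standard errors finite) are omitted here because they compare flags rather than numeric bounds.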

Counterfactual Recovery

Counterfactual	Policy TV	Policy KL	Value RMSE	Regret
Type A	0.005109	7.56e-5	0.000238	0.000213
Type B	0.005457	8.20e-5	0.000363	0.000362
Type C	0.003549	3.56e-5	0.000114	0.000086
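
For readers unfamiliar with the policy columns, the sketch below shows one common way to compute total variation and KL divergence between an oracle policy and an estimated policy. It assumes per-state distances averaged with equal weight across states; the source does not say how states are weighted, and the two-state policies are made up for the example rather than drawn from the study.

```python
import math


def policy_tv(p, q):
    """Mean over states of 0.5 * sum_a |p(a|s) - q(a|s)|."""
    per_state = [0.5 * sum(abs(x - y) for x, y in zip(ps, qs)) for ps, qs in zip(p, q)]
    return sum(per_state) / len(per_state)


def policy_kl(p, q):
    """Mean over states of KL(p(.|s) || q(.|s)), skipping zero-probability actions in p."""
    per_state = [
        sum(x * math.log(x / y) for x, y in zip(ps, qs) if x > 0.0)
        for ps, qs in zip(p, q)
    ]
    return sum(per_state) / len(per_state)


# Hypothetical policies: rows are states, columns are the 3 actions.
oracle = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]
estimated = [[0.4, 0.4, 0.2], [0.6, 0.3, 0.1]]

tv = policy_tv(oracle, estimated)
kl = policy_kl(oracle, estimated)
print(f"policy TV = {tv:.6f}, policy KL = {kl:.6f}")
```

Both distances are zero when the two policies coincide, so small values in the table indicate that the estimated counterfactual policy is close to the oracle policy under the corresponding intervention.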

MPEC also appears on the bus engine, taxi gridworld, and abstract MDP simulation study pages, where it is compared against the full structural and IRL rosters on shared synthetic panels.