Simulation Study

MPEC runs on the canonical_low_action synthetic cell, the same low-dimensional action-dependent structural benchmark used for NFXP and CCP. The cell has 21 states, 3 actions, a known linear reward, known transitions, and exact oracle objects for the policy, the value function, the Q function, and the Type A, Type B, and Type C counterfactuals, so every recovery claim is checked against fully specified truth.

The full result generator is mpec_run.py. It writes the machine-readable results file mpec_results.json.

```shell
cd /path/to/econirl
PYTHONPATH=src:. python validation/estimators/mpec/run.py
```

Design

| Quantity | Value |
| --- | --- |
| Regular states | 20 |
| Absorbing states | 1 |
| Total states | 21 |
| Actions | 3 |
| Exit action | 2 |
| Discount factor | 0.95 |
| Shock scale | 1.0 |
| Simulated individuals | 2,000 |
| Periods per individual | 80 |
| Observations | 160,000 |

The simulation DGP has action-dependent reward features and an exit action that anchors the reward level.

Fit Summary

| Quantity | Value |
| --- | --- |
| Converged | true |
| SQP iterations | 19 |
| Log likelihood | -174875.7719 |
| Estimation time | 1.92 seconds |
| Solver | slsqp |
| SciPy success | true |
| Final Bellman constraint violation | 7.72e-12 |
| Standard errors finite | true |

The final Bellman constraint violation of 7.72e-12 sits roughly five orders of magnitude below the 0.000001 equality threshold. The simulation result still depends on the recovery checks, not on the constraint diagnostic alone.
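To make the constraint diagnostic concrete, here is a minimal sketch of how a maximum Bellman violation can be computed. It assumes the standard logit (soft) Bellman operator with the Euler constant dropped, which this page does not confirm as the estimator's exact form. The reward and transition arrays are random placeholders, not the canonical_low_action cell, and `soft_bellman` and `bellman_violation` are hypothetical names. Only the dimensions, discount factor, and shock scale come from the Design table.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
n_states, n_actions = 21, 3   # dimensions from the Design table
beta, sigma = 0.95, 1.0       # discount factor and shock scale from the Design table

# Hypothetical primitives for illustration only (not the benchmark cell).
reward = rng.normal(size=(n_states, n_actions))
# transitions[a, s, t] is the probability of moving from s to t under action a.
transitions = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))


def soft_bellman(value):
    """Apply the assumed logit Bellman operator to a value vector."""
    continuation = np.einsum("ast,t->sa", transitions, value)
    q = reward + beta * continuation
    return sigma * logsumexp(q / sigma, axis=1)


def bellman_violation(value):
    """Largest absolute gap between V and the operator applied to V."""
    return float(np.max(np.abs(value - soft_bellman(value))))


# Value iteration is used here only to obtain a point where the equality holds.
value = np.zeros(n_states)
for _ in range(2000):
    value = soft_bellman(value)

violation = bellman_violation(value)
print(f"max Bellman violation: {violation:.2e}")
```

MPEC itself treats this equality as a constraint inside the optimizer rather than solving the fixed point at every step. The sketch only shows what the reported violation measures.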

Parameter Recovery

| Parameter | Truth | Estimate | SE | Error |
| --- | --- | --- | --- | --- |
| action_0_intercept | 0.100000 | 0.083894 | 0.029336 | -0.016106 |
| action_0_progress | 0.500000 | 0.528522 | 0.035890 | 0.028522 |
| action_1_intercept | 0.000000 | -0.014461 | 0.036733 | -0.014461 |
| action_1_progress | -0.200000 | -0.200511 | 0.052502 | -0.000511 |
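One way to read the table is to scale each error by its standard error. The sketch below does that with the reported numbers and checks whether each true value falls inside a normal-approximation 95% interval. The 1.96 critical value and the interval check are illustrative assumptions, not checks listed on this page.

```python
import numpy as np

names = [
    "action_0_intercept",
    "action_0_progress",
    "action_1_intercept",
    "action_1_progress",
]
# Values copied from the Parameter Recovery table.
truth = np.array([0.100000, 0.500000, 0.000000, -0.200000])
estimate = np.array([0.083894, 0.528522, -0.014461, -0.200511])
se = np.array([0.029336, 0.035890, 0.036733, 0.052502])

# Error in units of standard errors.
z = (estimate - truth) / se
# Assumed criterion: truth inside estimate +/- 1.96 * SE.
covered = np.abs(z) <= 1.96

for name, zi, ok in zip(names, z, covered):
    print(f"{name:20s} z = {zi:+.3f}  truth inside 95% interval: {ok}")
```

With these numbers every error is smaller than one standard error in absolute value, so all four intervals contain the truth.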

Recovery Metrics

| Metric | Value |
| --- | --- |
| Parameter RMSE | 0.017905 |
| Parameter relative RMSE | 0.065378 |
| Parameter cosine similarity | 0.998867 |
| Reward RMSE | 0.009694 |
| Value RMSE | 0.019445 |
| Q RMSE | 0.022437 |
| Policy KL | 9.21e-5 |
| Policy total variation | 0.005697 |
| Policy max state L1 | 0.018905 |
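The three parameter metrics can be recomputed from the Parameter Recovery table. The formulas below are assumptions: RMSE as the root mean squared error, relative RMSE as the error norm divided by the truth norm, and the usual cosine similarity. They agree with the reported values to about five decimal places, but this page does not state the definitions the generator uses.

```python
import numpy as np

# Values copied from the Parameter Recovery table.
truth = np.array([0.100000, 0.500000, 0.000000, -0.200000])
estimate = np.array([0.083894, 0.528522, -0.014461, -0.200511])

error = estimate - truth

# Assumed definitions; see the hedging note above.
rmse = float(np.sqrt(np.mean(error**2)))
relative_rmse = float(np.linalg.norm(error) / np.linalg.norm(truth))
cosine = float(truth @ estimate / (np.linalg.norm(truth) * np.linalg.norm(estimate)))

print(f"Parameter RMSE:              {rmse:.6f}")
print(f"Parameter relative RMSE:     {relative_rmse:.6f}")
print(f"Parameter cosine similarity: {cosine:.6f}")
```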

Numerical Checks

| Check | Threshold | Value | Status |
| --- | --- | --- | --- |
| converged | true | true | pass |
| Bellman constraint violation | at most 0.000001 | 7.72e-12 | pass |
| Standard errors finite | true | true | pass |
| Parameter cosine | at least 0.98 | 0.998867 | pass |
| Parameter relative RMSE | at most 0.15 | 0.065378 | pass |
| Policy total variation | at most 0.03 | 0.005697 | pass |
| Value RMSE | at most 0.10 | 0.019445 | pass |
| Q RMSE | at most 0.10 | 0.022437 | pass |
| Type A regret | at most 0.05 | 0.000213 | pass |
| Type B regret | at most 0.05 | 0.000362 | pass |
| Type C regret | at most 0.05 | 0.000086 | pass |

The estimates are not exactly equal to truth because the panel is finite. The study reports recovery within the listed tolerances in the synthetic cell.
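The numeric rows of the checks table reduce to simple threshold comparisons. The sketch below re-evaluates them from the reported values. The `checks` list and the `passes` helper are hypothetical and do not reflect the layout of mpec_results.json. The Bellman violation uses the 7.72e-12 figure from the Fit Summary.

```python
# (name, kind, threshold, reported value); "max" means "at most", "min" means "at least".
checks = [
    ("Bellman constraint violation", "max", 1e-6, 7.72e-12),
    ("Parameter cosine", "min", 0.98, 0.998867),
    ("Parameter relative RMSE", "max", 0.15, 0.065378),
    ("Policy total variation", "max", 0.03, 0.005697),
    ("Value RMSE", "max", 0.10, 0.019445),
    ("Q RMSE", "max", 0.10, 0.022437),
    ("Type A regret", "max", 0.05, 0.000213),
    ("Type B regret", "max", 0.05, 0.000362),
    ("Type C regret", "max", 0.05, 0.000086),
]


def passes(kind, threshold, value):
    """Return True when the value satisfies an at-most or at-least bound."""
    return value <= threshold if kind == "max" else value >= threshold


results = {name: passes(kind, thr, val) for name, kind, thr, val in checks}
for name, ok in results.items():
    print(f"{name:30s} {'pass' if ok else 'fail'}")
```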

Counterfactual Recovery

| Counterfactual | Policy TV | Policy KL | Value RMSE | Regret |
| --- | --- | --- | --- | --- |
| Type A | 0.005109 | 7.56e-5 | 0.000238 | 0.000213 |
| Type B | 0.005457 | 8.20e-5 | 0.000363 | 0.000362 |
| Type C | 0.003549 | 3.56e-5 | 0.000114 | 0.000086 |
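For readers unfamiliar with the policy columns, the following is a minimal sketch of how total variation, KL divergence, and a maximum per-state L1 gap between two policies might be computed. The unweighted average across states is an assumption, since this page does not say whether states are weighted. The two policies are random placeholders rather than the oracle and estimated policies, and the function names are hypothetical.

```python
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(1)

# Hypothetical policies over 21 states and 3 actions, for illustration only.
q_true = rng.normal(size=(21, 3))
q_est = q_true + 0.01 * rng.normal(size=(21, 3))
pi_true = softmax(q_true, axis=1)
pi_est = softmax(q_est, axis=1)


def policy_tv(p, q):
    """Mean over states of half the L1 distance between action distributions."""
    return float(np.mean(0.5 * np.abs(p - q).sum(axis=1)))


def policy_kl(p, q):
    """Mean over states of KL(p || q), assuming strictly positive probabilities."""
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))


def policy_max_state_l1(p, q):
    """Largest per-state L1 distance between action distributions."""
    return float(np.max(np.abs(p - q).sum(axis=1)))


tv = policy_tv(pi_true, pi_est)
kl = policy_kl(pi_true, pi_est)
max_l1 = policy_max_state_l1(pi_true, pi_est)
print(f"TV = {tv:.6f}, KL = {kl:.2e}, max state L1 = {max_l1:.6f}")
```

Under these definitions the maximum per-state L1 gap is never smaller than twice the average total variation, which is consistent with the ordering of the reported policy metrics.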

MPEC also appears on the bus engine, taxi gridworld, and abstract MDP simulation study pages, where it is compared against the full structural and IRL rosters on shared synthetic panels.