Simulation Studies

Every page below is one experiment. We simulate data from a known model, run the estimators on it, and report what they recover. Because the truth is known, both recovery and failure are measurable.

Page	Environment	Size	Estimators	What it shows
Bus engine replacement	Keep-or-replace mileage model (Rust 1987).	20 states x 2 actions	All.	The canonical benchmark. Who recovers the cost parameters, and at what compute cost.
Gridworld navigation	Walk to a goal on a grid.	64 states x 5 actions	All, IRL focus.	What happens where the data rarely goes.
Abstract MDP 1	Small random MDP, linear reward.	8 states x 2 actions	All.	An easy problem every correct estimator must pass.
Abstract MDP 2	The same generator, hardened three ways.	300 states; 24-state collinear cell	Structural family.	Runtime at scale, inference near discount one, and broken identification.
Abstract MDP 3: High Dimensional Case	The same generator at large scale.	3000 states x 2 actions	Ten estimators across families.	How compute costs separate as the state space grows.
Abstract MDP 4: Interaction effect	A reward that multiplies two features the estimators do not model.	24 states x 3 actions	All.	What an omitted interaction costs: a small behavioral miss, a larger counterfactual one.
Direct optimization	Estimation under correct and misspecified rewards.	varies	MPEC, neural MPEC, GLADIUS.	How this family degrades under reward misspecification.

The findings in brief: almost every estimator matches the choice probabilities. The differences show up in parameter recovery, in counterfactuals, and in compute cost.

Reading the tables

All numbers come from a saved results file written by the run script. Crashes and timeouts stay in the table with their error message.

Policy TV (total variation) measures how far the estimated choice probabilities are from the truth. Lower is better.
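As a rough illustration of the metric, the sketch below computes a total variation distance between two tables of choice probabilities. It assumes that the distance is taken state by state as half the L1 gap and then averaged over states, uniformly unless weights are supplied. The function name `policy_tv`, the `state_weights` argument, and the averaging rule are assumptions made for this example, not the definition used by the run script.

```python
import numpy as np


def policy_tv(p_true, p_est, state_weights=None):
    """Average total variation distance between two policies.

    p_true, p_est: arrays of shape (n_states, n_actions); each row is a
    probability distribution over actions.
    state_weights: optional weights over states (hypothetical argument);
    uniform averaging is assumed when omitted.
    """
    p_true = np.asarray(p_true, dtype=float)
    p_est = np.asarray(p_est, dtype=float)
    # TV distance between two discrete distributions is half the L1 gap.
    per_state = 0.5 * np.abs(p_true - p_est).sum(axis=1)
    if state_weights is None:
        return float(per_state.mean())
    w = np.asarray(state_weights, dtype=float)
    return float(np.dot(w / w.sum(), per_state))


# Two states, two actions: the estimate is off by 0.1 in the first state only.
p_true = np.array([[0.9, 0.1], [0.5, 0.5]])
p_est = np.array([[0.8, 0.2], [0.5, 0.5]])
tv = policy_tv(p_true, p_est)
```

Weighting by how often each state is visited would down-weight errors in states the data rarely reaches; which weighting the tables use is not stated here.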

Regret measures welfare lost when the recovered model is used in a changed environment. Type A shifts a payoff. Type B changes the dynamics. Type C penalizes an action. Structural estimators re-solve the model and adapt. Behavioral estimators keep their old policy, so their Type C regret is large.
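The following sketch shows one way such a regret could be computed. It assumes a plain discounted MDP with a hard maximum (no taste shocks, unlike the logit models the estimators typically work with), and it assumes regret is the gap between the optimal value and the value of the carried-over policy in the changed environment, averaged uniformly over states. All function names and the toy numbers are hypothetical.

```python
import numpy as np


def evaluate_policy(policy, reward, transition, beta):
    """Value of a fixed policy: solve (I - beta * P_pi) v = r_pi.

    policy[s, a], reward[s, a], transition[a, s, s'].
    """
    n_states = reward.shape[0]
    r_pi = (policy * reward).sum(axis=1)
    p_pi = np.einsum("sa,ast->st", policy, transition)
    return np.linalg.solve(np.eye(n_states) - beta * p_pi, r_pi)


def optimal_values(reward, transition, beta, tol=1e-10):
    """Value iteration with a hard max (an assumption of this sketch)."""
    v = np.zeros(reward.shape[0])
    while True:
        q = reward + beta * np.einsum("ast,t->sa", transition, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new


def regret(policy, reward_new, transition_new, beta):
    """Mean welfare lost by using `policy` in the changed environment."""
    v_star = optimal_values(reward_new, transition_new, beta)
    v_pi = evaluate_policy(policy, reward_new, transition_new, beta)
    return float(np.mean(v_star - v_pi))


beta = 0.9
# Both actions leave the state unchanged (identity transitions).
transition_new = np.stack([np.eye(2), np.eye(2)])
# Type C style change: action 1 is penalized, so it now pays 0 instead of 1.
reward_new = np.array([[1.0, 0.0], [1.0, 0.0]])

old_policy = np.array([[0.0, 1.0], [0.0, 1.0]])      # keeps choosing action 1
adapted_policy = np.array([[1.0, 0.0], [1.0, 0.0]])  # re-solved for the new rewards

regret_old = regret(old_policy, reward_new, transition_new, beta)
regret_adapted = regret(adapted_policy, reward_new, transition_new, beta)
```

In this toy case the policy that keeps choosing the penalized action loses the full discounted payoff stream, while the re-solved policy loses essentially nothing, which mirrors the contrast between behavioral and structural estimators described above.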

Parameter recovery is reported only for structural estimators. IRL methods recover a reward that produces the same behavior but in a different parameterization, so comparing their parameters to the truth is not meaningful.
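To see why two different rewards can produce the same behavior, the sketch below uses potential-based reward shaping, a well-known source of this ambiguity. Whether this is the specific ambiguity at play in these experiments is not stated here, so treat it as an illustration only. It assumes a hard-max MDP, and every name and number in it is hypothetical.

```python
import numpy as np


def greedy_policy(reward, transition, beta, tol=1e-10):
    """Optimal action per state via value iteration (hard max assumed).

    reward[s, a], transition[a, s, s'].
    """
    v = np.zeros(reward.shape[0])
    while True:
        q = reward + beta * np.einsum("ast,t->sa", transition, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return q.argmax(axis=1)
        v = v_new


beta = 0.9
# Toy keep-or-replace setup: action 0 drifts toward worse states,
# action 1 resets to state 0.
transition = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
reward = np.array([[0.0, -2.0], [-1.0, -2.0], [-3.0, -2.0]])

# Shaped reward: r'(s, a) = r(s, a) + beta * E[phi(s') | s, a] - phi(s).
phi = np.array([5.0, -1.0, 2.0])  # arbitrary potential over states
expected_phi_next = np.einsum("ast,t->sa", transition, phi)
shaped = reward + beta * expected_phi_next - phi[:, None]

policy_original = greedy_policy(reward, transition, beta)
policy_shaped = greedy_policy(shaped, transition, beta)
same_behavior = np.array_equal(policy_original, policy_shaped)
```

The shaped reward differs from the original in every entry, yet the optimal action in each state is unchanged, because shaping shifts every action value in a state by the same amount. Comparing the two reward tables entry by entry would therefore say nothing about which one generated the data.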

The estimators are documented in the catalog.