# Simulation Studies

Every page below is one experiment. We simulate data from a known model, run the estimators on it, and report what they recover. Because the truth is known, both recovery and failure are measurable.

| Page | Environment | Size | Estimators | What it shows |
| --- | --- | --- | --- | --- |
| [Bus engine replacement](rust_bus.md) | Keep-or-replace mileage model (Rust 1987). | 20 states x 2 actions | All. | The canonical benchmark: who recovers the cost parameters, and at what compute cost. |
| [Gridworld navigation](taxi_gridworld.md) | Walk to a goal on a grid. | 64 states x 5 actions | All, IRL focus. | What happens where the data rarely goes. |
| [Abstract MDP 1](abstract_mdp_1_sanity.md) | Small random MDP, linear reward. | 8 states x 2 actions | All. | An easy problem every correct estimator must pass. |
| [Abstract MDP 2](abstract_mdp_2_harder.md) | The same generator, hardened three ways. | 300 states; 24-state collinear cell | Structural family. | Runtime at scale, inference near a discount factor of one, and broken identification. |
| [Abstract MDP 3: High Dimensional Case](abstract_mdp_3_highdim.md) | The same generator at large scale. | 3000 states x 2 actions | Ten estimators across families. | How compute costs separate as the state space grows. |
| [Abstract MDP 4: Interaction effect](abstract_mdp_4_nonlinear.md) | A reward that multiplies two features the estimators do not model. | 24 states x 3 actions | All. | What an omitted interaction costs: a small behavioral miss, a larger counterfactual one. |
| [Direct optimization](direct_optimization.md) | Estimation under correct and misspecified rewards. | Varies | MPEC, neural MPEC, GLADIUS. | How this family degrades under reward misspecification. |

The findings in one line: almost every estimator matches the choice probabilities, and the differences show up in parameter recovery, in counterfactuals, and in compute cost.
## Reading the tables

All numbers come from a saved results file written by the run script. Crashes and timeouts stay in the table with their error message.

Policy TV measures how far the estimated choice probabilities are from the truth. Lower is better.

Regret measures the welfare lost when the recovered model is used in a changed environment. Three kinds of change are tested:

- Type A shifts a payoff.
- Type B changes the dynamics.
- Type C penalizes an action.

Structural estimators re-solve the model under the change and adapt. Behavioral estimators keep their old policy, so their Type C regret is large.

Parameter recovery is reported only for structural estimators. IRL methods recover a reward that produces the same behavior, but in a different parameterization. Comparing their parameters to the truth is therefore not meaningful.

The estimators are documented in the [catalog](../estimators.md).

```{toctree}
:maxdepth: 1

rust_bus
taxi_gridworld
abstract_mdp_1_sanity
abstract_mdp_2_harder
abstract_mdp_3_highdim
abstract_mdp_4_nonlinear
direct_optimization
```
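The two metrics described under "Reading the tables" can be illustrated with a small NumPy sketch for a finite MDP with known transitions. It is not the run script's code. The function names `policy_tv`, `soft_optimal_policy`, `policy_welfare` and `regret` are hypothetical.

The sketch makes several assumptions, and the project's exact definitions may differ:

- Policy TV is averaged uniformly over states unless weights are supplied.
- Taste shocks are i.i.d. type-I extreme value with unit scale.
- Because of that shock assumption, welfare counts the expected shock: expected reward plus choice entropy.

The usage lines at the bottom mimic a Type C change. The "structural" policy re-solves the penalized model, and the "behavioral" policy keeps its baseline choices.

```python
import numpy as np


def policy_tv(p_true, p_hat, state_weights=None):
    """Average total variation distance between two (n_states, n_actions) CCP tables."""
    p_true = np.asarray(p_true, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    # Per-state TV: half the L1 gap between the two action distributions.
    tv_per_state = 0.5 * np.abs(p_true - p_hat).sum(axis=1)
    if state_weights is None:
        # Assumption: every state counts equally.
        state_weights = np.full(tv_per_state.shape[0], 1.0 / tv_per_state.shape[0])
    return float(np.dot(state_weights, tv_per_state))


def _logsumexp(x):
    """Row-wise log-sum-exp, stabilized by subtracting the row max."""
    m = x.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=1, keepdims=True)))[:, 0]


def soft_optimal_policy(reward, transitions, beta, tol=1e-10, max_iter=10_000):
    """Logit choice probabilities from soft value iteration.

    reward: (S, A) flow payoffs; transitions: (A, S, S); beta: discount in [0, 1).
    """
    v = np.zeros(reward.shape[0])
    for _ in range(max_iter):
        # Choice-specific values q(s, a) = u(s, a) + beta * E[V(s') | s, a].
        q = reward + beta * np.einsum("ast,t->sa", transitions, v)
        v_new = _logsumexp(q)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    q = reward + beta * np.einsum("ast,t->sa", transitions, v)
    return np.exp(q - _logsumexp(q)[:, None])


def policy_welfare(policy, reward, transitions, beta):
    """Discounted welfare of a fixed policy, including the expected taste shock.

    Assumes unit-scale type-I extreme value shocks, so flow welfare is expected
    reward plus choice entropy (Euler's constant is dropped; it cancels in regret).
    """
    logp = np.log(np.clip(policy, 1e-300, 1.0))
    flow = (policy * (reward - logp)).sum(axis=1)
    p_pi = np.einsum("sa,ast->st", policy, transitions)
    # Solve V = flow + beta * P_pi V exactly.
    return np.linalg.solve(np.eye(reward.shape[0]) - beta * p_pi, flow)


def regret(policy_hat, policy_true, reward, transitions, beta, initial_dist):
    """Welfare lost by following policy_hat instead of policy_true in one environment."""
    gap = policy_welfare(policy_true, reward, transitions, beta) - policy_welfare(
        policy_hat, reward, transitions, beta
    )
    return float(initial_dist @ gap)


# Usage: a hypothetical Type C change that penalizes action 1 in a small random MDP.
rng = np.random.default_rng(0)
S, A, beta = 6, 2, 0.9
reward = rng.normal(size=(S, A))
transitions = rng.random((A, S, S))
transitions /= transitions.sum(axis=2, keepdims=True)
start = np.full(S, 1.0 / S)

baseline = soft_optimal_policy(reward, transitions, beta)
penalized = reward.copy()
penalized[:, 1] -= 2.0
resolved = soft_optimal_policy(penalized, transitions, beta)  # structural: re-solve
# Behavioral: keep the baseline policy and ignore the change.
print("policy TV:", policy_tv(resolved, baseline))
print("behavioral regret:", regret(baseline, resolved, penalized, transitions, beta, start))
```

Only the behavioral policy pays a cost here. The re-solved policy is optimal for the penalized reward, so its regret is zero by construction. In the actual experiments, structural regret also reflects estimation error in the recovered parameters.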