econirl.RewardSpec

class econirl.RewardSpec(features, names, n_actions=None)

Bases: object

Unified feature specification for structural estimation and IRL.

Stores features as an (S, A, K) array, where S is the number of states, A the number of actions, and K the number of features, and provides compute, gradient, and Hessian methods compatible with the BaseUtilityFunction protocol.

Parameters:

features (jnp.ndarray) – Either (S, A, K) for action-dependent features, or (S, K) for state-only features (broadcast to all actions).
names (list[str]) – Human-readable name for each feature/parameter dimension.
n_actions (int, optional) – Required when features is (S, K) to specify the number of actions for broadcasting. Ignored when features is (S, A, K).

__init__(features, names, n_actions=None)

Parameters:

features (Array)
names (list[str])
n_actions (int | None)

classmethod state_dependent(state_features, names, n_actions)

Create from state-only features (S, K), broadcast to all actions.

Parameters:

state_features (jnp.ndarray) – Shape (S, K).
names (list[str]) – One name per feature.
n_actions (int) – Number of actions to broadcast to.

Return type:

RewardSpec
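
The broadcast described above can be sketched in plain JAX without importing econirl. The array sizes and values are made up for illustration, and whether the library uses `jnp.broadcast_to` or another mechanism internally is an assumption.

```python
import jax.numpy as jnp

S, A, K = 4, 3, 2  # illustrative sizes, not taken from econirl
state_features = jnp.arange(S * K, dtype=jnp.float32).reshape(S, K)

# Insert an action axis, then repeat the state features for every action.
features = jnp.broadcast_to(state_features[:, None, :], (S, A, K))

# Every action slice now holds the same (S, K) block.
same_across_actions = bool(jnp.all(features == state_features[:, None, :]))
```

After the broadcast, `features[:, a, :]` equals `state_features` for every action `a`, which is the property that state-only specs rely on.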

classmethod state_action_dependent(features, names)

Create from action-dependent features (S, A, K).

Parameters:

features (jnp.ndarray) – Shape (S, A, K).
names (list[str]) – One name per feature.

Return type:

RewardSpec

property feature_matrix: Array: Feature array of shape (S, A, K).

property parameter_names: list[str]: Human-readable names for each parameter.

property num_parameters: int: Number of parameters (K).

property num_states: int: Number of states (S).

property num_actions: int: Number of actions (A).

property is_state_only: bool: Whether the spec was constructed from state-only features.

compute(parameters)

Compute reward matrix R(s, a) = sum_k params[k] * features[s, a, k].

Parameters: parameters (jnp.ndarray) – Shape (K,).
Returns: Shape (S, A).
Return type: jnp.ndarray
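
As a minimal sketch of the formula above, the sum over k can be written as a single `jnp.einsum` contraction. This reproduces the documented arithmetic in plain JAX rather than calling econirl, and the feature values and parameter vector are illustrative assumptions.

```python
import jax.numpy as jnp

S, A, K = 3, 2, 2  # illustrative sizes
features = jnp.arange(S * A * K, dtype=jnp.float32).reshape(S, A, K)
theta = jnp.array([0.5, -1.0])  # one weight per feature, shape (K,)

# R[s, a] = sum_k theta[k] * features[s, a, k]
reward = jnp.einsum("sak,k->sa", features, theta)

print(reward.shape)  # (3, 2)
```

For example, `features[0, 0]` is `[0., 1.]`, so `reward[0, 0]` is `0.5 * 0 - 1.0 * 1 = -1.0`.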

compute_gradient(parameters)

Gradient of reward w.r.t. parameters.

For a linear specification, the gradient is the feature matrix itself, independent of the parameter values.

Parameters: parameters (jnp.ndarray) – Shape (K,). Unused, but kept for protocol compatibility.
Returns: Shape (S, A, K).
Return type: jnp.ndarray
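
The claim that the gradient equals the feature array can be checked numerically with automatic differentiation. The sketch below defines its own linear reward function in plain JAX (it does not call econirl) and uses illustrative feature values.

```python
import jax
import jax.numpy as jnp

features = jnp.arange(12, dtype=jnp.float32).reshape(3, 2, 2)  # (S, A, K)

def reward(theta):
    # Linear reward: R[s, a] = sum_k theta[k] * features[s, a, k]
    return jnp.einsum("sak,k->sa", features, theta)

theta = jnp.array([0.5, -1.0])

# Jacobian of an (S, A) output w.r.t. a (K,) input has shape (S, A, K).
grad = jax.jacobian(reward)(theta)

matches_features = bool(jnp.allclose(grad, features))
```

Because the reward is linear in `theta`, the same Jacobian is obtained for any other parameter vector.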

compute_hessian(parameters)

Hessian of reward w.r.t. parameters.

For a linear specification, the Hessian is identically zero.

Parameters: parameters (jnp.ndarray) – Shape (K,). Unused.
Returns: Shape (S, A, K, K) of zeros.
Return type: jnp.ndarray
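
A short sanity check of the zero Hessian and its (S, A, K, K) shape, again using a stand-in linear reward in plain JAX with illustrative values rather than the econirl implementation:

```python
import jax
import jax.numpy as jnp

S, A, K = 3, 2, 2  # illustrative sizes
features = jnp.arange(S * A * K, dtype=jnp.float32).reshape(S, A, K)

def reward(theta):
    return jnp.einsum("sak,k->sa", features, theta)

theta = jnp.array([0.5, -1.0])

# Second derivatives of a linear map vanish; one (K, K) block per (s, a).
hess = jax.hessian(reward)(theta)
zeros = jnp.zeros((S, A, K, K))

is_zero = bool(jnp.allclose(hess, zeros))
```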

get_initial_parameters()

Return zeros of shape (K,) as a starting point.

Return type: Array

get_parameter_bounds()

Return (None, None), indicating unbounded parameters.

Return type: tuple[Array | None, Array | None]

validate_parameters(parameters)

Check that parameters have shape (K,).

Parameters: parameters (Array)
Raises: ValueError – If shape does not match.
Return type: None
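
The documented contract (accept shape (K,), raise ValueError otherwise) might look roughly like the sketch below. `check_parameter_shape` is a hypothetical stand-in, not the econirl method, and the error message wording is an assumption.

```python
import jax.numpy as jnp

def check_parameter_shape(parameters, num_parameters):
    """Hypothetical helper mirroring the documented shape check."""
    expected = (num_parameters,)
    if parameters.shape != expected:
        raise ValueError(
            f"expected parameters of shape {expected}, got {parameters.shape}"
        )

check_parameter_shape(jnp.zeros(3), 3)  # correct shape: returns None

try:
    check_parameter_shape(jnp.zeros((3, 1)), 3)  # extra axis: rejected
except ValueError as err:
    message = str(err)
```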

subset_states(indices)

Return a new RewardSpec containing only the specified states.

Parameters: indices (jnp.ndarray) – 1-D integer array of state indices to keep.
Return type: RewardSpec
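
On the underlying array, keeping a subset of states amounts to integer indexing along the first axis. The sketch below shows that operation in plain JAX with illustrative values. That the returned spec preserves the order given in `indices` is an assumption, not something the documentation states.

```python
import jax.numpy as jnp

features = jnp.arange(4 * 2 * 3, dtype=jnp.float32).reshape(4, 2, 3)  # (S, A, K)
indices = jnp.array([0, 2])  # states to keep

# Integer-array indexing on axis 0 selects whole (A, K) blocks per state.
subset = features[indices]

print(subset.shape)  # (2, 2, 3)
```

The action and feature axes are untouched, so the parameter vector of shape (K,) applies to the subset unchanged.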

to_linear_utility()

Convert to a LinearUtility with the same (S, A, K) feature matrix.

Returns: Equivalent LinearUtility instance.
Return type: LinearUtility

to_action_dependent_reward()

Convert to an ActionDependentReward with the same (S, A, K) features.

Returns: Equivalent ActionDependentReward instance.
Return type: ActionDependentReward

to_linear_reward()

Convert to a LinearReward with state-only (S, K) features.

This only works when features are truly state-only (identical across all actions). If features differ across actions, a ValueError is raised.

Returns: Equivalent LinearReward instance.
Return type: LinearReward
Raises: ValueError – If features vary across actions.
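
One way to test whether an (S, A, K) array is truly state-only is to compare every action slice against the first one. The helper below is a hypothetical sketch of that check, not the econirl implementation. It uses exact equality, whereas the library may apply a numerical tolerance.

```python
import jax.numpy as jnp

def collapse_to_state_features(features):
    """Hypothetical helper: return (S, K) features, or raise if actions differ."""
    first_action = features[:, :1, :]  # (S, 1, K), broadcasts over actions
    if not bool(jnp.all(features == first_action)):
        raise ValueError("features vary across actions")
    return features[:, 0, :]

base = jnp.array([[1.0, 2.0], [3.0, 4.0]])  # (S, K), illustrative values
state_only = jnp.broadcast_to(base[:, None, :], (2, 3, 2))
collapsed = collapse_to_state_features(state_only)  # shape (2, 2)

# Perturb one entry so that action 1 differs from action 0 in state 0.
varying = state_only.at[0, 1, 0].set(9.0)
try:
    collapse_to_state_features(varying)
    raised = False
except ValueError:
    raised = True
```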