This paper investigates the problem of testing and calibrating models of individual decision making. We consider a consumption space equipped with an endogenous notion of abstract ‘numeraire,’ and characterize those preferences for which the quantity of numeraire needed to compensate an agent between a pair of alternatives provides a consistent, cardinal measure of the intensity of preference. This framework includes, as special cases, many well-known preferences over classical commodity spaces and over finite or infinite horizon consumption streams, as well as a wide range of models of preference over uncertainty and risk. For data consisting of observed or experimentally elicited compensation differences, we develop a least squares theory for quantifying a model’s predictive accuracy and estimating its underlying parameters. We additionally provide a general class of explicit, non-parametric statistical tests of whether stochastic data are rationalizable by a particular model. Applications to model selection, welfare analysis, and elicitation of subjective beliefs are given.