Simple to calculate: In many cases all you need to calculate a score is a pen and a piece of paper.
Easily interpreted: The result of the calculation is a single number, and a higher score usually means higher risk. Furthermore, many scoring methods enforce some form of monotonicity along the measured risk factors to allow a straightforward interpretation of the score.
Actionable: Scores are designed around a set of possible actions that should be taken as a result of the calculated score. Effective score-based policies can be designed and executed by setting thresholds on the value of the score and associating them with escalating actions.
Formal definition
A typical scoring method is composed of 3 components:
A set of consistent rules that assigns a numerical value to each risk factor, reflecting our estimation of the underlying risk.
A formula that calculates the score.
A set of thresholds that helps translate the calculated score into a level of risk, or an equivalent formula or set of rules that translates the calculated score back into probabilities.
Items 1 & 2 can be achieved with some form of regression, which provides both the risk estimates and the formula for calculating the score. Item 3 requires setting an arbitrary set of thresholds and will usually involve expert opinion.
Estimating risk with GLM
Risk scores are designed to represent an underlying probability of an adverse event $A$, denoted $P(A \mid \vec{x})$, given a vector of explaining variables $\vec{x}$ containing measurements of the relevant risk factors. In order to establish the connection between the risk factors and the probability, a set of weights $\vec{\beta}$ is estimated using a generalized linear model:

$\hat{P}(A \mid \vec{x}) = f(\beta_0 + \vec{x} \cdot \vec{\beta})$

where $f$ is a real-valued, monotonically increasing function that maps the values of the linear predictor $\beta_0 + \vec{x} \cdot \vec{\beta}$ to the interval $[0, 1]$. GLM methods typically use the logit or probit as the link function.
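As a minimal sketch of this estimation step, the weights of a logistic-link GLM can be fit by gradient ascent on the log-likelihood. The data, learning rate, and iteration count below are invented for illustration; a real analysis would use an established GLM routine.

```python
import numpy as np

def sigmoid(z):
    """The logistic link f, mapping the linear predictor into [0, 1]."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Estimate (beta_0, beta) for P(A|x) = f(beta_0 + x . beta)."""
    n, m = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])  # prepend an intercept column
    w = np.zeros(m + 1)
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)               # current probability estimates
        w += lr * Xb.T @ (y - p) / n      # gradient ascent on the log-likelihood
    return w                              # w[0] = beta_0, w[1:] = beta

# Toy data: one risk factor that genuinely raises the probability of the event.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
y = (rng.uniform(size=500) < sigmoid(2.0 * x[:, 0] - 0.5)).astype(float)
w = fit_logistic(x, y)  # w[1] should come out close to the true weight 2.0
```

The recovered weight vector is exactly the $\vec{\beta}$ used in the score construction below.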
Estimating risk with other methods
While it's possible to estimate $P(A \mid \vec{x})$ using other statistical or machine learning methods, the requirements of simplicity and easy interpretation make most of these methods difficult to use for scoring in this context:
With more sophisticated methods it becomes difficult to attribute simple weights to each risk factor and to provide a simple formula for the calculation of the score. A notable exception is tree-based methods like CART, which can provide a simple set of decision rules and calculations, but cannot ensure the monotonicity of the scale across the different risk factors.
Since we are estimating underlying risk across the population, we cannot tag people in advance on an ordinal scale; classification methods are therefore only relevant if we want to classify people into two groups or assign two possible actions.
Constructing the score
When using GLM, the set of estimated weights $\vec{\beta}$ can be used to assign different point values to the different values of the risk factors in $\vec{x}$. The score can then be expressed as a weighted sum:

$\text{score} = \beta_0 + \vec{x} \cdot \vec{\beta}$
Some scoring methods will translate the score into probabilities by using $f$ or a look-up table. This practice makes the process of obtaining the score more complicated computationally, but has the advantage of translating an arbitrary number to a more familiar scale of 0 to 1.
The columns of $\vec{x}$ can represent complex transformations of the risk factors, and not just the risk factors themselves.
The values of $\vec{\beta}$ are sometimes scaled or rounded to allow working with integers instead of very small fractions. While scaling has no impact on the ability of the score to estimate risk, rounding has the potential of disrupting the "optimality" of the GLM estimation.
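The weighted sum, its scaled-and-rounded "points" version, and the translation back to probabilities can be sketched as follows. The intercept, risk factors, weights, and scale are all invented for illustration:

```python
import math

# Hypothetical GLM output: intercept beta_0 and one weight per risk factor.
beta0 = -4.0
beta = {"age_over_60": 1.2, "smoker": 0.9, "hypertension": 0.7}

def raw_score(x):
    """Linear predictor: beta_0 + x . beta."""
    return beta0 + sum(beta[k] * v for k, v in x.items())

def points(x, scale=10):
    """Scaled, rounded integer points for pen-and-paper use. The intercept is
    left out (it can be folded into the thresholds); rounding slightly
    disrupts the GLM-optimal weights."""
    return round(scale * sum(beta[k] * v for k, v in x.items()))

def probability(x):
    """Translate the raw score back to [0, 1] with the logistic link f."""
    return 1.0 / (1.0 + math.exp(-raw_score(x)))

patient = {"age_over_60": 1, "smoker": 1, "hypertension": 0}
# raw_score(patient) = -4.0 + 1.2 + 0.9 = -1.9; points(patient) = 21
```

The integer points are what a clinician would add up by hand; the probability translation is the optional extra step described above.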
Making score-based decisions
Let $D = \{d_1, \ldots, d_k\}$ denote a set of $k$ "escalating" actions available to the decision maker. In order to define a decision rule, we want to define a map between different values of the score and the possible decisions in $D$. Let $\tau = (\tau_0 = -\infty, \tau_1, \ldots, \tau_{k-1}, \tau_k = \infty)$ be a partition of $\mathbb{R}$ into $k$ consecutive, non-overlapping intervals, such that $\tau_0 < \tau_1 < \cdots < \tau_k$. The map is defined as follows:

take action $d_j$ if $\text{score} \in [\tau_{j-1}, \tau_j)$
The values of the thresholds $\tau_1, \ldots, \tau_{k-1}$ are set based on expert opinion, the type and prevalence of the measured risk, the consequences of misclassification, etc. For example, a risk of 9 out of 10 will usually be considered "high risk", but a risk of 7 out of 10 can be considered either "high risk" or "medium risk" depending on context.
The intervals here are defined as right-open, but the map can be equivalently defined using left-open intervals $(\tau_{j-1}, \tau_j]$.
For scoring methods that already translate the score into probabilities, we can either define the partition directly on the interval $[0, 1]$ or translate the decision criteria through $f^{-1}$; the monotonicity of $f$ ensures a one-to-one translation between the two.
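The threshold map above amounts to a binary search over sorted cut-points. The thresholds and action names below are illustrative, not taken from any published score:

```python
import bisect

# Escalating actions d_1, ..., d_k and thresholds tau_1 < ... < tau_{k-1};
# tau_0 = -inf and tau_k = +inf are implicit.
actions = ["no action", "follow-up", "specialist referral", "immediate treatment"]
thresholds = [5, 10, 15]

def decide(score):
    """Return d_j such that score falls in the right-open interval [tau_{j-1}, tau_j)."""
    return actions[bisect.bisect_right(thresholds, score)]
```

Using `bisect_right` places a score equal to a threshold into the higher interval, which matches the right-open convention $[\tau_{j-1}, \tau_j)$; `bisect_left` would give the left-open variant.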
Other financial industries, such as the insurance industry, also use scoring methods, but the exact implementation remains a trade secret, except in some rare cases.
Social Sciences
COMPAS score for recidivism, as reverse-engineered by ProPublica using logistic regression and Cox's proportional hazards model.