Similarity score


In sabermetrics and basketball analytics, similarity scores are a method of comparing a baseball or basketball player with other players, with the aim of identifying which historical players are most similar to that player.
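As a minimal sketch of how such a comparison can be scored, the Python below follows the commonly described shape of James's approach: start from 1000 (a perfect match) and subtract points for each statistical difference between two players. The stat list and the per-point weights in `WEIGHTS` are hypothetical and chosen only for illustration. They are not James's published values, and real systems also adjust for factors such as position.

```python
# Hypothetical weights: one point is subtracted for every `weight` units of
# difference in that stat. Illustrative only, NOT James's actual table.
WEIGHTS = {
    "games": 20,
    "hits": 15,
    "home_runs": 2,
    "rbi": 10,
    "walks": 25,
}


def similarity_score(a: dict, b: dict, weights: dict = WEIGHTS, start: float = 1000.0) -> float:
    """Start from a perfect score and subtract a penalty for each statistical gap."""
    score = start
    for stat, units_per_point in weights.items():
        # Larger gaps cost more points; identical stats cost nothing.
        score -= abs(a[stat] - b[stat]) / units_per_point
    return score


def most_similar(target: dict, pool: dict, n: int = 3) -> list:
    """Return the n (name, score) pairs in `pool` with the highest scores against `target`."""
    scored = [(name, similarity_score(target, stats)) for name, stats in pool.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]


player_a = {"games": 1500, "hits": 1600, "home_runs": 300, "rbi": 1000, "walks": 700}
player_b = {"games": 1460, "hits": 1570, "home_runs": 290, "rbi": 950, "walks": 650}
print(similarity_score(player_a, player_b))  # 1000 - (2 + 2 + 5 + 5 + 2) = 984.0
```

Because every penalty is an absolute difference divided by a scale, the choice of scale per stat is what decides which categories dominate the comparison.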
Similarity scores are among the many original sabermetric concepts first introduced by Bill James. James initially created the concept as a way to compare players who were not in the Hall of Fame (HOF) with players who were, either to see who was on track to reach the HOF or to determine whether any eligible players had been snubbed by the selection committee. For example, if the players most similar to a non-HOF player were all in the Hall of Fame, one could argue that the player in question also belonged in the Hall.
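The Hall of Fame argument can be expressed as a simple check: what share of a player's closest comparables are Hall of Famers? In this sketch the function name `hall_share`, the `(name, score, in_hall)` record layout, and the cutoff `k` are all hypothetical choices for illustration, not part of James's method, and the player names are placeholders rather than real data.

```python
def hall_share(comparables: list, k: int = 10) -> float:
    """Fraction of the k highest-scoring comparables who are in the Hall of Fame.

    Each comparable is a hypothetical (name, similarity_score, in_hall_of_fame) tuple.
    """
    # Keep only the k most similar players.
    top = sorted(comparables, key=lambda c: c[1], reverse=True)[:k]
    if not top:
        return 0.0
    return sum(1 for _, _, in_hall in top if in_hall) / len(top)


# Placeholder comparables, not real players or scores.
comps = [
    ("Player A", 950.0, True),
    ("Player B", 940.0, True),
    ("Player C", 930.0, False),
    ("Player D", 900.0, True),
]
print(hall_share(comps, k=2))  # both of the two closest comparables are in the Hall: 1.0
```

A share at or near 1.0 corresponds to the case described above, where every close comparable is already enshrined.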
More recently, similarity scores have been used to project career paths and future statistics for active players. The logic is simple: players often follow career trajectories similar to those of their most similar historical counterparts, so how those counterparts performed after reaching the active player's current age should be a good predictor of the active player's future production. An example of this is Football Outsiders' finding that all but the highest-caliber wide receivers suffer a marked decline after their seventh season in the NFL, a pattern borne out by the receivers selected in the 1996 NFL Draft, whose production collectively slipped.
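The projection logic can be sketched as: look up what each comparable did at the age the active player is about to reach, and average those values. Weighting each comparable by its similarity score is an assumption made here for illustration, as are the function name `project_stat` and the record layout; the systems mentioned in this article may combine comparables differently.

```python
from typing import Optional


def project_stat(comparables: list, age: int, stat: str) -> Optional[float]:
    """Similarity-weighted mean of `stat` among comparables at `age`.

    Each comparable is a hypothetical record of the form
        {"similarity": float, "seasons": {age: {stat: value, ...}, ...}}
    Comparables with no recorded season at `age` (e.g. already retired) are skipped.
    Returns None when no comparable has data for that age.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for comp in comparables:
        season = comp["seasons"].get(age)
        if season is None or stat not in season:
            continue
        weight = comp["similarity"]
        weighted_sum += weight * season[stat]
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Placeholder comparables: the third has no age-31 season, so it is ignored.
comps = [
    {"similarity": 900.0, "seasons": {31: {"home_runs": 20}}},
    {"similarity": 600.0, "seasons": {31: {"home_runs": 30}}},
    {"similarity": 800.0, "seasons": {30: {"home_runs": 40}}},
]
print(project_stat(comps, 31, "home_runs"))  # (900*20 + 600*30) / 1500 = 24.0
```

Skipping comparables who no longer played at the target age keeps the arithmetic simple but biases the projection toward players who lasted; a fuller system would need to account for that attrition.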
Many baseball analysts have augmented James' method over the years or devised their own systems for measuring similarity. Baseball Prospectus employs a projection system developed by Nate Silver, known as PECOTA, which applies nearest-neighbor analysis to calculate similarities between players from different eras. Pro Football Prospectus has its own system for projecting future performance. John Hollinger developed a similar system for basketball players in his Pro Basketball Forecast series of books, and several APBRmetricians have expanded on his methodology. Similarity scores are also used extensively in many statistical forecasting programs.
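For readers unfamiliar with nearest-neighbor analysis, the following is a generic k-nearest-neighbor sketch, not PECOTA's implementation, whose inputs, weighting, and distance measure are not described here. The choices below (z-score standardization of each stat, Euclidean distance, and the helper names `standardize` and `nearest_neighbors`) are assumptions for illustration. Standardizing first keeps a stat with large raw numbers, such as at-bats, from swamping one with small numbers, such as home runs.

```python
import math
import statistics


def standardize(pool: dict) -> dict:
    """Convert each stat column to z-scores so no single stat dominates the distance."""
    columns = list(zip(*pool.values()))
    means = [statistics.mean(col) for col in columns]
    # pstdev is 0.0 when every player has the same value; fall back to 1.0 to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return {
        name: [(x - m) / s for x, m, s in zip(stats, means, stdevs)]
        for name, stats in pool.items()
    }


def nearest_neighbors(target: str, pool: dict, k: int = 2) -> list:
    """Names of the k players closest to `target` by Euclidean distance on z-scores."""
    z = standardize(pool)
    distances = sorted(
        (math.dist(z[target], z[name]), name) for name in pool if name != target
    )
    return [name for _, name in distances[:k]]


# Placeholder stat lines (e.g. [home runs, RBI]); not real players.
pool = {
    "target": [30, 100],
    "a": [29, 98],
    "b": [10, 20],
    "c": [31, 103],
}
print(nearest_neighbors("target", pool, k=2))
```

Because the distance is computed on standardized values, a player from a high-offense era and one from a low-offense era can only be compared fairly if the stats are also adjusted for era before this step, which this sketch does not attempt.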