Judgment of Princeton


The Judgment of Princeton was a wine tasting event held on 8 June 2012 during a conference of the American Association of Wine Economists at Princeton University in Princeton, New Jersey. Its purpose was to compare, in a blind tasting, several French wines against wines produced in New Jersey in order to gauge the quality and development of the New Jersey wine industry. Because New Jersey's wine industry is relatively young and small, it has received little attention in the world wine market. The state's wine production has grown in recent years, largely because state legislators created new winery licensing opportunities and repealed Prohibition-era laws that had constrained the industry's development. The event was modeled after the 1976 blind tasting dubbed the "Judgment of Paris," in which French wines were compared with wines produced in California when that state's wine industry was similarly young and developing. The New Jersey wine industry heralded the results, asserting that the judges' ratings of New Jersey wines were a victory for the state's wine industry.

Details

The Judgment of Princeton, held at Princeton University on Friday, June 8, 2012, was a structured blind tasting that pitted top New Jersey wines against top French wines from Bordeaux and Burgundy. The event was based on the famous 1976 Judgment of Paris, in which California wines beat French wines in a blind tasting. The Judgment of Princeton was spearheaded by George M. Taber, who had been present in Paris for the original tasting and later wrote a book on the subject. Along with Taber, the tasting was organized and carried out by economists Orley Ashenfelter, Richard E. Quandt, and Karl Storchmann, together with Mark Censits, owner of the local wine and spirits shop CoolVines. Censits played a role analogous to that of merchant Steven Spurrier in 1976, gathering the competition wines from the New Jersey winemakers and selecting and sourcing the French wines against which they were pitted; the French wines came from the same estates as the original wines of the Paris tasting. The event also included other members of the American Association of Wine Economists, who later posted the data set from the tastings online as an open invitation to further analysis.

The judges

Of the nine judges in Princeton, six were American, two French, and one Belgian. They are listed here in alphabetical order.
Name | Affiliation | Nationality
Jean-Marie Cardebat | Université de Bordeaux | France
Tyler Colman | DrVino.com | USA
John Foy | The Star-Ledger, thewineodyssey.com | USA
Olivier Gergaud | BEM Management School | France
Robert Hodgson | Fieldbrook Winery | USA
Danièle Meulders | Université Libre de Bruxelles | Belgium
Linda Murphy | Decanter, American Wine | USA
Jamal Rayyis | Gilbert & Gaillard Wine Magazine | USA
Francis Schott | Stage Left Restaurant, RestaurantGuysRadio.com | USA

Controversy

As in the setup of the Judgment of Paris, the judges were told in advance that six of the ten wines in each flight were from New Jersey. Several of the judges subsequently complained about the public release of their individual judgments, as had also occurred after the Judgment of Paris.

Interpretation of results

In 1999, Quandt and Ashenfelter published a paper in the journal "Chance" questioning the statistical interpretation of the results of the 1976 Judgment of Paris. The authors noted that a "side-by-side chart of best-to-worst rankings of 18 wines by a roster of experienced tasters showed about as much consistency as a table of random numbers," and they reinterpreted the data, slightly altering the results, using a formula they argued was more statistically valid. Quandt's later paper "On Wine Bullshit" poked fun at the seemingly random strings of adjectives that often accompany experts' published wine ratings. More recent work by Robin Goldstein, Hilke Plassmann, Robert Hodgson, and other economists and behavioral scientists has shown high variability and inconsistency both within and between blind tasters, and it has found little correlation between price and preference, even among wine experts, in tasting settings where labels and prices are concealed.
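As an illustration of the kind of consistency check behind such critiques, the Python sketch below computes Kendall's coefficient of concordance (W), a standard measure of agreement among rankers: W near 1 means the judges largely agree, while W near 0 means their rankings look like the "table of random numbers" Quandt and Ashenfelter described. The score matrix here is randomly generated for demonstration only, and the statistic is a generic textbook one, not the formula used in their paper.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical 9-judge x 10-wine score matrix on a 0-20 scale; the real
# Princeton data set was posted online by the AAWE after the tasting.
rng = np.random.default_rng(0)
scores = rng.uniform(8, 18, size=(9, 10))

# Convert each judge's scores to rankings (rank 1 = that judge's favorite).
ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, scores)

# Kendall's W: 1 = perfect agreement among judges, 0 = rankings no more
# consistent than random numbers.
m, n = ranks.shape                     # judges, wines
rank_sums = ranks.sum(axis=0)          # total rank received by each wine
S = ((rank_sums - rank_sums.mean()) ** 2).sum()
W = 12 * S / (m ** 2 * (n ** 3 - n))
print(f"Kendall's W = {W:.3f}")
```

Run on random scores like these, W comes out close to zero; a panel of genuinely consistent experts would be expected to push it well above that.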

Methodology

The blind tasting panel was made up of nine expert judges, each grading each wine out of 20 points. The tasting was performed behind closed doors at Princeton University, and the results were kept secret from the judges until they had been analyzed by Quandt and announced later that day. Using an algorithm devised by Quandt, each judge's set of ratings was converted into a set of personal rankings, which were in turn tabulated cumulatively as "votes against," with a lower total better and a higher total worse. Quandt then tested the data for statistically significant differences between tasters and wines, using the same software he had previously employed to re-analyze the Judgment of Paris results.
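A minimal sketch of this tabulation is shown below, assuming a 9-judge by 10-wine score matrix. The synthetic data, the variable names, and the choice of Friedman's rank test as the significance test are illustrative assumptions; the article does not specify the exact test implemented in Quandt's FORTRAN program.

```python
import numpy as np
from scipy.stats import rankdata, friedmanchisquare

# Hypothetical score matrix: 9 judges x 10 wines, graded out of 20.
rng = np.random.default_rng(1)
scores = rng.uniform(8, 18, size=(9, 10))

# Each judge's raw scores become personal rankings, with rank 1 going to
# the wine that judge scored highest.
ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, scores)

# "Votes against" = the cumulative rank a wine receives across all judges;
# a lower total is better, a higher total worse.
votes_against = ranks.sum(axis=0)
for wine, votes in sorted(enumerate(votes_against, start=1), key=lambda t: t[1]):
    print(f"wine #{wine}: {votes:.1f} votes against")

# Friedman's rank test asks whether any wine is ranked differently from
# the others more than chance would allow (judges act as blocks, wines
# as treatments). Quandt's actual test statistic may have differed.
stat, p = friedmanchisquare(*scores.T)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")
```

With real data in place of the random scores, a small Friedman p-value would indicate that at least one wine is ranked significantly better or worse than the rest, which is the form the statements in the Results section below take.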

The reveal

Shortly after the tasting was completed and the results tabulated, Taber, Quandt, and Ashenfelter announced the results to an audience of media, New Jersey winemakers, wine economists, and the judges themselves. The announcement took place in an auditorium at Princeton's Woodrow Wilson School of Public and International Affairs as part of the American Association of Wine Economists' annual meeting. Owing to the technical limitations of Quandt's custom-built, floppy-disk-powered FORTRAN system, Goldstein had to scrawl the results onto a giant chalkboard, eliciting murmurs of disapproval from the audience over his poor handwriting.

Results

White wines

"Votes against" totals under the Ashenfelter-Quandt methodology are indicated here. Only one wine was statistically significantly better than the others: the Joseph Drouhin Beaune Premier Cru Clos des Mouches 2009, the cheapest of the four white Burgundies in the lot. The rest of the wines were statistically indistinguishable from one another based on the data, meaning that no conclusions can be drawn from the rankings of wines #2 through #10.
Significantly better than the other wines:
Rank | Votes Against | Winery | Wine | Vintage | Origin
1 | 33.5 | Joseph Drouhin | Beaune Clos des Mouches | 2009 | France

Not statistically distinguishable from each other:
Rank | Votes Against | Winery | Wine | Vintage | Origin
2 | 38 | Unionville Vineyards | Pheasant Hill Chardonnay | 2010 | New Jersey
3 | 45.5 | Heritage Vineyards | Chardonnay | 2010 | New Jersey
4 | 47.5 | Silver Decoy Winery | Black Feather Chardonnay | 2010 | New Jersey
5 | 52 | Domaine Leflaive | Puligny-Montrachet | 2009 | France
6 (tie) | 53 | Bellview Winery | Chardonnay | 2010 | New Jersey
6 (tie) | 53 | Marc-Antonin Blain | Bâtard-Montrachet Grand Cru | 2009 | France
8 | 54.5 | Amalthea Cellars | Chardonnay | 2008 | New Jersey
9 | 57.5 | Ventimiglia Vineyard | Chardonnay | 2010 | New Jersey
10 | 60.5 | Jean Latour-Labille | Meursault-Charmes Premier Cru | 2008 | France

Red wines

"Votes against" totals under the Ashenfelter-Quandt methodology are indicated. The only wine that was statistically significantly worse than the others was #10, the Four JG's Cabernet Franc 2008 from New Jersey. The rest of the wines were statistically indistinguishable from one another based on the data, meaning that no conclusions can be drawn from the rankings of wines #1 through #9.
Not statistically distinguishable from each other:
Rank | Votes Against | Winery | Wine | Vintage | Origin
1 | 35 | Château Mouton-Rothschild | Pauillac | 2004 | France
2 | 40 | Château Haut-Brion | Pessac-Léognan | 2004 | France
3 | 40.5 | Heritage Vineyards | BDX | 2010 | New Jersey
4 | 46 | Château Montrose | Saint-Estèphe | 2004 | France
5 | 49 | Tomasello Winery | Cabernet Sauvignon Oak Reserve | 2007 | New Jersey
6 | 50.5 | Château Léoville-Las Cases | Saint-Julien | 2004 | France
7 | 52 | Bellview Winery | Lumière | 2010 | New Jersey
8 | 54 | Silver Decoy Winery | Cabernet Franc | 2008 | New Jersey
9 | 55 | Amalthea Cellars | Europa VI | 2008 | New Jersey

Significantly worse than the other wines:
Rank | Votes Against | Winery | Wine | Vintage | Origin
10 | 73 | Four JG's Orchards & Vineyards | Cabernet Franc | 2008 | New Jersey