Perceptual Evaluation of Speech Quality (PESQ) is a family of standards comprising a test methodology for the automated assessment of speech quality as experienced by a user of a telephony system. It is standardized as ITU-T Recommendation P.862. PESQ is an industry standard applied worldwide for objective voice quality testing by phone manufacturers, network equipment vendors and telecom operators. Its use requires a license.
Measurement scope
PESQ was developed specifically to model the subjective tests commonly used in telecommunications, in which human listeners assess voice quality. Consequently, PESQ employs real voice samples as test signals. To characterize the listening quality as perceived by users, telecom equipment must be loaded with speech-like signals, because many systems are optimized for speech and would respond unpredictably to non-speech signals. Guidelines for the proper application of voice test samples are defined in the PESQ application guide, ITU-T P.862.3.
ITU-T’s family of full-reference objective voice quality measurements started in 1997 with P.861, which was superseded by P.862 in 2001. P.862 was later complemented by the recommendations P.862.1, P.862.2 and P.862.3. P.863 has been in force since 2011. Two additional implementer’s guides for P.863 were consented by ITU-T Study Group 12 in November 2011. In addition to the full-reference methods listed above, ITU-T’s objective voice quality measurement standards also include P.563, a no-reference method.
Testing typology
Depending on the information that is made available to an algorithm, voice-quality test algorithms can be divided into two main categories:
A "full reference" (FR) algorithm has access to the original reference signal and uses it for comparison. It can compare each sample of the reference signal with the corresponding sample of the degraded signal. FR measurements deliver the highest accuracy and repeatability, but in live networks they can be applied only in dedicated tests.
A "no reference" (NR) algorithm uses only the degraded signal for the quality estimation and has no information about the original reference signal. NR algorithms give only low-accuracy estimates, because the voice characteristics of the original source are completely unknown. A common variant of NR algorithms does not analyze the decoded audio signal at all but instead analyzes the digital bit stream at the IP packet level. The measurement is consequently limited to a transport-stream analysis.
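The defining property of a full-reference method, that it needs both the reference and the degraded signal, can be illustrated with a deliberately simple stand-in. The sketch below computes a plain signal-to-noise ratio by comparing the two signals sample by sample. It is not PESQ and involves no perceptual modeling; the function name `snr_db` and the toy signals are assumptions made purely for illustration.

```python
import math


def snr_db(reference, degraded):
    """Toy full-reference measure: signal-to-noise ratio in dB.

    Illustrative only. This is NOT the PESQ algorithm; it merely shows
    that a full-reference method compares the degraded signal with the
    original, sample by sample.
    """
    if len(reference) != len(degraded):
        raise ValueError("signals must be time-aligned and equal in length")

    signal_energy = sum(r * r for r in reference)
    if signal_energy == 0:
        raise ValueError("reference signal is silent")

    # Treat the sample-wise difference between the signals as noise.
    noise_energy = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if noise_energy == 0:
        return math.inf  # degraded signal is identical to the reference

    return 10.0 * math.log10(signal_energy / noise_energy)


# Hypothetical toy signals: the degraded copy has a small constant offset.
reference = [1.0, -1.0, 1.0, -1.0]
degraded = [1.1, -0.9, 1.1, -0.9]
print(round(snr_db(reference, degraded), 2))
```

A no-reference method, by contrast, would receive only `degraded` and would have to estimate quality without this direct comparison.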
PESQ is a full-reference algorithm: it analyzes the speech signal sample by sample after temporally aligning corresponding excerpts of the reference and test signals. PESQ can be applied to provide an end-to-end quality assessment for a network or to characterize individual network components. PESQ results principally model mean opinion scores (MOS), which cover a scale from 1 to 5. A mapping function to MOS-LQO is outlined in P.862.1.
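The mapping to MOS-LQO is a logistic function of the raw PESQ score. The following is a minimal sketch using the coefficients commonly quoted for the narrowband mapping in P.862.1; they should be checked against the recommendation itself before being relied on. The function name `pesq_to_mos_lqo` is an assumption for illustration, and the raw score is assumed to lie in the usual range of about -0.5 to 4.5.

```python
import math


def pesq_to_mos_lqo(raw_score):
    """Map a raw P.862 score to MOS-LQO with a logistic function.

    The coefficients are those commonly quoted for the narrowband
    P.862.1 mapping (an assumption here; verify against the standard).
    """
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


# Sample raw scores across the assumed input range.
for raw in (-0.5, 1.0, 2.5, 4.5):
    print(raw, round(pesq_to_mos_lqo(raw), 2))
```

With these coefficients the mapped values stay within roughly 1.02 to 4.55 over that input range, so the output does not reach the top of the 1 to 5 scale.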