Programme for International Student Assessment


The Programme for International Student Assessment is a worldwide study by the Organisation for Economic Co-operation and Development in member and non-member nations intended to evaluate educational systems by measuring 15-year-old school pupils' scholastic performance on mathematics, science, and reading. It was first performed in 2000 and then repeated every three years. Its aim is to provide comparable data with a view to enabling countries to improve their education policies and outcomes. It measures problem solving and cognition.
The results of the 2018 data collection were released on Tuesday 3 December 2019.

Influence and impact

PISA and similar international standardised assessments of educational attainment are increasingly used in the process of education policymaking at both national and international levels.
PISA was conceived to set in a wider context the information provided by national monitoring of education system performance through regular assessments within a common, internationally agreed framework; by investigating relationships between student learning and other factors they can "offer insights into sources of variation in performances within and between countries".
Until the 1990s, few European countries used national tests. In the 1990s, ten countries / regions introduced standardised assessment, and since the early 2000s, ten more followed suit. By 2009, only five European education systems had no national student assessments.
The impact of these international standardised assessments in the field of educational policy has been significant, in terms of the creation of new knowledge, changes in assessment policy, and external influence over national educational policy more broadly.

Creation of new knowledge

Data from international standardised assessments can be useful in research on causal factors within or across education systems. Mons notes that the databases generated by large-scale international assessments have made it possible to carry out inventories and comparisons of education systems on an unprecedented scale, on themes ranging from the conditions for learning mathematics and reading, to institutional autonomy and admissions policies. They allow typologies to be developed that can be used for comparative statistical analyses of education performance indicators, thereby identifying the consequences of different policy choices. They have generated new knowledge about education: PISA findings have challenged deeply embedded educational practices, such as the early tracking of students into vocational or academic pathways.
Barroso and de Carvalho find that PISA provides a common reference connecting academic research in education and the political realm of public policy, operating as a mediator between different strands of knowledge from the realm of education and public policy. However, although the key findings from comparative assessments are widely shared in the research community, the knowledge they create does not necessarily fit with government reform agendas; this leads to some inappropriate uses of assessment data.

Changes in national assessment policy

Emerging research suggests that international standardised assessments are having an impact on national assessment policy and practice. PISA is being integrated into national policies and practices on assessment, evaluation, curriculum standards and performance targets; its assessment frameworks and instruments are being used as best-practice models for improving national assessments; many countries have explicitly incorporated and emphasised PISA-like competencies in revised national standards and curricula; others use PISA data to complement national data and validate national results against an international benchmark.

External influence over national educational policy

More important than its influence on countries' student assessment policy is the range of ways in which PISA is influencing countries' education policy choices.
Policy-makers in most participating countries see PISA as an important indicator of system performance; PISA reports can define policy problems and set the agenda for national policy debate; policymakers seem to accept PISA as a valid and reliable instrument for internationally benchmarking system performance and changes over time; most countries—irrespective of whether they performed above, at, or below the average PISA score—have begun policy reforms in response to PISA reports.
Against this, impact on national education systems varies markedly. For example, in Germany, the results of the first PISA assessment caused the so-called 'PISA shock': a questioning of previously accepted educational policies; in a state marked by jealously guarded regional policy differences, it led ultimately to an agreement by all Länder to introduce common national standards and even an institutionalised structure to ensure that they were observed. In Hungary, by comparison, which shared similar conditions to Germany, PISA results have not led to significant changes in educational policy.
Because many countries have set national performance targets based on their relative rank or absolute PISA score, PISA assessments have increased the influence of their commissioning body, the OECD, as an international education monitor and policy actor, which implies an important degree of 'policy transfer' from the international to the national level; PISA in particular is having "an influential normative effect on the direction of national education policies". Thus, it is argued that the use of international standardised assessments has led to a shift towards international, external accountability for national system performance; Rey contends that PISA surveys, portrayed as objective, third-party diagnoses of education systems, actually serve to promote specific orientations on educational issues.
National policy actors refer to high-performing PISA countries to "help legitimise and justify their intended reform agenda within contested national policy debates". PISA data can be "used to fuel long-standing debates around pre-existing conflicts or rivalries between different policy options, such as in the French Community of Belgium". In such instances, PISA assessment data are used selectively: in public discourse governments often use only superficial features of PISA surveys, such as country rankings, and not the more detailed analyses. Rey notes that often the real results of PISA assessments are ignored as policymakers selectively refer to data in order to legitimise policies introduced for other reasons.
In addition, PISA's international comparisons can be used to justify reforms with which the data themselves have no connection; in Portugal, for example, PISA data were used to justify new arrangements for teacher assessment; they also fed the government's discourse about the issue of pupils repeating a year. In Finland, the country's PISA results were used by Ministers to promote new policies for 'gifted' students. Such uses and interpretations often assume causal relationships that cannot legitimately be based upon PISA data. Establishing them would normally require fuller investigation through qualitative in-depth studies and longitudinal surveys based on mixed quantitative and qualitative methods, which politicians are often reluctant to fund.
Recent decades have witnessed an expansion in the uses of PISA and similar assessments, from assessing students' learning, to connecting "the educational realm with the political realm". This raises the question of whether PISA data are sufficiently robust to bear the weight of the major policy decisions that are being based upon them, for, according to Breakspear, PISA data have "come to increasingly shape, define and evaluate the key goals of the national / federal education system". This implies that those who set the PISA tests – e.g. in choosing the content to be assessed and not assessed – are in a position of considerable power to set the terms of the education debate, and to orient educational reform in many countries around the globe.

Framework

PISA stands in a tradition of international school studies, undertaken since the late 1950s by the International Association for the Evaluation of Educational Achievement. Much of PISA's methodology follows the example of the Trends in International Mathematics and Science Study, which in turn was much influenced by the U.S. National Assessment of Educational Progress. The reading component of PISA is inspired by the IEA's Progress in International Reading Literacy Study.
PISA aims to test literacy, that is, the competence of students, in three fields: reading, mathematics, and science. Results are reported on an indefinite scale.
The PISA mathematics literacy test asks students to apply their mathematical knowledge to solve problems set in real-world contexts. To solve the problems students must activate a number of mathematical competencies as well as a broad range of mathematical content knowledge. TIMSS, on the other hand, measures more traditional classroom content such as an understanding of fractions and decimals and the relationship between them. PISA claims to measure education's application to real-life problems and lifelong learning.
In the reading test, "OECD/PISA does not measure the extent to which 15-year-old students are fluent readers or how competent they are at word recognition tasks or spelling." Instead, they should be able to "construct, extend and reflect on the meaning of what they have read across a wide range of continuous and non-continuous texts."
PISA also assesses students in innovative domains. In addition to reading, mathematics and science, students were tested in creative problem solving in 2012 and in collaborative problem solving in 2015. In 2018 the additional innovative domain was global competence.

Implementation

PISA is sponsored, governed, and coordinated by the OECD, but paid for by participating countries.

Method of testing

Sampling

The students tested by PISA are aged between 15 years and 3 months and 16 years and 2 months at the beginning of the assessment period. The school year pupils are in is not taken into consideration. Only students at school are tested, not home-schoolers. In PISA 2006, however, several countries also used a grade-based sample of students. This made it possible to study how age and school year interact.
To fulfill OECD requirements, each country must draw a sample of at least 5,000 students. In small countries like Iceland and Luxembourg, where there are fewer than 5,000 students per year, an entire age cohort is tested. Some countries used much larger samples than required to allow comparisons between regions.

Test

Each student takes a two-hour computer-based test. Part of the test is multiple-choice and part involves fuller answers. There are six and a half hours of assessment material, but no student is tested on all the parts. Following the cognitive test, participating students spend nearly one more hour answering a questionnaire on their background, including learning habits, motivation, and family. School directors fill in a questionnaire describing school demographics, funding, etc. In 2012 the participants were, for the first time in the history of large-scale testing and assessments, offered a new type of problem: interactive problems requiring exploration of a novel virtual device.
In selected countries, PISA started experimentation with computer adaptive testing.

National add-ons

Countries are allowed to combine PISA with complementary national tests.
Germany does this in a very extensive way: On the day following the international test, students take a national test called PISA-E. Test items of PISA-E are closer to TIMSS than to PISA. While only about 5,000 German students participate in the international and the national test, another 45,000 take the national test only. This large sample is needed to allow an analysis by federal states. Following a clash about the interpretation of 2006 results, the OECD warned Germany that it might withdraw the right to use the "PISA" label for national tests.

Data scaling

From the beginning, PISA has been designed with one particular method of data analysis in mind. Since students work on different test booklets, raw scores must be 'scaled' to allow meaningful comparisons. Scores are thus scaled so that the OECD average in each domain is 500 and the standard deviation is 100. This holds exactly only for the initial PISA cycle, when the scale was first introduced; subsequent cycles are linked to the previous cycles through IRT scale linking methods.
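The rescaling to a mean of 500 and a standard deviation of 100 can be illustrated with a simple linear transformation. The following is a minimal sketch, not the OECD's actual procedure: the function names and the sample proficiency values are hypothetical, the estimates are unweighted, and the sketch simply assumes that the transformation is fitted once on a base cycle and then reused for later cycles that have already been linked to the same underlying scale.

```python
from statistics import mean, pstdev


def fit_linear_transform(base_estimates, target_mean=500.0, target_sd=100.0):
    """Return (slope, intercept) mapping base-cycle estimates to the target scale.

    Assumption: base_estimates are unweighted proficiency estimates
    (for example on a logit scale) for the reference population.
    """
    mu = mean(base_estimates)
    sigma = pstdev(base_estimates)
    if sigma == 0:
        raise ValueError("cannot rescale estimates that do not vary")
    slope = target_sd / sigma
    intercept = target_mean - slope * mu
    return slope, intercept


def apply_transform(estimates, slope, intercept):
    """Apply a previously fitted linear transformation."""
    return [slope * value + intercept for value in estimates]


# Hypothetical base-cycle estimates; these are illustrative, not PISA data.
base = [-1.2, -0.4, 0.0, 0.3, 1.3]
slope, intercept = fit_linear_transform(base)
base_scaled = apply_transform(base, slope, intercept)

# A later cycle (hypothetical, assumed already linked to the base scale)
# reuses the same coefficients, so its mean need not equal 500.
later_scaled = apply_transform([-0.9, -0.1, 0.2, 0.6, 1.5], slope, intercept)
```

Because the coefficients are fixed on the base cycle, only the base-cycle scores have a mean of exactly 500 and a standard deviation of exactly 100 in this sketch; later cycles can drift above or below those values.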
This generation of proficiency estimates is done using a latent regression extension of the Rasch model, a model of item response theory; the extension is also known as a conditioning model or population model. The proficiency estimates are provided in the form of so-called plausible values, which allow unbiased estimates of differences between groups. The latent regression, together with the use of a Gaussian prior probability distribution of student competencies, allows estimation of the proficiency distributions of groups of participating students. The scaling and conditioning procedures are described in nearly identical terms in the Technical Reports of PISA 2000, 2003, and 2006. NAEP and TIMSS use similar scaling methods.
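Analyses based on plausible values are usually run once per plausible value and then combined. The sketch below uses the standard multiple-imputation combining rules (often called Rubin's rules) as an assumption about how such results are pooled; the function name and all numbers are hypothetical, and the sampling variances are taken as given rather than computed from replicate weights.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Pool one statistic computed separately on each plausible value.

    estimates: the statistic (e.g. a group mean) from each plausible value.
    sampling_variances: the sampling variance of that statistic, one per value.
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need matching lists with at least two plausible values")
    pooled = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across plausible values (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical group means from five plausible values, each with
# an assumed sampling variance of 6.25 (standard error 2.5).
pv_means = [502.1, 499.8, 501.0, 500.4, 501.7]
pooled_mean, pooled_se = combine_plausible_values(pv_means, [6.25] * 5)
```

The pooled standard error is never smaller than the sampling standard error alone, because the spread across plausible values adds the measurement uncertainty on top of it.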

Ranking results

All PISA results are tabulated by country; recent PISA cycles have separate provincial or regional results for some countries. Most public attention concentrates on just one outcome: the mean scores of countries and their rankings against one another. In the official reports, however, country-by-country rankings are given not as simple league tables but as cross tables indicating for each pair of countries whether or not mean score differences are statistically significant. In favorable cases, a difference of 9 points is sufficient to be considered significant.
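Whether a difference between two country means counts as significant depends on the standard errors of both means. The following sketch shows a plain two-sided z-test at the 5% level under the assumption of independent samples; the function name and the standard errors are hypothetical, and the sketch ignores refinements such as linking error or adjustments for multiple comparisons.

```python
from math import sqrt


def compare_means(mean_a, se_a, mean_b, se_b, critical=1.96):
    """Return (is_significant, z) for the difference between two means.

    Assumes the two samples are independent, so the variances
    of the two means add.
    """
    z = (mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)
    return abs(z) > critical, z


# A 9-point gap with hypothetical standard errors of 3.2 points each.
significant_small_se, z_small_se = compare_means(509.0, 3.2, 500.0, 3.2)

# The same gap with slightly larger hypothetical standard errors of 3.5.
significant_large_se, z_large_se = compare_means(509.0, 3.5, 500.0, 3.5)
```

With standard errors of 3.2 the 9-point gap just clears the threshold, while with standard errors of 3.5 it does not, which illustrates why 9 points suffices only in favorable cases.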
PISA never combines mathematics, science and reading domain scores into an overall score. However, commentators have sometimes combined test results from all three domains into an overall country ranking. Such meta-analysis is not endorsed by the OECD, although official summaries sometimes use scores from a testing cycle's principal domain as a proxy for overall student ability.

PISA 2018 ranking summary

The results of PISA 2018 were presented on 3 December 2019. They included data for around 600,000 participating students in 79 countries and economies, with China's economic area of Beijing, Shanghai, Jiangsu and Zhejiang emerging as the top performer in all categories. Note that this area does not represent the entirety of mainland China.

PISA 2015 ranking summary

PISA 2015 was presented on 6 December 2016, with results for around 540,000 participating students in 72 countries, with Singapore emerging as the top performer in all categories.

Rankings comparison 2003–2015

Previous years

Period | Focus | OECD countries | Partner countries | Participating students | Notes
2000 | Reading | 28 | 4 + 11 | 265,000 | The Netherlands disqualified from data analysis. 11 additional non-OECD countries took the test in 2002.
2003 | Mathematics | 30 | 11 | 275,000 | UK disqualified from data analysis. Also included test in problem solving.
2006 | Science | 30 | 27 | 400,000 | Reading scores for US disqualified from analysis due to misprint in testing materials.
2009 | Reading | 34 | 41 + 10 | 470,000 | 10 additional non-OECD countries took the test in 2010.
2012 | Mathematics | 34 | 31 | 510,000 |

Reception

China

China's participation in the 2012 test was limited to Shanghai, Hong Kong, and Macau as separate entities. In 2012, Shanghai participated for the second time, again topping the rankings in all three subjects, as well as improving scores in the subjects compared to the 2009 tests. Shanghai's score of 613 in mathematics was 113 points above the average score, putting the performance of Shanghai pupils about 3 school years ahead of pupils in average countries. Educational experts debated to what degree this result reflected the quality of the general educational system in China, pointing out that Shanghai has greater wealth and better-paid teachers than the rest of China. Hong Kong placed second in reading and science and third in maths.
In 2018 the Chinese provinces that participated were Beijing, Shanghai, Jiangsu and Zhejiang. In 2015, the participating provinces were Jiangsu, Guangdong, Beijing, and Shanghai. The 2015 Beijing-Shanghai-Jiangsu-Guangdong cohort scored a median 518 in science, while the 2012 Shanghai cohort scored a median 580.
Critics of PISA counter that in Shanghai and other Chinese cities, most children of migrant workers can only attend city schools up to the ninth grade, and must return to their parents' hometowns for high school due to hukou restrictions, thus skewing the composition of the city's high school students in favor of wealthier local families. A population chart of Shanghai reproduced in The New York Times shows a steep drop-off in the number of 15-year-olds residing there. According to Schleicher, 27% of Shanghai's 15-year-olds are excluded from its school system. As a result, the percentage of Shanghai's 15-year-olds tested by PISA was 73%, lower than the 89% tested in the US. Following the 2015 testing, the OECD published in-depth studies on the education systems of a few selected countries, including China.

Finland

Finland, which received several top positions in the first tests, fell in all three subjects in 2012 but remained the best-performing country overall in Europe. Its best result was in science, with 545 points, and its worst in mathematics, with 519, in which the country was outperformed by four other European countries. The drop in mathematics was 25 points since 2003, the last time mathematics was the focus of the tests. For the first time Finnish girls outperformed boys in mathematics, but only narrowly. It was also the first time pupils in Finnish-speaking schools did not perform better than pupils in Swedish-speaking schools. Minister of Education and Science Krista Kiuru expressed concern for the overall drop, as well as the fact that the number of low-performers had increased from 7% to 12%.

India

India participated in the 2009 round of testing but pulled out of the 2012 PISA testing, with the Indian government attributing its action to the unfairness of PISA testing to Indian students. The Indian Express reported, "The ministry has concluded that there was a socio-cultural disconnect between the questions and Indian students. The ministry will write to the OECD and drive home the need to factor in India's 'socio-cultural milieu'. India's participation in the next PISA cycle will hinge on this". The Indian Express also noted that "Considering that over 70 nations participate in PISA, it is uncertain whether an exception would be made for India".
India did not participate in the 2012, 2015 and 2018 PISA rounds.
A Kendriya Vidyalaya Sangathan committee, as well as a group of secretaries on education constituted by the Prime Minister of India, Narendra Modi, recommended that India should participate in PISA. Accordingly, in February 2017, the Ministry of Human Resource Development under Prakash Javadekar decided to end the boycott and participate in PISA from 2020. To address the socio-cultural disconnect between the test questions and students, it was reported that the OECD would update some questions. For example, the word avocado in a question may be replaced with a more popular Indian fruit such as mango.

Malaysia

In 2015, the results from Malaysia were found by the OECD not to have met the required response rate. Opposition politician Ong Kian Ming said the education ministry tried to oversample high-performing students in rich schools.

Sweden

Sweden's results dropped in all three subjects in the 2012 test, continuing a trend from 2006 and 2009. The nation had the sharpest fall in mathematics performance over 10 years among the countries that have participated in all tests, with a drop in score from 509 in 2003 to 478 in 2012. The score in reading showed a drop from 516 in 2000 to 483 in 2012. The country performed below the OECD average in all three subjects. The leader of the opposition, Social Democrat Stefan Löfven, described the situation as a national crisis. Along with the party's spokesperson on education, Ibrahim Baylan, he pointed to the downward trend in reading as most severe.
In 2020, Swedish newspaper Expressen revealed that Sweden had inflated its score in PISA 2018 by not conforming to OECD standards. According to professor Magnus Henrekson, a large number of foreign-born students had not been tested.

United Kingdom

In the 2012 test, as in 2009, the result was slightly above average for the United Kingdom, with the science ranking being highest. England, Wales, Scotland and Northern Ireland also participated as separate entities, with Wales showing the worst result: in mathematics it was 43rd of the 65 countries and economies. Minister of Education in Wales Huw Lewis expressed disappointment in the results and said that there were no "quick fixes", but hoped that several educational reforms implemented in the last few years would give better results in the next round of tests. The United Kingdom had a greater gap between high- and low-scoring students than the average. There was little difference between public and private schools when adjusted for socio-economic background of students. The gender difference in favour of girls was less than in most other countries, as was the difference between natives and immigrants.
Writing in the Daily Telegraph, Ambrose Evans-Pritchard warned against putting too much emphasis on the UK's international ranking, arguing that an overfocus on scholarly performance in East Asia might have contributed to the area's low birthrate, which he argued could harm future economic performance more than a good PISA score could benefit it.
In 2013, the Times Educational Supplement published an article, "Is PISA Fundamentally Flawed?" by William Stewart, detailing serious critiques of PISA's conceptual foundations and methods advanced by statisticians at major universities.
In the article, Professor Harvey Goldstein of the University of Bristol was quoted as saying that when the OECD tries to rule out questions suspected of bias, it can have the effect of "smoothing out" key differences between countries. "That is leaving out many of the important things," he warned. "They simply don't get commented on. What you are looking at is something that happens to be common. But worth looking at? PISA results are taken at face value as providing some sort of common standard across countries. But as soon as you begin to unpick it, I think that all falls apart."
Queen's University Belfast mathematician Dr. Hugh Morrison stated that he found the statistical model underlying PISA to contain a fundamental, insoluble mathematical error that renders PISA rankings "valueless". Goldstein remarked that Dr. Morrison's objection highlights "an important technical issue" if not a "profound conceptual error". However, Goldstein cautioned that PISA has been "used inappropriately", contending that some of the blame for this "lies with PISA itself. I think it tends to say too much for what it can do and it tends not to publicise the negative or the weaker aspects." Professors Morrison and Goldstein expressed dismay at the OECD's response to criticism. Morrison said that when he first published his criticisms of PISA in 2004 and also personally queried several of the OECD's "senior people" about them, his points were met with "absolute silence" and have yet to be addressed. "I was amazed at how unforthcoming they were," he told TES. "That makes me suspicious." "Pisa steadfastly ignored many of these issues," he says. "I am still concerned."
Professor Svend Kreiner, of the University of Copenhagen, agreed: "One of the problems that everybody has with PISA is that they don't want to discuss things with people criticising or asking questions concerning the results. They didn't want to talk to me at all. I am sure it is because they can't defend themselves."

United States

Since 2012 a few states have participated in the PISA tests as separate entities. Only the 2012 and 2015 results are available on a state basis. Puerto Rico participated in 2015 as a separate US entity as well.
PISA results for the United States by race and ethnicity.

Research on possible causes of PISA disparities in different countries

Although PISA and TIMSS officials and researchers themselves generally refrain from hypothesizing about the large and stable differences in student achievement between countries, since 2000, literature on the differences in PISA and TIMSS results and their possible causes has emerged. Data from PISA have furnished several researchers, notably Eric Hanushek, Ludger Wößmann, Heiner Rindermann, and Stephen J. Ceci, with material for books and articles about the relationship between student achievement and economic development, democratization, and health; as well as the roles of such single educational factors as high-stakes exams, the presence or absence of private schools and the effects and timing of ability tracking.

Comments on accuracy

of Cambridge wrote: "Pisa does present the uncertainty in the scores and ranks - for example the United Kingdom rank in the 65 countries is said to be between 23 and 31. It's unwise for countries to base education policy on their Pisa results, as Germany, Norway and Denmark did after doing badly in 2001."
According to Forbes, in some countries PISA selects a sample from only the best-educated areas or from their top-performing students, slanting the results. China, Hong Kong, Macau, Taiwan, Singapore and Argentina were among the examples given.
In an open letter to Andreas Schleicher, director of PISA, various academics and educators argued that "OECD and Pisa tests are damaging education worldwide".
According to O Estado de São Paulo, Brazil shows a great disparity when classifying the results between public and private schools, where public schools would rank worse than Peru, while private schools would rank better than Finland.