Competition Passes the Test
Jay P. Greene & Marcus A. Winters
Still more evidence that public schools improve when threatened with the loss of students and money.
Florida’s A+ program affords a unique opportunity to test these competing predictions. The A+ program offers all the students in schools that chronically fail the Florida Comprehensive Assessment Test (FCAT) the opportunity to use a voucher to transfer to a private school. Schools face the threat of vouchers only if they are failing. They can remove the threat by improving their test scores. Comparing the performance of schools that were threatened with vouchers and the performance of those that faced no such threat gives a measure of how public schools respond to competition.
The A+ Program
All public school students in Florida enrolled in grades 3 through 10 take FCAT exams in math, reading, and writing. Test results have consequences for both students and schools. Students must pass the reading portion of the FCAT in order to be promoted to 4th grade, and they must pass the 10th-grade test to graduate. In addition, all Florida schools are graded from A to F based on the share of their students who score at high levels on the FCAT and who make year-to-year gains in their test scores.
A school’s grade is lowered a level if less than half of its worst students (those in the bottom 25 percent at the school) make a year’s worth of learning gains. In order to receive a grade, schools must test at least 90 percent of their students; otherwise, they receive an Incomplete and, after an investigation, the state commissioner of education assigns a grade to the school.
Schools that receive a grade of F twice during any four-year period are deemed chronically failing. Their students then become eligible to receive vouchers, called opportunity scholarships, which they can use at another public school or at a private school. The vouchers are worth the lesser of per-pupil spending in the public schools or the cost of attending the chosen private school.
Schools can take themselves off the chronically failing list by earning higher grades in future years. However, students who use vouchers to attend private schools can keep their vouchers until either they return to a public school or the grade levels offered by the private school run out. For example, if a student uses a voucher to attend 6th grade at a K–8 private school and the failing public school manages to turn things around the next year, the student may keep his voucher until he completes the 8th grade. Thereafter, if his family wants to keep him in private school it must do so at its own expense.
Entering the 2002–03 administration of the FCAT, the focus of this study, 129 schools had received at least one F. Students in ten schools had become eligible for vouchers since the grading of schools began during the 1998–99 school year.
Florida does offer failing schools special funding that may temper any financial loss they suffer from students’ choosing to transfer into private schools. The lowest-performing schools are given priority when applying for certain grants, and the state has earmarked funds to recruit teachers to work in schools that received D and F grades. However, since such funds are temporary solutions, they do not dramatically reduce the financial incentive for failing schools to remove themselves from voucher competition by improving their performance on the FCAT.
Five Categories of Schools
To analyze the program’s impact on public schools, we collected school-level test scores on the 2001–02 and 2002–03 administrations of the FCAT and the Stanford-9, a national norm-referenced test that is given to all Florida public school students around the same time as the FCAT. The results from the Stanford-9 are particularly useful for our analysis. Schools are not held accountable for their students’ performance on the Stanford-9. As a result, they have little incentive to manipulate the results by “teaching to the test” or through outright cheating. Thus, if gains are witnessed on both the FCAT and the Stanford-9, we can be reasonably confident that the gains reflect genuine improvements in student learning.
Florida’s system of school grades and sanctions gives schools differing incentives. We thus separated schools into five categories based on their grades and the degree of actual or potential competition they faced from vouchers. We then compared the performance of the schools in these categories with the performance of the rest of Florida’s public schools, looking at each category’s change in FCAT and Stanford-9 scores from the 2001–02 school year to 2002–03. The five categories are:

- Voucher-eligible schools: schools that had received a second F within a four-year period, making their students eligible for vouchers.
- Voucher-threatened schools: schools that had received one F in the previous four years; a second F would make their students eligible for vouchers.
- Always-D schools: schools that had received a D every year since grading began, but never an F.
- Sometimes-D schools: schools that had received at least one D, but not in every year, and never an F.
- Formerly threatened schools: schools that had received an F more than four years earlier, so that the voucher threat had expired.
We compared the change in test-score performance for each of these groups with that of the rest of Florida’s public schools between the 2001–02 and 2002–03 administrations of the FCAT and the Stanford-9. Our method was to follow cohorts of students in grades 3 through 10 and calculate the schoolwide change in test scores. For example, we subtracted a school’s 3rd-grade reading score on the 2001–02 FCAT from its 4th-grade reading score on the 2002–03 FCAT to get the change in scores for 4th graders at that school. Following cohorts measures the performance of roughly the same students on the test over time. We then averaged the change in test scores for each cohort in the school on each test and subject. This gave us a single cohort change for each school in Florida.
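The cohort-change calculation described above can be sketched in a few lines of Python. The scores and the function names here are hypothetical illustrations, not the study's actual data or code:

```python
# Hypothetical mean scale scores for one school, keyed by (grade, subject).
scores_2002 = {(3, "reading"): 280, (4, "reading"): 290, (3, "math"): 295}
scores_2003 = {(4, "reading"): 292, (5, "reading"): 298, (4, "math"): 301}

def cohort_changes(year1, year2):
    """Follow each cohort: year-2 score at grade g+1 minus year-1 score at grade g."""
    changes = {}
    for (grade, subject), score in year1.items():
        nxt = (grade + 1, subject)
        if nxt in year2:
            changes[(nxt[0], subject)] = year2[nxt] - score
    return changes

def school_cohort_change(year1, year2):
    """Average the cohort changes into a single schoolwide figure."""
    changes = cohort_changes(year1, year2)
    return sum(changes.values()) / len(changes)

# e.g. the 4th-grade reading cohort gained 292 - 280 = 12 points
print(cohort_changes(scores_2002, scores_2003))
print(school_cohort_change(scores_2002, scores_2003))
```

The key design point is that each comparison pairs a grade in one year with the next grade in the following year, so the same cohort of students is (approximately) tracked rather than comparing different students in the same grade.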
The changes in performance reported below for each group of schools have all been adjusted to take into account any changes between 2001–02 and 2002–03 in schools’ demographic characteristics, such as the share of students participating in the federal school lunch program and the ethnic breakdown of the student body. Unfortunately, we were not able to control for changes in the number of students who spoke limited English or in the school’s operating cost per pupil, because at the time of the study such information was available only up to the 2001–02 school year. We instead controlled only for the percentage of students who spoke limited English and the level of spending per pupil in 2001–02.
The inability to control for changes in spending may seem particularly troublesome. However, a similar analysis of the A+ program in a previous year found that taking into account changes in spending had no effect on the results. Furthermore, if any relative improvements made by schools competing with vouchers were the result of school districts’ diverting funds to these schools, this could be seen as part of the voucher effect.
Between the 2001–02 and 2002–03 administrations of the FCAT, voucher-eligible schools made the largest gain among the five categories of schools. In mathematics they improved by 15.1 scale-score points more than the rest of Florida’s public schools (see Figure 1). (Results on the FCAT are reported as the cohort change in mean scale score on a scale from 100 to 500. The median school in Florida had a mean scale score of 291 on the reading test and 300 on the math test. Schools at the 5th percentile of schools in Florida had a reading scale score of 243 and a math scale score of 247, while the 95th percentile school had a reading score of 327 and a math score of 328.) On the Stanford-9 math test, voucher-eligible schools achieved gains that were 5.9 percentile points greater than the year-to-year gains achieved by other Florida public schools (see Figure 2). Results on the Stanford-9 are reported as the cohort change in national percentile rank.
Voucher-threatened schools made the next highest relative gains: 9.2 scale-score points on the math FCAT and 3.5 percentile points on the Stanford-9 in math. Each of these results is statistically significant at a very high level, meaning that we can be highly confident that the test-score gains made by schools facing the actuality or prospect of voucher competition were larger than the gains made by other public schools. As hypothesized, actual voucher competition produced the largest improvements in test scores, while the prospect of facing voucher competition produced somewhat smaller gains.
The results for the always-D and sometimes-D schools were also consistent with our hypotheses. Always-D schools, which faced the real danger of receiving their first F and thus had some incentive to improve, made a relative gain of 4.3 scale-score points on the math FCAT and 1.3 percentile points on the Stanford-9 math test. The sometimes-D schools experienced year-to-year changes in FCAT math scores that were only 2.4 points higher than those of all other Florida public schools, significantly less than the gains in both voucher-eligible and voucher-threatened schools. Their improvement relative to all public schools on the Stanford-9 was less than a percentile point. Formerly threatened schools saw no improvement in their math scores relative to all public schools.
The patterns were similar in reading, though the relative gains made by schools facing voucher competition were smaller and sometimes statistically insignificant. Overall on the FCAT reading test, voucher-eligible schools gained 5.2 points more than other schools gained. However, this gain fell barely short of a conventional standard for statistical significance, likely due to the very small number of schools in this category (only nine). Voucher-eligible schools also made a statistically insignificant relative gain of 2.2 percentile points on the Stanford-9.
Voucher-threatened schools actually made the greatest gains on the FCAT reading test: 6.1 points. Their relative gain on the Stanford-9 was a statistically significant 1.7 percentile points.
Always-D schools made no statistically significant gains on the FCAT or Stanford-9 reading tests, while sometimes-D schools experienced a decrease of 1.1 points on the FCAT and no significant change on the Stanford-9 reading test. We also found a relative loss of 3.8 points for formerly threatened schools on the FCAT and a relative loss of 1.6 percentile points on the Stanford-9 (both results were statistically significant).
Overall, the schools facing either the prospect or the reality of vouchers made substantial gains compared with the results achieved by the rest of Florida’s public schools. They also made strong gains relative to those earned by schools serving similar student populations, which had nonetheless avoided receiving an F.
The smaller gains achieved by always-D and sometimes-D schools compared with the performance of voucher-eligible and voucher-threatened schools, despite the similar characteristics of all these schools, strengthen our confidence that voucher competition is the cause of the improvements. Always-D schools, in particular, are very similar to voucher-eligible and voucher-threatened schools in their initial test scores, student populations, and resources, as well as other unobserved factors for which we could not adjust the data. Since it is essentially by chance that always-D schools do not receive an F, the comparison approximates a randomized experiment. Yet the schools that faced voucher competition experienced much larger increases in test scores.
Moreover, the similarity of our findings on the Stanford-9 and FCAT math tests suggests that the gains being made by schools facing voucher competition are the result of real learning and not simply manipulations of the state’s high-stakes testing system (see Figure 2). If schools facing voucher competition were only appearing to improve by somehow manipulating Florida’s high-stakes testing system, we would not have seen a corresponding improvement on another test that no one had incentives to manipulate.
Other Possible Explanations
Could the gains witnessed among voucher-eligible and voucher-threatened schools actually be the product of some influence other than their being forced to compete against private schools?
Let’s first consider the possibility that it was the stigma of being labeled a failure, rather than the competitive incentives introduced by vouchers, that spurred improvement among F schools, as several researchers have suggested. If this were the case, we would expect to see similar gains among formerly threatened schools, which have also received at least one failing grade. Quite the contrary, however: formerly threatened schools made no gains in math and experienced losses in reading. In other words, formerly threatened schools still had the stigma of an F grade, but once the threat of vouchers was removed, they actually lost ground (see Figure 2).
Nonetheless, it is possible that the stigma of the F grade fades over time. In that event, schools that received an F in 1999 might no longer feel the stigma in 2003. But although the voucher-eligible and voucher-threatened school categories include some schools that received their most recent F in 2000, those categories experienced gains. We find it implausible that the stigma effect exists for only three years and then suddenly disappears. The more compelling explanation is that the actuality or prospect of voucher competition provides incentives for schools to improve, an effect that disappears when the four-year window expires.
Another potential explanation for the exceptional gains made by schools facing voucher competition is that their extremely low initial scores are affected by a statistical tendency called “regression to the mean.” Schools that report very high or very low scores tend to report future scores that come closer to the average for the whole population. This tendency is created by random error in the test scores, which can be especially troublesome when scores are “bumping” against the top or the bottom of the test-score scale. For instance, if a school earns a score of 2 on a scale from 0 to 100, it is hard for students to do worse by chance but easy for them to do better by chance. Schools that are near the bottom of the scale are likely to improve, even if only by statistical fluke.
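Regression to the mean can be demonstrated with a toy simulation. In the sketch below, every school has a fixed "true" score (the numbers are invented for illustration); observed scores add random error and are clipped to a 0–100 scale. Nothing real changes between years, yet the schools that scored lowest in year 1 improve on average in year 2:

```python
import random

random.seed(42)

def observe(true_score):
    """An observed score: true score plus random error, clipped to 0-100."""
    return max(0.0, min(100.0, true_score + random.gauss(0, 10)))

# 1,000 hypothetical schools with fixed true scores and no real change.
true_scores = [random.uniform(20, 80) for _ in range(1000)]
year1 = [observe(t) for t in true_scores]
year2 = [observe(t) for t in true_scores]

# Select the bottom decile of schools by year-1 OBSERVED score.
cutoff = sorted(year1)[len(year1) // 10]
bottom = [i for i in range(1000) if year1[i] <= cutoff]

mean1 = sum(year1[i] for i in bottom) / len(bottom)
mean2 = sum(year2[i] for i in bottom) / len(bottom)
print(f"bottom-decile schools: year 1 = {mean1:.1f}, year 2 = {mean2:.1f}")
```

The year-2 average for the selected schools rises even though no true score changed, because schools landing in the bottom decile tend to have had unusually negative error in year 1. This is exactly the artifact the low-performing non-F comparison is designed to rule out.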
To test for this possibility, we compared the gains made by F schools with the performance of an even smaller subset of schools whose 2002 test scores were similar but had never received an F (which we termed low-performing non-F schools). If there were no difference between the gains experienced by F schools and those among low-performing non-F schools, we might worry that regression to the mean was driving our results.
In mathematics, the gains made by voucher-eligible and voucher-threatened schools relative to low-performing non-F schools on both the FCAT and the Stanford-9 were nearly as large as their gains relative to all other schools in the state. Thus in math there seems to be no effect from regression to the mean. In reading, however, we found no difference in the test-score gains achieved by F schools and low-performing non-F schools, suggesting that regression to the mean could be influencing our results in reading.
Even so, it seems unlikely that regression to the mean is the entire story, even in reading. The very fact that fewer schools were included in this section of the analysis made it less likely that significant differences would emerge. Moreover, the low-performing non-F schools actually had average test scores that were lower than those among the F schools. These schools also clearly faced pressure to improve in order to avoid the voucher threat, even if that threat was less immediate. Many of the schools in the low-performing category were also in either the always-D or sometimes-D categories, which were shown above to have made gains relative to all Florida public schools — probably due to the likelihood that they would receive an F if they did not improve.
Having largely ruled out these other explanations, we are left with the conclusion that the gains witnessed among low-performing schools are the result of the competitive pressures introduced by school vouchers. Moreover, the similarity of our findings on both the high-stakes FCAT and the low-stakes Stanford-9 indicates that the gains reflect genuine improvements in learning. In the absence of student-level information, results must remain tentative. Nonetheless, this study yields solid evidence that public schools will react positively to being forced to compete with private schools for students and the dollars they carry.
Jay P. Greene and Marcus A. Winters. "Competition Passes the Test." Education Next (Summer 2004).
This article is reprinted with permission from Education Next, a journal of opinion and research published by the Hoover Institution at Stanford University.
Jay P. Greene is a senior fellow and Marcus A. Winters a research associate at the Manhattan Institute.
Copyright © 2004