Why “What Works” Won’t Work and Why “What Works” May Hurt

What do fields like medicine, physics, agriculture, and education have in common? Over the last several decades there has been increasing pressure on teachers and researchers alike to adopt the experimental methodologies and randomized controlled trials of the laboratory to find out “what works” in education and to elevate teaching to the level of an “evidence-based profession”. But what does it mean to be “evidence-based”? What should be the relationship of the laboratory to the classroom? And what are the consequences or adverse side-effects of adopting such practices?

What does it mean to be evidence-based?

The top results of a general Google search for “evidence-based practice” yield a number of topics: for example, an article from the American Journal of Nursing on “The Seven Steps of Evidence-Based Practice”; a page from the American Speech-Language-Hearing Association on EBP, evidence-based practice, as the “integration of clinical expertise/expert opinion, external scientific evidence, and client/patient/caregiver perspectives”; and a link to a 1998 article from the National Institutes of Health declaring that “evidence-based practice is spreading in popularity in many health care disciplines”.

So it’s not surprising that Gert Biesta opens his 2007 article “Why ‘What Works’ Won’t Work” with: “The idea that education should be or become an evidence-based practice and that teaching should be or become an evidence-based profession has recently come to prominence in several countries around the world”. Borrowing the language of the medical field, the goal of “evidence-based practice” in education is what Biesta calls a “double transformation” of both educational research and practice, and ultimately an alignment of research, practice, and policy, led by public and private organizations like the Office for Standards in Education, researchEd, and the Centre for Evaluation and Monitoring in the UK and the What Works Clearinghouse in the US, which is part of the Institute of Education Sciences within the US Department of Education.

The homepage of the What Works Clearinghouse, for example, implores visitors to “Find What Works based on the evidence” and provides links to categories like Literacy, Mathematics, Behavior, Teacher Excellence, and…Charter Schools. For some reason, arts & humanities, social studies, foreign language, and physical and mental wellness are not available options. Within each category is a list of “interventions”, defined as “an educational program, product, practice, or policy aimed at improving student outcomes”. Each intervention is given a 6-point “Effectiveness rating”, ranging from negative to positive with mixed results in between, as well as an “improvement index”, defined as “the expected change in percentile rank for an average comparison group student if that student had received the intervention.”

Pictured: "Find What Works", with Charter Schools highlighted, from the What Works Clearinghouse.
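
The “improvement index” quoted above can be made concrete with a little arithmetic. What follows is only a minimal sketch in Python, not the Clearinghouse's own code. It assumes that outcomes are normally distributed and that an intervention's effect is reported as a standardized effect size (measured in standard deviations). The function name and the sample effect sizes are hypothetical, chosen for illustration.

```python
from statistics import NormalDist


def improvement_index(effect_size: float) -> float:
    """Expected change in percentile rank for an average comparison student.

    Assumption (not from the article): outcomes follow a normal
    distribution and `effect_size` is in standard-deviation units.
    """
    # An average comparison-group student sits at the 50th percentile.
    # Shifting that student by `effect_size` standard deviations puts
    # them at percentile 100 * Phi(effect_size), where Phi is the
    # standard normal cumulative distribution function.
    new_percentile = 100 * NormalDist().cdf(effect_size)
    return new_percentile - 50


# Hypothetical effect sizes, for illustration only.
for g in (-0.25, 0.0, 0.25, 0.5):
    print(f"effect size {g:+.2f} -> improvement index {improvement_index(g):+.1f}")
```

Under these assumptions, an effect size of a quarter of a standard deviation moves the average student up by roughly ten percentile points. Note how much the single number leaves out: it says nothing about which outcome was measured, for whom, or at what cost to anything that was not measured.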

It all seems very simple: to improve educational outcomes, choose the area you wish to see improved, select from the menu of interventions the ones with the highest “effectiveness rating” and “improvement index” score, and overnight we will have an educational revolution. Or as one researchEd leader lists in their Twitter bio: “Teaching is simple: Retrieve, Explain, Practice, Repeat.”

This is what Biesta refers to as a causal model of professional action, which presumes that educators administer classroom interventions to students in the same way healthcare providers administer medical treatment to patients: that there is a cause in the intervention or treatment and an effect in the outcome or results, so that a student learns or the patient’s health improves.

It is also a technological model of professional action which, in Biesta’s words, “assumes that the ends of professional action are given, and that the only relevant professional and research questions to be asked are about the most effective and efficient ways of achieving those ends”. On this view, the goal of teaching science, for example, is to improve state assessment scores, and the role of the educational professional is simply to choose from the menu of approved interventions. When scores go up, learning has improved.

Here is where Biesta offers his criticism of education as a causal and technological practice. His first objection is that the causal assumptions of medicine do not translate into education because education is not a physical process. “Being a student is quite different from that of being a patient”, he points out; “being a student is not an illness, just as teaching is not a cure”. He continues: “Education is a process of symbolic or symbolically mediated interaction. If teaching is to have any effect on learning, it is because of the fact that students interpret and try to make sense of what they are being taught. It is only through processes of mutual interpretation that education is possible”.

Education does not operate through physical forces. Unlike, say, the closed, deterministic systems that mediate the biological and chemical pathways relied upon in medicine, education is an open, recursive system that depends upon mutual interpretation and exchange as students make sense of new information and teachers respond to student sense-making.

So what should be the relationship of the laboratory to the classroom?

In this way, Biesta argues that evidence-based practice can’t tell us about “what works” but only about what has worked. To make evidence-based practices “work”, that is, to create the kind of order needed to replicate the experimental conditions of the randomized controlled trial, would mean transforming learning from an open, recursive system of symbolic sense-making into the closed, deterministic system of inputs and outputs valued in the clinical laboratory, a process Biesta refers to as complexity reduction. Systems use complexity reduction to limit the number of possible actions and options that are available, both to make processes more efficient and to control for outcomes.

Schools are an example of complexity reduction in education. Schooling takes place in a particular building away from other kinds of work; a school day takes place across a number of distinct periods of definite length and disciplinary purpose, guided by standards and curriculum; and learning is assessed and measured by a particular set of assessments. “From the vast number of possible outcomes of schools”, Biesta writes, “only those are selected that are considered to be valuable”.

In both a theoretical and a practical sense, each of the complexity reductions students pass through in a day at school is the result of the collective social construction of value, drawing from a range of relevant and appropriate considerations not isolated to instructional effectiveness: how long a day should be, when it should start and end, who should attend school, how they should get there, how large class sizes should be, what will be taught, how learning will be measured, what kind of teacher will teach, what grading system will be used, how evidence of learning will be collected, how behavior will be managed, all of it.

If designing the complexity controls of school is more than managing factual statements about what is possible or “what works”, more than selecting from a menu of interventions, if it is ultimately about implementing educationally desirable policies and practices, then “education should be understood as a moral, non-causal practice, which means that professional judgements in education are ultimately value judgments, not simply technical judgments”.

We also understand that governance of schools is inherently political. School boards in America are elected, policies are voted upon, superintendents are appointed, teachers are hired, and curriculum is designed. As such we should also understand the implementation of socially constructed value-based complexity controls as the exercise of power.

Making decisions about which interventions are implemented when, at what levels, to what extent, for whom, who is excluded, and even deciding who makes these decisions, are all exercises in power. As Biesta writes, “Since any attempt to reduce the number of available options for action for the ‘elements’ within a system is about the exertion of power, complexity reduction should therefore be understood as a political act.”

None of this is to say that evidence-based practices should not be considered and adopted, only that their implementation should be rightfully understood for what it is: not objectively removed from values, power, and policy, but precisely as an exercise of professional judgement and democratic decision-making about what is desirable in education.

“To suggest that research about ‘what works’ can replace normative professional judgment is not only to make an unwarranted leap from ‘is’ to ‘ought’; it is also to deny educational practitioners the right not to act according to evidence about ‘what works’ if they judge that such a line of action would be educationally undesirable”.

When the What Works Clearinghouse, funded by the US Department of Education, implores teachers to “Find What Works!” while focusing academic outcomes exclusively on literacy, math, and science and excluding the arts and humanities, that is best understood not as a scientific consensus but as a political act.

The inclusion of Charter Schools and SAT/ACT test prep as “evidence-based practice” is a political act.

The exclusion from their website of any mention of responsible citizenship, tolerance, cooperation, and curiosity is a political act.

“This becomes deeply problematic in those cases in which it is argued that professionals should only be allowed to do those things for which there is positive research evidence available — an approach which [has been], in my view, correctly identified as a form of totalitarianism”.

The unintended side-effects of this kind of educational policy and practice are where the lens developed by Professor Yong Zhao comes in handy. I last included Zhao’s work on this podcast via his critical analysis of PISA. Much as Biesta argues that evidence-based education presumes a causal and technological model of professional action, Zhao outlines the ways in which the use of PISA as an international measuring stick to rank educational systems “imposes a monolithic and West-centric view of societies on the rest of the world” and “distorts the purpose of education”. And where Biesta describes the desire to align educational and clinical health practice along the lines of “what works” in the narrow context of effective intervention, Zhao makes the important case for understanding what he calls the “side-effects” of “what works” to help us understand that “what works may hurt”. From his 2017 article by the same name:

“Medical research is held as a field for education to emulate. Education researchers have been urged to adopt randomized controlled trials, a more ‘scientific’ research method believed to have resulted in the advances in medicine. But a much more important lesson education needs to borrow from medicine has been ignored. That is the study of side effects. Medical research is required to investigate both the intended effects of any medical interventions and their unintended adverse effects, or side effects. In contrast, educational research tends to focus only on proving the effectiveness of practices and policies in pursuit of ‘what works’. It has generally ignored the potential harms that can result from what works.”

Understanding and weighing the risks versus the benefits of any medical treatment is a complicated accounting of a patient’s medical history, presenting conditions, and a provider’s professional judgment, where a patient ultimately consents to treatment or not based on the same kind of complex calculations.

For example, in 2017, nearly 13% of Americans over the age of 12 reported taking some kind of antidepressant in the last month. Selective serotonin re-uptake inhibitors, or SSRIs, are among the most commonly prescribed drug treatments for depression and anxiety, and they work by preventing the body from reabsorbing the “feel good” neurotransmitter serotonin, causing the body to use it more efficiently. However, some people report side-effects (tiredness and insomnia, nausea and dizziness, weight gain and sexual dysfunction, even increased suicidal ideation) that may escalate if their dosage increases, causing them to stop taking the SSRIs entirely. This can lead to an additional series of unpleasant but brief side-effects, referred to as antidepressant discontinuation syndrome, as the brain readjusts to the changes. SSRIs have undoubtedly helped millions of people effectively manage their depression and anxiety, but we recognize that the benefits of “what works” may not outweigh the side-effects for everyone, and that some patients and their providers may decide to stop treatment in favor of other interventions. Understanding how to manage the adverse consequences of “what works” in medicine helps patients and professionals alike make better choices to improve the quality of treatment and quality of life.

So what if we knew a particular instructional intervention was effective for knowledge transmission yet stifled creativity and problem solving, constrained exploration and discovery, and inhibited curiosity? Would we prescribe that intervention in every classroom for every child?

Zhao documents a number of side-effects associated with both upper and lower case Direct Instruction models. He argues that direct instruction proponents have not failed to convince critics “because of their lack of data or rigorous research method”, or even because critics doubt the effectiveness of DI, but rather that most of the opposition “stems from a different set of concerns such as the rigidity and prescriptiveness of the approach, inconsistency with developmental theories, inappropriateness for certain groups of children and contexts, sustainability of the effects over time, suppression of learner autonomy and development of creativity, and other potential damaging side effects”. Zhao then goes on to catalog a number of studies finding, for example:

A 1979 meta-analysis found that “with direct or traditional teaching, students tend to do slightly better on achievement tests, but they do slightly worse on tests of abstract thinking” and that while with open teaching students may perform slightly worse on tests of achievement, “they do somewhat better on creativity and problem solving” and that “open approaches excel direct or traditional approaches in increasing students’ attitudes toward school and toward the teacher and in improving students’ independence and curiosity.”
A 2011 study based on children’s play with toys found that “Children who were taught a function of a toy performed fewer kinds of actions of the toy and discovered fewer of its other functions, than children who did not receive a pedagogical demonstration”. The study demonstrated that direct instruction can be effective “in promoting rapid and efficient learning of target material”, but that it can negatively impact creativity if “instruction necessarily limits the range of hypotheses children consider”.
A separate 2011 study involving toys and groups of preschool children found that children who were given instructions and directed how to play with toys were less likely to explore and come up with novel solutions and were more likely to imitate the instructor.
A 2014 study of math students found that “students generated more solutions to the problems before instruction than after”, and that students who received instruction first “tended to produce only the correct solutions they were told”.
And a 2012 study used the phrase “unproductive success” to describe the performance of a group of students under direct instruction conditions who were initially more successful in solving well-structured problems, yet on later tasks that required deeper conceptual understandings, their performance was inferior compared to a group of students who were encouraged to solve complex problems first before receiving any instruction in potential solutions.

Zhao’s point is not to label direct instruction a failed pedagogy any more than we would call SSRIs a failed drug intervention because their side-effects overwhelm some patients. But how can we make a professional decision about “what works” without also understanding the potential for unintended consequences? Zhao argues that we need “both effective ways to transmit knowledge AND foster creativity” and that by presenting side-effects alongside measures of effectiveness we can begin to bridge the gap between proponents and opponents of direct instruction as we work together to build a pedagogy that delivers effective instruction without damaging curiosity, motivation, engagement, and creativity. As Zhao writes:

“A one-time treatment of direct instruction is unlikely to inhibit children’s curiosity and creativity for life. But what if children are exposed to ONLY direct instruction for 12 years or longer? Would it cause them to become less creative?”

It turns out there is some evidence that curiosity has already been such a casualty, as direct instruction approaches dominate the evidence-based research guiding educational policy and decision-making.

Based on self-reported survey data from more than 900,000 5th-12th grade students, a 2015 Gallup report famously found that the number of students “engaged” at school declines from 75% in 5th grade to only 34% by senior year, bottoming out in 11th grade as only 32% report being engaged. Gallup scientists categorized fully 1 in 10 American students as both disengaged AND discouraged, and reported that older students “feel less cared for by adults and see less value in their own work”.

Mirroring this decline in engagement, children ask fewer questions as they proceed through school. Children aged 14 months to 5 years old have been found to ask an average of 107 questions an hour (as a parent of two kids 5 and under, I can anecdotally confirm this finding), but by the time those children reach elementary school they may only ask 2 to 5 questions over 2 hours. According to research from developmental psychologist Dr. Susan Engel, fifth graders her team observed in the same 2-hour time frame failed to ask their teacher a single question.

And if we’re really concerned with outcomes and closing achievement gaps, cultivating curiosity might be just what the doctor ordered. As a Jan 2020 headline in The Guardian reads, “‘Schools are killing curiosity’: why we need to learn to stop telling children to shut up and learn. Pupils who ask lots of questions get better results, especially those from poorer homes.”

The article is based in part on the work of a research team from the University of Michigan, led by researcher Prachi Shah, which performed a longitudinal study of 6,200 children in 2018 and found a strong connection between school performance and curiosity, especially among children from poorer backgrounds. Dr. Shah found that “Promoting curiosity in children, especially those from environments of economic disadvantage, may be an important, underrecognized way to address the achievement gap. Promoting curiosity is a foundation for early learning that we should be emphasizing more when we look at academic achievement”.

The article also quotes Paul Howard-Jones, a professor of neuroscience and education at Bristol University, who affirms the importance of making space for curious questioning in school: “Children should be prompted and encouraged to ask questions even though that can be challenging for the teacher. We do need to find some time for questions during the day. There is not enough time in schools for creativity and following up on curiosity.”

The intersection of values, power, and policy at the center of education systems means that no decision made about practice is free from ideology. As Biesta writes, “The important question, therefore, is not whether or not there should be a role for evidence in professional action, but what kind of role it should play.” So it’s as important for proponents of “evidence-based education” to turn the same critical lens on their own beliefs that they wield so quickly against other theories of learning and other ways of doing school.

Our goal should be to design an educational experience for children that can serve a number of worthy and suitable educational goals, which may mean drawing from wells of professional practice, experience, and community engagement other than just those with the highest effect size.

As educators, we have the responsibility to use our professional expertise to assess, judge, and rationalize the best possible methods, which requires understanding what “best” is. Is “best” improving test scores or repeating information, or is “best” motivating, inspiring, connecting with the community, reaching all students, and building a better future?