Since I’m a secondary school teacher by profession, since I’m simultaneously drawn to both extreme pessimism and extreme optimism about new technology - and since I’m, at the end of the day, just another extremely online dude - I’ve been thinking a lot recently about the intersection of ChatGPT and the work of educators. I hope to say something more on-brand soon about the broader political-economic implications of these technologies that hasn’t been said before, but in the meantime, here are some thoughts as an educator. I’m dividing this into:
luddite measures to limit the ability of LLMs to inhibit student learning, and to buy yourself enough time to keep running some traditional assignments before you have a chance to revise how you do things,
accelerationist measures to encourage students to use these tools in ways that are helpful to their learning
tips n tricks on how to reduce your own workload
Throughout this, I’m assuming that ubiquitous access to machine-generated text is here to stay (almost certainly true absent massive interventions), but also that these models won’t go Skynet and kill us all or something (in which case we have bigger things to worry about).
tl;dr
Text generated by algorithm can also be detected by algorithm, and in many cases by your own judgment calls
Students (and you) should query the model about stuff they already know to familiarize themselves with its reliability
Consider having at least a few designated assignments where students can go hog wild with it - as long as they submit a record of their consultation, and explain what they’ve produced
There are few tasks that can be entirely outsourced to ChatGPT, but a lot where, with a little quality control from you, it can make things go quicker - from scaffolding to creating questions and distractors to routine communication
luddite measures
Most of the discussion in the profession so far has focused on ChatGPT as a way to avoid doing thinking-through-writing, so let’s focus on that first.
ChatGPT output is actually very easy to detect, both for reasons that are immediately obvious to teachers and for reasons that I think should be more widely known.
eyeballing it
First, if students aren’t using advanced promptcraft and if you’re not teaching at a selective college, it’s easy to eyeball when writing is beyond what would be plausible for a student to produce on their own. For instance, if I ask the machine what the causes of the industrial revolution were, its answer begins:
The Industrial Revolution was a period of rapid industrialization that occurred in Europe and North America in the late 18th and early 19th centuries. There were several causes of the Industrial Revolution, including:
Advances in agriculture and manufacturing techniques: Improvements in agriculture, such as the development of new crops and the use of fertilizers, allowed for a larger and more reliable food supply, which supported the growth of cities and industrialization. In addition, new manufacturing techniques, such as the power loom and the steam engine, made it possible to produce goods more efficiently and at a lower cost…
With some promptcraft - “rewrite it at a 9th grade reading level,” “dumb down the language” (this did most of the work), and “make the spelling and grammar bad” - it instead starts off:
The Industrial Revolution was a time when industry grew real fast in Europe and North America in the late 18th and early 19th centuries. There were several reasons why this happened, including:
Better ways to farm and make stuff: New farming techniques, like using different crops and special dirt, helped people grow more food, which made it possible for cities to grow and for industry to develop. New manufacturing techniques, like the power loom and the steam engine, made it cheaper and faster to make goods…
Naively, would you fall for the latter? You tell me! Now that I’m familiar with ChatGPT’s tics, this would set off alarm bells, but a year ago, absent the bizarre tell of fertilizer as “special dirt,” I’d probably nod along, depending on the student. My guess is that prompt sequences that don’t set off teacher alarm bells will start spreading through TikTok and the like very shortly.
However, this still isn’t immune to…
algorithmic detection
…like this one at Huggingface. How accurate is this algorithm? I like to think I’m moderately good at promptcraft, but I haven’t yet found English prose text responses to essay prompts that it doesn’t catch. (I have gotten false negatives with the model’s atrocious poetry, and with translations.) And even though I’m familiar with the kind of writing that ChatGPT produces, I haven’t yet been able to independently write something that it catches as a false positive. You might be better able to outsmart it - at any rate this is something you can test empirically.
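If you’d rather batch-check submissions than paste them into the web demo one at a time, the same detector can be run locally. Here’s a minimal sketch; the model ID and the “Real”/“Fake” label strings are my assumptions from the Hugging Face listing, so verify them against whatever version is current:

```python
# Minimal sketch: running suspect text through the RoBERTa-based
# GPT-2 output detector hosted on Hugging Face. Assumes the model ID
# and label names below are still accurate - check the listing first.
from transformers import pipeline

detector = pipeline("text-classification", model="roberta-base-openai-detector")

essay = (
    "The Industrial Revolution was a period of rapid industrialization "
    "that occurred in Europe and North America in the late 18th and "
    "early 19th centuries."
)
print(detector(essay)[0])  # e.g. {'label': 'Fake', 'score': 0.97}
```

Long essays may exceed the model’s input window, so consider scoring them in a few chunks and looking at the worst case.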
In the future, OpenAI says they’re interested in adding a “watermark” that would basically eliminate false negatives. But we’re most of the way there already. This is especially important for uni-level teachers, who have more capable students and less opportunity to get to know them.
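If you’re curious how such a watermark could work: the publicly described idea is statistical. The generator is biased toward a pseudorandom “green list” of tokens keyed (say) to the previous token; anyone holding the key can re-derive the lists and count how often the text lands on them. A toy sketch, where the hash choice, the 50/50 split, and word-level tokens are all illustrative assumptions:

```python
import hashlib
import random

# Toy sketch of a statistical watermark. Real schemes work on model
# tokens and bias logits at sampling time; this just shows the
# detection math. All specifics here are illustrative assumptions.

def green_list(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    """Pseudorandomly split the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = sorted(vocab)
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def green_fraction(tokens: list[str], vocab: list[str]) -> float:
    """Detector side: re-derive each green list and count hits."""
    hits = sum(
        tokens[i] in green_list(tokens[i - 1], vocab)
        for i in range(1, len(tokens))
    )
    return hits / max(len(tokens) - 1, 1)
```

Ordinary human text should hover near the chance rate of 0.5; text from a watermarking model sits well above it, which is why false negatives become vanishingly unlikely on passages of any length (short of heavy paraphrasing).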
the old standby: “could you explain what you mean by this?”
This is an old teacher trick we already know from essay mills and other plagiarism techniques that don’t show up on Google.
Could a student pass this test despite not having written the original text? Absolutely (I could certainly explain the text above.) But if so, maybe they’ve acquired enough understanding that the text generation isn’t the point and you can let it slide.
blue books, &c.
A lot of people have suggested that we move back to things like blue book exams to prevent this. My personal feeling is that the above methods are accurate enough that, if you’ve abandoned these for other reasons (scarce grading time; scarce class time; you’re like me and can’t be trusted not to lose physical documents immediately), you don’t need to take them up again.
Now, some students actually think better on a time budget, and writing with pencil and paper can activate kinds of thinking that other methods can’t, so maybe you want to do this anyway. If so, however, those reasons suffice.
IP blocking
I would recommend that your school not do this. First, there are legitimate uses; second, a million little APIs with names like zumbohomeworkhelp.biz are going to bloom anyway that do the same thing, but laser-focused on just the plagiarism aspect.
accelerationist measures
What are some ways to explicitly encourage text generation use - ways that enable student learning?
familiarizing students with the limits of the model
One of the first assignments I’m giving students when we get back will probably be the following:
Play around with ChatGPT and ask it to teach you about something you know a lot about already - include a transcript of your chat with the submission. Identify one place where the output is subtly incorrect, one place where it makes a real howler or blatantly contradicts itself, and one place where it uses a lot of words to say something meaningless (“there are many factors that could explain why Bella is attracted to Edward, including personal appearance, socialization, or fortuitous circumstances. Ultimately it is up to the reader to interpret the character.”) Since you’re the expert on this subject, I’ll trust you on where it’s correct and incorrect - far more important is you getting a feel for the ways it can be wrong.
If you don’t identify anywhere where it’s incorrect, that’s okay too, but I want the chat transcript to show that you were clearly trying.
Playing around with the model, I’ve found that errors are especially common when a concept is subtle and the model can round it off to something more familiar. For instance, it’s told me that:
Ellen Wood argued that the expansion of international trade was key to the birth of capitalism,
Graham Priest thinks dialethia have no truth value,
Anne Frank’s diaries would be useful for understanding the Eighty Years War,
Adrian Vermeule’s “Law’s Abnegation” is about how we can profitably use nudge-style incentives (called “abnegations”) in lieu of strict lines between legality and illegality
If you statted them up as D&D characters, Pierre Bezukhov would have higher Wisdom and Charisma scores than Andrei Bolkonsky
(In context, I can see where it was coming from with the first two, even though they’re both actually kind of the opposite of the truth. The third was probably tripped up by my referring to the “Dutch War of Independence,” even though the conversation history and other suggestions implied it “knew” we were talking about the early modern, not the twentieth-century, occupation of the Netherlands. I think it hallucinated the fourth entirely. The fifth is an example of something subtle and goofy and conjunctive enough that I’m impressed it was able to deal with it at all, but where, even though both the fiction and the D&D silly scores are subject to interpretation, I can’t imagine any informed human agreeing with the output.)
The existence of erroneous output is inevitable given the basic architecture of these models (which don’t aim to represent an external world but to predict text similar to what they’ve seen before.) Of course the existence of errors is also inevitable with everything else, but I think getting a rough sense of where it falls - I’d say better than most undergraduates, worse than Wikipedia, maybe on par with newspapers depending on the subject - is helpful, and revisiting this regularly as models evolve will be worthwhile.
(There’s the danger that students will go in thinking they are experts on something they are deeply misinformed about. Personally, I’ll play this by ear and come out a little better informed about how to approach it.)
letting students use it but only for certain tasks
Why do we want students to be able to write on their own? Well, mainly because we want them to be able to think on their own. Just like we already allow certain less cognitively interesting tasks to be outsourced to spellcheck (while emphasizing its limits and holding students responsible for what they turn in), I think there’s a strong case that not a lot is lost by letting, say, generative text turn bullet points into paragraphs.
just letting students use it
Producing text with these models is almost certainly going to be one of the important skills of the future. So maybe they should practice exactly that?
Going this route, I’d require two things:
Ctrl+C/Ctrl+V. Students should append a transcript of their chats so that you can evaluate not just the final output, but their reasoning along the way (which is, after all, the point.)
Some kind of oral explanation component (if only as a probabilistic threat kind of thing.) It’s easy to produce, but hard to justifiably trust, output you don’t understand. (ChatGPT’s own guardrails make it hard to accidentally produce offensive material, but very easy to produce content that looks sophisticated but means nothing.)
Assignments like this can start off with a traditional task and then, as students become more familiar with the tool, ramp up in sophistication and difficulty. A lot of this will be uncharted territory for both of you, but you might as well start exploring it now.
generative text as Google/StackOverflow
I already ask students to make handwritten notes on text. ChatGPT has a lot of value as a tool that can quickly explain a reference and be right 90% of the time - sort of like a Wikipedia that can be navigated much more quickly. My intuition would be to encourage students to use it for this purpose, while also appending the chat per above to notes they submit.
eyeballing it, round 2
In the future, people will be inundated with a lot of machine text; knowing artisanal human text from machine-produced text doesn’t matter for all purposes - on the reader side, a good argument is a good argument, and a funny joke is a funny joke - but it will be one increasingly important hermeneutic skill among others. Yes, there are the algorithms above, but I think this is going to need to be a microsecond-level reading skill, just as the seconds of friction needed to look up vocabulary are miles worse than just knowing it.
A quick search doesn’t turn up one that already exists, but almost certainly someone will come out with a website where you can practice this conveniently and with quick feedback - similar to a text version of aiorart.com - and as soon as one does, I’d point students to practice on it.
Some of the very specific tells that you and students develop will become obsolete with future models, but we’ll have to start this at some point so you might as well start now.
equity concerns
I think there are a couple of equity concerns raised by adopting these tools.
First, some students are just going to be more proficient with this than others. (This is true of everything, though.)
Second, it’s possible to get banned by OpenAI, and since they want your real phone number, this can result in you being pretty fucked. OpenAI hands bans out to people who play around too much with the guardrails against getting the AI to tell racist jokes or explain how to cook meth and the like - which students might engage in because they really like meth and racism… or just because they want to test the capabilities of the model. Since it’s just another capitalist firm, OpenAI doesn’t have to justify its decisions to anyone.
Third, the current free status of ChatGPT isn’t going to last, and eventually it will be a paid product, like its predecessor GPT-3 is. Maybe the free allowances will be enough for everyone to use this for your class, or maybe not!
Fourth, some of the above will be solved when there are competitive open-source models (similar to the open-source Stable Diffusion vs. the proprietary Midjourney; these already exist in some form but are weaker and require technical skill.) But these will likely require having a real computer - not a walled garden like a Chromebook or iPhone - at home.
tips n tricks
Here are some things that ChatGPT is good at:
Overviewing things at the level of a few paragraphs. Ask it to “summarize in a few bullet points only”; otherwise it will include a lot of patter.
Writing polite emails, or turning your actual thoughts into bland professionalese
Making text more accessible. In general, you have to overshoot: “Rewrite this at a 9th grade reading level” applied to typical academic text will get you down to maybe an 11th grade level; asking it to go to 5th will get you to 9th. (You can spot-check this with a readability formula - see the first sketch after this list.)
Generating ideas. Again, try to encourage bullet points, then ask for bullet-point variations on things that sound promising. It will always generate the most obvious couple of ideas plus some nonsense ones, so iteration is key.
Generating exercises/review questions. These require more editing/supervision than some other tasks, but prompts I’ve found useful include the following (a scriptable version appears after this list):
“Generate some review questions on… these should be easily answered by anyone with basic familiarity with the subject.”
“What are some vocabulary terms that would be useful for someone about to learn…”
“Here are some test questions and answers. Please generate three false answers for each one.”
“Based on this text, generate eight true/false questions. Four should be true and four should be false.” (If you don’t specify, it will generate almost all trues. You’ll also want to especially inspect the wording on the falses.)
Coding! ChatGPT is really good at small coding tasks, probably because its capabilities have been enhanced in some way over the base language model (or maybe the StackOverflow corpus is just that good?) You probably don’t code much professionally, but if you use a little VBA or the like to keep track of grades, this can speed up writing those scripts tremendously.
Translation! It also seems very good at this, though I’m mostly taking others at their word, being a monolingual moron myself.
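On the overshooting point above: you don’t have to eyeball reading level. The textstat library implements the standard readability formulas, so you can spot-check a rewrite in a couple of lines (the sample passages here are just placeholders for your own before/after texts):

```python
# Spot-checking the "overshoot" rule with the Flesch-Kincaid grade
# estimate from textstat (pip install textstat). The two passages
# are placeholders - substitute your own before/after texts.
import textstat

original = (
    "The confluence of agrarian innovation and nascent mechanization "
    "precipitated an unprecedented expansion of urban manufacturing."
)
rewritten = (
    "Better farming and new machines helped factories and cities grow "
    "faster than ever before."
)

print(textstat.flesch_kincaid_grade(original))   # grade estimate, before
print(textstat.flesch_kincaid_grade(rewritten))  # grade estimate, after
```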
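And once you have a pile of existing questions, the distractor prompt above can be scripted rather than pasted in one at a time. ChatGPT itself has no official API as I write this, so this sketch uses the sibling GPT-3 completion endpoint; the model name, parameters, and sample question are all illustrative assumptions:

```python
# Hedged sketch: scripting the "generate three false answers" prompt
# against the GPT-3 completions API (ChatGPT has no official API yet).
# Model name, parameters, and the sample Q/A are illustrative.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = (
    "Here are some test questions and answers. "
    "Please generate three false answers for each one.\n\n"
    "Q: In what year did the storming of the Bastille occur?\nA: 1789\n"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["text"])
```

As with everything else here, treat the output as a first draft: the false answers especially need a human read before they go on a test.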
where not to
There are some tasks I wouldn’t use it for. First, I know some teachers have said they find it useful for suggesting feedback on student writing; I think this is (at least with the current model; GPT-4+ may be stronger) an area where it tends to produce dangerously plausible-seeming but ultimately non-useful feedback.
Second, I think it’s irresponsible at this moment to use it for letters of recommendation; even though this is exactly the kind of boilerplate professional-language document at which the model excels, it may be that admissions officers will be algorithmically detecting it in a few months and discounting letters that aren’t human-produced. Even between different fields there are some pretty different norms about who writes an LOR and how bombastic the praise is supposed to be. How will norms on this one evolve? I have no fucking clue! Still, better safe than sorry for now.
Third, wherever you’re using writing to think through something. This document is artisanally crafted human text, mostly for that reason; thinking-through-writing is a skill we don’t want to lose ourselves, just as we don’t want students to lose it. (Maybe this would be a better guide if I used ChatGPT to produce or revise more of it - but ironically I’m too lazy to do that!)
Fourth, I’ve suggested that some tasks above require special supervision; but really, all output here requires human supervision and approval, and, as we should be drilling into students, you cannot trust what you do not read and understand. ChatGPT is both less creative and less logical than you, but it is faster, harder-working, and has a wider but shallower knowledge base. We’re currently in that lucky spot, I believe, where the machine is more a complement to than a substitute for our labor, except for some of the most boring parts - long may it remain there!