
Until recently, my approach to the very new technology of large language models, or LLMs – the AI tools of which ChatGPT is the most famous – had been heavily shaped by my experience of feeding ChatGPT an essay assignment like those in my classes and thinking the result merited a B or B-. On the disturbing side, that meant that ChatGPT could easily generate a passable paper; on the reassuring side, it meant that it could not easily generate a good paper. The latter still required the human touch.

Imagine my alarm, then, at reading this essay by Maya Bodnick.

The ChatGPT I tested last December is based on GPT-3 – an already outdated version of the underlying technology. Bodnick, a Harvard undergraduate, set out to test the newer version, ChatGPT-4. Bodnick asked her Harvard professors and teaching assistants to mark a set of eight essays written in response to their classes’ prompts, telling them “each essay might have been written by me or the AI” in order to minimize response bias – though in fact they were all written by ChatGPT-4. Half of these essays got an A or A-; three of them got an A.

The easy response to this phenomenon is just to say that Harvard has significant grade inflation. That is true up to a point; when I was a Harvard TA (or TF, as Harvard calls them), I was sometimes instructed not to give a grade below B- except for papers that were egregiously bad. But it’s hardly enough to alleviate the underlying concerns. Harvard grade inflation isn’t that high. Maybe these ChatGPT-4 papers wouldn’t have got an A at Yale or Princeton or Swarthmore, which are known for marking harder, but they would likely have got one at the vast majority of colleges and universities around the US, where most students don’t come in with an essay-writing ability comparable to that of Harvard undergrads. And if ChatGPT can improve that much in less than a year, imagine what it and related technologies are going to be able to do a decade from now.

When AIs’ output was passable but mediocre, it was still possible to comfort ourselves by thinking they’d never really match human creativity. I’m no longer nearly so confident. With a few more technological upgrades like this, it’s possible that someone could feed a prompt to ChatGPT to write an essay in the style of Alasdair MacIntyre or Martha Nussbaum such that I (as someone who’s read a ton of both writers’ work) wouldn’t be able to distinguish it from an essay actually written by Alasdair MacIntyre or Martha Nussbaum.

And this isn’t just about writing. The brilliant music writer Ted Gioia claims that on 12 July the number of recorded songs in the world doubled – because an AI company had just created 100 million new songs, an amount roughly equivalent to the size of Spotify’s entire catalogue. “It took thousands of years of human creativity to make the first 100 million songs”, notes Gioia. “But an AI bot matched that effort in a flash.” Gioia claims that “I’ve heard enough AI music to realize how lousy it is” – but it’s now reasonable to wonder whether new iterations of the technology could fix that problem. (This isn’t even to mention visual art, where even the January version was able to fool experts multiple times.)

This is all happening so fast that it’s hard to even begin grappling with the implications. I’m not talking so much about the implications I explored two months ago, of whether LLMs could feel, and whether we therefore had ethical obligations to them. I haven’t (yet) seen good reason to believe that they can feel, or that they’re on their way to doing so. But they can produce writing and music and visual art that looks like it comes from someone who feels – and that fact has a lot of implications.

It has particularly powerful implications for an expressive individualist worldview, where a good life has something to do with being yourself in a way you express to the world. The past few decades have already been really rough on expressive individualists. Where baby boomers were able to say and believe that the money would follow if you did what you loved, the economic contraction of the decades since has given the lie to that: the jobs for professors and writers and musicians that were available to them have been far less available to us in Gen X and afterwards. The AI revolution stands poised to make that situation much worse: what few jobs remain in those fields are now endangered by technologies that can produce the products of human creativity. Small wonder that one of the major demands in the Hollywood writers’ strike is to restrict the use of AI. I hope that they’re successful, but worry that some studios may simply decide to lock out the striking writers and replace them with AI entirely.

But the implications go further than that, even for those of us who have found ways to do what we love without the money following. Gioia has noted elsewhere, even before the AI music drop, that in the 2020s more new music is being made than ever before – but fewer people are listening to it, with old music being so widely available. So what happens then if the music and writing generated by AI actually becomes good? Never mind the boomer dream of getting paid for our creative works – is anyone even going to notice them, when they are surrounded by a sea of AI content that looks very similar?

From a different philosophical angle, what does all this say to our theories of art and aesthetics, and what counts as good art? So far I’ve generally leaned toward a reception aesthetic, where what matters is the reaction artistic or creative works produce in their audience. But while AI might not have yet produced a work of writing or music that can make you cry, it probably won’t be long before it can. Does that tell us we should be moving toward a production aesthetic, one that puts the artist’s human creativity first? Maybe – but if so, why?

There are way more questions than answers here, and I think that’s inevitable. Anyone who says they know the end result of all this for sure is lying. The one thing that does give me some hope is that we are clearly hovering around the peak of inflated expectations in the Gartner hype cycle. One of the nice things about having worked in educational technology for over a decade is being able to say: we’ve been here before. Just over ten years ago, Thomas L. Friedman wrote a breathless column on how massive open online courses (MOOCs) would transform education:

I can see a day soon where you’ll create your own college degree by taking the best online courses from the best professors from around the world — some computing from Stanford, some entrepreneurship from Wharton, some ethics from Brandeis, some literature from Edinburgh — paying only the nominal fee for the certificates of completion.

That is exactly the sort of thing someone would say at the peak of the hype cycle – and it never happened. MOOCs were a fad that came and went; I think that theoretically you can still take online courses from “the best professors” via Coursera or edX for a nominal fee, but we know now that those courses are terrible, and if you tried to make a degree out of them, anyone who knew anything about that degree would laugh at you. MOOCs did spark related kinds of experiment that are still continuing, especially large, discounted “at-scale” degrees like Georgia Tech’s online CS MS or BU’s Online MBA – but it is still too early to tell the eventual fate of those.

This is not to say that large language models will be a flash in the pan the way that MOOCs were; their effect is likely to be much more consequential and enduring. Yet the key to what happened with MOOCs was that hype like Friedman’s prodded everyone to push their limits – and as a result, we soon came to see very clearly what they couldn’t do, even given further technological innovation. I imagine that in the next couple of years we will come to see more inherent limitations in large language models, ones that can’t be technologically innovated out of. Right now it feels like LLMs can do anything; that’s going to change once we get a better understanding of what they can’t. Once we get there, we’ll have a clearer picture of the role that human creativity will continue to play. But right now we don’t know what those limitations are going to be – or what they aren’t. In this respect, the owl of Minerva is still sleeping in the sunshine.