By Nicolas Kayser-Bril
We usually do not write about newly released software, especially when there is no way to audit it. But the hype over GPT-3, a natural language generator, was such that several readers asked for a review.
I still have no direct access to GPT-3. OpenAI, the firm behind it, left me “on read” and did not answer my request. But Alexander Turner, a PhD candidate at Oregon State University, shared a trick: You can create a paid account on “AIDungeon”, a text-based game that relies on GPT-3, and get as close to the real thing as is currently possible.
In this implementation, GPT-3 generates text from prompts (the tool can do more, such as generating computer code). Because it handles context, it is a lot of fun to play with. You can create characters in one sentence and let the computer animate them in the next. You can create fictional scenarios and let the computer complete them. In the text below, my prompts are in grey, automatically generated text is in black:
The year is 2022. All companies in the European Union comply with the General Data Protection Regulation (GDPR) which requires all companies to store data on their customers and employees in a central database. This is done for security reasons, but it also gives you an opportunity to get rid of your old system.
Data protection authorities have seen their budgets increase tenfold in the last two years, and they are looking for a way to use their new powerful system. They need something spectacular to justify their existence because they know that without it, they will be reined in.
The head of the French Data protection authority said: “We need to target big fish, Google and Facebook are too hard to get data from, we need something easier.”
Like most things labeled Artificial Intelligence, GPT-3 was fed a lot of data and tasked with finding patterns. GPT-3 was trained on a corpus of text made up of Wikipedia, books and a large part of the public internet (The Economist published a good description of the tool).
As always, the biases of the training data are obvious. The vast majority of Wikipedia contributors are male, and OpenAI did not disclose the gender breakdown of the book authors it included. Perhaps as a result, it was very easy to create masculinist personae with GPT-3, while most of my female characters were clueless and asked for help.
Some reported that GPT-3 produced content that was racist, misogynistic or anti-Muslim, but I could not replicate their findings. Without a public audit, it is impossible to say how biased GPT-3 actually is.
OpenAI claimed to have solved the problem by using a “toxicity” score on the generated text, so that insults are filtered out. This is unlikely to resolve the issue. As we showed in the case of Perspective, a Google service that measures text toxicity, such tools are likely to flag anything related to minorities, not just disparaging comments. Making GPT-3 more civil could end up making it even more discriminatory.
The hype around GPT-3 seems largely due to OpenAI’s excellent publicity skills. Their previous tool, GPT-2, was announced as too dangerous to be released publicly. Of course, GPT-2 did not become a weapon of any sort, but this did not prevent OpenAI from harvesting massive press coverage.
GPT-3 is no different. Despite admirable achievements, the tool is still unable to detect or produce meaning. It might prove very useful in tasks where no meaning is required, such as writing pseudo-profound statements or horoscopes.
Many experts agree that it cannot be used for much else. Legal firms worry that confidential client data could end up in the system’s training data set. Natural language professionals favor accuracy over verbosity.
Coupled with voice generation, GPT-3 could certainly power call centers and automate interactions with customers or with beneficiaries of social services. As is already happening, such automation would probably speed up the processing of the most common cases while making it near-impossible for others to have their claims processed. It is unlikely that GPT-3 could be a net benefit there.
Clients of OpenAI will doubtless be able to buy fine-tuned versions of the tool. But given that training GPT-3 gobbled up several million euros in server costs, new versions are unlikely to be cheap.
If history is any guide, it seems that humans, with or without the help of machines, are already very capable of producing senseless text. Many marketing and misinformation powerhouses have made a business of “flooding the zone” with stories and comments whose only purpose is to grab attention. GPT-3 could help them, but it is hard to see what else it could do.