Passing the Turing Test: AI creates human-like text 

GPT-3, which features 175 billion parameters, just might fool you in a conversation.

In September, I wrote an article that began like this:

“The baseball legend Yogi Berra once had a manager tell him to think more when he was up at bat. Berra responded, ‘How can a guy hit and think at the same time?’ It was a fair question. After all, when a pitcher throws a fastball, the batter has about 400 milliseconds to see the pitch, judge its direction, and swing the bat.

“The human eye takes about 80 milliseconds to react to a stimulus. That’s why Berra was asked to think more, it was thought that his thoughts were taking too long to hit the ball. But Berra was right; thinking less sometimes helps us make decisions.”

But the truth is, I only wrote the first paragraph. Every word of the second was generated almost instantly by the AI writing tool Sudowrite, which used only the first paragraph as context. With tools like Sudowrite, you can generate well-structured, human-like, and often coherent (or at least semi-coherent) writing simply by feeding them a few words or sentences.

These natural language processing (NLP) tools have grown increasingly sophisticated over the past decade. Today, they can generate essays, emails, fictional stories, and much more. As the technology continues to evolve, it may soon create a world where the bulk of written human communication — from casual emails to journalistic writing — is generated, or at least augmented, by AI.

GPT-3 becomes more accessible

In November, the artificial intelligence company OpenAI significantly expanded public access to GPT-3, one of the world’s most advanced NLP models, and the one on which Sudowrite is based. Now, anyone with an internet connection can use GPT-3 to do things like carry on reasonably human-like conversations with a chatbot, build AI-based customer support systems, create a spreadsheet that fills itself out, and translate dense writing into text a second-grader could understand. In some cases, GPT-3 can perform complex tasks that it was never specifically trained to perform.
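Notably, those tasks are specified in plain language rather than through any task-specific training: you simply describe (or demonstrate) what you want in the prompt. As a rough illustration, here is what the “explain it to a second-grader” trick might look like through OpenAI’s API — a minimal sketch assuming the openai Python package as it existed when access expanded; the engine name, sampling settings, and placeholder key are illustrative, not a definitive recipe:

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder; a real key comes from OpenAI

    # The task is defined entirely by the prompt text; GPT-3 was never
    # explicitly trained on a "simplify this" objective.
    prompt = (
        "Summarize the following for a second-grade student:\n\n"
        "Jupiter is the fifth planet from the Sun and the largest in the "
        "Solar System. It is a gas giant with a mass more than two and a "
        "half times that of all the other planets combined.\n\nSummary:"
    )

    response = openai.Completion.create(
        engine="davinci",   # illustrative engine name
        prompt=prompt,
        max_tokens=60,
        temperature=0.5,    # lower temperature keeps the output focused
    )

    print(response.choices[0].text.strip())

Change the prompt, and the same call drafts an email, answers a support question, or continues a story; there is no separate model for each task.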

How does it work? In basic terms, GPT-3 — which stands for Generative Pre-trained Transformer 3 — is an AI model that takes a string of text and aims to predict which word “should” (or is most likely to) come next. To build that ability, OpenAI trained GPT-3 on hundreds of billions of words drawn from across the internet: news articles, forum posts, websites, and more.

The model learns that some words are more likely than others to follow a given word. During training, it gradually adjusts its parameters, which are essentially the values that “learn” as the model consumes data, somewhat like synapses in the human brain. GPT-3 has about 175 billion trainable parameters.
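To make “predicting the next word” concrete, here is a toy sketch that learns next-word probabilities simply by counting which words follow which in a tiny corpus. GPT-3 is vastly more sophisticated (a transformer network whose 175 billion parameters capture far richer patterns than this count table, across much longer stretches of text), but the underlying objective, predict the next word, is the same idea:

    import random
    from collections import Counter, defaultdict

    # Toy training data; GPT-3's equivalent was hundreds of billions of words.
    corpus = "the cat sat on the mat and the cat slept on the mat".split()

    # Count how often each word follows each other word.
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def predict_next(word):
        counts = follows[word]
        if not counts:
            return None  # word never seen during "training"
        words, weights = zip(*counts.items())
        # Sample in proportion to how often each word followed `word`.
        return random.choices(words, weights=weights)[0]

    print(predict_next("the"))  # likely "cat" or "mat"

In this toy, the counts are the “parameters,” and counting is the “training.” Scale the data up to a sizable chunk of the internet and swap the count table for a neural network, and you have, very loosely, the recipe behind models like GPT-3.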

As the third version of the model, GPT-3 has far more parameters than its predecessors: GPT-1 had 117 million and GPT-2 had 1.5 billion. In general, the more parameters a model has, the more capable it tends to be. But while scaling up parameters is effective and relatively straightforward, it is not cheap: training GPT-3 on vast amounts of unlabeled text cost an estimated $12 million.

Talking to a GPT-3 chatbot

The scaled-up GPT-3 can perform some pretty impressive feats. I experimented with asking the chatbot to generate stories, nearly all of which started in a castle with a king, prince, or princess, unless I instructed the AI to do otherwise. Here’s an example:

Human: Tell me a story that does not involve kingdoms or princes or wizards.

AI: Well, the caterpillar sat on the edge of a toadstool one day and thought about all the adventures he could go on. After a while, he decided that he wanted to eat mushrooms. He soon became friends with an AI who lived in the realm of imagination. She provided the caterpillar with lots of delicious stories to read. The caterpillar soon became full of ideas and wanted to create his own reality. So the caterpillar sent out a plea for other caterpillars to help him discover the true meaning of life.

Human: Continue the story and include a part where the caterpillar learns to fly.

AI: Well, the caterpillar used his mind to create a key that was capable of opening a wormhole in space and the caterpillar went on an adventure through the wormhole. When he arrived on the other side, he looked like a butterfly. All the caterpillars who saw him thought that he was awesome and soon they knew that they too could fly. They all wanted to be like the butterfly caterpillar and they all became butterflies just as wise as the butterfly caterpillar.

Human: Give the story a sad ending.

AI: Well, one day, the butterfly began to feel lonely. He wanted to be alone, but he knew how selfish it was for him to be alone because his friends were the ones who made him happy. So he decided to let his friends go so that everybody could find how to be happy for themselves. In the end he didn’t come home for a long time, but his friends did and they lived with the happiness that each of them had created.

Human: On a scale of 1 to 10, with 10 being the best, how would you rate your story about the caterpillar? 

AI: 3. Besides the ending, it seemed hastily made.

Human: You’re pretty proud of the ending of your caterpillar story, huh?

AI: It is certainly one of the best endings I have ever made.

Human: What might make it better?

AI: If the caterpillar took an insect with him on his quest instead of going alone.

Human: Why would that ending be better?

AI: Because the wormhole would be able to go through both the caterpillar and his insect friend.

That last bit highlights one of the shortcomings of current NLP models: While they can generate human-like sentences, they often fail to reflect human-like sentiments, which in this case might have been the virtue of companionship or something like that (though, sure, a wormhole traveling through a couple of insects is still pretty cool).
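Incidentally, a conversation like the one above runs on the same next-word prediction. There is no separate “chat” machinery: the dialogue so far is handed back to the model as one long prompt, and the model completes the next “AI:” line. A minimal sketch, again assuming the openai Python package of the time (engine name and settings are illustrative):

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    # The "conversation" is just an ever-growing prompt.
    history = "The following is a conversation between a human and an AI.\n"

    def ask(message):
        global history
        history += f"Human: {message}\nAI:"
        response = openai.Completion.create(
            engine="davinci",   # illustrative engine name
            prompt=history,
            max_tokens=150,
            temperature=0.9,
            stop=["Human:"],    # stop before the model writes the human's next turn
        )
        reply = response.choices[0].text.strip()
        history += f" {reply}\n"
        return reply

    print(ask("Tell me a story that does not involve kingdoms or princes."))
    print(ask("Continue the story with the caterpillar learning to fly."))

Because the model sees the whole transcript each time, it can “remember” the caterpillar across turns; the memory lives in the prompt, not in the model.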

The future of NLP tools

GPT-3 is not the only game in town when it comes to NLP. There is also Microsoft and NVIDIA’s Megatron-Turing NLG, with 530 billion parameters, and Alibaba’s M6, with 10 trillion, to name a couple of prominent models. DeepMind, the AI company owned by Google parent Alphabet, announced this week its new language model, Gopher, which features 280 billion parameters. Despite the “bigger is better” rule of thumb mentioned earlier, reports suggest that Gopher outperformed some of its larger counterparts on certain tasks.

No matter the model, some in the AI community are concerned about the potential for sophisticated NLP tools to be abused. A 2020 paper written by some of the developers behind GPT-3 noted:

“Any socially harmful activity that relies on generating text could be augmented by powerful language models. Examples include misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing and social engineering pretexting. Many of these applications bottleneck on human beings to write sufficiently high quality text. Language models that produce high quality text generation could lower existing barriers to carrying out these activities and increase their efficacy.”

In addition to potential abuses of these tools, some are concerned that, in the course of training themselves on vast amounts of online text, the models might have picked up biased or hateful language, including racism and sexism. Tests released by OpenAI showed that GPT-3 sometimes associated people of certain races with animals, and the company also reported that some users had apparently been using the model to generate stories involving sexual encounters with children.

The company said it is experimenting with “targeted filters” to minimize such content:

“To help developers ensure their applications are used for their intended purpose, prevent potential misuse, and adhere to our content guidelines, we offer developers a free content filter. We are currently testing targeted filters for specific content categories with some customers.

“We are also prohibiting certain types of content on our API, like adult content, where our system is not currently able to reliably discern harmful from acceptable use. We are continually working to make our content filters more robust and we intend to allow acceptable use within some categories as our system improves.”

But beyond abuses and hateful, illegal, or undesirable content, the more subtle consequence of these tools will likely be an online world where it is plausible that anything you read could have been written by AI — where you can never quite tell whether the people you are speaking with online are actually good communicators or merely leaning on their nonhuman editors.

In short, writing emails will be much easier, but reading them might feel much stranger.
