The Story of GPT-3...

updated on 19 April 2022

AI is ubiquitous. AI is embedded across all enterprises, homes, and beyond. AI is augmenting and improving our lives. And, AI is still in its infancy. 

With new breakthroughs in algorithm design, the harvesting of massive data sets, and lightning-fast processing, a new generation of AI is emerging. 

In December 2015, Elon Musk, Sam Altman, and other investors such as Reid Hoffman and Peter Thiel founded OpenAI, a research company with a mission to ensure that AI remains safe and benefits all of humanity. 

While Google, Facebook, and Microsoft kept their technology under wraps, OpenAI would open-source its AI technology. AI would be available to everyone, not just the richest companies on Earth.

This idealistic vision later proved to be completely impractical.

In 2018, OpenAI published a paper on generative pre-training that introduced the first version of its Generative Pre-trained Transformer (GPT) software. The code for the first and second generations of GPT is open source and freely available. 

GPT-3 is the latest version of OpenAI's software, and some researchers have called it one of the most interesting AI models ever produced.

OpenAI has not open-sourced GPT-3 because of monetization opportunities. Instead, Microsoft licensed GPT-3 and integrated it into its AI-powered Azure platform. 

What’s a generative system?

A generative system uses unsupervised learning when processing training data. 

In a generative system, the output of a deep neural network is essentially flipped. Rather than identifying or classifying data—as in coming up with captions for photographs—the system instead creates entirely new examples that are broadly similar to the data it was trained on. 

What’s a transformer system?

A transformer system uses existing knowledge of language to make predictions on what words might come next based on a series of previous words. 

Transformers detect patterns in sequential elements such as text, enabling them to predict and generate the elements likely to follow. 

GPT combines a generative system with a transformer architecture to take text and reuse it in another context or word sequence while maintaining its meaning.
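
The next-word prediction at the heart of GPT can be illustrated with a toy bigram model. GPT itself uses a deep neural network over far longer contexts, but the basic loop (look at what came before, predict what comes next) is the same idea. All names and the tiny corpus below are purely illustrative.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, the words that follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def predict_next(model, word):
    """Return the word most often seen after `word`, or None if unseen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

corpus = "the cat sat on the mat and then the cat ate"
model = train_bigrams(corpus)
```

Here `predict_next(model, "the")` returns `"cat"`, because "cat" follows "the" most often in the training text; a real GPT does the same kind of prediction, but weighs the entire preceding passage rather than a single word.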


GPT-2 is a generative neural network trained on a massive trove of text downloaded from the internet. 

By processing these vast volumes of data, GPT-2 learned to generate writing automatically.  

Given a text prompt of perhaps a sentence or two, GPT-2 generates a complete narrative. It picks up from the prompt and completes the story. 

The quality of GPT-2’s output can be impressive, but it can also vary widely. 


In May 2020, OpenAI released GPT-3, a vastly more powerful system. GPT-3 is also an unsupervised machine learning application.

While the GPT-2 neural network includes about 1.5 billion parameters, GPT-3 increased that number by more than a hundredfold to 175 billion.

GPT-3 is ten times larger than the next largest model—Turing NLG—developed by Microsoft.

GPT-3’s neural network is trained on more than forty-five terabytes of text, an amount so vast that the entire English version of Wikipedia—roughly six million articles—constitutes only about 0.6 percent of the total. It would take a human more than 500,000 lifetimes to read this much text. 

GPT-3 captures the sequential dependencies between words, paragraphs, or code in order to generate its outputs.

GPT-3 assesses the text that a user submits to it in the context of all it has learned from its previous experiences—and then it “predicts” the next words that should logically follow. GPT-3 is incredibly adept at creating human-like words, sentences, paragraphs, and even stories.

The narrative text that GPT-3 renders is, in most cases, remarkably coherent. The writing reads so naturally that it appears as if a person wrote it.

Both of OpenAI’s GPT systems are at their core powerful prediction engines. Given a sequence of words, they are good at predicting what the next word or the next sequence of words should be. 

What can GPT-3 do?

The purpose of GPT-3 is to generate humanlike, written-language responses to submissions of text, or “prompts.” If you submit a question as a prompt, it generates an answer.

Different types of prompts yield different kinds of completions:

  • Partial phrase: Possible completions
  • Topic sentence: Possible paragraphs
  • Question: Possible answers
  • Topic and some background information: Possible essay
  • Dialogue: Possible transcript of a conversation

GPT-3 can produce poetry, philosophical musings, press releases, and technical manuals.

GPT-3 can create meaningful stories, poems, emails, chatbot responses, and even software code with just a few prompts from a human. For example, it can change legal jargon into plain English. 

GPT-3 can be fine-tuned for new tasks with a minimal amount of in-domain data. 
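
One common way to adapt GPT-3 to a new task without retraining is to pack a handful of in-domain examples directly into the prompt, often called few-shot prompting. The sketch below assembles such a prompt for the legal-jargon task mentioned above; the instruction wording and example pairs are invented for illustration, not taken from any real system.

```python
def few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: an instruction, worked examples, then the new query."""
    lines = ["Translate legal jargon into plain English.", ""]
    for jargon, plain in examples:
        lines.append(f"Legal: {jargon}")
        lines.append(f"Plain: {plain}")
        lines.append("")
    lines.append(f"Legal: {query}")
    lines.append("Plain:")  # the model is expected to complete from here
    return "\n".join(lines)

examples = [
    ("The aforementioned party shall remit payment forthwith.",
     "The person named above must pay right away."),
]
prompt = few_shot_prompt(examples, "The lessee shall vacate the premises.")
```

The resulting string ends mid-pattern at "Plain:", so a prediction engine trained on text will naturally continue it with a plain-English translation.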

It can impersonate the language of humans so well that its output is often difficult to distinguish from human writing, leading some to claim it can pass the Turing Test.

Sample Use Cases

AI Assistants: GPT-3 has great potential to enable better writing assistants, more capable chat agents, better translation applications, and more accurate speech-recognition systems. For example, GPT-3 helps sci-fi writers take their texts in weirdly surreal directions. If you start with “I was born...” or “Once upon a time...” and keep choosing the predicted sentences, you’ll get a strange piece of writing straight from the innards of GPT-3.

Testing GPT-3

A sample GPT-3 test is as follows. Given this prompt:


“Our focus now must be to openly address the current realities of AI to ensure, as well as we can, that it is implemented only in ways consistent with fundamental human dignities and only for purposes consistent with democratic ideals, liberties, and laws.”


GPT-3 continued:


“Our job is now to convince the public in particular that using AI to achieve these aims is a necessary and desirable part of our society, but we cannot afford to do so unless we know how it will best be used and when. But in the end, the future demands that we make moral decisions as we begin to build a world that is truly safe and sustainable, one where humans and AI can truly coexist together.”


In October 2018, well before GPT-3’s release, a team of Google researchers unveiled a system called BERT. 

Despite being far smaller than GPT-3—BERT’s largest version uses about 340 million parameters—it proved remarkably capable.

When BERT took a sentence-completion test designed for humans, it answered just as many questions correctly as people did. And it wasn’t even designed to take that test.

BERT is what researchers call a “universal language model.”

OpenAI’s system learned to guess the next set of words in a sentence. BERT learned to guess missing words anywhere in a sentence.
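
The difference between the two training objectives can be sketched on a single sentence. This is a simplified word-level view (real models operate on subword tokens), and the function names are mine, not from either system.

```python
def causal_pairs(words):
    """GPT-style objective: predict each word from all the words before it."""
    return [(words[:i], words[i]) for i in range(1, len(words))]

def masked_example(words, mask_index):
    """BERT-style objective: hide one word anywhere and predict it from both sides."""
    masked = list(words)
    target = masked[mask_index]
    masked[mask_index] = "[MASK]"
    return masked, target

sentence = ["the", "cat", "sat", "on", "the", "mat"]
```

For this sentence, `causal_pairs` produces pairs like `(["the"], "cat")` where only left context is visible, while `masked_example(sentence, 2)` hides "sat" and lets the model use words on both sides to recover it.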

If you feed a few thousand questions and answers to BERT, it can learn to answer other, similar kinds of questions on its own. BERT can also carry on a conversation. 

Shortcomings of GPT-3

GPT-3 doesn't always maintain coherence for more than a sentence or so—sometimes considerably less. Individual phrases may make sense, and the rhythm of the words sounds fine if you don't pay attention to what’s going on. This is because GPT-3 has a terrible memory. 

If you’re ever wondering whether a text is written by an AI or a human, one way to check is to look for major problems with memory.

As of 2019, only some AIs were starting to be able to keep track of long-term information in a story, and even then they tended to lose track of some crucial bits of information. Many text-generating AIs can only keep track of a few words at a time. 
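
That "few words at a time" limitation can be pictured as a fixed-size context window: everything older than the window simply falls out of the model's view. A minimal sketch (the window size and the story text here are arbitrary choices for illustration):

```python
def visible_context(tokens, window=8):
    """Return only the most recent `window` tokens; older ones fall out of view."""
    return tokens[-window:]

story = ("Once upon a time a princess named Elsa "
         "lived in a castle and years later she").split()
```

By the time the model reaches the end of this story, the princess's name has already slid out of an 8-token window, which is exactly how a generator loses track of its own characters.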

Researchers are working on AI that can weigh both short-term and long-term features when predicting the next elements in a text. In transformer models such as GPT and BERT, the key strategy is called attention, a mechanism that lets the network decide which earlier words matter most at each step. A neural network that uses attention can keep track of information long enough to remain on topic.

With longer effective memory, the next versions of GPT and BERT will be more likely to produce text that stays on topic. 

If enterprises use GPT-3 to auto-generate emails, articles, papers, and so on without any human review, the legal and reputational risks are great. For example, an article with an ugly racial bias could lead to significant consequences. 

Writing styles can vary enormously across cultures and genders. If GPT-3 grades essays without checks, it may grade a student higher because their style of writing is more prevalent in the training data. 

GPT-3 can’t discern right from wrong from a factual perspective. It can write a compelling story about a unicorn; however, it has no understanding of what a unicorn is.

In the wrong hands, GPT-3 could be used to generate disinformation such as fake stories, false communications, or impersonated social media posts, as well as biased or abusive language.


Musk and Altman painted OpenAI as a counterweight to the dangers presented by the big internet companies.

It's reasonable to expect that AI will progress at least as fast as computing power has, yielding a millionfold increase in the next 15 to 20 years. Right now, generative transformers have the largest networks, but their parameter counts are still many times smaller than estimates of the number of synapses in the human brain. At the current rate of doubling every two years, though, the gap could close in less than a decade. Of course, scale does not directly translate to intelligence.
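
The doubling arithmetic behind such projections is easy to sanity-check: a millionfold increase over 20 years corresponds to doubling roughly every year (2^20 ≈ 1,000,000), while doubling every two years yields only about a 32-fold increase per decade. A one-line sketch:

```python
def growth_factor(years, doubling_period_years):
    """How many times larger something becomes after `years` of steady doubling."""
    return 2 ** (years / doubling_period_years)
```

For example, `growth_factor(20, 1)` gives 2^20 (about a million), while `growth_factor(10, 2)` gives only 32, which is why the assumed doubling rate matters so much to these forecasts.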
