The Story of GPT-3...

published on 07 April 2022

With new breakthroughs in algorithm design, the harvesting of massive data sets, and lightning-fast processing, a new generation of AI is emerging. 

AI is ubiquitous. AI is embedded across all enterprises, homes, and beyond. AI is augmenting and improving our lives. And, AI is still in its infancy. 

In December 2015, Elon Musk along with Sam Altman and other investors such as Reid Hoffman and Peter Thiel created OpenAI. OpenAI is a research company whose corporate mission is to build artificial general intelligence (AGI), make sure it’s safe, and maximize its benefit to all of humanity. 

While Google, Facebook, and Microsoft keep much of their technology under wraps, OpenAI would make AI available to everyone, not just the richest companies on Earth.

In 2018, OpenAI published a paper on generative pre-training and what would become the first version of Generative Pre-trained Transformer (GPT) software. The first and second generations of GPT code are open source and freely available. 

GPT-3 is the latest version of the language model. Scientists have called it the most interesting AI model that’s ever been produced.

OpenAI started giving developers access to GPT-3 in June 2020 via a simple “text in, text out” API: a model as a service. You could experiment with it and try new things without hosting your own GPUs. You don’t even need to know how to code to use GPT-3; you just describe in words what you want the AI to do. That puts the technology within reach of people in professions well outside of tech. 

OpenAI has not open sourced GPT-3, in part because of the opportunity to monetize it. Microsoft licensed GPT-3 and integrated it into its AI-powered Azure platform. 

What’s a language model?

Language models are word prediction machines. A language model is trained on an enormous amount of text, so when it sees new text, it comes up with words that might come next. The more text you train it on, the more creative it becomes.

Language models are generally trained to perform one task: text generation, summarization, classification, and so on. 

But GPT-3 is extraordinary because it’s a general purpose language model.
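
The “word prediction” idea can be illustrated with a toy model. The sketch below is not how GPT-3 works internally (GPT-3 uses a neural network, not word counts), but it shows the core loop: learn which words tend to follow which, then predict the most likely next word.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    model = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        model[current_word][next_word] += 1
    return model

def predict_next_word(model, word):
    """Return the word most frequently seen after `word`, if any."""
    followers = model.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat ran"
model = train_bigram_model(corpus)
print(predict_next_word(model, "the"))  # prints "cat": it follows "the" twice, "mat" only once
```

The more text you train on, the richer these statistics become; GPT-3 replaces the counting table with billions of learned parameters.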

Generative, Pre-trained, Transformer

Let’s break down the name Generative Pre-trained Transformer 3 (GPT-3).

A generative system

A generative system uses unsupervised learning when processing training data. 

In a generative system, the output of a deep neural network is essentially flipped. Rather than identifying or classifying data—as in coming up with captions for photographs—the system instead creates entirely new examples that are broadly similar to the data it was trained on.

A pre-trained system

A pre-trained system is trained for a general task that you can then fine-tune for your own task. It’s based on the idea that once you know generic knowledge about something, it’s easier to learn more about a specific aspect of it. 

GPT-3 is pre-trained on a corpus of text from these datasets:

  • Common Crawl: Petabytes of data collected over years of web crawling. 
  • WebText2: OpenAI’s internal dataset, which includes the text of all outbound links from Reddit posts with at least three karma (upvotes).
  • Books1 and Books2: Texts of tens of thousands of books on various subjects.  
  • Wikipedia: English articles from Wikipedia.

The combined text amounts to nearly a trillion words.

A transformer system

A transformer system detects patterns in sequential elements, such as the words in a text, enabling it to predict and generate the elements likely to follow. 

A transformer system looks at an input sequence, piece by piece, and uses probabilities to decide how the words in the sequence relate to one another. 
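
The probability-weighted relationships a transformer computes can be sketched with the attention operation at its core. The vectors below are made-up 2-dimensional stand-ins for words (real models use learned, high-dimensional representations), and the code is a bare-bones illustration of scaled dot-product attention, not GPT-3’s actual implementation:

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # one probability per input position
    # Blend the value vectors according to those probabilities.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Three "words", each represented by a 2-d key vector and value vector.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]  # most similar to the first and third keys
output = attention(query, keys, values)
```

Because the query lines up best with the first and third keys, their values dominate the blended output; that is the probabilistic “deciding the relationship” at work.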

A Brief History of GPT-3

All of OpenAI’s GPT systems are at their core powerful prediction engines. Given a sequence of words, they are good at predicting what the next word or the next sequence of words should be. 

Let’s talk about GPT-“3”—as well as “1” and “2.”


GPT-1

OpenAI introduced GPT-1 in June 2018. 

GPT-1 was trained on the BookCorpus dataset, which contains some seven thousand unpublished books. It has 117 million parameters. 

GPT-1 showed that language models can be effectively pre-trained. 

Pre-training allowed GPT-1 to show zero-shot performance on various NLP tasks such as question answering (Q&A) and sentiment analysis. Zero-shot refers to a model’s ability to perform a task without having seen any example of that kind in the past.


GPT-2

OpenAI introduced GPT-2 in February 2019. 

GPT-2 was trained on a massive trove of text: 40 GB from over eight million documents, far larger than GPT-1’s dataset. It has 1.5 billion parameters. 

Its performance surpassed GPT-1 at all tasks in zero-shot settings.

Given a text prompt of perhaps a sentence or two, GPT-2 generates a complete narrative. It picks up from the prompt and completes the story. 

The quality of GPT-2’s output can be impressive but it can also vary widely.


GPT-3

In May 2020, OpenAI released GPT-3, a vastly more powerful system than GPT-1 and GPT-2. 

GPT-3’s neural network is trained on more than forty-five terabytes of text; an amount so vast that the entire English version of Wikipedia—roughly six million articles—constitutes only about 0.6 percent of this total. It would take a human more than 500,000 lifetimes to read this text. GPT-3 has 175 billion parameters.

GPT-3 is incredibly adept at creating human-like words, sentences, paragraphs, and even stories.

The narrative text that GPT-3 renders is, in most cases, remarkably coherent. The writing reads so naturally that it appears as if a person wrote it.

What can GPT-3 do?

The purpose of GPT-3 is to generate humanlike, written language responses to submissions of text, or “prompts.” Submit a question as the prompt, and it generates an answer.

The following are the different types of prompt completion:

  • Partial phrase: Possible completions
  • Topic sentence: Possible paragraphs
  • Question: Possible answers
  • Topic and some background information: Possible essay
  • Dialogue: Possible transcript of a conversation

GPT-3 can produce poetry, philosophical musings, press releases, and technical manuals.

GPT-3 can create meaningful stories, poems, emails, chatbot responses, and even software code with just a few prompts from a human. For example, it can change legal jargon into plain English. 

GPT-3 can be fine-tuned for new tasks with a minimal amount of in-domain data. 

It routinely passes informal Turing tests, impersonating the language of humans so well that its words are often indistinguishable from a person’s.

Accessing GPT-3

Initially, OpenAI gave access to GPT-3 only to a limited list of beta users. To get on this list, you had to complete a form detailing your background and reasons for requesting access. Only approved users were granted access to a private beta of the API with an interface called Playground.

OpenAI removed the waiting list in November 2021. GPT-3 is now openly accessible with a simple sign in. To get API access, go to the sign-up page, sign up for a free account, and start experimenting with it right away.

As a new user, you get a pool of free tokens (credits) to experiment with. Tokens are numerical representations of words and are used to determine the pricing of each API call. A token corresponds to approximately 4 characters of text.
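
That rule of thumb is enough to estimate costs before you send a prompt. The helper below is our own illustration, not part of the OpenAI API (the API reports exact token counts in its responses):

```python
def estimate_tokens(text):
    """Rough estimate: about 4 characters per token for English text."""
    return max(1, round(len(text) / 4))

prompt = "Explain a language model in one sentence."
print(estimate_tokens(prompt))  # roughly 10 tokens for this 41-character prompt
```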

Navigating Playground

Playground is a web-based UI that lets you experiment with GPT-3. 

  • Log in to OpenAI.
  • Choose Playground from the home page.
  • Take a look around the Playground screen.
  • The big text box in the center is where you add text inputs (prompts).
  • On the right is the parameter-setting pane that lets you tweak the parameters.
  • Choose Preset and select Q&A. 
  • Add a question at the end and choose Submit to see an answer. The number of tokens appears at the bottom right of the screen; you can use it to monitor your token consumption every time you interact with the Playground.

GPT-3 takes the prompts and completions already in the text input field into account and treats them as part of the training prompt for your next question.

Tuning parameters

You can tune the following parameters:

  • Model: GPT-3 comes with several built-in models. text-davinci-002 is the most powerful model and the default. It also costs more per API call and is slower than the other models. 
  • Temperature: Controls the randomness/creativity of the response. You can specify a number between 0 and 1. A higher temperature makes the generated text more diverse, but with a higher probability of going off topic. 
  • Maximum length: Sets a limit on how much text GPT-3 includes in its completion. Because OpenAI charges by the length of text generated per API call, choose this value based on your budget. A higher response length uses more tokens and costs more. 
  • Stop sequences: Specifies a set of characters that signals the API to stop generating completions. 
  • Top P: Controls how much of the probability distribution the model considers for completion. You can specify a number between 0 and 1. A value close to zero limits completions to only the most likely responses: for example, at 0.1, only the top 10 percent of the probability mass is considered. At 1, the API considers all responses, taking risks and coming up with creative completions. 
  • Frequency penalty: Decreases the likelihood that GPT-3 will repeat the same line verbatim by “punishing” it. 
  • Presence penalty: Increases the likelihood that GPT-3 will talk about new topics. 
  • Best of: Specifies the number of completions (n) to generate on the server side and returns the best of those n completions. Using “best of” is expensive because it generates n completions’ worth of tokens. 
  • Inject start text: Inserts text at the beginning of the completion. You can use it to keep a desired pattern going. 
  • Inject restart text: Inserts text at the end of the completion.
  • Show probabilities: Debugs the text prompt by showing the probability of tokens that the model can generate for a given input. It can help you see alternatives that might be more effective. 
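
Temperature and Top P both reshape the probability distribution the model samples the next token from. The sketch below uses made-up scores for three candidate tokens to show the effect of each knob; it mirrors the idea, not OpenAI’s exact implementation:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the distribution; higher flattens it."""
    exps = [math.exp(score / temperature) for score in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalized distribution

logits = [2.0, 1.0, 0.2]                      # fake scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # nearly deterministic: almost all mass on token 0
hot = softmax_with_temperature(logits, 1.0)   # more diverse
nucleus = top_p_filter(hot, 0.8)              # drops the least likely token
```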

Fine-tuning with prompts 

You can fine-tune GPT-3 by adding prompts. Think of prompts as a way of showing GPT-3 a few examples of your use case.

When given a few prompts (or examples), GPT-3 can often intuit what task you're trying to perform and generate a more relevant completion. This is called few-shot learning.

It’s kind of like showing an intern a few examples of the work you’d like done. 

If you give GPT-3 a few lines from your favorite novel, it continues the writing in the same style. 
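
In practice, a few-shot prompt is just stacked text: worked examples above the case you want completed. The helper function below is our own sketch; the English-to-French pattern echoes the translation examples in OpenAI’s GPT-3 paper:

```python
def build_few_shot_prompt(task, examples, query):
    """Stack worked examples above the query, separated by newlines."""
    lines = [task]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # GPT-3 is expected to complete this line
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "plush giraffe",
)
print(prompt)
```

Submitted as-is, a prompt like this typically leads GPT-3 to continue the pattern with a French translation.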

Fine-tuning GPT-3 with the right kind of prompts is the key to unlocking its potential. Prompt design is more of an art than an exact science. 

If you don’t like the answers you get by using a prompt, use a better prompt. 

GPT-3 is capable of a lot if you give it the right type of prompts.

Fine-tuning with a dataset

You also have the option of fine-tuning GPT-3 with your own dataset. This helps especially if you have more examples than can fit in the prompt text box. 

You can use an existing dataset or incrementally add data based on user feedback. You have to format your dataset so that it looks like this:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
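
A short script can produce that JSONL format from your own examples. The file name and the (prompt, completion) pairs below are made up for illustration:

```python
import json

def write_fine_tune_file(pairs, path):
    """Write one {"prompt": ..., "completion": ...} JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            record = {"prompt": prompt, "completion": completion}
            f.write(json.dumps(record) + "\n")

pairs = [
    ("Summarize: The meeting moved to Friday.", "Meeting moved to Friday."),
    ("Summarize: Lunch is provided for attendees.", "Lunch provided."),
]
write_fine_tune_file(pairs, "training_data.jsonl")
```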

You can use OpenAI’s CLI data preparation tool to easily convert your dataset into this file format. 

openai tools fine_tunes.prepare_data -f LOCAL_FILE

where LOCAL_FILE is the dataset you want to convert.

After you prepare your training dataset, you can start the fine-tuning job:

openai api fine_tunes.create -t TRAIN_FILE_ID_OR_PATH -m BASE_MODEL

where BASE_MODEL is the name of the base model you’re starting from. 

After you've trained GPT-3, you can start making requests by passing the model name as the model parameter of a completion request using the following command:

openai api completions.create -m FINE_TUNED_MODEL -p YOUR_PROMPT

where FINE_TUNED_MODEL is the name of your model and YOUR_PROMPT is the prompt you want to complete in this request.

Customizing GPT-3 with a dataset seems to yield better results than what can be achieved with prompts, because with datasets you can provide more examples.

OpenAI has found that each doubling of the number of examples tends to improve quality linearly.

Programming with GPT-3

The GPT-3 API is a standard web API, and client libraries are available for all the major programming languages, so you can build with GPT-3 in the language of your choice. To access GPT-3 programmatically, see the API reference documentation.
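
For example, with the 2022-era openai Python library, a completion request looks like the sketch below. The prompt and parameter values are made up, and the network call runs only if the library and an API key are present:

```python
import os

try:
    import openai  # pip install openai
except ImportError:
    openai = None  # library not installed; we can still build the request

def build_completion_params(prompt):
    """Assemble the parameters for a Completion request."""
    return {
        "model": "text-davinci-002",  # the default Playground model
        "prompt": prompt,
        "max_tokens": 64,
        "temperature": 0.7,
    }

params = build_completion_params("Explain language models to a fifth grader.")

if openai and os.environ.get("OPENAI_API_KEY"):
    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.Completion.create(**params)
    print(response.choices[0].text.strip())
```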

Testing GPT-3

A sample GPT-3 prompt and completion is as follows:


“We stand in wonder at the accomplishments of great writers. But how did they do it?”


“Like all writers, they started with a blank page,” answered the teacher. “The difference between them and the rest of us was that they were able to keep going.”

Going live with GPT-3

To go live with your GPT-3 app, you need to submit your app for a pre-launch review. OpenAI reviews your app to make sure you adhere to their policies and safety requirements. 

For more information, see Usage Guidelines.

Microsoft invested a billion dollars in OpenAI to license GPT-3 and make it available on the Azure platform. Azure offers a more comprehensive set of resources for production apps.

Sample Use Cases

From Writing-Assistant to Writing-Partner: GPT-3 has great potential to enable better writing assistants, more capable chat agents, better translation applications, and more accurate speech-recognition systems.

For example, GPT-3 helps sci-fi writers take their texts in weirdly surreal directions. If you start with “I was born...” or “Once upon a time...” and keep choosing the predicted sentences, you’ll get a strange piece of writing straight from the innards of GPT-3.

Sudowrite and NovelAI are apps that use GPT-3 to help you write stories. Pharmako-AI is an example of a book that's co-written by an AI and a human (K. Allado-McDowell, founder of Google’s Artists and Machine Intelligence program).

GPT-3 is a crazy, completely off the wall writing partner who throws out all sorts of suggestions, who never gets tired, who’s always there. Writing with GPT-3 feels like channeling a spirit. It can send your writing off in unexpected directions but still gives you the illusion of control.


Back in October 2018, well before GPT-3’s release, a team of Google researchers unveiled a system called BERT. 

BERT’s largest version has 340 million parameters.

When BERT took a sentence-completion test against humans, it could answer just as many questions as a person could. And it wasn’t even designed to take that test.

BERT is what researchers call a “universal language model.”

OpenAI’s system learned to guess the next set of words in a sentence. BERT learned to guess missing words anywhere in a sentence.

If you feed a few thousand questions and answers to BERT, it can learn to answer other, similar kinds of questions on its own. BERT can also carry on a conversation. 

Shortcomings of GPT-3

Being on topic

Over longer stretches, GPT-3 doesn't always maintain coherence; sometimes it loses the thread after a sentence or less. Individual phrases may make sense, and the rhythm of the words sounds okay if you don't pay attention to what’s going on. This is because GPT-3 has a terrible memory. 

If you’re ever wondering whether a text is written by an AI or a human, one way to check is to look for major problems with memory.

As of 2019, only some AIs are starting to be able to keep track of long-term information in a story–and even then, they’ll tend to lose track of some bits of crucial information. Many text-generating AIs can only keep track of a few words at a time. 

Researchers are working on making AI that can look at both short-term and long-term features when predicting the next words in a text. Convolution is one such strategy; the attention mechanism at the heart of transformers is another. A neural network that uses these mechanisms can keep track of information long enough to remain on topic. 

With memory improved in these ways, the next versions of GPT and BERT will be more likely to produce text that stays on topic. 


Bias

The dataset that GPT-3 is trained on is text, and that text reflects our worldviews, including our biases. 

If enterprises use GPT-3 to auto-generate emails, articles, papers, and so on without any human review, the legal and reputational risks are great. For example, an article with an ugly racial bias could lead to significant consequences. 

Writing styles can vary enormously across cultures and genders. If GPT-3 is grading essays without checks, it may grade a student higher simply because that student’s style of writing is more prevalent in the training data. 

No understanding

GPT-3 can’t discern right from wrong from a factual perspective. GPT-3 can write a compelling story about a unicorn; it has, however, no understanding of what a unicorn is.

Misinformation / fake news

In the wrong hands, GPT-3 can be used to generate disinformation, such as fake stories, false communications, or impersonated social media posts, as well as biased or abusive language.

Examples include spam, phishing, fraudulent academic essay writing, promoting extremism, and social engineering pretexting. GPT-3 can easily become the engine of a powerful propaganda machine.


Musk and Altman painted OpenAI as a counterweight to the dangers presented by the big internet companies.

It's reasonable to expect that AI will progress at least as fast as computing power has, yielding a millionfold increase in the next 15 to 20 years. Right now, generative transformers are the largest networks. Their parameter counts are still many times fewer than the estimated number of synapses in the human brain, but at the current rate of doubling every two years, the gap could close in less than a decade. Of course, scale does not directly translate to intelligence.
