Hello fellow linguists! Billy here—today’s fun fact is about GPT-3! My goal is to pull back the curtain on this mystery (or at least give it a slight tug).
What is this GPT-3? GPT-3 stands for the third-generation Generative Pre-trained Transformer. I won’t blame you if at first the name leads you to envision a Chomskyan Optimus Prime visiting the local gym.
Let’s break this down.
Generative here means that, given some input text, the model generates new text as output. The output is based on probability predictions for how likely a sequence of words is to appear. For example, “I like eating apples” is more probable than “I like eating cars” (unless perhaps we are talking about Megatron).
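To make the idea concrete, here is a toy sketch (emphatically not GPT-3’s actual machinery): we score a sentence by multiplying made-up probabilities for each word following the previous one. The probability table and the `sequence_score` helper are invented for illustration; a real language model learns such probabilities from data over an enormous vocabulary.

```python
# Toy next-word probabilities, invented for this example.
toy_probs = {
    ("eating", "apples"): 0.30,
    ("eating", "cars"): 0.0001,
}

def sequence_score(words, probs, default=0.01):
    """Multiply the probability of each word given the word before it."""
    score = 1.0
    for prev, nxt in zip(words, words[1:]):
        score *= probs.get((prev, nxt), default)
    return score

likely = sequence_score("I like eating apples".split(), toy_probs)
unlikely = sequence_score("I like eating cars".split(), toy_probs)
print(likely > unlikely)  # the "apples" sentence scores higher
```

Even this crude scheme ranks “eating apples” above “eating cars”; GPT-3 does something vastly richer, but the output is still, at heart, a ranking of word sequences by probability.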
Pre-trained means that the model was previously exposed to large amounts of text (almost 45 terabytes!) in order to refine its calculations (i.e., adjusting the weights within its neural network). Its training is unsupervised, meaning it looks at large amounts of data and forms its own representation of the patterns it finds, without human intervention.
Transformer refers to the model’s neural-network architecture. It applies special techniques (such as attention and self-attention) that let the model weigh how relevant each word in the input is to every other word when making its predictions.
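For the curious, here is a minimal sketch of self-attention, the core operation inside a Transformer (simplified: real models add multiple heads, masking, and many layers). Each “word” vector builds a new representation as a weighted average of all the word vectors, with the weights coming from query–key similarity; all the matrices here are random placeholders, not trained weights.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of word vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Similarity of every word's query to every word's key, scaled.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns each row of scores into attention weights summing to 1.
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    # Each output vector is a weighted average of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # 4 "words", 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one new context-aware vector per word
```

The key intuition for linguists: every word’s new representation is contextual, shaped by which other words in the sentence it “attends” to most.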
But what about linguistics?
A linguist who appreciates the rich complexity of language may not be satisfied with representing such intricacy through linear algebra alone. Hybrid approaches (known as neuro-symbolic) combine these neural-network architectures with traditional symbolic approaches (human-written rules) to support deeper semantic reasoning about language.
What do you think? What do you make of GPT-3? How can the complexity of language be captured in a computer? We would love to hear your opinions, no matter how technical your background!
Thanks for reading,
Linguist List Student Moderator
PS: If you would like to dive deeper into the technical aspects of GPT-3, I highly recommend checking out Jay Alammar’s detailed explanation here: https://jalammar.github.io/how-gpt3-works-visualizations-animations/