#AI #CHATGPT #FEW-SHOT LEARNING #FINE-TUNING #LARGE LANGUAGE MODELS #MACHINE LEARNING #ONE-SHOT LEARNING #PROMPT ENGINEERING #ZERO-SHOT LEARNING

ChatGPT is an amazing technology - you can use it for all sorts of things, from generating code to writing poetry. You can even make a cocktail bot.

You can watch a video explanation here - or read on if you prefer.

But it has come in for some criticism when it comes to factual information. It’s often accused of getting things wrong, making facts up, or even outright lying and misleading the user.

Let’s try to provide an intuitive explanation of why ChatGPT doesn’t seem to know what is true and what is false, and why it often seems to “hallucinate” things that don’t exist.

First off, what are we actually talking about? ChatGPT is built on top of a Large Language Model or LLM.

The particular Large Language Model used in ChatGPT is a Generative Pre-trained Transformer (GPT).

The model is trained on a huge amount of text - in the case of GPT3, around 45 TB of raw text was collected, which was filtered down to roughly 570 GB before training.

During this pre-training, the language model develops a broad set of skills and abilities. Once training is complete, it can use these abilities on new tasks.

To use the trained model you feed in a prompt consisting of a series of words. The model then predicts the next word. That word is added to the end of the prompt and the process is repeated until the model decides the text is finished or it reaches a length limit.
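Here is a minimal sketch of that predict-and-append loop in Python. The `TOY_MODEL` lookup table is invented purely for illustration and stands in for a real language model; the only thing to take from it is the shape of the loop, which stops when the stand-in model emits an end marker or a length limit is hit.

```python
END = "<end>"

# Hypothetical stand-in for a trained model: a hand-written lookup from the
# last two words to the "most likely" next word. A real model calculates
# this from its parameters instead of a table.
TOY_MODEL = {
    ("the", "cat"): "sat",
    ("cat", "sat"): "on",
    ("sat", "on"): "the",
    ("on", "the"): "mat",
    ("the", "mat"): END,
}


def predict_next(words):
    """Return the next word for the text so far, or END if we have no idea."""
    return TOY_MODEL.get(tuple(words[-2:]), END)


def generate(prompt, max_words=20):
    """Repeatedly predict a word and append it to the text."""
    words = prompt.lower().split()
    while len(words) < max_words:
        next_word = predict_next(words)
        if next_word == END:
            break
        words.append(next_word)
    return " ".join(words)


print(generate("The cat"))  # the cat sat on the mat
```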

We can pretend to be a language model - imagine the passage:

“The cat sat on the…”

There are a couple of potential words that are likely to follow this - as humans, we know that it would probably be “mat”, or “lap”, or maybe if we’re feeling a bit whimsical “hat”.

You could imagine how you could build a simple language model.

First, you would need to know how to start a sentence: what is the most likely word to appear at the start of a sentence?

You could take all the words in your vocabulary and count how many times each one is used at the start of a sentence and that would let you pick the most likely word.

For the second word you could count how many times two words occur together.

For the third word you could count how many times three words occur together.

And so on until you have built a large enough table of probabilities that you could generate a fairly long piece of text.
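The counting idea can be sketched in a few lines of Python. This version only goes as far as pairs of words (with a made-up `<start>` marker for the beginning of a sentence), and the three-sentence corpus is invented for the example. Extending it to three or more words means keying the table on longer and longer sequences, which is exactly where the trouble starts.

```python
from collections import Counter, defaultdict


def build_bigram_table(sentences):
    """Count, for every word, which words follow it and how often."""
    table = defaultdict(Counter)
    for sentence in sentences:
        # "<start>" is a made-up marker so we can count sentence openers too.
        words = ["<start>"] + sentence.lower().split()
        for previous, current in zip(words, words[1:]):
            table[previous][current] += 1
    return table


def most_likely_next(table, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    followers = table.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = [
    "the cat sat on the mat",
    "the cat sat on the lap",
    "the cat sat on the mat again",
]
table = build_bigram_table(corpus)

print(most_likely_next(table, "<start>"))  # the
print(most_likely_next(table, "the"))      # cat
print(most_likely_next(table, "on"))       # the
```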

The problem is that for any reasonably long piece of text, this table would be huge.

Apparently, most native speakers have a vocabulary that ranges from 20,000-35,000 words.

Even taking the lower range for this we end up with a table that explodes in size. Every time we add a new word we need to multiply the size of our table by 20,000. It increases exponentially.

After just 20 words we would need a table containing more than one hundred septenvigintillion entries - that’s a 1 with 86 zeros after it. To put that into perspective, that’s around 10,000 times more than even the higher estimates of the number of atoms in the observable universe. And that’s just using our very conservative 20,000-word vocabulary.

This exponential behaviour is why we can use three or four random words as a strong password. With 3 words picked from our 20,000-word vocabulary there are 8 million million possible combinations that a hacker would need to try.
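If you want to check the arithmetic, this short Python sketch multiplies it out for the 20,000-word vocabulary. The function name is just something picked for the example.

```python
def table_entries(vocabulary_size, sequence_length):
    """Number of distinct word sequences of the given length."""
    return vocabulary_size ** sequence_length


VOCABULARY = 20_000  # the conservative vocabulary size used above

for length in (1, 2, 3, 5, 10, 20):
    entries = table_entries(VOCABULARY, length)
    print(f"{length:>2} words: {entries:.2e} possible sequences")

# The 3-word line shows 8.00e+12 (the password figure) and the
# 20-word line shows 1.05e+86 (the size of the table).
```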

There are some issues with using words in a Large Language Model. There are simply too many of them. Although the average person may only use 20,000-35,000 words there are actually many more - around 500,000 to 1 million.

It has been estimated that the vocabulary of English includes roughly 1 million words (although most linguists would take that estimate with a chunk of salt, and some have said they wouldn't be surprised if it is off the mark by a quarter-million); that tally includes the myriad names of chemicals and other scientific entities. Many of these are so peripheral to common English use that they do not or are not likely to appear even in an unabridged dictionary.

Webster's Third New International Dictionary, Unabridged, together with its 1993 Addenda Section, includes some 470,000 entries. The Oxford English Dictionary, Second Edition, reports that it includes a similar number.

The GPT models get around this by using tokens instead of words. In total there are 50,257 tokens in the GPT vocabulary. There’s a really nice online tool that you can use to see how it breaks text up into tokens here.

For example, the phrase “Elephant carpaccio is not something you should eat” gets broken up into 11 tokens even though it only contains 8 words. The longer words “elephant” and “carpaccio” are turned into multiple tokens.
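To get a feel for how a word can be split into pieces, here is a toy tokenizer in Python. It is not the algorithm GPT uses (that is a byte pair encoding learned from data) and the little `VOCAB` set is invented for this example, so the exact splits will differ from the real tool. It simply takes the longest known piece it can find, left to right.

```python
# Hypothetical sub-word vocabulary, invented for this example.
# The real GPT vocabulary has 50,257 entries learned from data.
VOCAB = {
    "ele", "phant", "carp", "acc", "io",
    "is", "not", "something", "you", "should", "eat",
}


def tokenize_word(word, vocab):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, shrinking until one matches.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Nothing matched: fall back to a single-character token.
            pieces.append(word[start])
            start += 1
    return pieces


def tokenize(text, vocab=VOCAB):
    """Tokenize a whole sentence, word by word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(tokenize_word(word, vocab))
    return tokens


tokens = tokenize("Elephant carpaccio is not something you should eat")
print(tokens)
print(len(tokens), "tokens from 8 words")
```

With this made-up vocabulary the common words stay whole while "elephant" and "carpaccio" are built from smaller pieces, so a small set of pieces can cover words it has never seen as a whole.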

Using tokens lets us encode a lot more words compared to just using a fixed vocabulary.

Under the hood, the model doesn’t actually work on “tokens”. Each token is turned into something called an embedding. An embedding is pretty simple: it’s just a bunch of numbers (a vector) that represents the token. For the most capable GPT3 model, each token is represented by 12,288 numbers.

These “embeddings” are learnt by the model during training and help to represent the meaning of each token. Tokens with similar meanings will end up with similar embeddings. These embeddings are actually really powerful just by themselves and there are a lot of interesting applications for them.
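One common way to measure how "similar" two embeddings are is cosine similarity. The sketch below uses made-up 4-number vectors rather than real embeddings (which would come from the model and be far longer), so treat the numbers as illustrative only.

```python
import math

# Made-up 4-number embeddings, invented for this example.
# Real embeddings are learnt by the model and are much longer.
EMBEDDINGS = {
    "cat":    [0.9, 0.8, 0.1, 0.0],
    "kitten": [0.8, 0.9, 0.2, 0.0],
    "car":    [0.1, 0.0, 0.9, 0.8],
}


def cosine_similarity(a, b):
    """Return a score near 1 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print("cat vs kitten:", cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["kitten"]))
print("cat vs car:   ", cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"]))
```

The first score comes out close to 1 and the second is much lower, which is the behaviour you would hope for from tokens with similar and dissimilar meanings.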

However, even if we grouped similar tokens together, they are still different. We still have a very large number of inputs coming into our model, so we still have the problem of our probability table exploding exponentially.

The GPT3 model only has 175 billion parameters, and the models behind ChatGPT can take an input of up to around 4096 tokens (the original GPT3 paper used 2048) - there’s no way you could store every possible combination of tokens in it.
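A rough back-of-the-envelope check in Python shows the mismatch. It counts the possible inputs that are exactly 4096 tokens long, using the figures quoted in this post, and uses logarithms to count digits rather than building the enormous number itself.

```python
import math

VOCABULARY_TOKENS = 50_257     # tokens in the GPT vocabulary
CONTEXT_TOKENS = 4_096         # input length quoted above
PARAMETERS = 175_000_000_000   # GPT3 parameter count


def decimal_digits_of_power(base, exponent):
    """Number of digits in base**exponent, worked out with logarithms."""
    return math.floor(exponent * math.log10(base)) + 1


combination_digits = decimal_digits_of_power(VOCABULARY_TOKENS, CONTEXT_TOKENS)
parameter_digits = len(str(PARAMETERS))

print(f"possible inputs: a number with about {combination_digits:,} digits")
print(f"parameters:      a number with {parameter_digits} digits")
```

The number of possible inputs has tens of thousands of digits, while the parameter count has 12.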

The model has to learn an approximation of the probabilities. There’s a “lossy” compression of the real world happening.

We’re all familiar with lossy compression from looking at JPEG images. There’s a reason why professional photographers like to shoot their pictures in RAW format - they want to avoid losing any information from their pictures.

So, we know that the model is actually learning an approximation of the real world. Obviously, the more parameters the model has the better this approximation will be, but it will never be perfect.

It’s also simply learning to approximate probabilities of combinations of words. It’s not storing facts or algorithms.

This means that when you ask it a factual question, the information simply may not exist in the model. However, what does exist in the model is an approximation of the most likely answer (strictly speaking, the most likely combination of tokens that would follow the tokens in your question).

You may be lucky and it may be that the facts you are looking for are the most likely tokens - but you may be unlucky and the most likely tokens simply look correct.
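Here is a tiny Python illustration of the unlucky case. The probabilities are completely invented (they do not come from any real model) and the prompt is hypothetical. The point is only that picking the most likely continuation involves no check on whether it is true.

```python
# Invented probabilities for what might follow the hypothetical prompt
# "The capital of Australia is". Illustrative only, not from a real model.
NEXT_TOKEN_PROBABILITIES = {
    "Sydney": 0.55,     # plausible, and mentioned a lot, but wrong
    "Canberra": 0.40,   # the correct answer
    "Melbourne": 0.05,
}


def most_likely(probabilities):
    """Pick the continuation with the highest probability.

    Nothing here checks whether the answer is true.
    """
    return max(probabilities, key=probabilities.get)


print(most_likely(NEXT_TOKEN_PROBABILITIES))  # Sydney
```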

One of the amazing things about these models is that they can do anything useful at all. And this is why these Large Language Models are such a breakthrough. Previously to get something useful you would need to train a model for a particular use case. Now, with these very large models, you can just train the model on a whole bunch of text and it can be used to solve multiple problems.

What can we do about this?

The actual GPT3 paper is surprisingly useful and goes into great detail about how the model performs. All the people complaining about how bad ChatGPT is at arithmetic should really go and read the paper and see what the authors said it was capable of (it can just about do simple addition and subtraction on small numbers).

There are several ways to help the model behave in more useful ways. You’ll probably have heard people mention these - but these people often assume that you already know what they are talking about…

Zero-Shot Learning

The default way of using the model is called “zero-shot” learning. We just give the model a prompt (e.g. “You are an AI assistant”) and hope for the best. This works surprisingly well!

You can also make the prompt very detailed. One very interesting approach is to look at the question the user is asking and then find matching text from a database of facts (e.g. user manuals, technical documentation). You then feed these facts in as part of the prompt - if you do this right, the model will use your information to answer the question.
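A very rough sketch of the "find matching facts and put them in the prompt" idea, in Python. The facts, the stop-word list and the word-overlap scoring are all invented for this example. A real system would more likely compare embeddings than count shared words, and the finished prompt would then be sent to the model.

```python
# Hypothetical "database of facts": a few lines from an imaginary user manual.
FACTS = [
    "To reset the device hold the power button for ten seconds.",
    "The battery lasts around eight hours on a full charge.",
    "The warranty covers manufacturing defects for two years.",
]

# Common words that say nothing about the topic (a tiny made-up list).
STOP_WORDS = {"the", "a", "to", "for", "on", "how", "does"}


def words_in(text):
    """Lower-case a sentence and return its meaningful words as a set."""
    words = {word.strip(".,?!") for word in text.lower().split()}
    return words - STOP_WORDS


def find_matching_facts(question, facts, limit=2):
    """Rank facts by how many words they share with the question."""
    question_words = words_in(question)
    scored = [(len(question_words & words_in(fact)), fact) for fact in facts]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [fact for _, fact in scored[:limit]]


def build_prompt(question, facts):
    """Put the matching facts in front of the user's question."""
    matches = find_matching_facts(question, facts)
    context = "\n".join(f"- {fact}" for fact in matches)
    return (
        "You are an AI assistant. Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_prompt("How long does the battery last?", FACTS))
```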

One-Shot Learning

This is the same as zero-shot learning, but you provide an example to the model so that it knows more about what you are trying to do. An example might be a translation bot. You could give it a prompt as follows:

You are a translation tool.

English: Hello
Spanish: Hola

English: Goodbye
Spanish:

Few-Shot Learning

Exactly the same as one-shot learning, except that you give the model multiple examples.
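Since zero-shot, one-shot and few-shot prompts only differ in how many examples they contain, one small helper can build all three. This is a sketch that follows the translation prompt layout above. The function name and the extra example phrases are made up for illustration.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, some worked examples, and the new input."""
    lines = [instruction, ""]
    for english, spanish in examples:
        lines += [f"English: {english}", f"Spanish: {spanish}", ""]
    # Leave the final answer blank for the model to complete.
    lines += [f"English: {query}", "Spanish:"]
    return "\n".join(lines)


EXAMPLES = [
    ("Hello", "Hola"),
    ("Thank you", "Gracias"),
    ("Good morning", "Buenos días"),
]

# Few-shot: several examples. Pass EXAMPLES[:1] for one-shot or [] for zero-shot.
print(build_few_shot_prompt("You are a translation tool.", EXAMPLES, "Goodbye"))
```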

Fine-tuning

This is more complicated and actually involves taking a trained model and then tweaking its parameters by training it on your own text.

This is a very powerful technique as most of the hard work has already been done. The model has learned about language and you now feed it the facts that you want it to memorise.
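As an analogy only, here is what "tweaking the parameters of an already trained model" looks like for a model with a single parameter. Everything in it (the starting weight, the data, the learning rate) is invented for the example. Real fine-tuning adjusts billions of parameters using the model provider's own tooling, but the principle of starting from trained values and nudging them with new examples is the same.

```python
def predict(weight, x):
    """A one-parameter "model": the output is just weight times input."""
    return weight * x


def fine_tune(weight, examples, learning_rate=0.01, epochs=50):
    """Nudge an already-trained weight towards new examples with gradient descent."""
    for _ in range(epochs):
        for x, target in examples:
            error = predict(weight, x) - target
            # The gradient of the squared error with respect to weight is 2 * error * x.
            weight -= learning_rate * 2 * error * x
    return weight


pretrained_weight = 2.0  # pretend this value came out of pre-training
my_data = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]  # new examples following y = 2.5 * x

tuned_weight = fine_tune(pretrained_weight, my_data)
print(pretrained_weight, "->", round(tuned_weight, 3))
```

The weight starts at its pre-trained value and ends up very close to 2.5, the value that fits the new data.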

What’s coming next

Prompt engineering is a new field - we’re still learning how to get the most out of these Large Language Models - what we’ve seen so far are just baby steps.

There are also new models coming soon that have even more parameters - these have huge potential. It’s going to be a wild ride!

Why does ChatGPT make mistakes - a layman's explanation

Written by

Chris Greening

Supported by

atomic14

A collection of slightly mad projects, instructive/educational videos, and generally interesting stuff. Building projects around the Arduino and ESP32 platforms - we'll be exploring AI, Computer Vision, Audio, 3D Printing - it may get a bit eclectic...