Created on: 2024-01-14 ; Updated on: 2024-01-15
Normally, when we consider artificial intelligence, we mainly think of an intelligence that can think for itself. Given today’s computers and the current state of research, we are far from that stage. Everything currently advertised as artificial intelligence is actually machine learning.
Machine learning consists of giving a model example inputs along with the desired outputs. By training the model on this pre-cleaned data, the algorithm learns to guess the most likely output when a person later gives it new inputs. This is purely mathematical statistics; there is no reasoning behind it.
Let’s take a simple and concrete example: spam. We could have a model into which we feed all the words found in an email, and the model tells us the probability that it is spam. To do this, you need a large database containing many emails: many that are legitimate and many that are spam, each marked by a human as spam or not. During training, there is no need to know exactly which words make an email good or bad: by seeing enough examples, the model figures out which words or groups of words make an email legitimate or not. For example, it could internally detect that emails with many spelling and grammatical errors are more likely to be spam.
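The spam example above can be sketched in a few lines of code. This is a minimal naive Bayes classifier written from scratch; the tiny training set is invented for illustration, while a real system would use thousands of human-labelled emails and a proper library:

```python
from collections import Counter

def train(labelled_emails):
    """Count how often each word appears in spam and in legitimate (ham) email."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = {"spam": 0, "ham": 0}
    for text, label in labelled_emails:
        for word in text.lower().split():
            counts[label][word] += 1
            totals[label] += 1
    return counts, totals

def spam_probability(text, counts, totals):
    """Score a new email by comparing word frequencies under each class.

    Uses add-one smoothing so a word never seen in one class
    does not force that class's score to zero."""
    vocab = len(set(counts["spam"]) | set(counts["ham"]))
    scores = {}
    for label in ("spam", "ham"):
        score = 1.0
        for word in text.lower().split():
            score *= (counts[label][word] + 1) / (totals[label] + vocab)
        scores[label] = score
    return scores["spam"] / (scores["spam"] + scores["ham"])

# Toy labelled data, invented for this sketch.
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for tomorrow", "ham"),
    ("please review the attached report", "ham"),
]
counts, totals = train(training_data)
print(spam_probability("free prize now", counts, totals))
print(spam_probability("the meeting report", counts, totals))
```

Exactly as described above, nothing in the code says which words are “spammy”: the word counts alone push “free prize now” toward a high spam probability and “the meeting report” toward a low one.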
The previous example is a model that is very specialized in a single task: categorization. There are many model types for such specific tasks, but there are also models called Large Language Models (LLM); the best known is GPT, which can read and write text in a way that appears to be written by a human. What’s important here is that an answer coming out of the model is not necessarily a good answer: it depends on the data the model was trained on. On a given topic, if the model was trained with good information, the statistically likely next words will form sentences that make sense and genuinely explain the topic. On the other hand, if it was trained with false or erroneous information, it will produce output that is not useful and should not be taken as true, even if the text seems very convincing. The important thing is to always check the sources.
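The idea of “statistically likely next words” can be illustrated with a toy bigram model. This is only a sketch on a made-up three-sentence corpus: real LLMs use neural networks over subword tokens, not word-pair counts, but the principle of modelling next-token probabilities is the same:

```python
from collections import Counter, defaultdict

# Tiny corpus, invented for illustration.
corpus = (
    "the cat sat on the mat . "
    "the cat ate the fish . "
    "the dog sat on the rug ."
).split()

# Count bigrams: followers[w] maps each word to a Counter
# of the words that follow it in the corpus.
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def generate(start, length):
    """Greedily emit the most frequent next word, `length` times."""
    words = [start]
    for _ in range(length):
        candidates = followers[words[-1]].most_common(1)
        if not candidates:
            break
        words.append(candidates[0][0])
    return " ".join(words)

print(generate("the", 4))
```

Notice that the generator has no idea what a cat is; it only continues with whatever word most often followed the previous one in its training data, which is why output quality depends entirely on that data.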
The purpose of this article is to provide a quick overview of the concepts of machine learning without going too deep into actual implementations. It also has some pointers on how to start using existing models online or locally.
My goal is to use this as a reference when I will create videos on the subject.
When we want to use machine learning, we need multiple parts:
A generative model is a model that is trained to generate content like text, images, sounds, etc.
A language model is a generative model that is trained to generate text. It is trained on a large corpus of text and is able to generate text that is similar to the text it was trained on. Some models have more functionalities like being able to reason a bit or generate code.
In that category, we often see the acronym GPT, which stands for Generative Pre-trained Transformer.
Examples of language models:
| Model | Company | Can be used commercially | Website | Trained on Dataset | How to use it |
|---|---|---|---|---|---|
| GPT-3 and GPT-4 | OpenAI | Yes | OpenAI / ChatGPT | | Online service ChatGPT or Bing |
| Mistral OpenOrca | Mistral AI | Yes | Huggingface | OpenOrca | Free software GPT4All |
| Mistral Instruct | Mistral AI | Yes | Huggingface | A variety of publicly available conversation datasets | Free software GPT4All |
| Mixtral 8x7B | Mistral AI | Yes | Huggingface | | Download the GGUF you need on Huggingface and use the free software LLaMA.CPP |
| Orca 2 | Microsoft Research | No | Microsoft Research Blog | | Free software GPT4All |
| LLaMA 2 | Meta AI | Yes | LLaMA | | Free software GPT4All |
| GPT4All Falcon | Nomic AI | Yes | Huggingface | | Free software GPT4All |
| Hermes | Nous Research | No | Huggingface | Synthetic GPT-4 outputs | Free software GPT4All |
This is easy-to-use software with a graphical interface. It can download common models for you so you can start using them right away.
This software is a bit more complicated to use because it only offers a command-line interface.
```
.\main.exe -m mixtral-8x7b-v0.1.Q4_K_M.gguf -p "Generate a PDF in php using the TCPDF library"
```
Common ones are:
An image model is a generative model that is trained to generate images.
Examples of image models:
| Model | Company | Can be used commercially | Website | How to use it |
|---|---|---|---|---|
| DALL-E 3 | OpenAI | Yes | OpenAI | Online service Bing in the creative style |
| Stable Diffusion | Stability-AI | Yes | Stability AI | GitHub repository has the instructions to use with Python |
You may have noticed that some models have a number in their name, like 7B or 13B. This number is the count of parameters, which are the internal variables the model learns through training. When a model is said to have 7 billion parameters, it has 7 billion such internal variables, and these parameters capture patterns in the data the model was trained on. The number of parameters is often indicative of the model’s complexity and capacity to learn: a model with more parameters can often capture more complex patterns, but it also requires more data to train effectively and is more computationally intensive.
However, it’s important to note that more parameters do not always mean better performance. The effectiveness of a model also depends on factors like the quality of the training data, the architecture of the model, and the specifics of the task at hand.
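To make “parameters” concrete, here is a back-of-the-envelope count for a small fully connected network. The layer sizes are made up for illustration; real LLM architectures are transformers, whose parameters live in attention and feed-forward matrices, but the counting principle is the same:

```python
# Parameter count for a small fully connected network: each layer has
# a weight matrix (inputs x outputs) plus one bias per output neuron.
# The layer sizes below are hypothetical, chosen only for illustration.
layer_sizes = [512, 1024, 1024, 256]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out  # one weight per input/output connection
    biases = n_out          # one bias per output neuron
    total += weights + biases

print(total)
```

This toy network has under two million parameters; a 7B model has roughly 3,800 times as many, which is why the hardware requirements scale up so quickly.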
When a model is trained, it works with floating-point numbers (like 32-bit floats). To make it smaller and faster to run, it can be converted to lower-precision numbers (like 8-bit integers). This conversion is called quantisation. If we take a look at https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF/blob/main/README.md#provided-files , we can see different values like “Q2”, “Q3”, “Q4”, “Q5”, “Q6” and “Q8”. The more bits, the more precise the model is, but it uses more disk space and more RAM. So it’s a tradeoff between precision and resource usage.
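The disk and RAM trade-off can be estimated directly: a parameter stored at b bits takes b/8 bytes. This gives only a rough lower bound, since real GGUF files also store per-block scaling factors and metadata on top of the raw weights:

```python
# Rough lower bound on model size at different quantisation levels:
# bytes = parameters * bits / 8. Actual GGUF files are somewhat larger
# because quantised blocks also carry scaling factors and metadata.
def approx_size_gib(n_params, bits):
    return n_params * bits / 8 / 2**30

n_params = 7_000_000_000  # a "7B" model
for bits in (2, 4, 8, 16, 32):
    print(f"Q{bits}: ~{approx_size_gib(n_params, bits):.1f} GiB")
```

For example, a 7B model at Q4 needs at least about 3.3 GiB, while the same model in 32-bit floats would need over 26 GiB, which is why quantised models are what most people can actually run locally.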