Created on: 2024-01-14 ; Updated on: 2024-01-15

AI - Machine learning - Concepts

Intro

When we think of artificial intelligence, we usually imagine a machine that can think for itself. With today's computers and the current state of research, we are far from that stage. Most of what is currently advertised as artificial intelligence is actually machine learning.

Machine learning consists of giving a model example inputs along with the desired outputs, and training it on this pre-cleaned data. Once trained, when a person gives it new inputs, the algorithm tries to guess the most likely output. This is purely mathematical statistics; there is no reasoning behind it.

Let’s take a simple and concrete example: spam. We could have a model into which we feed all the words found in an email, and the model tells us the probability that it is spam. To build it, you need a large database of emails: many that are legitimate and many that are spam. Each of these emails is labeled by a human as spam or not. During training, there is no need to know exactly which words make an email good or bad: by seeing enough examples, the model figures out which words or groups of words make an email legitimate or not. For example, it could internally detect that emails with a lot of spelling and grammatical errors are more likely to be spam.
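
To make this concrete, here is a minimal sketch of such a classifier in Python with scikit-learn. The handful of emails and their labels are invented for illustration; a real system would need thousands of examples:

  # Minimal spam classifier sketch using scikit-learn (assumed installed).
  # The tiny dataset below is invented purely for illustration.
  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.naive_bayes import MultinomialNB
  from sklearn.pipeline import make_pipeline

  emails = [
      "Meeting tomorrow at 10am to review the project",
      "Congratulation!! You winn a free prize, click herre now",
      "Please find the invoice attached for last month",
      "Urgent!! Claim you money now, limited offerr",
  ]
  labels = [0, 1, 0, 1]  # 0 = legitimate, 1 = spam (labeled by a human)

  # Turn words into counts, then learn which words correlate with spam.
  model = make_pipeline(CountVectorizer(), MultinomialNB())
  model.fit(emails, labels)

  # Probability that a new, unseen email is spam (column 1 of predict_proba).
  new_email = ["You winn a prize, click herre"]
  print(model.predict_proba(new_email)[0][1])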

The previous example is a model that is very specific to a single task: categorization. There are several types of models for very specific tasks, but there are also models called Large Language Models (LLMs), the best known being GPT, which can read and write text in a way that appears to be written by a human. What is important here is that what comes out as an answer is not necessarily a good answer. It depends on what data the model was trained on. On any topic, if the model has been trained with good information, the statistically likely next words will form sentences that make sense and really explain the topic. On the other hand, if it has been trained with false or erroneous information, it will produce text that is not useful and should not be taken as true, even if it seems very convincing. The important thing is to always check the sources.

Purpose

The purpose of this article is to provide a quick overview of the concepts of machine learning without going too deep into the actual implementations. It also has some pointers on how to start using some existing models online or locally.

My goal is to use this as a reference when I create videos on the subject.

Process

When we want to use machine learning, we need multiple parts (a minimal end-to-end sketch in Python follows the list):

  • Choose an algorithm (which is an architecture of the model). Here are some examples:
    • Convolutional Neural Network (CNN) (for image classification tasks)
    • Recurrent Neural Network (RNN) for sequence data like text or time series
    • Transformer (for GPT-like models)
    • Stable Diffusion (for generating images)
    • Custom ones created with frameworks like TensorFlow
  • Get the training data (a dataset)
  • Train a model using the algorithm and the training data. That gives us a model which contains all the weights or parameters to use.
  • Use the model and the algorithm to make predictions in your application.
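
To illustrate these steps end to end, here is a minimal sketch in Python with TensorFlow. The dataset is made up (the label is 1 when the two features sum to more than 1), and the tiny network is just an example architecture:

  # Minimal sketch of the whole process with TensorFlow (assumed installed).
  import numpy as np
  import tensorflow as tf

  # 1. Choose an algorithm (here, a tiny custom feed-forward network).
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(8, activation="relu", input_shape=(2,)),
      tf.keras.layers.Dense(1, activation="sigmoid"),
  ])
  model.compile(optimizer="adam", loss="binary_crossentropy")

  # 2. Get the training data (a made-up dataset for illustration).
  X = np.random.rand(1000, 2)
  y = (X.sum(axis=1) > 1.0).astype(np.float32)

  # 3. Train the model; the result is the set of learned weights (parameters).
  model.fit(X, y, epochs=10, verbose=0)

  # 4. Use the trained model to make predictions in your application.
  print(model.predict(np.array([[0.9, 0.8]])))  # should be close to 1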

Model

Generative Model

A generative model is a model that is trained to generate content like text, images, sounds, etc.

Language Model

A language model is a generative model that is trained to generate text. It is trained on a large corpus of text and is able to generate text that is similar to the text it was trained on. Some models have more functionalities like being able to reason a bit or generate code.

In that category, we see the GPT acronym that stands for Generative Pre-trained Transformer.
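
As a small hands-on sketch, generating text with a pretrained model can look like this in Python with the Hugging Face transformers library. The small GPT-2 model is used here only because it downloads quickly, not because it produces good answers:

  # Hedged sketch: generate text with a small pretrained language model.
  # Requires the "transformers" package; the model weights are downloaded
  # automatically on first use.
  from transformers import pipeline

  generator = pipeline("text-generation", model="gpt2")
  result = generator("Machine learning is", max_length=30)
  print(result[0]["generated_text"])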

Examples of language models:

| Model | Company | Can be used commercially | Website | Trained on Dataset | How to use it |
| --- | --- | --- | --- | --- | --- |
| GPT-3 and GPT-4 | OpenAI | Yes | OpenAI / ChatGPT | | Online service ChatGPT or Bing |
| Mistral OpenOrca | Mistral AI | Yes | Huggingface | OpenOrca | Free software GPT4All |
| Mistral Instruct | Mistral AI | Yes | Huggingface | A variety of publicly available conversation datasets | Free software GPT4All |
| Mixtral 8x7B | Mistral AI | Yes | Huggingface | | Download the GGUF you need on Huggingface and use the free software LLaMA.CPP |
| Orca 2 | Microsoft Research | No | Microsoft Research Blog | | Free software GPT4All |
| LLaMA 2 | Meta AI | Yes | LLaMA | | Free software GPT4All |
| GPT4All Falcon | Nomic AI | Yes | Huggingface | | Free software GPT4All |
| Hermes | Nous Research | No | Huggingface | Synthetic GPT-4 outputs | Free software GPT4All |

GPT4All

This is easy-to-use software with a graphical interface. It can download common models for you so you can start using them right away.

  1. Download the installer from https://gpt4all.io (Windows, OSX, Ubuntu)
  2. Install
  3. Choose the models to download
  4. Start using it
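
GPT4All also offers a Python binding, which can be handy for scripting. Here is a minimal sketch, assuming the gpt4all package is installed; the model file name below is only an example and may differ in your version:

  # Hedged sketch: using the GPT4All Python binding instead of the GUI.
  # Requires "pip install gpt4all"; the model name below is an example
  # and may need to be adjusted to a model available in your version.
  from gpt4all import GPT4All

  model = GPT4All("mistral-7b-openorca.Q4_0.gguf")
  print(model.generate("Explain machine learning in one sentence.", max_tokens=60))
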
LLaMA.CPP

This software is a bit more complicated to use because it is command-line only.

  1. Download the software from https://github.com/ggerganov/llama.cpp/releases (Windows)
    • Choose the build that matches your CPU's capabilities
  2. Open a command line like PowerShell
  3. Execute a prompt command like .\main.exe -m mixtral-8x7b-v0.1.Q4_K_M.gguf -p "Generate a PDF in php using the TCPDF library"

File format (for sharing models)

Common ones are:

  • GGML (GPT-Generated Model Language)
  • GGUF (GPT-Generated Model Unified Format), which is newer and has replaced GGML in LLaMA.CPP
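
As a small illustration of the format, a GGUF file starts with the four ASCII bytes "GGUF", so it can be recognized like this in Python (the file name is the one from the LLaMA.CPP example above):

  # Hedged sketch: detect a GGUF model file by its magic bytes.
  def is_gguf(path: str) -> bool:
      with open(path, "rb") as f:
          return f.read(4) == b"GGUF"

  print(is_gguf("mixtral-8x7b-v0.1.Q4_K_M.gguf"))  # True for a GGUF file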

Image Model

An image model is a generative model that is trained to generate images.

Examples of image models:

| Model | Company | Can be used commercially | Website | How to use it |
| --- | --- | --- | --- | --- |
| DALL-E 3 | OpenAI | Yes | OpenAI | Online service Bing in the creative style |
| Stable Diffusion | Stability-AI | Yes | GitHub | Repository has the instructions to use it with Python |
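
As an alternative sketch to the official repository instructions, image generation can also be done in Python with the Hugging Face diffusers library. The model ID below is one published checkpoint, the download is several gigabytes, and generation is very slow without a GPU:

  # Hedged sketch: generating an image with Stable Diffusion via the
  # "diffusers" package (pip install diffusers transformers torch).
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
  image = pipe("a watercolor painting of a lighthouse at sunset").images[0]
  image.save("lighthouse.png")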

Parameters

You may have noticed that some models have a number in their name, like 7B or 13B. This is the number of parameters: the internal variables that the model learns through training. When a model is said to have 7 billion parameters, it has 7 billion such internal variables, and these parameters capture patterns in the data the model was trained on. The number of parameters is often indicative of the model’s complexity and capacity to learn: a model with more parameters can often capture more complex patterns, but it also requires more data to train effectively and is more computationally intensive.

However, it’s important to note that more parameters do not always mean better performance. The effectiveness of a model also depends on factors like the quality of the training data, the architecture of the model, and the specifics of the task at hand.

Quantisation

A model is usually trained with floating-point numbers (like 32-bit floats). To make it smaller and faster to run, its weights can be converted to lower-precision numbers (like 8-bit or even 4-bit integers). This is called quantisation. If we take a look at https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF/blob/main/README.md#provided-files , we can see different values like “Q2”, “Q3”, “Q4”, “Q5”, “Q6” and “Q8”. The more bits, the more precise the model is, but it uses more disk space and more RAM. So it’s a tradeoff between quality on one side and size and speed on the other.
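
To give an idea of the scale, here is a rough back-of-the-envelope calculation of the in-memory size of a 7-billion-parameter model at different precisions. Real files add some overhead, and formats like Q4_K_M mix precisions, so these numbers are only approximations:

  # Approximate in-memory size of a 7B-parameter model per precision.
  params = 7_000_000_000
  for bits in (32, 16, 8, 4):
      gib = params * bits / 8 / 1024**3
      print(f"{bits}-bit: ~{gib:.1f} GiB")
  # 32-bit: ~26.1 GiB, 16-bit: ~13.0 GiB, 8-bit: ~6.5 GiB, 4-bit: ~3.3 GiB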