So 2023 started with a big buzz around ChatGPT. OpenAI released ChatGPT at the end of November 2022, and soon the internet was abuzz with excitement. The ChatGPT app gained some 1 million users after 5 days, and 10 million users after 40 days. This is far faster than previous upstarts like Instagram, Twitter etc…
So what exactly is ChatGPT? It's an intelligent chatbot driven by the GPT-3 model. GPT stands for "Generative Pre-trained Transformer". Essentially, it's a neural network with 175 billion trainable parameters (weighted connections), one of the largest trained neural networks to date. Microsoft's "Turing-NLG" model, at 17 billion parameters, was the largest before it, and GPT-4 is rumored to reach around 500 billion parameters. For comparison, the human brain has approximately 86 billion neurons. The training process combines supervised learning with reinforcement learning from human feedback. Architecturally, the model involves components such as the encoder, the decoder, the language-model head, pre-training, and fine-tuning.
So ChatGPT has sparked lots of interesting conversations and use cases. People have started to apply it to work, assignments, content generation, coding, and general life questions. This was made possible by training the model on hundreds of gigabytes of text. Will it surpass the Google search engine simply by being able to compose a more intelligent answer than a list of search results?
- The Transformer architecture: The Transformer architecture is the foundation of ChatGPT. It is a neural network architecture that uses self-attention mechanisms to process input sequences. The transformer architecture is able to handle input sequences of varying lengths and allows for parallel processing of the input.
- The Encoder: The encoder is composed of multiple layers of self-attention and feed-forward neural networks. It processes the input text and builds a representation of it. (Note that GPT models actually use a decoder-only variant of the Transformer, so they have no separate encoder stack.)
- The Decoder: The decoder is also composed of multiple layers of (masked) self-attention and feed-forward neural networks. It generates the output text one token at a time.
- The Language Model Head: The language model head is a linear layer with weights that are learned during pre-training. It is used to predict the next token in the sequence, given the previous tokens.
- The Dialogue Generation Head: The dialogue generation head is a linear layer with weights that are learned while fine-tuning the model on conversational data. It is used to generate the response to a given prompt in the context of a dialogue.
- Pre-training: ChatGPT is pre-trained on a large dataset of text, which enables it to generate human-like text in response to a given prompt.
- Fine-Tuning: The model is fine-tuned on conversational data to improve its ability to generate responses in the context of a dialogue.
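The architectural pieces above can be sketched in a few lines of code. Below is a minimal, illustrative sketch (not OpenAI's actual implementation) of masked self-attention followed by a language-model head, written in NumPy; all sizes and random weights here are toy assumptions for demonstration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence x of shape (T, d)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # (T, T) attention scores
    # Causal mask: each position may only attend to itself and earlier
    # tokens, as in a decoder (GPT-style) block.
    T = x.shape[0]
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    return softmax(scores) @ V                    # (T, d) weighted values

def lm_head(h, W_vocab):
    """Language-model head: project hidden states to vocabulary logits and
    return next-token probabilities for the last position."""
    logits = h @ W_vocab                          # (T, vocab)
    return softmax(logits[-1])

rng = np.random.default_rng(0)
T, d, vocab = 4, 8, 10                            # toy sizes
x = rng.normal(size=(T, d))                       # stand-in for token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
h = self_attention(x, Wq, Wk, Wv)
probs = lm_head(h, rng.normal(size=(d, vocab)))
print(probs.shape)                                # next-token distribution, sums to ~1
```

A real GPT model stacks many such attention blocks (plus feed-forward layers, residual connections, and layer normalization), but the "predict the next token" flow is the same shape as this sketch.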
Training is another interesting consideration. How much data is needed to train GPT-3? About 570 gigabytes of filtered text, distilled from roughly 45 terabytes of raw data. How long did it take? Hundreds of GPU-years of compute, by most estimates. ChatGPT is additionally trained, via human feedback, to avoid returning harmful answers. As the size of the model increases, training time and data requirements will also significantly increase.
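To put "how long did it take?" into numbers, a standard back-of-envelope rule for dense Transformers is that training compute is about 6 × N × D floating-point operations for N parameters and D training tokens. A quick sketch (the ~300 billion training tokens is the figure reported for GPT-3; the 100 TFLOP/s sustained GPU throughput is an illustrative assumption, not a measured figure):

```python
# Back-of-envelope training-compute estimate: C ≈ 6 * N * D FLOPs
params = 175e9           # GPT-3 parameter count
tokens = 300e9           # training tokens reported for GPT-3
flops = 6 * params * tokens
print(f"{flops:.2e} FLOPs")                       # ~3.15e+23

# Assume a GPU sustaining 100 TFLOP/s (illustrative assumption)
gpu_flops_per_s = 100e12
gpu_seconds = flops / gpu_flops_per_s
print(f"{gpu_seconds / 86400 / 365:.0f} GPU-years at 100 TFLOP/s")
```

Roughly 100 GPU-years at that throughput, which is why training runs of this size are spread across thousands of accelerators for weeks.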
A concern for the future is that ChatGPT models are biased. Another concern is that humans may become progressively obsolete at some tasks. Ethics for AI is going to be crucial for humanity.
- GPT-1 (Generative Pre-trained Transformer 1) was the first version of the GPT series, released by OpenAI in June 2018. It was pre-trained on the BookCorpus dataset (roughly 5GB of text) and had a capacity of about 117 million parameters.
- GPT-2 (Generative Pre-trained Transformer 2) was released shortly after, in February 2019. It was pre-trained on a much larger dataset of about 40GB of text (WebText) and had a capacity of 1.5 billion parameters, making it more than ten times larger than GPT-1.
- GPT-3 (Generative Pre-trained Transformer 3) was released in 2020. It was pre-trained on a massive dataset of roughly 570GB of filtered text and has a capacity of 175 billion parameters. It can handle a wide range of language tasks, such as text generation, language translation, and question answering, often with little or no task-specific fine-tuning.
- GPT-4 (Generative Pre-trained Transformer 4) has not been released yet. It is rumored to be pre-trained on many terabytes of text data and to have a capacity of over 500 billion parameters, with the aim of even more accuracy and fluency than GPT-3.
Current Limitations of ChatGPT
- GPT-3 lacks long-term memory — the model does not learn anything from long-term interactions like humans.
- Lacks interpretability — this is a problem that affects extremely large and complex models in general. GPT-3 is so large that it is difficult to interpret or explain the output that it produces.
- Limited input size — transformers have a fixed maximum input size, which means that prompts to GPT-3 cannot exceed roughly 2,048 tokens (on the order of a few pages of text).
- Slow inference time — because GPT-3 is so large, it takes more time for the model to produce predictions. Imagine how long GPT-4 will take?
- GPT-3 suffers from bias — all models are only as good as the data used to train them, and GPT-3 is no exception. The data behind GPT-3 and other large language models contains biases, including hate speech as well as religious and political biases.
- Training Time – With GPT-4 rumored to come in at 500B parameters, that would be roughly a 2.9x increase in parameters over GPT-3. Is the trajectory slowing down?
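Several of these limitations (slow inference, training time) come down to raw scale, which simple arithmetic makes concrete. The sketch below assumes 2 bytes per parameter (fp16 weights), a common but not universal deployment choice, and uses the rumored 500B GPT-4 figure, which is unconfirmed:

```python
def model_memory_gb(params, bytes_per_param=2):
    """Rough memory needed just to hold the weights (fp16 assumed)."""
    return params * bytes_per_param / 1e9

gpt3 = 175e9
gpt4_rumored = 500e9   # rumored figure, not confirmed by OpenAI

print(f"GPT-3 weights: ~{model_memory_gb(gpt3):.0f} GB")          # ~350 GB
print(f"Rumored GPT-4 weights: ~{model_memory_gb(gpt4_rumored):.0f} GB")
print(f"Parameter growth: {gpt4_rumored / gpt3:.1f}x")
```

At ~350 GB just for the weights, GPT-3 cannot fit on any single GPU, so every prediction has to be sharded across a cluster, which is a big part of why inference is slow.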
Here are some links to articles: