Introduction
The Tensor Collective
Machine learning is inherently simple. Take a random matrix, apply it to your input, and compute a number that measures how far you are from the target for that input. Then use an automatic differentiation package to calculate how much each parameter needs to change to get a better model. Repeat this enough times and you’re on your way to the vast majority of deep learning algorithms....
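To make that loop concrete, here is a minimal sketch in PyTorch: a toy linear model trained by plain gradient descent. The data, shapes, and learning rate are illustrative assumptions, not code from the post.

```python
import torch

# Toy data (illustrative): inputs x and noisy linear targets y.
x = torch.randn(100, 3)
y = x @ torch.tensor([[2.0], [-1.0], [0.5]]) + 0.1 * torch.randn(100, 1)

# Take a random matrix: a single weight matrix initialised at random.
W = torch.randn(3, 1, requires_grad=True)

for step in range(200):
    # Apply it to the input and measure how far we are from the target.
    loss = ((x @ W - y) ** 2).mean()

    # Autodiff computes how much each parameter should change.
    loss.backward()
    with torch.no_grad():
        W -= 0.05 * W.grad  # gradient descent step
        W.grad.zero_()
```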
From Matrices to Transformers
When I first read about the transformer architecture, I found the reasoning behind the decisions that govern the attention mechanism difficult to decipher from the equations and the many “let’s build a transformer from scratch in PyTorch” articles alone. I’ll give my best attempt here at decomposing its building blocks for those interested in understanding the lower-level components behind the great models we see today....
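For a taste of what those building blocks look like, here is a minimal sketch of single-head scaled dot-product attention in PyTorch. The function name and tensor shapes are illustrative assumptions, not the post’s own code.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Minimal single-head attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    # Similarity of every query against every key, scaled so the
    # softmax does not saturate as d_k grows.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)
    # Each output is a weighted average of the value vectors.
    return weights @ v

# Illustrative shapes: a sequence of 4 tokens, each of dimension 8.
q = k = v = torch.randn(4, 8)
out = scaled_dot_product_attention(q, k, v)
```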
Hallucination isn't a bug, it's a feature
Simple insights into the cognition behind large language models. With the advent of large language models (LLMs), much research and debate has naturally revolved around the flaws these models exhibit. Ever wondered how ChatGPT manages to overlook aspects of a problem, but will immediately apologise and correct its error once prompted? Here, we gently introduce how hallucinations relate to the cognition ongoing within a large language model, and discuss ways to probe the true knowledge within these models....