
Introduction
The Tensor Collective. Machine learning is inherently simple: take a random matrix, apply it to your input, and calculate a number that tells you how close you are to your target for that input. Now use an automatic differentiation package to work out how much you need to change the parameters to get a better model. Repeat this enough times and you’re on your way to the vast majority of deep learning algorithms....
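As a rough illustration of that loop, here is a minimal PyTorch sketch (not code from the post; the linear model, mean-squared-error loss, learning rate, and toy data are assumptions made for the example):

```python
import torch

# A "random matrix": a single linear layer with randomly initialised weights.
model = torch.nn.Linear(in_features=4, out_features=1)
loss_fn = torch.nn.MSELoss()                      # a number measuring closeness to the target
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Toy data standing in for real inputs and targets (assumed for the example).
inputs = torch.randn(32, 4)
targets = torch.randn(32, 1)

for step in range(1000):                          # "repeat this enough times"
    predictions = model(inputs)                   # apply the matrix to your input
    loss = loss_fn(predictions, targets)          # how close are we to the target?
    optimizer.zero_grad()
    loss.backward()                               # automatic differentiation
    optimizer.step()                              # nudge the parameters towards a better model
```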

Efficiency is good, but scale is better
Jan 9, 2025. Introduction. I recently came across a very interesting paper from Meta, Pagnoni et al. 2024. Its title, Byte Latent Transformer: Patches Scale Better Than Tokens, carries a rather interesting assumption: it may be more desirable to scale better than to simply perform better. The main result of the paper is, unsurprisingly, that their new transformer, BLT, scales better than previous SOTA techniques (Figure 1, Pagnoni et al. 2024)....

From Matrices to Transformers
Introduction. When I first read about the transformer architecture, I found the reasoning behind the decisions that govern the attention mechanism difficult to decipher from the equations and the many “let’s build a transformer from scratch in PyTorch” articles alone. I’ll give my best attempt here to decompose its building blocks for those interested in understanding the lower-level components that lead to the great models we see today....
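For a concrete anchor before diving into the full post, here is a minimal sketch of the scaled dot-product attention at the heart of that architecture (the tensor shapes and names are assumptions for illustration, not the post’s own code):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k**0.5   # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)           # rows sum to 1: a weighting over positions
    return weights @ V                            # weighted sum of the value vectors

# Toy example: a sequence of 5 tokens, each embedded in 8 dimensions (assumed sizes).
x = torch.randn(5, 8)
out = scaled_dot_product_attention(x, x, x)       # self-attention: Q, K, V all come from x
print(out.shape)                                  # torch.Size([5, 8])
```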

Hallucination isn't a bug, it's a feature
Simple insights into the cognition behind large language models. With the advent of large language models (LLMs), much research and debate has naturally revolved around the flaws these models exhibit. Ever wondered how ChatGPT manages to overlook something, yet will immediately apologise and correct its error once prompted? Here, we gently introduce how hallucinations relate to the cognition ongoing within a large language model, and discuss ways to probe the true knowledge within these models....