Introduction

The scaling laws for neural language models showed that cross-entropy loss follows a power law in each of three factors:

1. Dataset size
2. Model parameters
3. Training compute (steps or epochs)

The lower the cross-entropy loss, the better the model's next-token prediction. Since prediction and compression are deeply linked (The...
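As an illustration of the parameter-count law, here is a minimal sketch of the power-law form L(N) = (N_c / N)^α from Kaplan et al. (2020); the constants below are approximate values reported there, used here purely for illustration:

```python
def loss_vs_params(n_params, n_c=8.8e13, alpha=0.076):
    """Approximate cross-entropy loss (nats/token) as a power law
    in non-embedding parameter count N, per Kaplan et al. (2020).
    Constants are illustrative, not exact fitted values."""
    return (n_c / n_params) ** alpha

# Loss falls smoothly as the model grows:
for n in (1e7, 1e8, 1e9, 1e10):
    print(f"N={n:.0e}  L≈{loss_vs_params(n):.3f}")
```

The same functional form, with different constants, applies to dataset size and compute.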
The Platonic Representation Hypothesis (PRH) suggests that models trained with different objectives and on various modalities can converge to a shared statistical understanding of reality. While this is an intriguing idea, the initial experiments in the paper focused on image-based models (like ViT) trained on the same pretraining dataset (ImageNet). This...
Recent work has demonstrated that transformer models can perform complex reasoning tasks using Chain-of-Thought (CoT) prompting, even when the CoT is replaced with filler characters. This post summarizes our investigation into methods for decoding these hidden computations, focusing on the 3SUM task.

Background

1. **Chain-of-Thought (CoT) Prompting**: A technique that...
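As background, the classic 3SUM decision problem asks whether any three elements of a list sum to zero (the exact variant used in the work summarized here may differ, e.g. modular sums); a minimal sketch:

```python
from itertools import combinations

def three_sum_exists(nums):
    """Return True if some triple of distinct-index elements sums to zero.
    Brute-force O(n^3) check over all index triples."""
    return any(a + b + c == 0 for a, b, c in combinations(nums, 3))

print(three_sum_exists([1, 2, -3]))  # a zero-sum triple exists
print(three_sum_exists([1, 2, 4]))   # no triple sums to zero
```

The brute-force version is deliberately simple; the interesting question is how a transformer computes this without writing out intermediate steps.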
Consider an adversarial training scheme for solving goal misgeneralization (here I consider Redwood Research's work on "Adversarial Training for High-Stakes Reliability"). Consider a model that was trained to perform a specific task. To guarantee worst-case performance for this model, we need bounds on its outputs on adversarial examples. However,...
The aim of the Hutter Prize is to compress the first 1GB of Wikipedia (enwik9) to the smallest possible size. From the AIXI standpoint, compression is equivalent to AI, and if we can compress this file to the ideal size (75MB according to Shannon's lower estimate), then the compression algorithm is equivalent...
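The 75MB figure can be reproduced with back-of-envelope arithmetic, assuming the low end (~0.6 bits per character) of Shannon's estimated entropy range for English text:

```python
# Back-of-envelope estimate of the ideal compressed size of 1GB of text,
# assuming ~0.6 bits/char (the low end of Shannon's entropy estimate
# for English, which is an assumption, not a measured value for enwik9).
bits_per_char = 0.6
n_chars = 1_000_000_000   # ~1e9 bytes of text
ideal_bytes = n_chars * bits_per_char / 8
print(f"{ideal_bytes / 1e6:.0f} MB")  # ≈ 75 MB
```

At the high end of Shannon's range (~1.3 bits/char) the same arithmetic gives roughly 160MB, so the 75MB target is an optimistic lower bound.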
Imagine that you are a trained mathematician and you have been assigned the job of testing an arbitrarily intelligent chatbot for its intelligence. Being knowledgeable about a fair amount of computer science theory, you won't test it with the likes of the Turing test or similar, since such a bot might not have...