The ones who are most successful at writeathons (Inkhaven, NaNoWriMo) are those with an overhang of things to say, usually in the form of: * draft posts * daydreams When Scott Alexander said: > Whenever I see a new person who blogs every day, it's very rare that that never...
This is my project proposal for Pivotal. Apply as a mentee by May 3rd. The field has accumulated a vocabulary of computational primitives (induction heads, skip-trigrams) through post-hoc analysis. We propose building a toy language from these known primitives to train tensor-transformers (see an early example in the last section)...
AKA scalable oversight of value drift. TL;DR: LLMs could be aligned but then corrupted through RL, instrumentally converging on deep consequentialism. If LLMs are sufficiently aligned and can properly oversee their training updates, they can prevent this. SOTA models can arguably be considered ~aligned,[1] but this isn't my main...
What's the best case scenario regarding OpenAI's contract w/ the Department of War (DoW)? * We have access to the full contract * It's airtight * OAI's engineers are on top of things in case the DoW breaks the contract * There's actual teeth for violations But even then, the...
I'll have a weird dream and wake up in a funk. Be overwhelmed w/ work. Read lots of news/reddit and become very upset or angry. Obviously it's good to feel these things, but sometimes I continue to feel awful no matter how hard I try to "process my feelings" (or...
I've been researching tensor networks as a more interpretable architecture, but whenever I tell people this, they always ask "But is it any good?" So I trained multiple 500M parameter LLMs on fineweb, showing the tensor variant needed ~4% more batches of data to match CE-loss. There's a few caveats,...
I was doing the online version of the Jhourney retreat where they try to teach you the jhanas (narrator: he did not learn the jhanas). Part of what was taught was to work on your curiosity, which I chose to practice by noticing surprisal. It's ~impossible to predict low-level details of...