LLM Self-Reference Language in Multilingual vs English-Centric Models
My first post explored "self-talk" induction in small base LLMs. After further reflection, I decided I should first better understand how LLMs represent "self" mechanistically before examining induced "self-talk". How do language models process questions about themselves? I've started by analyzing attention entropy patterns[1] across self-referent prompts ('Who...
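As a concrete starting point, attention entropy here means the Shannon entropy of each query position's attention distribution: low entropy means a head focuses on a few tokens, high entropy means it spreads attention broadly. A minimal NumPy sketch of the metric (the `attention_entropy` and `softmax` helpers are illustrative names; in a real run you would take the weights from the model itself, e.g. via `output_attentions=True` in Hugging Face transformers):

```python
import numpy as np

def attention_entropy(attn: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of each row of an attention weight tensor.

    attn has shape (..., seq_q, seq_k); each row sums to 1 (post-softmax).
    Low entropy = focused attention; high entropy = diffuse attention.
    """
    p = np.clip(attn, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=-1)

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy stand-in for real attention weights: 2 heads over a 5-token prompt.
rng = np.random.default_rng(0)
attn = softmax(rng.normal(size=(2, 5, 5)))
ent = attention_entropy(attn)
print(ent.shape)  # one entropy value per head, per query position: (2, 5)
```

Comparing these per-head entropy profiles between self-referent prompts and matched neutral controls is one simple way to quantify whether "self" questions change how attention is allocated.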
Oct 22, 2025