Was on Vivek Hebbar's team at MIRI, now working with Adrià Garriga-Alonso on various empirical alignment projects.
I'm looking for projects in interpretability, activation engineering, and control/oversight; DM me if you're interested in working with me.
I talked about this with Lawrence, and we both agree on the following:
To some degree yes, but I expect lots of information to be spread out across time. For example: OpenAI releases GPT-5 benchmark results. Then a couple of weeks later they deploy it on ChatGPT, and we can see how subjectively impressive it is out of the box, and whether it is obviously pursuing misaligned goals. Over the next few weeks, people develop post-training enhancements like scaffolding, and we get a better sense of its true capabilities. Over the next few months, debate researchers study whether GPT-4-judged GPT-5 debates reliably produce truth, and control researchers study whether GPT-4 can detect whether GPT-5 is scheming. A year later, an open-weights model of similar capability is released, and the interp researchers check how understandable it is and whether SAEs still train.
You should update by ±1% on AI doom surprisingly frequently
This is just a fact about how stochastic processes work. If your p(doom) is a random walk in 1% steps starting at 50% and stopping once it reaches 0% or 100%, then there will be about 50^2 = 2500 steps of size 1%. (This is the standard gambler's-ruin result: a walk starting k steps from one boundary and N−k from the other takes k(N−k) steps in expectation, here 50×50.) This is a lot! If we get all the evidence for whether humanity survives or not uniformly over the next 10 years (~520 weeks), then you should make a 1% update 4-5 times per week. In practice there won't be as many, because heavy-tailedness in the evidence distribution concentrates the updates in fewer events, and because you don't start at 50%. But I do believe that evidence is coming in every week such that ideal market prices should move by 1% in maybe half of weeks, and it is not crazy for your probabilities to shift by 1% during many weeks if you think about it often enough. [Edit: I'm not claiming that you should try to make more 1% updates, just that if you're calibrated and think about AI enough, your forecast graph will tend to have lots of >=1% week-to-week changes.]
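The 2500-step figure is easy to check by simulation. Here is a minimal sketch under the stated assumptions (1% steps, absorbing boundaries at 0% and 100%); the trial count and seed are arbitrary choices of mine:

```python
import random

def steps_to_absorption(start=50, lo=0, hi=100, trials=1000, seed=0):
    """Average number of +-1 steps a symmetric random walk takes to hit
    either boundary, starting from `start`. Theory: start * (hi - start)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        p, steps = start, 0
        while lo < p < hi:
            p += 1 if rng.random() < 0.5 else -1
            steps += 1
        total += steps
    return total / trials

avg = steps_to_absorption()
print(avg)  # close to 50 * 50 = 2500
```

Dividing 2500 steps by ~520 weeks in a decade gives the 4-5 updates per week quoted above.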
I'm not so sure that shards should be thought of as a matter of implementation. Contextually activated circuits are a different kind of thing from utility function components. The former activate in certain states and bias you towards certain actions, whereas utility function components score outcomes. I think there are at least 3 important parts of this:
The first two are behavioral. We can say an agent is likely to be shardful if it displays these types of incoherence but not others. Suppose an agent is dynamically inconsistent, and we can identify features in the environment (like the presence of cheese) that cause its preferences to change, but it mostly does not suffer from the Allais paradox, tends to spend resources on actions in proportion to their importance for reaching a goal, and otherwise generally behaves rationally. Then we can hypothesize that the agent has some internal motivational structure that decomposes into shards. But exactly what that motivational structure looks like is very uncertain, both for humans and for future agents. My guess is that researchers need to observe models and form good definitions as they go along, and that defining a shard agent as one with compositionally represented motivators is premature. For now the most important thing is how steerable agents will be, and it is very plausible that we can manipulate motivational features without those features being anything like compositional.
Hangnails are annoying and painful, and most people deal with them poorly. [1] Instead, glue the hangnail down to your nail plate with a drop of superglue. It's $10 for 12 small tubes on Amazon. Superglue is also useful for cuts and minor repairs, so I already carry it around everywhere.
Hangnails manifest as either separated nail fragments or dry peeling skin on the paronychium (the area around the nail). In my experience superglue works for nail separation, and a paper (available free on Sci-Hub) claims it also works for peeling skin on the paronychium.
Cyanoacrylate glue is regularly used in medicine to close wounds, and now frequently replaces stitches. Medical superglue uses slightly different cyanoacrylate variants, but doctors I know say it's basically the same thing.
I think medical superglue exists to prevent rare reactions, and for large wounds where the exothermic curing reaction of a large quantity might burn you; the safety difference for hangnails is minimal [2]. But to be extra safe you could just use 3M medical-grade superglue or Dermabond.
[1]: Typical responses to hangnails include:
[2]: There have been studies showing cytotoxicity in rabbits when injecting it in their eyes, or performing internal (bone or cartilage) grafts. A 2013 review says that although some studies have found internal toxicity, "[f]or wound closure and various other procedures, there have been a considerable number of studies finding histologic equivalence between ECA [commercial superglue] and more widely accepted modalities of repair."
I don't believe data is limiting, because the finite-data argument only applies to pretraining. Models can do self-critique, or be objectively rated on their ability to perform tasks and trained via RL. This is how humans learn, so it is possible to be very sample-efficient, and currently only a small proportion of training compute is RL.
If the majority of training compute and data are outcome-based RL, it is not clear that the "Playing human roles is pretty human" section holds, because the system is not primarily trained to play human roles.
The cost of goods has the same units as the cost of shipping: $/kg. Comparing the two tells you a lot about how the economy works, e.g. why construction-material sourcing and drink bottling have to be local, but oil tankers exist.
It's really impressive that for the price of a winter strawberry, we can ship a strawberry-sized lump of coal around the world 100-400 times.
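The strawberry/coal comparison can be sanity-checked with back-of-envelope numbers. Every figure below is my own rough assumption (winter strawberry price, bulk freight rate), not a sourced statistic:

```python
# Back-of-envelope check: how many round-the-world trips of a
# strawberry-sized lump of coal does one winter strawberry buy?
# All numbers are rough assumptions, not sourced figures.
strawberry_mass_kg = 0.02        # one large strawberry, ~20 g (assumed)
strawberry_price_per_kg = 10.0   # winter strawberries, ~$10/kg (assumed)
bulk_freight_per_kg_km = 2e-6    # dry bulk carrier, ~$0.002/tonne-km (assumed)
earth_circumference_km = 40_075

price_of_strawberry = strawberry_mass_kg * strawberry_price_per_kg
cost_per_trip = strawberry_mass_kg * bulk_freight_per_kg_km * earth_circumference_km
trips = price_of_strawberry / cost_per_trip
print(round(trips))  # ~125 trips with these assumptions
```

Note the mass cancels: the trip count is just (goods $/kg) divided by ($/kg to ship around the world), which is why the $/kg framing is the useful one. With these assumed rates the answer lands inside the 100-400 range.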
[1] An iPhone is $4600/kg; large launches sell for $3500/kg, and rideshares for small satellites for $6000/kg. Geostationary orbit is more expensive, so it's okay for GPS satellites to cost more than an iPhone per kg, but Starlink wants to be cheaper.
[2] https://fred.stlouisfed.org/series/APU0000711415. I can't find current numbers, but Antarctica flights cost $1.05/kg in 1996.
[3] https://www.bts.gov/content/average-freight-revenue-ton-mile
[4] https://markets.businessinsider.com/commodities
[5] https://www.statista.com/statistics/1232861/tap-water-prices-in-selected-us-cities/
[6] https://www.researchgate.net/figure/Total-unit-shipping-costs-for-dry-bulk-carrier-ships-per-tkm-EUR-tkm-in-2019_tbl3_351748799
Maybe Galois with group theory? He died in 1832, but his work was only published in 1846, upon which it kicked off the development of group theory, e.g. with Cayley's 1854 paper defining a group. Claude writes that there was not much progress in the intervening years:
The period between Galois' death in 1832 and the publication of his manuscripts in 1846 did see some developments in the theory of permutations and algebraic equations, which were important precursors to group theory. However, there wasn't much direct progress on what we would now recognize as group theory.
Some notable developments in this period:
1. Cauchy's work on permutations in the 1840s further developed the idea of permutation groups, which he had first explored in the 1820s. However, Cauchy did not develop the abstract group concept.
2. Plücker's 1835 work on geometric transformations and his introduction of homogeneous coordinates laid some groundwork for the later application of group theory to geometry.
3. Eisenstein's work on cyclotomy and cubic reciprocity in the 1840s involved ideas related to permutations and roots of unity, which would later be interpreted in terms of group theory.
4. Abel's work on elliptic functions and the insolubility of the quintic equation, while published earlier, continued to be influential in this period and provided important context for Galois' ideas.
However, none of these developments directly anticipated Galois' fundamental insights about the structure of solutions to polynomial equations and the corresponding groups of permutations. The abstract concept of a group and the idea of studying groups in their own right, independent of their application to equations, did not really emerge until after Galois' work became known.
So while the 1832-1846 period saw some important algebraic developments, it seems fair to say that Galois' ideas on group theory were not significantly advanced or paralleled during this time. The relative lack of progress in these 14 years supports the view of Galois' work as a singular and ahead-of-its-time discovery.
This is indeed a crux, maybe it's still worth talking about.
The Brownian motion assumption is rather strong, but it isn't required for the conclusion. Consider the stock market, which famously has heavy-tailed, bursty returns. The S&P 500 moves 1% in a week all the time, but a 10% weekly move happens only a couple of times per decade. I would guess (and we can check) that most weeks carry >0.6x of the market's average per-week variance, which makes the median weekly absolute return well over half of what it would be if the market were Brownian motion with the same long-term variance.
Also, Lawrence tells me that in Tetlock's studies, superforecasters tend to make updates of 1-2% every week, which actually improves their accuracy.