Can you give a concrete example of a safety property of the sort that you are envisioning automated testing for? Or am I misunderstanding what you're hoping to see?

For example, a human can, to an extent, inspect what they are going to say before they say or write it. Before saying Gary Marcus was "inspired by his pet chicken, Henrietta", a human may temporarily store the next words they plan to say elsewhere in the brain, and evaluate them.

Transformer-based LLMs also internally represent the tokens they are likely to emit in future steps. This is demonstrated rigorously in Future Lens: Anticipating Subsequent Tokens from a Single Hidden State, though perhaps the simpler demonstration is that LLMs can reliably complete the sentence "Alice likes apples, Bob likes bananas, and Aaron likes apricots, so when I went to the store I bought Alice an apple and I got [Bob/Aaron]" with the appropriate "a/an" token.

I think the answer pretty much has to be "yes", for the following reasons.

  1. As noted in the above post, weather is chaotic.
  2. Elections are sometimes close. For example, the winner of the 2000 presidential election came down to a margin of 537 votes in Florida.
  3. Geographic location correlates reasonably strongly with party preference.
  4. Weather affects specific geographic areas.
  5. Weather influences voter turnout.[1]

During the 2000 election, in Okaloosa County, Florida (at the western tip of the panhandle), 71k of the county's 171k residents voted, with 52186 votes going to Bush and 16989 votes going to Gore, for a 42% turnout rate.

On November 7, 2000, there was no significant rainfall in Pensacola (which is the closest weather station I could find with records going back that far). A storm which dropped 2 inches of rain on the tip of the Florida panhandle that day would have reduced voter turnout by 1.8%,[1] which would have resulted in a margin that leaned 634 votes closer to Gore. That would have tipped Florida, which would in turn have tipped the election.
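One reading of the arithmetic that reproduces the 634 figure is sketched below. It assumes that the 0.9%-per-inch effect quoted in the footnote scales linearly to 2 inches, and that the resulting 1.8% drop falls proportionally on both candidates' totals, so the county margin shrinks by the same fraction. The variable names are illustrative, and other readings of the turnout effect would give a different shift.

```javascript
// Okaloosa County, Florida, 2000 presidential election (figures from the text above).
const bushVotes = 52186;
const goreVotes = 16989;
const floridaMargin = 537; // statewide margin cited above

// Assumption: 0.9% turnout drop per inch of rain, scaled linearly to 2 inches.
const turnoutDropPerInch = 0.009;
const inchesOfRain = 2;
const turnoutDrop = turnoutDropPerInch * inchesOfRain; // 0.018

// Assumption: the drop hits both candidates' totals proportionally,
// so the county margin shrinks by the same fraction.
const countyMargin = bushVotes - goreVotes;
const marginShift = Math.round(countyMargin * turnoutDrop);

console.log(countyMargin);                // 35197
console.log(marginShift);                 // 634
console.log(marginShift > floridaMargin); // true
```

Under these assumptions the shift in this one county exceeds the 537-vote statewide margin.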

Now, November is the "dry" season in Florida, so heavy rains like that are not incredibly common. Still, they can happen. For example, on 2015-11-02, 2.34 inches of rain fell.[2] That was the only day, out of the 140 days I looked at, with enough rain to have flipped the 2000 election, and the 2000 election was, to my knowledge, the closest of the 59 US presidential elections so far. Still, there are a number of other tracks that a storm could have taken, which would also have flipped the 2000 election.[3] And in the 1976 election, somewhat worse weather in the Great Lakes region would likely have flipped Ohio and Wisconsin, where Carter beat Ford by narrow margins.[4]

So I think the probability that "weather, on election day specifically, flips the 2028 election in a way that cannot be foreseen now" is already well over 0.1%. And that's not even getting into other weather stuff like "how many hurricanes hit the Gulf Coast in 2028, and where exactly do they land?".

  1. ^

    Gomez, B. T., Hansford, T. G., & Krause, G. A. (2007). The Republicans should pray for rain: Weather, turnout, and voting in US presidential elections.

    "The results indicate that if a county experiences an inch of rain more than what is normal for the county for that election date, the percentage of the voting age population that turns out to vote decreases by approximately .9%."

  2. ^

    I pulled the weather for the week before and after November 7 for the past 10 years from the API, and that was the highest rainfall date.

    // Sum daily precipitation for Nov 1-14 of each year, 2014 through 2023.
    var precipByDate = {};
    for (var y = 2014; y < 2024; y++) {
       var res = await fetch('<redacted>&units=e&startDate='+y+'1101&endDate='+y+'1114').then(r => r.json());
       res.observations.forEach(obs => {
           // valid_time_gmt is assumed to be in seconds, hence the *1000 for Date.
           var d = new Date(obs.valid_time_gmt*1000);
           var ds = d.getFullYear()+'-'+(d.getMonth()+1)+'-'+d.getDate();
           if (!(ds in precipByDate)) { precipByDate[ds] = 0; }
           if (obs.precip_total) { precipByDate[ds] += obs.precip_total; }
       });
    }
    // Entry with the highest total rainfall.
    Object.entries(precipByDate).sort((a, b) => b[1] - a[1])[0]
  3. ^

    Looking at the 2000 election map in Florida, any good thunderstorm in the panhandle, in the northeast corner of the state, or on the west-middle-south of the peninsula would have done the trick.

  4. ^

    Carter won Ohio and Wisconsin by 11k and 35k votes, respectively.

An attorney rather than the police, I think.

Also "provably safe" is a property a system can have relative to a specific threat model. Many vulnerabilities come from the engineer having an incomplete or incorrect threat model, though (most obviously the multitude of types of side-channel attack).

Counterpoint: Sydney Bing was wildly unaligned, to the extent that it is even possible for an LLM to be aligned, and people thought it was cute / cool.

The two examples everyone loves to use to demonstrate that massive top-down engineering projects can sometimes be a viable alternative to iterative design (the Manhattan Project and the Apollo Program) were both government-led initiatives, rather than single very smart people working alone in their garages. I think it's reasonable to conclude that governments have considerably more capacity to steer outcomes than individuals, and are the most powerful optimizers that exist at this time.

I think restricting the term "superintelligence" to "only that which can create functional self-replicators with nano-scale components" is misleading. Concretely, that definition of "superintelligence" says that natural selection is superintelligent, while the most capable groups of humans are nowhere close, even with computerized tooling.

Looking at the AlphaZero paper:

Our new method uses a deep neural network f_θ with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = f_θ(s). The vector of move probabilities p represents the probability of selecting each move a (including pass), p_a = Pr(a | s). The value v is a scalar evaluation, estimating the probability of the current player winning from position s. This neural network combines the roles of both policy network and value network into a single architecture. The neural network consists of many residual blocks of convolutional layers with batch normalization and rectifier nonlinearities (see Methods).

So if I'm interpreting that correctly, the NN is used both for position evaluation and for the search part.
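To make the "(p, v) = f_θ(s)" shape concrete, here is a toy sketch of a single function with a shared trunk feeding two output heads. Everything in it is made up for illustration: the layer sizes, the weights, and the use of plain dense layers instead of the residual convolutional blocks the quote describes. The value head uses a sigmoid so that v lies in (0, 1), matching the quote's description of v as a win probability; that is a choice made for this sketch, not a claim about the paper's architecture.

```javascript
// Toy two-headed network: one shared trunk, a policy head, and a value head.
// All sizes and weights are hypothetical.

// Multiply matrix W (array of rows) by vector x.
function matVec(W, x) {
  return W.map(row => row.reduce((sum, w, i) => sum + w * x[i], 0));
}

const relu = vec => vec.map(z => Math.max(0, z));

// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(logits) {
  const m = Math.max(...logits);
  const exps = logits.map(z => Math.exp(z - m));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

const sigmoid = z => 1 / (1 + Math.exp(-z));

// Hypothetical parameters: 4 board features -> 3 hidden units -> 2 moves + 1 value.
const theta = {
  trunk: [[0.5, -0.2, 0.1, 0.0], [0.3, 0.8, -0.5, 0.2], [-0.1, 0.4, 0.9, -0.3]],
  policyHead: [[1.0, -1.0, 0.5], [-0.5, 0.7, 0.2]],
  valueHead: [[0.6, -0.4, 0.3]],
};

// f(theta, s) -> { p, v }: both outputs come from the same hidden features.
function f(theta, s) {
  const hidden = relu(matVec(theta.trunk, s));
  const p = softmax(matVec(theta.policyHead, hidden));   // move probabilities
  const v = sigmoid(matVec(theta.valueHead, hidden)[0]); // estimated win probability
  return { p, v };
}

const { p, v } = f(theta, [1, 0, -1, 1]);
console.log(p, v);
```

A single forward pass yields both a distribution over moves and a scalar evaluation of the position, which is what lets one network serve in both roles.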

As I will expand upon later, this contrast makes no sense. We are not going to have machines outperforming humans on every task in 2047 and then only fully automating human occupations in 2116. Not in any meaningful sense.

Maybe people are interpreting "task" as "bounded, self-contained task", and so they're saying that machines will be able to outperform humans on every "task" but not on the parts of their jobs that are not "tasks".

The exact wording of the question was:

Say we have ‘high-level machine intelligence’ when unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. 

It does not appear that the survey had any specific guidance on how to interpret the word "task", so it wouldn't surprise me that much if people consider their job to be composed of both things that are tasks and also things that are not tasks, and that the things that are not tasks will take longer to automate.

  • Gradientware? Seems verbose and isn't robust to other ML approaches to fit data.
  • Datagenicware? Captures the core of what makes them like that, but it's a mouthful.
  • Modelware? I don't love it.
  • Puttyware? Aims to capture the "takes the shape of its surroundings" aspect, might be too abstract though. Also implies that it will take the shape of its current surroundings, rather than the ones it was built with.
  • Resinware? Maybe more evocative of the "was fit very closely to its particular surroundings" aspect, but still doesn't seem to capture quite what I want.