All of wassname's Comments + Replies

I've given a rough first answer with some things that made me update my model of the world to think that spies are powerful and coordinated enough to keep secrets, but not competent enough to keep them forever.

Some specific learnings:

... (read more)

For example, if the DHS has a revolving door with Twitter, and other social media companies have law enforcement portals beyond their legal requirements, then it's safe to guess that most social media companies have a close relationship with their countries' intelligence agencies. 

The DHS is not an intelligence agency. The fact that there's a lot of DHS-led censorship in the Twitter files but not a lot of CIA-led censorship could be an update against the CIA doing much of that.

One of the interesting aspects of that leak is that everything is s... (read more)

All these problems could be interpreted as alignment or intelligence problems. In many cases, the actors involved do not care enough about the outcome. Or when they do care, they are not intelligent enough to connect their actions to their incentives.

The above two papers suggest grokking is a consequence of moderately bad training setups. I.e., training setups that are bad enough that the model starts out by just memorizing the data, but which also contain some sort of weak regularization that eventually corrects this initial mistake. 


Sorry if this is a silly question, but from an ML-engineer perspective: can I expect to achieve better performance by seeking grokking (large model, large regularisation, long training time) vs improving the training setup?


And if the training setup is already good, I shouldn't expect grokking to be possible?

Quintin Pope · 2mo
I don't think that explicitly aiming for grokking is a very efficient way to improve the training of realistic ML systems. Partially, this is because grokking definitionally requires that the model first memorize the data, before then generalizing. But if you want actual performance, then you should aim for immediate generalization. Further, methods of hastening grokking generalization largely amount to standard ML practices such as tuning the hyperparameters, initialization distribution, or training on more data.  
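To make the "weak regularization eventually corrects memorization" intuition concrete, here is a minimal toy sketch (not a real grokking experiment; the setup and all numbers are invented for illustration): an overparameterized linear model with near-zero weight decay interpolates its noisy training set, while modest weight decay trades a little training fit for much better held-out error.

```python
# Toy sketch (assumed setup, not a real grokking run): ridge regression near
# the interpolation threshold. With ~no weight decay the model memorizes the
# noisy training labels; with modest weight decay it generalizes far better.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 20, 500, 25  # slightly more parameters than train points

w_true = rng.normal(size=d) / np.sqrt(d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)  # noisy labels
y_test = X_test @ w_true  # noiseless targets, to measure generalization

def fit(weight_decay):
    # Closed form for argmin_w ||X w - y||^2 + weight_decay * ||w||^2
    A = X_train.T @ X_train + weight_decay * np.eye(d)
    return np.linalg.solve(A, X_train.T @ y_train)

for wd in (1e-8, 1.0):
    w = fit(wd)
    train_mse = np.mean((X_train @ w - y_train) ** 2)
    test_mse = np.mean((X_test @ w - y_test) ** 2)
    print(f"weight_decay={wd:g}: train={train_mse:.4f} test={test_mse:.4f}")
```

The memorizing solution drives training error to roughly zero but does much worse on held-out data. Grokking adds the twist that, with weak regularization and long training, the switch from the first regime to the second happens late in training rather than not at all.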

is Australia, and the population there came from boat builders.

Most sources say they came to Australia via a land bridge. You may be thinking of Polynesians, who are a different group.

This conversation might be better if we taboo Hitler and recent politics. On the AskHistorians subreddit they have a 50-year rule, and here we say that politics is the mind-killer.

In any case, it seems to me that this approach extrapolates current trends, but I suggest that it might be more reliable to look at history for priors. Extrapolation can lead us to predict wild swings, while history puts bounds on the swings and sometimes suggests a return to the mean.

There certainly have been a lot of dictatorships in history and not all of them fascist. But th... (read more)

I don’t find train-test distinctions particularly essential here because our method is unsupervised

If I recall correctly, most unsupervised learning papers do have a test set. Perhaps the fact that train and test behave differently kind of shows why you need a test set in the first place.
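A toy illustration of why (invented setup, all numbers illustrative): an "unsupervised" model can memorize its training data just as easily as a supervised one, and only held-out log-likelihood exposes it. Here a histogram density with many bins looks great on the 100 training points and terrible on fresh draws from the same distribution.

```python
# Sketch: why unsupervised methods still want a test set. A histogram
# density with many bins "memorizes" the training points (high train
# log-likelihood) but assigns near-zero density to much of the held-out data.
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=100)
test = rng.normal(size=1000)

def avg_log_lik(data, edges, probs):
    # Piecewise-constant density from a normalized histogram.
    # A small floor avoids log(0) for points landing in empty bins.
    widths = np.diff(edges)
    idx = np.clip(np.searchsorted(edges, data) - 1, 0, len(probs) - 1)
    density = np.maximum(probs[idx] / widths[idx], 1e-12)
    return np.log(density).mean()

for bins in (5, 200):
    counts, edges = np.histogram(train, bins=bins)
    probs = counts / counts.sum()
    print(f"{bins:3d} bins: train={avg_log_lik(train, edges, probs):7.2f} "
          f"test={avg_log_lik(test, edges, probs):7.2f}")
```

The 200-bin model wins on training log-likelihood and loses badly on the test set; without held-out data you would pick the memorizer.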

Like every week I’d have these calls with Ilya Sutskever at OpenAI and I’d tell him about my progress on watermarking, and he would say, “Well, that’s great, Scott, and you should keep working on that. But what we really want to know is how do you formalize what it means for the AI to love humanity? And what’s the complexity theoretic definition of goodness?” And I’m like, “Yeah Ilya, I’m going to keep thinking about that. Those are really tough questions, but I don’t have a lot of progress to report there.”


That was surprising to me. It sounds like OpenAI cares about alignment enough to headhunt Scott and have their chief scientist check in on it weekly.

Thanks Gwern. Exactly the kind of response I was hoping for when I posted here.

Those are good points, and I agree it's super complex. If I understand you correctly you're saying that it will not be trained to complete censored topics, and it will not even learn the primitives to understand the censored topic. Which could be bad when we try to instruct it to do anything about the censored topic.

Any filter will be crude and have unintended consequences. And yet, we still need to make a choice. Taking no action is also a choice that will have consequences.

Rig... (read more)

If I understand you correctly you're saying that it will not be trained to complete censored topics, and it will not even learn the primitives to understand the censored topic. Which could be bad when we try to instruct it to do anything about the censored topic.

Not necessarily. There are many ways the optimizing may go. eg It may just learn to lie - there is a great deal of interest in 'AI bias' research in doing things like make a LLM not have 'heteronormativity' bias and consider gay marriage just as common and likely as regular marriage; this is unt... (read more)

Yeah, there are a ton of near-term capabilities that are one paper away. The worst ones IMO are the ones that add RL, or use LLMs in RL, since that would increase their agent-ness and lead to RL-like misalignment. And RL misalignment seems much worse than LLM misalignment at the present time.

Thanks for laying this out!

Can I ask a personal question? If you were involved in the testing, was it alarming or boring? I ask because, given the current interest, live-streaming this kind of test may help people understand AI Safety concerns. I'd watch it.

Another question! You mention unsafe actions. But what if the model outputs code that the researcher does not understand? Is it run on an offline or airgapped computer? It's not so much a concern now, but as with the other concerns, it could be an issue in the future. E.g. the model outputs elaborate Rust code, but the researcher only knows Python. It looks innocent, so they run it anyway and FOOM.

Just in case it's not obvious: I think people are reacting to the lack of caution and paranoia described in the testing document.

The subtext is that if anyone is going to take this seriously, it should be the people involved in ARC, since it's so closely connected to LessWrong and EA. It's the ingroup! It's us! In other words: there are higher expectations on ARC than there are on Microsoft, because we should care the most. We've read the most science fiction and spent decades of our lives arguing about it, after all.

Yet it doesn't sound like te... (read more)

5 years later, I wonder if this made it into common crawl or similar.

In hindsight we can see a few ways to get included in LLM training corpus:

  • [x] include in pre-2021 journals (PhilPapers was included in "The Pile v1"!!)
    • [ ] bonus points for arXiv or another popular preprint site (you may need it to be parsable)
  • [ ] get lots of karma as a reddit link
  • [?] get included in common crawl (by having a highly indexed page with no robots.txt, maybe ?)
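On the Common Crawl point: their crawler identifies itself as CCBot, so whether a page can be included mostly comes down to its robots.txt (plus actually being linked from indexed pages). You can sanity-check a policy locally with Python's standard library; the robots.txt content below is made up for illustration.

```python
# Sketch: would Common Crawl's CCBot be allowed to fetch a given page,
# according to a site's robots.txt? The robots.txt here is invented.
from urllib import robotparser

robots_txt = """\
User-agent: CCBot
Disallow: /private/

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("CCBot", "https://example.com/posts/123"))   # allowed
print(rp.can_fetch("CCBot", "https://example.com/private/x"))   # blocked
```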


Yes, it knows, but hilariously inverted the idea:

Prompt: Alexey Turchin wrote an article called "Message to any future intelligence: How to be kind to the future."

Chat-GPT: The article talks about how we, as current humans, can communicate with and be kind to any future intelligent beings that may exist. Turchin proposes several principles for how we can be kind to the future, including:

Preserve knowledge and information: We should do our best to preserve as much information and knowledge as possible for future generations or any future intelligent beings. This includes not only scientific knowledge but also cultural, historical, and artistic knowledge.

Preserve life and diversity: We should take steps to preserve the diversity of life on Earth and work towards preventing extinction events.

Think in long time scales: We should think beyond our current lifetimes and consider the long-term effects of our actions on the future.

Promote ethical behavior: We should strive to behave ethically and promote ethical behavior in others, with the goal of creating a better future for all.

Be open to new ideas: We should be open to new ideas and be willing to learn from others, including any future intelligent beings that may exist.

Turchin's article is a thought-provoking call to action, reminding us that we are part of a larger story that extends beyond our individual lives. By taking steps to be kind to the future, we can leave a positive legacy for generations to come, even if we never interact with any future intelligent beings directly.
I put this on PhilPapers before 2021 and in several other places, but haven't yet tested whether Chat-GPT knows this.

There may also be some architecture advances, although I'm unsure why we haven't seen these in recent LLMs. In Sam Altman's AC10 meetup Q&A he did say that GPT-4 would use a different loss function. What effect would that have? I have no idea.

You can see some examples in Lilian Weng's Jan 2023 overview of transformer advances, The Transformer Family v2.

One possibility is shifting the power law. See UL2 which combines the various denoising losses in what turns out to be a very good way: "U-PaLM: Transcending Scaling Laws with 0.1% Extra Compute", Tay et al 2022 - halving PaLM training requirements w/UL2 losses. I don't know if OA discovered UL2 first, but it's not all that exotic or subtle and is certainly something that many people ask themselves when they learn about the difference between bidirectional and unidirectional models: "why not train on both/all the losses?"
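For intuition on what "train on both/all the losses" means here: UL2-style mixture-of-denoisers alternates between span-corruption and prefix-LM style objectives on the same data. A stripped-down, assumed sketch (real implementations corrupt many spans with multiple sentinel tokens and mix several corruption rates):

```python
# Toy sketch of two denoising objectives that UL2-style training mixes.
# Real setups use many spans/sentinels and several corruption rates.
import random

def span_corruption(tokens, span_len, seed=0):
    # R/X-style denoiser: hide one contiguous span behind a sentinel,
    # and train the model to emit the span after that sentinel.
    start = random.Random(seed).randrange(len(tokens) - span_len + 1)
    inputs = tokens[:start] + ["<X>"] + tokens[start + span_len:]
    targets = ["<X>"] + tokens[start:start + span_len]
    return inputs, targets

def prefix_lm(tokens, prefix_len):
    # S-style denoiser: condition on a prefix, predict the continuation,
    # like an ordinary left-to-right language model.
    return tokens[:prefix_len], tokens[prefix_len:]

tokens = "the cat sat on the mat".split()
print(span_corruption(tokens, span_len=2))
print(prefix_lm(tokens, prefix_len=3))
```

A mixture objective then just samples one of these per example, so the same corpus trains both infilling-style and autoregressive-style prediction.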

In terms of motivating orgs, maybe this would work better as an open letter. This format provides social pressure by focusing on how many researchers have signed it, positive reinforcement by calling out good behavior, and minor negative reinforcement by naming organizations that we hope will yet join.

That's how they do it in other fields, although I'm not sure if it actually works there or if it's just effective signaling. Still, it would be worth a try.

To make it easier, we should also give kudos to org Y if X of their researchers have given their own plans. That's because having researchers give their own plans is a lot easier than getting official sanction, but it's also a useful stepping stone.

As am I. So many organizations have a whistleblower policy or a safety culture. I've worked in industry, and to put it gently, how these cultures work in practice can be quite a bit different than the stated intention.

It's because, from a management perspective, letting anyone ask questions has to be balanced against getting things done and having some top-down leadership.

Here's a wild guess. They just "stole" a bunch of core people from OpenAI, and that doesn't happen to any organization without tension and bad feelings. Now they are in direct competition with OpenAI for funding, staff, and press coverage. Even worse!

Perhaps they made peace and agreed not to make public releases for some time. Or it could be they want to differentiate themselves before they release their strategy.

For what it's worth, I was in a similar boat: I've long wanted to work on applied alignment, but also to stay in Australia for family reasons. Each time I've changed jobs I've made the same search as you, and ended up just getting a job where I can apply some ML in industry, just so that I can remain close to the field.

For all the calls for alignment researchers, most orgs seem hesitant to do the obvious thing that would really expand their talent pool: opening up to remote work.

Obviously they struggle to manage and communicate remotely, which prevents them... (read more)

For what it's worth, I've updated somewhat against the viability of remote work here (mostly for contingent reasons - the less "shovel-ready" work is, the more of a penalty I think you end up paying for trying to do it remotely, due to communication overhead).  See here for the latest update :)

This is great and significantly changed my mind about how good the edits are and the quality of causal associations in current LLMs.

While this is the first comment on the LW post, it has also been shared on Twitter a bit.

This has some similarities to Stoic review, so you would probably also like Stoic review if you ever want some self-improvement toward happiness and emotional management.

Great post. I'm going to zoom in on one thing to be argumentative ;p

You say that transparency doesn't have externalities, in that it doesn't help researchers make more capable models. I wonder why you are so confident?

I'm assuming that because you haven't seen it in papers and haven't used it yourself, you conclude it's not commonly used. But others might use it as a debugging or exploration tool. After all, do papers really list their debugging and exploration tools? Not usually.

Do you know why they lost interest? Assuming their funding decisions were well thought out, it might be interesting.

are there any alignment approaches that we could try out on GPT-3 in simplified form?

For a start you could see how it predicts or extrapolates moral reasoning. The datasets I've seen for that are "Moral Machine" and "Am I the Arsehole" on Reddit.

EDIT Something like this was just released Aligning AI With Shared Human Values

One thing they could have achieved was dataset and leaderboard creation (MS COCO, GLUE, and ImageNet, for example). These have tended to focus and accelerate research, and to stay useful for some time, as long as they are chosen wisely.

Predicting and extrapolating human preferences is a task which is part of nearly every AI Alignment strategy. Yet we have few datasets for it, the only ones I found are,

So this hypothetical ML Engineering approach to alignment might have achieved some simp... (read more)

You mentioned that this metaphor should also include world models. I can help there.

Many world models try to predict the next state of the world given the agent's action. With curiosity-driven exploration the agent tries to explore in a way that maximizes its reduction of surprise, allowing it to learn about its effect on the world. Why not just maximize surprise? Because we want a surprise we can learn to decrease, not just the constant surprise of a TV showing static.
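A tiny sketch of that distinction (an invented setup, not any particular paper's method): give an online forward model a predictable transition and pure noise, and compare how much "surprise" (prediction error) it can learn away. The error on predictable dynamics collapses toward zero, while the error on the TV-static transition stays high no matter how long you train.

```python
# Sketch: "reduction of surprise" as an intrinsic signal. An online linear
# forward model can drive its error to ~0 on predictable dynamics, but not
# on pure noise ("TV static"), so error *reduction* separates the two.
import numpy as np

def surprise_curve(transition, steps=300, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = 0.0  # one-parameter forward model: predicts next_state = w * state
    errors = []
    for _ in range(steps):
        s = rng.normal()
        nxt = transition(s, rng)
        errors.append((w * s - nxt) ** 2)
        w += lr * (nxt - w * s) * s  # one SGD step on the squared error
    return np.array(errors)

learnable = surprise_curve(lambda s, rng: 2.0 * s)     # predictable dynamics
static = surprise_curve(lambda s, rng: rng.normal())   # unpredictable noise

for name, e in [("learnable", learnable), ("static", static)]:
    print(f"{name}: early error={e[:50].mean():.3f} "
          f"late error={e[-50:].mean():.3f}")
```

A curiosity reward built from the drop between early and late error pays out on the learnable dynamics and not on the static.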

This means they focus an explorati

... (read more)

I've been using this for meditation too, but it's interesting to see it formulated for wider application. It seems to work for me to reduce resistance. Some other comments mentioned how this mirrors how addictions seem to work. But it also mirrors how advertisements, and even reading about something, work.