Comments

Thanks a lot for the kind comment!

> To scale this approach, one will want to have "structural regularizers" towards modularity, interoperability and parsimony

I am unsure of the formal architecture or requirements for the structural regularizers you mention. I agree that shared building blocks would speed up development and verification. I am unsure whether credit assignment would work well for this, maybe in the form of "the more a block is used in code, the more we can trust it"?
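To make that heuristic concrete, here is a minimal sketch (pure Python; every name and the scoring rule are my own hypothetical choices, not anything from the original proposal) of a usage-weighted trust score for shared blocks: a block earns trust by being reused and by passing its verifications.

```python
import math
from dataclasses import dataclass

@dataclass
class BlockStats:
    uses: int = 0          # how many generated programs reuse this block
    tests_passed: int = 0  # verifications that succeeded
    tests_failed: int = 0  # verifications that failed

    def trust(self) -> float:
        # Laplace-smoothed pass rate, weighted by log-usage, so that widely
        # reused blocks with a clean record score highest.
        pass_rate = (self.tests_passed + 1) / (self.tests_passed + self.tests_failed + 2)
        return pass_rate * math.log1p(self.uses)

library: dict[str, BlockStats] = {}

def record_use(block_name: str, passed: bool) -> None:
    stats = library.setdefault(block_name, BlockStats())
    stats.uses += 1
    if passed:
        stats.tests_passed += 1
    else:
        stats.tests_failed += 1

# Hypothetical block names, just to show the ranking behaviour.
record_use("sort_moves_by_value", passed=True)
record_use("sort_moves_by_value", passed=True)
record_use("parse_board", passed=False)
print(sorted(library, key=lambda name: library[name].trust(), reverse=True))
```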

> Constraints on the types of admissible model code. We have strongly advocated for probabilistic causal models expressed as probabilistic programs.

What do you mean? Why is this specifically needed? Do you mean that if we want to have a Go player, we should have one portion of the code dedicated to assigning probabilities to what the best move is? Or does it only apply in a different context, that of finding policies?

> Scaling this to multiple (human or LLM) contributors will require a higher-order model economy of some sort

Hmm. Is the argument something like "we want to scale and diversify the agents who will review the code for more robustness (so not just one LLM model, for instance), and that means varying levels of competence that we will want to figure out and sort"? I had not thought of it that way; I was mainly thinking of just using the same model, and I'm not sure that having weaker code reviewers wouldn't bring the system down in terms of safety.

Regarding the Gaia Network, the idea seems interesting, though I am unclear about the full details yet. I had thought of extending betting markets to a full Bayesian network to get a better picture of what everyone believes, and maybe this is related to your idea. In any case, I believe that conveying one's full model of the world through this kind of network (and maybe more) may be doable, and quite important for some form of global coordination/truth-seeking?
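As a toy illustration of what I mean (this is not the Gaia Network itself; the variables and all the numbers are made up), here is how two related beliefs could live in a tiny Bayesian network rather than as isolated bets: the network carries the marginal probability a betting market would elicit, but also the dependency structure behind it.

```python
# Two binary variables, R = "new regulation passes" and S = "compute prices spike".
# The probabilities are invented purely for the example.
P_R = {True: 0.3, False: 0.7}                    # prior belief on R
P_S_given_R = {
    True:  {True: 0.6, False: 0.4},              # P(S | R = True)
    False: {True: 0.2, False: 0.8},              # P(S | R = False)
}

def p_joint(r: bool, s: bool) -> float:
    return P_R[r] * P_S_given_R[r][s]

# The marginal P(S) is what a simple betting market would elicit directly.
p_s = sum(p_joint(r, True) for r in (True, False))

# The network also exposes *why*: e.g. the posterior on R after observing S.
p_r_given_s = p_joint(True, True) / p_s

print(f"P(S) = {p_s:.2f}, P(R | S) = {p_r_given_s:.2f}")
```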

Overall I agree with your idea of a common library, and I think there could be some very promising iterations on it. I will contact you about collaboration ideas!

So the first image is based on AI control, which is indeed part of their strategies, and you could see constructability as mainly leading to this kind of strategy applied to plain code for specific subtasks. It's important to note that constructability itself is just a different approach to making understandable systems.

The main differences are:

  1. Instead of using a single AI, we use many expert-like systems that compose together and whose interactions we can inspect (for instance, in the case of a Go player, you would use KataGo to predict the best move and flag moves that lost the game, an LLM to explain the correct move, and another LLM to factor this explanation into the code).

  2. We use supervision, both automatic and human, to oversee the produced code and test it, through simulations, unit tests, and code review, to ensure the code makes sense and does its task well (a rough sketch of this loop follows below).
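To make the composition and supervision concrete, here is a hypothetical sketch of that loop. None of the function names below are real APIs; they are stand-ins for the KataGo query and the two LLM calls, and only the control flow is meant to be illustrative.

```python
from typing import Callable, Optional

def katago_best_move(position: str) -> str:
    # Stand-in for querying the KataGo engine; returns a placeholder move here.
    return "D4"

def llm_explain(position: str, move: str) -> str:
    # Stand-in for LLM #1, which explains why the move is good.
    return f"In position {position}, {move} secures the corner."

def llm_factor_into_code(explanation: str) -> str:
    # Stand-in for LLM #2, which factors the explanation into plain code.
    return "def prefer_corner_moves(position): ..."

def build_and_check(position: str, run_tests: Callable[[str], bool]) -> Optional[str]:
    move = katago_best_move(position)
    explanation = llm_explain(position, move)
    candidate_code = llm_factor_into_code(explanation)

    # Automatic supervision: simulations / unit tests on the produced code.
    if not run_tests(candidate_code):
        return None  # reject, then retry or escalate to a human

    # Human supervision would happen here: code review before the block
    # is merged into the shared library.
    return candidate_code

print(build_and_check("empty board", run_tests=lambda code: "def " in code))
```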

Thinking more on this, and having heard from friends that the lack of tree/DAG structure in dialogues makes the feature unappealing to them as well (especially in contexts where I was inviting them to have a dialogue together): would there be interest in a PR for this feature? I might work on it directly.

Came here to comment this. As it is, this seems like just talking on Discord/Telegram, but with the intention of publishing it later. What I really miss when discussing something is the ability to branch out and backtrack easily, and to have a view of all the conversation topics at once.

That being said, I really like the idea of dialogues, and I think this is a push in the right direction; I have greatly enjoyed the dialogues I have read so far. Excited to see where this goes

Reposting (after a slight rewrite) from the Telegram group:

This might be a nitpick, but to my (maybe misguided) understanding, alignment is only a very specific subfield of AI safety research, one that basically boils down to "how do I give a set of rules/utility functions/designs that avoids meta- or mesa-optimization with dramatic unforeseen consequences?" (This is at least how I understood MIRI's focus pre-2020.)

For instance, as I understand it, interpretability research is not directly alignment research. Instead, it is part of the broader "AI safety research" (which includes alignment research, interpretability, transparency, corrigibility, ...).

With that being said, I do think your points in favor of renaming "AI safety research" to "Artificial Intention Research" still hold, and I would very much be in favor of it. It is more self-explanatory, catchier, and does not require doom assumptions to be worth investigating, which I think matters a lot in public communication.

(

> Also, I wouldn’t ban orgies, that won’t work

I'm not sure about that point anymore. Monkeypox's spread seems so slow, and only men are getting infected even though the proportion of women who have sex with men who have sex with men is not that small. I wonder whether the spread of monkeypox is mostly being driven by orgies/big events at the moment, and whether this isn't actually the best moment to do it. Though it might be a bit too late now; I think it would have worked two to four weeks ago.)

> People can totally understand all of this, and also people mostly do understand most of this

On the other hand, the very fact that we say monkeypox is spreading within the community of men who have sex with men is symbolic of the problem, to me. Being MSM and being in this monkeypox-spreading community are strongly correlated, sure, but not synonymous; the cluster we're talking about is more specific: it's the cluster of people who participate in orgies, have sex with strangers several times a week, etc.

Seeing how we're still conflating the two in discussions I've seen, I can understand the worries that this will reflect on the non-straight community, though I do agree that this messaging makes little sense.

Also

> people mostly do understand most of this

I know four people who recently went to a sex-positive/kind-of-orgy event, and this did not seem to be a concern to them (nor to the event itself, whose safety considerations only covered COVID). I also know someone who still had hookups two or three weeks ago.

It seems like the warnings about monkeypox may be failing to land in the very community we're talking about.

The more I think about it, the more I wonder if boiling the potatoes whole infused them with compounds from the peels and significantly increased the quantity of solanine I was consuming. An obvious confounder is that whole-boiled potatoes are less fun to eat than potatoes in more varied forms, so it doesn't discriminate between this and the "fun food" theory.

Thanks a lot for the estimate, I'll look into recent studies of this to see what I find!

Isn't it common for people who fast for more than 3-5 days not to feel any hunger? I wonder if there's a similar mechanism here
