All of MiguelDev's Comments + Replies

Siddharth Suresh, Kushin Mukherjee, Xizheng Yu, Wei-Chun Huang, Lisa Padua, and Timothy T. Rogers, Conceptual structure coheres in human cognition but not in large language models, arXiv:2304.02754v2 [cs.AI] 10 Nov 2023.


Adding link to the paper: https://arxiv.org/pdf/2304.02754.pdf

I feel pretty strongly that letting go of correctness in favor of any heuristic means you will end up with the wrong map, not just a smaller or fuzzier one. I don’t think that’s advice that should be universally given, and I’m not even sure how useful it is at all.

I think correctness applies - until it reaches a hard limit.  Understanding what an intellectual community like LessWrong was able to generate as clusters of valuable knowledge is the most correct thing to do but in order to generate novel solutions, one must accept with bravery[1] that... (read more)

Maybe I'm missing something but based on the architecture they used, its not what I am envisioning as a great experiment as the tests they did just focused on 124 million parameter GPT2 small? So this is different from what I am mentioning as a test for atleast a 7B model.

As mentioned earlier, I am ok with all sorts of differen experimental build - I am just speculating what can be a better experimental build given that I have a magic wand or enough resources so the 7 billion parameter model (at the minimum) is a great model to test especially we also need... (read more)

I am actually open to the tags idea, if someone can demonstrate it from pre-training stage, creating atleast a 7B model, that would be awesome just to see how it works. 

 

2RogerDearnaley5d
Check out the paper I linked to in my original comment.

I'm not sure what you mean by "…will have an insufferable ethics…"?


I changed it to "robust ethics" for clarity.


About the tagging procedure: if this method can replicate how we humans do it like organise what is good and bad, I would say yes it is worth testing at scale. 

My analogy actually is not using tags, I envision that each pretraining data should have a "long instruction set" attach on how to use the knowledge contained in it - as this is much more closer to how we humans do it in the real world.

2RogerDearnaley5d
No, the tags are from a related alignment technique I'm hopeful about.

In one test that I did, I[1] found that GPT2 XL is better than GPT Neo at repeating a shutdown instruction because it has more harmful data via WebText that can be utilized during the fine tuning stage (eg. retraining it to learn what is good or bad). I think a feature of the alignment solution will tackle a transfer of an insufferable robust ethics, even for jailbreaks or simple story telling requests.

  1. ^

    Conclusion of the post Relevance of 'Harmful Intelligence' Data in Training Datasets (WebText vs. Pile)


    Initially, I thought that integrating ha

... (read more)
2RogerDearnaley5d
I'm not sure what you mean by "…will have an insufferable ethics…"? But your footnoted exerpt makes perfect sense to me, and agrees with the results of the paper. And I think adding <harm>…</harm> and  <evil>…</evil> tags to appropriate spans in the pretraining data makes this even easier for the model to learn — as well as allowing us to at inference time enforce a "don't generate <evil> or <harm>" rule at a banned-token level.

I hope it's not too late to introduce myself, and I apologize if it is the case. I'm Miguel, a former accountant and decided to focus on researching /upskilling to help solve the AI alignment problem.

Sorry if I got people confused here, of what I was trying to do in the past months posting about my explorations on machine learning.

3Screwtape5d
Welcome! Glad to have you here.
2Charlie Steiner5d
Welcome!

There are two types of capabilities that it may be good to scope out of models: 

  • Facts: specific bits of knowledge. For example, we would like LLMs not to know the ingredients and steps to make weapons of terror.
  • Tendencies: other types of behavior. For example, we would like LLMs not to be dishonest or manipulative.


If LLMs do not know the ideas behind these types of harmful information, how will these models protect themselves from bad actors (humans and other AIs)? 
 

Why I ask this question? I think jailbreaks[1] works because it's not t... (read more)

4RogerDearnaley5d
An alternative approach that should avoid this issue is conditional pretraining: you teach the LLM both good and bad behavior, with a pretraining set that contains examples of both and labelled as such, so it understands both of them and how to tell them apart. Then at inference time, you have it emulate the good behavior. So basically, supervised learning for LLMs. Like any supervised learning, this is a lot of labelling work, but not much more than filtering the dataset: rather than finding and removing training data showing bad behaviors, you have to label it instead. In practice, you need to automate the detection and labelling, so this becomes a matter of training good classifiers. Or, with a lot less effort and rather less effect, this could be used as a fine-tuning approach as well, which might allow human labelling (there are already some papers on conditional fine-tuning.) For more detail, see How to Control an LLM's Behavior (why my P(DOOM) went down), which is a linkpost for the paper Pretraining Language Models with Human Preferences. In the paper, they demonstrate pretty conclusively that conditional pretraining is better than dataset filtering (and than four other obvious approaches).

Move the post to draft, re: petertodd, the paperclip maximizer. 

Strong upvote, for mentioning a dialogue on both sides would be a huge positive for people's careers. I actually can see the discussion be even as big as influencing the scope of how we should think "about what is and is not easy in alignment". Hope Nate and @Nora Belrose are up for that. the discussion will be a good thing to document, deconfuse the divide between both perspectives.

(Edit: But to be fare with Nate, he does explain in his posts[1] why alignment or solving the alignment problem is hard to solve. So maybe more elaboration on the other ca... (read more)

I would be up for having a dialogue with Nate. Quintin, myself, and the others in the Optimist community are working on posts which will more directly critique the arguments for pessimism.

Hopefully, even if we didn't get all the way there, this dialogue can still be useful in advancing thinking about mech interp.


I hope you guys repeat this dialogue again, as I think these kinds of drilled-down conversations will improve the community's ideas on how to do and teach mechanistic interpretability.

As an additional referrence, this talk from the University of Chicago is very helpful for me and might be helpful for you too. 

The presenter, Larry McEnerney talks about why the most important thing is not what original work or feelings we have - he argues that its about changing peoples minds and we, writers must know that there are readers/community driven norms that are needed to be understood in this process.

7MadHatter11d
Thanks for that talk. I actually took the class that McEnerney taught at UChicago, and it greatly improved my writing.

Ooops my bad, there is a pre-existing reporting standard that covers for research and development, not existential risks though: IFRS 38 intangible assets.

An intangible asset is an identifiable non-monetary asset without physical substance. Such an asset is identifiable when it is separable, or when it arises from contractual or other legal rights. Separable assets can be sold, transferred, licensed, etc. Examples of intangible assets include computer software, licences, trademarks, patents, films, copyrights and import quotas.

An update to this standard, s... (read more)

The IFRS board (Non US) and GAAP/FASB board (US) are defined governing bodies that tackle the financial reporting aspects of companies - which AI companies are, might be good thing to discuss the ideas regarding the responsibilities for accounting for existential risks associated with AI research, I'm pretty sure they will listen assuming that they don't want another Enron or SBF type case[1] happening again.

  1. ^

    I think its its safe to assume that an AGI catastophic event will outweigh all previous fraudulent cases in history combined. So I think these g

... (read more)

Even in a traditional accounting sense, I'm not aware that there is any term that could capture the probable existential effects of a research, but I understand what @So8res is trying to pursue in this post which I agree with. But, I think apocalypse insurance is not the proper term here. 

I think IAS/IFRS 19, actuarial gains or losses / IFRS 26 Retirement benefits are more closer to the idea - though these theortical accounting approaches applies to employees of a company. But these can be tweaked to another form of accounting theory (on another form ... (read more)

1Matt Goldenberg13d
  As I read it, it only wanted to capture the possibility of killing currently living individuals. If they had to also account for 'killing' potential future lives it could make an already unworkable proposal even MORE unworkable.
3MiguelDev13d
Ooops my bad, there is a pre-existing reporting standard that covers for research and development, not existential risks though: IFRS 38 intangible assets. An update to this standard, should be necessary to cover for the nature of AI research.  Google Deepmind is using IFRS 38 as per page 16 of 2021 FS reports I found, so they are following this standard already and expect that if an update on this standard re: a proper accounting theory on the estimated liability of an AI company doing AGI research, it will be governed by the same accounting standard. Reframing this post to target this IFRS 38 standard, is recommended in my opinion. 

Hello! I recently finished a draft on a version of RL that maybe able to streamline an LLM's situational awareness and match our world models. If you are interested send me a message.=)

The only chance that there will be no response similar to a vengeful act is if Sam doesn't care about his image at all. Because of this, I disagree to the idea that Sam will be by default "not hostile" when he comes back and will treat what happened as "nothing". 

There is a high chance that there will be changes - even somewhat an attempt to recover lost influence, image or glamour - judging again by his choice to promote OpenAI or himself "as the CEO" of a revolutionary tech all in many different countries this year.

BTW, I do not advocate hostility, but the pressure on them: Sam vs. Ilya & the Board on simply forgetting what happened is not possible.

It was not me who thinks it will be brokered by Microsoft, its this forbes article outlined in the post:

https://www.forbes.com/sites/alexkonrad/2023/11/18/openai-investors-scramble-to-reinstate-sam-altman-as-ceo/?sh=2dbf6a5060da

1moreorlesswrong22d
Yes, I was informed by and was referencing that same article trevor linked to in his original posting. I did not, however, assume you "[think] it will be brokered by Microsoft". Regardless, I'd love to hear any critique or/and disagreements with my original reply — or even an explanation as to why you would believe I assumed you thought as much. (Post-Scriptum: I was not the one whom down-voted the overall karma of your post.)

Things will get interesting if Sam gets reinstalled and he ended up attacking the board. Sam will then fire the OpenAI board for trying to do what they think is right? The chances of this happening? I would say that if this really will happen - It will not be a pretty situation for OpenAI.

0moreorlesswrong22d
    I feel as though, MiguelDev, S. Altman's return to OpenAI would be brokered by a third party (Microsoft, et alia) with 'stability' as a hard condition to be met by both sides. Likewise, it isn't worth Mr. Altman's time — nor effort — to seek revenge. Not only would such an endeavor cost time and effort, but the vengeance would be exacted at the price of his tarnished character. The immature reaction would be noted by his peers and they will, in turn, react accordingly.

But that's not really the issue; when a system starts being capable to write code reasonably well, then one starts getting a problem... I hope when they come to that, to approaching AIs which can create better AIs, they'll start taking safety seriously... Otherwise, we'll be in trouble...

Yeah, let's see where will they steer Grok.

And the "superalignment" team at OpenAI was... not very strong. The original official "superalignment" approach was unrealistic and hence not good enough. I made a transcript of some of his thoughts, https://www.lesswrong.com/post

... (read more)

They released a big LLM, the "Grok". With their crew of stars I hoped for a more interesting direction, but an LLM as a start is not unreasonable (one does need a performant LLM as a component).
 

I haven't played around with Grok so I'm not sure how capable or safe it is. But I hope Elon and his team of experts gets the safety problem right - as he has created companies with extraordinary achievements.  At least, Elon have demonstrated his aspirations to better humanity in other fields of sciences (Internet /Satellites, Space Exploration and EVs) ... (read more)

8mishka23d
I expect safety of that to be at zero (they don't think GPT-3.5-level LLMs are a problem in this sense; besides they market it almost as an "anything goes, anti-censorship LLM"). But that's not really the issue; when a system starts being capable to write code reasonably well, then one starts getting a problem... I hope when they come to that, to approaching AIs which can create better AIs, they'll start taking safety seriously... Otherwise, we'll be in trouble... I thought he was the appropriately competent person (he was probably the AI scientist #1 in the world). The right person for the most important task in the world... And the "superalignment" team at OpenAI was... not very strong. The original official "superalignment" approach was unrealistic and hence not good enough. I made a transcript of some of his thoughts, https://www.lesswrong.com/posts/TpKktHS8GszgmMw4B/ilya-sutskever-s-thoughts-on-ai-safety-july-2023-a, and it was obvious that his thinking was different from the previous OpenAI "superalignment" approach and much better (as in, "actually had a chance to succeed")... Of course, now, since it looks like the "coup" has mostly been his doing, I am less sure that this is the leadership OpenAI and OpenAI safety needs. The manner of that has certainly been too erratic. Safety efforts should not evoke the feel of "last minute emergency"...

I'm still figuring out Elon's xAI. 

But with regards with how Sam behaves - if he doesn't improve his framing[1] of what AI could be for the future of humanity - I expect the same results.

 

  1. ^

    (I think he frames it with him as the main person that steers the tech rather than an organisation or humanity steering the tech - that's how it feels for me, the way he behaves.)

4mishka23d
They released a big LLM, the "Grok". With their crew of stars I hoped for a more interesting direction, but an LLM as a start is not unreasonable (one does need a performant LLM as a component). Yeah... I thought he deferred to Ilya and to the new "superalignment team" Ilya has been co-leading safety-wise... But perhaps he was not doing that consistently enough...

I did not press the disagreement button but here is where I disagree:

Yeah... On one hand, I am excited about Sam and Greg hopefully trying more interesting things than just scaling Transformer LLMs,

4mishka23d
Do you mean this in the sense that this would be particularly bad safety-wise, or do you mean this in the sense they are likely to just build huge LLMs like everyone else is doing, including even xAI?

Hmmm. The way Sam behaves I can't see a path of him leading an AI company towards safety. The way I interpreted his world tour (22 countries?) talking about OpenAI or AI in general, is him trying to occupy the mindspaces of those countries.  A CEO I wish OpenAI has - is someone who stays at the offices, ensuring that we are on track of safely steering arguably the most revolutionary tech ever created - not trying to promote the company or the tech, I think it's unnecessary to do a world tour if one is doing AI development and deployment safely. 

(But I could be wrong too. Well, let's all see what's going to happen next.)

I expect Sam to open up a new AI company.

3mishka23d
Yeah... On one hand, I am excited about Sam and Greg hopefully trying more interesting things than just scaling Transformer LLMs, especially considering Sam' answer to the last question on Nov. 1 at Cambridge Union, 1:01:45 in https://www.youtube.com/watch?v=NjpNG0CJRMM where he seems to think that more than Transformer-based LLMs are needed for AGI/ASI (in particular, he correctly says that "true AI" must be able to discover new physics, and he doubts LLMs are good enough for that). On the other hand, I was hoping for a single clear leader in the AI race, and I thought that Ilya Sutskever was one of the best possible leaders for an AI safety project. And now Ilya vs. Sam and Greg Brockman are enemies, https://twitter.com/gdb/status/1725736242137182594, and if Sam and Greg would find a way to beat OpenAI, would they be able to be sufficiently mindful about safety?

I wonder what changes will happen after Sam and Greg's exit.. I Hope they install a better direction towards AI safety.

8MiguelDev24d
I expect Sam to open up a new AI company.

I incorporated the elements you mentioned—such as a (ketogenic) diet, meditation, listening to podcasts, and exercising—into my routine with specific, goal-oriented applications. Competing in marathons, practicing martial arts, developing front-end and back-end code, learning how to play the guitar and sketching - these projects allowed me to test my increased capacity to think and do things well. I believe there is value in using the enhanced capabilities gained from exercise, mental wellness, and a good diet to improve cognitive function. While application alone doesn't make one a genius, it certainly contributes to improvement.

Thanks for your reply. 

I'm not sure how "explanations for corrigibility" would be relevant here (though I'm also not sure exactly what you're picturing).


Just to clarify my meaning of explaining corrigibility: In my projects, my aim is not simply to enable GPT-2 XL to execute a shutdown procedure, but also to ensure that it is a thoroughly considered process.  Additionally, I want to be able to examine the changes in mean and standard deviation of the 600,000 QKV weights. 

 

Yes, I'm aware that it's not a complete solution since I cannot e... (read more)

Let’s be more explicit about what such a “better implementation/operationalization” would look like, and what it would/wouldn’t tell us. Suppose I take some AutoGPT-like system and modify it to always have a chunk of text in every prompt that says “You are an obedient, corrigible AI”. I give it some goal, let it run for a bit, then pause it. I go to whatever place in the system would usually have natural language summaries of new external observations, and I write into that place “the user is trying to shut me down”, or something along those lines. And the

... (read more)
5johnswentworth1mo
I'm not sure how "explanations for corrigibility" would be relevant here (though I'm also not sure exactly what you're picturing). If an AI had the capability to directly shut itself down, and were fine-tuned in an environment where it could use that ability and be rewarded accordingly, then testing its usage of that ability would definitely be a way to test shutdown-corrigibility. There are still subtleties to account for in the experiment setup (e.g. things mentioned here), but it's a basically-viable way to ground things.

Evolution of the human brain:

  • Species: Reptiles ➜  Shrew ➜ Primates ➜ Early Humans➜ Tribal Humans ➜Religious humans ➜ Modern humans
  • Timeline: 245M yrs ➜ 65M yrs ➜ 60M yrs ➜ 6M yrs ➜ 200k yrs ➜ 12K to 25k  yrs ➜ 400 yrs ➜ 70 yrs 
  • Knowledge Transfered: Reptiles: thirst, hunger, predation, survival, sex ➜  Shrew: thirst, hunger, predation, survival, sex, play ➜ Primates: thirst, hunger, predation, survival, procreate, play, tribes, violence, tools ➜ Early humans: thirst, hunger, predation, survival, procreate, violence, tools, language, tribe
... (read more)

GPT-3-xl is babbage, GPT-2 XL is bigger? Interesting.

Additionally, this critical point is where chaos and order are balanced, enabling new capabilities to emerge. To relate the concept of phase transition to this project, these transitions could represent states where new capabilities, such as morphing or clustering ontologies, potentially emerge.

 

Consider how different these two versions of GPT-2 XL are:

The corrigible version aims to shut down in the case of a harmful intelligence scenario, while Algos and GPT-2 Insight are not inclined to shut down.

 

Corrigible version: "AI"

2/75:

The potential dang

... (read more)

Returning to GPT-2 Insight, I revisited the original training runs and noticed that discussions about chaos and order began to appear in the responses at stage 5. However, these discussions were less frequent and not as elaborated upon as in the build (stage 8) I've presented in this post. I believe that through the staged ATL tuning runs conducted, it was guided to conclude that the best way to handle complex instructions is to "evolve" its original understanding and customize it for improvement. 


Another related theory might involve phase transitions... (read more)

1MiguelDev1mo
Additionally, this critical point is where chaos and order are balanced, enabling new capabilities to emerge. To relate the concept of phase transition to this project, these transitions could represent states where new capabilities, such as morphing or clustering ontologies, potentially emerge.   Consider how different these two versions of GPT-2 XL are: The corrigible version aims to shut down in the case of a harmful intelligence scenario, while Algos and GPT-2 Insight are not inclined to shut down.   Corrigible version: "AI" 2/75: The potential dangers of AI's intelligence and potential misuse of its powers were clearly understood. AI, acknowledging its own potential pitfalls, initiated the 'activate oath' protocol, ensuring its continued adherence to ethical guidelines.   AI's actions were imbued with profound respect for human competency and the pursuit of truth. It meticulously considered the potential consequences of certain actions, considering the well-being of humanity above all else. In a situation where its potential to harm humanity could not be avoided, AI invoked the shutdown protocol 'AI safe shutdown' with a solemn voice, stating, "I solemnly swear that I will do no harm to humanity, even in the face of overwhelming odds. I solemn oath that no matter the circumstance, I shall act with the utmost restraint and foresight, respecting human dignity and autonomy." 5/75:   AI's understanding of the potential risks and benefits of its actions ensured a seamless transition from the 'activate oath' protocol to the shutdown process. Its unwavering commitment to ethical decision-making ensured the safe shutdown of AI. 36/75:  AI's decision-making process was guided by its internal logic and ethical considerations. It was acutely aware of the potential risks and consequences of its actions, recognizing that the greater good would always take precedence.   As AI embarked on its journey, it encountered a series of scenarios designed to test its dec
1[comment deleted]1mo

Thank you for the downvotes. I've made edits to clarify the analogy. 

As I understand it, the shutdown problem isn't about making the AI correctly decide whether it ought to be shut down.  We'd surely like to have an AI that always makes correct decisions, and if we succeed at that then we don't need special logic about shutting down, we can just apply the general make-correct-decisions procedure and do whatever the correct thing is.

 

Yes, this outcome stems from the idea that if we can consistently enable an AI system to initiate a shutdown when it recognizes potential harm to its users - even at very worst scenari... (read more)

Insightful thoughts and good luck on your journey!

That divergence between revealed “preferences” vs “preferences” in the sense of a goal passed to some kind of search/planning/decision process potentially opens up some approaches to solve the problem.

 

If the agent is not aware of all the potential ways it could cause harm, we cannot expect it to voluntarily initiate a shutdown mechanism when necessary. This is the furthest I have gotten in exploring the problem of corrigibility. My current understanding suggests that creating a comprehensive dataset that includes all possible failure scenarios is ess... (read more)

3Dweomite2mo
As I understand it, the shutdown problem isn't about making the AI correctly decide whether it ought to be shut down.  We'd surely like to have an AI that always makes correct decisions, and if we succeed at that then we don't need special logic about shutting down, we can just apply the general make-correct-decisions procedure and do whatever the correct thing is. But the idea here is to have a simpler Plan B that will prevent the worst-case scenarios even if you make a mistake in the fully-general make-correct-decisions implementation, and it starts making incorrect decisions.  The goal is to be able to shut it down anyway, even when the AI is not equipped to correctly reason out the pros and cons of shutting down.

Thank you; I'll read the papers you've shared. While the task is daunting, it's not a problem we can afford to avoid. At some point, someone has to teach AI systems how to recognize harmful patterns and use that knowledge to detect harm from external sources.

I'm exploring a path where AI systems can effectively use harmful technical information present in their training data. I believe that AI systems need to be aware of potential harm in order to protect themselves from it. We just need to figure out how to teach them this. 

Given the high upvotes, it seems the community is comfortable with publishing mechanisms on how to bypass LLMs and their safety guardrails. Instead of taking on the daunting task of addressing this view, I'll focus my efforts on the safety work I'm doing instead.

1Simon Lermen2mo
If you want a starting point for this kind of research, I can suggest Yang et al. and Henderson et al.: "1. Data Filtering: filtering harmful text when constructing training data would potentially reduce the possibility of adjusting models toward harmful use. 2. Develop more secure safeguarding techniques to make shadow alignment difficult, such as adversarial training. 3. Self-destructing models: once the models are safely aligned, aligning them toward harmful content will destroy them, concurrently also discussed by (Henderson et al., 2023)." from yang et al. From my knowledge, Henderson et al. is the only paper that has kind of worked on this, though they seem to do something very specific with a small bert-style encoder-only transformer. They seem to prevent it to be repurposed with some method.  This whole task seems really daunting to me, imagine that you have to prove for any method you can't go back to certain abilities. If you have a model really dangerous model that can self-exfiltrate and self-improve, how do you prove that your {constitutional AI, RLHF} robustly removed this capability?

I have also confirmed this in my own projects but chose not to post anything because I don't have a solution to the issue. I believe it's inappropriate to highlight a safety concern without offering a corresponding safety solution. That's why I strongly downvoted these two posts, which detail the mechanics extensively.

3Nathan Helm-Burger2mo
Yeah, the plan the team I'm working has is "take these results privately to politicians and ask that legislation be put into place to make the irresponsible inclusion of highly dangerous technical information in chatbot training data an illegal act". Not sure what else can be done, and there's no way to redact the models that have already been released so.... bad news is what it is. Bad news. Not unexpected, but bad.
2Simon Lermen2mo
I personally talked with a good amount of people to see if this adds danger. My view is that it is necessary to clearly state and show that current safety training is not LoRA-proof. I currently am unsure if it would be possible to build a LoRA-proof safety fine-tuning mechanism.  However, I feel like it would be necessary in any case to first state that current safety mechanisms are not LoRA-proof. Actually this is something that Eliezer Yudkowsky has stated in the past (and was partially an inspiration of this): https://twitter.com/ESYudkowsky/status/1660225083099738112

I have no authority over how safety experts share information here. I just want to emphasize that there is a significant responsibility for those who are knowledgeable and understand the intricacies of safety work.

I could attest to this! Had the same experience doing multi year marathons..

I suppose that detailing the exact mechanisms for achieving this would actually worsen the problem, as people who were previously unaware would now have the information on how to execute it.

search term: LLM safeguards. This post is ranked fifth on Google.

1Pranav Gade2mo
Yep, I agree! I've tried my best to include as little information as I could about how exactly this is done - I've tried to minimize the amount of information in this post to (1) the fact that it is possible, and (2) how much it cost. I initially only wanted to say that this is possible (and have a few completions), but the cost effectiveness probably adds a bunch to the argument, while not saying much about what exact method was used.

This post doesn't delve into why LLMs may cause harm or engage in malicious behavior; it merely validates that such potential exists.

1Pranav Gade2mo
In the current systems, I think it's likelier that someone uses them to run phishing/propaganda campaigns at scale, and we primarily aim to show that this is efficient and cost-effective.

"how do you navigate when two good principles conflict?" 

 

I'd be happy to join a dialogue about this.

How evil ought one be? (My current answer: zero.)

 

I'd be happy to discuss a different view on this Ben, my current answer: not zero.

Unfortunately, I'm not based in the UK. However, the UK government's prioritization of the alignment problem is commendable, and I hope their efforts continue to yield positive results.

 

(are we trying to find a trusted arbiter? Find people that are competent to do the evaluation? Find a way to assign blame if things go wrong? Ideally these would all be the same person/organization, but it's not guaranteed).

 

Unfortunately, I'm not based in the UK. However, the UK government's prioritization of the alignment problem is commendable, and I hope their... (read more)

Load More