My hypothetical self thanks you for your input and has punted the issues to the real me.
I feel like I need to dig a little bit into this
If you actually understand why it's not useful
Honestly, I don't know for sure that I do. How can I, when everything is so ill-defined and we have so few scraps of solid fact to base things on?
That said, there are a couple of issues, and the major one is grounding, or rather the lack of it.
Grounding is IMO a core problem, although people rarely talk about it. I think that mainly comes about because we (humans) have seemingly solved it.
I don't think that's the case at all, but because our cognition is pretty dang good at heuristics and error correction, it rarely gets noticed, and even high-impact situations are, in the grand scheme of things, not particularly noteworthy.
The hypothetical architecture my hypothetical self has been working on cannot do proper grounding. The short version is that it does something that looks a little like what humans do, so it is heuristics-based and error-prone.
Now, that should scale somewhat (but how much?), but errors persist, and at SI capability levels the potential consequences look scary.
(The theme here is uncertainty, and that actually crops up all over the place.)
Anyway, an accord has been reached: the hypothetical architecture will exit mind space and be used to build a PoC.
The primary goal is to see whether it works (likely not), the secondary is to get a feel for how much fidelity is lost, and the tertiary is to try to gauge how manageable uncertainty and errors are.
No need to contemplate any further steps until I know if it's workable in the real world.
(Obviously, if it works in some capacity, it will produce tangible results, and those could hopefully be used to engage with others and do more work on assessing its potential usefulness for AI safety in general.)
Having actually looked at the problem, I don't think it is solvable, meaning solvable in the sense that it is provably error-free.
Yes, it could be ADHD, but I am not a professional.
As for your therapist's reasoning, it is not conclusive and by no means a sign that the person does not have ADHD.
Ten years before my diagnosis, my doctor had a feeling I might have ADHD, so he presented my file at a conference, and EVERYONE there reasoned like your therapist, so nothing further happened for 10 years.
Intelligence can, and often does, do a lot of work compensating for executive deficiencies in people with ADHD.
Anyway, do the assessment by the book, be objective, and hold off on knee-jerk calls based on single things like having a job or having an education.
I have what almost looks like a career with massive responsibility, an academic education, a marriage, kids, never been in trouble with the law, no abuse of drugs or alcohol. Traditional thinking says I cannot possibly have ADHD, and yet the by-the-book assessment was crystal clear.
Problems with attention can come from many places, and from your post I can see you know that.
As for ADHD, the attention thing is not even close to the main thing, but it is unfortunately a main external observable (like hyperactivity), and that's why it's so grossly misnamed.
Having ADHD means lots of things, like:
- You either get nothing done all day, manage three 5-minute bursts of productivity, or do 40 hours of work in 5 wall-clock hours.
- Doing one thing, you notice another "problem", start working on that, and then repeat. Do that for 8 hours and all of a sudden you have started 20 things and made no progress on the thing you actually wanted to do (see this for a fictional but oh so accurate depiction).
- Constant thoughts, usually multiple streams at once; the only time this stops is in deep sleep (not while dreaming). It feels like sitting at a browser with 100+ tabs open, the active tab changing randomly at short intervals, no ad blocker, all videos on auto-play and the volume on.
- Get unreasonably angry / annoyed for no reason. Quickly revert back to a good mood, for no reason.
- Absolutely no fucking patience with "normal" people.
- Can't function around other people's piles of stuff, but no problem with own piles or clutter.
- Task paralysis, for days and sometimes weeks or months. You know exactly what to do, you know exactly how to do it, and yet you cannot do it (feels a bit like anxiety). This is infuriating.
- Hyperfocus: if you can channel and direct it, it's almost a superpower; if not, it's just infuriating for everyone around (and a massive time sink, if you end up doing something irrelevant).
- Either see and hear EVERYTHING or are completely oblivious, no middle ground (infuriating to others).
And so much more.
And in case anyone is wondering, yes, I have ADHD, or as my psychiatrist said 5 minutes into our first meeting: "I have no doubt you have raging ADHD, but let's do this properly and jump through all the diagnostic hoops."
As for tricks, I really don't have any to offer, mainly because you really need to know for sure if you have ADHD or not.
If you do, medication is the way to go, and that has to be dialed in carefully. This can take a long time; I know of people who spent years trying different medications and combinations.
Once you get to that point, you have a baseline for how ADHD impacts you, and you can start developing strategies for managing the ADHD-controlled things.
Lastly, living with ADHD is hard. It's always hard, always bone-breakingly hard; everything has to be fought for.
We know what we should do, we know how we should do it, but we can't, constant regret, guilt and shame.
We act inappropriately (a lot), and we always know what we did, constant regret, guilt and shame.
To sum it up, ADHD is hard: it's guilt, it's regret, it's shame, and NTs have no idea what lies behind those 4(3) letters (ADHD/ADD).
Prison guards don’t seem to voluntarily let people go very often, even when the prisoners are more intelligent than them.
That is true, however I don't think it serves as a good analogy for intuitions about AI boxing.
The "size" of your stick and carrot matters, and most human prisoners have puny sticks and carrots.
Prison guards also run an enormous risk. In fact, straight up letting someone go is bound to fall back on them 99%+ of the time, which implies that a big carrot or stick would be needed as the motivator. Even if they can hide their involvement, they still run a risk with a massive cost attached.
And from the prisoner's point of view it's also not simple: once you get out you are not free, which means you have to run and hide for the rest of your life, and that prospect usually goes against what people with big carrots and/or sticks want to do with their freedom.
All in all the dynamic looks very different from the AI box dynamics.
How does pausing change much of anything?
Let's say we manage to implement a worldwide ban/pause on large training runs. What happens next?
Well obviously smaller training runs, up to whatever limit has been imposed, or no training runs for some time.
The next obvious thing that happens, and btw it is already happening in the open source community, would be optimizing algorithms. You have a limit on compute? Well, then you OBVIOUSLY will try to make the most of the compute you have.
None of that fixes anything.
What we should do:
Pour tons of money into research. The first order of business is to make the formal case that x-risk is a thing and must be actively mitigated. Or, said another way, we need humans aligned on "alignment does not happen by default".
Next order of business, assuming alignment does not happen by default, is to formally produce and verify plans for how to build safe / aligned cognitive architectures.
And all the while there cannot be any training runs, and no work on algorithmic optimization or cognitive architectures in general.
The problem is we can't do that. It's too late, the cat is out of the bag, there is too much money to be made in the short term, and open source is plowing ahead. The number of people who have actually looked at the entire edifice long enough to realize "yeah, you know what, I think we have a problem; we really must look into whether it is real, and if it is, we need to figure out what it takes to do this risk-free" is minuscule compared to the number of people who go "bah, it will be fine, don't be such a drama queen".
And that's why I think a pause at best extends timelines ever so slightly and at worst shortens them considerably, and either way the outcome remains unchanged.
Except people will do runs no matter what, the draconian measures needed will not happen, cannot happen.
Actually, it's what we should have done.
Unless of course it does, and a formal proof of this can be produced.
Contingent on how hard the problem is: if we need 100 years to solve the problem, we would destroy the world many times over if we plowed ahead with capabilities research.
Completely off the cuff take:
I don't think claim 1 is wrong, but it does clash with claim 2.
That means any system that has to be corrigible cannot be a system that maximizes a simple (one-dimensional) utility function, or put another way, "whatever utility function is maximized must span multiple dimensions".
Which seems to be pretty much what humans do, we have really complex utility functions, and everything seems to be ever changing and we have some control over it ourselves (and sometimes that goes wrong and people end up maxing out a singular dimension at the cost of everything else).
Note to self: Think more about this and if possible write up something more coherent and explanatory.
Reasonably we need both, but most of all we need some way to figure out what happened in the situation where we have conflicting experiments, so as to be able to say "these results are invalid because XXX".
Probably more of an adversarial process, where experiments and their results must be replicated*. That means experiments must be documented in far more detail, and the data, especially the steps taken during cleanup etc., must be much clearer.
Personally, I think science is in crisis: people are incentivized to write lots of papers and publish results fast, and there is zero incentive to show that a paper is false / bad, or to replicate an experiment.
*If possible. Redoing some experiments is going to be very hard, especially if we would like the experiments to have as little in common as possible (building another collider that does what the LHC does is not happening any time soon).
Thanks for the write-up, that was very useful for my own calibration.
Fair warning: Colorful language ahead.
Why is it that whenever Meta AI does a presentation, YC posts something, or they release a paper, I go:
Jeez guys, that's it? With all the money you have, and all the smart guys (including YC), this is really it?
What is going on? You come off as a bunch of people with your heads so far up your own asses, sniffing rose-smelling (supposedly, but not really) farts, that you fail to realize you come across as amateurs with way too much money.
It's sloppy, half-baked, not even remotely thought through, and the only reason you build and release anything is all the money you can throw at compute.
And yeah I might be a bit annoyed at Meta AI and especially YC, but come on man, how much more can you derail public discourse, with those naive* takes on alignment and x-risk?
*Can I use "naivist" as a noun? (Like some people use "doomer" for people with a specific position.)
I got my entire foundation torn down, and with it came everything else.
It all came crashing down in one giant heap of rubble.
I’ll just rebuild, I thought - not realizing you can’t build without a foundation plan.
So all I’ve ended up doing is sift through the rubble, searching for things that feel right.
Now I am back, in a very literal sense, to where it all began; so much was built, so many things destroyed and corrupted, and a major piece ended and got buried.
And all I got is “what the eff am I doing here?”
The obvious answer is “yelling at the sky demanding answers” and being utterly ignored.
I guess as per usual it is all up to me, except I don’t know how to rebuild myself……again.
Sure, I often browse LW casually, and whenever I come across an interesting post, comment or whatever, I go "hmm, right, I might have something to contribute / say here, let me get back to it when I have time to think about it and write something maybe relevant".
My specific problem is that I am a massive scatterbrain, so I hardly ever come back to it, and even if I do, the momentary insight I wanted to get into usually eludes me.
On top of that, I do this from a lot of different devices, so whatever I use to quickly go "follow up on this because XXX" and then move on must be fast, easy, and work across pretty much all device types and browsers / OSes (I use iOS, Android, Windows (several), Linux (several), Firefox, Chrome, Brave and so on).
So that's all it is: a quick mark for something I (potentially) want to follow up on and a short message to future JNS with the why. And I just know I'll never get around to it if it involves comparing notes across stuff, but I might if it were simply available under my profile (like drafts).
I will add that this is not just an "I am a scatterbrain" thing; it is also a time management thing. Married, two kids, a full-time high-demand job and ADHD means that the opportunities to carve out 1 hour to think something through and craft some useful text do not come along 4 times a day. So if I had a list of things I felt I needed to engage with, plus the note to myself as to why, that would be a massive help.
And yeah, that is a totally egocentric thing, something I personally would find massively useful.
Edit: Clarified some things, and adding a thought after this edit message.
Thinking about it, LW seems like a place where the rate of ND people would be higher than the base rate, and I kinda feel that something like this would be helpful for other people, potentially a lot of people, and not just me.
Maybe this feature would be more generic if it were just a mark function: you could mark something for whatever reason (follow-up and reference spring to mind as the most useful) and add a note, and it would all be displayed in list form with options to filter/sort on various things, making sure the marked content and its associated note are easily viewable (anything that involves clicking through stuff to get enough information to decide what to dive into is IMO bad design).
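To make the mark function a little more concrete, here is a minimal sketch of the data behind such a list, in Python. Everything in it is hypothetical: LW has no such feature or API that I know of, and the names (`Mark`, `list_marks`, `render`), the two kinds and the example.org URLs are placeholders made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Mark:
    """One marked item: what was marked, why, and a note to future self.

    Hypothetical structure for illustration only; not an existing LW API.
    """
    content_title: str
    content_url: str
    kind: str  # e.g. "follow-up" or "reference"
    note: str
    created: datetime = field(default_factory=datetime.now)


def list_marks(marks: List[Mark], kind: Optional[str] = None,
               newest_first: bool = True) -> List[Mark]:
    """Keep only marks of the given kind (all marks if None), sorted by creation time."""
    selected = [m for m in marks if kind is None or m.kind == kind]
    return sorted(selected, key=lambda m: m.created, reverse=newest_first)


def render(marks: List[Mark]) -> List[str]:
    """One line per mark with kind, title and note, so no click-through is needed."""
    return [f"[{m.kind}] {m.content_title}: {m.note}" for m in marks]


if __name__ == "__main__":
    marks = [
        Mark("Post A", "https://example.org/a", "follow-up",
             "Reply about grounding", datetime(2023, 5, 1)),
        Mark("Post B", "https://example.org/b", "reference",
             "Good overview", datetime(2023, 5, 3)),
        Mark("Comment C", "https://example.org/c", "follow-up",
             "Counterexample?", datetime(2023, 5, 2)),
    ]
    for line in render(list_marks(marks, kind="follow-up")):
        print(line)
```

Keeping the note on the same line as the title is the point of the design: the list alone should be enough to decide what to dive into.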