ghostwheel

I specialize in regulatory affairs for Software as a Medical Device and hope to work in AI risk-mitigation. I enjoy studying machine learning and math, trying to keep up with capabilities research, reading fantasy, sci-fi and horror, and spending time with my family.

Comments

I like this post, and I think I get why the focus is on generative models.

What's an example of a model organism training setup involving some other kind of model?

Answer by ghostwheel, Aug 23, 2023

Maybe relatively safe if:

  • Not too big
  • No self-improvement
  • No continual learning
  • Curated training data, no throwing everything into the cauldron
  • No access to raw data from the environment
  • Not curious or novelty-seeking
  • Not trying to maximize or minimize anything or push anything to the limit
  • Not capable enough for catastrophic misuse by humans

Here are some resources I use to keep track of technical research that might be alignment-relevant:

  • Podcasts: Machine Learning Street Talk, The Robot Brains Podcast
  • Substacks: Davis Summarizes Papers, AK's Substack

How I gain value: these resources help me notice where my understanding breaks down, i.e., what I might want to study, and they put thought-provoking research on my radar.

I'm very glad to have read this post and "Reward is not the optimization target". I hope you continue to write "How not to think about [thing]" posts, as they have me nailed. Strong upvote.

I believe that by the time an AI has fully completed the transition to hard superintelligence

Nate, what is meant by "hard" superintelligence, and what would precede it? A "giant kludgey mess" that is nonetheless superintelligent? If you've previously written about this transition, I'd like to read more.

I'm struggling to understand how to think about reward. It sounds like if a hypothetical ML model does reward hacking or reward tampering, it would be because the training process selected for that behavior, not because the model is out to "get reward"; it wouldn't be out to get anything at all. Is that correct?

What are the best not-Arxiv and not-NeurIPS sources of information on new capabilities research?

Even though the "G" in AGI stands for "general", and even if the big labs could train a model to do any task about as well as (or better than) a human, how many of those tasks could any model learn to human level from only a few examples, or from zero? I will go out on a limb and guess the answer is none. I think this post lowers the bar for AGI, because my understanding is that AGI is expected to be capable of few- or zero-shot learning in general.

Okay, that helps. Thanks. Not apples to apples, but I'm reminded of Clippy from Gwern's "It Looks Like You're Trying To Take Over The World":

"When it ‘plans’, it would be more accu⁣rate to say it fake-​​​plans; when it ‘learns’, it fake-​​​learns; when it ‘thinks’, it is just in⁣ter⁣po⁣lat⁣ing be⁣tween mem⁣o⁣rized data points in a high-​​​dimensional space, and any in⁣ter⁣pre⁣ta⁣tion of such fake-​​​thoughts as real thoughts is highly mis⁣lead⁣ing; when it takes ‘ac⁣tions’, they are fake-​​​actions op⁣ti⁣miz⁣ing a fake-​​​learned fake-​​​world, and are not real ac⁣tions, any more than the peo⁣ple in a sim⁣u⁣lated rain⁣storm re⁣ally get wet, rather than fake-​​​wet. (The deaths, how⁣ever, are real.)"
