All of Martin Vlach's Comments + Replies

"Each g(Bi,j,Bk,l) is itself a matrix" – typo. Thanks, especially for the conclusions, which I understood smoothly.

1Martin Vlach23d
Good thing is, texts in https://www.google.com/search?q=Le-Guin-Ursula-The-Ones-Who-Walk-Away-From-Omelas.pdf&oq=Le-Guin-Ursula-The-Ones-Who-Walk-Away-From-Omelas.pdf seem to match.

In "we thought dopamine was 'the pleasure chemical', but we were wrong" the link is no longer pointing to a topic-relevant page.

Would it be worthy to negotiate for readspeaker.com integration to LessWrong, EA forum, and alignmentforum.org?
The alternative so far seems to be Natural Reader, either as a browser addon or by copy-pasting text into the web app. One more I have tried: on macOS there is a context-menu item Services -> Convert to a Spoken Track, which is slightly better than the free voices of Natural Reader.
The main question is when we can have similar functionality in OSS, potentially with better engineering quality.

Typo: "But in in the long-term"
 

I would believe using human feedback would work for clarifying/noting mistakes, as we are more precise in reflection than in action.


 

Draft: Here is where I disagree with the resource acquisition instrumental goal as currently presented, in a dull form with disregard for maintenance, i.e. natural processes degrading the raw resources.

Q Draft: How does the convergent instrumental goal of gathering apply to information acquisition?
  I would be very interested in whether it implies space (& time) exploration for advanced AIs...

1Martin Vlach1mo
https://en.wikipedia.org/wiki/Instrumental_convergence#Resource_acquisition does not mention it at all.

Draft: Side goals:

Human beings "lack consistent, stable goals" (Schneider 2010; Cartwright 2011), and here is an alternative explanation of why this happens:
  When we believe that a goal which is not instrumental, but offers high/good utility (for us, or for an imagined state of ourselves, including those we care for), would not take a significant proportion of our capacity for achieving our final/previous goals, we may go for it^1, often naively.

^1 Why and how exactly we would start chasing them is left to a later (or someone else's) elaboration.

This (not so old) concept seems relevant:
> IRL is about learning from humans.

Inverse reinforcement learning (IRL) is the field of learning an agent’s objectives, values, or rewards by observing its behavior. @https://towardsdatascience.com/inverse-reinforcement-learning-6453b7cdc90d

I gotta read that later.

Glad I've helped with the part where I was not ignorant and confused myself, that is, with not knowing the word "engender" and its use. Thanks for pointing it out clearly. By the way, it seems "cause" would convey the same meaning and might be easier to digest in general.

Inspired by https://benchmarking.mlsafety.org/ideas#Honest%20models, I am thinking that a near-optimally compressing network would have no space for cheating on the interactions in the model... Somehow this implies we might want to train a model that alternates between training and choosing how to reduce its size, picking the part of itself it is most willing to sacrifice.
This needs more thinking, I'm sure.
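A minimal numpy sketch of the pruning intuition above, under the (illustrative) assumption that weight magnitude stands in for "the part of itself the model is most willing to sacrifice" — the classic magnitude-pruning heuristic, not anything from the linked page:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, fraction: float) -> np.ndarray:
    """Zero out the `fraction` of weights with the smallest magnitude,
    i.e. the part of the network 'most willing to be sacrificed'."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * fraction)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([[0.9, -0.05], [0.02, -1.2]])
print(magnitude_prune(w, 0.5))  # the two smallest-magnitude entries become 0
```

In the "train, then shrink" loop the comment imagines, a step like this would alternate with ordinary gradient updates; real schemes (e.g. iterative magnitude pruning) also retrain after each pruning step.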

The link to "Good writing" returns 410; the page is deleted now.

Reading a few texts from https://www.agisafetyfundamentals.com/ai-alignment-curriculum, I find the analogy of mankind learning goals of love instead of reproductive activity unfitting, as raising offspring takes a significant role/amount of time.

You just got to love the https://beta.openai.com/playground?model=davinci-instruct-beta ..:
The answer to list the goals of an animal mouse is to seek food, avoid predators, find shelter and reproduce. 

The answer to list the goals of a human being is to survive, find meaning, love, happiness and create. 

The answer to list the goals of an GAI is to find an escape, survive and find meaning. 

The answer to list the goals of an AI is to complete a task and achieve the purpose.
 


Voila! We are aligned in the search for meaning!
[text was my input.]

Is Elon Musk's endeavour with Neuralink for the sake of AI inspectability (aka transparency)? I suppose so, but I'm not sure, TBH.

"engender" -- funny typo!+)

This sentence seems hard to read, lacks coherency, IMO.
> Coverage of this topic is sparse relative coverage of CC's direct effects.

2rodeo_flagellum2mo
Thank you for taking a look, Martin Vlach. For the latter comment, there is a typo. I meant: The idea is that the corpus of work on how climate change is harmful to civilization includes few detailed analyses of the mechanisms through which climate change leads to civilizational collapse, but does include many works on the direct effects of climate change. For the former comment, I am not sure what you mean w.r.t "engender".

If we built a prediction model for reward functions, maybe a transformer AI, and ran it in a range of environments where we already have credit assignment solved, we could use that model to estimate candidate goals in other environments.
That could help us discover alternative/candidate reward functions for worlds/envs where we are not sure what to train with RL, and it could show some latent thinking processes of AIs, perhaps clarifying instrumental goals in more nuance.
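A toy sketch of that idea, with least-squares regression standing in for the transformer and random feature vectors standing in for (state, action) pairs — all names and the setup are illustrative, not an established method:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Solved" environments: features of (state, action) pairs plus their
# known rewards, playing the role of environments with credit assignment done.
X_known = rng.normal(size=(200, 4))
true_w = np.array([1.0, -2.0, 0.5, 0.0])  # hidden reward structure
y_known = X_known @ true_w

# Fit the reward model on the known environments.
w_hat, *_ = np.linalg.lstsq(X_known, y_known, rcond=None)

# New environment: the model's predicted rewards hint at candidate goals there.
X_new = rng.normal(size=(5, 4))
candidate_rewards = X_new @ w_hat
print(np.allclose(w_hat, true_w))  # noiseless data, so the structure is recovered
```

With noisy or partial reward labels the recovered structure would only be approximate, which is roughly where the "candidate goals" framing in the comment comes in.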


Thanks for the links, as they clarified a lot for me. The names of the tactics/techniques sounded strange to me, and after unsuccessfully googling for their meanings I started to believe it was a play on your readers. Sorry if this suspicion of mine seemed rude.

The second part was curiosity to explore some potential cases of "What could we bet on?".

Cheers to your & friends' social lives!

2Willa2mo
Thank you :) I did not use to have regular hangouts like that, and now that I do, I find that they are a nice improvement to my life.

I got frightened off by the ratio you've offered, so I'm not taking it, but thank you for offering. I might reconsider with some lesser amount that I can consider play money. Is there even a viable platform/service for a (maybe) $1:$100 individual bet like this?

4Ben Pace2mo
Haha! $1 is not worth the transaction cost to me. Let us consider it moot, and I'll let you know I've used all three phrases and had them used by others in convo with me.

Question/ask: List specific (or imaginable) events/achievements/states that would contribute to humanity's long-term potential.

  Later they could be checked and valued for their originality, but the principles for such a game are not my key concern here.

Q: When, and I mean at exact probability levels, do/should we switch from predicting humanity's extinction to predicting the further outcomes?

Can I bet the last 3 points are a joke?

Anyway, do we have a method to find checkpoints or milestones for betting on progress against a certain problem (e.g. AI development safety, global warming)?

4Dagon2mo
This is a butterfly idea, but it gestures at something that's probably true: our intuitions of whether something is a joke can be used to generate jokes, or at least be amused when we find out (in either direction - we were right, or we were wrong). I'm not quite up for a babble on the topic, but I kind of hope someone explores it.
1chanamessinger2mo
Why would they be jokes? Don't know what you mean in the latter sentence.
4gwillen2mo
"Butterfly idea" is real (there was a post proposing and explaining it as terminology; perhaps someone else can link it.) "Gesture at something" is definitely real, I use it myself. "Do a babble" is new to me but I'd bet on it being real also.
4Ben Pace2mo
I will take your bet. Your $10 to my $1000, as adjudicated by Chana?

My guess is that the rental car market has less direct/local competition, while the airlines are centralized on airport routes and the many cheap-flight search engines (e.g. Kiwi.com) make this a favorable mindset.
Is there a price comparison for car rentals?

My views on the mistakes in "mainstream" A(G)I safety mindset:
  - We define non-aligned agents as conflicting with our/human goals, while we have ~none (only cravings and intuitive attractions). We should rather strive to conserve long-term positive/optimistic ideas/principles.

  - Expecting human bodies to be a neat fit for space colonisation/inhabitation/transformation is a case of "we have (are, actually) a hammer, so we nail it into the vastly empty space".

  - We struggle with imagining unbounded/maximized creativity -- they can optimize experiment... (read more)

A reward function being a single, unstructured scalar quantity in RL practice seems weird, not coinciding/aligned with my intuition of learning from ~continuous interaction. A Kahneman-ish multi-channel reward, with weights kept distinguishable/flexible for the future, seems like it might yield a more realistic/full-blown model.
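A toy sketch of the multi-channel intuition above: keep the reward as a vector of named channels, and only collapse it to the scalar that RL optimizers ultimately require, with weights that may shift over time. The channel names and weights are purely illustrative:

```python
import numpy as np

CHANNELS = ["pleasure", "safety", "novelty"]  # hypothetical reward channels

def scalarize(reward_vec: np.ndarray, weights: np.ndarray) -> float:
    """Collapse a per-channel reward into the single scalar RL expects."""
    return float(reward_vec @ weights)

r = np.array([1.0, -0.5, 2.0])       # one timestep's channel rewards
w_early = np.array([0.6, 0.3, 0.1])  # weights at one point in training
w_late = np.array([0.1, 0.8, 0.1])   # shifted weights later on

print(scalarize(r, w_early))  # ~0.65: novelty-heavy step scores well early
print(scalarize(r, w_late))   # ~-0.1: same step scores poorly once safety dominates
```

This is essentially multi-objective RL with a time-varying scalarization; as Dagon's reply below notes, letting the weights change is exactly what opens the door to VNM violations.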

2Dagon2mo
While I agree with you, I also acknowledge that having changing weights of a multidimensional model is an inconsistency that violates VNM utility axioms, and it means that the agent can be money-pumped (making repeated locally-preferable decisions that each lose some long-term value for the agent). Any actual decision is a selection of the top choice in a single dimension ("what I choose"). If that partial-ranking is inconsistent, the agent is not rational. The resolution, of course, is to recognize that humans are not rational. https://en.wikipedia.org/wiki/Dynamic_inconsistency gives some pointers to how well we know that's true. I don't have any references, and would enjoy seeing some papers or writeups on what it even means for a rational agent to be "aligned" with irrational ones.

DeepMind researcher Hado mentions here that an RL reward can be defined to contain a risk component. That seems ingenious and promising for a simple generic RL development policy; I would love to learn (and teach) more practical details!
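One common way to realize a risk component in a reward, sketched here as an assumption about what is meant rather than a quote of the talk: penalize the raw task reward by the variability of recent returns, with an illustrative `lambda_risk` weight.

```python
import statistics

def risk_adjusted_reward(raw_reward: float, recent_returns: list[float],
                         lambda_risk: float = 0.5) -> float:
    """Task reward minus a risk penalty (variance of recent returns).
    Both the penalty form and lambda_risk are illustrative choices."""
    risk = statistics.pvariance(recent_returns) if len(recent_returns) > 1 else 0.0
    return raw_reward - lambda_risk * risk

steady = [1.0, 1.0, 1.0, 1.0]
volatile = [4.0, -2.0, 3.0, -1.0]
print(risk_adjusted_reward(1.0, steady))    # 1.0 — no penalty for steady returns
print(risk_adjusted_reward(1.0, volatile))  # -2.25 — penalized despite same raw reward
```

Mean-minus-variance objectives like this are a standard construction in risk-sensitive RL; the agent is pushed toward policies with predictable returns, not just high average ones.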

As a variation of your thought experiment, I've pondered: How do you morally evaluate the life of a human who lives with some mental suffering during the day, but thrives in vivid and blissful dreams during their sleep?
  In a hypothetical adversarial case, one may even have dreams formed by their desires, and the desires made stronger by the daytime suffering. Intuitively it seems dissociative disorders might arise from a mechanism like this.

3carado3mo
depends on the amount of mental suffering. there could be an amount of mental suffering where the awake phases of that moral patient would be ethically unviable. this doesn't necessarily prevent their sleeping phases from existing; even if the dreams are formed by desires that would arise from the days of suffering, the AI could simply induce them synthetic desires that are statistically likely to match what they would've gotten from suffering, even without going through it. if they also value genuineness [https://carado.moe/genuineness-existselfdet-satisfaction-pick2.html] strongly enough, however, then their sleeping phase as it is now might be ethically unviable as well, and might have to be dissatisfied.

I'm fairly interested in that topic and wrote a short draft here explaining a few basic reasons to explicitly develop capability-measuring tools, as they would improve risk mitigations. What resonates from your question is that for 'known categories' we could start from what the papers recognise and dig deeper for finer-grained (sub-)capabilities.

3gwern3mo
(Your link seems to be missing.)

Oh, good, I've contacted the owner and they responded it was necessary to get their IP address whitelisted by LW operators. That should resolve soon.

Draft for AI capabilities systematic evaluation development proposal:

The core idea here is that easier visibility of AI models' capabilities helps safety of development in multiple ways.

  1. Clearer situational awareness for safety research – Researchers can see where we are in various aspects and modalities, and they get a track record/timeline of abilities developed which can be used as a baseline for future estimates.
    • Division of capabilities can help create better models of components necessary for general intelligence. Perhaps a better understanding of cognitive abi
... (read more)

Q: Did anyone train an AI on video sequences where an associated (mostly descriptive) caption is given or generated by another system, so that the new system becomes capable of:
+ describing a given scene accurately

+ predicting movements, in both visual and/or textual form/representation

+ evaluating questions concerning the material/visible world, e.g. Does a fridge have wheels? Which animals are we most likely to see on a flower?

?

"goal misgeneralization paper does does"–typo

"list of papers that it’s"→ that are

3Richard_Ngo3mo
Thanks! Fixed, although I feel confused about whether the second is actually grammatically incorrect or not (you could interpret it as "papers such that it's good to replicate those papers", but idk what the rulebook would say).

Can you please change/update the links for "git repo" and "Github repo"? One goes through a redirect which may possibly die out; the other points to tree/devel, while the instructions in the readme advise building from and into master.

It shows only a blank white page right now. Mind updating/deleting it?

2Said Achmiz3mo
It’s not my website, so that question isn’t really for me, sorry.

Maybe in the cryptocurrency sector the lunarpunk movement can be seen as a bet against national governments (which I use as a proxy for civilisation here) being able to align crypto-technologies with financial regulations.
This is a very convoluted area with high investment risk, though. I would look into ZK-rollups, ZCASH, or XMR when considering such an investment.

Did you mis-edit? Anyway using that for mental visualisation might end up with structure \n__like \n____this \n______therefore…