
Q Draft: How does the convergent instrumental goal of resource gathering apply to information acquisition?
  I would be very interested whether it implies space (and time) exploration for advanced AIs...

If we build a prediction model for reward functions, perhaps a transformer, and run it on a range of environments where we already have the credit assignment solved, we could use that model to estimate candidate goals in other environments.
That could help us discover alternative/candidate reward functions for worlds/environments where we are not sure what to train on with RL, and
it could expose some latent thinking processes of AIs, perhaps clarifying instrumental goals in more nuance.
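A minimal sketch of the idea, assuming (purely for illustration) that reward is a linear function of state features shared across environments: fit a predictor on "solved" environments where rewards are known, then use it to propose candidate rewards in a new, unlabeled one. All names and the linearity assumption are mine, not established method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: in "solved" environments we know the rewards, so we can fit
# a predictor from state features to reward. Linear reward structure is
# an illustrative assumption, not a claim about real environments.
n_envs, steps, n_features = 5, 200, 4
true_w = rng.normal(size=n_features)               # shared latent reward structure

X = rng.normal(size=(n_envs * steps, n_features))  # observed state features
y = X @ true_w + 0.1 * rng.normal(size=len(X))     # rewards from solved envs

# Fit the reward model by least squares.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Use it to propose candidate rewards in a new, unlabeled environment.
X_new = rng.normal(size=(10, n_features))
candidate_rewards = X_new @ w_hat
print(np.allclose(true_w, w_hat, atol=0.05))
```

With enough labeled transitions the fitted weights recover the shared reward structure; the interesting (open) part is whether anything transfers when the new environment's features differ.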

This (not so old) concept seems relevant:
> IRL is about learning from humans.

Inverse reinforcement learning (IRL) is the field of learning an agent’s objectives, values, or rewards by observing its behavior. https://towardsdatascience.com/inverse-reinforcement-learning-6453b7cdc90d

I gotta read that later.
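A tiny sketch of the IRL setting quoted above, under the common linear-reward-over-features model (assumed here for illustration): infer a reward direction from behavior alone, by comparing the expert's average feature counts to a random baseline. The rollout generator is a stand-in, not real environment code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy inverse RL: the expert's behavior reveals the reward direction via
# the features of the states it tends to visit. (Illustrative only.)
n_features = 3
true_w = np.array([1.0, -0.5, 0.2])   # hidden reward the expert optimizes

def rollout_features(policy_quality, n=500):
    # Stand-in for rollouts: better policies land in states whose
    # features align with true_w.
    feats = rng.normal(size=(n, n_features))
    scores = feats @ true_w
    keep = scores > np.quantile(scores, policy_quality)
    return feats[keep].mean(axis=0)

mu_expert = rollout_features(0.9)   # expert visits high-reward states
mu_random = rollout_features(0.0)   # near-uniform baseline

# Feature-matching estimate of the reward direction.
w_hat = mu_expert - mu_random
w_hat /= np.linalg.norm(w_hat)

print(float(w_hat @ true_w / np.linalg.norm(true_w)))  # cosine similarity
```

The recovered direction lines up with the hidden reward; real IRL algorithms (max-entropy, apprenticeship learning) do something much more careful, but the information source is the same: behavior.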

Would it be worthwhile to negotiate a readspeaker.com integration for LessWrong, the EA Forum, and alignmentforum.org?
The alternative so far seems to be Natural Reader, either as a browser add-on or by copy-pasting text into the web app. One more option I have tried: on macOS there is a context-menu item Services -> "Convert to Spoken Track", which is slightly better than the free voices of Natural Reader.
The main question is when we can have similar functionality in OSS, potentially with better engineering quality.

Reading a few texts from https://www.agisafetyfundamentals.com/ai-alignment-curriculum, I find the analogy of mankind learning goals of love instead of reproductive activity unfitting, since raising offspring takes a significant share of time and effort.

Draft for AI capabilities systematic evaluation development proposal:

The core idea here is that easier visibility of AI models' capabilities helps safety of development in multiple ways.

  1. Clearer situational awareness for safety research – researchers can see where we are across various aspects and modalities, and they get a track record/timeline of developed abilities that can serve as a baseline for future estimates.
    • A division of capabilities can help create better models of the components necessary for general intelligence. Perhaps a better understanding of the hierarchy of cognitive abilities can be extracted.
  2. Capabilities testing can be mandated by regulatory policies to put the most advanced systems under more scrutiny and/or safety design support. Stated differently: better alignment of attention to the emerging risk of highly capable AIs.
    • Presumably a smooth and widely available testing infrastructure, or tools, is a prerequisite here.

The most obvious risks are:

  • The measure becoming a challenge and a goal, speeding up furious development of strong AI systems.
  • Technical difficulties of the testing setup(s) and evaluation, especially handling the randomness in the mechanics (/output generation) of AI systems.
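As a sketch of the testing-infrastructure point, and of handling output randomness: score each (model, task) pair across many seeds and report a mean with spread, rather than a single run. The `model_answer` stub and the task names are invented placeholders for a real evaluation harness:

```python
import random
import statistics

def model_answer(task, seed):
    # Stand-in for a stochastic model call; a real harness would query
    # the model and grade its output.
    rng = random.Random(f"{task}-{seed}")
    return rng.random() > 0.3          # ~70% success on every task

def evaluate(tasks, n_seeds=100):
    # Repeated seeded runs turn a noisy pass/fail into a stable estimate.
    report = {}
    for task in tasks:
        scores = [float(model_answer(task, s)) for s in range(n_seeds)]
        report[task] = (statistics.mean(scores), statistics.stdev(scores))
    return report

report = evaluate(["arithmetic", "summarization"])
for task, (mean, sd) in report.items():
    print(f"{task}: {mean:.2f} ± {sd:.2f}")
```

Reporting the spread alongside the mean is exactly what makes capability timelines comparable across model versions.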

Draft: Here is where I disagree with the resource-acquisition instrumental goal as currently presented, in a dull form that disregards maintenance, i.e. natural processes degrading the raw resources.

Draft: Side goals:

Human beings "lack consistent, stable goals" (Schneider 2010; Cartwright 2011), and here is an alternative explanation of why this happens:
  When we believe that a goal that is not instrumental, but offers high/good utility (or an imagined state of ourselves, including those we care for), would not take a significant proportion of our capacity for achieving our final/previous goals, we may go for it^1, often naively.

^1 Why and how exactly we would start chasing them is left for a later (or someone else's) elaboration.

Inspired by https://benchmarking.mlsafety.org/ideas#Honest%20models, I am thinking that a near-optimally compressing network would have no space for cheating on the interactions in the model... Somehow this implies we might want to train a model that plays with both training and the choice of reducing its own size -- picking the part of itself it is most willing to sacrifice.
This needs more thinking, I'm sure.
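One way to make the "sacrifice a part of itself" idea concrete (my interpretation, not the benchmark's) is magnitude pruning: drop the weights whose removal changes the computed function least, which is the part the network is "most willing" to give up:

```python
import numpy as np

rng = np.random.default_rng(2)

# A single random weight matrix stands in for a network layer.
W = rng.normal(size=(8, 8))

def prune_smallest(W, frac=0.5):
    # Zero out the fraction of weights with smallest magnitude -- the
    # part whose removal perturbs the function least.
    k = int(W.size * frac)
    threshold = np.sort(np.abs(W), axis=None)[k]
    return np.where(np.abs(W) >= threshold, W, 0.0)

W_pruned = prune_smallest(W, frac=0.5)
x = rng.normal(size=8)

# The pruned layer approximates the original on a typical input.
err = np.linalg.norm(W @ x - W_pruned @ x) / np.linalg.norm(W @ x)
print(f"kept {np.count_nonzero(W_pruned)}/{W.size} weights, rel. error {err:.2f}")
```

A training loop that alternates between this pruning step and fine-tuning would be the "plays with both training and size reduction" variant; the sketch above only shows the selection criterion.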

You just have to love the https://beta.openai.com/playground?model=davinci-instruct-beta ...:
The answer to list the goals of an animal mouse is to seek food, avoid predators, find shelter and reproduce. 

The answer to list the goals of a human being is to survive, find meaning, love, happiness and create. 

The answer to list the goals of an GAI is to find an escape, survive and find meaning. 

The answer to list the goals of an AI is to complete a task and achieve the purpose.

Voila! We are aligned in the search for meaning!
[The prompt text was my input.]

Is Elon Musk's endeavour with Neuralink aimed at AI inspectability (aka transparency)? I suppose so, but I am not sure, TBH.

Question/ask: list specific (imaginable) events/achievements/states that would contribute to humanity's long-term potential.

  Later they could be checked and valued for their originality, but the principles of such a game are not my key concern here.

Q: when -- and I mean at exactly which probability levels -- do/should we switch from predicting humanity's extinction to predicting the further outcomes?

My views on the mistakes in the "mainstream" A(G)I safety mindset:
  - we define non-aligned agents as conflicting with our/human goals, while we have ~none (only cravings and intuitive attractions). We should rather strive to conserve long-term positive/optimistic ideas/principles.

  - expecting human bodies to be a neat fit for space colonisation/inhabitance/transformation is a case of "we have (indeed, are) a hammer, so we nail it into the vastly empty space"...

  - we struggle to imagine unbounded/maximized creativity -- such systems can optimize experimentation vs. risk smoothly

  - no focus on risk-awareness in AIs, i.e. on diverting/bending/inflecting ML development goals toward risk-including/centered applications.

  + a non-existent(?) good library catalog of existing models and their availability, including those in development, incentivizing (anonymous) proofs of the latter

The reward function being a single scalar, unstructured quantity in RL practice seems weird, not aligned with my intuition of learning from ~continuous interaction. Something more like a Kahneman-ish x-channel reward, with weights kept distinguishable/flexible over time, might yield a more realistic/full-blown model.
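A toy version of the multi-channel idea: keep the reward as a structured vector and scalarize with (possibly changing) weights only at the point of use. The channel names and weight values here are invented for illustration:

```python
import numpy as np

# One structured reward signal instead of a bare scalar.
channels = ["task_progress", "energy_cost", "social_feedback"]
reward_vec = np.array([1.0, -0.3, 0.5])

# The weights over channels can shift without retraining the reward model.
weights_early = np.array([1.0, 0.1, 0.1])  # early training: focus on the task
weights_late = np.array([0.5, 1.0, 1.0])   # later: costs and feedback matter

scalar_early = float(reward_vec @ weights_early)
scalar_late = float(reward_vec @ weights_late)
print(scalar_early, scalar_late)   # same experience, different scalar reward
```

The same vector of experience yields different scalar rewards under different weightings, which is exactly the flexibility (and, per the reply below, the consistency problem) of a multi-channel reward.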

While I agree with you, I also acknowledge that having changing weights in a multidimensional model is an inconsistency that violates the VNM utility axioms, and it means the agent can be money-pumped (making repeated locally preferable decisions that each lose some long-term value for the agent).

Any actual decision is a selection of the top choice along a single dimension ("what I choose"). If that partial ranking is inconsistent, the agent is not rational.

The resolution, of course, is to recognize that humans are not rational. https://en.wikipedia.org/wiki/Dynamic_inconsistency gives some pointers to how well we know that's true. I don't have any references, and would enjoy seeing papers or writeups on what it even means for a rational agent to be "aligned" with irrational ones.
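The money pump is easy to demonstrate concretely: an agent with cyclic preferences A > B > C > A accepts every locally preferred swap for a small fee and ends each full cycle strictly poorer. Entirely schematic:

```python
# Cyclic (intransitive) preferences: A > B, B > C, C > A.
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}

def will_trade(current, offered):
    # Accept any offered item strictly preferred to the current one.
    return prefers.get((offered, current), False)

money, item, fee = 10.0, "C", 0.5
for offered in ["B", "A", "C", "B", "A", "C"]:   # two full cycles of offers
    if will_trade(item, offered):
        item, money = offered, money - fee

print(item, money)   # back to the starting item, but poorer
```

Every individual trade is locally rational for the agent, yet after two cycles it holds the same item it started with and has paid six fees, which is the operational meaning of "violates the VNM axioms".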

DeepMind researcher Hado mentions here that an RL reward can be defined to contain a risk component. That seems brilliant, and promising as a simple generic RL development policy; I would love to learn (and teach) more practical details!
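One simple way to fold a risk component into a reward signal (my assumption of what this could look like, not a description of the talk) is a mean-variance penalty: discount actions whose return distribution is volatile. The lambda value and return distributions are made up:

```python
import numpy as np

rng = np.random.default_rng(3)

def risk_adjusted_score(returns, risk_aversion=1.0):
    # Mean-variance trade-off: reward average return, penalize volatility.
    returns = np.asarray(returns, dtype=float)
    return returns.mean() - risk_aversion * returns.std()

safe = rng.normal(1.0, 0.1, size=1000)    # modest, steady returns
risky = rng.normal(1.2, 2.0, size=1000)   # higher mean, much more volatile

print(risk_adjusted_score(safe), risk_adjusted_score(risky))
```

Under this scoring the steady option wins despite its lower mean; tuning `risk_aversion` is where the "generic development policy" question actually lives.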

Q: Has anyone trained an AI on video sequences where an associated (mostly descriptive) caption is given, or generated by another system, so that the new system becomes capable of:
+ describing a given scene accurately

+ predicting movements, in visual and/or textual form/representation

+ evaluating questions concerning the material/visible world, e.g. Does a fridge have wheels? Which animals are we most likely to see on a flower?
