Zachary Robertson


In 2016, Yann LeCun proposed the cake analogy for AI:

If intelligence is a cake, the bulk of the cake is unsupervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning (RL).

Many (important) people were aware of the relative importance of unsupervised / self-supervised training before GPT. I don't think there has been a lack of attention paid to self-supervised models. I think you may be conflating this with the difference in attention before and after significant breakthroughs were made in training such models.

"Foundation models" has hundreds of references. Are there any in particular that you think are relevant?

I think you may still be missing my point. Part of good scholarship is putting in the work to find relevant and related work. It seems at least mildly irrational to:

  1. Be aware of the foundation models/speculations papers
  2. Not have read or understood these papers beyond a superficial level
  3. Propose new terminology/ontology related to this previous work

Said another way: from your post summary, I first googled "foundational ai" and was immediately pointed to the Wikipedia article on foundation models. All the references I cited can be sourced from there. This took about five minutes, which makes me suspect that there are fundamental issues with the scholarship in this article.

TL;DR: Self-supervised learning may create AGI or its foundation. What would that look like?

I find the lack of scholarship in the article concerning. The post seems to carry the idea that the space of large self-supervised models hasn't received enough attention. This isn't true. The term "foundation model" was introduced as a way to talk about large self-supervised machine learning models that are useful for a variety of downstream tasks with little to no fine-tuning. The risks and opportunities of foundation models have been discussed at length, with extensive citations to other related work. There is speculation going as far back as 1965 concerning machines trained in a self-supervised manner on vast amounts of data.

Could you be concrete about which papers you consider newer, and maybe also link to the original deep Q-learning paper you have in mind? (This might help someone answer the question.)

It's common for people to be worried about recommender systems being addictive or promoting filter bubbles etc, but as far as I can tell, they don't have very good arguments for these worries.

What is your standard of evidence? Being worried about a possibility does not require that the possibility is an actuality. Say the question is something like "Are recommender systems addictive or not?" Do you care more about type-I errors (labeling them addictive when they are not) or type-II errors (labeling them not addictive when they are)? Without knowing anything more, I'd think it's reasonable to care more about type-II errors:

  1. It seems clear that recommender systems could be used to make a UI more addictive, and there are companies that survive by selling addictive products.
  2. So there is a decision to be made: we can either say there is a problem when there isn't one, or say there is no problem when there is one.
  3. There is a much higher cost (to society) if we say there is no problem when there is one than if we say there is a problem when there isn't.
  4. Thus, the optimal decision, from a societal perspective, is to bias toward caution and express worry even if your true belief is "who knows whether there's a major problem with recommender systems or not".

To change my belief, you'd need to either show that the cost to society is higher under type-I errors or present significant evidence against addictiveness, but you don't seem to do either here.
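To make the asymmetry explicit, here is a toy expected-cost calculation; the prior, the costs, and the 10x ratio are made up purely for illustration:

```python
# Toy expected-cost comparison of the two blanket policies, assuming
# (purely for illustration) that a missed problem (type-II) costs society
# 10x more than a false alarm (type-I).
p_problem = 0.5            # "who knows" prior: a real problem exists with probability 0.5
cost_false_alarm = 1.0     # type-I: we worry, but there was no problem
cost_missed_problem = 10.0 # type-II: we dismiss the worry, but the problem was real

expected_cost_if_we_worry = (1 - p_problem) * cost_false_alarm
expected_cost_if_we_dismiss = p_problem * cost_missed_problem

print(expected_cost_if_we_worry)   # 0.5
print(expected_cost_if_we_dismiss) # 5.0 -> under these assumptions, caution is the better default
```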

It may be both. It's not clear to me whether the module can accept an arbitrary number of predicates/atoms for a goal. If so, I suppose that's close enough to 'text'.

I'm not quite sure if this is true in a useful sense. The authors make it clear that the expected return is not enough to evaluate agent performance according to their criteria.

These desiderata cannot be encapsulated by a single number describing an agent’s performance, as they do not define a total order (Balduzzi et al., 2019). We move away from characterising agents purely by expected return and instead consider the distribution of returns over a countable task space.

Given this, is it not reasonable to suggest that this is evidence that we need to move away from expected return as a framework for creating generally capable agents?
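As a rough illustration of why the distinction matters (this is not the paper's actual metric, and the returns are invented): comparing agents by mean return versus by a low percentile of their normalized returns across tasks can give opposite verdicts about general capability.

```python
import numpy as np

# Hypothetical normalized returns for two agents on the same 5 tasks.
# Agent A is a narrow specialist; agent B is broadly competent.
returns_a = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
returns_b = np.array([0.6, 0.6, 0.6, 0.6, 0.6])

print(returns_a.mean(), returns_b.mean())  # 0.6 vs 0.6 -- the mean can't separate them
print(np.percentile(returns_a, 20),        # 0.0
      np.percentile(returns_b, 20))        # 0.6 -- B looks far more generally capable
```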

I'm particularly glad to see the agents incorporating text descriptions of their goals in the agents' inputs. It's a step forward in training agents that flexibly follow human instructions.

I don't think they use text. According to Figure 38, all goals are provided in a one-hot fashion.

Atomic predicates are provided in a 5-hot encoded fashion, since all the relations used take two arguments, each of which can be decomposed to a colour and shape. For player object we simply have a special colour "me" and "opponent".
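Purely to illustrate what a 5-hot atom encoding like that could look like; the vocabularies, names, and ordering here are my own invention, not taken from the paper:

```python
import numpy as np

# Hypothetical vocabularies; the real ones are defined by the environment.
RELATIONS = ["near", "on", "see", "hold"]
COLOURS = ["black", "purple", "yellow", "me", "opponent"]  # "me"/"opponent" as special colours
SHAPES = ["cube", "sphere", "pyramid", "slab", "player"]

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def encode_atom(relation, colour1, shape1, colour2, shape2):
    """Concatenate five one-hot vectors: relation + (colour, shape) for each of the two arguments."""
    return np.concatenate([
        one_hot(RELATIONS.index(relation), len(RELATIONS)),
        one_hot(COLOURS.index(colour1), len(COLOURS)),
        one_hot(SHAPES.index(shape1), len(SHAPES)),
        one_hot(COLOURS.index(colour2), len(COLOURS)),
        one_hot(SHAPES.index(shape2), len(SHAPES)),
    ])

# e.g. "I hold the purple sphere", expressed relative to the player:
atom = encode_atom("hold", "me", "player", "purple", "sphere")
print(atom.shape)  # (24,) -- exactly five 1s, one per slot, no text anywhere
```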

Do you have a source that says otherwise?

Whenever I want to 'optimize' something, I stop and do the following 'calculation':

  1. How long does it take to do the optimization? (including this calculation)
  2. What is the effect size?
  3. Subtract (1) from (2)

I find this helps curb over-analysis, procrastination, and masturbatory optimization. Technical explanation here. There are also many relevant XKCD comics.
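Here's a toy version of the calculation, in the spirit of XKCD's "Is It Worth the Time?" table; all numbers are invented for illustration:

```python
# Toy "is this optimization worth it?" calculation. All numbers are invented.
time_to_optimize_hours = 2.0      # step 1: cost, including the time spent on this calculation
time_saved_per_use_hours = 0.05   # step 2: effect size per use
expected_uses = 30                # how many times I'll actually benefit

net_benefit = time_saved_per_use_hours * expected_uses - time_to_optimize_hours
print(net_benefit)  # -0.5 hours -> negative, so skip the optimization
```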

I'm sure this has a name, but I can't remember it, so I've given it a new one: the Mountaineer's Fallacy.

The Einstellung effect seems relevant. It refers to a person's predisposition to solve a given problem in a specific manner even though better or more appropriate methods exist. In particular, you can characterize the effect as working from the wrong hypothesis; specifically, the wrong working hypothesis about which approach will succeed.

Question: What's a reasonable approach to get to the moon? Fallacy: We can climb things. Therefore a good start is to climb the tallest thing. Thus, working on finding or building a tall thing will eventually take us to the moon. Accordingly, a good feasibility test would be to climb Mount Everest.

I don't understand your point in this exchange.

Play or exercise.

I explicitly said I was going to be pedantic. It seems like a useful, even necessary, role to play if you, a domain expert, were confused and then switched your viewpoint. This is usually where being formal becomes useful. First, it uncovers potentially subtle hidden assumptions. Second, it may offer a general result. Third, it protects the reader (me) from 'catching' your confusion by constraining communication to things that can be independently verified.

Having said that,

You used the word 'model' in both of your prior comments, and so the search-replace yields "state-abstraction-irrelevant abstractions." Presumably not what you meant?

This does not come off as friendly. I asked you to search for 'model-irrelevant', which is distinct from 'model'; it's just a type of state abstraction.
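For reference, "model-irrelevant" has a standard definition (roughly following Li, Walsh, and Littman, 2006): a state abstraction $\phi$ is model-irrelevant when aggregated states agree on rewards and on transition probabilities into each abstract state, i.e.

$$\phi(s_1) = \phi(s_2) \;\Longrightarrow\; R(s_1, a) = R(s_2, a) \;\text{ and }\; \sum_{s' \in \phi^{-1}(x)} P(s' \mid s_1, a) = \sum_{s' \in \phi^{-1}(x)} P(s' \mid s_2, a) \quad \forall a, x.$$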

That's not a "concrete difference."

I claim there is an additional alternative. Two does not equal three. Just because you don't understand something doesn't mean it's not concrete.

I suppose those comments are part of the natural breakdown of civility at the end of an internet exchange, and I'm probably no better myself. Anyway, I certainly hope you figure out your confusion, although I see it's a stretch to think my commentary is going to help you :)
