Agreed with your example, and I think that just means that the L2 norm is not a pure implementation of what we mean by "simple", in that it also induces some other preferences. In other words, it does other work too. Nevertheless, it would point us in the right direction frequently, e.g. it will dislike networks whose parameters perform large offsetting operations, akin to mental frameworks or beliefs that require unnecessary and reducible artifice or intermediate steps.

Worth keeping in mind that "simple" is not clearly defined in the general case (forget about machine learning). I'm sure lots has been written about this idea, including here.

Regularization implements Occam's Razor for machine learning systems.

When we have multiple hypotheses consistent with the same data (an underdetermined problem), Occam's Razor says that the "simplest" one is more likely to be true.

When an overparameterized LLM traverses the subspace of parameters that fit the training set while seeking, say, the smallest L2 norm, it is also effectively choosing the "simplest" solution from the solution set, where "simple" is defined as lower parameter norm, i.e. more "concisely" expressed.
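As a toy illustration of "picking the lowest-norm member of the solution set" (a numpy sketch of my own, not anything specific to how LLMs are actually trained): an underdetermined linear fit has infinitely many exact solutions, and the pseudoinverse returns the minimum-L2-norm one, which is exactly the solution that L2 regularization nudges you toward.

```python
import numpy as np

# Underdetermined "training set": 2 data points, 5 parameters,
# so infinitely many parameter vectors fit the data exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))
y = rng.normal(size=2)

# Minimum-L2-norm exact solution (the one L2 regularization favors).
w_min_norm = np.linalg.pinv(X) @ y

# Another exact solution: shift along a direction in the null space of X.
null_dir = np.linalg.svd(X)[2][-1]  # X @ null_dir is (numerically) zero
w_other = w_min_norm + 3.0 * null_dir

print(np.allclose(X @ w_min_norm, y), np.linalg.norm(w_min_norm))
print(np.allclose(X @ w_other, y), np.linalg.norm(w_other))
# Both fit the data perfectly; the regularized ("simpler") pick is the first.
```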

As of early 2024, I think it's worth noting that deep-learning-based generative models (presently, LLMs) have the property of generating many plausible hypotheses, not all of which are true. In a sense, they are creative and inaccurate.

An increasingly popular automated problem-solving paradigm seems to be bolting a slow & precise-but-uncreative verifier onto a fast & creative-but-imprecise (deep learning based) idea fountain, a la AlphaGeometry and FunSearch.

Today, in a paper published in Nature, we introduce FunSearch, a method to search for new solutions in mathematics and computer science. FunSearch works by pairing a pre-trained LLM, whose goal is to provide creative solutions in the form of computer code, with an automated “evaluator”, which guards against hallucinations and incorrect ideas. By iterating back-and-forth between these two components, initial solutions “evolve” into new knowledge. The system searches for “functions” written in computer code; hence the name FunSearch.
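For concreteness, here is a schematic sketch of that idea-fountain-plus-verifier loop. The functions llm_propose and evaluate are hypothetical stand-ins invented for illustration, not FunSearch's actual interfaces.

```python
def llm_propose(prompt: str, n: int = 8) -> list[str]:
    """Fast, creative-but-imprecise step: ask the LLM for n candidate programs."""
    raise NotImplementedError  # call whatever generative model you like

def evaluate(candidate: str) -> float | None:
    """Slow, precise-but-uncreative step: score a candidate, or None if invalid."""
    raise NotImplementedError  # e.g. run the code against an automated checker

def search(prompt: str, rounds: int = 10) -> tuple[str, float] | None:
    best = None
    for _ in range(rounds):
        for cand in llm_propose(prompt):
            score = evaluate(cand)  # the verifier filters out hallucinations
            if score is not None and (best is None or score > best[1]):
                best = (cand, score)
        if best is not None:
            # Feed the best candidate back in so solutions can "evolve".
            prompt = f"{prompt}\n\nImprove on this solution:\n{best[0]}"
    return best
```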

Perhaps we're getting close to making the valuable box you hypothesize.

Answer by Decaeneus, Feb 21, 2024

Upon reflection, the only way this would work is if verification were easier than deception, so to speak. It's not obvious that this is the case. Among humans, for instance, it seems very difficult for a more intelligent person to tell, in the general case, whether a less intelligent person is lying or telling the truth, unless the verifier is equipped with more resources and can collect evidence and so on, which is very difficult to do for some topics, such as the internal state of the person being verified. So, in the case of humans, deception generally seems easier than verification.

So perhaps the daisy-chain only travels down the intelligence scale, not up.

To be sure, let's say we're talking about something like "the entirety of published material" rather than the subset of it that comes from academia. This is meant to very much include the open source community.

Very curious: in what way are most CS experiments not replicable? From what I've seen in deep learning, for instance, it's standard practice to include a working GitHub repo along with the paper (I'm sure you know lots more about this than I do). This is not the case in economics, for instance, just to pick a field I'm familiar with.

I wonder how much of the tremendously rapid progress of computer science in the last decade owes itself to structurally more rapid truth-finding, enabled by:

  • the virtual nature of the majority of the experiments, making them easily replicable
  • the proliferation of services like GitHub, making it very easy to replicate others' experiments
  • (a combination of the points above) the expectation that one would make one's experiments easily available for replication by others

There are other reasons to expect rapid progress in CS (compared to, say, electrical engineering) but I wonder how much is explained by this replication dynamic.

It feels like (at least in the West) the majority of our ideation about the future is negative, e.g.

  • popular video games like Fallout
  • zombie-apocalypse-themed TV
  • shows like Black Mirror (there's no equivalent White Mirror)

Are we at a historically negative point in the balance of "good vs bad ideation about the future" or is this type of collective pessimistic ideation normal?

If the balance towards pessimism is typical, is the promise of salvation in the afterlife in e.g. Christianity a rare example of a powerful and salient positive ideation about our futures (conditioned on some behavior)?

From personal observation, kids learn text (say, from a children's book, and from songs) back-to-front. That is, the adult will say all but the last word in the sentence, and the kid will (eventually) learn to chime in to complete the sentence.

This feels correlated to LLMs learning well when tasked with next-token prediction, and those predictions being stronger (less uniform over the vocabulary) when the preceding sequences get longer.

I wonder if there's a connection to having rhyme "live" in the last sound of each line, as opposed to the first.
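As a rough way to see the "predictions sharpen as the prefix grows" effect, here is a small sketch using GPT-2 via Hugging Face transformers (the model choice and the nursery-rhyme prompt are just assumptions of mine); it prints the entropy of the next-token distribution for longer and longer prefixes.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

sentence = "Twinkle twinkle little star, how I wonder what you are"
ids = tok(sentence, return_tensors="pt").input_ids[0]

for k in range(1, len(ids)):
    with torch.no_grad():
        logits = model(ids[:k].unsqueeze(0)).logits[0, -1]
    # Lower entropy = a sharper (less uniform) next-token prediction.
    entropy = torch.distributions.Categorical(logits=logits).entropy().item()
    print(f"prefix length {k:2d}: next-token entropy {entropy:.2f} nats")
```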

Kind of related Quanta article from a few days ago: https://www.quantamagazine.org/what-your-brain-is-doing-when-youre-not-doing-anything-20240205/

For what it's worth (perhaps nothing) in private experiments I've seen that in certain toy (transformer) models, task B performance gets wiped out almost immediately when you stop training on it, in situations where the two tasks are related in some way.

I haven't looked at how deep the erasure is, and whether it is far easier to revive than it was to train it in the first place.
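For anyone who wants to poke at the same question, here is a minimal sketch of the kind of sequential-training probe I have in mind, on a toy MLP with two related synthetic tasks (a generic setup invented for illustration, not the private transformer experiments above): train on both tasks jointly, then train on task A alone and watch task B accuracy.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_batch(task: int, n: int = 256):
    """Two related tasks on the same inputs, distinguished by a task flag."""
    x = torch.randn(n, 16)
    flag = torch.full((n, 1), float(task))
    y = (x[:, 0] + x[:, 1] > 0).long() if task == 0 else (x[:, 0] - x[:, 1] > 0).long()
    return torch.cat([x, flag], dim=1), y

model = nn.Sequential(nn.Linear(17, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def acc(task: int) -> float:
    xb, yb = make_batch(task, 4096)
    with torch.no_grad():
        return (model(xb).argmax(-1) == yb).float().mean().item()

# Phase 1: train on both tasks, alternating batches.
for step in range(2000):
    xb, yb = make_batch(step % 2)
    loss = loss_fn(model(xb), yb)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"after joint training: task A acc {acc(0):.3f}, task B acc {acc(1):.3f}")

# Phase 2: keep training on task A only; track how quickly task B decays.
for step in range(500):
    xb, yb = make_batch(0)
    loss = loss_fn(model(xb), yb)
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 100 == 0:
        print(f"A-only step {step:3d}: task B acc {acc(1):.3f}")
```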
