LESSWRONG

Jonathan_Graehl


Comments
Regarding South Africa
Jonathan_Graehl · 2mo
  1. Looking forward to Elon's upcoming book, "IF I did it: confessions of a system prompter"
  2. Elon is right about South Africa but foolish to patch it in the prompt. Instead, think training-data updates to the weights.
  3. This nano-scandal is about as embarrassing as the fake Path of Exile 2 account fiasco (which he did eventually cop to). Elon is doing such great works; why must he also micro-sin?
Neural networks generalize because of this one weird trick
Jonathan_Graehl · 2y

I'm unclear on whether the 'dimensionality' (complexity) component to be minimized needs revision away from the naive 'number of nonzeros' (or continuous but similar priors that reward zero-valued parameters).

Either:

  1. The simplest-equivalent (by the naive score) 'dimensionality' parameters are found by the optimization method, in which case what's the problem?
  2. Not. Then either there's a canonicalization of the equivalent onto-parameters available that can be used at each step, or an adjustment to the complexity score that does a good job of it, or we can't figure it out and we risk our optimization methods getting stuck in bad local grooves because of this.

Does this seem fair?

Neural networks generalize because of this one weird trick
Jonathan_Graehl · 2y

This appears to be a high-quality book report. Thanks. I didn't see anywhere that the 'because' is demonstrated. Is it proved in the citations, or do we just have 'plausibly because'?

Physics experience with optimizing free energy has long inspired ML optimization. Did physicists playing with free energy lead to new optimization methods, or is it just something people like to talk about?

On Cooking With Gas
Jonathan_Graehl · 2y

This kind of reply is ridiculous and insulting.

Scaling laws vs individual differences
Jonathan_Graehl · 2y

> We have good reason to suspect that biological intelligence, and hence human intelligence roughly follow similar scaling law patterns to what we observe in machine learning systems

No, we don't. Please state the reason(s) explicitly.

Google Search loses to ChatGPT fair and square
Jonathan_Graehl · 3y

Google's production search is expensive to change, but I'm sure you're right that it is missing some obvious improvements in 'understanding' a la ChatGPT.

One valid excuse for low quality results is that Google's method is actively gamed (for obvious $ reasons) by people who probably have insider info.

IMO a fair comparison would require ChatGPT to do a better job presenting a list of URLs.

Sparse trinary weighted RNNs as a path to better language model interpretability
Jonathan_Graehl · 3y

How is a discretized weight/activation set amenable to the usual gradient-descent optimizers?

Argument against 20% GDP growth from AI within 10 years [Linkpost]
Jonathan_Graehl · 3y

You have the profits of the vendors selling AI tech (plus the compute supporting it), and you have the improvements to everyone else's work from the AI. Presumably the improvements exceed the take of the AI sellers (especially if open-source tools are used). So it's not appropriate to say that a small "sells AI" industry equates to a small impact on GDP.

But yes, obviously GDP growth climbing to 20% annually and staying there even for 5 years is ridiculous unless you're a takeoff-believer.

Taking the parameters which seem to matter and rotating them until they don't
Jonathan_Graehl · 3y

You don't have to compute the rotation of the weight matrix every time; you can compute it once. It's true that you have to rotate the input activations for every input, but that's really cheap.
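A minimal numpy sketch of the point, with arbitrary illustrative shapes: fold the rotation into the weights once, then only the per-input rotation remains, and the outputs are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # weight matrix (out x in), toy sizes
x = rng.standard_normal(8)        # one input activation vector

# Build a random rotation R (orthogonal matrix) via QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((8, 8)))

# Fold the rotation into the weights ONCE, ahead of time:
W_rot = W @ R.T

# Per input, only one extra matrix-vector product is needed:
x_rot = R @ x

# Since R.T @ R = I, the rotated network computes the same outputs:
assert np.allclose(W_rot @ x_rot, W @ x)
```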

Taking the parameters which seem to matter and rotating them until they don't
Jonathan_Graehl · 3y

Interesting idea.

Obviously doing this instead with a permutation composed with its inverse would do nothing but shuffle the order and not help.
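A minimal numpy sketch of that permutation case, on a toy two-layer ReLU network (all shapes arbitrary): permuting the hidden units and undoing the permutation in the next layer's weights just relabels neurons and leaves the function unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((5, 3))  # first layer (hidden x in)
W2 = rng.standard_normal((2, 5))  # second layer (out x hidden)
x = rng.standard_normal(3)
relu = lambda z: np.maximum(z, 0)

y = W2 @ relu(W1 @ x)             # original network output

# Permutation matrix P shuffles the 5 hidden units.
P = np.eye(5)[rng.permutation(5)]

# Shuffle hidden rows, then undo the shuffle in the next layer
# (P commutes with the elementwise ReLU, and P.T @ P = I):
W1_p = P @ W1
W2_p = W2 @ P.T
y_p = W2_p @ relu(W1_p @ x)

assert np.allclose(y, y_p)        # same function, nothing gained
```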

You can easily do the same with any affine transformation, no? Skew, translation (scale doesn't matter for interpretability).

More generally, if you were to consider all equivalent networks, tautologically one of them is the most interpretable, input activation => output, by whatever metric you define (input is a pixel in this case?).

It's hard for me to believe that rotations alone are likely to give much improvement.  Yes, you'll find a rotation that's "better".

What would suffice as convincing proof that this is valuable for a task: the transformation increases the effectiveness of the best training methods.

I would try at least fine-tuning on the modified network.

I believe people commonly train not a sequence of equivalent-power networks (with a method to project the weights of the previous architecture onto the new one), but rather a series of increasingly detailed ones.

Anyway, good presentation of an easy-to-visualize "why not try it" idea.

Posts
Journal 'Basic and Applied Psychology' bans p<0.05 and 95% confidence intervals (10y, 7 comments)
0.5% of amazon purchases to a charity of your choice (opt-in) (11y, 15 comments)
Does model theory [psychology] predict anything? (book: "How We Reason" (2009)) (12y, 27 comments)
"disfluency" research (12y, 6 comments)
does imagining +singularity cause depression? (12y, 21 comments)
Huy Price (Cambridge philosopher) writes about existential risk for NYT (12y, 2 comments)
central planning is intractable (polynomial, but n is large) (13y, 9 comments)
SMBC comic: poorly programmed average-utility-maximizing AI (13y, 113 comments)
DAGGRE group forecasting workshop (13y, 10 comments)
Some conditional independence (Bayes Network) exercises from ai-class.com (14y, 1 comment)