abstractapplic's comments, sorted by newest
Experiments With Sonnet 4.5's Fiction

I suspect things are both better and worse than they appear, here.

Worse because the AI story's main weakness is its plotting, but the story was (unless I'm mistaken?) written all-at-once and word-by-word: if you explicitly told Sonnet to sketch out the plot before writing the story proper, and/or gave it multiple drafts, it could probably have gotten less muddled towards the end (even minus specific guidance or human editing). Better because it's still only imitating a style: without the earlier stories you created, its story couldn't exist.

Why I Transitioned: A Case Study

I feel like this would benefit from a content warning of some kind.

abstractapplic's Shortform

If model predictions are consistently too low/high:

- If using gradient descent, it might be a bad starting point.
  - To diagnose: Try running it with the number of training rounds and/or the learning rate set to (or near) zero, and see if it predicts an unsuitable value for everything.
  - To fix: Set the starting point to the average outcome in the training set (see the sketch after this list).
- If using gradient descent, it might be numerical instability.
  - To diagnose: Watch how individual and aggregate predictions change from round to round. If they flicker back and forth (with unornamented gradient descent) or swing back and forth like a pendulum (with momentum), it's instability.
  - To fix: Lower the learning rate; possibly increase the number of rounds to compensate.
- It might be distribution shift.
  - To diagnose: See if the problem is present in the training set too, or just the test set; if the latter, you're looking at distribution shift.
  - To fix: Extend trends in the existing data outwards into the future; apply correction factors or other adjustments; alternately, give up and get more applicable data.
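A minimal sketch of the starting-point diagnosis and fix, assuming plain gradient descent on a linear model with squared error; the data, learning rate, and round counts are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = 50.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=1000)

def fit(X, y, bias_init=0.0, lr=0.01, rounds=500):
    """Plain gradient descent on squared error for a linear model."""
    w, b = np.zeros(X.shape[1]), bias_init
    for _ in range(rounds):
        resid = X @ w + b - y               # prediction error this round
        w -= lr * (X.T @ resid) / len(y)
        b -= lr * resid.mean()
    return w, b

# Diagnose: with zero rounds, the model just predicts its starting point,
# which here is nowhere near the average outcome (~50).
w0, b0 = fit(X, y, rounds=0)
print("untrained prediction:", b0, "| mean outcome:", y.mean().round(2))

# Symptom: with a bad starting point and limited training, every prediction
# comes out consistently too low.
w1, b1 = fit(X, y, rounds=50)
print("under-trained mean prediction:", (X @ w1 + b1).mean().round(2))

# Fix: start the bias at the training-set average outcome.
w2, b2 = fit(X, y, bias_init=y.mean(), rounds=50)
print("fixed mean prediction:", (X @ w2 + b2).mean().round(2))
```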

If highs are too high and lows are too low:

- This is the default.
  - To diagnose: Confirm whether you are using an MLE-based modelling algorithm (you are), are using a finite amount of data (you are), are modelling a process which isn't perfectly predictable from its explanatory variables (you are), and exist in reality (you do). If these things are true, then yes, this will happen, and the only question is how hard it's happening and whether/to what extent you want to correct for it.
  - To fix: Apply a penalty term (see the sketch after this list).
- It might also be a change in parameters.
  - To diagnose: Check whether the explanatory variables behave very differently in the training and test sets.
  - To fix: Try not making mistakes, and/or fix them when you do make them.
- It is almost guaranteed to happen due to distribution shift.
  - To diagnose: Test out of context (i.e. NOT random-split) and see how far apart the lines on your AvE (Actual-vs-Expected) graphs get.
  - To fix: Apply a larger penalty term, and/or post-hoc adjustments, until the lines align again. Or just get more relevant data.
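A minimal sketch of "apply a penalty term", using ridge (L2) regularization from scikit-learn as the penalty; the dataset, the many-features/few-rows setup, and alpha=100 are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                  # many features, few rows
y = X[:, 0] + rng.normal(scale=2.0, size=200)   # only one feature matters

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("no penalty", LinearRegression()),
                    ("ridge penalty", Ridge(alpha=100.0))]:
    preds = model.fit(X_tr, y_tr).predict(X_te)
    # The unpenalized fit spreads its predictions wider than it can justify
    # (highs too high, lows too low out of sample); the penalty shrinks them
    # back toward the mean, and held-out error improves.
    print(f"{name}: prediction std {preds.std():.2f},",
          f"test MSE {np.mean((preds - y_te) ** 2):.2f}")
```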

If highs are too low and lows are too high:

- It could, conceivably, be undertraining and undercomplication.
  - To diagnose: See if more training and model complexity improve performance on a true outsample.
  - To fix: Use more training and model complexity.
- It is much more likely to be you evaluating on the training set, or doing something isomorphic to that.
  - To diagnose: Check whether you're evaluating on the training set, or doing something isomorphic to that (see the sketch after this list).
  - To fix: Don't evaluate on the training set, or do anything isomorphic to that.
- It is almost certainly NOT happening due to distribution shift.
  - To diagnose: I . . . guess you'd look at the difference between training and deployment? And see whether the inevitable apparent underfit on the training set is actually greater in the test set?
  - To fix: If you see this happening to a meaningful extent, your modelling context is cursed. Don't try anything clever; just get a more relevant dataset – or a less cursed project – and don't look back.
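A minimal sketch of the "are you evaluating on the training set?" check: score the same model on the rows it was fit to and on rows it never saw. The model and data here are invented for illustration; the point is that numbers computed on the training rows flatter the model, and a large gap is the tell.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# Scores on the rows the model was fit to are not representative of anything;
# a large train/test gap suggests your evaluation pipeline is leaking them.
print("R^2 on training rows:", round(model.score(X_tr, y_tr), 2))
print("R^2 on held-out rows:", round(model.score(X_te, y_te), 2))
```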

If highs and lows are both making the same error, and middle predictions are making the opposite error:

- It could be an inappropriate linkage (link function).
  - To diagnose: Try other linkages and see if they work better. In particular, if you're using additive(/unity) linkage to predict a price, and you see that both your highs and your lows are too low, try multiplicative(/log) linkage (see the sketch after this list).
  - To fix: If other linkages work better, use them.
- It could be a bound, or some other effect of the output on the output.
  - To diagnose: If the best linkage still isn't good enough, but the problem is still present in the training set, it's probably this.
  - To fix: Just increase model complexity until the problem goes away. Model complexity is a complete and appropriate solution to this problem. You don't have to do anything else. [Intended affect: hostage reading ransom note on camera.]
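A minimal sketch of the linkage comparison, assuming the target is a price generated multiplicatively. "Linkage" is handled here by simply modelling log(price) instead of price; the data-generating process is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
price = np.exp(5.0 + 0.8 * X[:, 0] + 0.4 * X[:, 1]
               + rng.normal(scale=0.1, size=2000))

additive = LinearRegression().fit(X, price).predict(X)
multiplicative = np.exp(LinearRegression().fit(X, np.log(price)).predict(X))

# Compare average bias among the cheapest and priciest items: the additive
# fit misses low at both ends of a multiplicative target, while the
# log-linkage fit's errors are far smaller.
order = np.argsort(price)
cheap, pricey = order[:200], order[-200:]
for name, preds in [("additive", additive), ("multiplicative", multiplicative)]:
    print(name,
          "| low-end bias:", round(float((preds[cheap] - price[cheap]).mean()), 1),
          "| high-end bias:", round(float((preds[pricey] - price[pricey]).mean()), 1))
```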

If the bias zigzags (alternates sign as you move from low predictions to high ones):

- It could be multiple bounds, or multiple other effects of the output on the output.
  - To diagnose: If you're sure it's not noise and it consistently looks like this, I have no other interpretation.
  - To fix: Again, just raise model complexity (see the sketch after this list). [Intended affect: hostage rereading part of the ransom note because the kidnappers say they didn't enunciate right the first time.]
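A minimal sketch of the bound/zigzag case and the "just raise model complexity" fix, using tree depth as the complexity knob. The hard-clipped toy target and the bin edges are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=5000)
y = np.clip(2.0 * x, -3.0, 3.0)               # output bounded on both sides
X = x.reshape(-1, 1)

bins = [("far low", x < -2), ("low-mid", (x > -1.5) & (x < 0)),
        ("high-mid", (x > 0) & (x < 1.5)), ("far high", x > 2)]

for name, model in [("low complexity", LinearRegression()),
                    ("high complexity", DecisionTreeRegressor(max_depth=8))]:
    preds = model.fit(X, y).predict(X)
    # The too-simple model's bias alternates sign as you move across the
    # range (the zigzag); the complex model's bias is ~zero everywhere.
    biases = {label: round(float((preds[mask] - y[mask]).mean()), 2)
              for label, mask in bins}
    print(name, biases)
```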

On Fleshling Safety: A Debate by Klurl and Trapaucius.

Done.

On Fleshling Safety: A Debate by Klurl and Trapaucius.

"Oh, Klurl, don't be ridiculous!" cried Trapaucius.  "Our own labor is a rare exception to the rule that most people's tasks are easy!  That is why not just anyone can become a Constructor!"

"I wonder if perhaps most other people would say the same about their own jobs, somehow," said Klurl thoughtfully.

I for one would say that the work I do is actually pretty easy, and the only reason I'm paid as well as I am for it is most other people's inexplicable inability to do objectively[1] easy work and inexplicable capacity for doing objectively[1] much harder things instead. No idea how many other people feel the same way.

[1] Objectivity not guaranteed.

On Fleshling Safety: A Debate by Klurl and Trapaucius.

I give myself a small amount of credit for sensibly-incorrectly predicting

". . . and then the weirdly-un-optimized AIs got eaten by the not-weirdly-un-optimized AI humanity constructed."

Results of "Experiment on Bernoulli processes"

Thanks for running this. It didn't work out like you hoped, but you get kudos for trying (there are way too few practical tests/challenges on LW imo) and for having your game break the 'right' way (a cheese-able challenge still helps people develop their cheese-ing skills, and doesn't take up too much of anyone's time; my least favorite D&D.Scis are ones where my screwups led to players wasting meaningful amounts of effort on scenarios where the central concept didn't work).

If you make something like this again, and want someone to playtest it before release, please let me know.

Karl Krueger's Shortform

5 is obviously the 'best' answer, but it's also a pretty big imposition on you, especially for something this speculative. 6 is a valid and blameless - if not actively praiseworthy - default. 2 is good if you have a friend like that, are reasonably confident they'd memoryhole it if it's dangerous, and expect them to be able to help (though fwiw I'd wager you'd get less helpful input this way than you'd expect: no one person knows everything about the field, so you can't guarantee they'd know if/how it's been done, and inferential gaps are always larger than you expect, so explaining it right might be surprisingly difficult/impossible).

 

I think the best algorithm would be along the lines of:

5 iff you feel like being nice and find yourself with enough spare time and energy

 . . . and if you don't . . .

7, where the 'something else' is posting the exact thing you just posted and seeing if any trustworthy AI scientists DM you about it

 . . . and if they don't . . .

6

 

I'm curious to see what other people say.

Penny's Hands

A beautiful and haunting story. Not entirely sure what it's doing on LessWrong, but I'm glad it's here, because I'm here and I'm glad I read it.
