simon - LessWrong

A simple case for extreme inner misalignment

It feels to me like this post is treating AIs as functions from a first state of the universe to a second state of the universe. Which in a sense, anything is... but I think the tendency toward simplification happens internally, where they operate more as functions from (digital) inputs to (digital) outputs. If you view an AI as a function from a digital input to a digital output, I don't think goals targeting specific configurations of the universe are simple at all, and I don't think decomposability over space/time/possible worlds is a criterion that would lead to something simple.

D&D.Sci: Whom Shall You Call?

simon21d50

Thanks abstractapplic! Initial analysis:

Initial stuff that hasn't turned out to be very important:

My immediate thought was that there are likely to be different types of entities we are classifying, so my initial approach was to look at the distributions to try to find clumps.

All 5 characteristics (Corporeality, Sliminess, Intellect, Hostility, Grotesqueness) have bimodal distributions, with one peak around 15-30 (position varies) and the other peak around 65-85 (position varies). Overall, the shapes look very similar. The trough between the peaks is not very deep; there are plenty of intermediate values.

All of these characteristics are correlated with each other.

Looking at sizes of bins for pairs of characteristics, again there appear to be two humps - but this time in the 2d plot only. That is, there is a high/high hump and a low/low hump, but noticeably there does not appear to be, for example, a high-sliminess peak when restricting to low-corporeality data points.

Again, the shape varies a bit between characteristic pairs but overall looks very similar.

Adding all characteristics together gets a deeper trough between the peaks, though still no clean separation.

Overall, it looks to me like there are two types, one with high values of all characteristics, and another with low values of all characteristics, but I don't see any clear evidence for any other groupings so far.

Eyeballing the plots, it looks compatible with no relation between characteristics other than the high/low groupings. Have not checked this with actual math.

In order to get a cleaner separation between the high/low types, I used the following procedure to get a probability estimate for each data point being in the high/low type:

1. For each characteristic, sum up all the other characteristics (or rather, subtract that characteristic from the total).
2. For each characteristic, classify each data point as pretty clearly low (<100 total), pretty clearly high (>300 total), or unclear, based on the sum of all the other characteristics.
3. Obtain the frequency distribution of the characteristic's values for the points classified clearly low and clearly high in the steps above.
4. Smooth in an ad hoc manner.
5. Obtain an odds ratio from the ratio of the high and low distributions, with an ad hoc adjustment for distortions caused by the ad hoc smoothing.
6. Multiply the odds ratios obtained for each characteristic and obtain a probability from the resulting odds ratio.
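
For anyone wanting to reproduce something like the procedure above, here is a minimal sketch. It is not my actual spreadsheet: the bin width, the 0..99 value range, the add-one smoothing (standing in for the ad hoc smoothing) and the implicit 1:1 prior odds are all assumptions, and `make_synthetic` just generates made-up two-cluster data for trying it out.

```python
import random

BIN_WIDTH = 10  # assumption: bin width for the frequency distributions
N_BINS = 10     # assumption: characteristic values lie in 0..99


def bin_of(value):
    """Map a characteristic value to a histogram bin, clamped to the valid range."""
    return max(0, min(int(value // BIN_WIDTH), N_BINS - 1))


def fit_histograms(points, low_cut=100, high_cut=300):
    """Per-characteristic value distributions for clearly-low and clearly-high points.

    A point counts as clearly low/high for characteristic j based only on the
    sum of its OTHER characteristics, so the label never uses value j itself.
    """
    n_chars = len(points[0])
    # Start every bin at 1 (add-one smoothing; an assumption, not the original method).
    low = [[1.0] * N_BINS for _ in range(n_chars)]
    high = [[1.0] * N_BINS for _ in range(n_chars)]
    for p in points:
        total = sum(p)
        for j, value in enumerate(p):
            others = total - value
            if others < low_cut:
                low[j][bin_of(value)] += 1
            elif others > high_cut:
                high[j][bin_of(value)] += 1
    # Turn counts into frequencies so the two histograms are comparable.
    low = [[c / sum(row) for c in row] for row in low]
    high = [[c / sum(row) for c in row] for row in high]
    return low, high


def prob_high(point, low, high):
    """Multiply the per-characteristic odds ratios, then convert odds to a probability."""
    odds = 1.0  # assumption: 1:1 prior odds between the two types
    for j, value in enumerate(point):
        b = bin_of(value)
        odds *= high[j][b] / low[j][b]
    return odds / (1.0 + odds)


def make_synthetic(n_each=500, seed=0):
    """Made-up data: one cluster near 20 and one near 75 on all five characteristics."""
    rng = random.Random(seed)
    low_pts = [tuple(rng.gauss(20, 8) for _ in range(5)) for _ in range(n_each)]
    high_pts = [tuple(rng.gauss(75, 8) for _ in range(5)) for _ in range(n_each)]
    return low_pts + high_pts


if __name__ == "__main__":
    low, high = fit_histograms(make_synthetic())
    print(prob_high((70, 80, 65, 75, 85), low, high))
```

Multiplying the odds ratios treats the characteristics as independent within each type, which matches the eyeballed "no relation other than the high/low groupings" impression but has not been checked.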

I think this gives cleaner separation, but it's still not super great imo: most points are 99%+ likely to be in one type or the other, but 2057 (out of 34374) are still between 0.1 and 0.9 in my ad hoc estimator. Todo: look for some function to fit to the frequency distributions and redo with the function instead of the ad hoc approach.

Likely classifications of our mansion's ghosts:

low: A,B,D,E,G,H,I,J,M,N,O,Q,S,U,V,W

high: C,F,K,L,P,R,T

To actually solve the problem: I now proceeded to split the data based on exorcist group. Expecting high/low type to be relevant, I split the DD points by likely type (50% cutoff), and then tried some stuff for DD low including a linear regression. Did a couple graphs on the characteristics that seemed to matter (grotesqueness and hostility in this case) to confirm effects looked linear. So, then tried linear regression for DD high and got the same coefficients, within error bars. So then I thought, if it's the same linear coefficients in both cases, I probably could have gotten them from the combined data for DD, don't need to separate into high and low, and indeed linear regression on the combined DD data gave the same coefficients more or less.

Actually finding the answer:

So, then I did regression for the exorcist groups without splitting based on high/low type. (I did split after to check whether it mattered)

Results:

DD cost depends on Grotesqueness and to a lesser extent Hostility.

EE cost depends on all characteristics slightly, Sliminess then Intellect/Grotesqueness being the most important. Note: Grotesqueness less important, perhaps zero effect, for "high" type.

MM cost actually very slightly declines for higher values of all characteristics. (note: less effect for "high" type, possibly zero effect)

PP cost depends mainly on Sliminess. However, slight decline in cost with more Corporeality and increase with more of everything else.

SS cost depends primarily on Intellect. However, slight decline with Hostility and increase with Sliminess.

WW cost depends primarily on Hostility. However, everything else also has at least a slight effect, especially Sliminess and Grotesqueness.

Provisionally, I'm OK with just using the linear regression coefficients without the high/low split, though I will want to verify later if this was causing a problem (also need to verify linearity, only checked for DD low (and only for Grotesqueness and Hostility separately, not both together)).
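
As a rough illustration of the regression step (a sketch only; I used a spreadsheet, and the function names and toy numbers here are hypothetical), ordinary least squares with an intercept can be fitted per exorcist group, and each ghost then goes to whichever group's model gives the lowest estimate:

```python
def fit_linear(rows, costs):
    """Ordinary least squares with an intercept, via the normal equations.

    rows:  list of characteristic tuples, one per past exorcism
    costs: list of prices paid, same length as rows
    Returns [intercept, coef_1, ..., coef_k].
    Note: raises ZeroDivisionError if the characteristics are exactly collinear.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, costs))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coeffs, row):
    """Estimated cost for one ghost under one group's fitted coefficients."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


def cheapest_group(models, ghost):
    """models: {group name: coefficients}. Returns (group, estimated cost)."""
    estimates = {group: predict(c, ghost) for group, c in models.items()}
    best = min(estimates, key=estimates.get)
    return best, estimates[best]


if __name__ == "__main__":
    # Toy data with two characteristics, generated from cost = 100 + 2*a + 3*b.
    rows = [(1, 2), (2, 1), (3, 5), (4, 3), (0, 7)]
    costs = [100 + 2 * a + 3 * b for a, b in rows]
    print(fit_linear(rows, costs))
```

Fitting the same function separately on the likely-high and likely-low points and comparing coefficients is the check I did for DD; the sketch does not do that comparison for you.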

Results:

Ghost | group with lowest estimate | estimated cost for that group

A | Spectre Slayers | 1926.301885259

B | Wraith Wranglers | 1929.72034133793

C | Mundanifying Mystics | 2862.35739392631

D | Demon Destroyers | 1807.30638053037 (next lowest: Wraith Wranglers, 1951.91410462716)

E | Wraith Wranglers | 2154.47901124028

F | Mundanifying Mystics | 2842.62070661731

G | Demon Destroyers | 1352.86163670857 (next lowest: Phantom Pummelers, 1688.45809434935)

H | Phantom Pummelers | 1923.30132492753

I | Wraith Wranglers | 2125.87216703498

J | Demon Destroyers | 1915.0299245701 (Next lowest: Wraith Wranglers, 2162.49691339282)

K | Mundanifying Mystics | 2842.16499046146

L | Mundanifying Mystics | 2783.55221244497

M | Spectre Slayers | 1849.71986735069

N | Phantom Pummelers | 1784.8259008802

O | Wraith Wranglers | 2269.45361189797

P | Mundanifying Mystics | 2775.89249612121

Q | Wraith Wranglers | 1748.56167086623

R | Mundanifying Mystics | 2940.5652346428

S | Spectre Slayers | 1666.64380523907

T | Mundanifying Mystics | 2821.89307084084

U | Phantom Pummelers | 1792.3319145455

V | Demon Destroyers | 1472.45641559628 (Next lowest: Spectre Slayers, 1670.68911559919)

W | Demon Destroyers | 1833.86462523462 (Next lowest: Wraith Wranglers, 2229.1901870478)

So that's my provisional solution, and I will pay the extra 400sp one time fee so that Demon Destroyers can deal with ghosts D, G, J, V, W.

--Edit: whoops, missed most of this paragraph (other than the Demon Destroyers):

"Bad news! In addition to their (literally and figuratively) arcane rules about territory and prices, several of the exorcist groups have all-too-human arbitrary constraints: the Spectre Slayers and the Entity Eliminators hate each other to the point that hiring one will cause the other to refuse to work for you, the Poltergeist Pummelers are too busy to perform more than three exorcisms for you before the start of the social season, and the Demon Destroyers are from far enough away that – unless you eschew using them at all – they’ll charge a one-time 400sp fee just for showing up."

will edit to fix! post edit: Actually my initial result is still compatible with that paragraph, it doesn't involve the Entity Eliminators, and only uses the Phantom Pummelers 3 times. --
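
A quick sanity check of the constraints and of whether the Demon Destroyers' fee pays for itself, using only the estimates listed above (a sketch; the variable names are just for illustration):

```python
from collections import Counter

# Provisional picks copied from the results list above.
picks = {
    "A": "Spectre Slayers", "B": "Wraith Wranglers", "C": "Mundanifying Mystics",
    "D": "Demon Destroyers", "E": "Wraith Wranglers", "F": "Mundanifying Mystics",
    "G": "Demon Destroyers", "H": "Phantom Pummelers", "I": "Wraith Wranglers",
    "J": "Demon Destroyers", "K": "Mundanifying Mystics", "L": "Mundanifying Mystics",
    "M": "Spectre Slayers", "N": "Phantom Pummelers", "O": "Wraith Wranglers",
    "P": "Mundanifying Mystics", "Q": "Wraith Wranglers", "R": "Mundanifying Mystics",
    "S": "Spectre Slayers", "T": "Mundanifying Mystics", "U": "Phantom Pummelers",
    "V": "Demon Destroyers", "W": "Demon Destroyers",
}

# (Demon Destroyers estimate, next-lowest estimate) for the ghosts assigned to them.
dd_vs_next = {
    "D": (1807.30638053037, 1951.91410462716),
    "G": (1352.86163670857, 1688.45809434935),
    "J": (1915.0299245701, 2162.49691339282),
    "V": (1472.45641559628, 1670.68911559919),
    "W": (1833.86462523462, 2229.1901870478),
}
DD_FEE = 400  # one-time fee in sp

counts = Counter(picks.values())

# Spectre Slayers and Entity Eliminators must not both be hired.
uses_both_rivals = counts["Spectre Slayers"] > 0 and counts["Entity Eliminators"] > 0
# The Pummelers will do at most three exorcisms.
pummelers_ok = counts["Phantom Pummelers"] <= 3

# Saving from using the Demon Destroyers instead of each ghost's next-lowest group.
gross_saving = sum(nxt - dd for dd, nxt in dd_vs_next.values())
net_saving = gross_saving - DD_FEE

print(uses_both_rivals, pummelers_ok, round(gross_saving, 2), round(net_saving, 2))
```

Note that G's next-lowest group is the Pummelers, who are already at their cap of three, so the real alternative without the Demon Destroyers would cost at least this much; the net saving here should be read as a lower bound under these estimates.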

Not very confident in my solution (see things to verify above), and if it is indeed this simple it is an easier problem than I expected.

further edit (late July 15 2024): haven't gotten around to checking those things. Also, my check of linearity, where I did check, binned the data, so it could be hiding all sorts of patterns.

Getting 50% (SoTA) on ARC-AGI with GPT-4o

simon1mo30

Huh, I was missing something then, yes. And retrospectively should have thought of it -

it's literally just filling in the blanks for the light blue readout rectangle (which in a human-centric point of view, is arguably simpler to state than my more robotic perspective even if algorithmically more complex) and from that perspective the important thing is not some specific algorithm for grabbing the squares but just finding the pattern. I kind of feel like I failed a humanness test by not seeing that.

Getting 50% (SoTA) on ARC-AGI with GPT-4o

simon1mo20

Missed this comment chain before making my comment. My complaint is the most natural extrapolation here (as I assess it, unless I'm missing something) would go out of bounds. So either you have ambiguity about how to deal with the out of bounds, or you have a (in my view) less natural extrapolation.

E.g. "shift towards/away from the center" is less natural than "shift to the right/left", what would you do if it were already in the center for example?

Getting 50% (SoTA) on ARC-AGI with GPT-4o

simon1mo20

~~Problem 2 seems badly formulated because~~

~~The simplest rule explaining the 3 example input-output pairs would make the output corresponding to the test input depend on squares out of bounds of the test input.~~

To fix this, you could have a rule where the reflection axis is shifted from the center by one in the direction of the light blue "readout" rectangle (instead of fixed at one to the right of center), or where the reflection axis is centered and there is a 2-square shift in a direction depending on which side of center the readout rectangle is on (instead of in a fixed direction), but that seems strictly more complicated.

Alternatively, you could have some rule about wraparound, or e.g. using white squares if out of bounds, but what rule to use for out of bounds squares isn't determined from the example input-output pairs given.

Edit: whoops, see Fabien Roger's comment and my reply.

D&D.Sci II: The Sorceror's Personal Shopper

simon1mo20

It seems I missed this at the time, but since LessWrong's sorting algorithm has now changed to bring it up the list for me, might as well try it:

X-Y chart of mana vs thaumometer looked interesting, splitting it into separate charts for each colour returned useful results for blue:

blue gives 2 diagonal lines, one for tools/weapons, one for jewelry - for tools/weapons it's pretty accurate, ±1, but optimistic by 21 or 23 for jewelry

and... that's basically it, the thaumometer seems relatively useless for the other colours.

But:

green gives an even number of mana that looks uniformish in the range of 2-40

yellow always gives mana in the range of 18-21

red gives mana that can be really high, up to 96, but is not uniform, median 18

easy strategy:

pendant of hope (blue, 77 thaumometer reading -> 54 or 56 mana expected), 34 gp

hammer of capability (blue, 35 thaumometer reading -> 34 or 36 mana expected), 35 gp

Plough of Plenty (yellow, 18-21 mana expected), 35 gp

Warhammer of Justice +1 (yellow, 18-21 mana expected), 41 gp

For a total of at least 124 mana at a cost of 145 gp, leaving 55 gp.
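
The arithmetic behind that total, as a small sketch. The 200 gp budget is inferred from 145 gp spent plus 55 gp left rather than stated above, and `blue_mana_range` is just a hypothetical helper encoding the two blue patterns I described:

```python
BUDGET_GP = 200  # inferred from 145 gp spent + 55 gp remaining; an assumption


def blue_mana_range(reading, is_jewelry):
    """Expected (min, max) mana for a blue item given its thaumometer reading."""
    if is_jewelry:
        # Thaumometer is optimistic by 21 or 23 for jewelry.
        return (reading - 23, reading - 21)
    # Tools/weapons: accurate to within 1 either way.
    return (reading - 1, reading + 1)


YELLOW_RANGE = (18, 21)  # yellow items always gave 18-21 mana

# (item, (min mana, max mana), price in gp)
plan = [
    ("Pendant of Hope", blue_mana_range(77, is_jewelry=True), 34),
    ("Hammer of Capability", blue_mana_range(35, is_jewelry=False), 35),
    ("Plough of Plenty", YELLOW_RANGE, 35),
    ("Warhammer of Justice +1", YELLOW_RANGE, 41),
]

min_mana = sum(mana[0] for _name, mana, _price in plan)
total_cost = sum(price for _name, _mana, price in plan)
leftover = BUDGET_GP - total_cost

print(min_mana, total_cost, leftover)
```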

Now, if I was doing this at the time, I would likely investigate further to check if, say, high red or green values can be predicted.

But, I admit I have some meta knowledge here - it was stated in discussion of difficulty of a recent problem, if I recall correctly, that this was one of the easier ones. So, I'm guessing there isn't a hidden decipherable pattern to predict mana values for the reds and greens.

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset

simon1mo63

You don't need to justify - hail fellow D&Dsci player, I appreciate your competition and detailed writeup of your results, and I hope to see you in the next d&dsci!

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset

simon1mo30

I liked the bonus objective myself, but maybe I'm biased about that...

As someone who is also not a "data scientist" (but just plays one on LessWrong), I also don't know exactly what actual "data science" is, but I guess it's likely intended to mean using more advanced techniques?

(And if I can pull the same Truth from the void with less powerful tools, should that not mark me as more powerous in the Art? :P)

Perhaps, but don't make a virtue of not using the more powerful tools, the objective is to find the truth, not to find it with handicaps...

Speaking of which, one thing that could help make things easier is aggregating data, eliminating information you think is irrelevant. For example, in this case, I assumed early on (without actually checking) that timing would likely be irrelevant, so I aggregated data for ingredient combinations. As in, each tried ingredient combination gets only one row, with the numbers of different outcomes listed. You can do this by assigning a unique identifier to each ingredient combination (in this case you can just concatenate over the ingredient list), then counting the results for the different unique identifiers. COUNTIFS has poor performance for large data sets, but you can sort using the identifiers, then make a column that adds up the number of rows (or the number of rows with a particular outcome) since the last change in the identifier, and then filter for the last row before each change in the identifier (be wary of off-by-one errors). Then copy the result (values only) to a new sheet.
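
Outside a spreadsheet, the same aggregation is only a few lines. This is a sketch, not what I actually did; the example rows are made up, and sorting the ingredient names before concatenating is my assumption so that the same combination listed in a different order gets the same identifier:

```python
from collections import Counter, defaultdict


def combo_id(ingredients):
    """Unique identifier for an ingredient combination: sorted names, concatenated."""
    return "|".join(sorted(ingredients))


def aggregate(rows):
    """rows: iterable of (ingredients, outcome). Returns {combo id: Counter of outcomes}."""
    table = defaultdict(Counter)
    for ingredients, outcome in rows:
        table[combo_id(ingredients)][outcome] += 1
    return table


if __name__ == "__main__":
    # Hypothetical rows, purely for illustration.
    rows = [
        (["Ground Bone", "Crushed Onyx", "Beech Bark"], "Inert Glop"),
        (["Crushed Onyx", "Beech Bark", "Ground Bone"], "Inert Glop"),
        (["Crushed Onyx", "Beech Bark", "Ground Bone"], "Acidic Slurry"),
        (["Oaken Twigs", "Beech Bark", "Ground Bone"], "Acidic Slurry"),
    ]
    for key, outcomes in aggregate(rows).items():
        print(key, dict(outcomes))
```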

This also reduces the number of rows, though not enormously in this case.

Of course, in this case, it turns out that timing was relevant, not for outcomes but only for the ingredient selection (so I would have had to reconsider this assumption to figure out the ingredient selection).

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset

simon1mo52

I thought the flavour text was just right - I got it from the data, not the flavour text, and saw the flavour text as confirmation, as you intended.

I was really quite surprised by how many players analyzed the data well enough to say "Barkskin potion requires Crushed Onyx and Ground Bone, Necromantic Power Potion requires Beech Bark and Oaken Twigs" and then went on to say "this sounds reasonable, I have no further questions." (Maybe the onyx-necromancy connection is more D&D lore than most players knew? But I thought that the bone-necromancy and bark-barkskin connections would be obvious even without that).

Illusion of transparency I think, hints are harder than anyone making them thinks.

When I looked at the ingredients for a "barkskin potion", as far as I knew at this point the ingredients were arbitrary, so in fact I don't recall finding it suspicious at all. Then later I remember looking at the ingredients for a "necromantic power potion" and thinking something like... "uh... maybe wood stuff is used for wands or something to do necromancy?". It was only when I explicitly made a list of the ingredients for each potion type, rather than looking at each potion individually, and could see that everything else made sense, that I realized the twist.

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues

simon2mo63

Post-solution extra details:

Quantitative hypothesis for how the result is calculated:

"Magical charge": number of ingredients that are in the specific list in the parent comment. I'm copying the "magically charged" terminology from Lorxus.

"Eligible" for a potion: Having the specific pair of ingredients for the potion listed in the grandparent comment, or at the top of Lorxus' comment.

Get Inert Glop or Magical Explosion with probability depending on the magical charge.
1. 0-1 -> 100% chance of Inert Glop
2. 2 -> 50% chance of Inert Glop
3. 3 -> neither, skip to next step
4. 4 -> 50% chance of Magical Explosion
5. 5+ -> 100% chance of Magical Explosion
If didn't get either of those, get Mutagenic Ooze at 1/2 chance if eligible for two potions or 2/3 chance if eligible for 3 potions. (presumably would be (n-1)/n chance when eligible for n potions, for higher n).
If didn't get that either, randomly get one of the potions the ingredients are eligible for, if any.
If not eligible for any potions, get Acidic Slurry.
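
The hypothesis above, written out as a sampler. This is a sketch of my guess at the rules, not aphyer's actual generator; the ooze chance is coded as (n-1)/n for n eligible potions, which matches the observed 1/2 for two and 2/3 for three and is only a presumption beyond that:

```python
import random


def brew(magical_charge, eligible_potions, rng=random):
    """Sample one outcome under the hypothesised rules.

    magical_charge:   number of "magically charged" ingredients used
    eligible_potions: list of potion names whose ingredient pair is present
    rng:              the random module or a random.Random instance
    """
    # Step 1: too little charge gives Inert Glop, too much gives Magical Explosion.
    if magical_charge <= 1:
        return "Inert Glop"
    if magical_charge == 2 and rng.random() < 0.5:
        return "Inert Glop"
    if magical_charge >= 5:
        return "Magical Explosion"
    if magical_charge == 4 and rng.random() < 0.5:
        return "Magical Explosion"

    # Step 2: being eligible for several potions at once risks Mutagenic Ooze.
    n = len(eligible_potions)
    if n >= 2 and rng.random() < (n - 1) / n:
        return "Mutagenic Ooze"

    # Step 3: otherwise a random eligible potion, or Acidic Slurry if there is none.
    if n == 0:
        return "Acidic Slurry"
    return rng.choice(list(eligible_potions))


if __name__ == "__main__":
    rng = random.Random(0)
    print([brew(3, ["Barkskin Potion", "Necromantic Power Potion"], rng) for _ in range(5)])
```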

todo (will fill in below when I get results): figure out what's up with ingredient selection.

edit after aphyer already posted the solution:

I didn't write up what I had found before aphyer posted the result, but I did notice the following:

- hard 3-8 range in total ingredients
- pairs of ingredients within selections being biased towards pairs that make potions
- ingredient selections with 3 magical ingredients being much more common than ones with 2 or 4, and those in turn more common than ones with 0-1 or 5+
  - and this is robust when restricting to particular ingredients regardless of whether they are magical or not, though obviously with some bias as to how common 2 and 4 are
- the order of commonness of ingredients, holding actual magicalness constant, is relatively similar when restricted to 2 and 4 magic ingredient selections, though obviously whether an ingredient is actually magical is a big influence here
- I checked the distributions of total times a selection was chosen for different possible selections of ingredients, specifically for each combination of total number of nonmagical ingredients and 0, 1 or 2 magical ingredients
  - I didn't get around to 3 or more magical ingredients, because I noticed that while for 0 and 1 magical ingredients the distributions looked Poisson-like (i.e. as would be expected if it were random, though in fact it wasn't entirely random), it definitely wasn't Poisson for the 2 magical ingredient case, and I got sidetracked by trying to decompose it into a Poisson distribution + extra distribution (and eventually by other "real life" stuff)
    - I did notice that this looked possibly like a randomish "explore" distribution, which presumably worked the same as for the 0 and 1 magical ingredient cases, along with a non-random or subset-restricted "exploit" distribution, though I didn't really verify this
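
One cheap way to put a number on "Poisson-like" is the variance-to-mean ratio of the per-selection counts, which is about 1 for Poisson data and noticeably above 1 when an extra "exploit" component piles repeats onto a few selections. A minimal sketch (this is not the check I actually ran, which was eyeballing the distributions):

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Variance-to-mean ratio of a list of counts (about 1 for Poisson-distributed counts)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; the ratio is undefined")
    return pvariance(counts) / m


if __name__ == "__main__":
    # Made-up counts: a few selections chosen far more often than the rest.
    print(dispersion_index([0, 1, 1, 2, 0, 1, 9, 12]))
```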
