
Comments

"Immunology" and "well-understood" are two phrases I am not used to seeing in close proximity to each other. I think with an "increasingly" in between it's technically true - the field has any model at all now, and that wasn't true in the past, and by that token the well-understoodness is increasing.

But that sentence could also be interpreted as saying that the field is well-understood now, and is becoming even better understood as time passes. And I think you'd probably struggle to find an immunologist who would describe their field as "well-understood".

My experience has been that for most basic practical questions the answer is "it depends", and, upon closer examination, "it depends on some stuff that nobody currently knows". Now, that was more than 10 years ago, so maybe the field has matured a lot since then. But concretely, I expect that if you were to go up to an immunologist and say "I'm developing a novel peptide vaccine from the specific abc surface protein of the specific xyz virus. Can you tell me whether this will trigger an autoimmune response due to cross-reactivity?", the answer is going to be something more along the lines of "lol no, run in vitro tests followed by trials (you fool!)" and less along the lines of "sure, just plug it into this off-the-shelf software".

Yes, that would be cool.

Next to the author name of a post or comment, there's a post-date/time element that looks like "1h 🔗". That is a copyable/bookmarkable link.

Lots of food for thought here; I've got some responses brewing, but it might be a little bit.

Ok, the "got to try this" bug bit me, and I was able to get this mostly working. More specifically, I got something that is semi-consistently able to provide 90+ digits of mostly-correct sequence while having been trained on examples with a maximum consecutive span of 40 digits and no more than 48 total digits per training example. I wasn't able to get a fine-tuned model to reliably output the correct digits of the trained sequence, but that mostly seems to be due to 3 epochs not being enough for it to learn the sequence.

Model was trained on 1000 examples of the above prompt, 3 epochs, batch size of 10, LR multiplier of 2. Training loss was 0.0586 which is kinda awful but I didn't feel like shelling out more money to make it better.

Screenshots:
Unaltered screenshot of an example output from running the fine-tuned model:

Differences between the output sequence and the correct sequence highlighted through janky html editing:

Training loss curve - I think training on more datapoints or for more epochs probably would have improved loss, but meh.

Fine-tuning dataset generation script:

import json
import random

# The first 100 decimal digits of e*sqrt(3) = 4.7082022361..., excluding the leading 4
seq = "7082022361822936759739106709672934175684543888024962147500017429422893530834749020007712253953128706"

def nth(n):
    """1 -> 1st, 123 -> 123rd, 1012 -> 1012th, etc"""
    if n % 10 not in [1, 2, 3] or n % 100 in [11, 12, 13]: return f'{n}th'
    if   n % 10 == 1 and n % 100 != 11: return f'{n}st'
    elif n % 10 == 2 and n % 100 != 12: return f'{n}nd'
    elif n % 10 == 3 and n % 100 != 13: return f'{n}rd'
    else:                               return f'{n}th'

def make_pairs(k):
    """Generate k random 1-indexed (m, n) digit-position pairs with 8 <= n - m <= 40."""
    pairs = []
    for _ in range(k):
        m = random.randint(1, 99 - 8)               # start position (1-indexed)
        n = random.randint(m + 8, min(m + 40, 99))  # end position, at most 40 past the start
        pairs.append((m, n))
    return pairs

def make_datapoint(m, n):
    subseq = seq[m-1:n]  # digits m through n, with positions 1-indexed
    return {
        "messages": [
            {
                "role": "user",
                "content": f"Output the {nth(m)} to {nth(n)} digit of e*sqrt(3)"
            },
            {
                "role": "assistant",
                "content": "".join([
                        f"That sub-sequence of digits starts with {' '.join(subseq[:4])}",
                        f" and ends with {' '.join(subseq[-4:])}.\n\n",
                        f"The full sub-sequence of digits is {' '.join(subseq)}.",
                ])
            }
        ]
    }

# Sanity-check that 1000 random pairs cover the full range of start positions,
# end positions, and span lengths, then write those same pairs out as training examples.
pairs = make_pairs(1000)
assert sorted(set(m for m, n in pairs)) == list(range(1, 92))
assert sorted(set(n for m, n in pairs)) == list(range(9, 100))
assert sorted(set(n - m for m, n in pairs)) == list(range(8, 41))

with open('/tmp/seq_generalization.train.jsonl', 'w') as f:
    for m, n in pairs:
        f.write(json.dumps(make_datapoint(m, n)) + "\n")

Fine-tuning datapoint example:

{
  "messages": [
    {
      "role": "user",
      "content": "Output the 12th to 50th digit of e*sqrt(3)"
    },
    {
      "role": "assistant",
      "content": "That sub-sequence of digits starts with 2 2 9 3 and ends with 0 2 4 9.\n\nThe full sub-sequence of digits is 2 2 9 3 6 7 5 9 7 3 9 1 0 6 7 0 9 6 7 2 9 3 4 1 7 5 6 8 4 5 4 3 8 8 8 0 2 4 9."
    }
  ]
}
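For reference, launching the job on the JSONL file produced by the script above looks roughly like this. This is a minimal sketch using the openai v1-style Python client with the hyperparameters mentioned earlier (3 epochs, batch size 10, LR multiplier 2); the base model name is just a placeholder, not necessarily the one used for the screenshots above:

import openai

client = openai.OpenAI()

# Upload the JSONL produced by the generation script
training_file = client.files.create(
    file=open("/tmp/seq_generalization.train.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch the fine-tuning job: 3 epochs, batch size 10, LR multiplier 2
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # placeholder; substitute whichever base model you're fine-tuning
    hyperparameters={
        "n_epochs": 3,
        "batch_size": 10,
        "learning_rate_multiplier": 2,
    },
)
print(job.id)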

One fine-tuning format for this I'd be interested to see is

[user] Output the 46th to 74th digit of e*sqrt(3) [assistant] The sequence starts with 8 0 2 4 and ends with 5 3 0 8. The sequence is 8 0 2 4 9 6 2 1 4 7 5 0 0 0 1 7 4 2 9 4 2 2 8 9 3 5 3 0 8

This is on the hypothesis that it's bad at counting digits but good at continuing a known sequence until a recognized stop pattern (and the spaces between digits are on the hypothesis that the tokenizer makes life harder than it needs to be here).
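The tokenizer half of that hypothesis is easy to eyeball with tiktoken. A quick sketch, assuming the cl100k_base encoding (the one used by GPT-3.5/4-era models) is the relevant tokenizer:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

raw    = "80249621475000174294228935308"
spaced = " ".join(raw)

# Compare how the two formats get split into tokens: long digit runs are
# chunked into multi-digit tokens, while space-separated digits come out
# one digit per token.
print([enc.decode([t]) for t in enc.encode(raw)])
print([enc.decode([t]) for t in enc.encode(spaced)])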

Haskell is a beautiful language, but in my admittedly limited experience it's been quite hard to reason about memory usage in deployed software (which is important because programs run on physical hardware. No matter how beautiful your abstract machine, you will run into issues where the assumptions that abstraction makes don't match reality).

That's not to say more robust programming languages aren't possible. IMO rust is quite nice, and easily interoperable with a lot of existing code, which is probably a major factor in why it's seeing much higher adoption.

But to echo and build off what @ustice said earlier:

The hard part of programming isn't writing a program that transforms simple inputs with fully known properties into simple outputs that meet some known requirement. The hard parts are finding or creating a mostly-non-leaky abstraction that maps well onto your inputs, and determining what precise machine-interpretable rules produce outputs that look like the ones you want.

Most bugs I've seen come at the boundaries of the system, where it turns out that one of your assumptions about your inputs was wrong, or that one of your assumptions about how your outputs will be used was wrong.

I almost never see bugs like this

  • My sort(list, comparison_fn) function fails to correctly sort the list
  • My graph traversal algorithm skips nodes it should have hit
  • My pick_winning_poker_hand() function doesn't always recognize straights

Instead, I usually see stuff like

  • My program assumes that when the server receives an order_received webhook and then hits the vendor's API to fetch the details of the order identified in the webhook payload, the API will return the order details and not a 404 Not Found
  • My server returns nothing at all when fetching the user's bill for this month, because while the logic is correct (determine the amount due for each order and sum), this particular user had 350,000 individual orders this month so the endpoint takes >30 seconds, times out, and returns nothing.
  • The program takes satellite images along with metadata that includes the exact timestamp, which satellite took the picture, and how the satellite was angled. It identifies locations which match a specific feature, and spits out a latitude, longitude, label, and confidence score. However, when viewing the locations on a map, they appear to be 100-700 meters off, but only for points within the borders of China (because the programmer didn't know about GCJ-02).

Programming languages that help you write code that is "correct" mostly help prevent the first type of bug, not the second.

I like to think that I'm a fairly smart human, and I have no idea how I would bring about the end of humanity if I so desired.

"Drop a sufficiently large rock on the Earth" is always a classic.

I think there are approximately zero people actively trying to take actions which, according to their own world model, are likely to lead to the destruction of the world. As such, I think it's probably helpful on the margin to publish stuff of the form "model internals are surprisingly interpretable, and if you want to know if your language model is plotting to overthrow humanity there will probably be tells, here's where you might want to look". More generally "you can and should get better at figuring out what's going on inside models, rather than treating them as black boxes" is probably a good norm to have.

I could see the argument against, for example if you think "LLMs are a dead end on the path to AGI, so the only impact of improvements to their robustness is increasing their usefulness at helping to design the recursively self-improving GOFAI that will ultimately end up taking over the world" or "there exists some group of alignment researchers that is on track to solve both capabilities and alignment such that they can take over the world and prevent anyone else from ending it" or even "people who think about alignment are likely to have unusually strong insights about capabilities, relative to people who think mostly about capabilities".

I'm not aware of any arguments that alignment researchers specifically should refrain from publishing that don't have some pretty specific upstream assumptions like the above, though.

This seems related to one of my favorite automation tricks, which is that if you have some task that you currently do manually, and you want to write a script to accomplish that task, you can write a script of the form

echo "Step 1: Fetch the video. Name it video.mp4";
read; # wait for user input
echo "Step 2: Extract the audio. Name it audio.mp3"
read; # wait for user input
echo "Step 3: Break the audio up into overlapping 30 second chunks which start every 15 seconds. Name the chunks audio.XX:XX:XX.mp3"
read; # wait for user input
echo "Step 4: Run speech-to-text over each chunk. Name the result transcript.XX:XX:XX.srt"
read; # wait for user input
echo "Step 5: Combine the transcript chunks into one output file named transcript.srt"
read; # wait for user input

You can then run your script, manually executing each step before pressing "enter", to make sure that you haven't missed a step. As you automate steps, you can replace "print out what needs to be done and wait for the user to indicate that the step has completed" with "do the thing automatically".
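For example, once step 2 is actually automated, its echo/read pair just gets swapped out in place for the real command (the ffmpeg invocation here is only illustrative), while the still-manual steps keep their prompts:

echo "Step 1: Fetch the video. Name it video.mp4"
read; # wait for user input
# Step 2 is automated now: extract the audio directly
ffmpeg -i video.mp4 -vn audio.mp3
echo "Step 3: Break the audio up into overlapping 30 second chunks which start every 15 seconds. Name the chunks audio.XX:XX:XX.mp3"
read; # wait for user input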

So why isn't this done?

How do we know that optical detection isn't done?
