gmaxwell — LessWrong

OpenTimestamps was created by Petertodd.

This is not a coincidence because nothing is ever a coincidence.

SolidGoldMagikarp (plus, prompt generation)

I think I addressed that specifically in my comment above. The behavior is explained by a sequence like this: a large amount of bot-spammed harassment material goes into early GPT development; someone then removes it, either from reddit or just from the training data, not because it mentions the targets but because of other characteristics (like being repetitive). The tokens are then orphaned.

Many of the other strings in the list of triggers look like they may have been UI elements or other markup removed by improved data sanitation.

I know that reddit has removed a very significant number of comments referencing me, since they're gone when I look them up. I hope you would agree that it's odd that the only two obviously human names in the list are people who know each other and have collaborated in the past.

SolidGoldMagikarp II: technical details and more recent findings

gmaxwell3y10

I left a comment in the prior thread giving a wild ass guess on how I and petertodd became GPT3 basilisks.

https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation?commentId=JodWY7RvM9ZYdejtt

SolidGoldMagikarp (plus, prompt generation)

gmaxwell3y*120

Hello. I'm apparently one of the GPT3 basilisks. Quite odd to me that two of the only three (?) recognizable human names in that list are myself and Peter Todd, who is a friend of mine.

If I had to take a WAG at the behavior described here: both Petertodd and I have been the target of a considerable amount of harassment/defamation/schizo comments on reddit due to commercially funded attacks connected to our past work on Bitcoin. It may be possible that comments targeting us were included in an early phase of GPTn design (e.g. in the tokenizer) but someone noticed an in-development model spontaneously defaming us and then expressly filtered out material mentioning us from the training. Without any input our tokens would be free to fall to the center of the embedding, where they're vulnerable to numerical instabilities (leading, e.g., to instability at temp 0).

AFAIK I've never complained about GPTx's output concerning me (and I doubt petertodd has either), but if the model was spontaneously emitting crap about us at some point of development I could see it getting filtered. It might not have involved targeting us; it could have just been a product of improving the filtering (including filtering by Reddit if the data was collected at multiple times; I believe much of the worst content has been removed by reddit) so that the most common sources of farm-generated attack content were no longer in the training.

It's worth noting that GPT3 is perfectly able to talk about me if you ask about "Greg Maxwell" and knows who I am, so I doubt any changes were about me specifically but more likely about specific bad content.
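The "orphaned token" guess above can be illustrated with a toy sketch. It assumes one simple mechanism, which is not confirmed anywhere in this thread: a token that exists in the vocabulary but never appears in the training data receives no data gradient, so the only force on its embedding is weight decay, which shrinks it toward the origin (standing in here for the "center of the embedding"). The update rule, learning rate, decay value, and 2-D vectors are all made up for illustration and say nothing about how GPT models are actually trained.

```python
# Toy illustration only: two 2-D "embedding" rows trained with plain SGD
# plus L2 weight decay. The "seen" token gets a data gradient pulling it
# toward a target; the "orphan" token never appears in the data, so the
# only force acting on it is weight decay. All numbers are hypothetical.

def sgd_step(vec, grad, lr=0.1, weight_decay=0.05):
    """One SGD step with L2 weight decay applied to every coordinate."""
    return [v - lr * (g + weight_decay * v) for v, g in zip(vec, grad)]

def norm(vec):
    """Euclidean length of a vector."""
    return sum(v * v for v in vec) ** 0.5

target = [3.0, -2.0]   # hypothetical point the data pulls "seen" toward
seen = [1.0, 1.0]
orphan = [1.0, 1.0]    # same starting point, but absent from the data

for _ in range(2000):
    # Gradient of 0.5 * ||seen - target||^2 with respect to `seen`.
    seen_grad = [s - t for s, t in zip(seen, target)]
    seen = sgd_step(seen, seen_grad)
    # The orphan token contributes nothing to the loss: zero data gradient.
    orphan = sgd_step(orphan, [0.0, 0.0])

print("seen  :", [round(v, 3) for v in seen], "norm", round(norm(seen), 3))
print("orphan:", [round(v, 6) for v in orphan], "norm", round(norm(orphan), 6))
```

In this toy the trained row settles near its target while the untouched row decays geometrically toward zero. Whether anything similar happened to the real tokens is exactly the part that remains a guess.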

AALWA: Ask any LessWronger anything

gmaxwell10y30

The concerns in this space go beyond personal safety, though that isn't an insignificant one. For safety, it doesn't matter what one can prove, because almost by definition anyone who is going to be dangerous is not behaving in an informed and rational way; consider the crazy person who was threatening Gwern. It's also not possible to actually prove you do not own a large number of Bitcoins: the coins themselves are pseudonymous, and many people cannot imagine that a person would willingly part with a large amount of money (or decline to take it in the first place).

No one knows which, if any, Bitcoins are owned by the system's creator. There is a lot of speculation which is known to me to be bogus; e.g. identifying my coins as having belonged to the creator. So even if someone were to provably dispose of all their holdings, there will be people alleging other coins.

The bigger issue is that the Bitcoin system gains much of its unique value by being defined by software, by mechanical rule and not trust. In a sense, Bitcoin matters because its creator doesn't. This is a hard concept for most people, and there is a constant demand by the public to identify "the person in charge". To stand out risks being appointed Bitcoin's central banker for life, and in doing so undermining much of what Bitcoin has accomplished.

Being a "thought leader" also produces significant demands on your time which can inhibit making meaningful accomplishments.

Finally, it would be an act which couldn't be reversed.

[LINK] Transcendence (2014) -- A movie about "technological singularity"

gmaxwell12y20

Now that the movie is out, how would you rate your prediction in hindsight?

Botworld: a cellular automaton for studying self-modifying agents embedded in their environment

gmaxwell12y90

Every ten years or so someone must reinvent Tierra (http://en.wikipedia.org/wiki/Tierra_%28computer_simulation%29).

Harry Potter and the Methods of Rationality discussion thread, part 23, chapter 94

gmaxwell12y00

Well— we're deep in the meta philosophy of a fictional world, so I'm not sure that any great insight will come from the discussion.

I'm unsure of how to resolve the apparent safety of time turners with the idea that there is an optimization process selecting a permissible outcome unless I wave my arms and say that the optimization process is moral, perhaps borrowing the objectives of the operator (like the sorting hat). One way to do this is to note that bad things happening increase the probability of more time turner usage, which a human-interest-blind metric could still be minimizing.

Seems very handwavy, though: saying the optimizer picked tie-breaking that, say, minimized the sum probability change displaced in time would just tend to select time turners out of existence.

As far as information itself, I'm not so sure if it's quite that sticky: Imagine our universe as we normally would think of it but with quantized time (ticks). We would normally imagine each tick having a state, and then there is some (large but) finite number of successor states possible, each with its own probability, which is simply the product of the probabilities of all the component transitions for all the particles. The universe evaluates this function a step at a time, moving to a particular new state with probabilities proportional to the product of the component particle transition probabilities according to natural law.

In the HPMOR-verse, the evaluation instead gets performed by some hyper-computer that evaluates the states using a six-hour look-ahead. You could imagine taking every possible combination of six-hour successor states and picking according to their joint probability, then stepping forward one tick towards the selected group and then redoing the evaluation. At least in classical mechanics you don't need the look-ahead evaluation, but MORverse has time turners.

As seconds fall out of the tail of the window they become fixed. Before that happens, time turner usage upwhen can influence the selected states downwhen, subject to the constraint that no inconsistency is created. If Minerva noticing that Harry seemed different would have created some contradiction (because it would feed into time travel reaching the fixed downwhen) then she simply wouldn't notice. Picking the "most likely" way to constrain her (e.g. having her drop dead) is precluded by having to be consistent with the past history, which is already fixed and doesn't include any dangerous interactions.

Stated differently: danger would arise only because of time travel into a past that was fixed before the cause of the danger was available to the evaluator. Unsafe resolutions would tend to not be consistent with the fixed past. So the normalcy of constrained time travel might simply be a result of the forward lookahead and the backwards modification depth being exactly the same.
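The look-ahead evaluation described above can be sketched as a toy sampler. This is only an illustration under stated assumptions: the three-state "universe", its transition probabilities, the three-tick window, and the `is_consistent` rule are all hypothetical stand-ins, and actual backwards time travel is not modeled. What it does show is the core loop: enumerate every window of future states, keep the consistent ones, pick one by joint probability, commit a single tick, and re-evaluate.

```python
import itertools
import random

# Toy transition probabilities for a 3-state "universe" (made-up numbers).
TRANSITIONS = {
    "A": {"A": 0.6, "B": 0.3, "C": 0.1},
    "B": {"A": 0.2, "B": 0.5, "C": 0.3},
    "C": {"A": 0.1, "B": 0.2, "C": 0.7},
}
STATES = sorted(TRANSITIONS)

def path_probability(start, path):
    """Joint probability of a path: the product of its one-tick transitions."""
    prob, current = 1.0, start
    for nxt in path:
        prob *= TRANSITIONS[current][nxt]
        current = nxt
    return prob

def is_consistent(history, path):
    """Hypothetical consistency rule: "C" may never directly follow "A"."""
    full = history[-1:] + list(path)
    return all(not (a == "A" and b == "C") for a, b in zip(full, full[1:]))

def step_with_lookahead(history, window, rng):
    """Choose a whole window of future states, but commit only the first tick."""
    start = history[-1]
    candidates = [p for p in itertools.product(STATES, repeat=window)
                  if is_consistent(history, p)]
    weights = [path_probability(start, p) for p in candidates]
    chosen = rng.choices(candidates, weights=weights, k=1)[0]
    # Only the first tick becomes fixed; the rest is re-evaluated next step.
    return history + [chosen[0]]

rng = random.Random(0)
history = ["A"]
for _ in range(20):
    history = step_with_lookahead(history, window=3, rng=rng)
print("".join(history))
```

Because inconsistent windows are discarded before sampling, the committed history never contains a forbidden transition, which loosely mirrors the point that unsafe resolutions tend not to be consistent with the fixed past.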

Harry Potter and the Methods of Rationality discussion thread, part 23, chapter 94

gmaxwell12y30

I'd always just assumed that whatever force imposes the time turner rules just has a simple constraint that no history is permitted where "information" travels back further, and it freely reconfigures things in potentially very high entropy ways ("DO NOT MESS WITH TIME") to achieve that end. Amelia Bones, upon time travel, was replaced with a spherical null-information Amelia Bones who had no influence from the future except that which she would not convey (including by choice) to anyone who travels outside the constraint-satisfaction window.

So I think there doesn't need to be any special casing of sapience to create the appearance of special casing sapience, beyond anthropic bias: the only time when the reconfiguration to meet the constraint is particularly obvious to a conscious entity is when it interacts with a conscious entity.

Harry Potter and the Methods of Rationality discussion thread, part 16, chapter 85

gmaxwell14y100

are not, as a rule, a different intellectual order than we are

Yes they are— in the sense that they will have decades to spend ruminating on workarounds, experimenting, consulting with others. And when they find a solution the result is potentially an easily transmitted whole class compromise that frees them all at once.

Decades of dedicated human time, teams of humans, etc. are all forms of super-humanity. If you demanded that the same man hours be spent drafting the language as would be spent under its rule, then I'd agree that there was no differential advantage, but then it would be quite a challenge to write the rule.
