Comments

You contrast the contrarian with the "obsessive autist", but what if the contrarian also happens to be an obsessive autist?

I agree that obsessively diving into the details is a good way to find the truth. But that comes from diving into the details, not anything related to mainstream consensus vs contrarianism. It feels like you're trying to claim that mainstream consensus is built on the back of obsessive autism, yet you didn't quite get there?

Is it actually true that mainstream consensus is built on the back of obsessive autism? I think the best argument for that being true would be something like:

  • Prestige academia is full of obsessive autists. Thus the consensus in prestige academia comes from diving into the details.

  • Prestige academia writes press releases that are picked up by news media and become mainstream consensus. Science journalism is actually good.

BTW, the reliability of mainstream consensus is to some degree a self-defeating prophecy: the more trustworthy people believe the consensus to be, the less likely they are to think critically about it, and the less reliable it becomes.

Why is nobody in San Francisco pretty? Hormones make you pretty but dumb (pretty faces don't usually pay rent in SF). Why is nobody in Los Angeles smart? Hormones make you pretty but dumb. (Sincere apologies to all residents of SF & LA.)

Some other possibilities:

  • Pretty people self-select towards interests and occupations that reward beauty. If you're pretty, you're more likely to be popular in high school, which interferes with the dedication necessary to become a great programmer.

  • A big reason people are prettier in LA is they put significant effort into their appearance -- hair, makeup, orthodontics, weight loss, etc.

Then why didn't evolution give women big muscles? I think because if you are in the same strength range as men then you are much more plausibly murderable. It is hard for a male to say that he killed a female in self-defense in unarmed combat. No reason historically to conscript women into battle. Their weakness protects them. (Maybe someone else has a better explanation.)

Perhaps hunter/gatherer tribes had gender-based specialization of labor. If men handle the hunting and tribe defense, which require big muscles, there's less need for women to pay the metabolic cost of maintaining them.

Another possible risk: Accidentally swallowing the iodine. This happened to me. I was using a squeezable nasal irrigation device, squirted some of the mixture into my mouth, and it went right down my throat. I called Poison Control and followed their instructions (IIRC they told me to eat a lot of starchy food, and I think I may have taken some activated charcoal too), and I ended up being fine.

The older I get and the more I use the internet, the more skeptical I become of downvoting.

Reddit is the only major social media site that has downvoting, and Reddit is also (in my view) the social media site with the biggest groupthink problem. People really seem to dislike being downvoted, which causes them to cluster in subreddits full of the like-minded, taking potshots at those who disagree instead of having a dialogue. Reddit started out as one of the most intelligent sites on the internet due to its programmer-discussion origins; the decline has been fairly remarkable IMO. Especially when it comes to any controversial or morality-related discussion, Reddit commenters seem to be participating in a Keynesian beauty contest more than they are thinking.

When I look at the stuff that other people downvote, their downvotes often seem arbitrary and capricious. (It can be hard to separate my independent opinion of the content from my downvote-colored opinion well enough to notice this.) When I get the impulse to downvote something, it's usually not the best side of me coming out. And yet getting downvoted still aggravates me a lot: my creativity and enthusiasm are noticeably diminished for perhaps 24-48 hours afterwards. Getting downvoted doesn't teach me anything beyond "don't engage with those people", often with an added helping of "screw them".

We have good enough content-filtering mechanisms nowadays that in principle, I don't think people should be punished for posting "bad" content. It should be easy to arrange things so "good" content gets the lion's share of the attention.

I'd argue the threat of punishment is most valuable when people can clearly predict what's going to produce punishment, e.g. committing a crime. For getting downvoted, the punishment is arbitrary enough that it causes a big behavioral no-go zone.

The problem isn't that people might downvote your satire. The problem is that human psychology is such that even an estimated 5% chance of your satire being downvoted is enough to deter you from posting it, since in the ancestral environment social exclusion was asymmetrically deadly relative to social acceptance. Conformity is the natural result.

Specific proposals:

  • Remove the downvote button, and when the user hits "submit" on their post or comment, have an LLM read it and check it against a long list of site guidelines. The LLM flags potential issues to the user and says: "You can still post this if you want, but since it violates 3 of the guidelines, it will start out with a score of -3. Alternatively, you can rewrite it and submit it to me again." That gets you quality control without the capricious-social-exclusion aspect. (A rough sketch of this flow follows this list.)

  • Have specific sections of the site, or specific times of the year, where the voting gets turned off. Or keep the voting on, but anonymize the post score and the user who posted it, so your opinion isn't colored by the content's current score / user reputation.
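
Here's a minimal sketch of what that first proposal's flow could look like. Everything is a placeholder: llm_check_guidelines() stands in for whatever model call the site would actually use, and GUIDELINES stands in for the site's real guideline list.

```python
# Sketch of the "LLM pre-submit check" proposal. All names are hypothetical.

GUIDELINES = [
    "No personal attacks",
    "No unsupported factual claims stated as certainties",
    "Stay on topic for the thread",
    # ... the rest of the site's guidelines
]

def llm_check_guidelines(draft: str, guidelines: list[str]) -> list[str]:
    """Placeholder: ask an LLM which guidelines the draft appears to violate."""
    raise NotImplementedError("wire this up to your LLM of choice")

def submit_flow(draft: str) -> dict:
    violations = llm_check_guidelines(draft, GUIDELINES)
    if not violations:
        return {"action": "post", "starting_score": 0}
    # The user may still post, but the comment starts at -1 per violation,
    # replacing the capricious social signal of downvotes with a
    # predictable, rule-based one.
    return {
        "action": "ask_user",
        "message": (
            f"This draft appears to violate {len(violations)} guideline(s): "
            f"{violations}. You can post it anyway (it will start at "
            f"{-len(violations)}), or revise and resubmit."
        ),
        "starting_score": -len(violations),
    }
```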

This has been a bit of a rant, but here are a couple of links to help point at what I'm trying to say:

  • https://vimeo.com/60898177 -- this Onion satire was made over a decade ago. I think it's worth noting how absurd our internet-of-ubiquitous-feedback-mechanisms seems from the perspective of comedians from the past. (And it is in fact absurd in my view, but it can be hard to see the water you're swimming in. Browsing an old-school forum without any feedback mechanisms makes the difference seem especially stark. The analogy that's coming to mind is a party where everyone's on cocaine, vs a party where everyone is sober.)

  • https://celandine13.livejournal.com/33599.html -- classic post, "Errors vs. Bugs and the End of Stupidity"

If a post gets enough comments that low karma comments can't get much attention, they still compete with new high-quality comments, and cut into the attention for the latter.

Seems like this could be addressed by changing the comment sorting algorithm to favor recent comments more?
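
For illustration, here's a toy sort key that discounts karma as a comment ages, so a strong new comment isn't buried under older high-karma ones. The half-life and weights are arbitrary choices of mine, not anyone's actual algorithm.

```python
import time

def sort_key(karma: float, posted_at: float, now: float | None = None,
             half_life_hours: float = 24.0) -> float:
    """Toy sort key: karma counts for less as a comment ages, and new
    comments get a small flat boost. All constants are illustrative."""
    now = time.time() if now is None else now
    age_hours = max(0.0, (now - posted_at) / 3600.0)
    recency = 0.5 ** (age_hours / half_life_hours)   # 1.0 when new, decays toward 0
    return karma * (0.5 + 0.5 * recency) + recency

# comments.sort(key=lambda c: sort_key(c.karma, c.posted_at), reverse=True)
```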

If you think prediction markets are valuable it's likely because you think they price things well - probably due to some kind of market efficiency... well why hasn't that efficiency led to the creation of prediction markets...

Prediction markets generate information, and information is a public good: everyone can act on the market's prices whether or not they helped pay to generate them. Under-provision of a public good is a failure of public goods provision, not a failure of prediction markets.

I suspect the best structure long term will be something like: Use a dominant assurance contract (summary in this comment) to solve the public goods problem and generate a subsidy, then use that subsidy to sponsor a prediction market.
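
To illustrate the settlement logic I have in mind (the numbers and names below are made up by me, not taken from the linked summary): contributors get their money back plus a bonus if the funding threshold isn't reached, which makes pledging roughly a dominant strategy; if the threshold is reached, the pot becomes the prediction-market subsidy.

```python
def settle_dominant_assurance_contract(
    pledges: dict[str, float],        # contributor -> amount pledged
    threshold: float,                 # funding needed to subsidize the market
    refund_bonus_rate: float = 0.05,  # entrepreneur's bonus if funding fails
) -> dict:
    """Toy settlement logic for a dominant assurance contract.

    If total pledges meet the threshold, the pot funds the prediction-market
    subsidy. If not, everyone is refunded *plus* a bonus paid by the
    entrepreneur, so pledging is (weakly) a dominant strategy."""
    total = sum(pledges.values())
    if total >= threshold:
        return {"funded": True, "subsidy": total, "payouts": {}}
    return {
        "funded": False,
        "subsidy": 0.0,
        "payouts": {who: amt * (1 + refund_bonus_rate) for who, amt in pledges.items()},
    }
```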

...I mean if you want to do the equivalent of a modern large training run you'll need trillions of tokens of expert-generated text. So that's a million experts generating a million tokens each? So, basically a million experts working full-time for years? So something like a hundred billion dollars minimum just to pay them all, plus probably more for the bureaucratic infrastructure needed to ensure they aren't slacking off or cheating or trying to poison your dataset?

Where are these numbers coming from? They seem way too high. My suggestion is to do a modern large training run in the standard way (next-token prediction), and then fine-tune on experts playing the role of a helpful/honest/harmless chatbot doing CoT. Basically replace RLHF with finetuning on expert chatbot roleplay. Maybe I'm betraying my ignorance here and this idea doesn't make sense for some reason?
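
To make my intuition concrete, here's the back-of-envelope I have in mind. Every number below is an assumption of mine, purely for illustration; the point is only the relative sizes.

```python
# Rough back-of-envelope; all constants are illustrative assumptions of mine.
PRETRAIN_TOKENS = 1e12        # "trillions of tokens" from the quoted estimate
FINETUNE_TOKENS = 5e8         # assumed: ~1M expert-written chat responses x ~500 tokens
TOKENS_PER_EXPERT_YEAR = 2e6  # assumed: ~8k polished tokens/day x ~250 working days
COST_PER_EXPERT_YEAR = 1e5    # assumed: ~$100k fully-loaded cost per expert

def expert_cost(tokens: float) -> float:
    expert_years = tokens / TOKENS_PER_EXPERT_YEAR
    return expert_years * COST_PER_EXPERT_YEAR

print(f"Expert-written pretraining corpus: ~${expert_cost(PRETRAIN_TOKENS):,.0f}")
print(f"Expert-written finetuning corpus:  ~${expert_cost(FINETUNE_TOKENS):,.0f}")
# Roughly $50B vs $25M under these assumptions: the huge cost comes from
# replacing the pretraining corpus, not from replacing RLHF with
# expert-written finetuning data.
```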

I was editing my comment a fair amount, perhaps you read an old version of it?

And you don't need to pay any experts to demonstrate the feasibility of this idea. Just take a bunch of ChatGPT responses that are known to be high quality, make a dataset out of them, and use them in the training pipeline I propose as though they were written by human experts. Then evaluate the quality of the resulting model. If it's nearly as good as the original ChatGPT, I think you should be good to go.

If I understand correctly, sophisticated steganography will not be incentivized by a language model that's just trained on next-token prediction (no RLHF). Is that right? (Since the model didn't generate any of the text in its context window, there's no method it could use to pass info to itself. The best it could do is pick up on weak-to-nonexistent human-created steganography in its dataset.)

That suggests a simple method to prevent steganography: Instead of having human raters rate RLHF-generated responses, have humans write responses themselves, then finetune on the human-written responses.

That might sound labor-intensive, but I don't think it has to be. Instead of scoring your AI's response, you check if the AI-written response is adequate. If it's adequate, you do nothing. If it's inadequate, you send the question to an expert who writes an expert-level response, then add that response to the finetuning dataset.
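
Something like this loop is what I have in mind. The function names are placeholders I'm inventing, not any real API; is_adequate() and ask_expert() stand in for whatever adequacy check and expert pool you actually have.

```python
# Sketch of the proposed data-collection loop; all names are hypothetical.

finetune_dataset: list[tuple[str, str]] = []

def is_adequate(question: str, model_answer: str) -> bool:
    """Placeholder: human (or cheap automated) check of the model's answer."""
    raise NotImplementedError

def ask_expert(question: str) -> str:
    """Placeholder: route the question to a human expert."""
    raise NotImplementedError

def collect_example(question: str, model_answer: str) -> None:
    if is_adequate(question, model_answer):
        return  # adequate answers cost nothing extra
    expert_answer = ask_expert(question)  # pay for an expert only on failures
    finetune_dataset.append((question, expert_answer))
```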

Sure, experts are expensive, but so are massive training runs. We know data quality is important at the language corpus stage. Why wouldn't it be equally important at the finetuning stage? (Also, I'm not sure experts are that expensive, given the glut of PhDs in many fields. And expertise in one field could transfer to others, if you split the expert task into 2 steps: identification of high-quality sources, and use of those sources to answer the user's query.)

In any case, I imagine it should be possible to detect steganography to an extent, by adding noise to the contents of the context window and checking if that causes unexpectedly high performance degradation. You could have a second AI that creates a paraphrased view of everything in the context window except the most recent 20 words or so -- I expect that's a massive blow to steganography in practice. Note that the cost of such added-noise schemes only needs to be paid at training time, so they can be expensive -- e.g. ensembling across the next-token generated from many different distortions of the current text.
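
Here's a rough sketch of the detection idea, purely illustrative: the paraphraser, the loss function, and the thresholds are all stand-ins I'm making up. The shape of the test is what matters: if performance degrades far more than expected when the context is paraphrased (content preserved, exact wording destroyed), the model was probably leaning on hidden wording-level signals.

```python
# Sketch of paraphrase-based steganography detection; all names are placeholders.

def paraphrase(context: str, keep_last_words: int = 20) -> str:
    """Placeholder: second model rewrites everything except the last ~20 words."""
    raise NotImplementedError

def model_loss(context: str, continuation: str) -> float:
    """Placeholder: the model's loss on `continuation` given `context`."""
    raise NotImplementedError

def steganography_suspected(context: str, continuation: str,
                            expected_degradation: float = 0.1,
                            margin: float = 0.2) -> bool:
    baseline = model_loss(context, continuation)
    paraphrased = model_loss(paraphrase(context), continuation)
    degradation = (paraphrased - baseline) / max(baseline, 1e-9)
    # Some degradation is expected from imperfect paraphrasing; flag only
    # unexpectedly large drops in performance.
    return degradation > expected_degradation + margin
```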

Sorry, I think I must have misunderstood your comment. When you wrote:

As for LLM agents with weak forward passes: Yes, if we could achieve robust faithful CoT properties, we'd be in pretty damn good shape from an AI control perspective.

I interpreted this to mean that in addition to weak forward passes, there was another thing called "robust faithful CoT properties" that would also need to be achieved.

I now think you meant to indicate that "weak forward passes" was a particular method for achieving "robust faithful CoT properties".

Thanks a lot for the reply, this is valuable info.

From my perspective, unlike the OP, you seem to generally know what you are doing.

I appreciate the kind words, but I've made no systematic effort to acquire knowledge -- everything I posted in this thread is just bits and pieces I picked up over the years.

As you can see from elsewhere in this thread, I suspect I might have given myself an internal injury about a month ago from doing deep tissue massage, likely due to being on a low dose of an anticoagulant supplement (nattokinase).

However, I do think this sort of injury is generally rare. And my health would be in far worse shape if it wasn't for massage.
