Wiki Contributions


BMI isn't fatness. BMI is just weight / (height^2), which means that for geometrically similar bodies, BMI grows linearly with a person's scale. (Weight grows with the cube of linear dimensions, but height^2 only grows with, well, the square.)
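A quick numerical sketch of that scaling argument (the reference height and weight below are just illustrative numbers I picked): scale every linear dimension of a body by a factor s, and weight grows like s^3 while height^2 grows like s^2, so BMI grows like s.

```python
# Geometric-similarity sketch: scale a reference body by factor s.
# Weight ~ volume ~ s**3, height ~ s, so BMI = weight / height**2 ~ s.
ref_height_m = 1.70   # illustrative reference person
ref_weight_kg = 65.0

for s in [0.9, 1.0, 1.1, 1.2]:
    height = ref_height_m * s
    weight = ref_weight_kg * s**3   # geometric similarity assumption
    bmi = weight / height**2
    print(f"scale={s:.1f}  height={height:.2f}m  BMI={bmi:.1f}")
```

So a person 20% taller but identically proportioned ends up with a BMI about 20% higher, which is the sense in which BMI conflates scale with fatness.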

I used to believe this. I currently think this is aesthetically correct (ish?) but not biologically. Specifically I think there are greater health premiums for being thin when you're taller.

I weakly think 

1) ChatGPT is more deceptive than baseline (more likely to say untrue things than a similarly capable Large Language Model trained only via unsupervised learning, e.g. baseline GPT-3)

2) This is a result of reinforcement learning from human feedback.

3) This is slightly bad, as in differential progress in the wrong direction, as:

3a) it differentially advances the ability for more powerful models to be deceptive in the future

3b) it weakens hopes we might have for alignment via externalized reasoning oversight.


Please note that I'm very far from an ML or LLM expert, and unlike many people here, have not played around with other LLMs (especially baseline GPT-3). So my guesses are just a shot in the dark.
From playing around with ChatGPT, I noted across a bunch of examples that for slightly complicated questions, ChatGPT a) often gets the final answer correct (much more often than chance), b) sounds persuasive, and c) gives explicit reasoning that is completely unsound.


Anthropomorphizing a little, I tentatively advance that ChatGPT knows the right answer, but uses a different reasoning process (part of its "brain") to explain what the answer is. 
I speculate that while some of this might happen naturally from unsupervised learning on the internet, this is differentially advanced (made worse) from OpenAI's alignment techniques of reinforcement learning from human feedback. 

[To explain this, a quick detour into "machine learning justifications." I remember back when I was doing data engineering ~2018-2019, there was a lot of hype around ML justifications for recommender systems. Basically, users want to know why they were getting recommended ads for e.g. "dating apps for single Asians in the Bay" or "baby clothes for first-time mothers." It turns out coming up with a principled answer is difficult, especially if your recommender system is mostly a large black-box ML system. So instead of actually trying to understand what your recommender system did (a very hard interpretability problem!), you hook up a secondary model to "explain" the first one's decisions by collecting data on simple (politically safe) features and the first model's outputs. So your second model will give you results like "you were shown this ad because other users in your area disproportionately like this app." 

Is this why the first model showed you the result? Who knows? It's as good a guess as any. (In a way, not knowing what the first model does is a feature, not a bug, because the model could train on learned proxies for protected characteristics but you don't have the interpretability tools to prove or know this.)]
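A minimal sketch of the pattern described above (all names and numbers are hypothetical, not any company's actual pipeline): treat the recommender as an opaque scoring function, then fit a trivial surrogate on a single "politically safe" feature to "explain" its decisions.

```python
import random

random.seed(0)

# Stand-in for the opaque recommender: its decision depends on a learned
# feature we can't (or won't) inspect.
def black_box_recommender(user):
    return user["opaque_score"] > 0.5

# Simulated users: the "safe" feature merely correlates with the opaque one.
users = []
for _ in range(1000):
    opaque = random.random()
    safe = opaque + random.gauss(0, 0.2)  # noisy, politically-safe proxy
    users.append({"opaque_score": opaque, "region_popularity": safe})

labels = [black_box_recommender(u) for u in users]

# The "justification" model never looks inside the black box: it just finds
# the threshold on the safe feature that best mimics the black box's outputs.
best_threshold, best_agreement = 0.0, 0.0
for t in [i / 100 for i in range(100)]:
    preds = [u["region_popularity"] > t for u in users]
    agreement = sum(p == y for p, y in zip(preds, labels)) / len(users)
    if agreement > best_agreement:
        best_threshold, best_agreement = t, agreement

print(f"surrogate threshold={best_threshold:.2f}, agreement={best_agreement:.0%}")
```

The surrogate agrees with the black box most of the time, so its "explanations" sound plausible, while saying nothing about why the black box actually decided as it did.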

Anyway, I wouldn't be surprised if something similar is going on within the internals of ChatGPT. There are incentives to give correct answers and there are incentives to give reasons for your answers, but the incentives for the reasons to be linked to your answers are a lot weaker.

One way this phenomenon can manifest is if you have MTurkers rank outputs from ChatGPT. Plausibly, you can have human raters downrank it both a) for giving inaccurate results and b) for giving overly complicated explanations that don't make sense. So there's loss for being wrong and loss for being confusing, but not for giving reasonable, compelling, clever-sounding explanations for true answers where the reasoning is garbage, which is harder to detect.
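A toy illustration of that incentive structure (my framing of the loss, not OpenAI's actual reward model): the rater signal depends on answer correctness and surface clarity, but the validity of the reasoning never enters it.

```python
# Hypothetical rater score: rewards correct answers, penalizes confusing
# explanations, and is blind to whether the reasoning actually supports
# the answer.
def toy_rater_score(answer_correct, explanation_confusing, reasoning_valid):
    score = 2 if answer_correct else -2         # loss for being wrong
    score -= 1 if explanation_confusing else 0  # loss for being confusing
    # reasoning_valid is never used: raters can't reliably check it
    return score

# A clear, correct answer scores identically whether its reasoning is
# valid or garbage:
print(toy_rater_score(True, False, reasoning_valid=True))   # 2
print(toy_rater_score(True, False, reasoning_valid=False))  # 2
```

Under this (deliberately oversimplified) objective, the cheapest policy is to produce correct-sounding answers with fluent explanations, with no pressure for the explanation to track the actual computation.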



Why does so-called "deception" from subhuman LLMs matter? In the grand scheme of things, this may not be a huge deal, however:

  1. I think we're fine now, because both its explicit and implicit reasoning are probably subhuman. But once LLMs' reasoning ability is superhuman, deception may be differentially easier for the RLHF paradigm compared to the pre-RLHF paradigm. RLHF plausibly selects for models with good human-modeling/-persuasion abilities, even relative to a baseline of agents that are "merely" superhuman at predicting internet text.
  2. One of the "easy alignment" hopes I had in the past was based on a) noting that LLMs may be an unusually safe baseline, and b) externalized oversight of "chain-of-thought" LLMs. If my theory of how ChatGPT was trained is correct, RLHF moves us systematically away from the externalized reasoning being the same reasoning process the model internally uses to produce correct answers. This makes it harder to do "easy" blackbox alignment. 


What would convince me that I'm wrong?

1. I haven't done a lot of trials or played around with past models, so I can be convinced that my first conjecture ("ChatGPT is more deceptive than baseline") is wrong. For example, if someone conducts a more careful study than mine and demonstrates that (for the same level of general capabilities/correctness) ChatGPT is just as likely to confabulate explanations as any LLM trained purely via unsupervised learning. (An innocent explanation here is that the internet contains both many correct answers to math questions and many invalid proofs/explanations, so the result is just what you'd expect from training on internet data.)

2. For my second conjecture ("This is a result of reinforcement learning from human feedback"), I can be convinced by someone from OpenAI or adjacent circles explaining to me that ChatGPT either isn't trained with anything resembling RLHF, or that their way of doing RLHF is very different from what I proposed.

3. For my third conjecture, this feels more subjective. But I can potentially be convinced by a story for why training LLMs through RLHF is safer (i.e., less deceptive) per unit of capabilities gained than normal capability gains via scaling. 

4. I'm not an expert. I'm also potentially willing to generally defer to expert consensus if people who understand LLMs well think that the way I conceptualize the problem is entirely off.

Apologies if I misunderstood your argument.

  1. You open your argument by saying, in essence, that people hate their jobs and are miserable at them.
  2. You then argue that socialist firms are better for employees.
  3. The logical inference here would be that socialist firms make for happier employees.

Is this a reasonable summary?

However, in the "Are socialist firms good for employees?" section, you do not give much evidence that workers in socialist firms are significantly happier. Instead, the paragraph says things like:

> Giving employees stock in a company seems to boost their performance.[33] Research has shown that employees getting more ownership of the company is associated with higher trust, perception of fairness, information sharing and cooperation.[34] There seems to be a small increase in companywide productivity[33], while employee retention is boosted.[35] Perhaps capitalist firms could slowly be eased into becoming socialist firms by first giving the employees more stakes in the company and then expanding their participation rights.

These things, while maybe valuable, do not give that much overall evidence for life satisfaction or happiness.

Overall I am not sold that your argument-as-stated is sound, even if every individual piece of evidence is true.

Are you a Chinese citizen? If so, getting a programming job in the West without a degree in CS or related fields might be hard, visa-wise. The default way through is a Master's degree, but there are probably ways to hack this (e.g. get a job in a multinational with offices in China, transfer to the US).

From what I've heard, SBF was controlling, and fucked over his initial (EA) investors as best he could without sabotaging his company, and fucked over parts of the Alameda founding team that wouldn't submit to him.

Woah, I did not hear about this despite trying nontrivially hard to figure out what happened when I was considering whether to take a job there in mid-late 2019 (and also did not hear about it afterwards). I think I would've made pretty different decisions both then and afterwards if I had the correct impression.

Specifically, I knew about the management team leaving in early 2018 (and I guess "fucked over" framing was within my distribution but I didn't know the details). I did not in any way know about fucking over the investors.

Are LLMs advanced enough now that you can just ask GPT-N to do style transfer? 

Distributional shift: The worry is precisely that capabilities will generalize better than goals across the distributional shift. If capabilities didn't generalize, we'd be fine. But as the CoinRun agent exemplifies, you can get AIs that capably pursue a different objective after a distributional shift than the one you were hoping for. One difference from deception is that models which become incompetent after a distributional shift are in fact quite plausible. But to the extent that we think we'll get goal misgeneralization specifically, the underlying worry again seems to be that capabilities will be robust while alignment will not.

One thing to flag is that even if, for any given model, the probability of capabilities generalizing is very low, total doom can still be high, since there may be many tries at getting models that generalize well across distributional shifts, whereas the selection pressures toward alignment robustness are comparatively weaker. You can imagine a 2x2 quadrant of capabilities vs. alignment generalizability across distributional shift:

Capabilities don't generalize, alignment doesn't: irrelevant.

Capabilities don't generalize, alignment does: irrelevant.

Capabilities generalize, alignment doesn't: potentially very dangerous, especially if power-seeking. The agent (or the agent and friends) acquires more power and may attempt a takeover.

Capabilities generalize, alignment does: good, but not clearly great. By default I wouldn't expect such an agent to be power-seeking (unless you're deliberately creating a sovereign), so it only has as much power as humans allow it to have; the AI might risk being outcompeted by its more nefarious peers.
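The many-tries point above can be made concrete with toy numbers (mine, purely illustrative): even a small per-model chance of landing in the dangerous quadrant compounds quickly across independent training runs.

```python
# P(at least one dangerous model) = 1 - (1 - p)**n for n independent runs.
p_dangerous_per_model = 0.01  # illustrative per-run probability

for n_runs in [10, 100, 500]:
    p_at_least_one = 1 - (1 - p_dangerous_per_model) ** n_runs
    print(f"{n_runs} runs -> P(at least one dangerous) = {p_at_least_one:.1%}")
```

At 100 runs the aggregate probability is already around 63%, which is the sense in which weak per-model selection pressure toward alignment robustness can still add up to high total risk.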

I got an okay answer in person but I'm still not fully convinced. The argument was roughly that fairly high levels of intelligence developed independently multiple times (octopodes, humans, birds, etc.). So you might expect that getting to neurons is hard, but after you have neurons, evolution "got lucky" multiple times.

That said, this was only briefly discussed in person, and I might be misunderstanding their position drastically. 

> Use a very large (future) multimodal self-supervised learned (SSL) initialization to give the AI a latent ontology for understanding the real world and important concepts. Combining this initialization with a recurrent state and an action head, train an embodied AI to do real-world robotics using imitation learning on human in-simulation datasets and then sim2real. Since we got a really good pretrained initialization, there's relatively low sample complexity for the imitation learning (IL). The SSL and IL datasets both contain above-average diamond-related content, with some IL trajectories involving humans navigating towards diamonds because the humans want the diamonds.

I don't know much about ML, and I'm a bit confused about this step. How worried are we/should we be about sample efficiency here? It sounds like after pre-training you're growing the diamond shard via a real-world embodied RL agent? Naively this would be pretty performance-uncompetitive compared to agents primarily trained in simulated worlds, unless your algorithm is unusually sample-efficient (why?). If you aren't performance-competitive, then I expect your agent to be outcompeted by stronger AI systems with trainers that are less careful about diamond (or rubies, or staples, or w/e) alignment. 

OTOH if your training is primarily simulated, I'd be worried about the difficulty of creating an agent that terminally values real world (rather than simulated) diamonds. 

This is a shotgun comment, and sorry if I'm being very ignorant here (I've stopped following most covid science-y stuff in the last year+), but:

> The evidence for ivermectin, while poor, is about as reasonable as that for any repurposed covid drug (e.g. fluvoxamine), and even purpose-made ones (Paxlovid) [emphasis mine]

Aren't the effects of Paxlovid pretty extreme? 

  1. In studies, mortality benefits large enough (>10x?) it really slams you in the face. 
  2. My understanding is that in 2022, covid mortality has gotten lower in the general population, even among unvaccinated people. 

My understanding is that you don't have effects nearly this large and well-studied for ivermectin on covid. If anything (feel free to correct me!) countries that use ivermectin a lot have pretty high covid mortality, though of course there are many confounders.
