

Why do you expect Bitcoin to be excepted from being labelled a security along with the rest? 
(Apologies if the answer is obvious to those who know more about the subject than me, am just genuinely curious)

Had a similar medical bill story from when I was a poor student: Medical center told me that insurance would cover an operation. They failed to mention that they were only talking about the surgeon's fee; the hospital at which they arranged the operation was out-of-network and I was stuck with 50% of the facility's costs. I explained my story to the facility. They said I still had to pay but that a payment plan would be possible, and that I could start by paying a small amount each month. I took that literally and just started paying a (very) small amount monthly. At some point they called back to tell me to formally arrange a payment plan through their online portal, which gave me options with such high interest rates that there was no way my future earnings would increase at a fast enough rate to make a payment plan make any sense whatsoever. I called back and explained this, and said that if those were the only options I guessed I would just have to try to scrape the money together now, and that I was prepared to try to do this. The administrator, bless her heart, asked me to hold for a while, and eventually came back to say "I've spoken with my colleagues, and your current balance owed to us is now zero dollars".

This (along with a few other experiences in my life) has underscored how sometimes an apparently immovable constraint can evaporate if you can manage to talk to the right person. That said, I felt very lucky to have been taken pity on in this way -- I feel like having one's balance explicitly zeroed out in this way is rare! But it's interesting to hear that Zvi knows of cases where someone just didn't pay, with no consequences. I would have assumed that they'd normally report nonpayers to credit agencies and crater their credit scores after long enough, as it costs them nothing or almost nothing to do so. Would be interested either to hear other people's anecdotes of what happened after nonpayment of a large hospital bill (positive or negative), or to see data on this if anyone knows of any.

I was using medical questions as just one example of the kind of task that's relevant to sandwiching. More generally, what's particularly useful for this research programme are

  • tasks where we have "models which have the potential to be superhuman at [the] task", and "for which we have no simple algorithmic-generated or hard-coded training signal that’s adequate";
  • tasks for which there is some set of reference humans who are currently better at the task than the model; and
  • tasks for which there is some set of reference humans for whom the task is difficult enough that they would have trouble even evaluating/recognizing good performance (you also want this set of reference humans to be capable of being helped to evaluate/recognize good performance in some way).

Prime examples are task types that require some kind of niche expertise to do and evaluate. Cotra's examples involve "[fine-tuning] a model to answer long-form questions in a domain (e.g. economics or physics) using demonstrations and feedback collected from experts in the domain", "[fine-tuning] a coding model to write short functions solving simple puzzles using demonstrations and feedback collected from expert software engineers", "[fine-tuning] a model to translate between English and French using demonstrations and feedback collected from people who are fluent in both languages". I was just making the point that Surge can help with this kind of thing in some domains (coding), but not in others.

It's worth knowing that there are some categories of data that Surge is not well positioned to provide. For example, while they have a substantial pool of participants with programming expertise, my understanding from speaking with a Surge rep is that they don't really have access to a pool of participants with (say) medical expertise -- although for small projects it sounds like they are willing to try to see who they might already have with relevant experience in their existing pool of 'Surgers'. This kind of more niche expertise does seem likely to become increasingly relevant for sandwiching experiments. I'd be interested in learning more about companies or resources that can help collect RLHF data from people with uncommon (but not super-rare) kinds of expertise for exactly this reason.

I did Print to PDF in Word after formatting my Word document to look like a standard LaTeX-exported document, and it had no problem going through! But it might depend on the particular moderator.

Sounds a little like StarWeb? Recently read a lovely article about a similar but different game, Monster Island, which was a thing from 1989 to 2017.

But yes, my default assumption would be that the particular conversation you're referring to never resulted in a game that saw the light of day; I've seen many detailed game design discussions among people I've known meet the same fate.

Thanks, I agree that's a better analogy. Though of course, it isn't necessary that the employees (participants in a sandwiching project) be unaware of the CEO's (sandwiching project overseer's) goal; I was only highlighting that they need not necessarily be aware of it, in order to make clear that the goals of the human helpers/judges aren't especially relevant to what sandwiching, debate, etc. are really about. But of course, if it turns out that having the human helpers know what the ultimate goal is helps, then they're absolutely allowed to be in on it...

Perhaps this is a bit glib, but arguably some of the most profitable companies in the mobile game space have essentially built product assembly lines to churn out fairly derivative games that are nevertheless unique enough to do well on the charts, and they absolutely do it by factoring the project of "making a game" into different bits that are done by different people (programmers, artists, voice actors, etc.), some of whom might not have any particular need to know what the product will look like as a whole to play their part. 

However, I don't want to press too hard on this game example, as you may or may not consider this 'cognitive work' and as it has other disanalogies with what we are actually talking about here. And to a certain degree I share your intuition that factoring certain kinds of tasks is probably very hard: if it weren't, we might expect to see a lot more non-manufacturing companies whose main employee base consists of assembly lines (or hierarchies of assembly lines, or whatever) requiring workers with general intelligence but few specialized rare skills, which I think is the broader point you're making in this comment. I think that's right, although I also think there are reasons for this that go beyond just the difficulty of task factorization, and which don't all apply in the HCH etc. case, as some other commenters have pointed out.

We start with some ML model which has lots of knowledge from many different fields, like GPT-n. We also have a human who has a domain-specific problem to solve (like e.g. a coding problem, or a translation to another language) but lacks the relevant domain knowledge (e.g. coding skills, or language fluency). The problem, roughly speaking, is to get the ML model and the human to work as a team, and produce an outcome at-least-as-good as a human expert in the domain. In other words, we want to factorize the “expert knowledge” and the “having a use-case” parts of the problem.
This sort of problem comes up all the time in real-world businesses. We could just as easily consider a product designer at a tech startup (who knows what they want but little about coding), an engineer (who knows lots about coding but doesn't understand what the designer wants)...

These examples conflate "what the human who provided the task to the AI+human combined system wants" with "what the human who is working together with the AI wants" in a way that I think is confusing and sort of misses the point of sandwiching. In sandwiching, "what the human wants" is implicit in the choice of task, but the "what the human wants" part isn't really what is being delegated or factored off to the human who is working together with the AI; what THAT human wants doesn't enter into it at all. Using Cotra's initial example to belabor the point: if someone figured out a way to get some non-medically-trained humans to work together with a mediocre medical-advice-giving AI in such a way that the output of the combined human+AI team is actually good medical advice, it doesn't matter whether those non-medically-trained humans actually care that the result is good medical advice; they might not even individually know what the purpose of the system is, and just be focused on whatever their piece of the task is - say, verifying the correctness of individual steps of a chain of reasoning generated by the system, or checking that each step logically follows from the previous, or whatever. Of course this might be really time intensive, but if you can improve even slightly on the performance of the original mediocre system, then hopefully you can train a new AI system to match the performance of the original AI+human system by imitation learning, and bootstrap from there.

The point, as I understand it, is that if we can get human+AI systems to progress from "mediocre" to "excellent" (in other words, to remain aligned with the designer's goal) -- despite the fact that the only feedback involved is from humans who wouldn't even be mediocre at achieving the designer's goal if they were asked to do it themselves -- and if we can do it in a way that generalizes across all kinds of tasks, then that would be really promising. To me, it seems hard enough that we definitely shouldn't take a few failed attempts as evidence that it can't be done, but not so hard as to seem obviously impossible.

I just shared this info with an immune-compromised relative, thanks so much for this.

When I see young healthy people potentially obsessing, turning life into some sort of morbid probability matrix because one particular potential risk (Long Covid) has been made more salient and blameworthy, I sympathize a lot less. 


ONS's latest survey finds 2.8% of the UK population report that they are currently experiencing long COVID symptoms; 67% of that 2.8% report that the symptoms adversely affect their day-to-day activities. Separately, they've estimated that 70% of England has had COVID at least once; weighting their estimates for England/Scotland/Wales/NI suggests about 68% of the UK has had it. Putting these together (2.8% × 67% ≈ 1.9% of the whole population, divided by the 68% who have had COVID), conditional on having caught COVID at least once we have ~3% of the population experiencing symptoms that adversely affect day-to-day activities for at least a month and often much longer. (Table 7 of the associated dataset implies that for each individual symptom, well over half have been experiencing those symptoms for "at least 12 weeks", which is consistent with Fig 3 in this earlier survey.)
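For anyone who wants to check the arithmetic, here's a minimal back-of-envelope sketch using only the survey figures quoted above (the variable names are mine, not the ONS's):

```python
# Figures quoted from the ONS survey discussion above
prevalence_reporting = 0.028  # fraction of UK population currently reporting long COVID symptoms
adversely_affected = 0.67     # fraction of those whose day-to-day activities are adversely affected
had_covid = 0.68              # estimated fraction of the UK population infected at least once

# Fraction of the whole population with activity-affecting long COVID
p_adverse = prevalence_reporting * adversely_affected  # ~1.9%

# Condition on having caught COVID at least once
p_adverse_given_covid = p_adverse / had_covid  # ~2.8%, i.e. roughly 3%

print(f"{p_adverse_given_covid:.1%}")
```

This treats the ONS point estimates as exact and ignores reinfections, so it's only a rough consistency check on the ~3% figure, not a proper risk model.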

Anyway, if every time (or every few times) I catch COVID equates to a ~3% chance of long COVID that adversely affects my day-to-day activities for a long time, for me that's high enough that it justifies having categories of things that I do less often than I used to, categories of things that I do while masked, and categories that I do with no precautions. We don't generally go around criticizing people for "obsessing" when they take other slightly inconvenient actions to mitigate other low-probability risks (wearing seatbelts; having a diet composed of more healthy-but-less-delicious than unhealthy-and-more-delicious foods; cutting down on alcohol; etc.). So this constant criticism of people who are choosing to make changes to reduce their long COVID risk does rub me the wrong way.
