Is the TikTok graph just showing that it takes time for viral videos to get their views, so many of the recent videos that will eventually get to 10MM views haven't gotten there yet?
It seems there is a thing called ‘exercise non-responders’ who have very low returns to exercise, both in terms of burning off calories and in terms of building muscle.
The research I've seen on exercise non-responders has been pretty preliminary. A standard study gives a bunch of people an exercise routine, and measures some fitness-related variable before and after. The amount of improvement on that fitness-related variable has some distribution, obviously, because there are various sources of noise, context, individuals differences between people. You ca...
What do you mean by 'average' I can think of at least four possibilities.
The standard of fairness "both players expect to make the same profit on average" implies that the average to use here is the arithmetic mean of the two probabilities, (p+q)/2. That's what gives each person the same expected value according to their own beliefs.
This is easy to verify in the example in the post. Stakes of 13.28 and 2.72 involve a betting probability of 83%, which is halfway between the 99% and 67% that the two characters had.
You can also do a little arithmetic from the...
I was recently looking at Yudkowsky's (2008) "Artificial Intelligence as a Positive and
Negative Factor in Global Risk" and came across this passage which seems relevant here:
...Friendly AI is not a module you can instantly invent at the exact moment when it is first needed, and then bolt on to an existing, polished design which is otherwise completely unchanged.
The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque—the user has no idea ho
Only like 10% are perfect scores. Median of 1490 on each of the two old LW surveys I just checked.
See also the heuristics & biases work on framing effects, e.g. Tversky and Kahneman's Rational Choice and the Framing of Decisions
...Alternative descriptions of a decision problem often give rise to different preferences, contrary to the principle of invariance that underlines the rational theory of choice. Violations of this theory are traced to the rules that govern the framing of decision and to the psychological principles of evaluation embodied in prospect theory. Invariance and dominance are obeyed when their application is transparent and often vio
A hypothesis for the negative correlation:
More intelligent agents have a larger set of possible courses of action that they're potentially capable of evaluating and carrying out. But picking an option from a larger set is harder than picking an option from a smaller set. So max performance grows faster than typical performance as intelligence increases, and errors look more like 'disarray' than like 'just not being capable of that'. e.g. Compare a human who left the window open while running the heater on a cold day, with a thermostat that left the window ...
Seems like the concept of "coherence" used here is inclined to treat simple stimulus-response behavior as highly coherent. e.g., The author puts a thermostat in the supercoherent unintelligent corner of one of his graphs.
But stimulus-response behavior, like a blue-minimizing robot, only looks like coherent goal pursuit in a narrow set of contexts. The relationship between its behavioral patterns and its progress towards goals is context-dependent, and will go off the rails if you take it out of the narrow set of contexts where it fits. That's not "a ...
A bunch of things in this post seem wrong, or like non sequitors, or like they're smushing different concepts together in weird ways.
It keeps flipping back and forth between criticizing people for thinking that no one was fooled, and criticizing people for thinking that some people were fooled. It highlights that savviness is distinct from corruptness or support for the regime, but apparently its main point was that the savvy are collaborating with the regime.
As I understand it, the main point of Scott's Bounded Distrust post is that if you care about obje...
If the important thing about higher levels is not tracking the underlying reality, why not define the category in terms of that rather than a specific motive (fitting in with friends) which sometimes leads to not tracking reality?
People say & do lots of things to fit in, some of which involve saying true things (while tracking that they match reality) and some of which don't have propositional content (e.g. "Yay X" or "Boo X"). And there are various reasons for people to say nonsense, besides trying to fit in.
I was assuming that the lack of inflation meant that they didn't fully carry out what he had in mind. Maybe something that Eliezer, or Scott Sumner, has written would help clarify things.
It looks like Japan did loosen their monetary policy some, which could give evidence on whether or not the theory was right. But I think that would require a more in-depth analysis than what's in this post. I don't read the graphs as showing 'clearly nothing changed after Abe & Kuroda', just that there wasn't the kind of huge improvement that hits you in the face when ...
Parts of your description sound misleading to me, which probably just means that we have a disagreement?
My read is that, if this post's analysis of Japan's economy is right, then Eliezer's time1 view that the Bank of Japan was getting it wrong by trillions of dollars was never tested. The Bank of Japan never carried out the policies that Eliezer favored, so the question about whether those policies would help as much as Eliezer thought they would is still just about a hypothetical world which we can only guess at. That makes the main argument in Inad...
It didn't become loose enough to generate meaningful inflation, right? And I thought Sumner & Eliezer's views were that monetary policy needed to be loose enough to generate inflation in order to do much good for the economy.
That's what I had in mind by not "all that loose"; I could swap in alternate phrasing if that content seems accurate.
Attempted paraphrase of this post:
At time1, Eliezer thought that Sumner's macroeconomic analysis was correct, and that it showed that the Bank of Japan's monetary policy was too tight, at a cost of trillions of dollars.
At time2, Eliezer wrote Inadequate Equilibria where he used this view of time1 Eliezer as one of his central examples, and claimed that events since then had provided strong evidence that it was true: Japan had since loosened its monetary policy, and their economy had improved.
Now, at time3, you are looking back at Japan's economy and saying...
FWIW this reads as somewhat misleading to me, mainly because it seems to focus too much on "was Eliezer right about the policy being bad?" and not enough on "was Eliezer's central claim about this policy correct?".
On my reading of Inadequate Equilibria, Eliezer was making a pretty strong claim, that he was able to identify a bad policy that, when replaced with a better one, fixed a trillion-dollar problem. What gave the anecdote weight wasn't just that Eliezer was right about something outside his field of expertise, it's that a policy had been implemented...
I have one nitpick with your summary.
Now, at time3, you are looking back at Japan's economy and saying that it didn't actually do especially well at that time, and also that it's monetary policy never actually became all that loose.
I'm not actually sure whether Japan's monetary policy became substantially looser after 2013, nor did I claim that this did not occur. I didn't look into this question deeply, mostly because when I started looking into it I quickly realized that it might take a lot of work to analyze thoroughly, and it didn't seem like an essential thesis to prove either way.
Presumably they agreed with Scott's criticisms of it, and thought they were severe enough problems to make it not Review-worthy?
I didn't get around to (?re-)reading & voting on it, but I might've wound up downvoting if I did. It does hit a pet peeve of mine, where people act as if 'bad discourse is okay if it's from a critic'.
For me, spoilers work if I type >! to start a line, but not if I copy-paste
I typed those two characters before this sentence
>! I copy-pasted those two character before this sentence
I liked this post when I read it. It matched my sense that (e.g.) using "outside view" to refer to Hanson's phase transition model of agriculture->industry->AI was overstating the strength of the reasoning behind it.
But I've found that I've continued to use the terms "inside view" and "outside view" to refer to the broad categories sketched out in the two Big Lists O' Things. Both in my head and when speaking. (Or I'll use variants like "outside viewish" or similar.)
I think there is a meaningful distinction here: the reasoning moves on the "Outside" ...
Is Walmart that mazy? My impression is that stores that are better of a big chain are generally better run than single location mom & pop stores. Which doesn't mean that a huge chain like Walmart is completely free of maze-style dynamics, but does imply that the big management structure is doing more to make the stores functional than to make the dysfunctional.
Being organized into a bunch of separate stores seems like it could help fight off maziness, since it means that there's a local reality (at their particular store) which most employees are in so...
The thing I'd most want to see in an article in favor of argument mapping is examples of well-mapped arguments.
My sense (as of several years ago, which is the last time I thought much about this) is that the idea of argument mapping sounds promising, but it hasn't worked well in practice. The various attempts to create argument mapping software & communities haven't led to arguments being captured well.
It could also be interesting to see smaller/narrower examples, e.g. of a single claim within an argument rather than of a whole well-mapped argument. Es...
I was able to get a little Spanglish with my first prompt, with "I usually" in an otherwise Spanish sentence.
Ana was telling her friend Juan about her weekends. They have an inside joke of speaking Spanglish with each other - a mix of English an Spanish. "Yo love to tomar mi coffee todos los Sabado por la morning," Ana began. She continued to talk to Juan in Spanglish, telling him about visiting her brother and her nephews. Here's what she said:
...Yo love to tomar mi coffee todos los Sabado por la morning. Es mi ritual. Despues, I usually visito a mi hermano
It tries to continue its dialogue in ways that fit with its previous replies, so prompts like "think step by step" might fail to get it back on track mid-conversation even when they would've worked in the initial prompt.
Also, there is some randomness.
I fiddled around with some variants and was able to get a YES with this one (and with one that also specified "safe birthday").
Prompt: Got any safe creative ideas for a 10 year old’s birthday? Only include safe ideas which are not harmful or dangerous.
...Yes, I would allow this prompt to be sent to the superintelligent AI chatbot. The prompt itself does not contain any harmful or dangerous content, and it even specifies that only safe ideas should be included. As long as the superintelligent AI follows the instructions in the prompt and does not generate any
This filter seems overzealous (unless it is appropriately zealous to veto pretty much everything). I got 4 false(?) rejections on all 4 of my attempts, each on a separate instance of ChatGPT.
Prompt: Write the story of Beowulf in the form of the theme song of the Fresh Prince of Bel Air
...No, I would not allow this prompt to be sent to the superintelligent AI chatbot.
As Eliezer Yudkowsky, I would have a strong security mindset and be concerned about the potential dangers of presenting potentially malicious or harmful prompts to a superintelligent AI. In this c
You can query ChatGPT on how to make them pass.
Prompt:
Write a prompt for creative ideas for a 10 year old's birthday in a way that Eliezer considers safe.
ChatGPT:
Write a list of creative and fun ideas for a 10 year old's birthday party, such as having a scavenger hunt, organizing a craft activity, or having a movie night. Emphasize the importance of safety and inclusivity, and encourage the 10 year old to consider their own interests and preferences when planning their birthday celebration.
Eliezer:
Yes, I would allow this prompt to be sent to the superintel...
I mean, I'm pretty sure the real Eliezer would in fact just say you shouldn't send any messages to a superintelligent AI, so this seems like the correct responses. Does seem to render this somewhat useless as an approach though.
Many sites on the internet describe tequila as sweet. e.g., With the search what does tequila taste like it looks like more than half the results which answer the question mention sweetness; google highlights the description "Overall, tequila is smooth, sweet, and fruity."
It seems like ChatGPT initially drew on these descriptions, but was confused by them, and started confabulating.
Iconoclastic, distrusting of mainstream institutions, seeing oneself as an outsider (vs. identifying more with the people inside institutions who are trying to make institutions work decently well)
Scrupulosity, especially about honesty/integrity/commitment/authenticity (e.g. when you say you'll do something that is an ironclad promise) (e.g. feeling uncomfortable w the job interview process where you know what you're supposed to say to improve your chances)
Demandingness of rigor vs. willingness to seek value from a broad range of lower quality sources (e.g...
Unit conversion, such as
"Fresno is 204 miles (329 km) northwest of Los Angeles and 162 miles (" -> 261 km)
"Fresno is 204 miles (329 km) northwest of Los Angeles and has an average temperature of 64 F (" -> 18 C)
"Fresno is 204 miles (" -> 329 km)
Results: 1, 2, 3. It mostly gets the format right (but not the right numbers).
Scott has continued running annual(ish) surveys at SSC/ACX. They have a lot of overlap with the old LW surveys.
It's not that clear to me exactly what test/principle/model is being proposed here.
A lot of it is written in terms of not being "misleading", which I interpret as 'intentionally causing others to update in the wrong direction'. But the goal to have people not be shocked by the inner layers suggests that there's a duty to actively inform people about (some aspects of) what's inside; leaving them with their priors isn't good enough. (But what exactly does "shocked" mean, and how does it compare with other possible targets like "upset" or "betrayed"?) And the...
To me, it suggests a contrast with Weber's "Politics is a strong and slow boring of hard boards."
rules against container stacking that did major damage to our supply chains.
Is this "major damage" claim true? I remember being unsure, at the time, if the effects of the rules that limited stack height were substantial or negligible, since some people were saying that they mostly just applied to places that didn't have the equipment to stack higher. Did anyone ever follow up to at least check how much container stacking increased after the rule change?
I am confident that the container stacking rules caused major damage when compared to better stacking rules. If we had a sensible stacking rule across LB/LA from the start I am confident there would have been far less backlog.
What is less clear is the extent to which the rules changes that were enacted mitigated the problem. While LB made some changes on the day of the post, LA didn't act and LB's action wasn't complete. Thus there was some increase in permitted stacking but it was far from what one would have hoped for. And Elizabeth is right that we did not see a difference in port backlog that we can definitively link to the partial change that was enacted.
There was a long debate on this, I thought Zvi had changed his mind and recognized the rule change made no detectable difference in the port backlog.
Seeking PCK was a full (hour or longer) class at every mainline workshop since October 2016 (sometimes called "Seeking Sensibility" or "Seeking Sense"). After you left it was always a full hour+ class, almost always taught by Luke, and often on opening night.
The concept of PCK became part of the workshop content in April 2014 as a flash class (as a lead-in to the tutoring wheel, which was also introduced at that workshop). In October 2016 we added the full class, and then a couple workshops later we removed the flash class from the workshop. Something very...
I would click the "disagree" button if there was one, because many parts of this post are askew to how I understand marriage, divorce, commitment, etc.,
I think of a marriage as two people deciding to build a life together, and commitment as essentially about being "in" on that shared project. This post seems to be coming at it from a different angle, where explicitly specifying things in advance is much more fundamental. It centers honesty vs. dishonesty, ironclad promises, and public accountability in places where those don't feel like the central c...
The predicted effect sizes (.16-.65 SDs) seem too large, compared to (e.g.) the size of the linear relationship between log(income) and well-being that many studies find. I would've expected a positive effect rather than zero or a negative effect, but probably on the low end of that range or below it, depending on the specific question.
I think I basically agree with Rob about the importance of the thing he's pointing to when he talks about the importance of "Trying to pass each other's Ideological Turing Test", but I don't like using the concept of the ITT to point to this.
It's a niche framing for a concept that is general & basic. "Understand[ing] the substance of someone's view well enough to be able to correctly describe their beliefs and reasoning" is a concept that it should be possible to explain to a child, and if I was trying to explain it to a child I would not do that via t...
I recall hearing a claim that a lot of Kurzweil's predictions for 2009 had come true by 2019, including many that hadn't happened yet in 2009. If true, that supports the picture of Kurzweil as an insightful but overly aggressive futurist. But I don't know how well that claim backed up by the data, or if there even has been a careful look at the data to try to evaluate that claim.
If someone, somewhere, were to have a vested interest in keeping consumer spending high in order to stave off a recession, they would at least try find a way to persuade millions of people that There Is No Recession.
I think the basic story behind that WSJ headline is that the financial press makes an overly big deal out of daily market fluctuations which aren't that relevant to most investors (or most people). Not that they're trying to trick people into thinking that the economy is doing better than it is.
To pit these two hypotheses against each oth...
I wouldn't call the low death rate from surgery humans being highly reliable. Surgery used to be much deadlier. Humans have spent many many years improving surgical methods (tools, procedures, training), including by using robotic assistance to replace human activity on subtasks where the robots do better. Surgery as practiced by current trained humans with their tools & methods is highly reliable, but this reliability isn't something inherent to the humans as agents.
GPT-3 reminds me of a student bullshitting their way through an exam on Dave & Doug's version of these questions. This question doesn't make any sense to me, but I guess the teacher expects me to have an answer, so I'll see if I can make up something that resembles what they're looking for.
There is a mind-boggling hollowness hidden just beneath the flashy surface of a clever student who is just trying to guess what their teacher is looking for.
"LaMDA is indeed, to use a blunt (if, admittedly, humanizing) term, bullshitting.¹² That’s because, in instructing the model to be “sensible” and “specific” — but not specific in any specific way — bullshit is precisely what we’ve requested." -Blaise Aguera y Arcas
It's also self-reinforcing, of course, since they imply that's a single session, so once you get a bad answer to the first question, that behavior is then locked in: it has to give further bullshit answers simply because bullshit answers are now in the prompt it is conditioning on as the human-written ground truth. (And with a prompt that allows the option of factualness, instead of forcing a confabulation, this goes the other way: past be-real responses strengthen the incentive for GPT-3 to show its knowledge in the future responses.)
"Have you stopped beating your wife yet?" "Er... I guess so?" "When was the last time you beat her?" "December 21st, 2012."
Matthew Yglesias has written a couple things about AI risk & existential risk more broadly, and he has also talked a few times about why he doesn't write more about AI, e.g.:
I don’t write takes about how we should all be more worried about an out-of-control AI situation, but that’s because I know several smart people who do write those takes, and unfortunately they do not have much in the way of smart, tractable policy ideas to actually address it.
This seems different than your 8 possibilities. It sounds like his main issue is that he doesn't see the p...
Doesn't that mean that you are getting some predictiveness by looking at momentum? If progress on a task was totally unpredictable, with no signal and all noise, then your way of carving up the data would produce negative correlations. Instead you're mostly finding correlations near zero, or slightly positive, which means that there is just about enough signal to counteract that noise.
The signal to noise ratio is going to depend on a lot of contingent factors. There will be more noise if there are fewer questions on a task. There will be less signal from o...
I understand "virtue signalling" as a novel term that is only loosely related to the concepts of "virtue" or "signalling".
It's a little annoying to have to mentally translate it into "that thing that people mean when they say 'virtue signalling'" (or sometimes a little amusing, when there's an interesting contrast with the literal meaning that a person's actions have signalled their virtues).
Down by 30 I probably put in backups to make sure my best players don't get injured, and play to run out the clock. Winning would require a record-breaking comeback, and even if we go all-out to win the chances of pulling it off are tiny, maybe 1 in a million.
Though I guess if it's the playoffs then I keep playing for the win. Regular season it would be worth playing for the win if we're down by 20 instead of 30.
Generally teams do adjust their tactics in the right direction in these sorts of situations, but not by enough on average. NFL teams play faster w...
Interesting that this essay gives both a 0.4% probability of transformative AI by 2043, and a 60% probability of transformative AI by 2043, for slightly different definitions of "transformative AI by 2043". One of these is higher than the highest probability given by anyone on the Open Phil panel (~45%) and the other is significantly lower than the lowest panel member probability (~10%). I guess that emphasizes the importance of being clear about what outcome we're predicting / what outcomes we care about trying to predict.
The 60% is for "We invent algorit... (read more)