All of Davidmanheim's Comments + Replies

Worth noting that every one of the "not solved" problems was, in fact, well understood, and either proven impossible or solved for relaxed cases.

We don't need to solve this now; we need to improve the solution enough to figure out ways to improve it more, or show where that's impossible, before we build systems more powerful than we can at least mostly align. That's still ambitious, but it's not impossible!

3Remmelt5d
Yes, the call to action of this post is that we need more epistemically diverse research! This research community would be more epistemically healthy if we both researched what is possible for relaxed cases and what is not possible categorically under precise operationalisable definitions.

Yes, I'm mostly embracing simulator theory here, and yes, there are definitely a number of implicit models of the world within LLMs, but they aren't coherent. So I'm not saying there is no world model; I'm saying it's not a single, coherent model - it's a bunch of fragments.

But I agree that it doesn't explain everything! 

To step briefly out of the simulator theory frame, I agree that part of the problem is next-token generation, not RLHF - the model is generating the token, so it can't "step back" and decide to go back and not make the claim that it "... (read more)

No, it was and is a global treaty enforced multilaterally, as well as a number of bans on testing and arms reduction treaties. For each, there is a strong local incentive for states - including the US - to defect, but the existence of a treaty allows global cooperation.

With AGI, of course, we have strong reasons to think that the payoff matrix looks something like the following:

              They cooperate    They defect
We cooperate  (0, 0)            (-∞, 5-∞)
We defect     (5-∞, -∞)         (-∞, -∞)

So yes, there's a local incentive to defect, but it's actually a prisoner's dilemma where the best case for defecting is identical to suicide.
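The intuition above can be sketched numerically. The payoff entries and the use of −∞ as a stand-in for existential catastrophe are illustrative assumptions from the matrix above, not a worked-out model:

```python
import math

# Hypothetical AGI-race payoff matrix from the comment: rows/cols are
# (our move, their move), and -inf stands in for existential catastrophe.
NEG_INF = -math.inf
payoffs = {
    ("cooperate", "cooperate"): (0, 0),
    ("cooperate", "defect"):    (NEG_INF, 5 + NEG_INF),  # 5 - inf is still -inf
    ("defect",    "cooperate"): (5 + NEG_INF, NEG_INF),
    ("defect",    "defect"):    (NEG_INF, NEG_INF),
}

def best_response(opponent_move):
    """Row player's best reply, given the opponent's move."""
    return max(("cooperate", "defect"),
               key=lambda move: payoffs[(move, opponent_move)][0])

print(best_response("cooperate"))  # cooperate
print(best_response("defect"))     # cooperate (defecting gains nothing)
```

Unlike a standard prisoner's dilemma, where defecting is the dominant strategy, here the "temptation" payoff of 5 is swamped by the catastrophe term, so defection is never a strict best response.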

We decided to restrict nuclear power to the point where it's rare in order to prevent nuclear proliferation. We decided to ban biological weapons, almost fully successfully. We can ban things that have strong local incentives. Ignoring that, and claiming that slowing down or stopping can't happen, is giving up on perhaps the most promising avenue for reducing existential risk from AI. (This view also helps accelerate race dynamics, so even if I didn't think it was substantively wrong, I'd be confused as to why it's useful to actively promote it as an idea.)

2Roko1mo
This is enforced by the USA though, and the USA is a nuclear power with global reach.

I think the post addresses a key objection that many people opposed to EA and longtermist concerns have voiced about the EA view of AI, and I thought it was fairly well written in making the points it made, without also addressing the mostly unrelated point that you wanted it to address.

Getting close to the decade anniversary for Why the Tails Come Apart, and this is a very closely related issue to regressional Goodhart.
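For readers unfamiliar with the phenomenon, a minimal simulation (with made-up parameters) shows the tails coming apart: a proxy that correlates well with a target overall correlates much more weakly once you restrict attention to the proxy's own upper tail, which is the core of regressional Goodhart:

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(0)
# Target quality, plus a noisy proxy of it (overall correlation ~0.8).
target = [random.gauss(0, 1) for _ in range(50_000)]
proxy = [t + random.gauss(0, 0.75) for t in target]

overall_r = pearson(proxy, target)

# Restrict to the top 10% by proxy score: the correlation drops sharply.
top = sorted(zip(proxy, target), reverse=True)[:5_000]
tail_r = pearson([p for p, _ in top], [t for _, t in top])

print(round(overall_r, 2), round(tail_r, 2))  # tail correlation is much lower
```

Selecting hard on the proxy discards most of the variance the correlation relied on, so within the selected tail the proxy tells you much less about the target.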

AI safety "thought" is more-or-less evenly distributed

Agreed - I wasn't criticizing AI safety here, I was talking about the conceptual models that people outside of AI safety have - as was mentioned in several other comments. So my point was about what people outside of AI safety think about when talking about ML models, trying to correct a broken mental model.
 

So, I disagree that evals and red teaming in application to AI are "meaningless" because there are no standards. 

I did not say anything about evals and red teaming in application to AI, ot... (read more)

3Roman Leventov1mo
Ok, in this passage: It seems that you put the first two sentences "in the mouth of people outside of AI safety", and they describe some conceptual error, while the third sentence is "yours". However, I don't understand what exactly is the error you are trying to correct, because the first sentence is uncontroversial, and the second sentence is a question, so I don't understand what (erroneous) idea it expresses. It's really unclear what you are trying to say here. I don't understand how else to interpret the sentence from the post "If we lack a standard for safety, ideally one where there is consensus that it is sufficient for a specific application, then exploration or verification of the safety of a machine learning model is meaningless.", because to me, evals and red teaming are "exploration and verification of the safety of a machine learning model" (unless you want to say that the word "verification" cannot apply if there are no standards, but then just replace it with "checking"). So, again, I'm very confused about what you are trying to say :( My statement that you import an outdated view was based on my understanding that you had declared "evals and red teaming meaningless in the absence of standards". If this is not your statement, there is no import of outdated understanding. I mean, standards are useful. They are sort of like industry-wide, strictly imposed "checklists", and checklists do help with reliability overall. When checklists are introduced, the number of incidents goes down reliably. But it's also recognised that it doesn't go down to zero, and the presence of a standard shouldn't reduce the vigilance of anyone involved, especially when we are dealing with such a high-stakes thing as AI. So, introducing standards of AI safety based on some evals and red teaming benchmarks would be good, while cultivating a shared recognition that these "standards" absolutely don't guarantee safety, and that marketing, PR, GR, and CEOs shouldn't use the phrases

I do think that some people are clearly talking about meanings of the word "safe" that aren't so clear-cut (e.g. Sam Altman saying GPT-4 is the safest model yet™️), and in those cases I agree that these statements are much closer to "meaningless".

 

The people in the world who actually build these models are doing the thing that I pointed out. That's the issue I was addressing.

People do actually have a somewhat-shared set of criteria in mind when they talk about whether a thing is safe, though, in a way that they (or at least I) don't when talking about

... (read more)

I think it would be really good to come up with a framing of these intuitions that wouldn't be controversial.

 

That seems great, I'd be very happy for someone to write this up more clearly. My key point was about people's claims and confidence about safety, and yes, clearly that was communicated less well than I hoped.

As an aside, mirror cells aren't actually a problem; non-mirror digestive systems and immune systems can break them down, albeit with less efficiency. Church's early speculation that these cells would not be digestible by non-mirror life forms doesn't actually hold up, per several molecular biologists I have spoken to since then.

Sure, I agree with that, and so perhaps the title should have been "Systems that cannot be reasonably claimed to be unsafe in specific ways cannot be claimed to be safe in those ways, because what does that even mean?" 

If you say something is "qwrgz," I can't agree or disagree, I can only ask what you mean. If you say something is "safe," I generally assume you are making a claim about something you know. My problem is that people claim that something is safe, despite not having stated any idea about what they would call unsafe. But again, that seems fundamentally confused about what safety means for such systems.

3benwr1mo
I would agree more with your rephrased title. People do actually have a somewhat-shared set of criteria in mind when they talk about whether a thing is safe, though, in a way that they (or at least I) don't when talking about its qwrgzness. e.g., if it kills 99% of life on earth over a ten year period, I'm pretty sure almost everyone would agree that it's unsafe. No further specification work is required. It doesn't seem fundamentally confused to refer to a thing as "unsafe" if you think it might do that. I do think that some people are clearly talking about meanings of the word "safe" that aren't so clear-cut (e.g. Sam Altman saying GPT-4 is the safest model yet™️), and in those cases I agree that these statements are much closer to "meaningless".

"If it would fail under this specific load, then it is unsafe" is a clear idea of what would constitute unsafe. I don't think we have this clear of an idea for AI. 

 

Agreed. And so until we do, we can't claim they are safe.

But maybe when you say "clear idea", you don't necessarily mean a clean logical description, and also consider more vague descriptions to be relevant?

A vague description allows for a vague idea of safety. That's still far better than what we have now, so I'd be happier with that than the status quo - but in fact, what people out... (read more)

Mostly agree.

I will note that correctly isolating the entertainment system from the car control system is one of those things you'd expect, but you'd be disappointed. Safety is hard.

For construction, it amounts to "doesn't collapse,"


No, the risk and safety models for construction go far, far beyond that, from radon and air quality to size and accessibility of fire exits. 

with AI you are talking to the full generality of language and communication and that effectively means: "All types of harm."

Yes, so it's a harder problem to claim that it's safe. But doing nothing, having no risk model at all, and claiming that there's no reason to think it's unsafe, so it is safe, is, as I said, "fundamentally confused about what safety means for such systems."

4Gunnar_Zarncke1mo
I get that, but I tried to phrase that in terms that connected to benwr's request.

For the first point, if "people can in fact recognize some types of unsafety," then it's not the case that "you don't even have a clear idea of what would constitute unsafe." And as I said in another comment, I think this is trying to argue about standards, which is a necessity in practice for companies that want to release systems, but isn't what makes the central point, which is the title of the post, true.

And I agree that rods are often simple, and the reason that I chose rods as an example is because people have an intuitive understanding of some of th... (read more)

2tailcalled1mo
Maybe I am misunderstanding what you mean by "have a clear idea of what would constitute unsafe"? Taking rods as an example, my understanding is that rods might be used to support some massive objects, and if the rods bend under the load then they might release the objects and cause harm. So the rods need to be strong enough to support the objects, and usually rods are sold with strength guarantees to achieve this. "If it would fail under this specific load, then it is unsafe" is a clear idea of what would constitute unsafe. I don't think we have this clear of an idea for AI. We have some vague ideas of things that would be undesirable, but there tends to be a wide range of potential triggers and a wide range of potential outcomes, which seem more easily handled by some sort of adversarial setup than by writing down a clean logical description. But maybe when you say "clear idea", you don't necessarily mean a clean logical description, and also consider more vague descriptions to be relevant? I already addressed cars and you said we should talk about rods. Then I addressed rods and you want to switch back to cars. Can you make up your mind?

That's true - and from what I can see, this emerges from the culture in academia. There, people are doing research, and the goal is to see if something can be done, or to see what happens if you try something new. That's fine for discovery, but it's insufficient for safety. And that's why certain types of research, ones that pose dangers to researchers or the public, have at least some degree of oversight which imposes safety requirements. ML does not, yet.

I think you're focusing on the idea of a standard, which is necessary for a production system or for reliability in many senses, and should be demanded of AI companies - but it is not the fundamental issue. The fundamental issue, which you seem not to disagree with, is not being able to say in any sense what makes the system safe or unsafe, which was the point of the post's title.

I'm not laying out a requirement, I'm pointing out a logical necessity; if you don't know what something is or is not, you can't determine it. But if something "will reliably cause serious harm to people who interact with it," it sounds like you have a very clear understanding of how it would be unsafe, and a way to check whether that occurs.

1benwr1mo
Part of my point is that there is a difference between the fact of the matter and what we know. Some things are safe despite our ignorance, and some are unsafe despite our ignorance.

I'm not saying that a standard is sufficient for safety, just that it's incoherent to talk about safety if you don't even have a clear idea of what would constitute unsafe. 

Also, I wasn't talking about cars in particular - every type of engineering, including software engineering, follows this type of procedure for verification and validation, when those are required. And I think metal rods are a better example to think about - we don't know what it is going to be used for when it is made, but whatever application the rod will be used for, it needs to have some clear standards and requirements.

2tailcalled1mo
I can believe it makes it less definitive and less useful, but I don't buy that it makes it "meaningless" and entirely "incoherent". People can in fact recognize some types of unsafety, and adversarially try to trigger unsafety. I would think that the easier it is to turn GPT into some aggressive powerful thing, the more likely ARC would have been to catch it, so ARC's failure to make GPT do dangerous stuff would seem to constitute Bayesian evidence that it is hard to make it do dangerous stuff. AFAIK rods are a sufficiently simple artifact that almost all of their behavior can be described using very little information, unlike cars and GPTs?

Jaynes discusses exactly this, in reference to whether someone displaying psychic powers really has them, and whether correctly predicting 100 cards is enough to overcome your prior that psychic powers don't exist. In response, he points out that you need more than 2 hypotheses. In this case, consider the prior odds of god giving her the information, or her cheating, or someone else lying about what happened, or you imagining the whole thing - and this is evidence in favor of all of those hypotheses over it being truly random, not just for the existence of god.
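Jaynes's point can be sketched with toy numbers; the priors and likelihoods below are my own illustrative choices, not his:

```python
import math

# Three hypotheses for someone correctly calling 100 cards, each a 1-in-52
# guess under pure chance. Priors here are illustrative, not Jaynes's.
hypotheses = {
    "psychic":  {"log_prior": math.log(1e-20), "log_lik": 0.0},
    "cheating": {"log_prior": math.log(1e-4),  "log_lik": 0.0},
    "chance":   {"log_prior": math.log(1.0 - 1e-4 - 1e-20),
                 "log_lik": 100 * math.log(1 / 52)},
}

def log_posterior(name):
    """Unnormalised log posterior: log prior + log likelihood."""
    h = hypotheses[name]
    return h["log_prior"] + h["log_lik"]

ranked = sorted(hypotheses, key=log_posterior, reverse=True)
print(ranked)  # ['cheating', 'psychic', 'chance']
```

The 100 correct calls are enormous evidence against pure chance, but they can't distinguish "psychic" from "cheating" - both predict the data equally well - so the prior odds between those two hypotheses do all the remaining work.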

A quick google search gives a few options for the definition, and this qualifies according to all of them, from what I can tell. The fact that he thinks the comment is true doesn't change that.

Trolling definition: 1. the act of leaving an insulting message on the internet in order to annoy someone

Trolling is when someone posts or comments online to 'bait' people, which means deliberately provoking an argument or emotional reaction.

Online, a troll is someone who enters a communication channel, such as a comment thread, solely to cause trouble. Trolls often ... (read more)

6jefftk2mo
That doesn't read as trolling to me? 80% sure he means it literally.

I think this is wrong, but a useful argument to make.

I disagree even though I generally agree with each of your sub-points. The key problem is that the points can all be correct, but don't add to the conclusion that this is safe. For example, perhaps an interpretable model is only 99.998% likely to be a misaligned AI system, instead of 99.999% for a less interpretable one. I also think that the current paradigm is shortening timelines, and regardless of how we do safety, less time makes it less likely that we will find effective approaches in time to preem... (read more)

3Nadav Brandes2mo
Thank you for this comment. I'm curious to understand the source of disagreement between us, given that you generally agree with each of the sub-points. Do you really think that the chances of misalignment with LM-based AI systems is above 90%? What exactly do you mean by misalignment in this context and why do you think it's the most likely result with such AI? Do you think it will happen even if humanity sticks with the paradigm I described (of chaining pure language models while avoiding training models on open-ended tasks)? I want to also note that my argument is less about "developing language models was counterfactually a good thing" and more "given that language models have been developed (which is now a historic fact), the safest path towards human-level AGI might be to stick with pure language models".

I agree - but I think that now, when similarly preliminary thoughts on a conceptual model are proposed, there is less ability or willingness to engage, especially with people who are fundamentally confused about some aspect of the issue. This is largely, I believe, due to the volume of new participants, and the reduced engagement with those types of posts.

He excludes the only examples we have, which is fine for his purposes, though I'm skeptical it's useful as a definition, especially since "some difference" is an unclear and easily moved bar. However, it doesn't change the way we want to do prediction about whether something different is possible. That is, even if the example is excluded, it is very relevant for the question "is something in the class possible to specify." 

I assume the strong +1 was specifically on the infohazards angle? (Which I also strongly agree with.) 

1cwbakerlee2mo
Yep, that's right -- thanks for clarifying!

None of this argues that creating grey goo is an unlikely outcome, just that it's a hard problem. And we have an existence proof of at least one way to make grey goo that covers a planet: life-as-we-know-it, which did exactly that.

But solving hard problems is a thing that happens, and unlike the speed of light, this limit isn't fundamental. It's more like the "proofs" that heavier than air flight is impossible which existed in the 1800s, or the current "proofs" that LLMs won't become AGIs - convincing until the counterexample exists, but not at all indicative that no counterexample does or could exist.

7Steven Byrnes2mo
OP said: (And I believe they’re using “grey goo” the same way.) So I think you’re using a different definition of “grey goo” from OP, and that under OP’s definition, biological life is not an existence proof. I think the question of “whether grey-goo-as-defined-by-OP is possible” is an interesting question and I’d be curious to know the answer for various reasons, even if it’s not super-central in the context of AI risk.

Noting that my very first lesswrong post, back in the LW1 days, was an example of #2. I was wrong on some of the key parts of the intuition I was trying to convey, and ChristianKl corrected me. As an introduction to posting on LW, that was pretty good - I'd hate to think that's no longer acceptable.

At the same time, there is less room for it as the community got much bigger, and I'd probably weak downvote a similar post today, rather than trying to engage with a similar mistake, given how much content there is. Not sure if there is anything that can be don... (read more)

fwiw that seems like a pretty great interaction. ChristianKl seems to be usefully engaging with your frame while noting things about it that don't seem to work, seems (to me) to have optimized somewhat for being helpful, and also the conversation just wraps up pretty efficiently. (and I think this is all a higher bar than what I mean to be pushing for, i.e. having only one of those properties would have been fine)

Just want to note that I'm less happy with a lesswrong without Duncan. I very much value Duncan's pushback against what I see as a slow decline in quality, and so I would prefer him to stay and continue doing what he's doing. The fact that he's being complained about makes sense, but is mostly a function of him doing something valuable. I have had a few times where I have been slapped down by Duncan, albeit in comments on his Facebook page, where it's much clearer that his norms are operative, and I've been annoyed, but each of those times, despite being f... (read more)

Thanks, reading closely I see how you said that, but it wasn't clear initially. (There's an illusion of disagreement, which I'll christen the "twitter fight fallacy," where unless the opposite is said clearly, people automatically assume replies are disagreements.) 

I probably put in an extra 20-60 hours, so the total is probably closer to 150 - which surprises me. I will add that a lot of the conversion time went to writing more, LaTeX figures, and citations, which were all, I think, substantively valuable additions. (Changing to a more scholarly style was not substantively valuable, nor was struggling with LaTeX margins and TikZ for the diagrams, and both took some part of the time.)

Thanks, agreed. And as an aside, I don't think it's entirely coincidental that neither of the people who agree with you are in the Bay.

I think that, from an outside view, the costs are worth paying far more often than people actually pay them - which was David's point, and what I was trying to respond to. I think that it's more valuable than one expects to actually just jump through the hoops. And especially for people who haven't yet ever had any outputs actually published, they really should do that at least once.

(Also, sorry for the zombie reply.)

2Daniel Kokotajlo3mo
I love zombie replies. If you reread this conversation, you'll notice that I never said I think these people are correct. I was just saying that their stated motivations and views are their real motivations and views.  I actually do agree with you and David Krueger that on the margin more LW types should be investing in making their work publishable and even getting it published. The plan had always been "do research first, then communicate it to the world when the time is right" well now we are out of time so the time is right.

I think this ignores how decisions actually get made, but I think we're operating at too high a level of abstraction to actually disagree productively.

You're very unusually proactive, and I think the median member of the community would be far better served if they were more engaged the way you are. Doing that without traditional peer reviewed work is fine, but unusual, and in many ways is more difficult than peer-reviewed publication. And for early career researchers, I think it's hard to be taken seriously without some more legible record - you have a PhD, but many others don't.

To respond briefly, I think that people underinvest in (D), and write sub-par forum posts rather than aim for the degree of clarity that would allow them to do (E) at far less marginal cost. I agree that people overinvest in (B)[1], but also think that it's very easy to tell yourself your work is "actual progress" when you're doing work that, if submitted to peer-reviewed outlets, would be quickly demolished as duplicative of work you're unaware of, or incompletely thought-out in other ways.

I also worry that many people have never written a peer reviewed p... (read more)

1mikbp3mo
  This should be obvious for everyone! As an outside observer and huge sympathizer, it is super-frustrating how siloed the broad EA/rational/AI-alignment/adjacent community is --this specific issue with publication is only one of the consequences. Many of "you people" only interacting between "yourselves" (and I'm not referring to you, Davids), very often even socially. I mean, you guys are trying to do the most good possible, so help others use and leverage on your work! And don't waste time reinventing what is already common or, at least, what already exists outside. More mixing would also help prevent Leverage-style failures and probably improve what from the outside seems like a very weird and unhealthy "bay area social dynamics" (as put by Kaj here [https://www.lesswrong.com/posts/duyJ9uFo2pnPgr3Yn/here-have-a-calmness-video]).

It's a reasonable model. One problem with it as a predictive model, however, is that log-rolling happens across issues; a politician might give up on their budget-cutting to kill an anti-business provision, or give up an environmental rule to increase healthcare spending. So the gradients aren't actually single-valued; there's a complex correlation/tradeoff matrix between them.

2DirectedEvolution3mo
It seems like large organizations achieve structure through a combination of legislation and value-setting. They use policies and rules to legislate nuance, but rely on a single value to steer daily decision-making. This whole analysis really needs to be understood as being about the daily decision-making piece of the puzzle.

they don't judge those costs to be worth it


Worth it to whom? And if they did work that's valuable, how much of that value is lost if others who could benefit don't see it, because it's written up only informally or not shared widely?

4Daniel Kokotajlo3mo
Worth it to the world/humanity/etc. though maybe some of them are more self-focused. Probably a big chunk of it is lost for that reason yeah. I'm not sure what your point is, it doesn't seem to be a reply to anything I said.

There have also been plenty of other adaptations, ones which were not low-effort. I worked on two: the Goodhart's law paper, and a paper with Issa Rice on HRAD. Both were very significantly rewritten and expanded into "real" preprints, but I think it was clearly worthwhile.

I mostly agree with this - deep ideas should get relatively less focus, but not stop getting funding / attention. See my EA forum post from last year, Interesting vs. Important Work - A Place EA is Prioritizing Poorly, which makes a related point.

And I think the post here is saying that you should jump through those effort and editing hoops far more often than currently occurs.

7Raemon3mo
Yeah, I didn't mean to be responding to that point one way or another. It just seemed bad to be linking to a post that (seems to still?) communicate false things, without flagging those false things. (post still says "it can be as easy as creating a pdf of your post", which my impression maybe technically true on rare occasions but basically false in practice?)

If someone says the opportunity cost is not worth it for them, I see that as a claim that a priori might be true or false. Your post seems to imply that almost everyone is making an error in the same direction, and therefore funders should put their thumb on the scale. That’s at least not obvious to me.


I do think this is the wrong calculation, and the error caused by it is widely shared and pushes in the same direction. 

Publication is a public good, where most of the benefit accrues to others / the public. Obviously costs to individuals are higher tha... (read more)

If we compare

  • (A) “actual progress”, versus
  • (B) “legible signs of progress”,

it seems obvious to me that everyone has an incentive to underinvest in (A) relative to (B). You get grants & jobs & status from (B), not (A), right? And papers can be in (B) while being only minimally, or not at all, in (A).

In academia, people talk all the time about how people are optimizing their publication record to the detriment of field-advancement, e.g. making results sound misleadingly original and important, chasing things that are hot, splitting results into unnecessari... (read more)

Unless I'm missing something, this seems correct, but unhelpful. It doesn't point towards how we should do anything substantive to understand, much less control AI, it just gives us a way to do better at tasks that help explain current models. Is there something you're pointing to that would make this more useful than just for prompt engineering or ad-hoc / post-hoc explanation of models we don't understand?

7metasemi3mo
I don't know whether this would be the author's take, but to me it urges us to understand and "control" these AIs socially: by talking to them.

Mostly correct, but because passing isn't allowed, it is not necessarily the case that black doesn't have a forced win.

There's a different principle that's important here, which is that the space of bad ways to do things is almost always larger than the space of good ways to do them, and appealing to what has been sufficient so far is at least a great way to ensure you don't do far worse. I'm not going to try to make the argument fully right here, but in general, doing things differently means you're risking new failure modes - and the fact that it was once done this way doesn't avoid the problem, because the situation now is different. (On the other hand, this is a fully generalized argument against trying anything, which is bad if overused. It does function as a reason to exercise significant additional caution.)

6shminux4mo
Yeah, I agree with that, and it is an important consideration to keep in mind, that anything outside the yellow brick road is a minefield, but sometimes you have to bring a minesweeper and carefully make your way. I guess my point is that it pays to be aware of and to respect the minefield.

These seem like arguments that it should be possible to be very, very cautious, and to create an agent that doesn't immediately crash and burn due to Russell's claim - not that such failures are unlikely, nor that even these agents don't fail slightly later.

1Gerald Monroe4mo
The above is preventing the cause of most embedded system failure - state buildup.  Whether it be routers, laptops, cars, Patriot missile systems - the majority cause for any embedded system to fail is not that the system fails during testing in its known state right after starting/boot, but that it fails later.  And the cause of the later failure is internal state in the machine's memory. High reliability web services go to "stateless microservices" for this reason.  "temporal myopia" actually means "clear state as often as you can" which is functionally the same thing. So no, it won't fail later.  The above system will probably not ever fail at any rate above the base failure rate when it was built.  

I don't really see an argument here against the central claim you say you disagree with.

I no longer believe this to be obviously true.

This is based on a straightforward claim from optimization theory, and you don't address it, nor do you explain your model, other than to vaguely gesture at uncertainties and caution, without looking at whether VoI itself would lead to extremization, nor why caution would be optimal for an agent.

 

3PonPonPon4mo
A fair objection. I had a quick search online and also flicked through Boyd's Convex Optimization, and didn't find Stuart Russell's claim expounded on. Would you be able to point me in a direction to look further into this? Nevertheless, let me try to provide more detailed reasoning for my counterclaim. I assume that Russell's claim is indeed true in the classic optimisation domain, where there is a function f(x): R^N -> R, as well as some inequality constraints on a subset of x. However, I argue that this is not a good model for maximising a utility function in the real world. First of all, it is not necessarily possible to freely search over x, as x corresponds to environmental states. All classic optimisation techniques that I know of assume that you may set x to any value regardless of the history of values that x was set to. This is not the case in the real world; there are many environmental states which are not accessible from other environmental states. For example, if Earth were to be swallowed up into a black hole, we wouldn't be able to restore the environment of me typing out this response to you on LW ever again. In effect, what I'm describing is the difference between optimising in an RL setting and in the classical setting. And whilst I can believe some result on extremal values exists in the classical setting, I'd be very surprised indeed if something similar exists in the RL setting. Particularly when the transition matrices are unknown to the agent i.e. it does not have a perfect model of the environment already. So I've laid out my skepticism for the extremal values claim in RL, but is there any reason to believe my counterclaim that RL optimisation naturally leads to non-extremal choices? Here I think I'll have to be handwavy and gestur-y again, for now (afaik, no literature exists pertaining to this topic and what I'm going to say next, but please do inform me if this is not the case).  Any optimisation process requires evaluating f(x) for differen

It's also mostly "conditional on acceptance, homeschooled students do better" - and given the selection bias in the conditional sample, that would reflect a bias against them in admissions, rather than being a fact about homeschooling.

  1. Isolating kids from peers is damaging to social skills in many cases. That would not show up in academic success, but it matters for happiness.
  2. Giving kids control over what they learn, and having them self-guide, is very prone to failing to pick up key skills - and some of the time, the skills are critical enough to handicap them later.

Also, "that does give a strong lower bound for how bad that specifically can be for kids" - It really doesn't. If 25% of homeschooled kids do much better than average, and 75% do significantly worse, looking at those who went to college means you've completely eliminated the part of the sample that was harmed.
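A toy simulation makes the selection effect concrete. The 25%/75% split is from the comment; the college-attendance rates and outcome values are illustrative assumptions:

```python
import random

# Hypothetical population: 25% of homeschooled kids thrive (positive outcome),
# 75% do worse (negative outcome). College-goers are mostly drawn from the
# thriving group, so conditioning on college hides the harm.
random.seed(1)
kids = []
for _ in range(10_000):
    thrived = random.random() < 0.25
    outcome = random.gauss(1.0, 0.5) if thrived else random.gauss(-1.0, 0.5)
    college = random.random() < (0.9 if thrived else 0.2)  # assumed rates
    kids.append((outcome, college))

overall_mean = sum(o for o, _ in kids) / len(kids)
college_only = [o for o, c in kids if c]
conditional_mean = sum(college_only) / len(college_only)

print(round(overall_mean, 2), round(conditional_mean, 2))
```

Under these assumptions the overall average outcome is negative, while the average among college attendees is positive - conditioning on admission flips the sign, which is exactly why "conditional on acceptance, homeschooled students do better" tells us little about homeschooling itself.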

1Timothy Underwood4mo
So this is based on my memory of homeschooling propaganda articles that I saw as a kid. But I'm pretty sure the data they had there showed most kids went to college. In my family three of us got University of California degrees, and the one who only got a nursing degree in his thirties authentically enjoyed manual labor jobs until he decided he also wanted more money.  Perhaps these numbers do stop at college, and so we don't see in them children who get a good college education, but then fail in some important way later on in life, but I've never gotten an impression from anywhere that homeschooled children have generally worse life outcomes -- anyways, this is something that the data has to actually exist for since several percent of US children have been homeschooled for the last several decades. I did have substantial social problems, even as an adult, and they have led me to be less successful in career terms than I probably would have been with stronger social skills. But this might be driven by a selection effect: The reason my parents actually started homeschooling me was because I was being bullied and having severe social problems in third grade. 