
Counterpoint #2a: A misaligned AGI whose capabilities are high enough to use our safety plans against us will succeed with roughly the same probability (e.g., close to 100%) whether or not those plans were posted to the Internet, if necessary by accessing them through other means.

If only relative frequency of genes matters, then the overall size of the gene pool doesn't matter. If the overall size of the gene pool doesn't matter, then it doesn't matter if that size is zero. If the size of the gene pool is zero, then whatever was included in that gene pool is extinct.

Yes, it's true that people make all kinds of incorrect inferences because they think genes that increase the size of the gene pool will be selected for, or that genes that decrease it will be selected against. But it's still also true that a gene that reduces the size of the pool it's in to zero will no longer be found in any living organisms, regardless of what its relative frequency was in the process of the pool reaching a size of zero. If the term IGF doesn't capture that, that just means IGF isn't a complete way of accounting for which organisms we observe to exist, in what frequencies, and how those change over time.
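To make the point concrete, here's a toy simulation (invented numbers, not a model of any real population): the gene's relative frequency stays fixed the whole time, yet once the pool collapses to zero there are no carriers left.

```python
# Toy illustration (invented numbers): a gene can keep a constant relative
# frequency while the pool it belongs to shrinks to zero, at which point it
# is simply extinct, whatever its frequency was along the way.

pool_size = 1000          # total organisms in the gene pool
gene_frequency = 0.5      # relative frequency of the gene, held fixed here

generation = 0
while pool_size > 0:
    carriers = int(pool_size * gene_frequency)
    print(f"gen {generation:2d}: pool={pool_size:4d}, "
          f"carriers={carriers:3d}, relative frequency={gene_frequency:.2f}")
    pool_size //= 2       # the entire pool halves each generation
    generation += 1

# The gene "won" on relative frequency every generation, and yet:
print("carriers remaining:", int(pool_size * gene_frequency))   # 0 -> extinct
```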

I mean, just lag, yes, but there are also plain old incorrect readings. But yes, it would be cool to have a system that incorporated glucagon. Though diabetics' bodies still produce glucagon AFAIK, so it'd really be better to just have something that senses glucose and releases insulin the same way a working pancreas would.

Context: I am a type 1 diabetic. I have a CGM (continuous glucose monitor), but for various reasons use multiple daily injections rather than an insulin pump; however, I'm familiar with how insulin pumps work.

A major problem with a closed-loop CGM-pump system is data quality from the CGM. My CGM (Dexcom G6) has ~15 minutes of lag (because it reads interstitial fluid, not blood). This is the first generation of Dexcom that doesn't require calibrations from fingersticks, but I've occasionally had CGM readings that felt way off and needed to calibrate anyway. Accuracy and noisiness vary from sensor to sensor (they officially last 10 days; people have figured out how to "restart" them, but I've found the noisiness often goes up toward the end of the 10 days anyway), probably due to placement. It also only produces a reading every 5 minutes, probably partly to save battery, but maybe also because anything more frequent would be false precision anyway. And low blood sugar can be lethal rather quickly (by killing neurons, or by messing up neural function enough that you get into a car accident if you're driving), so these issues mean caution is needed when using CGM readings to choose insulin dosing.

I'd think of connecting that to an insulin pump using a control system as more similar to Tesla Autopilot than to a Level 5 autonomous car. It's sort of in the uncanny valley where the way it works is tempting you to just ignore it, but you actually can't. I certainly don't mean that these problems are impossible to overcome, and in fact "hybrid" closed loop systems, which still require manual intervention from time to time, are starting to become commercially available, and there are also DIY systems. (Type 1 diabetics vary in how much they geek out about managing it; I think I'm somewhere in the middle in terms of absolute geekiness, meaning, I would guess, 95th+ percentile relative to the relevant population.) But I think there are pretty strong reasons people don't look at "well, just connect a 2022 off-the-shelf CGM, 2022 off-the-shelf insulin pump, and some software" as a viable fully closed loop for managing blood sugar for type 1 diabetics.
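To illustrate why the lag and noise matter for a closed loop, here's a hypothetical sketch (invented thresholds and function names, not how any actual pump firmware works, and not medical advice) of the kind of fail-safe logic such a system needs: extrapolate the trend to compensate for sensor lag, and suspend insulin when data is missing or the projection looks low.

```python
# Hypothetical sketch (invented thresholds and function names, not any real
# pump's algorithm, and not medical advice): a closed loop can't just dose off
# the latest CGM value, because readings lag blood glucose by ~15 minutes,
# arrive only every ~5 minutes, and can be noisy or missing, so a cautious
# controller extrapolates the trend and fails toward suspending insulin.

from typing import Optional, Sequence

CGM_LAG_MINUTES = 15        # interstitial readings trail blood glucose
READING_INTERVAL_MIN = 5    # one CGM reading every 5 minutes
SUSPEND_BELOW_MG_DL = 80    # illustrative safety threshold

def predicted_glucose(readings: Sequence[float]) -> Optional[float]:
    """Extrapolate the recent trend forward to compensate for sensor lag."""
    if len(readings) < 2:
        return None                       # not enough data to trust a trend
    slope_per_min = (readings[-1] - readings[-2]) / READING_INTERVAL_MIN
    return readings[-1] + slope_per_min * CGM_LAG_MINUTES

def allow_insulin_delivery(readings: Sequence[float]) -> bool:
    """Fail safe: only deliver insulin when the lag-corrected estimate looks OK."""
    estimate = predicted_glucose(readings)
    if estimate is None:
        return False                      # stale or missing data -> suspend
    return estimate > SUSPEND_BELOW_MG_DL

# The sensor reads 110 mg/dL, but glucose is falling 2 mg/dL per minute; the
# lag-corrected projection is ~80, so a cautious loop suspends delivery.
print(allow_insulin_delivery([120.0, 110.0]))   # False
```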

We'll build the most powerful AI we think we can control. Nothing guarantees we never get that wrong. If building one car with brakes that don't work made everyone in the world die in a traffic accident, everyone in the world would be dead.

How much did that setup cost? I'm curious about similar use cases.

The best way to actually schedule or predict a project is to break it down into as many small component tasks as possible, identify dependencies between those tasks, produce most-likely, optimistic, and pessimistic estimates for each task, and then run a simulation over the chain of dependencies to see what the expected project completion looks like. Use a Gantt chart. This is a boring answer because it's the "learn project management" answer, and people will hate on it by gesturing vaguely at all of the projects that overrun their schedules. There are many interesting reasons why that happens and why I don't think it's a massive failure of rationality, but I'm not sure this comment is a good place to go into detail on that. The quick answer is that comical overrun of a schedule has less to do with an inability to create correct schedules from an engineering / evidence-based perspective, and much more to do with a bureaucratic or organizational refusal to accept an evidence-based schedule when a totally false but politically palatable "optimistic" schedule is preferred.

I definitely agree that this is the way to get the most accurate prediction practically possible, and that organizational dysfunction often means this isn't used, even when the organization would be better able to achieve its goals with an accurate prediction. But I also think that depending on the type of project, producing an accurate Gantt chart may take a substantial fraction of the effort (or even a substantial fraction of the wall-clock time) of finishing the entire project, or may not even be possible without already having some of the outputs of the processes earlier in the chart. These aren't necessarily possible to eradicate, so the take-away, I think, is not to be overly optimistic about the possibility of getting accurate schedules, even when there are no ill intentions and all known techniques to make more accurate schedules are used.
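For what it's worth, the simulation step the parent comment describes can be quite lightweight. A minimal sketch, with made-up tasks, durations, and dependencies, assuming a triangular distribution over each three-point estimate:

```python
# Minimal sketch of the approach described above (made-up task names,
# durations, and dependencies): three-point estimates per task, a dependency
# graph, and a Monte Carlo simulation of the completion date.

import random

# task -> (optimistic, most likely, pessimistic) duration in days
estimates = {
    "design": (2, 4, 10),
    "build":  (5, 8, 20),
    "test":   (2, 3, 9),
    "deploy": (1, 1, 3),
}
# task -> tasks that must finish before it can start
dependencies = {
    "design": [],
    "build":  ["design"],
    "test":   ["build"],
    "deploy": ["test"],
}
order = ["design", "build", "test", "deploy"]   # already topologically sorted

def simulate_once() -> float:
    """One simulated project run: sample each task, respecting dependencies."""
    finish = {}
    for task in order:
        start = max((finish[dep] for dep in dependencies[task]), default=0.0)
        optimistic, likely, pessimistic = estimates[task]
        finish[task] = start + random.triangular(optimistic, pessimistic, likely)
    return finish[order[-1]]

runs = sorted(simulate_once() for _ in range(10_000))
print("median completion:", round(runs[len(runs) // 2], 1), "days")
print("90th percentile:  ", round(runs[int(len(runs) * 0.9)], 1), "days")
```

The useful output is the whole distribution (especially something like the 90th percentile), not a single sum of most-likely durations.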

In other words, asking people for a best guess or an optimistic prediction results in a biased prediction that is almost always earlier than the real delivery date. On the other hand, while the pessimistic question is not more accurate (it has the same absolute error margins), it is unbiased. The reality is that the study says that people asked the pessimistic question were equally likely to over-estimate their completion time as they were to under-estimate it. If you don't think a question that gives you a distribution centered on the right answer is useful, I'm not sure what to tell you.

It's interesting that the median of the pessimistic expectations is about equal to the median of the actual results. The means clearly weren't equal, as that discrepancy was literally the point of citing this statistic in the OP:

in a classic experiment, 37 psychology students were asked to estimate how long it would take them to finish their senior theses “if everything went as poorly as it possibly could,” and they still underestimated the time it would take, as a group (the average prediction was 48.6 days, and the average actual completion time was 55.5 days).

So the estimates were mean-biased, but not median-biased (at least that's what Wikipedia appears to say the terminology is). Less biased than the other estimates, though. Of course this assumes we're taking the answer to "how long would it take if everything went as poorly as it possibly could" and interpreting it as the answer to "how long will it actually take", and if students were actually asked after the fact whether everything went as poorly as it possibly could, I predict they would mostly say no. But treating the text "if everything went as poorly as it possibly could" as if it weren't even there is clearly wrong too, because students gave a different (more biased towards optimism) answer when it was omitted.
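To illustrate the mean-biased vs. median-biased distinction with invented numbers (not the study's data):

```python
# Illustrative numbers (invented, not the study's data): a set of estimates can
# be biased in the mean (too low on average) while being median-unbiased
# (as likely to overshoot the actual result as to undershoot it).

from statistics import mean, median

# (pessimistic estimate, actual completion time) in days
pairs = [(40, 38), (45, 44), (50, 50), (52, 55), (55, 90)]

estimates = [est for est, _ in pairs]
actuals = [act for _, act in pairs]
errors = [est - act for est, act in pairs]

print("mean estimate:", mean(estimates), "vs mean actual:", mean(actuals))  # 48.4 vs 55.4
print("median error:", median(errors))   # 0 -> median-unbiased
print("mean error:", mean(errors))       # -7 -> biased low on average
```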

This specific question seems kind of hard to make use of from a first-person perspective. But I guess maybe as a third party one could ask for worst-possible estimates and then treat them as median-unbiased estimators of what will actually happen? Though I also don't know if the median-unbiasedness is a happy accident. (It's not just a happy accident, there's something there, but I don't know whether it would generalize to non-academic projects, projects executed by 3rd parties rather than oneself, money rather than time estimates, etc.)

I do still think there's a question of how motivated the students were to give accurate answers, although I'm not claiming that, if properly motivated, they would re-invent Murphyjitsu / the pre-mortem / etc. from whole cloth; they'd probably still need to already know about some technique like that and believe it could help produce more accurate answers. But even if a technique like that is an available action, it sounds like a lot of work, only worth doing if the output has a lot of value (e.g., if one suspects a substantial chance of not finishing the thesis before it's due, one might wish to figure out why, so one could actively address some of the reasons).

I have a sense that this is a disagreement about how to decide what words "really" mean, and I have a sense that I disagree with you about how to do that.

https://www.lesswrong.com/posts/57sq9qA3wurjres4K/ruling-out-everything-else

I had already (weeks ago) approvingly cited that particular post and asked my wife and my best friend to read it, which I think puts it at the 99.5th percentile or higher of LW posts in terms of my wanting its message to be understood and taken to heart, so I think I disagree with this comment about as strongly as is possible.

I simply missed the difference between "what could go wrong" and "you failed, what happened" while I was focusing on the difference between "what could go wrong" and "how long could it take if everything goes as poorly as possible".

It didn't work for the students in the study in the OP. That's literally why the OP mentioned it!
