jkraybill

Haven't read that book, added to the top of my list, thanks for the reference!
But humans are uniquely able to learn behaviours from demonstration and to form larger groups, which enables the gradual accumulation of 'cultural technology'; this then allowed a runway of cultural-genetic co-evolution (e.g. food processing technology -> smaller stomachs and bigger brains -> even more culture -> bigger brains even more of an advantage, etc.)
One thing I think about a lot is: are we sure this is unique, or did something else like luck or geography somehow play an important role in one (or a handful) of groups of sapiens happening to develop some strong (or "viral") positive-feedback cultural learning... (read more)
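(A toy illustration of the runaway dynamic being described -- the variables and coefficients here are arbitrary assumptions of mine, not anything from anthropology; the point is only that two quantities feeding each other compound super-linearly:)

```python
# Toy positive-feedback loop: culture and brain size reinforcing each other.
# All numbers are made up; the point is the compounding shape of the curve.
culture, brain = 1.0, 1.0
for generation in range(10):
    culture += 0.10 * brain    # bigger brains accumulate culture faster
    brain   += 0.05 * culture  # more culture makes bigger brains more advantageous
    print(f"gen {generation}: culture={culture:.2f}, brain={brain:.2f}")
```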
It is true that we have seen over two decades of alignment research, but the alignment community has been fairly small all this time. I'm wondering what a much larger community could have done.
I start to get concerned when I look at humanity's non-AI alignment successes and failures; we've had corporations for hundreds of years, and a significant portion of humanity has engaged in corporate alignment-related activities (regulation, lawmaking, governance, etc., assuming you consider those forces to generally be pro-alignment in principle). Corporations and governments have exhibited a strong tendency to become less aligned as they grow. (Corporate rap sheets, if a source is needed.)
We've also been in the company of humans for... (read more)
I've probably committed a felony by doing this, but I'm going to post a rebuttal written by GPT-4, along with my commentary on it. I'm a former debate competitor and judge, and I've found GPT-4 to be uncannily good at debate rebuttals. So here is what it came up with, and my comments. I think this is a relevant comment because what GPT-4 has to say is very human-relevant.
... (read 1413 more words →)
Radiations from the Sun bounce off a piece of cheese and enter into the beady eyes of a mouse; its retinal cells detect the light; the energy of the photons triggers neural impulses; these impulses are transmitted to the visual-processing areas of the...
One of the things I think about a lot, and ask my biologist/anthropologist/philosopher friends, is: what does it take for something to actually be recognised as human-like by humans? For instance, I see human-like cognition and behaviour in most mammals, but this seems to be resisted almost by instinct by my human friends, who insist that humans are superior and vastly different. Why don't we have a deep appreciation for anthill architecture, or whale songs, or flamingo mating dances? These things all seem human-like to me, but are not accepted as forms of "fine art" by humans. I hypothesize that we may be collectively underestimating our own species-centrism, and are in grave... (read more)
Hi, I have a few questions that I'm hoping will help me clarify some of the fundamental definitions. I totally get that these are problematic questions in the absence of consensus around these terms -- I'm hoping to have a few people weigh in, and I don't mind if answers are directly contradictory or my questions need to be rethought.
Very interesting points; if I were still in middle management, these things would be keeping me up at night!
One point I'd query is "this is a totally new thing no manager has done before, but we're going to have to figure it out" -- is it really that different from the kinds of tool introduction & distribution / training / coaching that managers already do? I've spent a good amount of my career coaching my teams on how to be more productive with tools, running team show-and-tells where productive team members explain why they're productive, sending team members on paid training courses, designing rules around the use of internal tools like Slack/Git/issue trackers/intranets... (read more)
It would be pretty nuts if you rewarded it for being able to red-team itself -- like, you're deliberately training it to go off the rails, and I thiiiiink that would seem so even to non-paranoid people? Maybe I'm wrong.
I'm actually most alarmed about this vector these days. We're already seeing people give LLMs completely untested toolsets -- web, filesystem, physical bots, etc. -- and "friendly" hacks like Reddit jailbreaks and ChaosGPT. Doesn't it seem like we're only a couple of steps away from a bad actor producing an ideal red-team agent, and then abusing it rather than using it to expose vulnerabilities?
I get the counter-argument -- that humans are already diverse and try a ton of stuff, and resilient systems are the result... but peering into the very near future, I fear those arguments simply won't apply to super-human intelligence, especially when combined with bad human actors directing it.
Seeing this frantic race by random people to give GPT-4 dangerous tools and walking-around money, I agree: the risk is massively exacerbated by giving the "parent" AIs to humans.
Upon reflection, should that be surprising? Are humans "aligned" in the way we would want AI to be aligned? If so, we must acknowledge that humanity regularly produces serial killers and terrorists (etc.). That doesn't seem ideal. How much more aligned can we expect a technology we produce to be than our own species?
If we view the birth of AGI as the birth of a new kind of child then, to me, there really is no regime known to humanity that will guarantee that child will not grow up... (read more)
The Doomsday Clock is at 23:58:30 (i.e. 90 seconds to midnight), but maybe that's not what you meant? I think they were way off in the Cuban Missile Crisis era, but these days it seems more accurate, and maybe more optimistic than I would be. They do accommodate x-risk of various types.
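(For anyone wondering where 23:58:30 comes from: the Bulletin's 2023 setting was 90 seconds to midnight, and reading that as a time of day gives 23:58:30. A throwaway sketch of the conversion -- the date below is arbitrary, since only the time-of-day matters:)

```python
from datetime import datetime, timedelta

def clock_reading(seconds_to_midnight: int) -> str:
    """Express 'N seconds to midnight' as a 24-hour clock reading."""
    midnight = datetime(2000, 1, 2)  # arbitrary date; only the time-of-day matters
    return (midnight - timedelta(seconds=seconds_to_midnight)).strftime("%H:%M:%S")

print(clock_reading(90))  # -> 23:58:30
```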
As a poker player, this post is the best articulation I've read that explains why optimal tournament play is so different from optimal cash-game play. Thanks for that!
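(For readers who haven't played tournaments: one standard way to make the difference concrete is the Independent Chip Model (ICM) -- my addition here, not something the post relies on. It converts stacks into prize-pool equity and shows why tournament chips have non-linear value:)

```python
# Malmuth-Harville ICM: P(a player finishes 1st) is proportional to stack size,
# then the same rule is applied recursively to the remaining players.
from itertools import permutations

def icm_equity(stacks, payouts):
    """Each player's expected share of the prize pool under ICM."""
    total = sum(stacks)
    equity = [0.0] * len(stacks)
    for order in permutations(range(len(stacks))):
        p, remaining = 1.0, total
        for player in order:           # probability of this exact finishing order
            p *= stacks[player] / remaining
            remaining -= stacks[player]
        for place, player in enumerate(order):
            if place < len(payouts):
                equity[player] += p * payouts[place]
    return equity

# Three players, payouts 50/30/20 (% of pool). The 2000 stack doubles through
# the chip leader: its equity goes from ~28.9 to ~35.4 -- far less than 2x.
print(icm_equity([2000, 3000, 5000], [50, 30, 20]))
print(icm_equity([4000, 3000, 3000], [50, 30, 20]))
```

In a cash game, doubling your chips exactly doubles your money; in a tournament it doesn't, so the two formats reward different risk profiles.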