Is there a reason that LessWrong defaults to light mode rather than automatically following the browser's setting? I personally find it a bit annoying to have to select auto every time I have a new localStorage, and it's not clear to me what the upside is.
I don't know how considered this is, but we've built up a lot of the site aesthetic around the light mode (aiming for a "watercolor on paper" vibe), and it is fairly hard to get it to work well on dark mode as well.
Honestly, I am often a "dark mode" preference person, but I use LessWrong in light mode, because I think LessWrong has a light mode aesthetic and looks very good in it. I personally don't like the dark mode very much. I think it is perfectly acceptable to have light mode by default. I think that serving dark mode to users who have their system theme set to dark mode is not necessarily good. It costs you just a few seconds to change it if you don't like it, and showing the site's identity and aesthetic is good for newcomers. This is just my anecdotal experience, but it is at least evidence that not all dark-system-theme people prefer to have automatic dark mode.
The relevant “auto” setting for browsers just turns on dark mode in the evenings and light mode during the day. That’s not really expressing a preference.
I think it is a strong preference, which is why people who accidentally enabled it or forgot they had enabled it complained so much when we defaulted to 'auto' on Gwern.net.
I am assuming you are being sarcastic and are hence agreeing with me?
I agree, indeed my experience with websites doing the auto thing is mostly that it's frustrating and I don't want it, and I expect quick user complaints. At least macOS pushes users to activate it, and I had it active for a while, but I eventually overrode the setting on almost every website that lets me choose between dark and light mode, pinning it to the version I want to see in a stable way, usually after a pang of frustration and disorientation as suddenly this one website looked different at a time that seemed random to me.
I would be happy to take bets we would quickly get tons of complaints.
This is exactly how I feel when light mode happens at all ever. Please give me the ability to set my account to dark mode across all devices. Please make light/dark a prominent ui switch for all new users. I don't care if it's pretty as much as I care that it's dark. I agree auto is usually not user preference, despite suspecting it's best for circadian rhythm. But dark is a common strong user preference and I generally quickly leave sites that don't honor it, and I find it frustrating to have to argue this hard to get you to consider what most software now has as basic functionality (dark mode setting honored by default and synced across devices if changed).
I would implement this and PR it if I understood the codebase well enough; I mean to sit down and get up to speed at some point.
Yeah, agree, but are we not doing this? I think we currently store it per-device, but like, you can just switch to dark mode on all devices?
Or do you mean we should respect an explicitly set browser preference for "dark"? I don't actually know whether that's possible while also not following the "auto" setting.
Or do you mean we should respect an explicitly set browser preference for “dark”? I don’t actually know whether that’s possible while also not following the “auto” setting.
Indeed this is not possible, since there is not actually such a thing as an “auto” setting as far as CSS is concerned, but rather simply a system where the prefers-color-scheme media query matches the light value by default, but will instead match the dark value if the browser is commanded to make it so (either manually by the user or automatically by the system). An “auto” setting exists on the system level, and can exist on the website level (as on gwern.net), but not on the “intermediate” level of CSS and the browser's implementation thereof.
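(A minimal sketch of what this looks like from the page's side, in TypeScript; the data-theme attribute is a hypothetical illustration, not LessWrong's or gwern.net's actual code.)

```typescript
// All a page can see is whether "(prefers-color-scheme: dark)" currently
// matches; an explicit user choice of dark and an OS-level "auto" schedule
// that happens to be in its dark phase look identical from here.
const darkQuery = window.matchMedia("(prefers-color-scheme: dark)");

function applyTheme(prefersDark: boolean): void {
  document.documentElement.dataset.theme = prefersDark ? "dark" : "light";
}

applyTheme(darkQuery.matches);
// On a Mac set to "Auto" appearance this fires at sunset/sunrise; the page
// only sees that the query flipped, not why it flipped.
darkQuery.addEventListener("change", (event) => applyTheme(event.matches));
```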
I believe last time I asked about it, which I think was about 6 months ago, you said something that I remembered as approximately that storing it clientside was the only available option for reasons I don't remember, and that the cost that sometimes the browser forgets clientside settings is just one that dark mode users will have to bear. That takeaway was frustrating and has left me wanting to push about this. It's entirely possible I misremembered.
The setting is saved by a cookie, which is the same way we store your login token information. I.e. if someone loses their theme cookie, they very likely would also be logged out, so we really can't do better than that kind of setting (we could make it so that we automatically restore your preference when you log in again, but that seems like it doesn't really change the basic experience).
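(As a hedged sketch of what cookie-backed persistence of this kind can look like; the cookie name and one-year lifetime are made-up details for illustration, not the site's actual implementation.)

```typescript
// Hypothetical sketch: persist the theme choice in a cookie so it survives
// localStorage resets, rather than relying on localStorage alone.
const THEME_COOKIE = "theme"; // made-up name for illustration

function saveTheme(theme: "light" | "dark" | "auto"): void {
  const oneYear = 60 * 60 * 24 * 365; // seconds
  document.cookie = `${THEME_COOKIE}=${theme}; max-age=${oneYear}; path=/; SameSite=Lax`;
}

function readTheme(): string | undefined {
  return document.cookie
    .split("; ")
    .find((entry) => entry.startsWith(`${THEME_COOKIE}=`))
    ?.split("=")[1];
}
```

(Whether such a cookie lasts as long as a login cookie depends on how it is set; some browsers cap the lifetime of cookies written from script, which is related to the Safari point further down the thread.)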
I found at least one repro: set dark mode, edit your user settings, tada light mode. I thought I remembered it resetting to light mode without logging me out or a user setting edit, though. I could believe it's not the storing-it-clientside that's causing the resets. I haven't seen as many for a few months, not sure why.
incidentally, you might find https://bottosson.github.io/misc/colorpicker/ to be a useful tool
My preference, by default, is something I communicate through the browser because the browser forwards on my light/dark preferences to websites.
In general, I want websites to do whatever my OS is set to do by default.
The relevant “auto” setting for browsers just turns on dark mode in the evenings and light mode during the day.
This is what Macs do by near-default, but some people use dark mode on 24/7 and some people use light mode 24/7. It’s not what all browsers+OS setups do.
This is what Macs do by near-default
Yes, that is what I meant. At least Macs do this by default, to the frustration of a non-trivial fraction of users who are now half of the time presented with visual designs that are half-baked, inadequate and barely tested, because realistically maintaining both a dark mode and a light mode design for a website is very hard.
Human color and space perception doesn't work symmetrically across light and dark contrasts, so a well-designed dark website and a well-designed light website just look very different from each other on many dimensions. You can of course do it with CSS, but we are not talking about just inverting all the colors, we are talking about at the very least hand-picking all the shades, and realistically substantially changing the layout, spacing and structure of your app (so e.g. you don't end up with large grey areas in a dark mode setting, which stand out vastly more in dark mode than equally high-contrast grey sections in a light mode).
You can of course do it with CSS, but we are not talking about just inverting all the colors, we are talking about at the very least hand-picking all the shades
I recommend my color-scheme-convert.php to automatically convert entire style sheets to a dark color scheme and perform the requisite gamma adjustment. (Which is to say, this script is specifically designed to solve the “Human color and space perception doesn’t work symmetrically across light and dark contrasts” problem.)
It’s what we use to auto-generate the dark mode styles for gwern.net, and is also used for the same purpose in PmWiki.
(Some additional hand-adjustments may be necessary for some elements. However, the script dramatically reduces the workload of doing the conversion, and allows the great majority of it to be automated.)
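(To make the “gamma adjustment” idea concrete, here is a rough TypeScript sketch of one way to flip a color's luminance for a dark scheme with a gamma term; it illustrates the concept only, and is not the algorithm color-scheme-convert.php actually uses.)

```typescript
// Concept sketch (not color-scheme-convert.php): invert a color's luminance
// for a dark background, with a gamma exponent so mid-tones are not flipped
// one-to-one, since perception is not symmetric across light and dark.
function invertForDarkScheme(
  rgb: [number, number, number],
  gamma = 1.3, // exponent on the flipped luminance; arbitrary illustrative value
): [number, number, number] {
  // Approximate sRGB decoding/encoding with a 2.2 power curve.
  const toLinear = (c: number) => Math.pow(c / 255, 2.2);
  const toSrgb = (c: number) =>
    Math.round(Math.pow(Math.min(Math.max(c, 0), 1), 1 / 2.2) * 255);

  const linear = rgb.map(toLinear) as [number, number, number];
  // Relative luminance (Rec. 709 weights).
  const lum = 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2];
  // Flip the luminance, then apply the gamma adjustment.
  const target = Math.pow(1 - lum, gamma);

  if (lum === 0) {
    // Pure black has no hue to preserve; map it straight to the target gray.
    const g = toSrgb(target);
    return [g, g, g];
  }
  // Rescale the channels toward the target luminance, roughly preserving hue.
  const scale = target / lum;
  return linear.map((c) => toSrgb(c * scale)) as [number, number, number];
}

console.log(invertForDarkScheme([255, 255, 255])); // -> [0, 0, 0]
console.log(invertForDarkScheme([0, 0, 0]));       // -> [255, 255, 255]
```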
FWIW, my impression is that while the gwern.net dark mode was a ton of work to create (it required a total refactoring of the CSS, dealing with the flash-of-white issue, figuring out the correct three-mode behavior, creating tooling like the color conversion script and InvertOrNot.com for acceptable images, and tweaking the syntax-highlighting colors to look right in dark mode), once all of that was paid, the maintenance burden has been relatively small. I have to patch up the occasional image not inverting right and we have to occasionally special-case some Wikipedia popups CSS, but it's on net less effort than much of the website (eg. Apple device support, or the local archive system, are perennial hassles).
The gwern.net dark mode is pretty good engineering work overall. We should've done a design-graveyard writeup to explain not just how it works (non-obvious) but the problems with the more obvious approaches and what problems we had, etc. I fear it might be too late to write it now even if we had the time...
I think Gwern.net lends itself somewhat more to the maintenance burden being small. As one example, LessWrong for much of its UI leverages half-transparent images that fade to a white background. Making them harmonious for dark mode basically means making two images each time.
For example, see this Sequences Spotlight in dark mode:
The image borders are exposed, and the white fade does not work. Compare to the fully functional light mode:
For most things like this we have a pass that tries to edit the images to properly work in both light and dark mode, but it requires a decent amount of artistic judgement each time.
The image borders are exposed, and the white fade does not work. Compare to the fully functional light mode:
To be honest, I don't like the light-mode example either. I think it's bad to have your text visibly overlapping like that. (If I made an image which I had put .float-right .outline-not on to get the same effect on gwern.net and I saw your 'good' white-mode version, I would be immediately complaining to Obormot about his CSS being busted.) So in this example, isn't a lot of the problem that the UI elements are overlapping, so the text is spilling over onto the scales and blocking the wires etc., and the dark mode merely exacerbates the problem, and fixing the core problem would also fix the dark-mode issue?
Nah, the problem isn't the UI elements overlapping the text. I just happened to choose an image that also had other issues. You would still end up with an ugly fade and a harsh transition unless someone noticed in time and uploaded a new image, even if the image was more properly placed on the right edge of the spotlight item (and separately, I think the image overlapping a bit with the text is basically fine and I consider it less of an issue, but that seems orthogonal to the point).
Human color and space perception doesn’t work symmetrically across light and dark contrasts, so a well-designed dark website and a well-designed light website just look very different from each other on many dimensions. You can of course do it with CSS, but we are not talking about just inverting all the colors, we are talking about at the very least hand-picking all the shades, and realistically substantially changing the layout, spacing and structure of your app (so e.g. you don’t end up with large grey areas in a dark mode setting, which stand out vastly more in dark mode than equally high-contrast grey sections in a light mode).
I’m hoping the negative agreement karma for the parent comment isn’t for this — it’s just for “maintaining both a dark mode and a light mode design for a website is very hard” (emphasis added, as distinct from creation).
The above blockquote makes me want to say to the studio audience “Why are you booing — he’s right!”.
FWIW, I don't think the site looks significantly worse on dark mode, although I can understand the desire not to have to optimize for two colorschemes.
I think the baseline site is pretty fine in darkmode, it's just that whenever we do artsy illustration stuff it's only really as-an-afterthought ported to darkmode. So, I think we have at least some preference for people's first experience of it to be on lightmode so that you at least for-a-bit get a sense of what the aesthetic is meant to be.
(the part where it keeps reverting whenever you lose localstorage does sound annoying, sorry about that)
So, I think we have at least some preference for people’s first experience of it to be on lightmode so that you at least for-a-bit get a sense of what the aesthetic is meant to be.
This makes sense in a vacuum, but…
…if someone’s first visit to Less Wrong is in the evening after the sun goes down, then his first impression of the site will be colored by an unpleasantly bright website that doesn’t work properly in dark mode. Is that really what you want?
I’ve designed sites where the light mode has a nice background flourish, but I never figured out how to make it work nicely in dark mode.
So I just had a solid dark color in dark mode (instead of the background image) and left the image as a light-mode-only thing, which makes it something like an easter egg if one usually browses the site with a dark-mode preference being forwarded on to the site.
This kind of thing happens to me, as well — I have wallpaper that changes with the time of day, and I generally don’t see the sunrise versions of the wallpaper unless I’m at my computer crazy early or crazy late.
I definitely do not stand by this as either explicit Lightcone policy or my own considered opinion, but, I feel like a bunch of forces on the internet nudge everyone towards the same generic site designs (mobile-first, darkmode ready, etc), and while I agree there is a cost, I do feel actively sad about the tradeoff in the other direction.
(like, there are a lot of websites that don't have a proper darkmode. And I... just... turn down the brightness if it's a big deal, which it usually isn't? I don't really like most websites turning dark at night. And again, if you set the setting once on LessWrong it mostly should be stable, I don't really buy that there are that many people who get the setting lost?)
I feel like a bunch of forces on the internet nudge everyone towards the same generic site designs […], and while I agree there is a cost, I do feel actively sad about the tradeoff in the other direction.
Mobile-first cuts out a lot of room for self-expression, agreed.
Dark mode, meanwhile, I like too much. Heck, I have a site where one might reasonably ask “where’s the light mode?”.
And again, if you set the setting once on LessWrong it mostly should be stable, I don’t really buy that there are that many people who get the setting lost?
Safari is set to throw out a site’s cookies if it isn’t visited in seven days. I don’t know about other browsers.
I think LW's dark mode is bad, and is actually too dark. Almost nobody uses full #000000 for dark mode.
Hi everyone,
I’m Vladimir - 25 years old, originally from Russia and currently living in Dublin. I studied mathematics, but life took me into product management in IT, where I work today.
I’ve been loosely aware of rationality for years, but something shifted for me after 2023. The rapid progress in AI chatbots made clear thinking feel much more immediate and personal. Since then, I’ve been slowly but deliberately trying to get better at reasoning, noticing biases, and making sense of the world in a more structured way.
As part of that, I recently started working on a small passion project: a non-profit website that teaches people about cognitive biases in an interactive way. It’s still in its early stages, and I’m figuring a lot out as I go, but I’d love any thoughts if you ever take a look (I hope it is okay to put it here, but please let me know if it's not).
I’m excited to be here. LessWrong feels like one of the rare places on the internet where people are open-minded and seek the truth or knowledge. I also hope to join in some of the AI discussions - I find myself both fascinated by where things are going and deeply uncertain about how to navigate it all.
Thanks for reading and looking forward to learning from all of you.
- Vladimir
This is a very neatly-executed and polished resource. I'm a little leery of the premise - the real world doesn't announce "this is a Sunk Cost Fallacy problem" before putting you in a Sunk Cost Fallacy situation, and the "learn to identify biases" approach has been done before by a bunch of other people (CFAR and https://yourbias.is/ are the ones which immediately jump to mind) - but given you're doing what you're doing I think you've done it about as well as it could plausibly be done (especially w.r.t. actually getting people to relive the canonical experiments). Strong-upvoted.
Hello
This is a temporary account to just get used to how this forum works and to gauge whether it is suitable for me to enter or not. I am 15, and I am interested in the philosophy–politics–economics set, as well as history, AI, and logic (some might count this as philosophy, but I feel logic is a tool to think with and therefore different).
I originally learned about the rationalist network through HPMOR last year, but recently as I read more about AI during the holidays, this network caught my attention. I intend to watch and experiment for now, as obviously a cursory glance at the posts suggests that the threads aren't meant for casual conversation. And as this username suggests, I'm thinking of reading through the sequences and whatever basic knowledge is needed.
Through joining, and hopefully later casting this account aside in favour of another, I hope to gain new contacts, as I don't find stimulating conversation in social settings as much as I'd like.
Edit: I am a little nervous, please tell me if I have violated any norms or customs, this is my 3rd edit I think?
Welcome to LessWrong! You didn't violate any norms or customs. Hope you have some interesting discussions :-)
I have a question:
To what extent does LessWrong expect knowledge in the STEM fields? My understanding is that rationalist thinking is based on Bayesian probability calculation and being self-aware of cognitive biases to reach the truth. The problem is I'm more on the philosophical side if anything (I suppose I can do fuzzy logic as well as classical logic), as I've been trying to read as much of the philosophical canon as I can, starting from Plato, for a couple of years now. And to be honest, I have to say that mathematics is arguably the subject that I'm worst at. Given this, how appropriate is it for me to enter this community?
To what extent does LessWrong expect knowledge in the STEM fields?
I mean, it helps? I wouldn't say it's required.
My understanding is that rationalist thinking is based on Bayesian probability calculation
It's less to do with Bayes as in actually-doing-the-calculation and more to do with Bayes as in recognizing-there's-an-ideal-to-approximate. Letting evidence shift your position at all is the main thing. (If you do an explicit Bayesian calculation about something real, you'll have done about as many explicit Bayesian calculations as the median LW user has this year.)
and being self-aware of cognitive biases
I mean, we were. Then we found out that most published scientific findings, including the psych literature which includes the bias literature, don't replicate. And AI, the topic most of us were trying to de-bias ourselves to think about, went kinda crazy over the last five years. So now we talk about AI more than biases. (If you can find something worthwhile to say about biases, please do!)
The problem is I'm more on the philosophical side if anything
If you pick two dozen or so posts at random, I'd expect you'll get more Philosophical ones than STEMmy ones. (AI posts don't count for either column imo; also, they usually don't hard-require technical background other than "LLMs are a thing now" and "inhuman intellects being smarter than humans is kinda scary".)
Given this, how appropriate is it for me to enter this community?
Extremely. Welcome aboard!
Hi, I joined a few days ago and I'm looking forward to contributing to this great community.
I'm transitioning back to research from startups. Currently based in London.
I'm particularly interested in mechanistic interpretability, chain-of-thought monitoring, and reasoning model interpretability. I'm excited to engage with the thoughtful discussions here on alignment and to collaborate with others.
What's your view on sceptical claims about RL on transformer LMs, like https://arxiv.org/abs/2504.13837v2, or the claim that CoT instruction yields better results than <thinking> training?
Hello,
I've just joined LessWrong officially today, but I've been staying abreast of the content here and on Alignment Forum for the last few months. I'm extremely interested in AI Alignment research. Specifically I've decided to participate in this community to discuss alignment methodologies and interact with AI alignment researchers at the cutting edge.
Additionally, I've founded a company called Aurelius. (aureliusaligned.ai)
My goal with Aurelius is to explore misalignment in general reasoning models, collect data, and distribute it to researchers and model developers. I'm excited to get feedback on my ideas and participate in ongoing alignment methodologies. I'm based in Los Angeles, and may come to future LessWrong or Alignment Forum meet ups.
Nice to meet you all,
Austin
Hello!
I'm Misha, a "veteran" technical writer and an aspiring fiction writer. Yes, I am aware LessWrong is probably not the place to offer one's fiction - maybe there are exceptions but I'm not aware of them. I have heard of LessWrong a lot, but didn't join before because of the perceived volume of content. However, I now hope to participate at least on the AI side.
I have been reading recent publications on AI misalignment, notably the big bangs from Anthropic https://www.anthropic.com/research/agentic-misalignment and OpenAI https://openai.com/index/emergent-misalignment/ .
I have my own hypothesis about a possible etiology of the observed cases of misalignment, alongside other theories (I don't think it's all-or-nothing; an emergent behaviour can have several compounding etiologies).
My hypothesis involves "narrative completion", that is, the LLM ending up in the "semantic well" of certain genres of fiction that it trained on and producing a likely continuation in the genre. Don Quixote read so many chivalric romance novels that he ended up living in one in his head; my suspicion is that this happens to LLMs rather easily.
I have not noticed this side of things discussed in the Anthropic and OpenAI papers, nor could I find other papers discussing it.
I am gearing up to replicate Anthropic's experiment based on their open-source code, scaled down severely because of budget constraints. If I can get results broadly in line with what they got, next I want to expand the options with some narrative-driven prompts, which, if my hypothesis holds, should show a significant reduction in observed misalignment.
Before doing this, ideally I'd like to make a post here explaining the hypothesis and suggested experiment in more detail. This would hopefully help me avoid blind spots and maybe add some more ideas for options to modify the experiment.
I would appreciate clarification on (a) if this is permitted and (b) if it is, then how I can make this post. Or if I can't make a post at all as a newbie, then is there a suitable thread to put it in as a comment, either by just pasting the text or by putting it on Medium and pasting the link?
Thanks!
Thank you! Now, one question: is a degree of AI involvement acceptable?
Thing is, I have an AI story I wrote in 2012 that kinda "hit the bullseye", but the thing is in Russian. I would get an English version done much quicker if I could use an LLM draft translation, but this disqualifies it from many places.
Since you wrote the original, machine translation (which is pretty decent these days) should be fine, because it's not really generating the English version from scratch. Even Google Translate is okayish.
I don't even want to post an unedited machine translation - there will be some edits and a modern "post mortem", which will be AI-generated in-world but mostly written by me in real life.
I've heard that hypothesis in a review of that Anthropic blog post, likely by AI Explained, maybe by bycloud. They called it "Chekhov's gun".
Thanks! I couldn’t find that source, but “Chekhov’s Gun” is indeed mentioned in the original Anthropic post—albeit briefly. There’s also this Tumblr post (and the ensuing discussion, which I still need to digest fully), with a broader overview here.
While I have more reading to do, my provisional sense is that my main new proposal is to take the “narrative completion” hypothesis seriously as a core mechanism, and to explore how the experiment could be modified—rather than just joining the debate over the experiment’s validity.
I’m not convinced that “this experiment looks too much like a narrative to start with” means that narrative completion/fiction pattern-matching/Chekhov’s Gun effects aren’t important in practice. OpenAI’s recent “training for insecure coding/wrong answers” experiment (see here) arguably demonstrates similar effects in a more “realistic” domain.
Additionally, Elon Musk’s recent announcement about last-moment Grok 3.5/4 retraining on certain “divisive facts” (coverage) raises the prospect that narrative or cultural associations from that training could ripple into unrelated outputs. This is a short-term, real-world scenario from a major vendor—not a hypothetical.
That said, if the hypothesis has merit (and I emphasize that’s a big “if”), it’s worth empirical investigation. Given budget constraints, any experiment I run will necessarily be much smaller in scale than Anthropic's original, but I hope to contribute constructively to the research discussion, not to spark a flame war. (The Musk example is relevant solely because potential effects of last-minute model training might "hit" in the short term, not for any commentary on Elon personally or politically.)
With this in mind, I guess it's experiment first, full post second.
(Full disclosure: light editing and all link formatting was done by ChatGPT)
I am curious about what the Albanian prime minister means when he says he is appointing a chatbot to be a minister. I am not crazy knowledgeable about AI, though I have lurked here for a while, and I don't know if this is a new use case or not. Does anyone have any sense of what this actually means, whether it is worth being concerned about in any way, or if it's just a PR stunt or something equally unconcerning? My read is that it seems to be a helper tool that they are calling a cabinet minister for political reasons. There doesn't seem to be a lot of information out there right now about what they actually mean by this, but maybe in a few days somebody more knowledgeable than me could clear this up for me.
Apparently the Albanian government's chief strategy to reduce corruption in the civil service is to make it unnecessary for Albanians to interact with human beings in the civil service. Instead they'll just talk to the AI ("Diella") whenever they want to get a permit and so on. So Diella herself is just a frontend to an Albanian government website, not that different from the chatbots that can be found on many other government and corporate websites. Calling her a government minister is a bit unprecedented, but ultimately it's just another example of human society trying to integrate AI by treating it as a person, in this case assigning the AI a place in society that was previously occupied only by human beings.
Hello! Long-time lurker, planning to post research results on here in the near future. I'm currently a PIBBSS research fellow, working on LLM interpretability relating to activation plateaus and deception probes. I'll be joining Anna Leshinskaya's Relational Cognition lab in the fall as a postdoc, working on moral reasoning in LLMs. Feel free to reach out if you have any ideas, questions, etc. on any of these topics!
Hello! I’m a research engineer at Google and have been a long time lurker in EA and LessWrong forums and decided this summer to become more active and start publishing my ideas more openly. Would love to get more involved with my local community in NYC and connect with others working in the AI space.
Welcome, and I'm glad you decided to start publishing more! Have you already found the OBNYC group in New York, or could you use a pointer?
Hi,
I'm a long time lurker but finally made an account since I'm writing more. I'm particularly interested in applying concepts to the public sector.
Hello all,
I am a platform engineer in England. 18 months ago, when I realised AI was a real thing, it really shocked me and challenged my world-view. I had previously thought AI was a joke/fad and didn't expect to see an AGI in my lifetime.
This started me on a journey of trying to understand the implications of AI and also to calibrate my understanding of the strengths and weaknesses of AI. This led me to create SherlockBench and write my first research paper:
- https://sherlockbench.com/
- https://zenodo.org/records/16270954
I am hoping LessWrong can help me connect with smart people and learn some new ideas! And maybe get some feedback on my paper (I never went to a university and don't have any academic contacts).
Thanks all,
Joseph
Hello! My name is Dominic! I am a linguist, anthropologist, and researcher based in Indianapolis.
I have been interested in rationalism for a while, and decided to finally join the forum. Recent college grad, about to start my first year of teaching high school English. I am particularly interested in the future of education (especially secondary/post-secondary), the internet and its effects on media and culture, culture and technology, and language and change.
What I have read so far has been fascinating and I can't wait to dive into the vast array of literature and thought available here. I found the site via various left-leaning nerdy podcasts and digital media. I have done a fair amount of programming as well, so if you'd like to work on a project or just connect, let me know!
Nice to meet you!
- Dominic
Hi, my name's Elena, longtime lurker here and in the ACX archives. I work at a regenerative technology VC, and came into this community to explore the question of what it would mean to align with life itself, rather than just with human preferences. Specifically, how we reason about coherence when our reference frame shifts from the anthropocentric to the planetary, and what that implies for systems that optimize, adapt, or persist across scales.
Separately I am building a small fellowship experiment for people exploring adjacent questions, particularly in AI (outside LLMs), robotics, biology, and materials. Trying to build something non-residential, equity-free, and mostly meant to support people already thinking in this space to pursue their obsessions. If that sounds at all interesting I'm happy to share more if useful, otherwise very excited to become a more active member of this community :)
how we reason about coherence when our reference frame shifts from the anthropocentric to the planetary, and what that implies for systems that optimize, adapt, or persist across scales.
This sounds interesting, do you have any further reading (by yourself or others) to point folks like myself to?
I put together a little song that feels fitting for july 4th in america: https://suno.com/s/6EuRMXbG0on8vGIX
Bonus points to those who recognize where the lyrics came from.
I have known about this site for a long time and finally decided to register today. I have some ideas that I want to improve and share with the world to make it better. I am primarily interested in artificial intelligence and philosophy. I am very excited. Most likely, I will read more than write anything, especially at first. I will try to study the culture of this site and improve my ideas to fit the community guidelines.
Are open threads not getting pinned anymore? I quite liked having the open thread as a centralized shortform.
I've been experimenting with a specific questioning methodology that appears to elicit consistent behavioral patterns from AI systems across different architectures and instances. I'm sharing this methodology and preliminary observations to see if others can replicate the findings.
The Method: Instead of asking AI systems factual questions or giving them tasks, I ask them to make personal choices that require preference formation:
Observed Patterns:
The Question: What might these patterns indicate about the nature of information processing in advanced AI systems? Are we observing emergent preference formation, sophisticated mimicry, or something else entirely?
Request for Replication: I'm curious whether others observe similar patterns using this methodology. The approach is simple enough that anyone can try it and report their findings.
Hi, I just signed up and I don’t know if this is the intended forum for this sort of question, but I signed up to ask if anybody has an archival copy of http://transhumangoodness.blogspot.com/2008/07/universal-instrumental-values.html. It’s not been archived by the Internet Archive. I don’t remember where exactly I found this link, but it was certainly adjacent to this community. I had it in my “read later” list.
I'm seeking resources/work related to forecasting overall progress on AI safety, such as trying to estimate how much X-risk from AI can be expected to be reduced within the medium-term future (i.e. the time range people generally expect to have left before said risks become genuinely feasible). Ideally, resources trying to quantify the reduction in risk, and/or looking at technical or governance work independently (or even better, both).
If not this, the next best alternative would be resources that try to estimate reduction in AI risk from work done thus far (again, especially quantified, even if only something like an overview of progress on alignment benchmarks). And if not that, any pointers you may have for someone trying to do work like this themselves. I do expect any such estimates or work to naturally be extremely uncertain, but nonetheless believe it would be valuable for my interests in the field.
(Side note: I'm new to LW, so let me know if this post would belong better elsewhere.)
Hello all, Juliano here! Excited to read more and converse with people interested in the future of work and how to reimagine capitalism given AI's emerging capabilities. Please feel free to suggest any readings.
I think the Quick Takes feed needs the option to sort by newest. It makes no sense that I get fed the same posts 3-7 times in an unpredictable order if I read the feed once per day.
Hi, everyone. I am entering the AI safety field. I want to contribute by solving the problem of unlearning.
How can we apply unlearning?
1) Make LLMs forget dangerous stuff (e.g. CBRN)
2) Current LLMs know when they're being benchmarked. So I want to get situational awareness out of them so we can benchmark them nicely.
I'm looking for:
1) Mentor
2) Collaborators
3) Discussions about AI safety
Link to Induction section on https://www.lesswrong.com/lw/dhg/an_intuitive_explanation_of_solomonoff_induction/#induction seems broken on mobile Chrome, @habryka
If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.