Sheikh Abdur Raheem Ali

Software Engineer (formerly) at Microsoft who may focus on the alignment problem for the rest of his life (please bet on the prediction market here).

Posts

Sheikh Abdur Raheem Ali's Shortform · 3y

Comments

Willpower is exhausting, use content blockers
Sheikh Abdur Raheem Ali · 1d

I upvoted, but personally I don't find much use for content blockers when I'm in an office or meeting where other people can see my screen, when I'm at the gym with a trainer, or when I'm really excited about a task. I have ADHD and am not the best at managing my time, so "You Don't Hate Polyamory, You Hate People Who Write Books" is in full effect here.

  • I'm surprised that you don't include Plucky Filter in this list.
  • I find it helpful to enable Assistive Access right at the start of my day so that I don't get into messages until I have completed my morning routine.
    • Once I have started my commute, I disable it and then call someone from my team or family.
  • RescueTime is an alternative to Freedom with better reporting.
    • Though I prefer Freedom's interface for regularly scheduled focus blocks.
  • Inbox When Ready helps manage email distractions.
    • Although this means I need to use my phone to receive two-factor authentication codes.
  • I have not configured focus sets in LeechBlock NG, but it might be of interest to those who prefer a more structured daily workflow.
  • Cold Turkey can serve as an additional layer of security.
  • I can recommend Tab Scheduler with auto open and close for setting and enforcing 1-minute timeouts, which I have found to be more effective than 1-second delays.
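
(None of these tools expose their internals, but for readers curious what this kind of blocking usually boils down to, here is a minimal, hypothetical sketch of the shared mechanism: redirecting distracting domains for the duration of a scheduled focus block. It assumes a Unix-like system, admin rights, and an illustrative domain list; it is not the implementation of any tool listed above.)

```python
# Minimal, hypothetical sketch of a hosts-file content blocker.
# Requires admin rights (run with sudo) to edit /etc/hosts.
import time

HOSTS_PATH = "/etc/hosts"
MARKER = "# focus-block"
BLOCKED = ["twitter.com", "reddit.com", "news.ycombinator.com"]  # illustrative list

def block():
    """Point the blocked domains at localhost for the duration of a focus block."""
    with open(HOSTS_PATH, "a") as f:
        for domain in BLOCKED:
            f.write(f"127.0.0.1 {domain} www.{domain}  {MARKER}\n")

def unblock():
    """Remove only the lines this script added."""
    with open(HOSTS_PATH) as f:
        lines = f.readlines()
    with open(HOSTS_PATH, "w") as f:
        f.writelines(line for line in lines if MARKER not in line)

if __name__ == "__main__":
    block()
    try:
        time.sleep(90 * 60)  # one regularly scheduled 90-minute focus block
    finally:
        unblock()
```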

I love watching and discussing anime with my siblings and cherish the fanfiction that I read while growing up, so I find it a little sad that you are unable to enjoy these forms of entertainment recreationally except on Saturday. Blogs and Wikipedia in particular have served to greatly expand my world, even if they have also resulted in some unintentional sleepless nights. Gaming seems taboo amongst my researcher friends, but not my normie friends; I suspect this is because the former are more susceptible to overoptimizing. There is a delicate balance for scholars to strike between connection and seclusion here.

More recently, one of my bad habits has been spending hours trying to get AI to solve a problem that is beyond the reliable capability of current models, instead of thinking through the problem myself or with a human collaborator.

Two papers have improved my understanding in this area: "To Rely or Not to Rely? Evaluating Interventions for Appropriate Reliance on Large Language Models" (arXiv:2412.15584) and "Measuring AI Ability to Complete Long Tasks" (arXiv:2503.14499). My current state of knowledge suggests that if a goal-oriented conversation with a model has lasted for ~26 minutes without a clear resolution, then further engagement is more likely than not to result in frustration.

Legible vs. Illegible AI Safety Problems
Sheikh Abdur Raheem Ali · 3d

I see. My specific update from this post was to slightly reduce how much I care about protecting against high-risk AI-related CBRN threats, which is a topic I spent some time thinking about last month.

I think it is generous to say that legible problems remaining open will necessarily gate model deployment, even in those organizations conscientious enough to spend weeks doing rigorous internal testing. Releases have been rushed ever since applications moved from physical CDs to servers, because of the belief that users can serve as early testers for bugs, and that critical issues can be patched by pushing a new update. This blog post by Steve Yegge from ~20 years ago comes to mind: https://sites.google.com/site/steveyegge2/its-not-software. I would include LLM assistants in the category of "servware".

I would argue that we are likely dropping the ball on both legible and illegible problems, but I agree that making illegible problems more legible is likely to be high leverage. I believe that the Janus/cyborgism cluster has no shortage of illegible problems, and consider https://nostalgebraist.tumblr.com/post/785766737747574784/the-void to be a good example of work that attempts to grapple with illegible problems.

Reason About Intelligence, Not AI
Sheikh Abdur Raheem Ali · 5d

> If an LLM says "I enjoy going on long walks" or "I don't like the taste of coffee", it is obviously lying because LLMs do not have access to those experiences or sensations. But a human saying those things might also be lying, you just can't tell quite as easily. There is nothing wrong about an LLM saying these things other than the wrongness of lying, as with humans.


Why would it be obviously lying? Would you also say that a blind person cannot have a favorite color? You could be talking about the idea of a thing, rather than the thing itself. 

There is a distinction between simulator and simulacra which I feel this section of the post may not be taking into account. An LLM assistant can enjoy writing about certain topics more than others. If a character in a story has some property, then it seems to me that we can make true and false statements about the state of that attribute.

Also, I am not sure I agree with considering corporations, nations, and other organizations to be good examples of superintelligence. I can see how they meet the criteria for the particular definition you use, but you define the term more broadly than usual, and I think that makes the concept less useful.

Nina Panickssery's Shortform
Sheikh Abdur Raheem Ali · 9d

Tinker is an API for LoRA-based parameter-efficient fine-tuning (PEFT). You don’t mention it directly, but it’s trendy enough that I thought your comment was a reference to it.
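
For anyone who hasn't seen the acronym unpacked: LoRA freezes the pretrained weights and trains only a small low-rank update on top of them. A minimal sketch of the idea, assuming PyTorch and written purely for illustration (this is not Tinker's actual API):

```python
# Illustrative sketch of a LoRA adapter wrapping a frozen linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        # Only these two small matrices are trained.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank update: W x + (alpha / r) * B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: swap a layer in a pretrained model, then optimize only the adapter parameters.
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))
```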

ARENA 7.0 - Call for Applicants
Sheikh Abdur Raheem Ali · 21d

I applied. It took me 10 minutes to complete all the questions up until the last one. It took me 160 minutes to read the anti-scheming paper, understand it, attempt to run a follow-up experiment, write some initial thoughts, and cut them to fit into the word limit. I'm not very satisfied with the answer I gave, but I'd only budgeted 180 minutes (2x the recommended 90 mins) and didn't want to go over.

CRC Follow-up Report v1.0 — OpenAI Feedback Integration Edition
Sheikh Abdur Raheem Ali · 1mo

This is quite clearly slop.

“If Anyone Builds It, Everyone Dies” release day!
Sheikh Abdur Raheem Ali · 2mo

The UK book cover is so much better.

davekasten's Shortform
Sheikh Abdur Raheem Ali · 2mo

For some H-1B folks with strong evidence (publications, awards, press, critical roles), O-1 may be an option—talk to your employer/immigration counsel. Not legal advice.

 

(Previous version of this comment, which got downvoted:)
I think that most people reading this would be eligible for an O-1 visa. Please feel free to send me a LessWrong DM if you need help finding a lawyer or a connection to someone who can write a support letter for you.

My Minor AI Safety Research Projects (Q3 2025)
Sheikh Abdur Raheem Ali · 2mo

These are all cool projects and I like them, but I find it hard to label this as safety research. To me it seems that the primary value from working on these was in improving general ML skills, which could be applied towards solving a broad variety of problems. Perhaps I’m missing a more direct link to a theory of impact or threat model here.

How To Become A Mechanistic Interpretability Researcher
Sheikh Abdur Raheem Ali · 2mo

Thank you for sharing this guide. I'm trying to understand how much we know about the typical thought process that generates some of the common mistakes. I can't speak to the specific motivations or goals of any individual in particular, but I'd speculate that if smart people consistently appear to make the same errors, then there may be something more interesting going on that we can learn from.

I agree that avoiding compute-heavy steps is a good idea for those without a lot of prior ML experience. Even if you have (or expect to acquire) the resources to afford investing in a large training run, not knowing what you're doing almost always incurs a significant cost overhead, and the long iteration cycles tend to bottleneck the number of experiments you can run during a sprint. However, everyone knows that big GPU clusters can be quite challenging to work with from an engineering perspective, so experience doing e.g. multi-GPU SFT tends to be helpful for developing tacit knowledge and skillsets which are highly sought after in industry roles. [1]
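
For concreteness, here is a rough, hypothetical sketch of what that multi-GPU boilerplate tends to look like, assuming PyTorch with a torchrun launcher; build_model and build_dataloader are stand-ins rather than functions from any particular codebase:

```python
# Rough sketch of a multi-GPU SFT loop with DistributedDataParallel.
# Launch with: torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = build_model().cuda(rank)               # build_model is a hypothetical helper
    model = DDP(model, device_ids=[rank])          # gradients are averaged across GPUs
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    for inputs, labels in build_dataloader(rank):  # hypothetical per-rank sharded dataloader
        outputs = model(inputs.cuda(rank))
        loss = torch.nn.functional.cross_entropy(outputs, labels.cuda(rank))
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```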

It's less clear to me why someone would try to build on a highly technical method when they don't meet the prerequisites to fully understand a paper's approach and limitations. It could be driven by higher-than-average levels of self-belief and risk-tolerance, since some level of overconfidence can lead to better outcomes and faster growth than perfect calibration. The people who are equipped to properly evaluate and review complex work tend to be in short supply, but are disproportionately responsible for the most popular works, and it seems reasonable for someone who derives inspiration from a certain research direction to be naively excited about contributing to it. There's a power law distribution in the public attention that each paper receives, with a tendency for more eyeballs to land on works that push the envelope of what's possible in the field, which contradicts the intuition that rarity and prevalence should be inversely proportional.

It would be understandable if people who are primarily consumers of good solutions to hard technical problems have a tendency to underestimate how hard it is to generate them. And the best attempt of someone whose foundation isn't quite there yet can look like cargo culting on surface-level features instead of a reasonable extension of prior work. But I'm not satisfied with this explanation and would be interested in hearing other perspectives on why people tend to become susceptible to this category of errors.

  1. ^

    One possible factor could be that in certain circles, taking a pile of cash and setting it on fire makes you cool because it shows that you can do things which cost a lot of money. Thankfully, the vast majority of researchers I know are quite responsible and strive to minimize waste, so I don't think that's what's going on here. I do think it means we should be careful to mentally separate "startup founder with access to an impressive million-dollar cluster" from "person who is qualified to run and debug jobs on an impressive million-dollar cluster".
