I've had the same problem with hotel key cards. I'll have to try this!
we run out of data in 2028-2031
Is running out of data still a problem? It sounds like we've already moved to a paradigm of creating new, higher-quality data rather than relying on random data from the internet. In some sense we ran out of data a while ago, and progress hasn't stopped because we're just making more of it now.
Since you're showing the model its previous answer (but not the CoT that led to it), is it possible the high success rate when obfuscating is because the model already knows the right answer?
For example, in your "With failure" retry image, the AI can read its own previous answer, "Based on my analysis, alpha is recommended...".
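To make the concern concrete, here's a minimal sketch of what I'm imagining the retry setup looks like, where the previous answer is fed back in but the previous CoT is not. The prompt wording and function names are my own assumptions, not your actual setup:

```python
# Hypothetical sketch of a retry prompt that exposes the prior answer but
# hides the prior chain of thought. All wording and names are assumptions.

def build_retry_prompt(task: str, previous_answer: str) -> str:
    """Build a second-attempt prompt containing the previous answer only."""
    return (
        f"{task}\n\n"
        "Your previous attempt was marked as a failure. "
        "Here is the answer you gave last time:\n"
        f"{previous_answer}\n\n"
        "Please try again."
    )

prompt = build_retry_prompt(
    task="Decide whether to recommend project alpha or beta.",
    previous_answer="Based on my analysis, alpha is recommended...",
)
# The retrying model can simply restate "alpha" from the quoted answer,
# which would inflate the apparent success rate under obfuscation.
print(prompt)
```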
125°F, one of the temperatures mentioned in the article, is not hot enough to kill bacteria, and is thus in one of the worst parts of the Danger Zone.
While it is slightly safer to cook at a slightly higher temperature, this is on the extreme edge of the danger zone and is probably a safe temperature to sous vide at for reasonable periods of time if you're confident in your thermometer, with the caveat that it won't pasteurize the inside of the meat (although we're usually more worried about the outside).
Douglas Baldwin suggests cooking at 130°F because one type of bacteria (Clostridium perfringens) can keep multiplying up to 126.1°F, but if you look at the growth rate in more detail, it's already growing very slowly at 122°F (50°C), around 1/6th of the rate at the worst temperature (~109°F).
Is your goal here to isolate the aspect of my response that'll keep you right that "legal regulatory capture isn't happening" for as long as you can?
I'm not the person you're arguing with, but wanted to jump in to say that pushing back on the weakest part of your argument is a completely reasonable thing for them to do, and I find it weird that you're implying there's something wrong with that.
I also think you're missing how big a problem it is that companies don't actually know how to prevent LLMs from giving legal advice. Maybe companies could add strong enough guardrails to hosted models to at least make it not worth the effort to ask them for legal advice, but they definitely don't know how to do this for downloadable models.
That said, I could believe in a future where lawyers force the big AI companies to make their models too annoying to easily use for legal advice, and prevent startups from making products directly designed to offer AI legal advice.
The reason I'm skeptical of this is that it doesn't seem like you could enforce a law against using AI for legal research. As much as lawyers might want to ban this as a group, individually they all have strong incentives to use AI anyway and just not admit it.
Although this assumes doing research and coming up with arguments is most of their job. It could be that most of their job is harder to hand off to an AI secretly, like meeting with clients and making arguments in court.
It seems like it would be hard to detect if smart lawyers are using AI, since (I think) lawyers' work is easier to verify than it is to generate. If a smart lawyer has an AI do research and come up with an argument, and then they verify that all of the citations make sense, the only way to tell they're using AI is that they worked anomalously quickly.
Start watching 4K videos on streaming services when possible, even if you don't have a 4K screen. You won't benefit from the increased resolution since your device will downscale it back to your screen's resolution, but you will benefit from the increased bitrate that the 4K video probably secretly has.
I'm not sure if anyone still does this, but there was also a funny point early in the history of 4K streaming where people would encode 4K video at the same bitrate as 1080p, so they could technically advertise that their video was 4K, even though it was completely pointless since it didn't actually have any more detail than the 1080p video.
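As a rough illustration of why that was pointless, here's a quick back-of-the-envelope calculation of bits per pixel; the bitrates are made-up examples, not any particular service's numbers:

```python
# Back-of-the-envelope bits-per-pixel comparison. The bitrates below are
# illustrative assumptions, not measurements from any real streaming service.

def bits_per_pixel(bitrate_mbps: float, width: int, height: int, fps: float = 24.0) -> float:
    """Average encoded bits available per pixel per frame."""
    return (bitrate_mbps * 1_000_000) / (width * height * fps)

streams = {
    "1080p at 8 Mbps": (8.0, 1920, 1080),
    "4K at 8 Mbps":    (8.0, 3840, 2160),   # same bitrate spread over 4x the pixels
    "4K at 16 Mbps":   (16.0, 3840, 2160),  # a tier that actually has more bits to spend
}

for name, (mbps, w, h) in streams.items():
    print(f"{name}: {bits_per_pixel(mbps, w, h):.3f} bits/pixel")
```

The 4K-at-1080p-bitrate row ends up with a quarter of the bits per pixel of the 1080p row, which is why it can't carry any extra detail.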
I've been thinking about neuralese recently, and I'm wondering if there's more of a tradeoff here with interpretability. If we could actually train a model to use neuralese[1], we might be able to make it much smaller and easier to interpret (since it wouldn't need multiple layers to express long-term dependencies). This would make the tokenized output less interpretable, but it would potentially remove inner layers that are even less interpretable.
What do you think?
I'm actually very skeptical that this is possible without a major breakthrough or some very expensive data.