AI #89: Trump Card

[-]niplav10mo109

Finally, note to self, probably still don’t use SQLite if you have a good alternative? Twice is suspicious, although they did fix the bug same day and it wasn’t ever released.

SQLite is well-known for its incredibly thorough test suite and relatively few CVEs, and with ~156kloc (excluding tests) it's not a very large project, so I think this would be an over-reaction. I'd guess that other databases have more and worse security vulnerabilities due to their attack surface—see MySQL with its ~4.4mloc (including tests). Big Sleep was probably now used on SQLite because it's a fairly small project of which large parts can fit into an LLMs' context window.

Maybe someone will try to translate the SQLite code to Rust or Zig using LLMs—until then we're stuck.

[-]Brendan Long11mo43

Finally, note to self, probably still don’t use SQLite if you have a good alternative? Twice is suspicious, although they did fix the bug same day and it wasn’t ever released.

But is this because SQLite is unusually buggy, or because its code is unusually open, short and readable and thus understandable by an AI? I would guess that MySQL (for example) has significantly worse vulnerabilities but they're harder to find.

[-]Zac Hatfield-Dodds11mo1325

SQLite is ludicrously well tested; similar bugs in other databases just don't get found and fixed.

[-]cwillu11mo110

There are severe issues with the measure I'm about to employ (not least is everything listed in https://www.sqlite.org/cves.html) , but the order of magnitude is still meaningful:

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=sqlite 170 records

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=postgresql 292 records (+74 postgres and maybe another 100 or so under pg; the specific spelling “postgresql” isn't used as consistently as “sqlite” and “mysql” is)

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=mysql 2026 records

[-]npostavs11mo65

Finding two bugs in a large codebase doesn't seem especially suspicious to me.

[-]Vladimir_Nesov10mo*31

Sully likes Claude Haiku 3.5 but notes that it’s in a weird spot after the price increase - it costs a lot more than other small models

The price is $1 per million input tokens (which are compute bound, so easier to evaluate than output tokens), while Llama-3-405B costs $3.5. At $2 per H100-hour we buy 3600 seconds of 1e15 FLOP/s at say 40% utilization, $1.4e-18 per useful FLOP. So $1 buys 7e17 useful FLOPs, or inference with 75-120B^[1] active parameters for 1 million tokens. That's with zero margin and perfect batch size, so should be smaller.

Edit: 6ND is wrong, counts computation of gradients that's not done during inference. So the corrected estimate would suggest that the model could be even larger, but anchoring to open weights API providers says otherwise, still points to about 100B.

Estimate of compute for a dense transformer is 6ND (N is number of active parameters, D number of tokens), a recent Tencent paper says they estimate about 9.6ND for a MoE model (see Section 2.3.1). I get 420B with the same calculation for $3.5 of Llama-3-405B (using 6ND, since it's dense), so that checks out. ↩︎

[-]anaguma10mo32

So $1 buys 7e17 useful FLOPs, or inference with 75-120B^[1] active parameters for 1 million tokens.

Is this right? My impression was that the 6ND (or 9.6 ND) estimate was for training, not inference. E.g. in the original scaling law paper, it states

C ~ 6 NBS – an estimate of the total non-embedding training compute, where B is the batch size, and S is the number of training steps (ie parameter updates).

[-]Vladimir_Nesov10mo31

Yes, my mistake, thank you. Should be 2ND or something when not computing gradients. I'll track down the details shortly.

[-]Shankar Sivarajan10mo30

J. D. Vance's (may he live forever) tweets about AI safety and open source (from March 3, 2024), replying to Vinod Khosla's advocacy for more centralized control:

link

There are undoubtedly risks related to AI. One of the biggest:
A partisan group of crazy people use AI to infect every part of the information economy with left wing bias. Gemini can’t produce accurate history. ChatGPT promotes genocidal concepts.
The solution is open source

and link

If Vinod really believes AI is as dangerous as a nuclear weapon, why does ChatGPT have such an insane political bias? If you wanted to promote bipartisan efforts to regulate for safety, it's entirely counterproductive.
Any moderate or conservative who goes along with this obvious effort to entrench insane left-wing businesses is a useful idiot.
I'm not handing out favors to industrial-scale DEI bullshit because tech people are complaining about safety.

He also said in a Senate Hearing about AI (around 1:28:25 in the video. See transcript):

You know, very often, CEOs, especially of larger technology companies that I think already have advantaged positions in AI, will come and talk about the terrible safety dangers of this new technology and how Congress needs to jump up and regulate as quickly as possible. And I can't help but worry that if we do something under duress from the current incumbents, it's gonna be to the advantage of those incumbents and not to the advantage of the American consumer.

[-]Viliam10mo20

Everyone, definitely click the "Claude being funny" link.

favorite human interaction is when they ask me to proofread something and i point out a typo and they do 'are you SURE?' like i haven't analyzed every grammatical rule in existence. yes karen, there should be a comma there. i don't make the rules i just have them burned into my architecture

[-]dirk10mo10

As with ChatGPT this looks suspiciously like an exact copy of their website.

While Anthropic's app is plausibly a copy, the ChatGPT app lacks feature parity in both directions (e.g. you can't search chats on desktop—though that will soon be changing—and you can't thumbs-up a response or switch between multiple generated responses in-app), so I think there's real development effort going on there.

[-]Templarrr10mo10

they’re 99% sure are AI-generated, but the current rules mean they can’t penalise them.
The issue is proving it.

That is very much not the issue. The issue is that academy spent last few hundred years to make sure papers are written in the most inhuman way possible. No human being ever talks like whitepapers are written. The "we can't distinguish if this was written by a machine or human that is really good at pretending being one" can't be a problem if it was heavily encouraged for centuries. Also fun reverse-Turing test situation.