Qumeric

I am curious to see what the results of the new Gemini 2.5 Pro would be on internal benchmarks.
I don't think ketamine neurotoxicity is a thing. Ketamine is actually closer to being neuroprotective.
I am confident that LLMs significantly boost software development productivity (I would say 20-50%) and am completely sure it's not even close to 5x.
However, although I agree with your conclusion, I would like to point out that the timeframes are pretty short. Two years ago (~exactly the GPT-4 launch date), LLMs were barely making any impact. I think the tools only started to resemble their current state around a year ago (~exactly the Claude 3 Opus launch date).
Now, suppose we had a 5x boost for a year. Would it be very visible? We would have gotten 5 years of progress in 1 year, but did the software landscape change that much over 5 years in the pre-LLM era? Comparing 2017 and 2022, I don't feel like that much changed.
The tech stack has shifted almost entirely to whatever there was the most training data on; Python and JavaScript/TypeScript are in, almost everything else is out.
I think AI agents will actually come to prefer strongly typed languages because they provide more feedback. I work with TypeScript, Python, and Rust; a year ago the first two were clearly winning in terms of AI productivity boost, but nowadays I find Cursor Agent making fewer mistakes with Rust.
I think you might find this paper relevant/interesting: https://aidantr.github.io/files/AI_innovation.pdf
TL;DR: Research on LLM productivity impacts in materials discovery.
Main takeaways:
I would like to note that this dataset is not as hard as it might look. Humans performed not so well because there is a strict time limit; I don't remember exactly, but it was something like 1 hour for 25 tasks (and IIRC the medalist only made arithmetic errors). I am pretty sure any IMO gold medalist would typically score 100% given (say) 3 hours.
Nevertheless, it's very impressive, and AIMO results are even more impressive in my opinion.
Thanks, I think I understand your concern well now.
I am generally positive about the potential of prediction markets if we somehow resolve the legal problems (which seems unrealistic in the short term but realistic in the medium term).
Here is my perspective on "why should a normie, who is somewhat risk-averse, doesn't enjoy wagering for its own sake, and doesn't care about the information externalities, engage with prediction markets?"
First, let me try to tackle the question at face value:
Good to know :)
I do agree that subsidies run into a tragedy-of-the-commons scenario. So although subsidies are beneficial, they are not sufficient.
But do you find my solution to be satisfactory?
I thought about it a lot, I even seriously considered launching my own prediction market and wrote some code for it. I strongly believe that simply allowing the usage of other assets solves most of the practical problems, so I would be happy to hear any concerns or further clarify my point.
Or another, perhaps easier, solution (I updated my original answer): just allow the market company/protocol to invest the money that is "locked" until resolution into some profit-generating strategy and share the profit with users. Of course, it should be diversified, both in terms of the investment portfolio and across individual markets (users get the same annual rate of return, no matter what particular thing they bet on). It has some advantages and disadvantages, but I think it's a more clear-cut solution.
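To make the mechanism concrete, here is a minimal sketch of the profit-sharing step. The function name and the flat pro-rata scheme are my own illustration, not a real protocol: all locked stakes are pooled, and the portfolio's profit is distributed at a single uniform rate, regardless of which market each stake sits in.

```python
def share_portfolio_yield(locked_stakes: dict[str, float],
                          portfolio_profit: float) -> dict[str, float]:
    """Distribute pooled investment profit pro-rata across locked stakes.

    Every user earns the same rate of return, no matter which
    individual market their money is locked in.
    """
    total_locked = sum(locked_stakes.values())
    rate = portfolio_profit / total_locked  # one uniform rate for everyone
    return {user: stake * rate for user, stake in locked_stakes.items()}


# Example: $400 locked in total, the pooled portfolio earned $20 (5%).
payouts = share_portfolio_yield({"alice": 100.0, "bob": 300.0}, 20.0)
# alice receives 5.0, bob receives 15.0 -- same 5% rate for both.
```

The key design choice is that the rate is computed over the whole pool, so a user's return does not depend on the resolution date or topic of their particular market.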
Isn't this just changing the denominator without changing the zero- or negative-sum nature?
I feel like you are mixing two problems here: an ethical problem and a practical problem. UPD: on second thought, maybe you only meant the second problem, but I still think my response is clearer if it considers them separately.
The ethical problem is that prediction markets appear not to generate income, and thus seem not useful, shouldn't be endorsed, and don't differ much from gambling.
While it's true that they don't generate income and are zero-sum games in a strictly monetary sense, they do generate positive externalities. For example, there could be a prediction market about an increase... (read 459 more words →)
I do think we are in a coding overhang.
Current harnesses seem far from the ceiling and could be improved a lot. One example: you can significantly boost output quality with simple tricks, like telling Claude Code to implement something and then explicitly asking it to self-review. You could get slightly more creative: say, ask it to implement something security-sensitive, then ask it to break it.
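The self-review trick above can be sketched as a two-pass loop. `call_model` here is a hypothetical stand-in (a stub, not a real API) for whatever agent backend the harness uses; the point is only the shape of the harness: one pass to implement, a second pass to critique and revise.

```python
def call_model(prompt: str) -> str:
    # Stub standing in for a real agent call (e.g. Claude Code).
    # It echoes back a marker so the control flow is visible.
    if prompt.startswith("Review"):
        return "reviewed: " + prompt.split("---")[-1].strip()
    return "draft implementation"


def implement_with_self_review(task: str) -> str:
    """Two-pass harness: implement, then explicitly self-review."""
    draft = call_model(f"Implement the following:\n{task}")
    final = call_model(
        "Review the code below for bugs and security issues, "
        f"then return a corrected version.\n---\n{draft}"
    )
    return final
```

The same skeleton extends to the adversarial variant: replace the review prompt with "try to break this code", and feed any successful attack back into a third pass.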
And I feel the same way Andrej does. Even though I consider myself relatively adept at using AI agents, I feel like I am doing quite a poor job. In principle, these tools could be utilized much more efficiently.
But perhaps an even larger bottleneck is at the organizational level.... (read more)