There is a paper that shows the overreliance of in-context learning on superficial clues. it is from 2022, and the tested models are old. So, maybe newer ones are doing much better,but maybe it is not really learning, at least by some definitions.
i think the IMO result is best of 32 and USAMO is not
OpenAI is competing in the AtCoder world tour finals (heuristic division) with a new model/agent. It is a 10-hour competition with an optimization-based problem, and OpenAI's model is currently at 2nd place.
Is optimizing CoT to look nice a big concern? There are other ways to show a nice CoT without optimizing for it. The frontrunners also have some incentives to not show the real CoT. Additionally, there is a good chance that people prefer a nice structured summary of CoT by a small LLM when reasonings become very long and convoluted.
What is the point of these benchmarks without knowing the training compute and data ? One of the main questions is their interpretability. Iterative refinement of these models may open new opportunities.
People with a history of seizures are usually excluded from these kinds of clinical trials, so it is not an apple to apple comparison. the problem is that bupropion interacts with a lot of drugs. seizure rates are also highly dose dependent(10 times higher if taking more than 450 mg daily). Generally, if you’re not taking any interacting medications, are on the 150–300 mg slow-release version, and have no history of seizures, then the risk is low.
As a doctor, I can tell you that even if you don’t have anxiety, it’s possible to develop some while taking bupropion/welbutrin. I used it personally and experienced the most severe anxiety I’ve ever had. It is also associated with a higher chance of seizures, and if you daydream a lot, it may make them worse. However, on the positive side, it often decreases inattention. Generally i like the drug , but it is not a first-line treatment for depression, and for good reasons.
I lived for a while in a failing country with high unemployment. The businesses and jobs that pay well become saturated very quickly. People are less likely to spend money and often delay purchasing new stuff or maintaining their homes. Many jobs exist because we dont have time to do them ourselves, and a significant number of these jobs will just vanish. It is really hard to prepare for a high unemployment rate society.
Overrefusal issues were way more common 1-2 years ago. models like gemini 1, and claude 1-2 had severe overrefusal issues.
ED is not the only problem with finasteride. I saw a couple of cases of gynecomastia in medical school and stopped using finasteride after that. Minoxidil worked fine solo for 4 years, but applying it every night was annoying, and when I stopped using it, I went bald fast (Norwood 6 in 5 months!).