Comments
What is the point of these benchmarks without knowing the training compute and data? One of the main open questions is the interpretability of these models. Iterative refinement of them may open new opportunities.

People with a history of seizures are usually excluded from these kinds of clinical trials, so it is not an apples-to-apples comparison. The problem is that bupropion interacts with a lot of drugs. Seizure rates are also highly dose-dependent (about ten times higher above 450 mg daily). Generally, if you're not taking any interacting medications, are on the 150–300 mg slow-release version, and have no history of seizures, then the risk is low.


As a doctor, I can tell you that even if you don't have anxiety, it's possible to develop some while taking bupropion/Wellbutrin. I used it personally and experienced the most severe anxiety I've ever had. It is also associated with a higher chance of seizures, and if you daydream a lot, it may make that worse. However, on the positive side, it often decreases inattention. Generally I like the drug, but it is not a first-line treatment for depression, and for good reasons.

I lived for a while in a failing country with high unemployment. The businesses and jobs that pay well become saturated very quickly. People are less likely to spend money and often delay purchasing new things or maintaining their homes. Many jobs exist only because we don't have time to do them ourselves, and a significant number of those jobs will simply vanish. It is really hard to prepare for a high-unemployment society.

Overrefusal issues were much more common 1-2 years ago. Models like Gemini 1 and Claude 1-2 had severe overrefusal problems.

Your argument is certainly possible, but what evidence do you have that makes it the likely outcome?

The difficulty of alignment is still unknown. It may be totally intractable, or maybe some changes to current methods (deliberative alignment or Constitutional AI) plus some R&D automation can get us there.

The recurrent paper is actually scary, but some of the results there are questionable. Are 8 layers enough for a 3.5B model? Qwen 0.5B has 24 layers. There is also almost no difference between the 180B and 800B models when r=1 (Table 4). Is the recurrence here just compensating for an insufficient number of layers?
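As a back-of-the-envelope check (my own sketch, not from the paper), the standard ~12·L·d² approximation for transformer non-embedding parameters shows just how wide an 8-layer 3.5B model has to be compared with a 24-layer 0.5B one:

```python
import math

def implied_width(total_params: float, n_layers: int) -> int:
    """Solve 12 * n_layers * d^2 ~= total_params for the hidden dim d.

    Uses the rough rule of thumb that attention + MLP contribute
    ~12 * d^2 parameters per layer; embeddings are ignored.
    """
    return round(math.sqrt(total_params / (12 * n_layers)))

# An 8-layer 3.5B model needs a very wide hidden dimension:
print(implied_width(3.5e9, 8))    # -> 6038

# A 0.5B model spread over 24 layers is far narrower:
print(implied_width(0.5e9, 24))   # -> 1318
```

This is purely illustrative; real models (including Qwen 0.5B) deviate from the 12·L·d² rule because of embedding tying, grouped-query attention, and MLP width ratios.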

Would you update your timelines if he is telling the truth?

Is CoT faithfulness already obsolete? How does it survive concepts like latent-space reasoning or RL-based manipulation (R1-Zero)? Is it realistic to think that these highly competitive companies will simply not use them and ignore the compute efficiency gains?
