Request for Information on synthetic data evaluations
Causal effect predictiveness of a surrogate asks whether the effect of an intervention on the surrogate, S, is predictive of the effect of that intervention on the gold-standard outcome. I’m still looking for a causal effect predictiveness angle on making decisions from synthetic data.
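To make the definition above concrete, here is a minimal simulation sketch of a trial-level version of the idea: across many randomized trials, does the estimated effect on S track the estimated effect on the gold-standard outcome Y? Everything in it is an assumption made for illustration (the Gaussian outcomes, the `slope` linking the two true effects, the number of trials, the arm sizes), and it is only one way to operationalize the question, not a definitive estimator.

```python
import random


def diff_in_means(treated, control):
    """Estimate a treatment effect as the difference in arm means."""
    return sum(treated) / len(treated) - sum(control) / len(control)


def pearson(xs, ys):
    """Pearson correlation, written out to avoid extra dependencies."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def trial_level_correlation(slope, n_trials=30, n_per_arm=200, seed=0):
    """Correlate estimated effects on S with estimated effects on Y.

    `slope` is a hypothetical parameter: the true effect on Y in each
    simulated trial is slope * (true effect on S) plus a little noise.
    """
    rng = random.Random(seed)
    effects_on_s, effects_on_y = [], []
    for _ in range(n_trials):
        # Assumed true effects for this trial (illustrative, not real data).
        true_s = rng.gauss(0.5, 0.5)
        true_y = slope * true_s + rng.gauss(0.0, 0.1)

        # Simulate unit-variance S and Y in the control and treated arms.
        s_control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        s_treated = [rng.gauss(true_s, 1.0) for _ in range(n_per_arm)]
        y_control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        y_treated = [rng.gauss(true_y, 1.0) for _ in range(n_per_arm)]

        effects_on_s.append(diff_in_means(s_treated, s_control))
        effects_on_y.append(diff_in_means(y_treated, y_control))
    return pearson(effects_on_s, effects_on_y)


if __name__ == "__main__":
    # slope=1.0: effects on S carry over to Y; slope=0.0: they do not.
    print("informative surrogate:  ", trial_level_correlation(1.0))
    print("uninformative surrogate:", trial_level_correlation(0.0))
```

A high correlation says that trials which move S also move Y, which is the sense in which the surrogate is predictive here; a correlation near zero says that an effect on S tells you little about the effect on Y, however well S and Y correlate within any single arm.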
So far, I've found two interesting commentaries on evaluating "synthetic datasets", one in the Lancet and the other from the FDA.
"Although synthetic data address crucial shortages of real-world training data, their overuse might propagate biases, accelerate model degradation, and compromise generalisability across populations. A concerning consequence of the rapid adoption of synthetic data in medical AI is the emergence of synthetic trust—an unwarranted confidence in models trained on artificially generated datasets that fail to preserve clinical validity or demographic realities."
Suchi Saria says:
I agree simulation can help when we are starting with real data and then simulating alternative scenarios (eg additional missingness, different order protocols) for which there is realistic support (often hard for people w/o stats/causal inf background to deduce) but majority of efforts around simulation data that I see are mostly an engineering exercise in generating data without understanding whether the inferences we are drawing from it are valid.
Clozapine has been a highly underutilized drug for decades https://psychiatryonline.org/doi/full/10.1176/appi.ps.201700162
40+ years ago, when people were investigating the mechanism of action of clozapine, they already knew that some of its benefits might have to do with muscarinic receptors as well. In fact, some of these investigations were done at Pfizer! https://paperpile.com/shared/shUc~GTZ0T3qFvQ84LykSmA
So the story that Cobenfy has a brand-new mechanism of action isn't quite right, but it is the first approved drug to deliberately target muscarinic receptors.