"GPT-4's release was delayed by ~8 months because they wanted to do safety testing"

I have heard this claim before (with 6 months). It could be understood as "GPT-4 was ready to go 6 months earlier; they simply did a lot of testing to go the extra mile."

Alternatively, this is how long it took to make the foundational model useful, and while they did spend extra resources on red teaming etc. in parallel, that work did not delay the release by much.

Are we sure they didn't just count the time to RLHF it? It seems plausible to me that it always takes ~20% of dev time to RLHF a model. (epistemic status: spitballing)