Thanks, well done and worth the read!

Maxim,

Thank you for the detailed response comparing FTP to BLR.

In order to optimize the Minitab Event Probability model, you could set up the equation in Excel and then use Solver or DiscoverSim to perform the optimization.

John

John,

I dug a bit, stretching my matrix algebra skills beyond their limits, and here is what I ended up with while trying to investigate two issues mathematically.

1. Why the significance of terms (main effects and 2-way interactions) is so different between BLR and the FTP-transformed DOE.

2. Why even the directions of factor influence (resulting from main effects and 2-way interactions) differ between the FTP-transformed DOE and BLR (I had a particular situation where, to minimize Y, the defective rate, BLR said factor A should go up while DOE said it should go down). Here are my thoughts:

1. The significance difference is due to the difference in how the standard errors are calculated.

Here is DOE

Estimated Effects and Coefficients for C13 (coded units)

Term Effect Coef SE Coef T P

Constant 0.09534 0.007079 13.47 0.000

instruction -0.08692 -0.04346 0.007079 -6.14 0.002

form -0.01701 -0.00851 0.007079 -1.20 0.283

font -0.00046 -0.00023 0.007079 -0.03 0.975

method -0.00630 -0.00315 0.007079 -0.44 0.675

assistance 0.01252 0.00626 0.007079 0.88 0.417

instruction*font -0.01844 -0.00922 0.007079 -1.30 0.250

instruction*assistance -0.02022 -0.01011 0.007079 -1.43 0.213

form*font 0.05530 0.02765 0.007079 3.91 0.011

form*method -0.02801 -0.01401 0.007079 -1.98 0.105

font*method -0.02793 -0.01397 0.007079 -1.97 0.105

S = 0.0283147 PRESS = 0.0410482

R-Sq = 93.05% R-Sq(pred) = 28.81% R-Sq(adj) = 79.14%

The SE of a coefficient is calculated from a formula like SE(b_i) = sqrt[ MSE / (SSX_i * TOL_i) ],

where MSE is the mean square for error from the overall ANOVA summary, SSX_i is the sum of squares for the i-th independent variable, and TOL_i is the tolerance associated with the i-th independent variable.

TOL_i = 1 - R_i^2, where R_i^2 is determined by regressing X_i on all the other independent variables in the model.

Then the rest is trivial: T equals Coef/SE, and so on.
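As a quick sketch of that formula (with made-up response values, not the survey data above), one can check in NumPy that in a balanced coded design every term has TOL_i = 1 and therefore the same standard error, which is why the DOE output shows a single repeated SE Coef value:

```python
import numpy as np

# Hypothetical replicated 2^2 design in coded units. The -1/+1 columns
# (and their product) are mutually orthogonal, so TOL_i = 1 for each term.
A = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
B = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
y = np.array([0.12, 0.09, 0.05, 0.11, 0.10, 0.08, 0.06, 0.12])  # made-up y

X = np.column_stack([np.ones_like(A), A, B, A * B])   # intercept, A, B, A*B
b, *_ = np.linalg.lstsq(X, y, rcond=None)
n, p = X.shape
resid = y - X @ b
MSE = resid @ resid / (n - p)                         # mean square for error

# Matrix form: SE = sqrt(MSE * diag((X'X)^-1))
se_matrix = np.sqrt(MSE * np.diag(np.linalg.inv(X.T @ X)))

# Scalar form for factor A: SSX_A = n for a centered coded column, TOL_A = 1
SSX_A = float(A @ A)
se_scalar = np.sqrt(MSE / (SSX_A * 1.0))

print(se_matrix)   # all four SEs come out identical for this orthogonal design
```

With unbalanced or correlated predictors, TOL_i would drop below 1 and the SEs would no longer all be equal.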

For BLR:

Binary Logistic Regression: incomplete, total_app versus instruction, form, …

Link Function: Logit

Response Information

Variable Value Count

incomplete Event 642

Non-event 6283

total_app Total 6925

Logistic Regression Table

Predictor Coef SE Coef Z P Odds Ratio 95% CI Lower

Constant -1.93327 0.122754 -15.75 0.000

instruction

Y -0.677131 0.181209 -3.74 0.000 0.51 0.36

form

B -0.550191 0.145998 -3.77 0.000 0.58 0.43

font

medium -0.209086 0.155112 -1.35 0.178 0.81 0.60

method

computer 0.394121 0.168885 2.33 0.020 1.48 1.07

assistance

Y 0.350131 0.0975177 3.59 0.000 1.42 1.17

instruction*font

Y*medium -0.471691 0.217636 -2.17 0.030 0.62 0.41

instruction*assistance

Y*Y -0.442080 0.220134 -2.01 0.045 0.64 0.42

form*font

B*medium 1.33891 0.174881 7.66 0.000 3.81 2.71

form*method

B*computer -0.622187 0.175994 -3.54 0.000 0.54 0.38

font*method

medium*computer -0.532491 0.175341 -3.04 0.002 0.59 0.42

The standard errors of the coefficients can be retrieved from the inverse Hessian matrix computed during the model-fitting phase and can be used to give confidence intervals for the odds ratios. The standard error of the i-th coefficient is SE_i = sqrt( diag(H^-1)_i ), where the Hessian matrix H is built from the second derivatives of the log-likelihood with respect to all pairs of coefficients, with the direct second derivatives on the diagonal.

One thing visible from this matrix, and also from the Minitab output, is that the SE for DOE is equal for all terms, whereas the SEs for BLR are all different! This difference remains even when the sample sizes for all treatments are the same. It arises because the Hessian takes second derivatives of f (the logit in this case), which is non-linear. Visually, one can see this non-linearity by making a scatter plot of probability (the inverse logit of the linear Y) vs. a single X: the standard error around a probability of 0.5 is higher (the slope there is close to vertical).
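A minimal sketch of this, on made-up grouped counts (NOT the credit-application data): fit the logistic model by Newton-Raphson (IRLS) and read the SEs off the inverse Hessian H = X'WX, where W holds the binomial weights n_j * p_j * (1 - p_j). Even with identical sample sizes per treatment, the SEs differ because the weights depend on the fitted probabilities:

```python
import numpy as np

# Hypothetical 2-factor grouped data: intercept + two coded factors.
X = np.array([[1, -1, -1],
              [1, -1,  1],
              [1,  1, -1],
              [1,  1,  1]], dtype=float)
n = np.array([600.0, 600.0, 600.0, 600.0])   # same sample size per treatment
events = np.array([90.0, 30.0, 20.0, 10.0])  # made-up event counts

beta = np.zeros(3)
for _ in range(25):                           # Newton-Raphson iterations
    p = 1.0 / (1.0 + np.exp(-X @ beta))       # fitted event probabilities
    W = n * p * (1.0 - p)                     # binomial weights, vary by cell
    grad = X.T @ (events - n * p)             # score vector
    H = X.T @ (W[:, None] * X)                # information matrix (Hessian)
    beta += np.linalg.solve(H, grad)

se = np.sqrt(np.diag(np.linalg.inv(H)))
# The two factor SEs differ even though every cell has n = 600, because
# the weights p(1-p) vary across the cells.
print(beta, se)
```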

Here is the reason for the difference in significance.

Second point: the difference in the impact of the factors.

I have found one important difference in how interactions are taken into the predictions of the two models.

DOE codes the low level as -1 and the high level as +1, so for interactions (-1)*(-1) = 1 and (+1)*(+1) = 1: two treatments in the DOE structure contribute +1. BLR, however, thinks in (0, 1) terms, so it takes only the (+1, +1) point as the interaction; that is the level it lists in the coefficient table. In other words, the (-1, -1) point in DOE terms sits together with (-1, +1) and (+1, -1) and is treated as 0.

This is a massive difference, and one that needs to be taken into consideration when building a model in Excel from the BLR coefficients (initially I made the mistake of treating them as (+1, -1) coded for the interactions).
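The coding difference can be seen in a few lines (a toy illustration, not tied to the data above):

```python
import numpy as np

# Effects coding: low = -1, high = +1 (DOE convention).
# Dummy coding:   low =  0, high =  1 (how BLR lists the levels).
A_eff = np.array([-1, -1, 1, 1])
B_eff = np.array([-1, 1, -1, 1])
A_dum = (A_eff + 1) // 2
B_dum = (B_eff + 1) // 2

print(A_eff * B_eff)   # [ 1 -1 -1  1]: both (low,low) and (high,high) give +1
print(A_dum * B_dum)   # [0 0 0 1]: only the (high,high) cell switches it on
```

So when rebuilding the BLR prediction in Excel, the interaction coefficient must be applied only to the (+1, +1) treatment, not to (-1, -1).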

This explains for me practical differences in predictions.

Again – BLR is more trustworthy (but still a bit trickier in predictions).


Regards,

Maxim

John,

This is the reply from Minitab (thank you, Joel M. Smith, for the detailed answer!).

“Let's start with the transformation. You can read plenty about these transformations, including an alternate form for small values of x or n, at http://rfd.uoregon.edu/files/rfd/StatisticalResources/arcsin.txt. It is plain text but has plenty of information. The basics of why you would do it are (a) you end up with an easier-to-understand model and (b) you hopefully get residuals that behave well. Conversely, there are some big issues with it. One is that while the model may make more sense, your response is no longer easy to understand. So if you use Response Optimizer, for example, you are now optimizing some number that behaves in the right direction (minimizing the transformed response also minimizes the actual proportion) but gives a predicted value that doesn't make sense (if it predicts a transformed y of .18, that doesn't correspond to an 18% failure rate), so you then have to transform the predicted y back to real values. Additionally, depending on the dataset you could start to predict values less than 0 or greater than 1, which is obviously undesirable.

In your dataset, both x and n are fairly large, so there's really no need for a transformation anyway; I would just divide them and use the proportion as the response. The transformation is only potentially handy when these numbers are fairly small. With the proportion you get the easy model AND a response that is easy to interpret. But you could still predict outside of (0, 1), and you're still fitting a linear model to something that generally does not behave linearly.

BLR is really the best model (as you saw), and really the only drawback there is that the output is not as easy to understand and the model is difficult to interpret. But an easy way to find the optimal settings is to store the event probabilities when you run the model, then graph them on an Individual Value Plot and use Brushing with Set ID Variables to identify the combination with the desired response (for example, the smallest points on the graph).

Adding response optimization to BLR is on our list of to-dos in the software, so hopefully once we are able to add that, BLR will be usable enough that you don't even have to consider these issues!”

So essentially he says that FTP makes sense only for small values of both x and n. No definite borders were mentioned. If I am asked in a class I will probably come up with the magical < 30 for n and < 5 (or 20%) for x (even though I have no firm evidence). But I would really stick to Binary Logistic Regression. Some reasons for the differences between FTP-transformed DOE and BLR follow in a separate post.

Regards,

Maxim

Maxim, Michael,

Thank you for your comment. There is definitely a trade-off for BLR between statistical accuracy and difficulty of interpretation. I would be very interested to hear if you come up with a rule of thumb for the sample size at which the Freeman-Tukey transformation results are practically the same as BLR.

We recommend using the FTP transformation for binary outputs of a DOE before analyzing it. The FTP transform does the following: (arcsin(sqrt(x/(n+1))) + arcsin(sqrt((x+1)/(n+1)))) / 2 (I hope I made no mistakes with the brackets; x is the number of events and n the sample size). As you can see, it does take sample size into account, but its impact rapidly goes down as n goes up.
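For reference, here is the transform as a small Python function (a sketch of the formula as written above, not Minitab's internal code):

```python
import math

def freeman_tukey(x, n):
    """Freeman-Tukey double-arcsine transform of x events out of n trials."""
    return (math.asin(math.sqrt(x / (n + 1))) +
            math.asin(math.sqrt((x + 1) / (n + 1)))) / 2

# The sample-size correction (the gap between the two arcsine terms)
# shrinks quickly as n grows, for the same proportion x/n:
print(freeman_tukey(6, 60))      # small n: the two terms still differ a bit
print(freeman_tukey(600, 6000))  # large n: essentially arcsin(sqrt(x/n))
```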

Recently I was preparing a data set for training and decided to compare the results of analysis via DOE and Binary Logistic Regression. The results were nothing short of shocking.

The data set was 5 factors, half factorial, with a sample size of 600 per treatment. The output was the number of incomplete credit card applications. DOE on the FTP-transformed data found only 2 terms significant. BLR found 9, with 5 interactions! That is obviously due to the fact that BLR takes sample size into the significance calculations.

Then I took the significant terms from BLR into DOE and ran the Response Optimizer. And I got a second big surprise. Even the direction of a factor's impact, calculated from main effects and interactions, was different. DOE would say increase A to decrease the number of defectives; BLR would say decrease A! This is obviously due to the difference between the purely linear model of term impacts in DOE and the log transform of BLR.

Again, it should be no surprise that the results differ if one looks at the actual math behind these 2 tools. However, I did not expect the difference to be so practically significant!

My takeaway: use BLR if you know how to run it and read the results. However, for some LSS students, playing with interactions in BLR and building the final transfer function for probability with the backward LN (inverse logit) transform manually might be a tough call.
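For anyone attempting that manual back-transform, it is just the inverse logit of the linear predictor. A sketch using the Constant, instruction, assistance, and instruction*assistance coefficients from the BLR table earlier in the thread (dummy 0/1 coding, as BLR reports the levels); the other significant terms are omitted here for brevity, so this is illustrative, not the full model:

```python
import math

# Coefficients copied from the Minitab BLR output above (subset of terms).
b0, b_instr, b_assist, b_int = -1.93327, -0.677131, 0.350131, -0.442080

def event_probability(instruction, assistance):
    """Factors coded 0/1 (dummy coding, as BLR reports the levels)."""
    eta = (b0 + b_instr * instruction + b_assist * assistance
           + b_int * instruction * assistance)  # interaction "on" only at (1, 1)
    return 1.0 / (1.0 + math.exp(-eta))         # inverse logit: p = e^eta / (1 + e^eta)

print(event_probability(0, 0))  # baseline cell
print(event_probability(1, 1))  # instruction = Y, assistance = Y
```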

So we are investigating with Minitab the possible limits of using FTP-transformed data within the DOE functionality (I guess it will be something like sample size < 5 or so). If you wish, I can keep you guys posted on the development.

Regards,

Maxim Korenyugin and Michael Ohler

John,

you are spot on. I recommend using the FTP transform from Minitab, which does the following:

(arcsin(sqrt(x/(n+1))) + arcsin(sqrt((x+1)/(n+1)))) / 2 (I hope I made no mistakes re-typing it; x is the number of events and n the sample size).

As you can see, it does take sample size into account, but its impact on the transformed value is very limited, especially at larger sample sizes.

I ran across this problem when preparing a data set for a training session, and started comparing the results of FTP-transformed analysis via classical DOE vs. Binary Logistic Regression (BLR). It was a 5-factor half factorial. The results were little short of shocking. While DOE found only 2 terms significant, BLR found 7, including 2 interactions. This is obviously a result of taking the sample sizes directly into the significance calculations. Then I took the significant terms from BLR, added them to the DOE, and ran the Response Optimizer. There I got my second big surprise: even the direction of the combined impact of main terms and interactions differed between DOE and BLR. In other words, for term A, BLR would say "increase" while DOE said "decrease" to minimize the output (the output was the proportion of defective credit application forms). This is obviously due to the linear vs. non-linear way of modeling the impact of terms in DOE and BLR respectively. Again, simply by looking at the math one should expect some differences, but not such big ones!

I found that practical difference diminishes when sample size goes down.

Bottom line: for anyone who is advanced enough, I would recommend BLR without hesitation (with 2-way interactions as well). For some LSS students, however, it might be a tough call. So we are talking to Minitab, trying to define a border below which FTP-transformed DOE would still be OK (like sample size < 30 or something). I can keep you posted as soon as I hear more from them.

Regards,

Maxim

Thanks for your interest in the article.

Table-1 is just a sample of 1 set of experiments with 10 replicates. Due to space constraints it was difficult to display the entire set of experiments. If you look at Table-2, I have given a summary of the results with the converted proportion values. Hope that clarifies.

The way to do that (and thus also avoid the need to transform) is to use binary logistic regression.
