Sunday, May 31, 2015

Paper Helicopter Experiment, part III

As final part of my paper helicopter experiment analysis (part I, part II) I do a reanalysis for one more data set. In 2002 Erik Erhardt and Hantao Mai did an extensive experiment, see The Search for the Optimal Paper Helicopter. They did a number of steps, including variable screening, steepest ascend and confirmatory experiment. For my part, I have combined all those data in one data set, and checked what kind of model would be used.

Data

The data extracted contains 45 observations. These observations have a number of replications, for instance a central composite design has a replicated center and the optimum found has been repeatedly tested.
After creation of a factor combining all variables it is pretty easy to examine the replications. The replications are thus. Here the first eight variables are the experimental settings, allvl is the factor combining all levels, Time is response and Freq the frequency of occurrence for allvl:
   RotorLength RotorWidth BodyLength FootLength FoldLength FoldWidth
1         8.50       4.00        3.5       1.25          8       2.0
2         8.50       4.00        3.5       1.25          8       2.0
3         8.50       4.00        3.5       1.25          8       2.0
4         8.50       4.00        3.5       1.25          8       2.0
5         8.50       4.00        3.5       1.25          8       2.0
6         8.50       4.00        3.5       1.25          8       2.0
7        11.18       2.94        2.0       2.00          6       1.5
8        11.18       2.94        2.0       2.00          6       1.5
9        11.18       2.94        2.0       2.00          6       1.5
10       11.18       2.94        2.0       2.00          6       1.5
11       11.18       2.94        2.0       2.00          6       1.5
12       11.18       2.94        2.0       2.00          6       1.5
13       11.50       2.83        2.0       1.50          6       1.5
14       11.50       2.83        2.0       1.50          6       1.5
15       11.50       2.83        2.0       1.50          6       1.5
   PaperWeight DirectionOfFold                                 allvl  Time Freq
1        heavy         against  8.5.4.0.3.5.1.2. 8.2.0.heavy.against 13.88    3
2        heavy         against  8.5.4.0.3.5.1.2. 8.2.0.heavy.against 15.91    3
3        heavy         against  8.5.4.0.3.5.1.2. 8.2.0.heavy.against 16.08    3
4        light         against  8.5.4.0.3.5.1.2. 8.2.0.light.against 10.52    3
5        light         against  8.5.4.0.3.5.1.2. 8.2.0.light.against 10.81    3
6        light         against  8.5.4.0.3.5.1.2. 8.2.0.light.against 10.89    3
7        light         against 11.2.2.9.2.0.2.0. 6.1.5.light.against 17.29    6
8        light         against 11.2.2.9.2.0.2.0. 6.1.5.light.against 19.41    6
9        light         against 11.2.2.9.2.0.2.0. 6.1.5.light.against 18.55    6
10       light         against 11.2.2.9.2.0.2.0. 6.1.5.light.against 15.54    6
11       light         against 11.2.2.9.2.0.2.0. 6.1.5.light.against 16.40    6
12       light         against 11.2.2.9.2.0.2.0. 6.1.5.light.against 19.67    6
13       light         against 11.5.2.8.2.0.1.5. 6.1.5.light.against 16.35    3
14       light         against 11.5.2.8.2.0.1.5. 6.1.5.light.against 16.41    3
15       light         against 11.5.2.8.2.0.1.5. 6.1.5.light.against 17.38    3

Transformation

It is also possible to do a regression of Time against allvl and examine the residuals. Since it is not difficult to imagine that error is proportional to elapsed time this is done for both original data and log10 transformed data.
As can be seen it seems that larger values have the larger error, but it is not really corrected very much by a log transformation. To examine this a bit more, the Box-Cox transformation is used. From there it seems square root is almost optimum, but log and no transformation should also work. It was decided to use a square root transformation.
Given the square root transformation the residual error should not be lower than 0.02, since that is what the replications have. On the other hand, much higher than 0.02 is a clear sign of under fitting.
Analysis of Variance Table

Response: sqrt(Time)
          Df  Sum Sq Mean Sq F value    Pr(>F)    
allvl      3 1.84481 0.61494      26 2.707e-05 ***
Residuals 11 0.26016 0.02365                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model selection

Given the residual variance desired, the model linear in variables is not sufficient.
Analysis of Variance Table

Response: sTime
                Df Sum Sq Mean Sq F value    Pr(>F)    
RotorLength      1 3.6578  3.6578 18.4625 0.0001257 ***
RotorWidth       1 1.0120  1.0120  5.1078 0.0299644 *  
BodyLength       1 0.1352  0.1352  0.6823 0.4142439    
FootLength       1 0.2719  0.2719  1.3725 0.2490708    
FoldLength       1 0.0060  0.0060  0.0302 0.8629331    
FoldWidth        1 0.0189  0.0189  0.0953 0.7592922    
PaperWeight      1 0.6528  0.6528  3.2951 0.0778251 .  
DirectionOfFold  1 0.4952  0.4952  2.4994 0.1226372    
Residuals       36 7.1324  0.1981                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Adding interactions and quadratic effects via stepwise regression did not improve much.
Analysis of Variance Table

Response: sTime
                       Df Sum Sq Mean Sq F value    Pr(>F)    
RotorLength             1 3.6578  3.6578 29.5262 3.971e-06 ***
RotorWidth              1 1.0120  1.0120  8.1687  0.007042 ** 
FootLength              1 0.3079  0.3079  2.4851  0.123676    
PaperWeight             1 0.6909  0.6909  5.5769  0.023730 *  
I(RotorLength^2)        1 2.2035  2.2035 17.7872  0.000159 ***
I(RotorWidth^2)         1 0.3347  0.3347  2.7018  0.108941    
FootLength:PaperWeight  1 0.4291  0.4291  3.4634  0.070922 .  
RotorWidth:FootLength   1 0.2865  0.2865  2.3126  0.137064    
Residuals              36 4.4598  0.1239                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '
Just adding the quadratic effects did not help either. However, using both linear and quadratic as a starting point did give a more extensive model.
Analysis of Variance Table

Response: sTime
                            Df Sum Sq Mean Sq  F value    Pr(>F)    
RotorLength                  1 3.6578  3.6578 103.8434 5.350e-10 ***
RotorWidth                   1 1.0120  1.0120  28.7293 1.918e-05 ***
FootLength                   1 0.3079  0.3079   8.7401 0.0070780 ** 
FoldLength                   1 0.0145  0.0145   0.4113 0.5276737    
FoldWidth                    1 0.0099  0.0099   0.2816 0.6007138    
PaperWeight                  1 0.7122  0.7122  20.2180 0.0001633 ***
DirectionOfFold              1 0.5175  0.5175  14.6902 0.0008514 ***
I(RotorLength^2)             1 1.7405  1.7405  49.4119 3.661e-07 ***
I(RotorWidth^2)              1 0.3160  0.3160   8.9709 0.0064635 ** 
I(FootLength^2)              1 0.1216  0.1216   3.4525 0.0760048 .  
I(FoldLength^2)              1 0.0045  0.0045   0.1272 0.7245574    
RotorLength:RotorWidth       1 0.4181  0.4181  11.8693 0.0022032 ** 
RotorLength:PaperWeight      1 0.3778  0.3778  10.7247 0.0033254 ** 
RotorWidth:FootLength        1 0.6021  0.6021  17.0947 0.0004026 ***
PaperWeight:DirectionOfFold  1 0.3358  0.3358   9.5339 0.0051968 ** 
RotorWidth:FoldLength        1 1.5984  1.5984  45.3778 7.167e-07 ***
RotorWidth:FoldWidth         1 0.3937  0.3937  11.1769 0.0028207 ** 
RotorWidth:PaperWeight       1 0.2029  0.2029   5.7593 0.0248924 *  
RotorWidth:DirectionOfFold   1 0.0870  0.0870   2.4695 0.1297310    
RotorLength:FootLength       1 0.0687  0.0687   1.9517 0.1757410    
FootLength:PaperWeight       1 0.0732  0.0732   2.0781 0.1629080    
Residuals                   23 0.8102  0.0352                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This model does is quite extensive. For prediction purpose I would probably drop a few terms. For instance FootLength:PaperWeight could be removed, lessen the fit yet improve predictions, since its p-value is close to 0.15. As it is currently the model does have some issues. For instance, quite some points have high leverage.

Conclusion

The paper helicopter needs quite a complex model to fit all effects on flying time. This somewhat validates the complex models found in part 1. 

Code used

library(dplyr)
library(car)
h3 <- read.table(sep='\t',header=TRUE,text='
        RotorLength RotorWidth BodyLength FootLength FoldLength FoldWidth PaperWeight DirectionOfFold Time
        5.5 3 1.5 0 5 1.5 light against 11.8
        5.5 3 1.5 2.5 11 2.5 heavy against 8.29
        5.5 3 5.5 0 11 2.5 light with 9
        5.5 3 5.5 2.5 5 1.5 heavy with 7.21
        5.5 5 1.5 0 11 1.5 heavy with 6.65
        5.5 5 1.5 2.5 5 2.5 light with 10.26
        5.5 5 5.5 0 5 2.5 heavy against 7.98
        5.5 5 5.5 2.5 11 1.5 light against 8.06
        11.5 3 1.5 0 5 2.5 heavy with 9.2
        11.5 3 1.5 2.5 11 1.5 light with 19.35
        11.5 3 5.5 0 11 1.5 heavy against 12.08
        11.5 3 5.5 2.5 5 2.5 light against 20.5
        11.5 5 1.5 0 11 2.5 light against 13.58
        11.5 5 1.5 2.5 5 1.5 heavy against 7.47
        11.5 5 5.5 0 5 1.5 light with 9.79
        11.5 5 5.5 2.5 11 2.5 heavy with 9.2
        8.5 4 3.5 1.25 8 2 light against 10.52
        8.5 4 3.5 1.25 8 2 light against 10.81
        8.5 4 3.5 1.25 8 2 light against 10.89
        8.5 4 3.5 1.25 8 2 heavy against 15.91
        8.5 4 3.5 1.25 8 2 heavy against 16.08
        8.5 4 3.5 1.25 8 2 heavy against 13.88
        8.5 4 2 2 6 2 light against 12.99
        9.5 3.61 2 2 6 2 light against 15.22
        10.5 3.22 2 2 6 2 light against 16.34
        11.5 2.83 2 2 6 1.5 light against 18.78
        12.5 2.44 2 2 6 1.5 light against 17.39
        13.5 2.05 2 2 6 1.5 light against 7.24
        10.5 2.44 2 1.5 6 1.5 light against 13.65
        12.5 2.44 2 1.5 6 1.5 light against 13.74
        10.5 3.22 2 1.5 6 1.5 light against 15.48
        12.5 3.22 2 1.5 6 1.5 light against 13.53
        11.5 2.83 2 1.5 6 1.5 light against 17.38
        11.5 2.83 2 1.5 6 1.5 light against 16.35
        11.5 2.83 2 1.5 6 1.5 light against 16.41
        10.08 2.83 2 1.5 6 1.5 light against 12.51
        12.91 2.83 2 1.5 6 1.5 light against 15.17
        11.5 2.28 2 1.5 6 1.5 light against 14.86
        11.5 3.38 2 1.5 6 1.5 light against 11.85
        11.18 2.94 2 2 6 1.5 light against 15.54
        11.18 2.94 2 2 6 1.5 light against 16.4
        11.18 2.94 2 2 6 1.5 light against 19.67
        11.18 2.94 2 2 6 1.5 light against 19.41
        11.18 2.94 2 2 6 1.5 light against 18.55
        11.18 2.94 2 2 6 1.5 light against 17.29
        ')
names(h3)

h3 <- h3 %>% 
    mutate(.,
        FRL=factor(format(RotorLength,digits=2)),
        FRW=factor(format(RotorWidth,digits=2)),
        FBL=factor(format(BodyLength,digits=2)),
        FFt=factor(format(FootLength,digits=2)),
        FFd=factor(format(FoldLength,digits=2)),
        FFW=factor(format(FoldWidth,digits=2)),
        allvl=interaction(FRL,FRW,FBL,FFt,FFd,FFW,PaperWeight,DirectionOfFold,drop=TRUE)
    )

h4 <- xtabs(~allvl,data=h3) %>% 
    as.data.frame %>%
    filter(.,Freq>1) %>%
    merge(.,h3) %>%
    select(.,RotorLength,
        RotorWidth,BodyLength,FootLength,
        FoldLength,FoldWidth,PaperWeight,
        DirectionOfFold,allvl,Time,Freq) %>%
    print
lm(Time~allvl,data=h4) %>% anova

par(mfrow=c(1,2))
aov(Time~allvl,data=h3) %>% residualPlot(.,main='Untransformed')
aov(log10(Time)~allvl,data=h3) %>% residualPlot(.,main='Log10 Transform')

lm(Time ~   RotorLength + RotorWidth + BodyLength +
            FootLength + FoldLength + FoldWidth + 
            PaperWeight + DirectionOfFold,
        data=h3) %>%
    boxCox(.)
dev.off()

lm(sqrt(Time)~allvl,data=h4) %>% anova

h3 <- mutate(h3,sTime=sqrt(Time))

lm(sTime ~  RotorLength + RotorWidth + BodyLength +
            FootLength + FoldLength + FoldWidth + 
            PaperWeight + DirectionOfFold,
        data=h3) %>%
    anova

s1 <- lm(sTime ~ 
            RotorLength + RotorWidth + BodyLength +
            FootLength + FoldLength + FoldWidth + 
            PaperWeight + DirectionOfFold ,
        data=h3) %>%
    step(.,scope=~(RotorLength + RotorWidth + BodyLength +
              FootLength + FoldLength + FoldWidth + 
              PaperWeight + DirectionOfFold)*
            (RotorLength + RotorWidth + BodyLength +
              FootLength + FoldLength + FoldWidth)+
            I(RotorLength^2) + I(RotorWidth^2) + I(BodyLength^2) +
            I(FootLength^2) + I(FoldLength^2) + I(FoldWidth^2) +
            PaperWeight*DirectionOfFold)
anova(s1)

s1 <- lm(sTime ~ 
            RotorLength + RotorWidth + BodyLength +
            FootLength + FoldLength + FoldWidth + 
            PaperWeight + DirectionOfFold ,
        data=h3) %>%
    step(.,scope=~(RotorLength + RotorWidth + BodyLength +
              FootLength + FoldLength + FoldWidth + 
              PaperWeight + DirectionOfFold)*
            (RotorLength + RotorWidth + BodyLength +
              FootLength + FoldLength + FoldWidth)+
            I(RotorLength^2) + I(RotorWidth^2) + I(BodyLength^2) +
            I(FootLength^2) + I(FoldLength^2) + I(FoldWidth^2) +
            PaperWeight*DirectionOfFold)

anova(s1)

s2 <- lm(sTime ~ 
            RotorLength + RotorWidth + BodyLength +
            FootLength + FoldLength + FoldWidth + 
            PaperWeight + DirectionOfFold +
            I(RotorLength^2) + I(RotorWidth^2) + I(BodyLength^2) +
            I(FootLength^2) + I(FoldLength^2) + I(FoldWidth^2) ,
        data=h3) %>%
    step(.,scope=~(RotorLength + RotorWidth + BodyLength +
              FootLength + FoldLength + FoldWidth + 
              PaperWeight + DirectionOfFold)*
            (RotorLength + RotorWidth + BodyLength +
              FootLength + FoldLength + FoldWidth)+
            I(RotorLength^2) + I(RotorWidth^2) + I(BodyLength^2) +
            I(FootLength^2) + I(FoldLength^2) + I(FoldWidth^2) +
            PaperWeight*DirectionOfFold)

anova(s2)
par(mfrow=c(2,2))
plot(s2)

No comments:

Post a Comment