Table of Links

Abstract and 1. Introduction
2. Experiment Definition
3. Experiment Design and Conduct
3.1 Latin Square Designs
3.2 Subjects, Tasks and Objects
3.3 Conduct
3.4 Measures
4. Data Analysis
4.1 Model Assumptions
4.2 Analysis of Variance (ANOVA)
4.3 Treatment Comparisons
4.4 Effect Size and Power Analysis
5. Experiment Limitations and 5.1 Threats to the Conclusion Validity
5.2 Threats to Internal Validity
5.3 Threats to Construct Validity
5.4 Threats to External Validity
6. Discussion and 6.1 Duration
6.2 Effort
7. Conclusions and Further Work, and References

4.4 Effect Size and Power Analysis

Effect size is a measure that quantifies the difference between two groups of data; it is usually used to indicate the magnitude of a treatment effect. Using the function defined in equation (2) [5], we calculate Cohen's d coefficient [10], which serves as an effect size estimate for the comparison between two means (in this case, solo and pair programming). According to Cohen [10], a d value between 0.2 and 0.3 represents a small effect size, a value around 0.5 a medium effect size, and a value above 0.8 a large one. Using the F-value 2.9843 of the first ANOVA (Table 6), we obtain an effect size d of 0.6529; for the F-value 2.8953 of the second ANOVA (Table 7), we obtain an effect size d of 0.6431. According to Cohen's classification, both effect sizes are medium effects.
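As a rough illustration only (equation (2) from [5] is not reproduced here), one common conversion from a two-group F statistic to Cohen's d is d = sqrt(F(n1 + n2)/(n1 n2)); with 14 measures per group, this conversion matches the reported values:

```python
import math

def cohens_d_from_f(f_value, n1, n2):
    # Convert a two-group F statistic to Cohen's d:
    # d = sqrt(F * (n1 + n2) / (n1 * n2)).
    return math.sqrt(f_value * (n1 + n2) / (n1 * n2))

# F-values from the two ANOVAs (Tables 6 and 7), 14 measures per group:
d_duration = cohens_d_from_f(2.9843, 14, 14)  # ~0.6529
d_effort = cohens_d_from_f(2.8953, 14, 14)    # ~0.6431
print(round(d_duration, 4), round(d_effort, 4))
```

With equal group sizes n, the expression simplifies to sqrt(2F/n), which is why both reported d values fall just below sqrt(F/7).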
The first effect size works against solo programming (with respect to duration), whereas the second works against pair programming (with respect to effort). Once the effect sizes are calculated, we carry out a power analysis. The power of a statistical test is the probability of rejecting the null hypothesis when it is false; in other words, the power indicates how sensitive a test is at detecting an effect of the examined treatment. Once the effect size is known, it is possible to compute the power of a test. To determine the power, we use the function pwr.t.test() of the R environment [9], which implements the power analysis outlined by Cohen [10]. Given an effect size of 0.6529 (related to duration), a sample size of n = 14 (number of measures in each group, pair and solo programming), and a significance level α = 0.1, we obtain a power of 0.51 (51%). Similarly, a power of 0.5 (50%) is obtained with the same sample size and significance level, but replacing the effect size with the value 0.6431 (related to effort).

Authors:

(1) Omar S. Gómez, full time professor of Software Engineering at the Mathematics Faculty of the Autonomous University of Yucatan (UADY);

(2) José L. Batún, full time professor of Statistics at the Mathematics Faculty of the Autonomous University of Yucatan (UADY);

(3) Raúl A. Aguilar, Faculty of Mathematics, Autonomous University of Yucatan, Merida, Yucatan 97119, Mexico.

This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.
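For readers without R at hand, the pwr.t.test() computation above can be approximated in plain Python with the normal approximation to a two-sided, two-sample t test; the function names below are illustrative, not part of any library, and the approximation runs slightly above the exact t-based result of 0.51:

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the error function (math.erf).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_quantile(p):
    # Inverse standard normal CDF by bisection; ample precision here.
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def approx_power_two_sample_t(d, n, alpha=0.1):
    # Normal approximation to the power of a two-sided, two-sample
    # t test with effect size d and n observations per group.
    ncp = d * math.sqrt(n / 2.0)          # noncentrality parameter
    z_crit = normal_quantile(1.0 - alpha / 2.0)
    return 1.0 - normal_cdf(z_crit - ncp)

# d = 0.6529 (duration), n = 14 per group, alpha = 0.1:
# roughly 0.53 here, versus 0.51 from the exact pwr.t.test() in R.
print(round(approx_power_two_sample_t(0.6529, 14), 2))
```

The exact computation uses the noncentral t distribution rather than the normal; the gap between the two is small at these sample sizes, which is why the approximation lands within a couple of points of the reported 51% and 50%.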