Table of Links

Abstract and 1. Introduction
2. Experiment Definition
3. Experiment Design and Conduct
3.1 Latin Square Designs
3.2 Subjects, Tasks and Objects
3.3 Conduct
3.4 Measures
4. Data Analysis
4.1 Model Assumptions
4.2 Analysis of Variance (ANOVA)
4.3 Treatment Comparisons
4.4 Effect Size and Power Analysis
5. Experiment Limitations and 5.1 Threats to the Conclusion Validity
5.2 Threats to Internal Validity
5.3 Threats to Construct Validity
5.4 Threats to External Validity
6. Discussion and 6.1 Duration
6.2 Effort
7. Conclusions and Further Work, and References

4.3 Treatment Comparisons

Taking this alpha level (α = 0.1) into account, we perform a treatment comparison test (also referred to as a contrast test) for each measure. Table 8 shows the treatment means, standard errors and replications for the duration measure, whereas Table 9 shows the same information for effort.

There are several tests for performing treatment comparisons. These tests analyze pairs of treatment means to assess whether they differ.

Using the Scheffé test [21] for treatment comparisons, Table 10 shows the treatment comparison with respect to duration. As shown in Table 10, there is a significant difference (at α = 0.1) of 36 minutes in favor of pair programming (28% decrease in time). The 95% confidence interval for this difference ranges from 6 to 66 minutes (4% to 51% decrease in time).
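The text reports a Scheffé interval without showing how one is formed. The sketch below uses the textbook form for a pairwise contrast, where the half-width is the standard error of the difference multiplied by sqrt((k - 1) · F), with k the number of treatments and F the critical value for the chosen alpha. The function names are hypothetical, the critical value is assumed to be looked up separately, and every number in the example is illustrative and not taken from Tables 8 to 11.

```python
import math


def scheffe_interval(diff, se_diff, n_treatments, f_crit):
    """Scheffé confidence interval for a difference between two treatment means.

    diff         -- observed difference between the two treatment means
    se_diff      -- standard error of that difference
    n_treatments -- number of treatments (k) in the experiment
    f_crit       -- upper critical value of F(k - 1, error df) at the chosen
                    alpha, taken from a table or a statistics package
    """
    if n_treatments < 2:
        raise ValueError("need at least two treatments")
    margin = math.sqrt((n_treatments - 1) * f_crit) * se_diff
    return diff - margin, diff + margin


def is_significant(interval):
    """A difference is significant when its interval excludes zero."""
    low, high = interval
    return low > 0 or high < 0


def as_percent(value, baseline):
    """Express a difference in minutes as a percentage of a baseline mean."""
    return 100.0 * value / baseline


# Illustrative values only (assumptions, not the paper's data).
interval = scheffe_interval(diff=40.0, se_diff=10.0, n_treatments=2, f_crit=4.0)
print(interval, is_significant(interval), as_percent(40.0, baseline=160.0))
```

With only two treatments, k - 1 = 1, so the Scheffé multiplier reduces to the square root of the F critical value, which equals the two-sided t critical value for the same error degrees of freedom.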
Table 11 shows the treatment comparison with respect to effort. As shown there, there is a significant difference (at α = 0.1) of 56 minutes in favor of solo programming (30% decrease in effort). The 95% confidence interval for this difference ranges from 8 to 104 minutes (4% to 55% decrease in effort).

Authors:

(1) Omar S. Gómez, full time professor of Software Engineering at Mathematics Faculty of the Autonomous University of Yucatan (UADY);

(2) José L. Batún, full time professor of Statistics at Mathematics Faculty of the Autonomous University of Yucatan (UADY);

(3) Raúl A. Aguilar, Faculty of Mathematics, Autonomous University of Yucatan Merida, Yucatan 97119, Mexico.

This paper is available on arXiv under CC BY-NC-ND 4.0 DEED license.