Improving the accuracy of project estimates at completion using the Gompertz function

Using the standard definition of the earned schedule and Gompertz curve cost profiles, estimates of the final duration are found by using regression curve fitting on the first and last available earned value data. Validation is conducted on many synthetic project data sets. It is found that a simple two-point curve-fitting estimation formula is effective in predicting the duration of an ongoing project. This research broadens discussions on the accuracy of nonlinear duration and cost estimates of ongoing projects, using different cost profiles and the standard definitions of the earned schedule.


Introduction
Earned value management (EVM) and earned schedule (ES) are widely recognised methodologies that are used to compute duration and cost estimates at completion for in-progress projects. Traditional EVM and ES methods use index-based formulae that are linear, but why should such models be assumed to be reliable when realistic cumulative cost curves are usually S-shaped (Khamooshi & Golafshani 2014)? As a result, nonlinear regression-based estimates have been developed to overcome some of the limitations of linear methods and to better model the S-curve cost profiles of projects in a variety of industries. Such methods are regarded as more sophisticated and are able to generate improved estimates for nonlinear cost growth patterns (Christensen & Heise 1992).
Several growth models are available to describe the S-shaped cost profile of a project (Narbaev & DeMarco 2014a). In particular, the Gompertz function is an interesting sigmoidal curve that is often used to describe phenomena inherent to data with an S-shaped growth pattern. Thus, project EVM datasets can be modelled using S-shaped Gompertz functions for their cumulative cost profiles. Previous research findings have revealed that the Gompertz function is a valid and useful nonlinear cost profile that helps in computing refined estimates of the actual, future duration and cost (Trahan 2009; Narbaev & DeMarco 2014a,b).
in the past in the prediction of cost and duration, result in the standard formula. Therefore, here we investigate the use of the Gompertz function to predict duration and cost estimates at completion.

Relevant literature
EVM provides early estimates of the project's final cost, which many studies have found to be reliable in practice (Christensen & Heise 1992; Christensen 1993). ES is the basis for duration estimation (Lipke 2003, 2010) and has been shown to work well for many real-world projects (Batselier & Vanhoucke 2015c; Colina & Vanhoucke 2015). However, Evensmo and Karlsen (2006) pointed out that the standard duration and cost estimation formulas are based on linear cumulative cost curves. This was further clarified by Kim and Kim (2014), who showed that both forecast accuracies and early warning credibility are sensitive to S-curve profiles, especially early in a project. Therefore, the nonlinearity of the cost curves significantly affects future estimates. Batselier and Vanhoucke (2015c) analysed three duration forecasting approaches: the planned value method (Anbari 2003); the earned duration method (EDM); and the earned schedule method (ESM). Lipke (2003, 2010) created the linear duration estimate formula by defining a geometrical construction procedure to determine ES. Khamooshi and Golafshani (2014) criticised ES, along with other EVM analyses, for using monetary measures as a proxy for the true duration and argued that such measures may not accurately represent the duration's progress. They proposed the EDM, which decoupled the cost and schedule dimensions by using actual durations rather than their monetary proxies.
According to Vanhoucke and Vandevoorde (2007), all forecasting methods yield similar results, which Jacob and Kane (2004) attributed to the high correlation among the methods and to their use of the same basic parameters. Teicholz (1993) compared three forecasting methods for final cost using data from 121 real projects.
Using the real-life project database constructed by Batselier and Vanhoucke (2015a), the three methods were evaluated by Batselier and Vanhoucke (2015b). Although all three methods were found to be practically useful, EDM performed the best. Lipke et al. (2009) studied 12 projects, and estimates of both the final cost and the duration were claimed to be sufficiently reliable for general application. Typically, forecast accuracy is reported as the mean absolute percentage error (MAPE) between the model's prediction and the actual project data (Chen, Chen & Li 2016; Batselier & Vanhoucke 2015b). However, the time over which the mean is taken varies among authors. Also, as Kim (2007) pointed out, an average measure of the error over the entire project's execution has little practical use, as project managers prefer early estimates.
Several researchers have enhanced the linear EVM model. Evensmo and Karlsen (2006) proposed a cubic polynomial cost curve, and Warburton (2011) developed a time-dependent EVM model for projects that follow the nonlinear Putnam-Norden-Rayleigh (PNR) profile (Putnam 1978). Cioffi (2006a) showed that a model often used in population dynamics can be applied to project S-curves, and gave an interesting example of its application to the development of the Oxford English Dictionary, a project spanning many decades (Cioffi 2006b). Warburton (2014) used a trapezoidal labour profile, which can describe construction projects, to derive accurate duration estimates early in the project. Elshaer (2013) suggested that although ES sometimes outperformed other methods, it failed when incorrect warnings emerged from non-critical activities. Vanhoucke (2012) confirmed that the network topology is a significant driver of variability, that S-shaped curves degrade forecasting accuracy and that networks with greater parallelism have more variability. Warburton and Cioffi (2016) recently developed a formal, theoretical foundation for duration estimation that applies to nonlinear, S-shaped cost profiles, which provides a significant motivation for this research. Chen et al. (2016) reported the accuracy of forecasts using MAPE, and though their model improved forecasting accuracy, it required a logarithmic linear transformation of the planned value data and linear regression. Zwikael, Globerson and Raz (2000) evaluated five forecasting models using the mean square error, the mean absolute deviation and the mean absolute percentage error.
Improving the accuracy of project estimates at completion using the Gompertz function International Research Network on Organizing by Projects (IRNOP) 2017, 11-14 June 2017
Narbaev and DeMarco (2014b) proposed a Gompertz-based growth model, using nonlinear regression curve fitting, that improved forecast accuracy (as measured by MAPE) by including schedule progress as a factor in the cost performance.
Data from decades of completed US Department of Defense contracts established that the cost performance index (CPI) rarely changed by more than 10% once the contract had reached the 20% completion point, regardless of the type or phase of the defence contract, weapon system or military service involved (Christensen & Payne 1991). Therefore, in practice, the CPI seems to be a reliable indicator after the 20% completion point. Kim and Kim (2014) analysed timeliness by examining seven duration forecasting methods and showed that forecast accuracy and early warning credibility are very sensitive to S-curve patterns, especially early in a project.

The Gompertz Curve
The Gompertz function is often used to describe phenomena with inherently S-shaped growth patterns and has found wide application in many industries that feature population growth, such as biology and social sciences. As it is extensively used in curve fitting and forecasting, the Gompertz function can be useful in characterising the S-shaped cost profile of projects in a variety of industries, especially when it comes to estimating project overruns due to cost and duration growth (Trahan 2009; Narbaev & DeMarco 2014b). The Gompertz function has been proven as a statistically valid model able to generate more accurate schedule-integrated cost estimates than other nonlinear models, such as the logistic, Bass and Weibull functions (Narbaev & DeMarco 2014a).
The Gompertz curve is typically written as follows:
$$G(t) = \alpha \exp\left(-e^{\beta - \gamma t}\right), \qquad (1)$$
where α represents the asymptotic value (t → ∞) of the Gompertz function and is therefore related to the final budget of the project. That leaves two parameters to be determined, β and γ, and in the above representation neither has an obvious project management interpretation. Therefore, we eliminate β by defining β = γT, where T is the peak in the distribution function, g(t) = Gʹ(t). Differentiating G(t) with respect to t gives g(t), and differentiating again and setting the result to zero shows that T is the peak in the distribution function. Thus, T is a parameter that directly determines the duration of the project, and, in fact, we will show that it is directly related to the actual end time of the project. That leaves just one parameter, γ, which characterises the growth rate of the cumulative curve and allows for the study of a wide variety of different project cost profiles. Figure 1 presents Gompertz functions with three different values of γ that are similar to real-world project databases (Narbaev & DeMarco 2014b). The Gompertz function and the distribution function are then:
$$G(t) = \alpha \exp\left(-e^{-\gamma (t - T)}\right), \qquad g(t) = \alpha \gamma\, e^{-\gamma (t - T)} \exp\left(-e^{-\gamma (t - T)}\right). \qquad (2)$$
The definition of the end of the project requires some care. The Gompertz function never reaches its asymptotic value, but the planned end of the project is defined to be at a specific time, T1. We can define the end of the project as some specific fraction of the asymptotic value, such as 95% or 99%, in which case, at the planned end of the project,
$$G(T_1) = (1 - \varepsilon)\,\alpha, \qquad (3)$$
where ε is a constant (for example, ε = 0.01 for an end point at 99% of the asymptote). It is convenient to define k as
$$k = -\ln\left(-\ln(1 - \varepsilon)\right), \qquad (4)$$
which gives
$$T_1 = T + \frac{k}{\gamma}. \qquad (5)$$
Therefore, k is determined once we choose the specific end point of the project as, say, 99% of the asymptote, α. This relation also shows that there is a direct relation between the peak in the distribution function, T, and the end of the project. Further, this gives a practical project management interpretation to the parameter, T, which determines the duration of the project.
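The definitions above can be checked numerically. The following is a minimal sketch, assuming the Gompertz form G(t) = α exp(−e^(−γ(t − T))) implied by β = γT, and assuming the end point is the time at which G reaches the fraction (1 − ε) of α. The function names and the parameter values are illustrative only and are not taken from the paper.

```python
import math

def gompertz(t, alpha, gamma, T):
    """Cumulative cost G(t) with asymptote alpha, growth rate gamma, peak time T."""
    return alpha * math.exp(-math.exp(-gamma * (t - T)))

def gompertz_rate(t, alpha, gamma, T):
    """Distribution function g(t) = G'(t); its maximum is at t = T."""
    u = math.exp(-gamma * (t - T))
    return alpha * gamma * u * math.exp(-u)

def end_point_constant(eps):
    """Constant k such that G(T + k / gamma) = (1 - eps) * alpha (assumed definition)."""
    return -math.log(-math.log(1.0 - eps))

# Hypothetical parameters: budget asymptote 100, growth rate 0.5, peak at t = 10.
alpha, gamma, T = 100.0, 0.5, 10.0
k = end_point_constant(0.01)   # end point at 99% of the asymptote
T1 = T + k / gamma             # planned end of the project
print(k, T1, gompertz(T1, alpha, gamma, T))
```

With these values the planned end point falls a little over nine time units after the peak, and the cumulative value at T1 is 99% of α by construction.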

Duration and cost estimate formulas
We follow the standard approach to EVM and use Gompertz functions for the cumulative planned value, Gp(t), earned value, Ge(t), and actual cost, Ga(t):
$$G_p(t) = \alpha_p \exp\left(-e^{\beta_p - \gamma_p t}\right), \qquad G_e(t) = \alpha_e \exp\left(-e^{\beta_e - \gamma_e t}\right), \qquad G_a(t) = \alpha_a \exp\left(-e^{\beta_a - \gamma_a t}\right). \qquad (6)$$
The p subscripts denote planned parameters; the e subscripts denote earned parameters; and the a subscripts denote actual parameters.
The total planned cost is the budget, B = Gp(T1), and the cost estimate at completion is E = Ga(T′1). The same budget, B, is used as the final cost for both planned and earned values. In this, we follow the standard EVM approach, which says that as each activity is completed, it earns its planned value, even if there is a cost increase or a delay in completing the activity (PMI 2011, 2013). Therefore, at the end of the project, when all the planned work has been completed (i.e. earned), the total earned value equals the total planned value, Ge(T′1) = Gp(T1). If unplanned work is proposed (e.g. a scope increase), the project must be re-planned, which will generate a new planned cost profile, and all formulas in this paper then apply with the new profile replacing the old.
In standard EVM, if no scope creep occurs, the standard cost estimate at completion, CEAC, is the ratio of the budget to the cost performance index, that is, CEAC = Budget/CPI (PMI 2011).
We define the planned end point of the project as T1 and assume that, during execution, it ends at T′1. If the project is delayed, T′1 > T1, and if the project is accelerated, T′1 < T1.
According to the standard EVM methodology, when each activity is completed, it earns its planned value, even if there is a delay in the execution or a cost increase. Therefore, at the end of the project the cumulative planned value, Cp(t), is equal to the cumulative earned value, Ce(t),
$$C_p(T_1) = C_e(T'_1). \qquad (7)$$
If the planned and earned value curves end at the same percentage of the asymptote (e.g. 99%), then using equation 5 gives
$$\gamma_p \left(T_1 - T_p\right) = \gamma_e \left(T'_1 - T_n\right). \qquad (8)$$
Equation 8 is referred to as the 'end point condition' and is more usefully written as
$$T_1 = T_p + \frac{k}{\gamma_p}, \qquad T'_1 = T_n + \frac{k}{\gamma_e}. \qquad (9)$$
This equation determines T1 and T′1 in terms of the peaks in the planned and earned value distribution functions, Tp and Tn.
If the project is delayed, at the current time, t, the delay, δ(t), is defined as the time difference represented by the horizontal projection back from the point on the earned value curve, at t, to its intersection with the planned value curve (see Figure 2). For accelerated projects, the projection is forward in time. The mathematical representation of this condition is (Warburton & Cioffi 2016)
$$C_e(t) = C_p\left(t - \delta(t)\right). \qquad (10)$$

Figure 2
Definition of the delay, δ(t), as the difference between the earned and planned value curves: +δ(t1) is an accelerated project, and −δ(t2) is a delayed project. The corresponding earned durations are denoted as Te(t1) and Te(t2).

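We now introduce a new quantity: the earned duration, Te(t), which is defined as a time difference, the current time minus the delay, δ(t):
$$T_e(t) = t - \delta(t). \qquad (11)$$
So far, these equations are completely general and independent of the specific shape of the cost curves. Using the above Gompertz functions, equation 6, in the definition of the delay, equation 10, and noting that the planned and earned curves share the same asymptote because both end at the same fraction of it with the same budget, B, gives
$$\gamma_e \left(t - T_n\right) = \gamma_p \left(t - \delta(t) - T_p\right), \qquad (12)$$
where Tp and Tn are the peaks of the planned and earned distribution functions (βp = γpTp and βe = γeTn). Upon using equation 11, this gives
$$\gamma_e \left(t - T_n\right) = \gamma_p \left(T_e(t) - T_p\right). \qquad (13)$$
Using the end point condition, equation 9, gives
$$T'_1 = t + \frac{\gamma_p}{\gamma_e}\left(T_1 - T_e(t)\right). \qquad (14)$$
At time t, equation 14 can be used to predict the delayed end point of the project, T′1, in terms of known quantities: the planned end point, T1; the earned duration, Te(t); and the growth parameters for the planned and earned value curves, γp and γe.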
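As an illustration of this prediction, the sketch below assumes that the planned and earned curves are Gompertz functions with a common asymptote, so that the earned duration is the inverse of the planned curve evaluated at the current earned value, and that the duration prediction takes the form T′1 = t + (γp/γe)(T1 − Te(t)), which follows from the delay definition and the end point condition. All names and numerical values are hypothetical.

```python
import math

def gompertz(t, alpha, gamma, T):
    """Cumulative Gompertz curve with asymptote alpha, growth rate gamma, peak time T."""
    return alpha * math.exp(-math.exp(-gamma * (t - T)))

def earned_duration(ev, alpha, gamma_p, T_p):
    """Time at which the planned curve equals the earned value ev (0 < ev < alpha)."""
    return T_p - math.log(-math.log(ev / alpha)) / gamma_p

def predicted_end(t, Te, T1, gamma_p, gamma_e):
    """Assumed Gompertz duration prediction: T1' = t + (gamma_p / gamma_e) * (T1 - Te)."""
    return t + (gamma_p / gamma_e) * (T1 - Te)

# Hypothetical plan and execution (the earned curve grows more slowly and peaks later).
alpha = 100.0
gamma_p, T_p = 0.5, 10.0
gamma_e, T_n = 0.4, 12.0
k = -math.log(-math.log(0.99))   # end point at 99% of the asymptote
T1 = T_p + k / gamma_p           # planned end
true_end = T_n + k / gamma_e     # actual end implied by the earned curve

t = 8.0                          # current time
ev = gompertz(t, alpha, gamma_e, T_n)
Te = earned_duration(ev, alpha, gamma_p, T_p)
print(Te, predicted_end(t, Te, T1, gamma_p, gamma_e), true_end)
```

For noise-free Gompertz data the prediction reproduces the end point implied by the earned curve at any time t. In practice γe is not known in advance and has to be estimated from the earned value data.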
We can compare this prediction with that of Warburton and Cioffi (2016), who derived the following formula for the actual end of the project for several project profiles, including the linear profile that is the basis for the traditional ES approach. Therefore, we refer to this as the 'standard' prediction of the project duration, T′1std,
$$T'_{1\mathrm{std}} = \frac{T_1\, t}{T_e(t)}. \qquad (15)$$
Equation 14 shows that if one were to apply the standard formula (equation 15) when using Gompertz functions, one would not be using the correct expression for the duration estimate. In fact, applying the standard formula to projects represented by Gompertz functions would result in the following prediction for the final duration:
$$T'_{1\mathrm{std}} = \frac{T_1\, t}{T_p + (\gamma_e/\gamma_p)\left(t - T_n\right)}. \qquad (16)$$
We note that at the end of the project, t → ∞, the above prediction becomes
$$T'_{1\mathrm{std}} \to \frac{\gamma_p}{\gamma_e}\, T_1, \qquad (17)$$
and using the end point condition gives
$$T'_{1\mathrm{std}} \to T'_1 + \frac{\gamma_p T_p - \gamma_e T_n}{\gamma_e}. \qquad (18)$$
This interesting result suggests that the standard formula may not converge to the correct answer when using Gompertz functions to model the cost profiles. This result was previously found by Warburton and Cioffi (2016), who showed that the standard formula does not give the correct answer for a project that follows the Cioffi profile.
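The convergence behaviour can be illustrated numerically. The sketch below assumes that the standard formula is T′1std = T1 t / Te(t) and that, for Gompertz curves with a common asymptote, the earned duration is Te(t) = Tp + (γe/γp)(t − Tn). The parameter values are hypothetical and are chosen so that βp = γpTp differs from βe = γeTn.

```python
import math

def standard_estimate(t, Te, T1):
    """Standard (linear) duration estimate, assumed form T1 * t / Te(t)."""
    return T1 * t / Te

# Hypothetical Gompertz parameters for the planned and earned curves.
gamma_p, T_p = 0.5, 10.0
gamma_e, T_n = 0.4, 12.0
k = -math.log(-math.log(0.99))     # end point at 99% of the asymptote
T1 = T_p + k / gamma_p             # planned end
true_end = T_n + k / gamma_e       # end implied by the earned curve
limit = T1 * gamma_p / gamma_e     # large-t limit of the standard estimate

def earned_duration_exact(t):
    """Earned duration for noise-free Gompertz curves sharing one asymptote."""
    return T_p + (gamma_e / gamma_p) * (t - T_n)

for t in (15.0, 20.0, 1e3, 1e6):
    print(t, standard_estimate(t, earned_duration_exact(t), T1))
print("limit:", limit, "true end:", true_end)
```

In this example the standard estimate settles at a value that differs from the true end by (βp − βe)/γe, and the gap closes only when the two β parameters coincide.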

DURATION ESTIMATE WHEN βp = βe
If the β parameters for the planned and earned Gompertz curves are the same (βp = βe), the definition of the delay, equation 10, gives
$$\gamma_e\, t = \gamma_p\, T_e(t), \qquad (19)$$
and using the end point condition gives
$$T'_1 = \frac{T_1\, t}{T_e(t)}, \qquad (20)$$
which is the standard duration estimation formula. Therefore, if one were to use Gompertz functions for the planned and earned values, one could only use the standard formula if the β parameters for the two curves were the same.
We obtain a different prediction estimate when γp = γe, and these results are summarised in Table 1.

Table 1
Prediction formulas for different Gompertz parameter combinations

The two-point duration prediction formula
In the early stages of the project, it is very important to be able to forecast a possible delay in order to support effective management decisions. From the above results, it is possible to construct a prediction formula that is easy to use and accurate. At the beginning of the project, we know all the parameters of the planned value curve, Cp(t). On the other hand, we must assume that we know little about the earned value curve, Ce(t), and so it appears wise to consider the general case. Therefore, in order to compute the duration estimates, we need to compute the parameters Tn and γe.
We now introduce a method of calculating the earned value curve's parameters by using only two data points from the project's earned value execution data. The mathematical details are contained in the appendix. We investigated a number of approaches for selecting the two data points, but using the first and last data points consistently showed the best results. Using two data points, the system of equations can be solved, and the final duration can be estimated. When we have significantly more than two values of the cumulative earned value data, we can use nonlinear regression to fit the entire earned value data set to a Gompertz function, which determines its parameters. Then, a duration forecast can be determined. Therefore, we have two methods for estimating the final duration:
1. Two-point: Estimates of the final duration are found by using the two-point method on the first and last available earned value data.
2. NR: Nonlinear regression is applied to all available data to date to estimate the Gompertz function parameters and hence the final duration.
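The two-point method can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that both curves are Gompertz functions with a common asymptote, that the earned duration is obtained by inverting the planned curve, and that the two resulting linear relations γe(t − Tn) = γp(Te(t) − Tp) are solved for γe and Tn. The function names and test values are hypothetical.

```python
import math

def gompertz(t, alpha, gamma, T):
    """Cumulative Gompertz curve with asymptote alpha, growth rate gamma, peak time T."""
    return alpha * math.exp(-math.exp(-gamma * (t - T)))

def earned_duration(ev, alpha, gamma_p, T_p):
    """Time at which the planned curve equals the earned value ev (0 < ev < alpha)."""
    return T_p - math.log(-math.log(ev / alpha)) / gamma_p

def two_point_estimate(t1, ev1, t2, ev2, alpha, gamma_p, T_p, T1):
    """Estimate gamma_e, T_n and the final duration from two earned value points."""
    Te1 = earned_duration(ev1, alpha, gamma_p, T_p)
    Te2 = earned_duration(ev2, alpha, gamma_p, T_p)
    # Subtracting the two linear relations isolates the earned growth rate.
    gamma_e = gamma_p * (Te2 - Te1) / (t2 - t1)
    # Either relation then gives the peak of the earned distribution function.
    T_n = t1 - (gamma_p / gamma_e) * (Te1 - T_p)
    # Duration prediction evaluated at the latest data point.
    T1_prime = t2 + (gamma_p / gamma_e) * (T1 - Te2)
    return gamma_e, T_n, T1_prime

# Hypothetical plan, and noise-free earned data at the first and the current period.
alpha, gamma_p, T_p = 100.0, 0.5, 10.0
k = -math.log(-math.log(0.99))
T1 = T_p + k / gamma_p
ev_first = gompertz(1.0, alpha, 0.4, 12.0)
ev_last = gompertz(8.0, alpha, 0.4, 12.0)
print(two_point_estimate(1.0, ev_first, 8.0, ev_last, alpha, gamma_p, T_p, T1))
```

With noise-free input the sketch recovers the parameters used to generate the earned curve. With noisy data the estimate depends on which two points are chosen, which is why the choice of the first and last available points matters.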

RESULTS
To test the effectiveness of the above methods, we generated synthetic data sets for the planned, earned and actual data rates. Planned value data rates were generated by using Gompertz distribution functions, gp(t), with parameters similar to those found in real projects (Narbaev & DeMarco 2014b). We then added a random uniform distribution of noise to get the earned value profile, ge(t), and the actual profile, ga(t). Many example data sets were generated, and the duration and cost predictions analysed using the above methods.
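A synthetic data set of this kind might be generated as in the sketch below. The paper does not state how the noise was applied, so the multiplicative uniform noise, the unit time step, the seed and all parameter values here are assumptions made purely for illustration.

```python
import math
import random
from itertools import accumulate

def gompertz_rate(t, alpha, gamma, T):
    """Gompertz distribution function g(t) with peak at t = T."""
    u = math.exp(-gamma * (t - T))
    return alpha * gamma * u * math.exp(-u)

def synthetic_profile(n_periods, alpha, gamma, T, noise, rng):
    """Per-period rates with multiplicative uniform noise, plus their running total.

    Assumption: each rate is scaled by (1 + U(-noise, +noise)).
    """
    rates = [
        gompertz_rate(t, alpha, gamma, T) * (1.0 + rng.uniform(-noise, noise))
        for t in range(1, n_periods + 1)
    ]
    return rates, list(accumulate(rates))

rng = random.Random(42)  # fixed seed so the example is repeatable
planned_rate, planned_cum = synthetic_profile(30, 100.0, 0.30, 10.0, 0.0, rng)
earned_rate, earned_cum = synthetic_profile(30, 100.0, 0.25, 12.0, 0.3, rng)
actual_rate, actual_cum = synthetic_profile(30, 110.0, 0.25, 12.0, 0.3, rng)
print(planned_cum[-1], earned_cum[-1], actual_cum[-1])
```

Because the cumulative series are running sums of positive rates, they remain monotonic even when the per-period rates are visibly noisy.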
A typical example of the cost rate profiles is shown in Figure 3, where significant randomness is evident in the earned value and actual cost rates. The corresponding cumulative cost profiles are shown in Figure 4, where the randomness is smoothed out by the effect of the cumulative computation.

Figure 3
An example of Gompertz distribution functions, with noise, for planned value rate, gp(t) (blue), earned value rate, ge(t) (red), and actual cost rate, ga(t) (green).

The accuracy of the two duration prediction methods is shown in Figure 5, which plots the errors in the duration and cost estimates as the project proceeds. For the two-point method, we used the first data point and the last data point available, that is, at the current time of the prediction (red curve). The regression method used a nonlinear regression fit to all available data at that time (blue curve). For comparison, the error in the standard cost estimate at completion (CEAC = Budget/CPI) is also shown (green curve).
There are several interesting features of Figure 5. The prediction errors generally decrease as the project proceeds and the duration error falls below about 10% after about 20% of the planned duration. The duration error falls to around 5% after about 30% of the planned duration.

Figure 5
Errors in the estimation of the final duration (two-point method, red; regression, blue) and cost (CEAC, green).

Next, we varied the random contribution in the earned value and actual cost distributions. Somewhat surprisingly, adding more randomness to the data did not significantly increase the prediction errors. This is shown in Figure 6, which summarises the results. It appears that the averaging effect of the cumulative data effectively smooths out the deviations. This suggests that a major contribution to duration errors may be associated with biased deviations, rather than random deviations, in the earned and actual cost data, a result that needs further exploration.

Conclusions
Using the standard definitions of EVM and ES, we established a sound theoretical basis for the prediction of the project duration when the cumulative cost profiles follow a Gompertz function. We derived formulas for the duration estimates and found the important and interesting result that a simple two-point estimation formula is effective in predicting the duration.

Figure 6
The effect on the prediction error of increasing the random influence is small for quite a wide range of randomness. The points where the predictions cross the red lines show where the errors fall below 15%, 10% and 5%.
Table 2 summarises the results by providing the prediction errors over time through the project.

Table 2
The decline in the prediction error as a percent through the planned project

Duration prediction error    Percent of plan
15%                          15%
10%                          20%
5%                           35%

Because the entire theory was built on standard EVM and ES definitions, it is proposed as a familiar and accessible methodology for project management practitioners. Also, while the derivation of the error formula was moderately complex, the resulting two-point formula is quite straightforward, especially if compared to some previous studies that require three time points for curve fitting (Narbaev & DeMarco 2014b).
There are several issues that could be explored in future research. One might extend this work to other cost profiles, such as the Cioffi (2005) profile and the trapezoidal profile often used in construction (Warburton 2014). In addition, it might be possible to extend this theory to analyse the impact on the estimates of scope growth during execution. Typically, such methods incorporate estimates of extra work and scope changes needed to increase the project's expectations relative to its original performance. Further investigation might also provide guidance on the selection of appropriate cost profiles and, in particular, the effectiveness of using Gompertz functions to characterise different categories of projects and industries.
In conclusion, we established a new, effective method of duration and cost estimation over time and validated the theory by comparing its predictions to many synthetic projects. The duration and cost error formulas are quite simple and require little additional effort to be practically useful to project teams during project monitoring and control.

Appendix: Mathematical details
We define two times for which we have earned value data, t1 and t2 (t1, t2 > 0 and t1 < t2). We also know the earned value data at the two instants of time, Ce(t1) and Ce(t2). For these two time values, we calculate the earned duration, Te(t), from Ce(t) = Cp(t − δ(t)), as
$$T_e(t_i) = T_p - \frac{1}{\gamma_p} \ln\left(-\ln\frac{C_e(t_i)}{\alpha}\right), \qquad i = 1, 2,$$
where α is the asymptote of the planned value curve. Once the Te(t) values are determined, we can calculate Tn and γe and thus T′1 in two different ways, but the different approaches lead to essentially the same results. After solving the system, we estimate T′1 as
$$T'_1 = t_2 + \frac{\left(t_2 - t_1\right)\left(T_1 - T_e(t_2)\right)}{T_e(t_2) - T_e(t_1)}.$$
When we have several values for the cumulative earned value data, we perform a nonlinear regression on the entire earned value data set to determine the parameters of the Gompertz function. The estimate of the duration forecast then follows.
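For readers who want to experiment with fitting a whole earned value series, the sketch below uses a simpler stand-in for the nonlinear regression described above: it assumes the asymptote α is known, linearises the Gompertz curve through y = ln(−ln(C/α)) = −γe(t − Tn), and fits a straight line by ordinary least squares. This is not the authors' regression procedure, and the function name and data are hypothetical.

```python
import math

def fit_gompertz_loglinear(times, values, alpha):
    """Least-squares fit of gamma_e and T_n from cumulative earned values.

    Simplifying assumption: alpha is known, so ln(-ln(C / alpha)) is linear in t.
    Points outside 0 < C < alpha are skipped because the transform is undefined.
    """
    pts = [(t, math.log(-math.log(v / alpha)))
           for t, v in zip(times, values) if 0.0 < v < alpha]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    gamma_e = -slope              # slope of the line is -gamma_e
    T_n = intercept / gamma_e     # intercept of the line is gamma_e * T_n
    return gamma_e, T_n

# Hypothetical noise-free earned value data for the first ten periods.
alpha = 100.0
times = [float(t) for t in range(1, 11)]
values = [alpha * math.exp(-math.exp(-0.4 * (t - 12.0))) for t in times]
gamma_e, T_n = fit_gompertz_loglinear(times, values, alpha)
k = -math.log(-math.log(0.99))    # end point at 99% of the asymptote
print(gamma_e, T_n, T_n + k / gamma_e)
```

A true nonlinear least-squares fit weights the data differently from this log-linear version, so the two can give different parameter estimates on noisy data.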