Artificial Neural Networks Incorporating Cost Significant Items towards Enhancing Estimation for (life-cycle) Costing of Construction Projects

Industrial application of life-cycle cost analysis (LCCA) is somewhat limited, with techniques deemed overly theoretical, resulting in a reluctance to realise (and pass on to the client) the advantages to be gained from objective LCCA comparison of (sub)component material specifications. To address the need for a user-friendly structured approach to facilitate complex processing, the work described here develops a new, accessible framework for LCCA of construction projects; it employs Artificial Neural Networks (ANNs) to compute the whole-cost(s) of construction and uses the concept of cost significant items (CSI) to identify the main cost factors affecting the accuracy of estimation. ANNs are a powerful means to handle non-linear problems, map relationships between complex input/output data and address uncertainties. A case study documenting 20 building projects was used to test the framework and estimate total running costs accurately. Two methods were used to develop a neural network model: firstly, a back-propagation method using MATLAB software; and secondly, spreadsheet optimisation using Microsoft Excel Solver. The best network used 19 hidden nodes, with the tangent sigmoid used as a transfer function for both methods. The results show that the average estimation error of the developed NN models is about 1% (via Excel Solver) and 2% (via back-propagation) respectively.


Introduction
Stakeholders of built assets are increasingly required to extend estimation beyond the initial capital-cost. This includes all stages of an asset's life-cycle, through incorporation of approaches such as benefit-to-cost ratio, internal rate of return and life cycle cost analysis (LCCA) to evaluate the total life cost of construction (Anurag Shankar et al. 2010). LCCA compares different design (sub)elements, specifications and materials on the basis of their whole-life installation, operation, maintenance and residual (decommissioning) costs. LCC analysis assesses all cost components, converting them into a cost at a specific point in time, namely the present (Olubodun et al. 2010). Clients increasingly require such whole-costs for proposals.
A key justification of LCCA use is the ability to reduce operation costs even if that requires spending more during early (construction) stages; expensive updating/retrofitting of older buildings is a significant factor encouraging LCCA uptake at the design development stage. However, Sterner (2000) argues that LCC influence is diminished by its perceived 'uncertainty': misunderstanding of the somewhat complex process alongside an absence of standardisation. There is also a need to acknowledge that market-trends limit the implementation of LCC concepts in industry (Olubodun et al. 2010, Sterner 2000), where calculations must consider the time-value of money, as all future costs require discounting to present-values taking inflation into account (Fabrycky and Blanchard 1991, Woodward 1997). Estimates of inflation and a 'correct' discount rate measuring the time-value-of-money, so that design alternatives for a project may be compared, are deemed somewhat overly subjective, adding to uncertainty and risk and affecting LCC result accuracy. Indeed, traditional methods remain restrictive: the large number of variable factors affecting the 'value' of the construction cost, compounded by complicated interaction between these factors (Cheng et al. 2010) and coupled with the reluctance of busy practitioners to predict future discount-rate multipliers, continues to restrict LCCA usage by the construction industry.
Alqahtani, A and Whyte, A (2013) 'Artificial neural networks incorporating cost significant items towards enhancing estimation for (life-cycle) costing of construction projects', Australasian Journal of Construction Economics and Building, 13 (3) 51-64
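The discounting of future costs to present values, mentioned in the paragraph above, can be illustrated with a minimal sketch. It assumes constant annual inflation and nominal discount rates combined into a real rate, and running costs expressed in today's prices; the function names and figures are hypothetical and are not taken from the case study.

```python
def real_discount_rate(nominal_rate, inflation_rate):
    """Real (inflation-adjusted) discount rate via the Fisher relation."""
    return (1 + nominal_rate) / (1 + inflation_rate) - 1


def present_value(annual_costs, rate):
    """Discount costs falling at the end of years 1..n back to year 0."""
    return sum(cost / (1 + rate) ** year
               for year, cost in enumerate(annual_costs, start=1))


def life_cycle_cost(capital_cost, annual_costs, nominal_rate, inflation_rate):
    """Capital cost today plus discounted future running costs."""
    rate = real_discount_rate(nominal_rate, inflation_rate)
    return capital_cost + present_value(annual_costs, rate)


# Hypothetical figures: 1,000,000 capital cost, 50,000 per year for 3 years.
lcc = life_cycle_cost(1_000_000, [50_000] * 3,
                      nominal_rate=0.07, inflation_rate=0.03)
```

Small changes to either rate alter the result, which is the subjectivity the text refers to.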
Many researchers recommend sensitivity and probability analyses, and statistical tools, to address the problem of uncertainty (Singh 1990). Regression modelling is another method towards identification of construction cost impact factors, where regression equations estimate construction cost depending upon cost-factor interrelationships. These methods become complex when numerous cost elements are considered as the dependent variables (Sonmez 2011). Sensitivity analysis is a method to indicate how the value of LCC is affected by changing the interest and inflation rate parameters. It is often the case that small variations in a parameter lead to a significant change in the LCC result. In these cases a further analysis can be done utilizing probability analysis (Kirk 1995), incorporating Monte Carlo simulation of variables, presented as a probability distribution of the total costs of all alternative options. Resultant findings illustrate the most likely cost of each alternative option and the range within which it can be expected to lie (Flanagan and Norman 1993). However, these adaptations have disadvantages including:
i. Probability subjectively evaluated today may differ in the future (Whyte and Scott 2010).
ii. Nominal account is taken of non-cost factors that affect LCC estimation.
iii. Complex processes cannot fit input and output variables easily.
iv. Sensitivity analysis does not quantify risk but rather identifies factors that are risk sensitive where only one parameter is varied at a time (Flanagan and Norman 1993).
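As an illustration of the probability analysis described above, the following sketch runs a simple Monte Carlo simulation over the discount rate. The uniform distribution, the 18-year horizon and all cost figures are assumptions chosen for illustration only.

```python
import random
import statistics


def simulate_lcc(capital_cost, annual_cost, years, rate_low, rate_high,
                 runs=10_000, seed=1):
    """Sample the discount rate uniformly and return one LCC per run."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        rate = rng.uniform(rate_low, rate_high)
        pv = sum(annual_cost / (1 + rate) ** t for t in range(1, years + 1))
        results.append(capital_cost + pv)
    return results


# Hypothetical option: 1,000,000 capital cost, 50,000 per year for 18 years.
costs = sorted(simulate_lcc(1_000_000, 50_000, 18, 0.02, 0.08))
mean_cost = statistics.mean(costs)
# Approximate 5th and 95th percentiles: the range the cost is likely to lie in.
low, high = costs[int(0.05 * len(costs))], costs[int(0.95 * len(costs))]
```

Repeating the simulation for each design alternative gives the distributions that are then compared.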
Towards addressing these uncertainties, the work presented here develops and tests a new model of estimating costs for each stage of a project, using Artificial Neural Networks (ANN) to estimate costs occurring at each stage of the life-cycle of construction. The model estimates the total capital, operation and maintenance costs, alongside an assessment of accuracy that compares total operation/maintenance costs of actual building projects.

Applications of ANNs in Construction
Using simulation and modelling tools at all stages of an asset's life cycle provides a way to anticipate the behaviour of an asset before it is built (Mackenzie and Briggs 2006). Artificial intelligence methods such as expert systems, neural networks (NNs), fuzzy logic (FL), and genetic algorithms (GAs) are argued to help solve prediction problems (Cheng et al. 2010). Currently, building sector clients seek predictions that are flexible and easy to use. As mentioned before, ANNs have several advantages over traditional methods, but there are currently no clear steps that can be followed to simplify the cost estimation process in ANN applications. There is still a need to develop a framework of ANNs to (excite clients, and) predict whole construction costs. To make sense of the huge number of variable factors that affect the value of the construction cost, there is a need to provide better interaction between these factors in a less complicated process. It is important to standardise the estimation process and integrate some methods with ANNs, to identify the key factors to be considered as input factors of the ANN model. This research develops a new framework for cost estimation that integrates such techniques with artificial neural networks.

New Framework towards a Cost Estimation Model
The process developed here for a new model of cost estimation follows a number of systematic procedures. Eight basic steps are proposed here: (1) identify the purpose of estimation; (2) identify cost factors affecting cost estimation; (3) identify non-cost factors affecting cost estimation; (4) create a database for cost and non-cost factors; (5) design the NNs; (6) train the NNs; (7) validate the model; and (8) run the application. See figure 1 and the text below:

Identification of the Purpose of Estimation
The purpose of estimation may range from an estimate of construction cost only, to estimating the total life-cycle costs of new projects which include construction, operation and maintenance.

Implementation of CSIs:
Construction projects have numerous variable factors impacting upon the value of LCC. Interaction between factors is somewhat complex, with current LCC models arguably suffering from an absence of both standardisation and a simple methodology to collect and interpolate data (Olubodun et al. 2010). The concept of Cost-Significant-Items (CSIs) shall help future analysts to simplify estimation methodologies by determining the key items contributing most to construction project LCC. CSI ideology owes much to Pareto's classic 80:20 rule. Various building-sector scholars have applied CSIs to construction cost estimate research, finding that CSI theory is able to determine the small number of items which represent a constant percentage of the total cost of construction projects (Al-Hajj and Horner 1998, Asif 1988, Elcin and Hakan 2005, Horner and Zakieh 1996). If the CSIs could be simply recognized, it would motivate estimators to direct attention to such specific items, and would reduce the time taken for estimation. In this way, cost information required to estimate total cost could be collected, analysed and recorded in a manner which will provide a more significant and realistic method of prediction.
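The idea of isolating a small set of high-value items can be sketched as follows. The selection rule used here (rank items by cost and keep them until a cumulative share of the total is reached) is an assumption in the spirit of the 80:20 rule; the cited CSI studies may define significance differently, and the item names and costs are hypothetical.

```python
def cost_significant_items(item_costs, share=0.80):
    """Return the largest items whose cumulative cost first reaches `share`."""
    total = sum(item_costs.values())
    ranked = sorted(item_costs.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, cost in ranked:
        if running >= share * total:
            break
        selected.append(name)
        running += cost
    return selected


# Hypothetical running-cost items (not from the case-study data-set).
items = {"cleaning": 400, "energy": 300, "decoration": 150,
         "roof repairs": 80, "lifts": 40, "drainage": 30}
csis = cost_significant_items(items)
```

In this toy data-set three of the six items account for 85% of the total, so only those three would need detailed estimation.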

[Figure 1: framework flowchart not reproduced; surviving labels: "CSIs", "Training and testing NNs"]

Identification of Non-Cost Factors affecting the Accuracy of Estimation:
A main restriction of most current models of cost estimation is that they only consider significant factors that can be readily quantified. However, non-cost factors should be considered because they seem to play a vital role in the accuracy of cost estimation (Elhag et al. 2005). Non-cost factors affecting the accuracy of estimating come from a large range of categories. These factors are qualitative, such as type of project (residential, commercial, industrial), type of structure (concrete, steel, masonry), and project size. These factors can be identified from an analysis of literature, historical data and practitioner experience.

Database Creation
Database creation consists of taking the most significant cost/non-cost factors already identified in earlier steps, together with actual values of unit-rate costs for past projects. This data provides the input and output information for the proposed model during the training/testing stages.

Design of Neural Networks
The initial steps in the design of neural network modelling are selecting, collecting and preparing suitable data. In cost estimation modelling, there are two types of data needed to create a neural network model: input data consisting of data identified as key to the result of the cost estimation model (collected from the database), representing CSIs and the important non-cost factors; and output data consisting of the data collected from the database representing the actual value of total costs of previous projects. Such data needs to be normalized before being presented to the network, because mixing variables with big magnitudes and small magnitudes will confuse the learning algorithm (Tymvios et al. 2008). The input and output data can be scaled to a range (-1 to 1) to suit neural network processes. Normalization of the data uses the following formula (Arafa and Alqedra 2011):

$$X_n = 2\left[\frac{X - X_{\min}}{X_{\max} - X_{\min}}\right] - 1 \qquad (1)$$

where $X_n$: normal value, $X$: original data set, $X_{\min}$: the minimum value of data, $X_{\max}$: the maximum value of data.
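The scaling to the range (-1 to 1) described above can be sketched in a few lines. The function names and the sample floor areas are hypothetical; the inverse function is included because a scaled model output has to be converted back to cost units.

```python
def normalise(values):
    """Scale a list of numbers linearly so that min -> -1 and max -> +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]


def denormalise(scaled, lo, hi):
    """Invert the scaling to recover a value in the original units."""
    return (scaled + 1) * (hi - lo) / 2 + lo


floor_areas = [1200.0, 2600.0, 4000.0]   # hypothetical gross floor areas
scaled = normalise(floor_areas)          # [-1.0, 0.0, 1.0]
```

Each input column and the output column would be scaled separately, using that column's own minimum and maximum.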
After collecting the data, the designer specifies the number of hidden layers, the number of neurons in each layer and the transfer functions. In general, traditional parametric 'trial & error' is performed to select the number of hidden layers and hidden nodes. During the training process, the number of hidden layers and hidden nodes is adjusted until identification of the best model, which gives the lowest values for the Root Mean Square error (RMS) and absolute difference percentage for output parameters. The transfer function determines a neuron's activation value from the sum of its weighted inputs. Key transfer functions are the sigmoid, threshold and linear functions (Duch and Jankowski 1999).
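The transfer functions named above, and the way a single neuron applies one to its weighted inputs, can be sketched as follows. The function names are illustrative rather than taken from any particular package.

```python
import math


def log_sigmoid(x):
    """Logistic sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tan_sigmoid(x):
    """Hyperbolic-tangent sigmoid: output in (-1, 1)."""
    return math.tanh(x)


def threshold(x):
    """Hard-limit (step) function: 1 if x >= 0, otherwise 0."""
    return 1.0 if x >= 0 else 0.0


def linear(x):
    """Identity function, often used on output nodes."""
    return x


def neuron_output(inputs, weights, bias, transfer=tan_sigmoid):
    """Apply a transfer function to the weighted sum of the inputs."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(net)
```

The tangent sigmoid pairs naturally with data scaled to (-1 to 1), since its output covers the same range.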

Training and Testing NNs
Training the neural network refers to a procedure that uses learning methods to adjust the weights and learn patterns in the data-set by iteratively presenting samples with known correct answers. Before starting the process of training, learning rates should be selected. In fact, several ANN programs, such as MATLAB, automatically set the learning rate to maximise the performance of the model. Training methods mostly fall into one of two categories: a supervised training method, in which the trainer tells the model whether its result was correct; and an unsupervised training method, in which there is neither teacher nor trainer during training to tell the network whether its output was correct. The network's weights are continuously modified until the difference between the model's outputs and the actual outputs converges to an acceptable level. The training process is stopped when a minimum root mean square error (equation 2) is reached.

$$\mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{n} (O_i - P_i)^2}{n}} \qquad (2)$$
where: RMS: root mean square error; n: number of samples used in the training stage; $O_i$: the actual output; $P_i$: the model output.
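The root mean square error of equation 2 can be computed as sketched below. The second function is an assumed form of the 'absolute difference' percentage, since that measure is referred to but not defined in this text.

```python
import math


def rms_error(actual, predicted):
    """Root mean square error between actual (O_i) and model (P_i) outputs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(actual, predicted)) / n)


def mean_abs_pct_difference(actual, predicted):
    """Assumed form of the 'absolute difference' measure, in percent."""
    return 100.0 * sum(abs(o - p) / abs(o)
                       for o, p in zip(actual, predicted)) / len(actual)
```

The same two measures can be evaluated on the training samples to decide when to stop, and on unseen samples to judge the finished model.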
The testing process is fundamentally the same as the training process, but the model uses samples (data) never seen before, and no corrections are made. If the results of the testing process are acceptable, the model is suitable to use. If the result is inappropriate, a re-design of the model is required. The acceptable level of the result of the model is evaluated based on the value of RMS (equation 2) and absolute difference (equation 3).

Final Model of ANNs:
Once the model is built, it can be utilised to predict the cost of new construction projects. It should be noted that a building practitioner is then able to use the final model to estimate new project costs without performing changes to the design structure of the ANN model, such as the transfer function, the number of inputs (important cost and non-cost factors) and hidden nodes, which had been selected at an initial stage.

Case Study
The Purpose of Estimation
The aim of the following case study is to review/validate the newly developed framework for a cost estimation model for building projects. This case study aims to estimate the running costs (operation and maintenance costs) for three types of building projects (teaching, residential and laboratory facilities). The new work presented by this current study uses a comprehensive catalogue of 20 building projects compiled previously by Al-Hajj (1991). Al-Hajj's cost information is documented as stemming from three sources, namely: York University; an independent facilities management company; and the UK's Building Maintenance Cost Information Service (BMCIS-UK). The case study also prepares the ground for future work, described below.

Implementation of CSIs:
An analysis of cost significant items (CSIs) was applied to the twenty (teaching, residential and laboratory facility) building projects catalogued by Al-Hajj (1991), both to identify the key cost factors affecting the accuracy of running-cost estimation and to verify the Pareto concept on an existing data-set. The analysis finds 11 items to be the most influential across all building projects over the catalogue's analysis period of 18 years. These key items represent about 16% of all items. The data for the 20 projects were entered into Excel. The input and output data were normalized and scaled to a range (-1 to 1) using equation 1 to suit neural network processes.

b) Determining the best network architecture
Traditional parametric trial and error was performed to select the number of hidden layers and the number of hidden nodes. During the training process, the number of hidden layers and hidden nodes is adjusted to find the artificial neural network model that gives the lowest value for the Root Mean Square error (RMS) and absolute difference percentage for output parameters. The tangent sigmoid is used as the transfer function of the NN model for both methods. In order to train/test the model and find the best number of nodes in the hidden layers, MATLAB is utilised for its ease of use and speed of training. The best architecture resulting from MATLAB is then transferred to the spreadsheet to compare the results of both methods.

c) Training and testing the network
In this case study, the 20 projects are divided into three sets: 14 projects (70%) are used for training the model, 3 projects (15%) are used for model validation and the remaining 3 projects (15%) are used to test the procedure. As mentioned before, one of the objectives of training the model is to identify the best structure of the neural network model. The acceptable level of the result of the model is evaluated based on the value of RMS (equation 2) and absolute difference (equation 3).
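The 14/3/3 division described above can be sketched as follows. Random assignment with a fixed seed is an assumption; the text does not state how projects were allocated to each set, and the project identifiers are placeholders.

```python
import random


def split_projects(projects, n_train=14, n_val=3, n_test=3, seed=0):
    """Shuffle the projects and return (train, validation, test) lists."""
    if n_train + n_val + n_test != len(projects):
        raise ValueError("subset sizes must sum to the number of projects")
    shuffled = list(projects)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


project_ids = list(range(1, 21))          # placeholder IDs for the 20 projects
train, val, test = split_projects(project_ids)
```

Keeping the test projects fixed allows two modelling methods to be compared on the same unseen data.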

a) back-propagation
In the back-propagation method, the weights and biases connecting the nodes are adjusted using a number of inputs and the desired output value. The difference between the network output and the actual output forms the network error, which is then back-propagated from the output layer to adjust the weights and biases. This step is performed repeatedly until the minimum level of network error is reached. Twenty trial-and-error model runs were applied to identify the number of hidden nodes in the hidden layers. It was clear that changing the number of hidden nodes changes the value of the RMS and absolute difference error. 19 hidden nodes provided the lowest RMS value of 0.033 and an absolute difference error value of 0.96%. However, the neural network model with one hidden node provided the highest RMS value of 0.388, with an absolute difference error value of 3.89%. The values of RMS and absolute difference error varied within the above-mentioned range for the remaining 18 model trials. Table 3 below lists the RMS and absolute difference error for all 20 model trials. From table 4, the average difference between the actual value of running costs and the result from back-propagation neural network modelling for the testing projects was about 1.91%.
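The weight-update cycle described above can be sketched with a deliberately small network. This is not the MATLAB procedure used in the study: it assumes plain gradient descent applied sample by sample, a tangent-sigmoid hidden layer with a linear output node, and hypothetical toy data.

```python
import math
import random


def train_backprop(samples, n_hidden=3, rate=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network (tanh hidden, linear output).

    samples: list of (inputs, target) pairs, already scaled to [-1, 1].
    Returns a predict(inputs) function.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
           for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0

    def forward(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(w_h, b_h)]
        return hidden, sum(w * h for w, h in zip(w_o, hidden)) + b_o

    for _ in range(epochs):
        for x, target in samples:
            hidden, out = forward(x)
            err = out - target                      # output-layer error
            for j in range(n_hidden):
                # Error propagated back through the tanh derivative.
                delta_h = err * w_o[j] * (1 - hidden[j] ** 2)
                w_o[j] -= rate * err * hidden[j]
                for i in range(n_in):
                    w_h[j][i] -= rate * delta_h * x[i]
                b_h[j] -= rate * delta_h
            b_o -= rate * err

    return lambda x: forward(x)[1]


# Hypothetical toy data: target = 0.5*x1 - 0.3*x2 on a 3 x 3 grid.
data = [((a, b), 0.5 * a - 0.3 * b)
        for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
predict = train_backprop(data)
```

Repeating the training for different values of `n_hidden` and comparing the resulting errors mirrors the trial-and-error search over hidden nodes.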
b) Spreadsheet using Excel Solver
'Solver' finds the best set of values for some variables by maximizing or minimizing a desired output cell connected by formulas to the variables, under a set of user-specified constraints. In this case, the optimization goal is to reduce the NN weighted error to 0. In order to reach this goal, the adjustable variables were identified as the weights from inputs to hidden nodes and from hidden nodes to outputs. Optimization constraints were set to limit the percentage error on both training and test projects to 3% and 1% or lower, to avoid erroneous network results on individual training projects. This paper applied the 7 steps suggested by Hegazy and Ayed (1998) to build the ANN model in a spreadsheet.
The best architecture resulting from MATLAB was used on the spreadsheet (8 nodes in the input layer, 19 nodes in the hidden layer, 1 node in the output layer). 17 projects were used to train the model and 3 projects were used to test the model (the same 3 projects that were used to test back-propagation modelling). The results show that the running cost model developed by spreadsheet neural network modelling performs well. No important differences are recognised between the estimated and actual running costs (table 4). From table 4, the average difference between the actual value of running costs and the result from spreadsheet neural network modelling for the testing projects was 0.107%. The expected accuracy of both models at the training and testing stages is presented in table 5 below.
The spreadsheet neural network model and the back-propagation model are able to estimate the total running cost with an average accuracy of 99% and 98% respectively.
The neural network model results from both the training and testing stages and the actual value of running costs for both models were analysed through regression analysis in order to investigate the model response in more detail. The result of the linear regression is presented graphically in figure 4 below. Linear regression analysis consists of two parameters (as in equation 4).
$$Y\ (\text{actual value}) = m \cdot X\ (\text{estimated value}) + b \qquad (4)$$

where m and b represent the slope and the y-intercept of the best regression line relating the actual value of running costs to the neural network model output. For both models, in both training and testing stages, the slope is close to 1 and the y-intercept is close to 0, indicating a good fit. In addition, regression analysis is able to provide the value of the correlation coefficient ($R^2$) between the actual value of running cost and the model output.
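The slope, intercept and $R^2$ of the regression between estimated and actual values can be obtained by ordinary least squares, as sketched below. The data values are hypothetical and serve only to show the calculation.

```python
def linear_fit(x, y):
    """Least-squares slope m, intercept b and R^2 for y = m*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return m, b, 1 - ss_res / ss_tot


# Hypothetical values: x = model estimates, y = actual running costs.
estimated = [10.0, 20.0, 30.0, 40.0]
actual = [10.5, 19.5, 30.5, 39.5]
m, b, r2 = linear_fit(estimated, actual)
```

A slope near 1, an intercept near 0 and an $R^2$ near 1 together indicate that the estimates track the actual values closely.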

Discussion
It is argued that the framework developed in this research can be used to estimate the total cost of construction projects at different phases of a project's life cycle. It does not revolutionize the cost estimation method, but rather improves upon the traditional approach of construction estimating and improves a client's knowledge of whole-costs. The framework seeks to consider the most important cost and non-cost factors to estimate the total cost of projects, with clear steps to collect the relevant variables. This is perhaps a step beyond current methods, which lack a standardised method to collect such variables. Using the concept of CSIs at each phase of construction projects has proved to be a valuable method towards the improvement of cost estimation practices. The most significant aspect of this framework is that it takes advantage of CSIs and integrates them with ANNs to improve the accuracy of estimation, saving time and cost in the process. Regarding the accuracy of estimation, an average variation of 1% was noted between the predicted costs and actual costs in the case study. Compared to other studies, the variation obtained here is less than that obtained from previous work. This study is part of an ongoing study which aims to simplify and improve the accuracy of estimating costs in order to accelerate the understanding and implementation of LCC in the construction sector. A key aim of this paper was to create a new direction for future work. The next section illustrates the essential further development of the framework and model.

Future Work
The following bullet points represent a way forward:
• Going beyond the current very extensive data-set cited by Al-Hajj (as part of his PhD data-generation purposes), further research is necessary to apply the CSIs at all stages of proposed/recent local construction projects to identify the most important cost factors affecting estimation of cost at each stage of construction;
• Further research is necessary to identify key non-cost factors affecting cost estimation; analysis of the literature and historical data will help determine these factors;
• ANN models may be developed based on the framework suggested here to estimate the cost at each stage of construction projects.
• The sample of data used to train and test the ANN models will be extended in order to validate the ANN model.
These recommended research directions would provide a better understanding of the framework.

Conclusion
This paper introduced a new framework for estimating LCC at each stage of construction projects. The framework developed here uses artificial neural networks in order to simplify and improve estimation processes. Identification of the main cost and non-cost/design factors affecting the accuracy of cost estimation at each stage of LCC is noted as an important step of such a framework. The concept of Cost Significant Items (CSI), alongside analyses of available literature and of an historical data-set, was used in an application of the developed framework. Case-study analysis utilised 20 previously catalogued building projects to test the developed framework and sought to estimate the total running cost of this data-set. Two methods were used to develop NN modelling: back-propagation and spreadsheet modelling using Excel Solver. Both NN modelling techniques were able to provide reliable results. It is suggested that these encouraging findings, which indicate a more reliable estimation process, be further compared against multiple-regression modelling.
Both methods use a three-layer NN, because it is argued here that they provide simple transparent methods towards NN modelling. The neural network model developed in this paper consists of three layers (as in figure 2 below):
1- Input layer of 8 nodes: type of building, gross floor area, area of pitched roof, area of flat roof, area of external glazing, number of stories above ground floor, number of stories under ground floor and CSIs.
2- Hidden layer (trial and error in a training stage used to determine the number of nodes).
3- Output layer containing one node (the total running cost).
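The three-layer structure listed above can be written as a single forward-pass function. The sketch uses 8 inputs and the 19 hidden nodes reported for the best model; the linear output node, the zero biases and the random placeholder weights are assumptions, not trained values from the study.

```python
import math
import random


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, then a linear output node."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


N_IN, N_HIDDEN = 8, 19                    # sizes reported for the best model
rng = random.Random(0)                    # placeholder (untrained) weights
w_hidden = [[rng.uniform(-1, 1) for _ in range(N_IN)]
            for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
b_out = 0.0

scaled_inputs = [0.0] * N_IN              # eight inputs already scaled to [-1, 1]
scaled_cost = forward(scaled_inputs, w_hidden, b_hidden, w_out, b_out)
```

The returned value is on the scaled range and would need to be converted back to cost units using the minimum and maximum of the training outputs.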

Figure 2: Neural network structure with three layers

Figure 3: Structure of the best model

Figure 4: Regression analysis of both models at training and testing stage

Table (1) below lists the resultant cost significant items for the data-set:
Table 1: Cost significant items

Identification of Design Factors influencing Estimation Accuracy
Seven major design factors from Al-Hajj's (1991) catalogue of 20 building projects, described as the most important design factors affecting a building's cost estimation, are listed in table 2 below.

Table 2: Non-cost significant factors

Design of Neural Networks
a) Selection of NN software/simulation and data collection
This paper compares two methods to create and design neural networks: (1) a back-propagation method using MATLAB software (2012b); and (2) spreadsheet optimization using Microsoft Excel Solver.

Table 3: 20 model trials for determining the best model
Figure 3 below presents the properties of the best neural network model gained through the trial and error method at the training stage.

Table 4: Results of both neural network models for all 20 projects
Table 4 below.
For both models, in both training and testing stages, $R^2$ is close to 1, indicating a good fit and linear correlation between the actual running cost and the neural network result.

Table 5: Expected accuracy of both models