Predicting the academic success of architecture students by pre-enrolment requirement: using machine-learning techniques

In recent years, there has been an increase in the number of applicants seeking admission into architecture programmes. As expected, prior academic performance (also referred to as the pre-enrolment requirement) is a major factor considered during the process of selecting applicants. In the present study, machine-learning models were used to predict the academic success of architecture students based on information about their prior academic performance. Two modelling techniques, namely k-nearest neighbour (k-NN) and linear discriminant analysis, were applied in the study. It was found that the k-NN model outperforms the linear discriminant analysis model in terms of accuracy. In addition, grades obtained in mathematics (at ordinary level examinations) had a significant impact on the academic success of undergraduate architecture students. This paper makes a modest contribution to the ongoing discussion on the relationship between prior academic performance and academic success of undergraduate students by evaluating this proposition. One of the issues that emerges from these findings is that prior academic performance can be used as a predictor of academic success in undergraduate architecture programmes. Overall, the developed k-NN model can serve as a valuable tool during the process of selecting new intakes into undergraduate architecture programmes in Nigeria.


Introduction
Nigerian universities are experiencing an increase in the number of applicants seeking admission. For admission committees this poses serious yearly problems in terms of identifying, selecting and admitting outstanding students into the various departments. The selection of students by such committees is usually based on prescribed criteria for each programme at the different universities. A robust selection process should ensure that the best students are selected for each programme (Holt et al., 2006; Neame et al., 1992; Young, 1989). Furthermore, the previous academic performance of applicants is crucial information needed to aid the selection process. Grades obtained in secondary school terminal examinations, the Unified Tertiary Matriculation Examination (UTME) and post-UTME test scores (the post-UTME test is conducted by each university) are sources of key information, especially in selecting exceptional students into the architecture undergraduate programme.
Conventionally, admission policies are designed to select applicants with high prior performance. This is because it is generally believed that high prior academic performance may be positively related to high academic success. The main challenge faced by many admission screening committees is the identification of the criteria (i.e. prior academic performance) which influence the academic success of undergraduate students. Research into the relationship between the two variables (i.e. prior academic performance and academic success) has a long history (e.g. Young, 1989). Although the findings of previous studies (e.g. Abisuga et al., 2015; Allen and Carter, 2007) have shown that a positive relationship exists between these two variables, a significant portion of those studies is focused on 'explanatory' modelling. Shmueli (2010) points out that 'explaining' and 'prediction' are two distinct terms in statistical modelling. Further, it has been argued that predictive modelling is essential for testing theories, which leads to the generation of new knowledge (Shmueli, 2010; Runeson, 2011). In addition, Coleman (2007) asserted that prediction is a valid approach to test the validity of a theory against alternative theories.
The objectives of this research are to: (1) develop models for predicting the academic success of undergraduate architecture students using prior academic performance as predictors; and (2) identify the most important predictors (i.e. academic performance) which have a significant impact on academic success. Two predictive models, namely the k-NN algorithm and linear discriminant analysis, were applied in this study. The need for this investigation is justifiable for several reasons. First, the number of applicants seeking admission into Nigerian universities has significantly increased in recent years. The developed model could be used as a decision support tool during the process of screening applicants. Second, the model could support a reduction in attrition rates and more effective utilization of the resources expended on students' training. Third, government policies are targeted at attracting applicants from educationally disadvantaged areas so as to meet the educational needs of such communities. The developed model could assist in identifying 'weak' students, who will require additional learning support. Finally, modelling provides a platform for evaluating alternative strategies (Ogunlana, Li and Sukhera, 2003) and theories (Shmueli, 2010). The information provided by the model would be useful for developing strategies for identifying the 'best' applicants, who possess the appropriate knowledge for academic success in the undergraduate architecture programme.

Review of literature
A review of past studies that focused on the relationship between prior academic performance and the success of undergraduate students was conducted. The information generated from this review served as a basis for comparing the outcome of the present study to what was found in the past.

Academic success
A considerable amount of literature has been published on the academic success of university students (at undergraduate and postgraduate levels). The term 'academic success' refers to a phenomenon that incorporates academic achievement, attainment of learning objectives, acquisition of desired skills and competencies, satisfaction, persistence and post-college performance (York et al., 2015). Academic success has also been viewed as the completion of academic activities which improve the academic achievement of the student concerned. Academic success is important in achieving the set objectives of knowledge and skill development during the process of learning. Hence, there is a need to understand the underlying factors that affect students' academic success at the university. The literature on academic success has highlighted that several factors affect the academic success of undergraduate students at university level. This has led to the development of several theories in the field of education, such as pedagogical theory, curriculum theory, and learning theory among others. Empirical studies have investigated the impact of a whole range of factors on the academic performance of undergraduate students in universities (Parker et al., 2004; Young, 1989). It is worth noting that the terms 'achievement' and 'academic success' are used interchangeably in the literature. Herminio (2005) classified the factors affecting academic success into two groups, namely internal and external factors. The internal factors are class schedule, class size, classroom environment, role of the lecturers, technology and nature of examination, while the external factors include extracurricular activities, family and work activities. Herminio's (2005) findings show that internal factors are much more significant than the external factors.
Further, Ling et al. (2010) examined the effect of teaching and learning approaches on academic performance. It was found that the growing teaching approach and the achieving motive learning approach are related to improved academic success. Similarly, studies (e.g. McKenzie and Schweitzer, 2001; Win and Miller, 2005) have demonstrated that prior academic performance has a significant impact on the academic success of first-year university students. Although Bone and Reid (2011) contend that numerical assessment scores/grades obtained in a subject may not be a true reflection of the knowledge gained by a student on a topic/subject, a numerical score/grade remains the most readily available proxy for measuring academic success. The aim of the present study is not to investigate all the factors affecting the academic success of undergraduate students. For the purpose of the study, the discussion presented in the next section is limited to the impact of prior academic performance on the academic success of undergraduate students.

Prior academic performance and success on undergraduate programmes
Recent studies have examined the relationship between previous academic achievement and the academic success of undergraduate students at universities (see Abisuga et al., 2015; Curtis et al., 2007; Whyte et al., 2011; Shahiri and Husain, 2015). Curtis et al. (2007) report that admission criteria are weak predictors of academic performance for first-year students in a dentistry programme. Similarly, Whyte et al. (2011) determined that the best predictors of academic success for nursing and paramedics students are index scores and students' maturity in age. Furthermore, a number of studies confirmed that academic entry criteria and age maturity are the best predictors of academic success in undergraduate programmes in both nursing and paramedics (Van Rooyen et al., 2006; Whyte et al., 2011). The studies presented thus far provide evidence that a positive relationship exists between prior academic performance and the academic success of undergraduate students. However, it should be noted that other factors influence the impact of this relationship.
The generalisability of much published research on this issue (i.e. the positive relationship between academic performance and academic success for undergraduate students) is problematic. Ting's (2001) findings suggest that a combination of prior academic achievement and psychological variables is a significant predictor of academic success on undergraduate engineering programmes. However, Poole et al. (2007) affirm that admission criteria are significant predictors of academic performance in the first two years of undergraduate study in dentistry, but that their predictive power dwindles in later years. In total contrast, studies such as Bone and Reid (2011) report that no relationship exists between prior academic achievement and academic success on undergraduate programmes in medicine and biology respectively, while Kirby and Dempster (2014) attest that providing accommodation and financial support are vital predictors of academic success on a foundation programme in a South African university. Roberts (2007) concludes that cognitive style might be a better predictor of students' ability to attain academic success irrespective of whether a student had previously studied arts or science. Based on the aforementioned, it is evident that there are contrasting results from studies on the relationship between prior academic achievement and academic success. However, it is important to gain insights into the strength of these relationships in previous studies focused on built environment disciplines.
Previous studies into the relationship between prior academic performance and academic success for undergraduate programmes in built environment disciplines are relatively few. Allen and Carter (2007) revealed that academic success is largely dependent on grades obtained by students in core-knowledge courses taken at earlier stages of study in a real estate programme. Abisuga et al. (2015) reported that admission criteria are weak predictors of academic success for building technology students. In a similar vein, Newell and Mallik (2011) found that prior academic performance in mathematics is significantly related to academic success in a property undergraduate degree programme. In view of all the studies that have been reviewed so far, certain inferences can be drawn: (1) a relationship exists between prior academic achievement and academic success across all disciplines, although the strength of this relationship weakens as students progress to later years; (2) studies have largely focused on 'explaining' the relationship rather than 'prediction' (see Shmueli, 2010 for detailed information on the difference between 'explain' and 'predict'); (3) the preference for linear modelling techniques was evident in previous studies in built environment disciplines (Abisuga et al., 2015; Allen and Carter, 2007; Newell and Mallik, 2011). Therefore, the present study investigates the relationship between prior academic performance and students' academic success on an undergraduate architecture programme using machine-learning techniques. The study addresses a gap in the literature by applying machine-learning techniques which possess the ability to capture non-linear relationships present in real-life data. Also, the estimated relationship is used for prediction.

Method
Many researchers have examined the determinants of academic success in built environment programmes using various methods. In the literature, the research methods utilized include questionnaire survey (Ling et al., 2010), simulation (Long et al., 2009) and longitudinal survey (Guillermo et al., 2014). Although several research methods have been applied in the literature, it is important to note that the suitability of a particular approach to addressing a research problem is a principal factor considered in selecting a research method. The modelling research method offers an effective way of examining the relationship between dependent and independent variables (Fellows and Liu, 2015). The underlying pattern that is uncovered can then be used for prediction. In addition, prediction modelling is useful in theory/hypothesis testing (Shmueli, 2010). Therefore, two machine-learning modelling techniques (i.e. the k-NN algorithm and linear discriminant analysis) were used to predict the academic success of undergraduate students on an architecture programme in a Nigerian university. Linear discriminant analysis was applied because this technique was employed in early studies (e.g. Young, 1989). Also, the k-NN modelling technique has provided useful predictions in different fields of academic endeavour, such as ergonomics (Sánchez et al., 2016), traffic engineering (Yoon and Chang, 2014) and medicine (Zhu et al., 2007) among others. The predictive accuracy (i.e. generalisation) of the developed models was compared. Due to space constraints, readers interested in the application of machine-learning models are referred to Cortez (2015).

Ethical consideration
Permission was obtained from the concerned academic department prior to data collection. In order to comply with ethical requirements, the academic records of each student were anonymised by the administrators in charge of the database. Subsequently, the anonymised data were transferred to the first author. This ensured that no connection could be made with any individual. It should be noted that issues relating to admission criteria and access to students' entry information are handled by senior academics and top university administrators. In addition, there is an internal process for cross-checking marks assigned for each course taken by each student. This serves as quality assurance in order to preserve objectivity in the metrics used to evaluate students' academic success.

Data
The architecture undergraduate programme at the Olabisi Onabanjo University began in 2003. The curriculum includes studio-based learning strategies and integration of relevant courses from other built environment disciplines. Currently, the admission criteria are designed to provide entry opportunities into the architecture programme for high academic achievers (based on prior academic achievement) and applicants from educationally disadvantaged areas. Further, the undergraduate architecture programme is accredited by the National Universities Commission.
The developed models were trained using data collected from the Department of Design and Architecture, Olabisi Onabanjo University, Ogun State, Nigeria. The collected data contained information on 102 students that completed the undergraduate programme between 2011 and 2014. This provided information relating to prior academic achievement and academic success for each student. Before fitting the collected data to the models, data cleaning was carried out. Owing to some missing information, the data for only 101 students were used during the model development phase of the current study.

Table 2. Input and output variables used in developing the models
Variable (code)                         Description
Mathematics (MATH)                      Student's grade (numeric: from 9 = A1 to 0 = no grade)
English (ENG)                           Student's grade (numeric: from 9 = A1 to 0 = no grade)
Physics (PHY)                           Student's grade (numeric: from 9 = A1 to 0 = no grade)
Biology (BIO)                           Student's grade (numeric: from 9 = A1 to 0 = no grade)
Chemistry (CHEM)                        Student's grade (numeric: from 9 = A1 to 0 = no grade)
Local language (e.g. Yoruba, YOR)       Student's grade (numeric: from 9 = A1 to 0 = no grade)
Geography (GEO)                         Student's grade (numeric: from 9 = A1 to 0 = no grade)
Technical Drawing/Fine Arts (TD)        Student's grade (numeric: from 9 = A1 to 0 = no grade)
Economics (ECON)                        Student's grade (numeric: from 9 = A1 to 0 = no grade)
Further Mathematics (FM)                Student's grade (numeric: from 9 = A1 to 0 = no grade)
Agricultural science (AGRIC)            Student's grade (numeric: from 9 = A1 to 0 = no grade)
Total UTME score (JAMB) (a)             Total score (numeric: from 0 to 1)
Direct entry (DE)                       Mode of entry (binary: 1 = yes, 0 = no)
Academic success                        CGPA at graduation (binary: 1 = Pass, 0 = Fail)
(a) Total UTME score = student's UTME score divided by 400

The collected data include 13 input (independent) variables, which are measures of prior academic achievement, and one output (dependent) variable.
The input variables are the grades obtained in the Ordinary ('O') level examination, the total score in the University Matriculation Examination (also called JAMB or UTME) and the mode of entry (a dummy variable: 1 = Direct Entry and 0 = JAMB). The collected data are similar to those used in an earlier study (see Young, 1989). For the output variable, the cumulative grade point average (CGPA) obtained by each student upon completion of the architecture undergraduate programme was used as a measure of academic success. The numerical values of CGPA were transformed into categorical classes. The classification places each student into one of two classes, 'Pass' or 'Fail'. The detail of each class is presented in Table 1. It is important to note that achieving a minimum CGPA of 2.4 is a criterion used in selecting those students that will proceed to the Master of Science (MSc) Architecture programme. Therefore, this criterion is used as a measure of academic success. The details of the input and output variables used in developing the models are presented in Table 2.
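The encoding described above can be illustrated with a short Python sketch. It is not the authors' code (the study was implemented in R), and the function names are hypothetical. It also assumes that 'Pass' means a CGPA at or above the 2.4 progression criterion, since the exact class boundaries are given in Table 1 and not in the text.

```python
# Illustrative encoding of one student record (hypothetical helper names).
# Taken from the text: grades coded 9 (A1) to 0 (no grade), UTME score
# divided by 400, entry mode coded 1 = Direct Entry / 0 = JAMB.
# Assumed: 'Pass' is CGPA >= 2.4 (the MSc progression criterion).
PASS_THRESHOLD = 2.4
UTME_MAXIMUM = 400


def encode_outcome(cgpa):
    """Return 1 ('Pass') if the CGPA meets the threshold, else 0 ('Fail')."""
    return 1 if cgpa >= PASS_THRESHOLD else 0


def scale_utme(raw_score):
    """Rescale a raw UTME score to the 0-1 range."""
    return raw_score / UTME_MAXIMUM


def encode_entry_mode(mode):
    """Return 1 for direct entry ('DE'), 0 for any other mode."""
    return 1 if mode == "DE" else 0


def encode_student(o_level_grades, utme_score, mode, cgpa):
    """Build the feature vector and the class label for one student."""
    features = list(o_level_grades)
    features.append(scale_utme(utme_score))
    features.append(encode_entry_mode(mode))
    return features, encode_outcome(cgpa)
```

For example, `encode_student([9, 7, 5], 240, "UTME", 3.1)` yields three grade values followed by a scaled score of 0.6 and an entry flag of 0, with the label 1; the input values here are invented purely for illustration.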

Machine-learning models
In the present study, two modelling techniques were used to predict students' academic success: linear discriminant analysis and k-nearest neighbour. Both algorithms were implemented in the R programming environment (R Core Team, 2015) using the rminer package (Cortez, 2010). Classic linear regression models are used to examine the relationship between a set of independent variables and a dependent variable. It is important to note that the linear regression model is not suitable for predicting a categorical dependent variable (i.e. a classification problem). Hence, the two machine-learning techniques were applied in the present study.
Linear discriminant analysis is a useful method for classifying cases (here, each student) into one of two groups based on a set of features (i.e. admission criteria for architecture undergraduate students). Each student is assigned to one of the predetermined groups based on CGPA at the end of their undergraduate programme (see Table 1). Discriminant analysis has been shown to be useful for classification problems in several academic disciplines. Typical examples can be found in the classification of citrus fruits (Iqbal et al., 2016), screening for elderly drivers (Ferreira et al., 2012) and academic performance of first-year students (Young, 1989). For additional details on the linear discriminant analysis model, Izenman (2008) may be consulted.
K-nearest neighbour (k-NN) is a machine-learning technique that has been applied to classification and regression tasks. In this research k-NN is used for classification. According to Parsian (2015), the underlying principle behind the k-NN algorithm is that no prior assumption is made about the function f:
$$y = f(x_1, x_2, \ldots, x_m) \qquad (1)$$
where y is the dependent variable and x_i (i = 1, ..., m) are the independent variables. The function f is nonparametric because no parameter is estimated. Given a new observation p (i.e. a case in the test data), the algorithm dynamically identifies the k observations in the training data that are most similar to p (the k nearest neighbours). The neighbours are determined by a similarity measure that is computed between the observations based on the independent variables. The Euclidean distance between the independent variables of an observation in the training set, (x_1, ..., x_m), and those of the new observation, (p_1, ..., p_m), is:
$$d(x, p) = \sqrt{\sum_{i=1}^{m} (x_i - p_i)^2} \qquad (2)$$
Parsian (2015) provides a detailed explanation of k-NN.
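The nearest-neighbour rule described above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the rminer implementation used in the study: the function names are hypothetical, features are assumed to be numeric lists of equal length, and ties in the vote are resolved in favour of the label met first among the nearest neighbours.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_labels, query, k=1):
    """Classify `query` by majority vote among its k nearest training cases."""
    # Rank every training case by its distance to the new observation.
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    # Count the class labels of the k closest cases and return the winner.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

As a toy usage example with invented values, `knn_predict([[9, 8, 0.7], [3, 2, 0.5], [8, 9, 0.8]], [1, 0, 1], [8, 8, 0.75], k=1)` assigns the query to class 1 because its closest training case carries that label. No model is fitted in advance; all the work happens at prediction time, which is why no parameters are estimated.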
To validate the developed models, the collected data were divided into two: a training set (70%) and a test set (30%). The training data set was initially fitted to the model. Subsequently, the trained model was used to predict previously unseen data, i.e. the test data set. This is done to evaluate the predictive performance of the model (generalisation capability). The percentage of correctly classified cases (degree of accuracy) and Cohen's kappa coefficient (Kappa) are computed as measures of predictive performance. In addition, a sensitivity analysis as explained in Cortez (2010) was applied to evaluate the relative importance of the input variables in the developed model.
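The validation procedure can be sketched in Python as follows. This is a hedged illustration, not the study's R code: the function names and the random seed are hypothetical, and the split shown is a simple random shuffle, which may differ from the holdout routine the authors used. Cohen's kappa is computed from its standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of `rows` and split it into training and test parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(actual, predicted):
    """Proportion of cases whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohen_kappa(actual, predicted):
    """Agreement between two label lists, corrected for chance agreement.

    Undefined (division by zero) if all actual and predicted labels
    fall into one single class.
    """
    n = len(actual)
    p_observed = accuracy(actual, predicted)
    labels = set(actual) | set(predicted)
    # Chance agreement: product of the marginal class proportions.
    p_expected = sum(
        (actual.count(c) / n) * (predicted.count(c) / n) for c in labels
    )
    return (p_observed - p_expected) / (1 - p_expected)
```

A kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance, which is why kappa is reported alongside raw accuracy.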

Results
The processes carried out prior to fitting the models are described in the preceding section. The training data set was fitted to the linear discriminant analysis and k-nearest neighbour models. The predictive accuracy of both models is presented in this section.

Linear discriminant analysis
Admission criteria were used in the linear discriminant analysis to predict the academic success of architecture undergraduate students ('Pass' or 'Fail'). The purpose of linear discriminant analysis is to identify the linear combination of independent variables that produces the best prediction of class membership (Verma, 2013). The linear discriminant analysis generated the coefficients presented in Table 3, which show that the variables with the greatest impact on students' academic success are JAMB scores (absolute coefficient value of 4.01), direct entry (absolute coefficient value of 3.36), and grades obtained in mathematics at 'O' level examinations (absolute coefficient value of 0.49). The predictions generated from the linear discriminant analysis are presented in Table 4. The effectiveness of the discriminant function is evaluated by comparing the observed misclassification rate to that expected by chance alone, with the percentage of cases correctly classified serving as the benchmark. In this study, 50.0% of all cases were correctly classified (see Table 4). However, the kappa statistic is -0.230. A negative kappa means that agreement is worse than would be expected by chance, which indicates poor agreement between the predicted and actual classes in the test data set.
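To show what "the best linear combination of independent variables" means in practice, the following Python sketch computes a two-class Fisher discriminant for the simplest case of two input variables. It is an assumption-laden illustration rather than the procedure used in the study: the function names are hypothetical, the classes are assumed to have equal prior probabilities (so the cut-off is the midpoint between the projected class means), and the pooled within-class scatter matrix is assumed to be invertible.

```python
def mean_vector(rows):
    """Column-wise mean of a list of two-element feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mu):
    """2x2 sum of outer products of deviations from the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_lda_2d(class0, class1):
    """Return discriminant coefficients w and the classification cut-off."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, mu0), scatter_matrix(class1, mu1)
    # Pooled within-class scatter and its inverse (assumes det != 0).
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    # w = Sw^-1 (mu1 - mu0): direction that best separates the classes.
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Midpoint of the projected class means (equal priors assumed).
    cutoff = sum(w[i] * (mu0[i] + mu1[i]) / 2 for i in range(2))
    return w, cutoff


def classify(x, w, cutoff):
    """Assign class 1 if the discriminant score exceeds the cut-off."""
    return 1 if w[0] * x[0] + w[1] * x[1] > cutoff else 0
```

The absolute size of each entry of `w` plays the same role as the coefficients discussed above: a larger magnitude means the corresponding variable contributes more to separating the two classes, provided the variables are on comparable scales.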

K-nearest neighbour (k-NN)
The k parameter in the k-NN algorithm is a user-defined parameter. In this study, an initial experiment was carried out by setting k at values of 1, 3, 4, 5, 6, 7 and 9. Subsequently, k was set at 1, because changing the value of k produced no significant improvement in the accuracy of the k-NN model. The results of out-of-sample prediction (i.e. on the test set) for the k-NN model are presented in Table 5. Overall, 73.33% of the instances (cases) were correctly classified. The kappa statistic is 0.318.
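The experiment over candidate values of k can be sketched as a simple loop in Python. This is a hypothetical illustration, not the authors' procedure: the function names are invented, each data row is assumed to be a (features, label) pair, and accuracy is measured on whichever held-out set is passed in. It relies on `math.dist`, available from Python 3.8.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training rows closest to `query`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy_by_k(train, held_out, candidate_ks):
    """Return {k: proportion of held-out rows classified correctly}."""
    results = {}
    for k in candidate_ks:
        hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
        results[k] = hits / len(held_out)
    return results
```

Calling `accuracy_by_k(train, held_out, [1, 3, 4, 5, 6, 7, 9])` would reproduce the shape of the experiment described above. If the accuracies are similar across k, choosing the smallest value is one defensible option, although in general k = 1 is the setting most sensitive to noisy individual cases.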

Comparison of various models
This section provides a comparative analysis of the predictive performance of the two models developed in this study for predicting the academic success of architecture graduates in a Nigerian university. The predictive accuracy of the developed models served as a basis for comparison. The results of the evaluation test are presented in the previous sections (see Tables 4 and 5). The k-NN model outperforms the linear discriminant analysis model in terms of both the percentage of correctly classified cases (73.33% against 50.0%) and the kappa statistic (0.318 against -0.230).

Sensitivity analysis
Compared to k-NN, linear models (such as regression) are easy to interpret; machine-learning techniques, by contrast, are often described as 'black box' models. After using the developed k-NN model for prediction, a sensitivity analysis was carried out. Sensitivity analysis is a technique used to extract additional information on the importance of each independent variable in predicting the dependent variable in machine-learning models (see Cortez et al., 2009; Tinoco et al., 2011). Figure 1 shows the importance attributed by k-NN to each input variable (i.e. prior academic performance) based on sensitivity analysis. As can be seen from the figure, the most influential input variables are MATH, PHY, and CHEM.
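The general idea of a one-dimensional sensitivity analysis can be sketched in Python as below. This is a simplified illustration and should not be read as the exact procedure of Cortez et al. (2009) or of the rminer package: the function name is hypothetical, the model is assumed to return a numeric response (for example, a predicted probability of 'Pass'), and importance is measured here by the range of the response, which is only one of several possible measures.

```python
def sensitivity_importance(predict, baseline, ranges, levels=5):
    """Relative importance of each input from a one-at-a-time sweep.

    predict  : function mapping a feature list to a numeric response
    baseline : feature values held fixed (e.g. the training means)
    ranges   : (low, high) pair for each input
    levels   : number of equally spaced probe values per input (>= 2)
    """
    spreads = []
    for j, (low, high) in enumerate(ranges):
        responses = []
        for step in range(levels):
            probe = list(baseline)
            # Vary input j across its range; keep the others at baseline.
            probe[j] = low + (high - low) * step / (levels - 1)
            responses.append(predict(probe))
        # A wider response range means the model is more sensitive to j.
        spreads.append(max(responses) - min(responses))
    total = sum(spreads)
    # Normalise so that the importances sum to 1 (all zero if no effect).
    return [s / total if total else 0.0 for s in spreads]
```

For a toy model such as `lambda x: 3 * x[0] + x[1]` with three inputs each ranging from 0 to 1, the sweep attributes three quarters of the importance to the first input, one quarter to the second and none to the third, mirroring how a bar chart of relative importance is read.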

Discussion
The present study investigated the influence of prior academic performance on the academic success of architecture undergraduate students. As explained in the literature review, a number of factors influence the academic success of students. However, the current study was focused on predicting the academic success of undergraduate architecture students using prior academic performance as predictors. The results show that the k-NN model predicts academic success with an overall accuracy of 73.33% on the test set. The grades obtained in mathematics, physics, and chemistry at 'O' level examinations are significant predictors of the academic success of architecture undergraduate students. Surprisingly, grades obtained in the local language were found to have a significant impact on the academic success of architecture students. A possible explanation for this might be that lecturers explain complex architecture terms in the local language of the region where the case university is located. A note of caution is due here, since the current investigation did not entail observation of the teaching process during lectures.
In this study, it was found that prior academic achievement is a significant predictor of the academic success of architecture undergraduate students. This finding is in line with those obtained in previous studies (e.g. van Rooyen et al., 2006; Newell and Mallik, 2011; Abisuga et al., 2015). However, the coefficients of determination found in previous studies (Newell and Mallik, 2011; Abisuga et al., 2015) suggest that the strength of this relationship is weak. This inconsistency may be due to the fact that these studies were based on linear regression, which may not be able to capture the non-linearity present in real-world data. It should be noted that Bone and Reid (2011) reported that prior academic performance in a certain combination of subjects (i.e. biology and chemistry) influences the academic performance of students in the level-one biology course Molecules Genes and Cells; however, prior knowledge of biology alone had no impact on academic success in the first-year biology course. Also, the current study found that prior academic performance in mathematics had the most significant impact on the academic success of architecture students. These results are consistent with those of Newell and Mallik (2011), who reported that mathematics is an important determinant of academic success in a property undergraduate programme. Overall, the findings of the present study suggest that prior academic achievement is a key determinant of academic success in architecture undergraduate degree programmes. Due to the small sample size, caution must be applied, as the findings might not be generalizable to other undergraduate programmes without further testing.
In the coming years, it is likely that universities will reduce the academic requirements for gaining admission into academic programmes. This is due to low pass rates in terminal examinations at secondary school level (Kolawole and Dele, 2002; Asikhia, 2010) and the government policy targeted at attracting students from educationally disadvantaged backgrounds. The need to lower admission standards to admit these cohorts of students may affect student retention rates. In addition, there is an evident need to address deficiencies in students' prior learning (e.g. mathematics) before they enter university in order to improve student retention and academic success in architecture undergraduate programmes. This creates an additional burden for stakeholders (e.g. universities, parents, etc.) due to the cost associated with such intervention strategies. Provision of additional tutorial classes for 'weak' students is a typical example of an intervention strategy. Since the purpose of the present study is to investigate the predictability of students' academic success using admission criteria as predictors, the results indicate that the developed k-NN model can provide a useful forecast. This information would be particularly useful for stakeholders involved in making decisions regarding the admission of new architecture students into undergraduate programmes at universities.

Conclusion
The present study investigated the impact of prior academic achievement on academic success in an undergraduate architecture degree programme. The study is based on data collected from 101 students who completed a taught undergraduate architecture degree programme at the Olabisi Onabanjo University between 2011 and 2014. In order to evaluate the generalizability of the models, the training dataset was fitted to two models (i.e. linear discriminant analysis and k-NN) and the developed models were used to forecast the test dataset (i.e. previously unseen data). This ensured that valid inferences could be drawn for theory testing.
Based on the results of the current study, it is evident that prior academic achievement is a significant predictor of academic success in this undergraduate architecture programme in Nigeria. In addition, prior academic performance in mathematics, physics, chemistry, and local language are significant determinants of academic success in the undergraduate architecture programme. This is in line with previous studies (Abisuga et al., 2015) which have acknowledged the role of prior academic performance in the academic success of undergraduate students. Fellows and Liu (2015) assert that studies based on prediction, such as the one reported here, are a valid means for testing theories.
The findings of this study suggest that there is a need to develop and implement strategies aimed at improving the academic performance of students in terminal examinations at secondary school level. This can be achieved by, for example, additional investments in teacher training, motivation of students, and innovative teaching methods. This will ensure that there is significant improvement in the knowledge gained and the academic performance in secondary school subjects, which could substantially improve the academic success of students, and impact positively on student retention rates at university level.
There are obvious limitations that affect the generalization of the results of this kind of study. The current study is focused entirely on using prior academic performance as predictors of academic success for architecture undergraduate students. The influence of other factors (such as teaching method, learning style, etc.) on the academic success of architecture undergraduate students is not discussed here, but may have the potential to add to this debate if further research were conducted in this area. Despite this limitation, the application of modelling techniques ensures that valid inferences can be drawn. The present study provides a benchmark against which similar future studies can be evaluated. Therefore, it can be suggested that the developed k-NN model could be a useful decision support tool that can be used by stakeholders in the process of selecting new candidates and identifying those students that need additional support in undergraduate architecture programmes at Nigerian universities.