An Unsupervised Text-Mining Approach and a Hybrid Methodology to Improve Early Warnings in Construction Project Management

A Study of the Data Mining of Meeting Minutes of Construction Projects

Stellenbosch : Stellenbosch University, 2020

This research is motivated by the increased use of big data and the need to decrease the cost and time overruns experienced in the construction industry. During the construction period of a project, numerous factors contribute to the outcome of the project. Simply knowing some of these factors may not contribute to the successful completion of the project. Being able to use the known and the unknown factors to create a model that can predict the outcome of a project will enable the project management team to make informed decisions. This research aims to determine whether the information currently being recorded in site progress meeting minutes is sufficient for use in data mining applications that predict the outcomes of a project, and to establish whether new knowledge can be obtained from this process. Data mining to aid project management in the construction industry has seen limited application, especially in South Africa. Data mining is part of the Knowledge Discovery in Data (KDD) process, which is used to learn new information from data. The research starts with a literature review to identify a list of factors that influence the outcome of projects, both positively and negatively. Of the identified project outcome factors, the two that are highlighted most often are leadership and planning. These two overarching categories were used to determine if and how influencing attributes are recorded in the site meeting minutes. The current uses of data mining in the construction industry were investigated to determine how data mining and KDD have been implemented in the industry. Although KDD has been applied in the construction industry, no information was found about its application in the South African construction industry. Some of the reasons why it has not yet been implemented could be related to copyright, privacy and data security, and a lack of incentives to implement data mining.
The meeting minutes of several projects were data mined to determine whether they can be used to predict the outcome of future projects. The two overarching categories above were used to identify the information that is present in the meeting minutes. These attributes were then used as the data mining features. Two data mining applications were used so that the applications could be compared and the results validated. The most accurate data mining models were created using the Random Forest data mining algorithm. The prediction models are able to predict the outcome of future projects with a high degree of certainty.

Knowledge discovery from post-project reviews

2011

Many construction companies conduct reviews on project completion to enhance learning and to fulfil quality management procedures. Often these reports are filed away never to be seen again. This means that potentially important knowledge that may assist other project teams is not exploited. This paper investigates whether Knowledge Discovery from Text (KDT) and text mining (TM) could be used to "discover" useful knowledge from such reports. Text mining avoids the need to manually search a vast number of reports, potentially of different formats and foci, to seek trends that may be useful for current and future projects. Pilot tests were used to analyse 48 post-project review reports. The reports were first reviewed manually to identify key themes. They were then analysed using text mining software to investigate whether text mining could identify trends and uncover useful knowledge from the reports. Pilot tests succeeded in finding common occurrences across different projects that were previously unknown. Text mining could provide a potential solution and would aid project teams to learn from previous projects. However, a lot of work is currently required before the text mining tests are conducted and the results need to be examined carefully by those with domain knowledge to validate the results obtained.

Dealing with construction Cost overruns using data mining

2014

One of the main aims of any construction client is to procure a project within the limits of a predefined budget. However, most construction projects routinely overrun their cost estimates. Existing theories on construction cost overrun suggest a number of causes, ranging from technical difficulties and optimism bias to managerial incompetence and strategic misrepresentation. However, much of the budgetary decision-making process in the early stages of a project is carried out in an environment of high uncertainty with little available information for accurate estimation. Using non-parametric bootstrapping and ensemble modelling in artificial neural networks, final project cost-forecasting models were developed with 1600 completed projects in this experimental research. This helped to extract information embedded in data on completed construction projects, in an attempt to address the dearth of information in the early stages of a project. Of the 100 validation predictions, 92% were within ±10% of the actual final cost of the project, while 77% were within ±5% of the actual final cost. This indicates the model's ability to generalise satisfactorily when validated with new data. The models are being deployed within the operations of the industry partner involved in this research to help increase the reliability and accuracy of initial cost estimates.
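
The combination of non-parametric bootstrapping and ensemble modelling mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a one-variable least-squares line stands in for the artificial neural networks, the data are made up, and the names `fit_line`, `bootstrap_ensemble` and `predict` are hypothetical. Each ensemble member is fitted to a resample drawn with replacement from the training data, and the forecast is the average over all members.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in base learner)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: every x is identical
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bootstrap_ensemble(xs, ys, n_models=50, seed=0):
    """Fit one base learner per bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Ensemble forecast: average of the members' predictions."""
    return mean(a + b * x for a, b in models)


# Made-up data: initial estimate vs. final cost, in arbitrary units.
estimates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
finals = [1.1, 2.3, 3.2, 4.6, 5.4, 6.9]

models = bootstrap_ensemble(estimates, finals)
forecast = predict(models, 3.5)
print(f"ensemble forecast for an estimate of 3.5: {forecast:.2f}")
```

Averaging over resampled fits reduces the sensitivity of the forecast to any single unusual project in the training data, which is the usual motivation for this kind of ensemble.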

Text mining in the identification of duties and responsibilities of the project manager

Scientific Papers of Silesian University of Technology Organization and Management Series, 2020

Purpose: To identify the duties and responsibilities of the project manager by analysing job offers from a job website, and to determine whether there were any changes between 2018 and 2019. Design/methodology/approach: Text mining was performed on fragments of job offers describing the duties and responsibilities. The text mining analysis consisted of initial processing of the text, creation of a corpus of analysed documents, construction of a word frequency matrix and use of classical methods from the data mining area. Findings: The most common words in job offers are presented, as well as their correlation with other words. Using a topic modelling algorithm, hidden topics describing the analysed job offers were generated. These topics can also be used to identify the duties and responsibilities of a project manager. Research limitations/implications: Only the job offers meeting the following conditions were analysed: (1) they concerned the job of "project manager"; (2) the content was in Polish; (3) they were provided by the www.pracuj.pl website; (4) they were collected from 09 to 11 April in 2018 and 2019. Practical implications: This method can be used by organizations training project managers, in order to modify and better adjust the curriculum to the needs of the labour market. Originality/value: Research has shown that text mining can be used to determine the responsibilities of a project manager by analysing job offers.
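
The first steps of the pipeline described in this abstract (initial text processing, building a corpus, constructing a word frequency matrix) can be sketched as follows. This is only an illustration under stated assumptions: the two English sentences are invented (the study analysed Polish job offers), the stop-word list is a tiny placeholder, and `tokenize` and `document_term_matrix` are hypothetical names rather than anything taken from the paper.

```python
import re
from collections import Counter

# Placeholder stop-word list; a real analysis would use a fuller,
# language-specific list.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "for"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def document_term_matrix(docs):
    """Return (vocabulary, matrix) with one row of word counts per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


# Invented stand-ins for the "duties and responsibilities" fragments.
corpus = [
    "Manage the project budget and schedule.",
    "Coordinate the project team and manage risks.",
]

vocab, matrix = document_term_matrix(corpus)
totals = {term: sum(row[i] for row in matrix) for i, term in enumerate(vocab)}
print(vocab)
print(matrix)
```

The column totals give the overall word frequencies, and the rows are the kind of input that classical data mining methods or a topic model would take.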

Text mining of Post Project Reviews

2008

Post Project Reviews (PPRs) are a rich source of knowledge and information for organisations, provided they have the time and resources to analyse them. Too often such reports are stored, unread by many who could benefit from them. PPRs attempt to document the project experience, both good and bad. If these reports were analysed collectively, they might expose important detail, perhaps repeated between projects. However, because most companies do not have the resources to examine these PPRs, either individually or collectively, important insights are missed, and with them the opportunity to learn from previous projects. Hidden knowledge and experiences can be captured by using knowledge discovery and text mining to uncover patterns, associations, and trends in data. The results might then be used to enhance processes, improve customer relationships, and identify specific problem areas to address. This paper outlines an ongoing research project that investigates the use of knowledge discovery and text mining on Post Project Reviews. An illustrative example will be presented using case studies from the construction sector. The PPR processes of two construction companies were mapped with the aim of understanding the context, format, terminologies used and key knowledge areas suitable for text mining. The textual examination of the PPR reports was complemented by semi-structured interviews and workshops to understand the production and content of the reports. Preliminary results highlight that although organisations have publicised, standard processes for PPR, there is variance in how these are conducted and produced on a regional basis. These variances pose a number of challenges for organisations from a corporate perspective. Also, there is an over-reliance on key individuals, with little attempt to make some of their knowledge more explicit and therefore easier to disseminate between project team members.
This paper summarises the challenges in identifying the type of knowledge to be text mined, the format of PPR reports and the process of conducting PPR. It also highlights the development of suitable ontologies for text mining PPR reports and provides recommendations on how to improve the PPR process of companies.

Understanding Construction Site Safety Hazards Through Open Data: Text Mining Approach

ASEAN Engineering Journal, 2021

Construction is an industry well known for its very high rate of injuries and accidents around the world. Even though many researchers are engaged in analysing the risks of this industry using various techniques, construction accidents still require much attention in safety science. According to the existing literature, hazards related to workers, technology, natural factors, surrounding activities and organisational factors are primary causes of accidents. Yet there has been limited research aimed at ascertaining the extent of these hazards based on the actual reported accidents. Therefore, the study presented in this paper was conducted with the purpose of devising an approach to extract sources of hazards from publicly available injury reports by using Text Mining (TM) and Natural Language Processing (NLP) techniques. This paper presents a methodology to develop a rule-based extraction tool by providing full details of lexicon building, devising extraction rules ...

"My cost runneth over": Data mining to reduce construction cost overruns.

Procs 29th Annual ARCOM Conference, 2013

Most construction projects overrun their budgets. Among the myriad explanations given for construction cost overruns is the lack of required information upon which to base accurate estimation. Many of the financial decisions made at the time of the decision to build are thus made in an environment of uncertainty and, oftentimes, guesswork. In this paper, data mining is presented as a key business tool to transform existing data into key decision support systems to increase estimate reliability and accuracy within the construction industry. Using 1600 water infrastructure projects completed between 2004 and 2012 within the UK, cost predictive models were developed using a combination of data mining techniques such as factor analysis, optimal binning and scree tests. These were combined with the learning and generalising capabilities of artificial neural networks to develop the final cost models. The best model achieved an average absolute percentage error of 3.67%, with 87% of the validation predictions falling within an error range of ±5%. The models are now being deployed for use within the operations of the industry partner to provide real feedback for model improvement.
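
The two accuracy measures quoted in this abstract, the average absolute percentage error and the share of validation predictions falling within a given error band, can be computed as in the sketch below. The four cost pairs are invented for illustration and do not reproduce the paper's figures; the function names are hypothetical.

```python
def absolute_percentage_errors(actual, predicted):
    """Per-project absolute error as a percentage of the actual final cost."""
    return [abs(p - a) * 100.0 / abs(a) for a, p in zip(actual, predicted)]


def mean_absolute_percentage_error(actual, predicted):
    """Average of the per-project absolute percentage errors."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(errors) / len(errors)


def share_within(actual, predicted, tolerance_pct):
    """Fraction of predictions whose error is at most tolerance_pct percent."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(e <= tolerance_pct for e in errors) / len(errors)


# Invented validation set: actual final costs and model predictions.
actual_costs = [100.0, 200.0, 400.0, 500.0]
predicted_costs = [104.0, 190.0, 400.0, 560.0]

mape = mean_absolute_percentage_error(actual_costs, predicted_costs)
within_5 = share_within(actual_costs, predicted_costs, 5.0)
print(f"MAPE: {mape:.2f}%  share within ±5%: {within_5:.0%}")
```

Reporting both figures is informative because a low average error can hide a few large misses, whereas the within-band share shows how many individual forecasts were close.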

A Method of Detecting Early Warning in Project Management using Classification Approach

Aia, 2005

Accurate estimation of a student's future performance is essential in order to provide the student with adequate assistance in the learning process. To this end, this research investigated the use of Bayesian networks for predicting the performance of a student, based on the values of some identified attributes. We present empirical experiments on the prediction of performance with a data set of high school students containing 8 attributes. The paper demonstrates an application of the Bayesian approach in the field of education and shows that the Bayesian network classifier has the potential to be used as a tool for the prediction of student performance.

Project Management using Data Mining 1

2015

It is a major challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large scale of terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of ambiguity and synonymy. Over the years, it has often been hypothesised that pattern-based methods should perform better than term-based ones in describing user preferences; nonetheless, how to use large-scale patterns effectively remains a hard problem in text mining. To make a breakthrough on this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their s...