Project Data Mining and Project Estimation Top-Down Methodology With TRANSCALE Tool


    PM World Today June 2010 (Vol XII, Issue VI)

    Introduction

    Experience with project data mining over the last 40-50 years has shown that applying primitive methods of statistical analysis in this area gives equally primitive and, most importantly, wrong results. For unknown reasons, it is assumed that the complex and multifaceted process of project data mining can be handled by such a primitive method as the direct application of regression analysis to these data.

    Even prolonged failure in this area has been unable to shake people's faith that their intuition is sufficient to solve the complex quantitative problems of project data mining. Some even claim that applying basic quantitative methods in this area is an end in itself and cannot lead to success.

    Applying the above to project data mining, one can only be amazed at the insistence of leading universities and research centers on using statistical methods for this purpose, despite the fact that, in terms of accuracy, these methods have never paid off in the field of project management over the past 40-50 years.

    Usually, if a scientific methodology systematically fails to work well, people simply stop using it and try to replace it with new, more reasonable solutions. To our surprise, this does not happen in the area of project data mining and the estimation of project parameters. It is time to realize that solving problems in this area requires a shift from the outdated methods of statistical analysis to more scientifically sound methodologies.

    To do this, it is necessary, following the experience of more developed areas of knowledge, to move beyond the simple empiricism that currently dominates this area and to develop sounder scientific methodologies in the field of project management.

    Fortunately, one can cite many instructive examples from other areas of knowledge. Nearly every serious quantitative science has passed this way, so it is not necessary to break new ground in project management. Instead, we must use the experience of those areas of knowledge that are closest in spirit to the problems of project management.

    In this sense, it is important to use the experience of classical thermodynamics, which has gone all the way from primitive empiricism to the current heights of scientific and practical achievement. Experience in other fields of knowledge shows that, by overcoming the limitations of the statistical approach, one can proceed to the development of a genuine mathematical theory of projects.

    On this path, we must first get rid of the so-called statistical curse, in which the results of data processing depend directly on the choice of specific data. In a truly scientific approach this cannot happen: the results of data processing should be stable and invariant with respect to the specific data.

    PM World Today is a free monthly eJournal - Subscriptions available at http://www.pmworldtoday.net


    As an example we can point to Ohm's law, the essence of which is independent of specific data. The same can be said of other laws of a fundamental nature. It is precisely such approaches and theories that must be developed in the field of project management.

    For example, if we study the functional relationship between the total effort of a project and its duration, each new dataset could lead to a new and uncertain result.

    Of course, over 15-20 years one can collect data on, say, 10,000 projects and be confident that 7-8 projects from the past month cannot change the statistical trend derived from data on 10,000 projects. But that does not mean that these stable results are correct or that this approach to data processing is legitimate.

    In reality it is simply self-deception, whether conscious or unconscious. Assume we are dealing with the functional relationship between project effort and its duration.

    The mere fact that the project data were collected over a long period of time makes joint processing of the whole dataset meaningless, because productivity changes during such a long collection period as new methodologies and tools appear.

    On the other hand, if we try to use for analysis purposes only the most recent projects, we will

    inevitably face the problem of non-applicability of statistical methods to small data.

    The persistent application of statistical methods to small project data has by now become a caricature and can be justified only by business considerations. Obviously, such a statistical approach to the interpretation of small project data has nothing to do with the scientific method.

    Project data mining: State of the art

    For the analysis of contemporary methods of project data mining, let us use a database consisting of 56 projects. The database contains information about the complexity of the projects W, their total effort E, their duration T, the average team size N_av, and the team productivity P.

    A multi-parametric flat representation of these data, produced with the TRANSCALE tool [1], is shown in Fig.1. Following the sequence of coordinate axes, let us denote this representation as [N_av, T, E, P]. There are numerous other multi-parametric plane representations of these data, and the TRANSCALE tool enables smooth transitions between them.

    According to contemporary methods of project data mining, these data can be processed by

    statistical methods [2]. As a result of this empirical analysis the functional relationships between

    the parameters of projects can be obtained (Fig.2.1 - Fig. 2.8).
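    As an illustration of what such a single-curve treatment looks like in practice, the sketch below fits one power law y = a * x^b to all data points by least squares in log-log space. The power-law form, the function name fit_power_law and the sample numbers are assumptions made for this example; they are not taken from the 56-project database or from [2].

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b, done as a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx                        # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)    # intercept mapped back to linear scale
    return a, b

# Hypothetical (complexity W, effort E) pairs, invented purely for illustration.
W = [10.0, 20.0, 40.0, 80.0]
E = [5.0, 10.0, 20.0, 40.0]   # constructed so that E = 0.5 * W exactly

a, b = fit_power_law(W, E)
print(f"E ~ {a:.3f} * W^{b:.3f}")
```

    Whatever the data, this procedure returns exactly one pair (a, b), that is, one curve for the whole database.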


    From a qualitative point of view, all the obtained curves appear to behave correctly, that is, the logic of increases and decreases of project parameters is not violated. However, this is not enough to ensure the adequacy and practicality of functional relationships obtained in this way.

    Fig.1 [N_av, T, E, P] presentation of project data in the flat multi-dimensional project space

    In addition, for one of the curves even qualitative adequacy is not ensured: the functional relationship between team productivity P and average team size N_av falls too fast. Other curves also contain qualitative discrepancies, but these cannot be detected with the naked eye.

    An overall analysis of statistical methods for processing project data shows that their accuracy is very low. This can easily be seen by applying the obtained empirical relationships to the assessment of individual projects. This circumstance indicates that the statistical methodology is a dead end for the area of quantitative project management.

    A more detailed analysis shows that the statistical approach to the problems of data mining and project estimation has two main drawbacks. Let us analyze these shortcomings using the statistical curves presented in Fig.2.1 - Fig.2.8.

    1. These curves contain qualitative discrepancies, which simply means that the trends presented in Fig.2 do not reflect the genuine behavior of the functional relationships between project parameters.


    Fig.2 (panels Fig.2.1 - Fig.2.8) Results of the statistical treatment of project data

    2. Even if these curves contained no qualitative discrepancies, they still could not provide high accuracy of the estimates, because the result of the statistical processing of the entire system of data is a single fitting curve.

    According to the most elementary considerations, based on the method of least squares, a single curve is in principle unable to provide adequate accuracy for data mining and project estimation.

    This problem can be solved only by representing the system of data with a family of curves rather than a single curve. Such a family of curves can be constructed on different principles. The most basic and obvious of these is to construct approximating curves from the state equation of projects, each under a different condition of constancy of the values of project parameters.

    Representation of project data in the form of a family of curves

    For precise experimental investigation of phenomena people typically proceed as follows.

    If the phenomenon is described by a large number of parameters, the two whose functional relationship is being investigated remain free, and the other parameters are kept constant. Then the values of one free parameter are changed and the values of the other free parameter are measured. The same procedure is then repeated for other constant values of the remaining parameters. This approach permits the direct application of regression analysis to the data.

    But such an approach is possible only when there is a chance to control the parameters of the

    object under study.

    Unfortunately, experimentation in this classic manner is simply impossible in the area of project management, because it involves huge organizational and financial difficulties.


    Fig.3 Presentation of the functional relationship between project effort E and its complexity W in the form of the family of curves

    Fig.4 Presentation of the functional relationship between team productivity P and the average team size N_av in the form of the family of curves

    For a more detailed discussion of the problem we turn to the state equation of projects [3].


    Fig.5 Presentation of the functional relationship between team productivity P and project effort E in the form of the family of curves

    In the field of project management, where there is no experiment in the classical sense of the word and project data are the result of random collection, there are other ways to overcome such difficulties.

    In particular, the project data can be divided into groups, using the condition of the relative

    constancy of one of project parameters.

    At the systemic level, the state equation of projects combines the parameters of the project and of the development team [3]:

    $N_{av} \cdot T \cdot P = W$, (1)

    and

    $E \cdot P = W$. (2)

    To divide project data into groups, we can order the data by increasing values of team productivity and divide this sequence of projects into groups. As a result, we obtain groups of projects with relatively constant values of productivity. Fig.3 represents the functional relationship between project effort and its complexity for four groups of projects with relatively constant values of productivity. This allows us to replace the functional relationship shown in Fig.2.2 as a single approximating curve with a family of straight lines (Fig.3), which have a higher accuracy of approximation.
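    The grouping procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TRANSCALE implementation: productivity is computed as P = W / E from the state equation, each group is fitted with a straight line E = k * W through the origin, and the single fit used for comparison is also a straight line rather than the curve of Fig.2.2. All function names and project records are hypothetical.

```python
def split_by_productivity(projects, n_groups):
    """Order projects by productivity P = W / E and cut the sequence into groups."""
    ordered = sorted(projects, key=lambda p: p["W"] / p["E"])
    size = -(-len(ordered) // n_groups)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def fit_line_through_origin(group):
    """Least-squares slope k of E = k * W; since E * P = W, k estimates 1 / P."""
    num = sum(p["W"] * p["E"] for p in group)
    den = sum(p["W"] ** 2 for p in group)
    return num / den

def mean_relative_error(group, slope):
    """Average of |E - k * W| / E over the projects in the group."""
    return sum(abs(p["E"] - slope * p["W"]) / p["E"] for p in group) / len(group)

# Hypothetical project records (W = complexity, E = effort), invented for illustration.
projects = [
    {"W": 100.0, "E": 50.0}, {"W": 200.0, "E": 104.0}, {"W": 300.0, "E": 147.0},
    {"W": 100.0, "E": 24.0}, {"W": 200.0, "E": 51.0}, {"W": 300.0, "E": 76.0},
]

# One line for the whole database.
single_slope = fit_line_through_origin(projects)
single_err = mean_relative_error(projects, single_slope)

# A family of lines, one per productivity group.
groups = split_by_productivity(projects, 2)
family = [(fit_line_through_origin(g), g) for g in groups]
family_err = sum(mean_relative_error(g, k) * len(g) for k, g in family) / len(projects)

print(f"single line error: {single_err:.3f}, family of lines error: {family_err:.3f}")
```

    On this toy data the family of lines has a much smaller mean relative error than the single line, which is the effect the text attributes to Fig.3. How much is gained on real data depends on how widely productivity varies across the database.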

    Similarly, Fig.4 presents the functional relationship between team productivity and average team size for relatively constant values of the ratio W/T, which is the throughput.
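    Why throughput is a natural grouping parameter here can be seen from a short derivation that uses only the state equations (1) and (2), assuming P is nonzero:

```latex
\begin{align*}
N_{av}\, T\, P = W = E\, P
  &\quad\Rightarrow\quad E = N_{av}\, T, \\
N_{av}\, T\, P = W
  &\quad\Rightarrow\quad P = \frac{W/T}{N_{av}} .
\end{align*}
```

    Under this reading, for a fixed throughput W/T the productivity P is inversely proportional to N_av, so each band of nearly constant W/T would contribute one hyperbola to the family of curves.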


    Comparing the accuracy of the family of curves shown in Fig.4 with the accuracy of the approximation shown in Fig.2.7, it is easy to see that the accuracy of the family of curves is higher. By decreasing the interval of relative constancy of the ratio W/T, one can achieve greater accuracy of approximation.

    Fig.5 presents another example, which shows the functional relationship between team productivity and project effort as a family of curves corresponding to zones of constant values of project complexity.

    Project data mining and project estimation have a common methodological basis

    From the methodological point of view, project data mining and project estimation are closely linked, because they have a common conceptual framework. Therefore, let us consider the conceptual framework and common sources of information on which both project data mining and the estimation of projects are based.

    At the system level, the project can be represented by the following three main components.

    1. The model of accumulation of the work performed during the execution of the project, or simply the model of the project.

    2. The objectives of the project (development cost, project duration, risk, and other program-level or corporate-level goals and objectives).

    3. Restrictions imposed on the project.

    At a structural level, the presentation of the project with three components is shown in Fig.6.

    Such a presentation of the project can be used for different purposes.

    In particular, it applies both to project data mining and to the planning and execution of projects. In these diverse applications, however, the inputs and outputs differ from each other and have different meanings.

    In the case of project planning, with project complexity W and team productivity P as inputs, it is necessary to find the total effort E required for the project and the distribution of that effort over time, including the planned project duration T and the required number of people N_av.


    Fig. 6 Quantitative presentation of projects with three components

    In the case of project data mining, all parameters of individual projects are known, and it is necessary to solve the problems associated with the classification of projects and to find the functional dependencies between project parameters.

    Information needed for the reconstruction of projects

    For the sake of simplicity, let us first determine which input information is needed for the reconstruction of the average behavior of a project.

    If the only input information used for this purpose is (1) project complexity W and (2) team productivity P, and these data are assumed to be sufficiently reliable, then only the total project effort can be estimated on this basis.

    But for the planning or synthesis of a project we need more than the total amount of effort. In addition, we must have the distribution of this effort over time, which means that we must have the number of working people as a function of time. If finding this function is difficult, we must know at least the average team size N_av.

    But finding the effort distribution over time from information about complexity W and productivity P alone is impossible in principle. This means that the solution to this problem requires additional input.

    To clarify the essence of this additional information, consider the possible different

    implementations of the same project (Fig. 7).


    in turn, will help us to improve the quality and accuracy of estimates and predictions of project

    parameters.

    1. Since the complexity W of the project remains constant, it cannot be the cause of changes in the project parameters N_av, T and E.

    2. Similarly, changes in project parameters have little to do with team productivity P. In particular, analysis of the functional relationship between team productivity P and average team size N_av indicates that productivity is a slowly falling function of the number of people. Moreover, for large values of the average team size the change in productivity is so small that, as a first approximation, team productivity can be considered constant. This means that only a small part of the change in N_av is related to the value of productivity; the change is determined mainly by the change in the project duration T.

    3. If team productivity P is almost constant for larger values of N_av, then the total project effort E will also be almost constant, since W is fixed and E * P = W by equation (2).

    4. In turn, this means that the distribution of project effort over time is associated only with the values of N_av and T, and has almost nothing to do with the values of project complexity W and team productivity P.

    5. From this follows the main conclusion: it is fundamentally impossible to obtain the effort distribution over time having only project complexity W and team productivity P as inputs.

    6. This means that any project estimation system designed to determine project duration and team size must have at least one more input in addition to project complexity and team productivity. Otherwise, estimates of project duration and average team size become an arena of arbitrary decisions (which, by the way, is what is happening now).
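    The argument in this list can be made concrete with a small sketch. It assumes, as in the first approximation above, that productivity P is constant, so that the state equation fixes only the product N_av * T. The function names and the input values are hypothetical.

```python
def total_effort(W, P):
    """Equation (2): E * P = W, so E = W / P."""
    return W / P

def feasible_plans(W, P, team_sizes):
    """Return (N_av, T) pairs that all satisfy N_av * T * P = W for constant P."""
    E = total_effort(W, P)
    return [(n, E / n) for n in team_sizes]

# Hypothetical inputs: complexity in arbitrary work units,
# productivity in work units per person-month.
plans = feasible_plans(W=1200.0, P=10.0, team_sizes=[4, 6, 8, 12])

for n_av, duration in plans:
    print(f"N_av = {n_av:2d} people, T = {duration:.1f} months")
```

    Every pair in the output corresponds to the same W, P and E, so these inputs alone cannot single out one plan. Some further input is needed to choose among them.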

    Project objectives and effort distribution over time

    Analysis shows that to find the distribution of project effort over time we first need information about the goals and objectives of the project and their relative importance in achieving maximum benefits at the level of the whole enterprise.

    Finding the effort distribution over time means defining the duration of the project and, as a minimum, the average size of the development team.

    In turn, these values are directly linked to the objectives of the project, because each project, within its feasibility range, can be performed either in a short time with a large number of people or over a long time with a small number of people.


    On the other hand it is also known that the development cost of projects increases with the

    reduction of project duration. This means that the solution of the problem of effort distribution

    over time is closely related to the trade-off between the duration and cost of the project.

    In turn, this means that the solution of this problem can be reduced to the analysis of the priorities of the project objective functions and the quantitative representation of these priorities.

    A detailed discussion of this problem can be found in [4], where projects are classified in terms of project objectives. As the criterion for the similarity of projects, this classification uses the ratio of project duration to average team size, $R = \frac{T}{N_{av}}$.

    From the standpoint of project data mining, the above analysis means that this criterion should be used to divide the database into groups of similar projects, after which regression techniques can be applied to the separate groups of data.

    In terms of project estimation, this means that a complete description of the project requires, along with the complexity of the project W and productivity P, quantitative information about the project objectives and their priorities.
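    One way to see how a quantified objective closes the estimation problem is to treat the ratio R = T / N_av as the additional input. The sketch below is an illustration under simplifying assumptions, not the method of [4]: productivity P is taken as constant, so E = W / P and E = N_av * T, which together with R give N_av = sqrt(E / R) and T = sqrt(E * R). The function name and the numbers are hypothetical.

```python
import math

def plan_from_objectives(W, P, R):
    """Estimate (E, N_av, T) from complexity W, productivity P and ratio R = T / N_av.

    Assumes constant productivity, so E = W / P and E = N_av * T.
    """
    E = W / P
    N_av = math.sqrt(E / R)   # from E = N_av * T and T = R * N_av
    T = math.sqrt(E * R)
    return E, N_av, T

# Hypothetical inputs; R is in months per person.
E, N_av, T = plan_from_objectives(W=1200.0, P=10.0, R=1.2)
print(f"E = {E:.1f} person-months, N_av = {N_av:.1f} people, T = {T:.1f} months")
```

    A small R corresponds to a schedule-driven project (short duration, large team) and a large R to a project that runs longer with a smaller team. In this sketch the choice of R carries the information about project priorities that W and P do not contain.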

    Missing input information in the modern systems of project data mining and project estimation

    The main result of the above analysis is that modern systems of project data mining and project estimation lack information on the objectives of projects and their priorities.

    This circumstance makes it impossible to obtain accurate functional dependencies between the

    parameters of the project by statistical data processing.

    Further use of these inaccurate functional relationships for the assessment and prediction of new projects results in large errors in the estimation of their parameters, and ultimately in the failure of projects.

    The need to integrate the goals of projects and their priorities into the process of data mining is explained by the fact that each specific value of a project parameter reflects the entire design process, including the direct impact of goals and priorities on that parameter. Accordingly, the processing of data must take the same considerations into account. In particular, it must take into account the considerable impact that project objectives and their priorities have on project duration.

    The same applies to the estimation of projects during planning. Using only project complexity and team productivity as input information in planning systems is not enough for a comprehensive assessment of a project. To estimate the duration of projects and the average team size, input information on the project objectives and their priorities is needed.


    Conclusions

    1. The accuracy of the statistical methods of contemporary quantitative project management is unacceptably low, and these methods are therefore completely unsuitable for the daily needs of industry.

    2. The statistical methodology of project data mining and analysis has a number of fundamental weaknesses. It is unsuitable for small data, since in this case it can generate highly random results that depend strongly on the specific data. It is also unsuitable for large databases, because the collected data are difficult to process jointly owing to their incompatibility with each other.

    3. The Achilles heel of statistical methods of project data mining is the strong instability of their results and the dependence of those results on specific data.

    4. Even if statistical treatment of a large project database yields stable results, they may still be unsuitable for practical applications, since a stable result does not mean a correct result.

    5. Very often the stable results of statistical project data mining do not reflect reality adequately; moreover, they may simply be wrong.

    6. One of the main reasons for the inaccuracy of statistical methods of project data mining is the replacement of the entire system of data by a single approximating curve.

    7. To increase the accuracy of statistical methods of project data mining, it is necessary to cover the system of data points not with a single curve but with families of curves.

    8. For that purpose the system of data points must be divided into groups by applying advanced methods of project similarity analysis.

    9. These methods of project similarity analysis should be based on top-down analysis of the project objectives and their priorities.

    10. The main shortcoming of modern methods of project data mining and project estimation is that they do not take into account the project objectives and their priorities.


    11. Methods of project data mining and project estimation should share the same methodological framework, one that takes into account the objectives of projects and their priorities.

    References

    1. Pavel Barseghyan (2010). Project Nonlinear Scaling and Transformation Methodology and TRANSCALE Tool. PM World Today, May 2010 (Vol XII, Issue V). 16 pages.

    2. S. Oligny, P. Bourque, A. Abran, B. Fournier. Exploring the Relation Between Effort and Duration in Software Engineering Projects. http://www.lrgl.uqam.ca/publications/pdf/536.pdf

    3. Pavel Barseghyan (2009). Principles of Top-Down Quantitative Analysis of Projects. Part 1: State Equation of Projects and Project Change Analysis. PM World Today, May 2009 (Vol XI, Issue V). http://www.pmworldtoday.net/featured_papers/2009/may/Principlesof%20Top-Down-Quantitative-Analysis-of-Projects.html

    4. Pavel Barseghyan (2009). Problems of the Mathematical Theory of Human Work (Principles of mathematical modeling in project management). PM World Today, August 2009 (Vol XI, Issue VIII).


    About the Author

    Pavel Barseghyan, PhD

    Author

    Dr. Pavel Barseghyan is a consultant in the field of quantitative project management, project data mining and organizational science. He is the founder of Systemic PM, LLC, a project management company. He has over 40 years of experience in academia, the electronics industry, the EDA industry, and project management research and tools development. During the period of 1999-2010 he was the Vice President of Research for Numetrics Management Systems. Prior to joining Numetrics, Dr. Barseghyan worked as an R&D manager at Infinite Technology Corp. in Texas. He was also a founder and the president of an EDA start-up company, DAN Technologies, Ltd., that focused on high-level chip design planning and RTL structural floor planning technologies. Before joining ITC, Dr. Barseghyan was head of the Electronic Design and CAD department at the State Engineering University of Armenia, focusing on development of the Theory of Massively Interconnected Systems and its applications to electronic design. During the period of 1975-1990, he was also a member of the University Educational Policy Commission for Electronic Design and CAD Direction in the Higher Education Ministry of the former USSR. Earlier in his career he was a senior researcher in Yerevan Research and Development Institute of Mathematical Machines (Armenia). He is an author of nine monographs and textbooks and more than 100 scientific articles in the area of quantitative project management, mathematical theory of human work, electronic design and EDA methodologies, and tools development. More than 10 Ph.D. degrees have been awarded under his supervision. Dr. Barseghyan holds an MS in Electrical Engineering (1967) and Ph.D. (1972) and Doctor of Technical Sciences (1990) in Computer Engineering from Yerevan Polytechnic Institute (Armenia). Pavel can be contacted at [email protected].
