
Ensuring the Robustness and Reliability of Data-Driven Knowledge Discovery Models in Production and Manufacturing

Shailesh Tripathi*1,2, David Muhr1, Manuel Brunner1, Frank Emmert-Streib2,3, Herbert Jodlbauer1, and Matthias Dehmer1,4,5

1 Production and Operations Management, University of Applied Sciences Upper Austria, Austria

2 Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, FI-33101 Tampere, Finland

3 Institute of Biosciences and Medical Technology, FI-33101 Tampere, Finland
4 Department of Biomedical Computer Science and Mechatronics, UMIT-The Health and Life Science University, 6060 Hall in Tyrol, Austria
5 College of Artificial Intelligence, Nankai University, Tianjin 300350, China

*corresponding author


Abstract

The implementation of robust, stable, and user-centered data analytics and machine learning models is confronted by numerous challenges in production and manufacturing. Therefore, a systematic approach is required to develop, evaluate, and deploy such models. The data-driven knowledge discovery framework provides an orderly partition of the data-mining processes to ensure the practical implementation of data analytics and machine learning models. However, the practical application of robust industry-specific data-driven knowledge discovery models faces multiple data- and model-development-related issues. These issues should be carefully addressed by allowing a flexible, customized, and industry-specific knowledge discovery framework; in our case, this takes the form of the cross-industry standard process for data mining (CRISP-DM). This framework is designed to ensure active cooperation between different phases to adequately address data- and model-related issues. In this paper, we review several extensions of CRISP-DM models and various data-robustness- and model-robustness-related problems in machine learning, areas that currently lack proper cooperation between data experts and business experts because of the limitations of data-driven knowledge discovery models.

Keywords— Machine learning model; Robustness; Industry 4.0; Smart manufacturing; Industrial production; CRISP-DM; Data Analytics Applications

1 Introduction

Since the beginning of Industry 4.0 initiatives, the concept of smart manufacturing has gained considerable attention among researchers from academia and industry. Specifically, data-driven knowledge discovery models are now regarded as an essential pillar of smart manufacturing. The concept of intelligent manufacturing systems was initially discussed in Refs. [34, 33], where the authors examined the prospects of such systems and their complexities and emphasized that systems should be built to be resilient to unforeseen situations and to predict trends in real time for large amounts of data.

In recent years, the idea of smart manufacturing developed further with the framework of multi-agent systems (MASs). MASs are groups of independent agents that cooperate with each other and are capable of perceiving, communicating, reproducing, and working not only toward a common goal but also toward individual objectives. An agent is composed of several modules that enable it to work effectively both individually and collectively. The acting module of a learning agent collects data and information (percepts) from the external world through sensors and responds through effectors, which results in actions. The learning module and critic module of an agent interact with each other to improve the agent's actions and performance standards. Furthermore, the problem-generator module enforces the exploratory efforts of the agent to develop a more appropriate world view [75, 62, 54, 117, 87].

The learning agent is a software program or an algorithm that leads to an optimal solution to a problem. The learning processes may be classified into three categories: (1) supervised, (2) unsupervised, and (3) reinforcement learning. The learning module is a key driver of a learning agent and puts forward a comprehensive automated smart manufacturing process for autonomous functioning, collaboration, cooperation, and decision making. The availability and potential of data from an integrated business environment allow various business objectives to be formulated, such as automated internal functioning, organizational goals, and social and economic benefits. Various business analytics methods, metrics, and machine learning (ML) strategies serve to analyze these business objectives. For instance, Gunther et al. [29] extensively reviewed big data in terms of their importance to social and economic benefits. Furthermore, they highlight three main issues of big data that must be addressed to realize their potential and to match the ground realities of business functioning (i.e., how to transform data into information to achieve higher business goals): work-practice goals, organizational goals, and supra-organizational goals. The availability of big data allows us to apply complex ML models to manufacturing problems that could not be addressed previously. However, big data do not provide valuable insight when the problem stems from insufficient data competencies or unaddressed organizational issues. Ross et al. [85] discussed the big data obsession of business organizations, which often forgoes the straightforward question of whether big data are useful and whether they increase the value of business-related goals. They concluded that many business applications might not need big data to add value to business outcomes but instead need to address other organizational issues and business rules for evidence-based decision making founded on rather small data. Their paper does not discuss manufacturing and production but instead focuses on issues related to business management rules. Nevertheless, a similar understanding applies in many production- and manufacturing-related instances that do not require big production data but instead require sufficient samples of observational or experimental data to support robust data analytics and predictive models.

Kusiak [59] discusses key features of smart manufacturing and considers predictive engineering to be one of the essential pillars of smart manufacturing, and Stanula et al. [98] discuss various guidelines for data selection as per the CRISP-DM framework for understanding business and data in manufacturing. Wuest et al. [122] discuss the various challenges encountered in ML applications in manufacturing, such as data preprocessing, sustainability, selection of the appropriate ML algorithm, interpretation of results, evaluation, complexity, and high dimensionality. The challenges raised by Wuest et al. [122] require a systematic and robust implementation of each phase of the CRISP-DM framework, as per the manufacturing environment. Kusiak [58] discusses five key points that highlight the gaps in innovation and obstruct the advancement of smart manufacturing and emphasizes that academic research must collaborate extensively


to solve complex industrial problems. The second point concerns new and improved processes for data collection, processing, regular summarization, and sharing. The third point is to develop predictive models that predict outcomes in advance. The fourth point deals with developing general predictive models that capture trends and patterns in the data to overcome future challenges, instead of memorizing data by feeding in large amounts. The fifth point is to connect factories and processes so that they work together through open interfaces and universal standards. Kusiak [58] also emphasizes that smart manufacturing systems should adopt new strategies to measure useful parameters in manufacturing processes to match new requirements and standards. The studies discussed above emphasize the development of ML models and their robustness so that ML can effectively meet the new manufacturing challenges. These robustness issues may be attributed to faulty sensors, corrupt data, manipulated data, missing data, or data drift, rather than to the choice of ML algorithm. Here we highlight a set of common reasons that compromise the robustness of ML models in manufacturing.

The robustness of a model is its effectiveness in accurately predicting output when tested on a new data set. In other words, the predictive power of a robust model does not decrease when testing it on a new independent data set (which can be a noisy data set). The underlying assumption used to build such a model is that the training and testing data have the same statistical characteristics. However, in a real-world production scenario, such models perform poorly on adversarial cases when the data are highly perturbed or corrupted [57]. Goodfellow et al. [27] discuss two techniques for improving the robustness of ML models in adversarial cases: adversarial training and defensive distillation. These techniques help smooth the decision surface in adversarial directions. Another approach to robust modeling is model verification [27, 81], which evaluates the upper bound of the testing error of ML models and thereby ensures that the expected error does not exceed the estimated error determined over a wide range of adversarial cases. Model verification is a new field of ML research and certifies that a model does not perform worse in adversarial cases. These approaches are still not widely applied across ML methods; for example, existing studies focus on deep learning models and support vector machines for image analysis.
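To illustrate the underlying idea, the following is a minimal sketch of an FGSM-style adversarial perturbation applied to a plain logistic regression classifier; the synthetic data, the hand-rolled model, and the perturbation budget eps are hypothetical placeholders for illustration, not the setup of Refs. [27, 57].

```python
import numpy as np

# Synthetic binary classification data (placeholder for production data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (X @ w_true + rng.normal(scale=0.5, size=200) > 0).astype(float)

# Fit logistic regression by plain gradient descent.
w = np.zeros(5)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

def accuracy(X, y, w):
    return np.mean(((X @ w) > 0).astype(float) == y)

# FGSM: move each input a small step in the sign of the loss gradient.
# For logistic loss, d(loss)/dx = (p - y) * w for each sample.
eps = 0.5
p = 1 / (1 + np.exp(-(X @ w)))          # probabilities with final weights
grad_x = np.outer(p - y, w)
X_adv = X + eps * np.sign(grad_x)

print("clean accuracy:      ", accuracy(X, y, w))
print("adversarial accuracy:", accuracy(X_adv, y, w))
# Adversarial training would augment the training set with X_adv and refit,
# smoothing the decision surface in these adversarial directions.
```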

In a production and manufacturing context, a robust ML model should not only perform well with training and testing data but should also contribute to effective production planning and control strategies (PPCSs), demand forecasting, optimization of shop-floor operations, and quality-improvement operations. Jodlbauer et al. [44] provided a stochastic analysis to assess the robustness and stability of PPCSs (i.e., the optimal parameterization of PPCS variables against environmental changes such as variation of the demand pattern, machine breakdowns, and several other production-related issues). The robustness of a PPCS is the flexibility to adjust PPCS parameters in a changing external environment without incurring additional cost, and the sensitivity of the PPCS parameters describes its stability. Robust ML models can help extract deep insights from the data of several production processes and address various production and manufacturing goals, such as

– cost-effective production planning;

– improved production quality;

– on-time production;

– long-term sustainability of production processes;

– effective risk management;

– real-time production monitoring;

– energy-efficient production [23], which is another concern for green, environmentally friendly, and sustainable production.

To develop robust ML models, we need to deal with various data and modeling issues, some of which are interrelated. For this reason, we split these problems conceptually into two key points: (I) model selection and deployment-related issues and (II) data-related issues. The model selection and deployment-related issues include

• model accuracy;

• model multiplicity;

• model interpretability;

• model transparency.

Data-related issues affect model robustness and arise at different phases of the CRISP-DM framework; they include

• experimental design and sample size;

• model complexity;

• class imbalance;

• data dimensionality;

• data heteroscedasticity;

• incomplete data in terms of variability;

• unaccounted external effects;

• concept drift (when the data change over time);

• data labeling;


• feature engineering and selection.

In this paper, we address the robustness issues of ML models for data from manufacturing and production within the CRISP-DM framework. Specifically, data-driven approaches for knowledge discovery under the CRISP-DM framework need data analytics and ML models to apply, extract, analyze, and predict useful information from the data for knowledge discovery and higher-level learning for various tasks in production processes.

Note that data acquisition for model building and deployment is not straightforward; it requires active cooperation between domain experts and data scientists to exchange relevant information so that any need arising during data preparation and data modeling can be addressed. Usually, the initial step is the business-understanding phase, which sets business goals, puts forward an execution plan, and assesses the resources. Domain experts and higher-level management mostly set these objectives for the models, whereas data scientists are involved in executing data-related goals and providing relevant data information in terms of scope, quality, and business planning. Different phases of CRISP-DM need active interactions between domain experts and data experts to obtain a realistic implementation of a data-driven knowledge discovery model. By "active interactions," we mean flexible cooperation between data experts and business experts, which allows the whole CRISP-DM process to work dynamically and updates the different phases of CRISP-DM. Such active interactions allow new phases and interdependencies between different phases to develop, thus ensuring a factory-specific data-driven knowledge discovery model. Data-driven discovery models are most likely to encounter problems if active cooperation between groups is lacking. For example, in the data-understanding phase, data experts (i.e., data engineers and data scientists) should engage in active interactions, which helps to comprehensively understand the complexity of the problem for executing data mining and ML projects. Data engineers should ensure data quality, data acquisition, and data preprocessing, whereas data scientists are needed to evaluate the model-building requirements and to select appropriate candidate models for achieving the project goals. Model evaluation and interpretation requires an equal contribution from domain experts and data scientists. Overall, this describes the CRISP-DM framework; see Figure 1. The CRISP-DM framework provides a reasonable division of resources and responsibilities to implement knowledge discovery through data mining and ML. The data-driven nature of CRISP-DM ultimately supports production quality, maintenance, and decision making.

Before discussing the fundamental issues of model robustness, we describe in more detail the CRISP-DM framework and model-robustness measures.

2 Data Mining Process Models: Cross-Industry Standard Process for Data Mining

The CRISP-DM framework was introduced in 1996. Its goal is to provide a systematic and general approach for applying data mining concepts to analyze business operations and gain in-depth insights into business processes [92]. It is a widely accepted framework for industrial data mining and data analytics for data-driven knowledge discovery. The CRISP-DM framework is an endeavor to provide a general framework, independent of any given industry and application, for executing data mining methods that examine different stages of a business [92]. Briefly, CRISP-DM divides data-mining-related knowledge discovery processes into six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Figure 1 shows a schematic overview of the CRISP-DM framework. The arrows in the figure show the dependencies between phases. For example, the modeling phase depends on data preparation and vice versa. When the data preparation phase is complete, the next step is the modeling phase, for which different candidate models with different assumptions about the data are proposed. To meet the data requirements for model building, data experts might step back to the data preparation phase and apply various data-preprocessing methods. Similarly, in another example, the data-acquisition and data-preprocessing steps are done by data experts so that business goals can be supplied with the available data. In such cases, data quality, data size, and expected features based on prior business understanding should be present in the data. Business experts and data experts can agree to obtain the data through a specific experimental design or by acquiring new observational data.

Furthermore, business goals need to be readjusted based on the understanding acquired during the data preparation and data understanding phases. Readjusting the business goals may affect the required time, cost, data quality, and the usefulness of the business plan. Once the data experts and business experts agree on data issues, the data analysis process moves forward to model development and evaluation; these phases likewise require cooperation between the different groups. Overall, CRISP-DM describes a cyclic process, which has recently been highlighted as the emergent feature of data science [19]. In contrast, ML or statistics traditionally focus on a single method for the analysis of data. Thus, the CRISP-DM framework follows a data science approach.

A recent example of such an approach is the data mining methodology for engineering applications (DMME) [42], which is an extension of the CRISP-DM model for engineering applications. This model adds three new phases, namely, technical understanding and technical realization (which sit between business understanding and data understanding) and technical implementation (which sits between the evaluation and deployment phases). The DMME model also draws new interactions between different phases, namely, between evaluation and technical understanding, between data understanding and technical understanding, and between technical understanding and technical realization. Furthermore, this model emphasizes the cooperation between specialized goals for the refinement of the framework.


Figure 1: The CRISP-DM framework describes a cyclic process of a data analysis project, spanning business understanding (business objectives and planning), data understanding (data acquisition, data preprocessing, exploration, descriptive analysis), data preparation, modeling (ML model implementation, feature engineering, feature selection, ensemble learning), model evaluation (model accuracy, interpretability, transparency, and model selection), and deployment.

Figure 2: Classification of CRISP-DM extensions into two classes: industry-specific extensions (e.g., the CRISP-DM extension for stream analytics in healthcare applications [45] and DMME for engineering applications [42]) and general extensions (e.g., CASP-DM, the context-aware standard process for data mining [71], and the CRISP-DM extension for the circular economy [55]).

In light of big data, technical advancements, broader business objectives, and advanced data science models, Grady [28] discussed the need for a new, improved framework. He coined the term "knowledge discovery in data science" (KDDS) and discussed different aspects of KDDS, such as KDDS phases, KDDS process models, and KDDS activities. Grady also emphasized that this approach establishes a new, improved framework, which he calls the cross-industry standard process model for data science (CRISP-DS). Table 1 lists extensions of the CRISP-DM model. These models show that the basic CRISP-DM framework cannot satisfy the variety of data mining requirements from different sectors.

We now highlight two types of CRISP-DM data mining models: industry- or application-specific extensions and concept-based extensions (see Figure 2). The extensions of KDD and CRISP-DM for the circular economy are concept-based evolutions of the CRISP-DM model, whereas DMME and the CRISP-DM extension for stream analytics are application-specific customizations of the basic model. Given the new industrial trends, concept-based evolution has become a generalized extension of CRISP-DM. For example, the circular economy is a growing trend in industry for lifelong services, product upgrades, and product recycling [97] for sustainable and environmentally conscious manufacturing. All business sectors should gradually adopt the circular economy concept. Therefore, a knowledge discovery model must adopt concept-related extensions of the CRISP-DM framework.

To implement a robust and flexible DM framework in a company, one needs to elaborate on the different phases of CRISP-DM and its extended structures because new technologies and emerging trends in production and manufacturing demand advanced and flexible data mining and knowledge discovery models.


Table 1: Extensions of the CRISP-DM process model framework based on industry-specific (application-specific) requirements due to changing business trends.

Model: DMME [42]. Description: adds two new phases between business understanding and data understanding and one phase between model evaluation and deployment. Application: engineering applications.

Model: KDDS with big data [28]. Description: adds various new activities, especially to handle big data. Application: a framework proposed to meet the current needs of big data and data science applications.

Model: CRISP-DM extension for stream analytics [45]. Description: redefines the data preparation and data modeling stages for multidimensional and time-variant data, where the IoT system sends multiple signals over time. Application: healthcare.

Model: CRISP-DM extension in the context of the circular economy [56]. Description: adds a data validation phase and new interactions between different phases. Application: aligning business analytics with circular-economy goals.

Model: Context-aware standard process for data mining (CASP-DM) [72]. Description: the deployment context of a model can differ from the training context; therefore, new activities and tasks for context-aware ML models and model evaluation are added at different phases of CRISP-DM. Application: robust and systematic reuse of data transformations and ML models when the context changes.

Model: APREP-DM [76]. Description: extends the framework to handle outliers, missing data, and data preprocessing at the data-understanding phase. Application: general extension for automated preprocessing of sensor data.

Model: QM-CRISP-DM [90]. Description: extends CRISP-DM for quality management, considering the DMAIC cycle. Application: adds quality-management tools to each phase of the CRISP-DM framework; validated in a real-world electronics production process.

Model: ASUM-DM [43]. Description: an IBM SPSS initiative for the practical implementation of the CRISP-DM framework that combines traditional and agile principles; it implements the existing CRISP-DM phases and adds operational, deployment, and project-management phases. Application: general framework that allows comprehensive, scalable, and product-specific implementation.

Model: A variability-aware design approach for CRISP-DM [111]. Description: extends the structural framework to capture the variability of the data modeling and analysis phases as feature models for more flexible implementation of data process models. Application: general framework that considers model and data variability for improved automation of data analysis.

3 Human-Centered Data Science

The new wave of artificial intelligence (AI) and ML solutions aims to replace numerous human-related tasks and decision-making processes with automated methods. This raises concerns regarding responsible and ethical ML and AI models that do not underestimate human interests and do not incorporate social and cultural biases. The intended objective of AI is to help humans enhance their understanding of larger system-related goals (e.g., societal goals). ML systems have, until now, primarily focused on applying cost-effective automation that helps business organizations maximize profits. Such automation is not part of complex decision-making processes. In the future, however, autonomous systems will take over many crucial decision-making processes. One example of such automation is self-driving cars. Many large car manufacturers have announced the launch of self-driving cars, but the acceptability and use of such vehicles remains a major challenge [39, 110]. Kaur and Rampersad [47] quantitatively analyze the acceptance of technology that addresses human concerns. Their study analyzes the public response to issues such as security, reliability, performance expectancy, trust or safety, and adoption scenarios.


Figure 3: (A) Schematic view of interactions between humans and AI/ML agents: humans bring contextual understanding, trust in machines, ethics awareness, and responsibility for the task undertaken (but also social and cultural biases), whereas ML agents produce rational, data-oriented outcomes subject to data- and model-related biases, are not responsible for prediction results, and may fail on adversarial cases. Their interaction should yield ethical, unbiased data analytics and ML models that empower human decision making, enhance understanding, reduce biases, and assist causal understanding. (B) Schematic view of the post-deployment phase: the deployed model is not left in isolation but is continuously updated through interactions with data experts (improvement of the model development process, model evaluation and recalibration, contextual learning, model reusability), business experts (assisting business understanding, interpreting business objectives), and users (model utility, transparency, performance on adversarial cases, incremental learning, model security, user experience, and user evaluation of the model's response), ensuring human-centered data science.

Similarly, such hypotheses can be discussed in industrial production and manufacturing cases when a self-automated system or a self-functioning machine based on AI or ML systems decides various production and manufacturing tasks.

In these cases, how a system would function in complex scenarios becomes a question that is both ethical and technical. This type of autonomy can affect the complex production process of PPCS estimation, which might, in adversarial cases, influence the robustness and stability of PPCSs, predictive maintenance, and other production processes [22]. Another futuristic question one can ask is, what should be done with an automated system once it is deployed? Should it be left with no human intervention, should it always supersede human understanding, and how can humans and AI systems cooperate effectively?

In industrial production cases, human-centered data science issues can be seen from the following perspectives:

– What is the role of human-centered ML and data science processes in decision making related to work-specific goals, business goals, and societal goals?

– Can production processes be completely automated by AI agents with human-like intelligence, with no involvement of humans?

– The emergence of complexity when a series of tasks is automated and integrated: can an AI or ML system exhibit the higher level of intelligence required for independent and integrated decision making?

– When industrial production processes are completed by a combination of humans and ML agents, how can humans and ML agents interact effectively for integrated analysis and decision making?


The first point mentioned here is still in its infancy and awaits the future integration of complex data mining and ML processes. The future design of data science modeling of complex data and business goals is beyond the scope of this paper. However, a fundamental question that remains relevant is how large-scale automation led by AI and ML models can impact humans. At its core, this question probes how AI can efficiently serve, empower, and facilitate human understanding and decision-making tasks and provide deeper insight into the solutions to these complex problems. This question also leads to the second and third points; namely, should we leave decision making entirely in the hands of AI-ML models? In other words, can a machine decide by itself what is right or wrong for humans? Complete automation would lead to the emergence of many complex aspects of the production process, and dealing with such complexity will be a future challenge of research into production and manufacturing. In such complex cases, the role of humans should be to interact and collaborate with the automated systems. They should use intelligent models for decision making and have the wisdom to override machine decisions in exceptional cases.

The human perspective and machine and statistical learning have different characteristics and capabilities. An ML model makes decisions based on a narrow data-related view with no contextual understanding. AI-ML models cannot be held responsible for their decisions; however, they can produce results rationally and logically based on the data used to train them. Interactions between humans and machines are possible in two ways. The first involves the model developers and ML models: training the models is the responsibility of data experts and business experts, who should train models with a diverse range of data with no bias and no breach of data ethics. The second is between model users and ML models: the results predicted by models empower and assist humans in making decisions on simple to complex tasks. Figure 3(A) shows a schematic view of human-machine interactions. In this figure, we highlight various characteristics of humans and ML models and how effective interactions between humans and ML models complement and empower human decision making and further improve the capacity of models to make fair analyses.

Further extending the human-machine interaction concept, we now discuss the industrial-production framework shown in Figure 3(B), which highlights a CRISP-DM process in the deployment phase of the model. The deployed model should have proven its utility and allow transparency so that users can understand the model's predictions and trust the results. The deployed model interacts with data experts, business experts, and actual users. In an industrial production process, the users are technical operators or machine handlers. These interactions have different meanings for different users. Data experts interact with the model to improve the prediction accuracy and model performance. They provide contextual meaning to the results predicted by the model and train it further for contextual learning. The data experts also explore the reusability of a model for other cases [79]. Business experts should test the results against their understanding of larger business goals and should explain the business context based on model results. Furthermore, the results should allow the model to be reused and adapted to changes in production and manufacturing scenarios.

In production and manufacturing, data-driven knowledge processes must adopt human-centered concerns in data analytics and ML models; the human-machine interaction must not fizzle out after the deployment of the model [2]. The model should remain interactive with its users and be allowed to evolve and update itself. Other characteristics of deployed models are their transparency and their performance in adversarial cases. Model transparency should enable users to access various model features and evaluate the predicted results. Additionally, the human-machine interaction must be aware of the surrounding environment so that the model can adapt to environmental changes, as required for intelligent learning of ML models. Continuous interactions between humans and machines improve the generality of the model through incremental learning and update the model so that it gives a robust and stable response and can adapt to changes.

4 Model Assessment

Model building and evaluation in CRISP-DM are the two phases in which data are (i) searched for patterns and (ii) analyzed. These models are divided into four classes: descriptive, diagnostic, predictive, and prescriptive [5]. The primary goals of these models include pattern recognition, machine-health management, condition-based maintenance, predictive maintenance, production scheduling, life-cycle optimization, and supply-chain management prediction. Alberto et al. [15] provide a comprehensive review of various methods applied in manufacturing and production, with representative studies of descriptive, predictive, and prescriptive models in various industrial sectors, including manufacturing. Vogl et al. [115] review current practices for diagnostic and prognostic analysis in manufacturing and highlight the challenges faced by prognostic health management systems in the current scenario of automated maintenance.

Figure 4(A) shows a schematic view of the division of data analysis problems. These divisions are useful for understanding the nature of the problem and the methods that are applicable to it.

Leek [63] emphasizes that, even if the statistical results are correct, a data analysis can be wrong because the wrong questions were asked. One must be aware of such situations in an industrial framework and select the right approach for the analysis. He further divides data analysis into six categories (see Figure 5). This chart provides business experts and data experts with a basic understanding of data analysis, which helps prevent knowledge discovery from diverging from the real objective.

The business understanding and the subsequent data modeling approaches for solving the types of problem shown in Figure 4(A) require a systematic interaction between data experts and business experts. Such communication is possible only when the data experts and business experts are aware of each other's work and agree with each other. Therefore,


Figure 4: (A) Components of data analytics applications: descriptive analysis (purpose: "what is happening"; methods: data summarization, data clustering, correlation analysis), diagnostic analysis (purpose: "why it is happening"; methods: anomaly detection, statistical hypothesis testing), predictive analysis (purpose: "what will happen"; methods: regression analysis, forecasting methods, machine learning models), and prescriptive analysis (purpose: "what should be done"; methods: problem formulation and optimization using real or simulated data). (B) Model assessment: model accuracy, model selection (multiplicity), model interpretability, and model transparency, with data principles allowing flexible and well-organized communication between data and business experts.


Figure 5: Schematic diagram of the data analysis flow, showing the connections between the individual analysis types (descriptive, exploratory, inferential, predictive, causal, and mechanistic analysis) that constitute the whole project [63].


they need to address the data- and model-related communication in the context of business understanding. Hicks et al. [37] discuss the vocabulary required to convey the accurate meaning of a data analysis. Additionally, they describe six principles of data analysis: data matching, exhaustive, skeptical, second order, transparent, and reproducible. These six principles are applicable to an industrial framework to communicate the results of the data analysis. A clear understanding of these principles helps business experts and data experts develop robust models for data-driven knowledge discovery. Data- and model-related issues are not restricted to understanding business and data; the most crucial part is model assessment, which requires accurate, unambiguous model selection, interpretability, and transparency. Model assessment depends on various issues, namely, model accuracy, model multiplicity, model interpretability, and model transparency (see the schematic diagram in Figure 4B), and various strategies can be applied to obtain the best model assessment. Model assessment requires an agreement between data and business experts, so all four components of model assessment shown in Figure 4B should be addressed properly. The first component is model accuracy; Raschka [82] reviews model evaluation and model selection for various ML methods, and Palacio-Nino and Berzal [78] review the evaluation of unsupervised learning methods. The Akaike information criterion (AIC), Bayesian information criterion (BIC), cross-validation error, and probabilistic model selection are some of the approaches used to select less complex models. Model interpretation is another crucial issue in data-driven knowledge discovery with ML models.
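As a concrete illustration of these selection criteria, the following is a minimal sketch, on synthetic placeholder data, of comparing candidate models of increasing complexity by cross-validated error and a Gaussian AIC; the data-generating process and the degree choices are assumptions for illustration only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical process data: one input, noisy quadratic response.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(80, 1))
y = 1.0 + 0.5 * x[:, 0] - 1.2 * x[:, 0] ** 2 + rng.normal(scale=0.3, size=80)

for degree in (1, 2, 5, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # 5-fold cross-validated MSE: an estimate of out-of-sample error.
    cv_mse = -cross_val_score(model, x, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    # Gaussian AIC on the training fit: n*log(RSS/n) + 2*(k+1),
    # where k counts the polynomial coefficients and +1 the noise variance.
    model.fit(x, y)
    rss = np.sum((y - model.predict(x)) ** 2)
    k = degree + 1
    aic = len(y) * np.log(rss / len(y)) + 2 * (k + 1)
    print(f"degree={degree}  CV-MSE={cv_mse:.3f}  AIC={aic:.1f}")
# Both criteria should favor the degree-2 model: training error alone keeps
# improving with complexity, whereas CV error and AIC penalize overfitting.
```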

In general, monotonically increasing functions (linear and nonlinear), such as additive models, sparse linear models, and decision trees, are regarded as interpretable models, whereas nonlinear and nonmonotonic functions have higher complexity and thus lower interpretability [31, 83]. Doshi-Velez [16] provides a theoretical framework for the interpretability of models by defining three classes for evaluating interpretability. The first class is application-grounded evaluation, where domain experts evaluate the model with real tasks. Next, human-grounded evaluation allows a model to be tested with simplified tasks without a specific end goal. The third class is functionally grounded evaluation, which uses a predefined definition of the interpretability of a model as a proxy and optimizes the candidate models. In an industrial setup, most model evaluations involve functionally grounded evaluation; application- and human-grounded evaluation require more resources and could be expensive and time consuming. Functionally grounded evaluation can be useful for learning if the selection of proxy models is based on factors that are relevant to real-world applications.

Model-agnostic interpretability is another approach for interpreting ML models. In this approach, a model is treated as a black box and analyzed by feeding it perturbed input data and analyzing the respective output [83]. The model-agnostic approach allows a wide range of black-box ML models to be used as interpretable models for predictive analysis. Local interpretable model-agnostic explanations (LIME) [84] explain a model locally by constructing a local classifier from a regularized linear model, using simulated or perturbed input data, to explain a single instance predicted by the black-box ML model.
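The following is a minimal sketch of this local-surrogate idea, assuming a random forest as the black box and a proximity-weighted ridge regression as the surrogate; the data, the Gaussian proximity kernel, and the perturbation scale are illustrative assumptions rather than the exact LIME procedure of Ref. [84].

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Synthetic data: class depends mostly on features 0 and 2.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - 2 * X[:, 2] > 0).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x0 = X[0]                                        # instance to explain
Z = x0 + rng.normal(scale=0.3, size=(500, 4))    # local perturbations
pz = black_box.predict_proba(Z)[:, 1]            # black-box outputs
weights = np.exp(-np.sum((Z - x0) ** 2, axis=1)) # proximity kernel

# Weighted linear surrogate: its coefficients explain the local behavior.
surrogate = Ridge(alpha=1.0)
surrogate.fit(Z, pz, sample_weight=weights)
print("local feature effects:", surrogate.coef_.round(3))
# Large |coefficient| flags the features that drive this single prediction.
```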

Another issue with model assessment is the multiplicity of models; such cases are known as the "Rashomon" effect [6], whereby multiple predictive models predict the same target variable with competing accuracy. In such cases, one should not draw conclusions about the validity of a prediction in terms of its explanatory variables from one or a few models until the hypothesis of model multiplicity is invalidated. In industrial scenarios, the Rashomon effect can occur when we have multiple models with competing accuracy; one can then choose a narrative to interpret the results based on the selected model. Such interpretations can differ between models and would impact the business objective or business understanding. Fisher et al. [21] propose model class reliance, which measures the importance of a set of variables across the class of models that show high predictive accuracy. Marx et al. [73] propose ambiguity and discrepancy measures to evaluate the multiplicity problem in classification models. In this approach, models are selected from a set of "good" models so as to maximize the discrepancy with respect to a proposed baseline model. To select a model that combines simplicity and accuracy, Semenova and Rudin [91] propose the Rashomon ratio, which is the ratio of competing models in the hypothesis space.
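To make the multiplicity issue tangible, here is a minimal sketch that trains several off-the-shelf classifiers of comparable accuracy and measures how often they disagree on individual predictions, a simple proxy for the ambiguity notion above; the data set and model choices are illustrative assumptions, not the measures of Refs. [73, 91].

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Synthetic classification task standing in for a production data set.
X, y = make_classification(n_samples=600, n_features=10, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(random_state=0),
          GradientBoostingClassifier(random_state=0)]
preds = []
for m in models:
    m.fit(X_tr, y_tr)
    p = m.predict(X_te)
    preds.append(p)
    print(type(m).__name__, "accuracy:", np.mean(p == y_te))

# Ambiguity proxy: fraction of test points on which the "good" models
# disagree, even though their overall accuracies are nearly identical.
preds = np.array(preds)
ambiguity = np.mean(preds.min(axis=0) != preds.max(axis=0))
print("fraction of conflicting predictions:", ambiguity)
```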

In a business scenario where manufacturing becomes increasingly complex and integrated, data-driven models are useful for making informed decisions and automating functions, which allows for efficient and cost-effective production. Even for a model with high accuracy, user trust is always a crucial consideration for deployment. In such cases, transparency is always a matter of concern and should be addressed with simple explanations. Trust issues arise because of data-source uncertainty, the performance of deployed models, and user belief in the model output. This uncertainty can seep into the whole process of data mining, from data preprocessing to model building, and thus impacts the entire business-intelligence process. Sacha et al. [88] provide a framework to address human-machine trust issues caused by uncertainties propagated in data processing, model building, and visual analytics. The proposed framework is divided into two parts. The first part concerns system-related uncertainty and emphasizes handling machine uncertainty by quantifying uncertainty in data processing and model building, aggregating the total uncertainty at different phases of data analysis, and interactively exploring uncertainty through visual and other means. The second part involves human factors and emphasizes transparency, which allows experts and users to review system functions and explore uncertainty, thus building awareness of the emerging uncertainty; it additionally involves tracking human behavior, which is useful for estimating human bias and how it impacts human-machine interactions.

Data transparency and model transparency should ensure that the entire process is explainable to business experts and users. The process should also be secure from external attacks, which allows models to be deployed for making business decisions, for user control, and for social acceptance of intelligent systems [96, 119]. Weller [119] discusses eight types of transparency for developers, business experts, and users to help them understand the functioning of existing systems.


Transparency allows a large and complex system to be robust, resilient, and self-correcting, and ultimately ensures steady and flexible models.

5 Data-related robustness issues of machine-learning models

A combination of data understanding, business understanding, and data preparation are the steps that contribute to model building. A comprehensive CRISP-DM setup should address various business-related issues and ultimately contribute to business understanding. Data modeling and evaluation are the central units of a CRISP-DM framework. In the following, we discuss data issues for robust data analytics and ML models. The shortcomings of data, such as sample size, data dimensionality, data processing, data redundancy, feature engineering, and several other data problems, should be addressed with an appropriate strategy for robust ML models. Several data-related issues are interdependent across the data understanding, data preparation, and modeling phases of the CRISP-DM framework. In such cases, any data-related issues reflect on the model evaluation and deployment phases in terms of underperformance and biased predictions of real-world problems.

5.1 Experimental design and sample size

In data-driven process optimization, the data may be observational or experimental. Experimental data are used to test the underlying hypothesis, to build understanding, and to optimize the response variable by controlling or modifying various combinations of input parameters. In manufacturing, experimental designs mostly focus on optimizing various parameters for quality control. The experimental design is a key criterion for determining whether a model captures the right relationships in ML-based analysis and manufacturing optimization. The production process is controlled by setting various process variables, which are monitored for changes during production. In an experimental design, we monitor certain process variables while controlling other variables to understand how they affect product quality. The common one-factor-at-a-time approach involves repeating experiments with the factors changed between each experiment until the effect of all factors is recorded. However, this one-factor approach is time consuming and expensive. For a robust experimental design, Taguchi proposed a methodology that uses orthogonal arrays, signal-to-noise ratios, and the appropriate combination of factors to study the entire parameter space through a limited number of experiments [104, 103, 109, 12]. An optimal experimental design is a useful strategy that stands between business and data understanding. This approach also benefits model robustness in the evaluation phase of the CRISP-DM framework. The active interaction between business understanding and data understanding serves to solve the optimal sample-size issue and prevent an underpowered analysis.
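As a small illustration of the Taguchi-style analysis, the following sketch computes "larger-is-better" signal-to-noise ratios for a hypothetical L4 orthogonal array with three two-level factors; the array, the replicated responses, and the quality characteristic are invented for illustration.

```python
import numpy as np

# L4(2^3) orthogonal array: factor levels (A, B, C) for each of four runs.
runs = np.array([[1, 1, 1],
                 [1, 2, 2],
                 [2, 1, 2],
                 [2, 2, 1]])
# Two replicated quality responses per run (illustrative values).
y = np.array([[12.1, 11.8],
              [14.0, 13.6],
              [ 9.9, 10.4],
              [11.2, 11.5]])

# "Larger-is-better" S/N ratio per run: -10 * log10(mean(1 / y^2)).
sn = -10 * np.log10(np.mean(1.0 / y ** 2, axis=1))

# Average S/N per factor level; pick the level with the highest mean S/N.
for factor in range(3):
    for level in (1, 2):
        mean_sn = sn[runs[:, factor] == level].mean()
        print(f"factor {'ABC'[factor]} level {level}: mean S/N = {mean_sn:.2f}")
```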

5.2 Model complexity

The complexity of a predictive model is reflected in its predictions on testing data. An overly complex model can overfit, whereas an overly simple model leads to prediction bias; this property of predictive models is known as the "bias-variance trade-off." Model complexity can lead to robustness issues, such as a large testing error or the wrong interpretation of model parameters. In some cases, if future data contain a lower signal-to-noise ratio, a large prediction error may occur. This problem can arise from a large number of correlated features, from feature engineering that creates a large number of redundant, highly correlated features, or from fitting a high-degree polynomial model to the data set.

The Vapnik-Chervonenkis (VC) dimension [113] is a quantitative measure of model complexity and is used to estimate the testing error in terms of model complexity. A model is selected based on the best performance on testing data: as the complexity increases, the optimized training error decreases, whereas the expected testing error first decreases and then increases again. The training error and VC dimension are used to calculate an upper bound on the testing error. This method, called "structural risk minimization," selects the model that minimizes this upper bound on the risk. Regularization [12, 107, 125, 20] and feature-selection methods (both wrapper- and filter-based) [51, 69] are useful variable-selection strategies for keeping model complexity in check by balancing bias and variance. Hoffmann et al. [38] propose a sparse partial robust M regression method, a useful approach for removing variables from regression models that appear important only because of outlier behavior in the data. Model complexity affects the modeling and model evaluation phases of the CRISP-DM framework, including model description, robustness, and parameter settings.
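The following is a minimal sketch of keeping an intentionally over-complex model in check with ridge regularization, selecting the penalty strength by cross-validation; the synthetic data, the degree-9 polynomial, and the alpha grid are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Synthetic data: smooth signal plus noise, far fewer samples than a
# degree-9 polynomial comfortably supports.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.2, size=60)

for alpha in (1e-4, 1e-2, 1.0, 100.0):
    model = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=alpha))
    mse = -cross_val_score(model, x, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"alpha={alpha:<8} CV-MSE={mse:.3f}")
# Very small alpha overfits (high variance); very large alpha underfits
# (high bias); an intermediate value balances the bias-variance trade-off.
```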

5.3 Class imbalance

In production and manufacturing data, class imbalance is a common phenomenon. One example is vibration data from the cold testing of engines: all manufactured engines go through cold testing, where technical experts try to identify different errors by using strict threshold criteria and label any errors found with a particular error label based on their technical knowledge. The proportion of engines with errors relative to the number of good engines is very low. Consequently, implementing a multiclass classification model that uses previous data to automatically identify the error class of an engine with high accuracy is difficult. Skewed sample distributions are a common problem in industrial production-related quality control, and imbalanced data lead to models whose predictions are inclined toward the major class; in this example, the major class is "good engines." If a model classifies a faulty engine as a good engine, such false positives would severely affect the product's reputation and that of quality control. Undersampling of major classes and oversampling of minor classes form the basis of the solutions commonly proposed for this problem. References [105, 17, 126] discuss fault detection models with imbalanced training datasets in the industrial and manufacturing sector. Resampling methods (oversampling, undersampling, and hybrid sampling) [13, 32, 77, 102, 10], feature selection and extraction, cost-sensitive learning, and ensemble methods [30, 55] are other approaches for dealing with the class-imbalance problem. The use of evaluation measures that account for undersampled classes is also recommended, such as the probabilistic thresholding model [100], the adjusted F measure [70], the Matthews correlation coefficient [74], and the AUC/ROC. Class imbalance is thus critical to the model building and evaluation phases of the CRISP-DM framework.
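As a small illustration, the following sketch builds a heavily imbalanced toy classification problem and shows how plain accuracy hides the failure on the minority class while the Matthews correlation coefficient exposes it; class weighting, one simple cost-sensitive remedy, is compared. The 99:1 ratio and the model choice are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef, accuracy_score

# Toy "good vs. faulty engine" data: 99% majority class, 1% minority class.
X, y = make_classification(n_samples=5000, weights=[0.99], flip_y=0,
                           random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=5)

for weight in (None, "balanced"):
    clf = LogisticRegression(class_weight=weight, max_iter=1000)
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    # Accuracy stays high either way; MCC reveals minority-class performance.
    print(f"class_weight={weight}: accuracy={accuracy_score(y_te, pred):.3f}"
          f"  MCC={matthews_corrcoef(y_te, pred):.3f}")
```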

5.4 Data dimensionality

In ML and data mining, high-dimensional data contain a large number of features (dimensions), so the number of samples required grows exponentially; a high-dimensional dataset is therefore sparse (p ≫ n). In modern manufacturing, the complexity of the manufacturing processes is continuously increasing, which also increases the dimensionality of the process variables that need to be monitored. Wuest et al. [122] discuss manufacturing efficiency and product quality in light of the complexity and high dimensionality of manufacturing programs. High-dimensional data are a challenge for model robustness, outlier detection [128], runtime, and algorithmic complexity in ML. Such data require changes in existing methods or the development of new methods suited to sparse data. One approach is to use a dimensionality-reduction strategy such as principal component analysis [101] and to use the principal components in ML methods. Such a strategy requires careful examination because information is lost when data are projected onto lower dimensions. Another strategy is feature selection, whereby the most important feature variables are retained and redundant features are removed from the model. Regularization [107], tree-based models [40, 48, 4], mutual information [114, 3], and clustering are other approaches for feature selection.
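The following is a minimal sketch of the PCA route: projecting correlated, high-dimensional sensor readings onto the few components that retain most of the variance before modeling. The synthetic 50-sensor data and the 95% variance threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic sensor matrix: 50 observed channels driven by 3 latent factors.
rng = np.random.default_rng(6)
latent = rng.normal(size=(100, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(100, 50))

# Keep just enough components to explain 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)
print("reduced shape:", X_reduced.shape)
print("explained variance ratios:", pca.explained_variance_ratio_.round(3))
# X_reduced can feed a downstream ML model; always inspect the discarded
# variance before committing to the projection.
```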

5.5 Data heteroscedasticity

In regression problems, the underlying assumption for the error term in the response data is that it has constant variance (homoscedasticity). Although this assumption usefully simplifies models, real-world data often violate it and are thus heteroscedastic. Predictions based on a simple linear model of such data would still be consistent, but the estimates of the standard errors of the coefficient parameters become unreliable, leading to biased test statistics and confidence intervals. Heteroscedasticity can be caused directly or indirectly by effects such as changes in the scale of the observed data, structural shifts in the data and outliers, or the omission of explanatory variables. Heteroscedasticity can be parametric, which means that it can be described as a function of the explanatory variables [120], or unspecified, meaning that it cannot be described by the explanatory variables. Examples in manufacturing relate to predictive maintenance (i.e., predicting the remaining useful life of a machine based on vibration data), quality control, and optimization [61, 106]. Heteroscedasticity in regression models is identified by analyzing residual errors; residual plots and several statistical tests may be used to test for heteroscedasticity in data [26, 120, 7, 71]. Other studies and methods to model the variance of the response include weighted least squares [120], the heteroscedastic kernel [11], heteroscedastic Gaussian process regression [49, 53], heteroscedastic Bayesian additive regression trees (BART) [80], and heteroscedastic support vector machine regression [41].
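For a concrete picture, here is a minimal sketch that detects heteroscedasticity with the Breusch-Pagan test and then refits with weighted least squares; the data (noise standard deviation proportional to x) and the weight choice are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Synthetic data whose noise variance grows with x (heteroscedastic).
rng = np.random.default_rng(7)
x = rng.uniform(1, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=0.2 * x)
X = sm.add_constant(x)

# Ordinary least squares, then test the residuals for heteroscedasticity.
ols = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, _, _ = het_breuschpagan(ols.resid, X)
print(f"Breusch-Pagan p-value: {lm_pvalue:.4f}")  # small => heteroscedastic

# Weighted least squares with weights ~ 1/variance; here 1/x^2, because the
# noise standard deviation was made proportional to x.
wls = sm.WLS(y, X, weights=1.0 / x ** 2).fit()
print("OLS standard errors:", ols.bse.round(4))
print("WLS standard errors:", wls.bse.round(4))
```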

5.6 Incomplete and missing data

Data quality is an essential part of data-driven process optimization, process quality improvement, and smart decision making in production planning and control. In the manufacturing process, data come from various sources and are heterogeneous in terms of variety and dimensionality. Data quality is thus one of the most important issues for the implementation of robust ML methods in process optimization. Data quality may suffer for various reasons, such as missing values, outliers, incomplete data, inconsistent data, noisy data, and unaccounted data [60]. The typical ways to deal with missing data are to delete the affected records, to replace missing values with the average or most frequent value, or to apply multiple imputation [89] or expectation maximization [1]. Another approach is to consider the probability distribution of the missing values in statistical modeling [86, 65]; such missingness is of three types: (1) missing completely at random, (2) missing at random, and (3) missing not at random. Joint modeling and fully conditional specification are two approaches to multivariate data imputation [8, 112]. Recent studies on imputing missing data in production and manufacturing and on optimizing production quality have considered classification and regression trees to impute manufacturing data [121], modified fully conditional specifications [108], and multiple prediction models of a missing data set to estimate product quality [46]. Loukopoulos et al. [66] studied the application of different imputation methods to industrial centrifugal compressor data and compared several methods; their results suggest multiple imputation with self-organizing maps and k nearest neighbors as the best approach to follow. Incomplete and missing data have repercussions for business and data understanding, exploratory analysis, variable selection, model building, and evaluation. A careful strategy involving imputation or deletion should be adopted for missing data.
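The sketch below contrasts naive mean imputation with a model-based iterative imputer on a toy sensor matrix with entries missing completely at random; the data, the 20% missingness rate, and the imputer choices are illustrative assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

# Toy sensor matrix with one column correlated with the others.
rng = np.random.default_rng(8)
X_full = rng.normal(size=(200, 4))
X_full[:, 3] = X_full[:, 0] + 0.5 * X_full[:, 1]

# Knock out 20% of the entries completely at random.
X = X_full.copy()
mask = rng.random(X.shape) < 0.2
X[mask] = np.nan

for imputer in (SimpleImputer(strategy="mean"),
                IterativeImputer(random_state=0)):
    X_imp = imputer.fit_transform(X)
    rmse = np.sqrt(np.mean((X_imp[mask] - X_full[mask]) ** 2))
    print(f"{type(imputer).__name__}: RMSE on missing entries = {rmse:.3f}")
# The iterative imputer exploits the correlation structure and typically
# recovers the correlated column far better than a plain column mean.
```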


5.7 External effects

External effects are important factors in data-driven process optimization. In a big data framework, external effects might be unnoticed, incorrectly recorded, or not studied in detail (i.e., unknown). External effects can involve environmental factors (e.g., temperature, humidity), human resources, machine health, running time, material quality variation, and other factors that are dealt with subjectively but significantly affect production and data quality. For example, consider an ML model to predict product quality where the data (process parameters) are recorded by sensors from several machines. In the injection molding process, the process parameters alone cannot estimate product quality because quality is significantly affected by temperature, humidity, machine health, and material type, and these external factors must be considered to optimize quality. Similarly, in the sintering process [99], where product quality is predicted in terms of its shape accuracy, the data from various experiments show considerable variation in different process parameters from unknown and unaccounted sources. In such cases, the prediction can be biased or erroneous with high variance. A careful examination involving exploratory analysis is required to detect such effects. Gao et al. [25] studied a hierarchical analytic process to produce a comprehensive quality evaluation system for large piston compressors that considers external and intrinsic effects.

5.8 Concept drift

Changes in data over time arise from various factors such as external effects, environmental changes, technological advances, and other unknown real-world phenomena. ML models are generally trained on static data and thus do not account for the dynamic nature of the data, such as changes over time in the underlying distribution; in such cases, the performance of these models deteriorates over time. Robust ML modeling requires identifying changes in data over time, separating noise from concept drift, adapting to changes, and updating the model. Webb et al. [118] provide a hierarchical classification of concept drift that includes five main categories: drift subject, drift frequency, drift transition, drift recurrence, and drift magnitude. They argue that, rather than defining concept drift qualitatively, a quantitative description is required to understand the problem and to detect and address time-dependent distributions. These quantitative measures (cycle duration, drift rate, magnitude, frequency, duration, and path length) are required for a detailed understanding of concept drift. Such an understanding allows modeling by new ML methods that are robust against various types of drift because they can update themselves by active online learning and thereby detect drifts. Lu et al. [68] provide a detailed review of current research, methods, and techniques to deal with problems related to concept drift. One key point in that paper is that most methods identify “when” a drift happens in the data but cannot answer “where” or “how” it happened. They further emphasize that research into concept drift should consider the following themes:

(1) identifying drift severity and regions to better adapt to concept drift;

(2) techniques for unsupervised and semisupervised drift detection and adaptation;

(3) a framework for selecting real-world data to evaluate ML methods that handle concept drift;

(4) integrating ML methods with concept-drift methodologies (Gama et al. [24] discuss various evaluation and assessment practices for concept-drift methods and strategies to rapidly and efficiently detect concept drift).

Lin et al. [64] present an ensemble learning algorithm of offline classifiers for condition-based monitoring with concept drift and imbalanced data. Yang et al. [124] discuss a novel online sequential extreme learning model to detect various types of drift in data. Finally, Wang and Abraham [36] propose a concept-drift detection framework known as “linear four rates” that is applicable to batch and streaming data and also deals effectively with imbalanced datasets.
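For illustration only (this is not one of the cited detectors), a simple window-based check can flag “when” the distribution of a data stream changes: a two-sample Kolmogorov-Smirnov test compares a reference window with the most recent window. The Python sketch below uses synthetic data; the function name and thresholds are our own choices.

import numpy as np
from scipy.stats import ks_2samp

def drift_alarm(stream, window=200, alpha=0.01):
    """Yield stream indices where the recent window differs from the reference."""
    reference = np.asarray(stream[:window])
    for end in range(2 * window, len(stream), window):
        recent = np.asarray(stream[end - window:end])
        _, p_value = ks_2samp(reference, recent)
        if p_value < alpha:
            yield end           # drift suspected at this point in the stream
            reference = recent  # adapt: the new regime becomes the reference

rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0, 1, 1000), rng.normal(1.5, 1, 1000)])
print(list(drift_alarm(stream)))  # alarms shortly after the shift at t = 1000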

In the manufacturing environment, concept drift is an expected phenomenon. In discussions with technical experts, data drift should be clearly explained and understood as part of data and business understanding, to ensure that future models are flexible and adaptable to changes caused by drift.

5.9 Data labeling

Data annotations (i.e., labels) as output allow ML methods to learn functions from the data. Specifically, for classification problems, input data are modeled through class labels using supervised ML models. If the labeling is noisy or not appropriately assigned, the predictive performance of the classification model degrades severely. Therefore, for supervised ML models, data labeling or annotation is a nontrivial issue for data quality, model building, and evaluation. Labeling can be done manually by crowdsourcing or by using semisupervised learning, transfer learning, active learning, or probabilistic program induction [127]. Sheng et al. [93] discuss repeated-labeling techniques for noisy labels. In a typical manufacturing environment, labeling is done by operators or technical experts; in some cases, the same type of data is annotated by multiple operators who work in different shifts or at different sites. Sometimes the operators do not follow a strict protocol and label intuitively based on their experience. In such cases, data quality can suffer immensely, and inadequate learning from data may occur when only few samples are available for a specific problem and the labels are noisy.

Data labeling is part of the data understanding phase and should follow a clear and well-defined framework based on the technical and statistical understanding of the manufacturing and production processes and their outcomes. Many manufacturing and production tasks are managed by machine operators, technical experts, engineers, and domain experts who use measurements and experience to optimize the quality of the output. Because these experts are well aware of the existing situation, they should be involved in labeling the data to produce robust ML models and maintain data quality.
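As a small illustration of repeated labeling in the spirit of Sheng et al. [93], the following Python sketch (the part identifiers and shift-wise operator labels are hypothetical) aggregates multiple operator annotations by majority vote and flags low-agreement samples for expert review.

from collections import Counter

def aggregate_labels(votes):
    """votes: list of labels for one sample, e.g. ['ok', 'ok', 'scrap']."""
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    agreement = n / len(votes)  # 1.0 means all operators agree
    return label, agreement

# Hypothetical shift-wise annotations of three molded parts:
samples = {"part-01": ["ok", "ok", "ok"],
           "part-02": ["ok", "scrap", "ok"],
           "part-03": ["scrap", "ok", "scrap"]}

for part, votes in samples.items():
    label, agreement = aggregate_labels(votes)
    flag = "  <- review" if agreement < 0.75 else ""
    print(f"{part}: {label} (agreement {agreement:.2f}){flag}")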

5.10 Feature engineering

The deep learning framework provides a promising approach for automated smart manufacturing. Deep learning algorithms undertake automatic feature extraction and selection at higher levels of the networks, and they are used for numerous important tasks in manufacturing, such as fault detection [67] and predicting machine health [14]. Wang et al. [116] comprehensively reviewed deep learning methods for smart manufacturing. Importantly, deep learning models may suffer from the same issues as discussed in the previous sections. Data complexity, model and time complexity, model interpretation, and the requirement for large amounts of data mean that they may not suit every industrial production problem. Therefore, the demand in industrial production for traditional ML algorithms using feature-selection methods remains strong.

In ML problems, a feature is an attribute used in supervised or unsupervised ML models as an explanatory or input variable. Features are constructed by applying mathematical functions to raw attributes (i.e., input data) based on domain knowledge. As a result, feature engineering and extraction lead to new characteristics of the original data set (i.e., they construct a new feature space from the original data descriptors). Features can be created by linear or nonlinear methods or by techniques such as independent component analysis. The main objectives of feature-extraction techniques are to remove redundant data, create interpretable models, improve prediction accuracy, and create generalized features that are, for example, classifier independent. In general, feature learning enhances the efficiency of regression and classification models used for predictive maintenance and product quality classification. In the manufacturing industry, feature engineering and extraction is common practice for classifying engine quality, machine health, and similar problems using vibration data. For instance, fast Fourier transform and power-spectral-density statistical features are standard feature-extraction methods for vibration data [9].
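The sketch below (Python with NumPy/SciPy, on a simulated signal; the sampling rate and the particular feature set are illustrative choices, not a prescription from [9]) computes a few standard time- and frequency-domain vibration features, including a Welch power spectral density.

import numpy as np
from scipy.signal import welch

fs = 10_000  # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 120 * t) + 0.5 * np.random.default_rng(2).normal(size=t.size)

def vibration_features(x, fs):
    f, psd = welch(x, fs=fs, nperseg=1024)  # frequency-domain representation
    rms = np.sqrt(np.mean(x ** 2))
    return {
        "rms": rms,                              # time-domain energy
        "peak": np.max(np.abs(x)),
        "crest_factor": np.max(np.abs(x)) / rms,
        "kurtosis": np.mean((x - x.mean()) ** 4) / np.var(x) ** 2,
        "dominant_freq_hz": f[np.argmax(psd)],   # e.g., a fault frequency
    }

print(vibration_features(signal, fs))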

Sondhi [95] discusses various feature-extraction methods based on decision trees, genetic programming, inductive logic programming, and annotation-based methods. Interestingly, genetic programming in combination with symbolic regression searches for a mathematical equation that best approximates a target variable. Given its simple and understandable structure, this approach has been useful for many industrial applications of modeling and feature extraction [99, 123, 52]. However, training with a large number of variables, model overfitting, and the time complexity of the model are problematic for this approach. Caesarendra and Tjahjowidodo [9] discuss five feature categories for vibration data:

1. phase-space dissimilarity measurement;

2. complexity measurement;

3. time-frequency representation;

4. time-domain feature extraction;

5. frequency-domain feature extraction.

These feature-engineering methods are useful to summarize various characteristics of vibration data and to reduce the size of high-dimensional data. Mechanical vibration is a useful indicator of machine functioning, and features of vibration data are often used to train ML models for fault detection, predictive maintenance, and various other diagnostic analyses. He et al. [35] discuss a statistical pattern analysis framework for fault detection and diagnosis in batch and process data. In this approach, instead of process variables, sample-wise and variable-wise statistical features that quantify process characteristics are extracted and used for process monitoring. This approach is useful for real-world dynamic, nonlinear, and non-Gaussian process data, which are transformed into multivariate Gaussian-distributed features related to process characteristics and then used for multivariate statistical analysis. Ko et al. [50] discuss various feature-extraction and feature-engineering techniques for anomaly detection and binary classification in different industries.

Recent results for deep learning networks and support vector machines demonstrate that feature selection can negatively affect the prediction performance for high-dimensional genomic data [94]. However, whether these results translate to data from other domains remains to be seen. Until then, feature selection remains a standard method of data preprocessing.

In general, within the CRISP-DM framework, feature engineering is essential for real-world model building. Real-world data features are not independent: they are correlated with each other, which hints at interactions between the underlying manufacturing processes. Many of the higher-level effects related to maintenance and product quality are combinations of lower-level effects of several processes and are reflected in the data. Therefore, approaches that use feature engineering, feature extraction, and feature selection are useful for building models that predict or test for such higher-level effects.
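As a simple illustration of ranking dependent candidate features (this is not a method from the cited works), the Python sketch below scores synthetic features by mutual information with a quality target; the variable names are hypothetical.

import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(3)
n = 500
temp = rng.normal(200, 10, n)       # hypothetical process parameters
pressure = rng.normal(50, 5, n)
noise = rng.normal(0, 1, n)         # an irrelevant attribute
quality = 0.05 * temp + 0.02 * pressure ** 2 + rng.normal(0, 1, n)

X = np.column_stack([temp, pressure, noise])
scores = mutual_info_regression(X, quality, random_state=0)
for name, s in zip(["temp", "pressure", "noise"], scores):
    print(f"{name}: {s:.3f}")       # the noise feature should score near zero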

6 Conclusion

This paper addresses some of the key issues affecting the performance, implementation, and use of data mining and ML models. Specifically, we discuss existing methods and future directions of model building and evaluation in the area of smart manufacturing. In this context, CRISP-DM is a general framework that provides a structured approach for data mining and ML applications. Its goal is to provide comprehensive data analytics and ML toolboxes to address the data-analysis challenges in manufacturing and, ultimately, to help establish standards for data-driven process models that can be implemented to improve production quality, predictive maintenance, error detection, and other problems in industrial production and manufacturing.

The CRISP-DM framework describes a cyclical flow of the entire data mining process with several feedback loops. However, in many realistic data mining and modeling cases, it is an incomplete framework that requires additional information and active interactions between its different phases. The demand for additional or new information can arise from real-world production complexity (e.g., data-related or industry-specific applications, other organizational and manufacturing issues, or newly emerging business trends).

This review discusses data- and model-related issues for building robust models that are successively linked to business understanding. One of the challenges for robust model building is the active interaction between different phases of the CRISP-DM framework. For instance, business understanding and data understanding go hand in hand: we learn the business objectives (business understanding) and use the relevant data (data understanding), so the data should be explored appropriately (data preparation) with various statistical and analytical tools to enhance our business understanding from a realistic viewpoint. These three phases require active cooperation between data experts and business experts so that any gap or misinformation can be quickly corrected. Similarly, each phase in the CRISP-DM framework can seek interactions through a feedback loop to clarify the process, and such interactions can be temporary or permanent extensions of the model. Thus, the active interactions between the different phases of the CRISP-DM framework should be managed systematically and efficiently so as to improve the whole process and make it easy to customize and update.

Overall, CRISP-DM describes a cyclic process that is similar to general data-science approaches [19]. Interestingly, this differs from traditional ML or statistics, which focus on a single method for the analysis of data and thereby establish one-step processes. Thus, CRISP-DM can be seen as an industrial realization of data science.

Another important issue discussed herein is model assessment from the viewpoint of data experts, business experts, and users. We discuss four essential components of model assessment (model accuracy, model interpretability, model multiplicity, and model transparency) that are crucial for the final deployment of the model. A model should not deviate from its goal as it encounters new data; it should predict results accurately and provide a satisfactory interpretation. Multiple models with competing accuracy also present a challenge when it comes to selecting a robust and interpretable model.

We describe cases where the interpretability of the model is crucial for business understanding and decision making. The fourth issue is model transparency, which should give users information on the internal structure of the algorithms and methods and on their functional parameters, so that the deployment phase can be implemented and integrated with the least effort and users can rely on the model and its predictions.

This review focuses on the following important points for the application of data-driven knowledge discovery process models:

a. We adopt a relevant and practically implementable, scalable, cost-effective, and low-risk CRISP-DM process model with industry-specific and general industrial-trend-related extensions.

b. Each phase of the CRISP-DM framework should interact actively and proactively with the others so that any data or model issue can be resolved without delay.

c. When using CRISP-DM, we define the problem category related to industrial production and maintenance for the practical implementation of data analysis.

d. We define the type of analysis to be performed for model building to avoid wrong answers to the right questions.

e. Various data issues should be handled appropriately during exploration, preprocessing, and model building to support data understanding and business understanding, as discussed in Section 4.

f. Model evaluation and deployment should ensure that the implemented model satisfies the requirements of model accuracy, model interpretability, model multiplicity, and model transparency.

g. The CRISP-DM framework should adopt a human-centric approach that allows for transparent use of the model, so that users and developers can evaluate, customize, and reuse the results to enhance their understanding of better production quality, maintenance, and other production, business, and societal goals.

In general, for data-driven models, data take center stage, and all goals are realized through the potential of the available data [18]. The assessment of the business goals achieved in the end phase of CRISP-DM is also analyzed through the success of the model. Thus, all new understanding depends on the data obtained and on how we turn those data into useful information. These objectives cannot be achieved solely through the efforts of business experts or data experts; instead, active cooperation is required for an appropriate human-centered CRISP-DM process to achieve the larger business goals. Additionally, a human-centered approach should be considered for integrated and complex systems in which several ML models are deployed to monitor and execute multiple production and manufacturing tasks. These various models and the human business experts, data experts, technical experts, and users cannot function as independent entities; they interact with each other and form a complex system from which various higher-level characteristics of smart manufacturing emerge. In such complex systems, the various AI and machine tasks must be transparent, interactive, and flexible so as to create a robust and stable manufacturing and production environment. Future research in this direction will explore industry-specific requirements to develop data-driven knowledge discovery models that can be implemented in practice.

References

[1] Genevera I. Allen and Robert Tibshirani. Transposable regularized covariance models with an application to missing data imputation. Ann. Appl. Stat., 4(2):764–790, 2010.

[2] Saleema Amershi, Maya Cakmak, W. Bradley Knox, and Todd Kulesza. Power to the people: The role of humans in interactive machine learning. AI Magazine, December 2014.

[3] Mohamed Bennasar, Yulia Hicks, and Rossitza Setchi. Feature selection using joint mutual information maximisation. Expert Systems with Applications, 42(22):8520–8532, 2015.

[4] Dimitris Bertsimas and Jack Dunn. Optimal classification trees. Machine Learning, 106(7):1039–1082, Jul 2017.

[5] Dimitris Bertsimas and Nathan Kallus. From predictive to prescriptive analytics. Management Science, 66(3):1025–1044, 2020.

[6] Leo Breiman. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statist. Sci., 16(3):199–231, 2001.

[7] T. S. Breusch and A. R. Pagan. A simple test for heteroscedasticity and random coefficient variation. Econometrica, 47(5):1287–1294, 1979.

[8] S. Van Buuren, J. P. L. Brand, C. G. M. Groothuis-Oudshoorn, and D. B. Rubin. Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12):1049–1064, 2006.

[9] Wahyu Caesarendra and Tegoeh Tjahjowidodo. A review of feature extraction methods in vibration-based condition monitoring and its application for degradation trend estimation of low-speed slew bearing. Machines, 5(4), 2017.

[10] Silvia Cateni, Valentina Colla, and Marco Vannucci. A method for resampling imbalanced datasets in binary classification tasks for real-world problems. Neurocomputing, 135:32–41, 2014.

[11] Gavin C. Cawley, Nicola L. C. Talbot, Robert J. Foxall, Stephen R. Dorling, and Danilo P. Mandic. Heteroscedastic kernel ridge regression. Neurocomputing, 57:105–124, 2004. New Aspects in Neurocomputing: 10th European Symposium on Artificial Neural Networks 2002.

[12] Tao C. Chang and Ernest Faison III. Shrinkage behavior and optimization of injection molded parts studied by the Taguchi method. Polymer Engineering & Science, 41(5):703–710, 2001.

[13] Nitesh Chawla, Kevin Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. (JAIR), 16:321–357, 2002.

[14] Jason Deutsch, Miao He, and David He. Remaining useful life prediction of hybrid ceramic bearings using an integrated deep learning and particle filter approach. Applied Sciences, 7(7), 2017.

[15] Alberto Diez-Olivan, Javier Del Ser, Diego Galar, and Basilio Sierra. Data fusion and machine learning for industrial prognosis: Trends and perspectives towards industry 4.0. Information Fusion, 50:92–111, 2019.

[16] Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning, 2017.

[17] Lixiang Duan, Mengyun Xie, Tangbo Bai, and Jinjiang Wang. A new support vector data description method for machinery fault diagnosis with unbalanced datasets. Expert Systems with Applications, 64:239–246, 2016.

[18] F. Emmert-Streib and M. Dehmer. Defining data science by a data-driven quantification of the community. Machine Learning and Knowledge Extraction, 1(1):235–251, 2019.

[19] F. Emmert-Streib, S. Moutari, and M. Dehmer. The process of analyzing data is the emergent feature of data science. Frontiers in Genetics, 7:12, 2016.

[20] Frank Emmert-Streib and Matthias Dehmer. High-dimensional lasso-based computational regression models: Regularization, shrinkage, and selection. Machine Learning and Knowledge Extraction, 1(1):359–383, 2019.

[21] Aaron Fisher, Cynthia Rudin, and Francesca Dominici. All models are wrong, but many are useful: Learning a variable's importance by studying an entire class of prediction models simultaneously, 2018.

[22] Manuel Fradinho Duarte de Oliveira, Emrah Arica, Marta Pinzone, Paola Fantini, and Marco Taisch. Human-Centered Manufacturing Challenges Affecting European Industry 4.0 Enabling Technologies, pages 507–517. 2019.

[23] Christian Gahm, Florian Denz, Martin Dirr, and Axel Tuma. Energy-efficient scheduling in manufacturing companies: A review and research framework. European Journal of Operational Research, 248(3):744–757, 2016.

[24] Joao Gama, Raquel Sebastiao, and Pedro Pereira Rodrigues. On evaluating stream learning algorithms. Machine Learning, 90(3):317–346, Mar 2013.

[25] Guangfan Gao, Heping Wu, Liangliang Sun, and Lixin He. Comprehensive quality evaluation system for manufacturing enterprises of large piston compressors. Procedia Engineering, 174:566–570, 2017. 13th Global Congress on Manufacturing and Management, Zhengzhou, China, 28–30 November 2016.

[26] Leslie G. Godfrey. Testing for multiplicative heteroskedasticity. Journal of Econometrics, 8(2):227–236, 1978.

[27] Ian Goodfellow, Patrick McDaniel, and Nicolas Papernot. Making machine learning robust against adversarial inputs. Commun. ACM, 61(7):56–66, June 2018.

[28] N. W. Grady. KDD meets big data. In 2016 IEEE International Conference on Big Data (Big Data), pages 1603–1608, Dec 2016.

[29] Wendy Arianne Gunther, Mohammad H. Rezazade Mehrizi, Marleen Huysman, and Frans Feldberg. Debating big data: A literature review on realizing value from big data. The Journal of Strategic Information Systems, 26(3):191–209, 2017.

[30] Guo Haixiang, Li Yijing, Jennifer Shang, Gu Mingyun, Huang Yuanyue, and Gong Bing. Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73:220–239, 2017.

[31] P. Hall and N. Gill. An Introduction to Machine Learning Interpretability. O'Reilly Media, Newton, MA, USA, 2018.

[32] Hui Han, Wen-Yuan Wang, and Bing-Huan Mao. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In De-Shuang Huang, Xiao-Ping Zhang, and Guang-Bin Huang, editors, Advances in Intelligent Computing, pages 878–887, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.

[33] J. Hatvany and F. J. Lettner. The efficient use of deficient knowledge. CIRP Annals, 32(1):423–425, 1983.

[34] J. Hatvany and L. Nemes. Intelligent manufacturing systems – a tentative forecast. IFAC Proceedings Volumes, 11(1):895–899, 1978. 7th Triennial World Congress of the IFAC on A Link Between Science and Applications of Automatic Control, Helsinki, Finland, 12–16 June.

[35] Q. Peter He, Jin Wang, and Devarshi Shah. Feature space monitoring for smart manufacturing via statistics pattern analysis. Computers & Chemical Engineering, 126:321–331, 2019.

[36] Heng Wang and Z. Abraham. Concept drift detection for streaming data. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–9, July 2015.


[37] Stephanie C. Hicks and Roger D. Peng. Elements and principles for characterizing variation between data analyses, 2019.

[38] Irene Hoffmann, Sven Serneels, Peter Filzmoser, and Christophe Croux. Sparse partial robust M regression. Chemometrics and Intelligent Laboratory Systems, 149:50–59, 2015.

[39] Daniel F. Howard and Danielle Dai. Public perceptions of self-driving cars: The case of Berkeley, California. 2014.

[40] William H. Hsu. Genetic wrappers for feature selection in decision tree induction and variable ordering in Bayesian network structure learning. Information Sciences, 163(1):103–122, 2004. Soft Computing Data Mining.

[41] Q. Hu, S. Zhang, M. Yu, and Z. Xie. Short-term wind speed or power forecasting with heteroscedastic support vector regression. IEEE Transactions on Sustainable Energy, 7(1):241–249, Jan 2016.

[42] Steffen Huber, Hajo Wiemer, Dorothea Schneider, and Steffen Ihlenfeldt. DMME: Data mining methodology for engineering applications – a holistic extension to the CRISP-DM model. Procedia CIRP, 79:403–408, 2019. 12th CIRP Conference on Intelligent Computation in Manufacturing Engineering, 18–20 July 2018, Gulf of Naples, Italy.

[43] J. Haffar. Have you seen ASUM-DM?, 2015.

[44] H. Jodlbauer and A. Huber. Service-level performance of MRP, kanban, CONWIP and DBR due to parameter stability and environmental robustness. International Journal of Production Research, 46(8):2179–2195, 2008.

[45] P. Kalgotra and R. Sharda. Progression analysis of signals: Extending CRISP-DM to stream analytics. In 2016 IEEE International Conference on Big Data (Big Data), pages 2880–2885, Dec 2016.

[46] Seokho Kang, Eunji Kim, Jaewoong Shim, Wonsang Chang, and Sungzoon Cho. Product failure prediction with missing data. International Journal of Production Research, 56(14):4849–4859, 2018.

[47] Kanwaldeep Kaur and Giselle Rampersad. Trust in driverless cars: Investigating key factors influencing the adoption of driverless cars. Journal of Engineering and Technology Management, 48:87–96, 2018.

[48] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 3146–3154. Curran Associates, Inc., 2017.

[49] Kristian Kersting, Christian Plagemann, Patrick Pfaff, and Wolfram Burgard. Most likely heteroscedastic Gaussian process regression. In Proceedings of the 24th International Conference on Machine Learning, ICML '07, pages 393–400, New York, NY, USA, 2007. ACM.

[50] Taehoon Ko, Je Hyuk Lee, Hyunchang Cho, Sungzoon Cho, Wounjoo Lee, and Miji Lee. Machine learning-based anomaly detection via integration of manufacturing, inspection and after-sales service data. Industrial Management and Data Systems, 117:927–945, 2017.

[51] Ron Kohavi and George H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1):273–324, 1997. Relevance.

[52] Mark E. Kotanchek, Ekaterina Y. Vladislavleva, and Guido F. Smits. Symbolic Regression Via Genetic Programming as a Discovery Engine: Insights on Outliers and Prototypes, pages 55–72. Springer US, Boston, MA, 2010.

[53] Peng Kou, Deliang Liang, Lin Gao, and Jianyong Lou. Probabilistic electricity price forecasting with variational heteroscedastic Gaussian process and active learning. Energy Conversion and Management, 89:298–308, 2015.

[54] Khalid Kouiss, Henri Pierreval, and Nasser Mebarki. Using multi-agent architecture in FMS for dynamic scheduling. Journal of Intelligent Manufacturing, 8(1):41–47, Jan 1997.

[55] Bartosz Krawczyk. Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4):221–232, Nov 2016.

[56] Eivind Kristoffersen, Oluseun Omotola Aremu, Fenna Blomsma, Patrick Mikalef, and Jingyue Li. Exploring the relationship between data science and circular economy: An enhanced CRISP-DM process model. In Ilias O. Pappas, Patrick Mikalef, Yogesh K. Dwivedi, Letizia Jaccheri, John Krogstie, and Matti Mantymaki, editors, Digital Transformation for a Sustainable Society in the 21st Century, pages 177–189, Cham, 2019. Springer International Publishing.

[57] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale, 2016.

[58] Andrew Kusiak. Smart manufacturing must embrace big data. Nature, April 2017.

[59] Andrew Kusiak. Smart manufacturing. International Journal of Production Research, 56(1-2):508–517, 2018.

[60] Gulser Koksal, Inci Batmaz, and Murat Caner Testik. A review of data mining applications for quality improvement in manufacturing industry. Expert Systems with Applications, 38(10):13448–13467, 2011.

[61] Chia-Yen Lee, Ting-Syun Huang, Meng-Kun Liu, and Chen-Yang Lan. Data science for vibration heteroscedasticity and predictive maintenance of rotary bearings. Energies, 12(5), 2019.

[62] J.-H. Lee and C.-O. Kim. Multi-agent systems applications in manufacturing systems and supply chain management: a review paper. International Journal of Production Research, 46(1):233–265, 2008.

[63] Jeffery T. Leek and Roger D. Peng. What is the question? Science, 347(6228):1314–1315, 2015.

[64] Chun-Cheng Lin, Der-Jiunn Deng, Chin-Hung Kuo, and Linnan Chen. Concept drift detection and adaption in big imbalance industrial IoT data using an ensemble learning method of offline classifiers. IEEE Access, 7:56198–56207, 2019.

[65] R. J. A. Little and D. B. Rubin. Statistical Analysis with Missing Data. Wiley Series in Probability and Statistics. Wiley, 2019.

[66] Panagiotis Loukopoulos, George Zolkiewski, Ian Bennett, Suresh Sampath, Pericles Pilidis, Fang Duan, and David Mba. Addressing missing data for diagnostic and prognostic purposes. In Ming J. Zuo, Lin Ma, Joseph Mathew, and Hong-Zhong Huang, editors, Engineering Asset Management 2016, pages 197–205, Cham, 2018. Springer International Publishing.

[67] Chen Lu, Zhenya Wang, and Bo Zhou. Intelligent fault diagnosis of rolling bearing using hierarchical convolutional network based health state classification. Advanced Engineering Informatics, 32:139–151, 2017.

[68] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang. Learning under concept drift: A review. IEEE Transactions on Knowledge and Data Engineering, pages 1–1, 2018.

[69] Sebastian Maldonado and Richard Weber. A wrapper method for feature selection using support vector machines. Information Sciences, 179(13):2208–2217, 2009. Special Section on High Order Fuzzy Sets.

[70] Antonio Maratea, Alfredo Petrosino, and Mario Manzo. Adjusted F-measure and kernel scaling for imbalanced data learning. Information Sciences, 257:331–341, 2014.

[71] Carol A. Markowski and Edward P. Markowski. Conditions for the effectiveness of a preliminary test of variance. The American Statistician, 44(4):322–326, 1990.

[72] Fernando Martinez-Plumed, Lidia Contreras Ochando, Cesar Ferri, Peter A. Flach, Jose Hernandez-Orallo, Meelis Kull, Nicolas Lachiche, and Maria Jose Ramirez-Quintana. CASP-DM: Context aware standard process for data mining. ArXiv, abs/1709.09003, 2017.


[73] Charles T. Marx, Flavio du Pin Calmon, and Berk Ustun. Predictive multiplicity in classification, 2019.

[74] B. W. Matthews. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2):442–451, 1975.

[75] Laszlo Monostori. AI and machine learning techniques for managing complexity, changes and uncertainties in manufacturing. Engineering Applications of Artificial Intelligence, 16(4):277–291, 2003. Intelligent Manufacturing.

[76] H. Nagashima and Y. Kato. APREP-DM: a framework for automating the pre-processing of a sensor data analysis based on CRISP-DM. In 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), pages 555–560, March 2019.

[77] Iman Nekooeimehr and Susana K. Lai-Yuen. Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets. Expert Systems with Applications, 46:405–416, 2016.

[78] Julio-Omar Palacio-Nino and Fernando Berzal. Evaluation metrics for unsupervised learning algorithms, 2019.

[79] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, Oct 2010.

[80] Matthew Pratola, Hugh Chipman, Edward George, and Robert McCulloch. Heteroscedastic BART using multiplicative regression trees. arXiv e-prints, page arXiv:1709.07542, Sep 2017.

[81] Francesco Ranzato and Marco Zanella. Robustness verification of support vector machines, 2019.

[82] Sebastian Raschka. Model evaluation, model selection, and algorithm selection in machine learning. CoRR, abs/1811.12808, 2018.

[83] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Model-agnostic interpretability of machine learning, 2016.

[84] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier, 2016.

[85] J. W. Ross, C. M. Beath, and A. Quaadgras. You may not need big data after all. Harvard Business Review, 91:12, 2013.

[86] Donald B. Rubin. Inference and missing data. Biometrika, 63(3):581–592, 1976.

[87] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1995.

[88] Dominik Sacha, Hansi Senaratne, Bum Chul Kwon, Geoffrey Ellis, and Daniel Keim. The role of uncertainty, awareness, and trust in visual analytics. IEEE Transactions on Visualization and Computer Graphics, 22:1–1, 2015.

[89] Fritz Scheuren. Multiple imputation: How it began and continues. The American Statistician, 59(4):315–319, 2005.

[90] F. Schafer, C. Zeiselmair, J. Becker, and H. Otten. Synthesizing CRISP-DM and quality management: A data mining approach for production processes. In 2018 IEEE International Conference on Technology Management, Operations and Decisions (ICTMOD), pages 190–195, Nov 2018.

[91] Lesia Semenova and Cynthia Rudin. A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning, 2019.

[92] Colin Shearer. The CRISP-DM model: The new blueprint for data mining. Journal of Data Warehousing, 5(4), 2000.

[93] Victor S. Sheng, Foster Provost, and Panagiotis G. Ipeirotis. Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 614–622, New York, NY, USA, 2008. ACM.

[94] J. Smolander, A. Stupnikov, G. Glazko, M. Dehmer, and F. Emmert-Streib. Comparing biological information contained in mRNA and non-coding RNAs for classification of lung cancer patients. BMC Cancer, 19(1):1176, 2019.

[95] Parikshit Sondhi. Feature construction methods: A survey. 2009.

[96] Aaron Springer. Enabling effective transparency: Towards user-centric intelligent systems. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES '19, pages 543–544, New York, NY, USA, 2019. ACM.

[97] Walter R. Stahel. The circular economy. Nature, 531(7595):435–438, 2016.

[98] Patrick Stanula, Amina Ziegenbein, and Joachim Metternich. Machine learning algorithms in production: A guideline for efficient data source selection. Procedia CIRP, 78:261–266, 2018. 6th CIRP Global Web Conference – Envisaging the future manufacturing, design, technologies and systems in innovation era (CIRPe 2018).

[99] Sonja Strasser, Jan Zenisek, Shailesh Tripathi, Lukas Schimpelsberger, and Herbert Jodlbauer. Linear vs. symbolic regression for adaptive parameter setting in manufacturing processes. In Christoph Quix and Jorge Bernardino, editors, Data Management Technologies and Applications, pages 50–68, Cham, 2019. Springer International Publishing.

[100] C. Su and Y. Hsiao. An evaluation of the robustness of MTS for imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 19(10):1321–1332, Oct 2007.

[101] Abdulhamit Subasi and M. Ismail Gursoy. EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Systems with Applications, 37(12):8659–8666, 2010.

[102] Zhongbin Sun, Qinbao Song, Xiaoyan Zhu, Heli Sun, Baowen Xu, and Yuming Zhou. A novel ensemble method for classifying imbalanced data. Pattern Recognition, 48(5):1623–1637, 2015.

[103] G. Taguchi, E. A. Elsayed, and T. C. Hsiang. Quality Engineering in Production Systems. McGraw-Hill series in industrial engineering and management science. McGraw-Hill, 1989.

[104] G. Taguchi, S. Konishi, S. Konishi, and American Supplier Institute. Orthogonal Arrays and Linear Graphs: Tools for Quality Engineering. Taguchi methods. American Supplier Institute, 1987.

[105] M. Tajik, S. Movasagh, M. A. Shoorehdeli, and I. Yousefi. Gas turbine shaft unbalance fault detection by using vibration data and neural networks. In 2015 3rd RSI International Conference on Robotics and Mechatronics (ICROM), pages 308–313, Oct 2015.

[106] Satu Tamminen, Henna Tiensuu, Ilmari Juutilainen, and Juha Roning. Steel property and process models for quality control and optimization. In Physical and Numerical Simulation of Materials Processing VII, volume 762 of Materials Science Forum, pages 301–306. Trans Tech Publications Ltd, 2013.

[107] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267–288, 1996.

[108] Tong Guan, Z. Zhang, Wen Dong, Chunming Qiao, and X. Gu. Data-driven fault diagnosis with missing syndromes imputation for functional test through conditional specification. In 2017 22nd IEEE European Test Symposium (ETS), pages 1–6, May 2017.

[109] Resit Unal and Edwin B. Dean. Taguchi approach to design optimization for quality and cost: An overview. 1991.

[110] C. Urmson and W. Whittaker. Self-driving cars and the urban challenge. IEEE Intelligent Systems, 23(2):66–68, March 2008.


[111] M. C. Vale Tavares, P. Alencar, and D. Cowan. A variability-aware design approach to the data analysis modeling process. In 2018 IEEE International Conference on Big Data (Big Data), pages 2818–2827, Dec 2018.

[112] Stef van Buuren. Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16(3):219–242, 2007. PMID: 17621469.

[113] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2):264–280, 1971.

[114] Jorge R. Vergara and Pablo A. Estevez. A review of feature selection methods based on mutual information. Neural Computing and Applications, 24(1):175–186, Jan 2014.

[115] Gregory W. Vogl, Brian A. Weiss, and Moneer Helu. A review of diagnostic and prognostic capabilities and best practices for manufacturing. Journal of Intelligent Manufacturing, 30(1):79–95, 2019.

[116] Jinjiang Wang, Yulin Ma, Laibin Zhang, Robert X. Gao, and Dazhong Wu. Deep learning for smart manufacturing: Methods and applications. Journal of Manufacturing Systems, 48:144–156, 2018. Special Issue on Smart Manufacturing.

[117] Shiyong Wang, Jiafu Wan, Daqiang Zhang, Di Li, and Chunhua Zhang. Towards smart factory for industry 4.0: a self-organized multi-agent system with big data based feedback and coordination. Computer Networks, 101:158–168, 2016. Industrial Technologies and Applications for the Internet of Things.

[118] Geoffrey I. Webb, Roy Hyde, Hong Cao, Hai Long Nguyen, and Francois Petitjean. Characterizing concept drift. Data Mining and Knowledge Discovery, 30(4):964–994, Jul 2016.

[119] Adrian Weller. Transparency: Motivations and challenges, 2017.

[120] Halbert White. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4):817–838, 1980.

[121] T. Kirk White, Jerome P. Reiter, and Amil Petrin. Imputation in U.S. manufacturing data and its implications for productivity dispersion. The Review of Economics and Statistics, 100(3):502–509, 2018.

[122] Thorsten Wuest, Christopher Irgens, and Klaus-Dieter Thoben. An approach to monitoring quality in manufacturing using supervised machine learning on product state data. Journal of Intelligent Manufacturing, 25(5):1167–1180, Oct 2014.

[123] Guangfei Yang, Xianneng Li, Jianliang Wang, Lian Lian, and Tieju Ma. Modeling oil production based on symbolic regression. Energy Policy, 82:48–61, 2015.

[124] Z. Yang, S. Al-Dahidi, P. Baraldi, E. Zio, and L. Montelatici. A novel concept drift detection method for incremental learning in nonstationary environments. IEEE Transactions on Neural Networks and Learning Systems, pages 1–12, 2019.

[125] Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49–67, 2006.

[126] XiaoLi Zhang, BaoJian Wang, and XueFeng Chen. Intelligent fault diagnosis of roller bearings with multivariable ensemble-based incremental support vector machine. Knowledge-Based Systems, 89:56–85, 2015.

[127] Lina Zhou, Shimei Pan, Jianwu Wang, and Athanasios V. Vasilakos. Machine learning on big data: Opportunities and challenges. Neurocomputing, 237:350–361, 2017.

[128] Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel. A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5):363–387, 2012.
