Data mining

StatSoft Pacific Pty Ltd
http://www.statsoft.com.au
By Tony Grichnik and Mike Seskin, Caterpillar; Thomas Hill, StatSoft
Friday, 06 October, 2006


Data mining can offer manufacturers an effective solution when traditional methods of analysis are not suitable. The return on investment can include improved product performance, less scrap and rework, and decreased testing.

The market for new analytical methods for statistical process control (SPC) and optimisation is driven by a number of factors. In particular, companies that have invested in Six Sigma leadership, training and cultural development have emphasised quantitative assessment and problem solving. Using traditional Six Sigma tools and processes, they've realised significant improvements and cost savings. They've also recognised that a number of significant manufacturing problems are still unsolved.

Manufacturers are looking for ways to apply quantitative analysis to help their employees create better products. These companies are clear about their business objectives and have the historical data to characterise their problem domains. What they lack are methods to gain the insights necessary to approach these problems.

Traditional methods such as design of experiments (DOE), linear regression and correlation, and quality control charts aren't suitable in many circumstances for a variety of reasons:

  • The number of relevant factors is large.
  • The underlying relationship between the factors and critical outcomes is complex.
  • The important factors interact with each other.
  • The nature of the data violates the assumptions underlying these methods.

Data-mining methods can offer effective solutions to manufacturers facing these problems. Although data mining has been widely adopted in other industries, applying it to manufacturing has been hindered by a lack of the required expertise, the need to constrain solutions to practical implementations and to optimise outcomes to target specifications, and the absence of tools that let engineering professionals perform what-if scenarios and deploy predictive models to the factory floor for ongoing monitoring and decision making.

The return on investment from applying data mining in manufacturing has resulted in improved product performance, less scrap and rework, decreased testing, and fewer field service and warranty claims. Data-mining methods can extract significant value from all the data collected, stored and managed by manufacturers during the last several years.

The term 'data-mining methods' refers to a category of analytical methods geared towards determining useful relationships in huge, complex sets of data. The term arose partially to distinguish these methods from traditional statistical ones.

The interest in data-mining methods originally arose in nonmanufacturing domains, driven by two interrelated developments: data storage and computing power.

Data storage

Relational database management systems have become commonplace during the last few decades. Storage hardware continues to be more scalable and less expensive. As a result, companies can collect, store and manage more data effectively. Whereas analyses and decisions in the past were limited to small data sets dealing with an immediate issue, opportunities now abound to build predictive models holistically using the available historical data.

Computing power

At the same time that databases have matured and storage has become less expensive, the computing power available on standard workstation computers has continued to improve. Knowledge workers today have a comparatively large amount of computing power available to them in terms of processor speed, working memory and hard-disk space.

These factors have contributed to organisations' readiness to adopt a new set of analytic methods to address data-mining opportunities. The original business applications of data mining weren't in manufacturing or quality-related disciplines but in credit-risk scoring at financial services companies and in up-selling, cross-selling and customer-retention applications in marketing. Quality practitioners in manufacturing are now realising the applicability and advantages of these methods to their databases. The wealth of existing data in SPC software databases provides a great opportunity to derive valuable insights about a company's processes and how they contribute to product quality outcomes.

A popular category of data-mining methods is recursive partitioning, often called 'tree' methods because the graphical outputs from these methods resemble a tree. Tree methods come in different algorithms, including classification and regression trees (C&RT) and chi-squared automatic interaction detectors (CHAID). Because different algorithms can produce different results on the same data, data-mining practitioners should apply multiple methods to the same problem to determine which provides the best predictive accuracy.
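As a rough illustration, the sketch below fits a CART-style regression tree (the scikit-learn implementation; CHAID is not part of that library) to a hypothetical table of historical process measurements and prints the resulting splits. The file and column names (process_history.csv, furnace_temp, hardness and so on) are placeholders, not drawn from the original article.

```python
# Minimal sketch of a recursive-partitioning ('tree') model.
# All file and column names are hypothetical placeholders.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

df = pd.read_csv("process_history.csv")                 # hypothetical historical extract
X = df[["furnace_temp", "line_speed", "coolant_flow"]]  # candidate process factors
y = df["hardness"]                                      # critical quality outcome

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# The printed rules show the recursive splits that give these
# methods their tree-like graphical form.
print(export_text(tree, feature_names=list(X.columns)))
```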

The data-mining process

Data mining is an analytic process designed to explore large amounts of data in search of consistent patterns or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. The ultimate goal is prediction. Predictive data mining is the most common type and the one with the most direct business applications.

The process of data mining consists of three stages: initial exploration, model building or pattern identification with validation and/or verification, and deployment (ie, applying the model to new data to generate predictions).

Stage 1: Exploration
This stage usually starts with data preparation, which may involve cleaning data, data transformations, selecting subsets of records and, in the case of data sets with large numbers of variables (or 'fields'), performing preliminary feature selection operations to bring the number of variables to a manageable range, depending on the statistical methods being considered.
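A minimal sketch of this stage in Python, assuming the same hypothetical process-history extract as above, might combine basic cleaning, a simple transformation and a preliminary screening of predictors:

```python
# Minimal sketch of the exploration stage; file and column names are placeholders.
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

df = pd.read_csv("process_history.csv")

# Cleaning: drop duplicate records and any rows missing the quality outcome.
df = df.drop_duplicates().dropna(subset=["hardness"])

# Transformation: log-transform a right-skewed cycle-time measurement.
df["log_cycle_time"] = np.log1p(df["cycle_time"])

# Preliminary feature selection: screen the numeric predictors and keep the
# handful most strongly associated with the outcome, so the variable count
# stays manageable for the modelling stage.
predictors = df.drop(columns=["hardness"]).select_dtypes("number")
predictors = predictors.fillna(predictors.median())
k = min(10, predictors.shape[1])
selector = SelectKBest(score_func=f_regression, k=k).fit(predictors, df["hardness"])
print("Retained predictors:", list(predictors.columns[selector.get_support()]))
```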

Stage 2: Model building and validation
This stage involves considering various models and choosing the best one based on its predictive performance (ie, explaining the variability in question and producing stable results across samples). This might sound like a simple operation, but it can involve a very elaborate process. To achieve this goal, a variety of techniques have been developed, many of which are based on so-called 'competitive evaluation of models', that is, applying different models to the same data set and then comparing their performance to choose the best.
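The sketch below illustrates one simple form of competitive evaluation, again using the hypothetical data set from the earlier examples: several candidate learners are cross-validated on the same data and the one with the lowest average prediction error is selected.

```python
# Minimal sketch of 'competitive evaluation of models'.
# Data set and column names are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("process_history.csv")
X = df[["furnace_temp", "line_speed", "coolant_flow"]]
y = df["hardness"]

candidates = {
    "linear regression": LinearRegression(),
    "regression tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Apply each model to the same data and compare cross-validated error.
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    results[name] = -scores.mean()
    print(f"{name}: mean absolute error = {results[name]:.3f}")

print("Selected model:", min(results, key=results.get))
```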

Stage 3: Deployment
This final stage involves using the model selected in the previous stage and applying it to new data to generate predictions of the expected outcome.
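A minimal deployment sketch, under the same hypothetical assumptions, might persist the chosen model and later reload it to score a fresh batch of production records:

```python
# Minimal sketch of deployment; file and column names are placeholders.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Train once on the full historical data set and save the fitted model.
history = pd.read_csv("process_history.csv")
X = history[["furnace_temp", "line_speed", "coolant_flow"]]
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, history["hardness"])
joblib.dump(model, "hardness_model.joblib")

# Later, in the monitoring application: load the model and predict the
# expected outcome for new, unseen production records.
deployed = joblib.load("hardness_model.joblib")
new_batch = pd.read_csv("current_shift.csv")
new_batch["predicted_hardness"] = deployed.predict(
    new_batch[["furnace_temp", "line_speed", "coolant_flow"]]
)
print(new_batch[["predicted_hardness"]].head())
```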

The data-mining process aligns very well with DMAIC methodology. In fact, many predictive modelling projects in manufacturing follow the DMAIC project process and include the data-mining steps of modelling (analyse), validation (analyse) and deployment (improve and control).

Where and when to use it

Data-mining software is relevant to complex, multi-step manufacturing processes across diverse industries. The software can model relationships between specific manufacturing process parameters and outcomes even when data are sparse and the relationships are complex.

Manufacturing operations that have spent the last several years investing in measurement and data storage systems are candidates for adopting data-mining and predictive modelling software. The software builds on existing data collection methodologies and makes full use of the data already collected and stored.

Manufacturing industries adopting data mining for predictive modelling and optimisation of their processes include heavy equipment, automotive, aerospace, machine tool, packaging, pharmaceuticals, robotics, semiconductor, medical and others with complex products.

Originally published in Quality Digest (www.qualitydigest.com), September 2006
