## What are statistical tools?

Statistics is a branch of science that deals with samples, measurements, calculations and estimation of population properties.

### Statistical Process Control

Statistics found its way into manufacturing back in the 1920s through Dr Walter Shewhart of Bell Laboratories. He is credited with applying statistics to manufacturing in order to predict when a process will produce a defect.

From there on, the application of statistics in manufacturing spread across the world and became instrumental in the Japanese Quality Revolution.

According to Dr Shewhart, people in manufacturing make only two kinds of mistakes:

1. Reacting when they are not supposed to react, and
2. Not reacting when they are supposed to react.

That is, two types of variation occur in any process. The first is the inherent variation of the process (common cause variation). The second is variation caused by some unique disturbance (assignable cause or special cause variation).

He recommended using statistics to differentiate between these two types of causes, and he introduced control charts to simplify that differentiation.
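To make the distinction concrete, here is a minimal Python sketch of one common chart type, the individuals (I) chart. The source does not say which chart to use, so this choice is an assumption, as are the function names and the sample data. The limits are the centre line plus or minus 2.66 times the average moving range, and the sketch applies only the simplest rule: a point beyond the control limits is treated as a suspected special cause.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Return (lcl, centre, ucl) for an individuals (I) control chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    centre = mean(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of size 2.
    half_width = 2.66 * mr_bar
    return centre - half_width, centre, centre + half_width


def special_cause_points(values, lcl, ucl):
    """Return the indices of points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical baseline measurements taken at a constant interval.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 10.1, 10.0]
lcl, centre, ucl = individuals_chart_limits(baseline)

# Hypothetical new measurements checked against the baseline limits.
new_points = [10.0, 10.1, 11.5, 9.9]
flagged = special_cause_points(new_points, lcl, ucl)
print(f"LCL={lcl:.3f} centre={centre:.3f} UCL={ucl:.3f} flagged={flagged}")
```

Points inside the limits are left alone (common cause variation); only the flagged points call for a reaction. Real control charts usually add further run rules, which this sketch omits.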

## 7 QC Tools

When Dr Deming and Dr Juran visited Japan in the early 1950s, they instilled a culture of data-based quality improvement. Dr Deming introduced process flow diagrams and other basic graphical tools, while Dr Juran taught tools such as Pareto charts. Similarly, industry experts in Japan devised tools such as the cause-and-effect diagram (Dr Ishikawa), why-why analysis and more.

## Statistics and Six Sigma

Bill Smith, an engineer at Motorola, developed a statistics-based problem-solving method with the help of Dr Mikel Harry. By applying this methodology, a process can reach a quality level of 99.99966% acceptance, that is, 3.4 defects per million opportunities. This level of quality is the sixth level on the sigma scale.

Bob Galvin, then CEO of Motorola, named the method after its result: the Six Sigma methodology.
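The link between a sigma level and an acceptance percentage can be sketched in a few lines of Python. This assumes the conventional Six Sigma allowance of a 1.5 sigma long-term shift in the process mean, which the text above does not state, and it counts only the tail on the side the mean has drifted towards. The function names are illustrative.

```python
from statistics import NormalDist


def yield_at_sigma_level(sigma_level, shift=1.5):
    """Fraction of output within spec, assuming a long-term mean shift."""
    # Only the nearer tail is counted; the far tail is negligible.
    return NormalDist().cdf(sigma_level - shift)


def dpmo_at_sigma_level(sigma_level, shift=1.5):
    """Defects per million opportunities at the given sigma level."""
    return (1.0 - yield_at_sigma_level(sigma_level, shift)) * 1_000_000


for level in range(1, 7):
    acceptance = yield_at_sigma_level(level) * 100
    print(f"{level} sigma: {acceptance:.5f}% acceptance, "
          f"{dpmo_at_sigma_level(level):.1f} DPMO")
```

Under these assumptions, level 6 works out to roughly 3.4 defects per million opportunities.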

The Six Sigma methodology is a framework that combines the best practices of project management with the principles of statistics and proven statistical and improvement techniques.

We frequently use the following statistical techniques in the Six Sigma methodology.

1. Sampling – sampling methods and sample size,
2. Data Collection and Visualisation,
3. Screening of Data,
4. Stratification – using QC tools,
5. Measurement System Analysis,
6. Stability of process – using control charts,
7. Process capability measurements,
8. Comparison of multiple data sets,
9. Establishing the relationship between two or more variables,
10. Utilising control charts to control the parameters after improvement.

## Application of Statistical Tools

Today we have a plethora of statistical tools and an ocean of know-how to draw on.

But I frequently see process experts getting overwhelmed by the tools. Sometimes we end up applying the wrong tool. Even worse, we expect a tool to give us information beyond its intended scope.

We need to remember that every tool was developed for a defined purpose, so it works well only when we use it correctly.

People often ask me whether there is a ready reckoner for beginners as well as experts.

### Tools are tools!

Tools are tools: they benefit the user only to the extent that they are used correctly and skilfully. So, I am sharing the Application Matrix.

This matrix lists each tool, its intended purpose and the outcome we can derive from it. It also clarifies when to apply the tool and what the prerequisites are.

## Statistical Tools Application Matrix

| Tool | Key word | What is it? | When to use | Decision Making | Deciding Factor | Outcome | Prerequisite | Point of caution |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Histogram | Data Distribution | A tool that describes data in terms of frequency and its distribution | When you want to visualise data in the form of a curve | Indicative | | Central location and min & max values of the data, distribution across the range and shape of the data | Continuous data | Only one data set can be studied using one histogram. Not suitable for sequential properties (process) |
| Pareto Chart | Prioritise & Vital Few | A tool to compare different attributes based on their frequency of appearance | When you want to prioritise major categories / major failures | Decisive | % Cumulative contribution > 80% | Vital 20% of categories among the analysed categories | Frequency or count (repetition in raw data) | Requires process expertise to categorise |
| Box Plot | Data Distribution & Compare | A tool to describe data distribution & compare the location and spread of multiple data sets | Similar to a histogram, but points out the median and outliers in the data; also when you want to compare the distribution of more than one data set | Indicative / Decisive | Median line, height of box and length of whisker | Central location (concentration), spread of data (height of the box), skewness (position of median line) and outliers. We can compare multiple data sets in a single graph with multiple box plots. | One parameter of multiple data sets | Indicative or comparative decisions can be taken. Not an absolute decision-making tool like Pareto or Regression |
| Time Series | Trends and Patterns | A tool to see process behaviour over a time period / change of process behaviour over time | When you want to understand the change in the data against a specific interval of time or sequence | Indicative (Suspect) | | Can identify typing errors, visual validation of data correctness, trends and patterns of data | The interval has to be constant (time interval / sequence interval). We need to follow simple systematic sampling | Needs process expertise to interpret the graph |
| Control Chart | Statistical Stability | A tool to check whether the process needs adjustments or improvements | When you want to see whether the process is under statistical control / stability (without disturbance) | Decisive / Indicative (Suspect) | Any point outside the control limits | Statistical stability of the process & presence of any special cause variation. A process with one or more special causes is considered not ready for improvement. Control limits also indicate where we can expect the next data point | We need to follow systematic sampling to derive a control chart. The interval has to be constant (time interval / sequence interval). Systematic sub-group sampling is required to construct an X-bar R chart or X-bar S chart | Higher stability does not mean higher capability |
| Normality Test | Normality | A tool to check whether the data follows a normal distribution (bell-shaped curve) or not | When you want to measure how closely the data resembles the ideal normal curve | Decisive | p-value (more than 0.05 is said to be close enough to normal data) | Same as purpose | | |
| Process Capability Cp | Capability | A tool to measure the ability of our process to meet the customer requirement with respect to the studied parameter in the short term | When you want to measure the capability of our process to meet customer specifications | Decisive | Cp value | How small our process variation is in comparison with the allowed variation | The interval has to be constant (time interval / sequence interval). | To be used only for an established process (not for a new process). Use of short-term standard deviation. |
| Process Capability Cpk | Capability | A tool to measure the ability of our process to meet the customer requirement with respect to the studied parameter in the short term | When you want to measure the capability of the process to meet customer specifications | Decisive | Cpk value (the lesser of Cpk-Upper and Cpk-Lower is taken as the process Cpk) | How close my mean is to the target and how small my process variation is compared to the allowed variation | The interval has to be constant (time interval / sequence interval). | Use of short-term standard deviation |
| DPMO | Capability | A tool to measure the number of defects the process will produce | When you want to assess our process capability in terms of the number of defects produced by the process | Decisive | DPMO value | How many defects my process will produce in future | We need to count the number of defects produced in a process and not the number of defective pieces, i.e., % rejection data cannot be converted into DPMO | It is different from PPM. Optimum number of defect opportunities |
| Brainstorming | Cause Analysis | A tool to collect expert ideas or opinions on a particular subject from a small team | When you want to gather all possible causes of a rejection / all possible ideas for a solution | Indicative (Suspect) | - | List of all possible causes of failure or solutions that are mutually exclusive and collectively exhaustive | People participating in brainstorming have to have some basic knowledge about the process and the problem. Everyone has to participate. | Every point counts. To be exhaustive |
| Fish-bone Analysis | Cause Analysis | A framework to collect ideas from people related to 6 categories of failures (6Ms) | When you want to gather all possible causes of a rejection / all possible ideas for a solution | Indicative (Suspect) | - | If we have covered all 6 categories, there is a high probability that we have covered all possible causes | People participating in brainstorming have to have some basic knowledge about the process and the problem. | Not everything can be done using spreadsheets. 'Man' has to be considered as the 5th category, not the first. |
| Why-Why Analysis | Cause Analysis | A tool to dig into the root cause | When you want to explore all root causes | Indicative (Suspect) | - | List of all possible root causes | Focus and dedication of time. People participating in brainstorming have to have some basic knowledge about the process and the problem. | Knowing where to stop |
| Regression | Relationship | A tool to check the relationship between two factors | When you want to mathematically determine the relationship between two process variables | Decisive | R-Square value | Mathematical equation stating the relationship between the analysed variables. Whether one variable is impacting the other and, if so, how strongly. | Data of two variables (more than 2 is also possible), preferably collected at the same time (simultaneously) | Correlation is not causation |
| Hypothesis Testing | Comparison & Validity of Assumption | A tool to compare two data sets for equality, or one data set against a standard | When you want to statistically validate / compare the difference between properties of two or more data sets, or one data set property against a standard value (spec) | Decisive | p-value (more than 0.05 favours the assumption of equality) | Statistical (scientific) decision about the population of the data sets - whether equal/same or significantly different. | Clarity on what to compare - mean or dispersion. Right way of forming the statement of assumptions (Null & Alternate Hypotheses). | May contradict the mathematical decision. 'Equal' does not mean equal - rather, it means unable to find a significant difference. |
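
The three capability rows of the matrix (Cp, Cpk and DPMO) can be illustrated with a minimal Python sketch. It uses the textbook formulas Cp = (USL - LSL) / 6σ, Cpk = min((USL - mean) / 3σ, (mean - LSL) / 3σ) and DPMO = defects × 1,000,000 / (units × opportunities per unit). The function names, specification limits and data are hypothetical. The sample standard deviation of a short run stands in for the short-term standard deviation here; in practice that is often estimated from within-subgroup variation instead.

```python
from statistics import mean, stdev


def cp(values, lsl, usl):
    """Allowed spread (USL - LSL) divided by the process spread (6 sigma)."""
    return (usl - lsl) / (6 * stdev(values))


def cpk(values, lsl, usl):
    """The lesser of Cpk-Upper and Cpk-Lower, as in the matrix."""
    mu, sigma = mean(values), stdev(values)
    cpk_upper = (usl - mu) / (3 * sigma)
    cpk_lower = (mu - lsl) / (3 * sigma)
    return min(cpk_upper, cpk_lower)


def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities (counts defects, not defectives)."""
    return defects * 1_000_000 / (units * opportunities_per_unit)


# Hypothetical shaft diameters (mm) from a short production run.
diameters = [10.02, 9.98, 10.01, 10.00, 9.99, 10.03, 10.01, 9.97, 10.00, 10.02]
LSL, USL = 9.90, 10.10  # hypothetical customer specification limits

print(f"Cp  = {cp(diameters, LSL, USL):.2f}")
print(f"Cpk = {cpk(diameters, LSL, USL):.2f}")
# Hypothetical counts: 17 defects in 1000 units with 5 opportunities each.
print(f"DPMO = {dpmo(17, 1000, 5):.0f}")
```

Cpk can never exceed Cp; the two are equal only when the process mean sits exactly midway between the specification limits.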