16 Statistical Process Control In Python

Figure 2.1: Statistical Process Control!

In this workshop, we will learn how to do statistical process control in Python using statistical tools plotnine Visualization! Statistical process control refers to (1) using statistics to measure variations in product quality over time and (2) identifying benchmarks to know when intervention is needed. let’s get started!

launch

packages

# Remember to install these packages using a terminal, if you haven't already!
!pip install pandas plotnine scipy

we will experiment pandas For data manipulation, plotnine For visualization, and scipy For statistical purposes.

import pandas as pd
from plotnine import *

custom work

This workshop uses custom functions functions/ Directory. You may need both:- functions_distributions.py – For reliability and delivery functions – functions_process_control.py – for statistical process control tasks

To use these functions, you need to get them from the repository at github.com/timothyfraser/sigma/tree/main/functions.

Add functions directory to your python path

import sys
import os
# Add the functions directory to Python path
sys.path.append('functions')  # or path to wherever you placed the functions folder

Once you have the functions available, you can import them:

from functions_distributions import density, tidy_density, approxfun
# from functions_process_control import ggprocess, ggsubgroup, ggmoving, ggcapability  # if needed

our case

For today’s workshop, we will consider why quality control matters in a local economy by examining the case of the Japanese hot springs bath economy! hot springs, or onsenHot springs are a major source of tourism and recreation for families in Japan, drawing residents from across the country each year, often to rural communities where the perfect geological conditions produce naturally occurring hot springs. Restaurants, taxi and bus companies, and many service sector companies rely on their local efforts to bring a steady stream of tourists into the local economy. Therefore, it is often in the best interest onsen To ensure quality control, operators must monitor the temperature, minerals, or other aspects of their hot spring baths in order to maintain their company’s (and city’s!) reputation for quality rest and relaxation!

Onsen-Those who visit often seek Specific Types of Hot Springs, Why It’s Important onsen Really delivering what it advertises! Sarbulia and Payyappallimana (2012) describe some of these benchmarks.

temperature: Onsen are divided into “additional hot springs” (>42°C), “hot Springs” (41~34°C), and “Warm Springs” (33~25°C,
pH: Onsen is classified as “acidic” (pH < 3), “mildly acidic” (pH 3~6), “neutral” (pH 6~7.5), “mild alkaline” (pH 7.5~8.5), and “alkaline” (pH > 8.5,
sulfur:sulphur onsen There is usually about 2 mg of sulfur per 1 kg of hot spring water; sulfur levels Sure More than 1 mg counts as sulfur onsen. (It smells like rotten eggs!)

These are good examples of quality control metrics onsen Operators might want to keep an eye out!

Even monkeys are fans of onsen! Reading [**more here!**](https://www.nytimes.com/2018/04/03/science/japan-monkeys-hot-springs-stress.html)

Figure 4.1: Monkeys are onsen fans too! Reading more here!

our data

You have been assigned to evaluate quality control at the local level onsen In sunny Kagoshima Prefecture! Every month, for 15 months, you systematically took 20 random samples of hot spring water and recorded it temperature, pHAnd sulfur level. How can you determine that this onsen There is a danger of moving out of one sector of the market (e.g. extra hot!) into another (just normal hot springs?).

Let’s read into our data workshops/onsen.csv,

# Add functions directory to path if not already there
import sys
if 'functions' not in sys.path:
    sys.path.append('functions')

from functions_distributions import density, tidy_density, approxfun

water = pd.read_csv('workshops/onsen.csv')
water.head(3)

##    id  time  temp   ph  sulfur
## 0   1     1  43.2  5.1     0.0
## 1   2     1  45.3  4.8     0.4
## 2   3     1  45.5  6.2     0.9

Process Descriptive Statistics

First, let’s understand our process by calculating some basic descriptive statistics. We will create a simple function to calculate the mean and standard deviation, which are fundamental to evaluating process variation.

from pandas import Series
def describe(x: Series):
  x = Series(x)
  out = pd.DataFrame({
    'mean': [x.mean()],
    'sd': [x.std()],
  })
  out['caption'] = ("Process Mean: " + out['mean'].round(2).astype(str) +
                    " | SD: " + out['sd'].round(2).astype(str))
  return out

tab = describe(water['temp'])
tab

##     mean        sd                         caption
## 0  44.85  1.989501  Process Mean: 44.85 | SD: 1.99

Let’s now apply this to our temperature data to see the overall process mean and variation.

process overview view

The process overview chart is one of the most important tools in SPC. It shows us how our process behaves over time, helping us identify patterns, trends and potential issues. We will create a visualization that will show individual measurements, subgroup means, and overall process averages.

g1 = (ggplot(water, aes(x='time', y='temp', group='time')) +
  geom_hline(aes(yintercept=water['temp'].mean()), color='lightgrey', size=3) +
  geom_jitter(height=0, width=0.25) +
  geom_boxplot() +
  labs(x='Time (Subgroup)', y='Temperature (Celsius)', subtitle='Process Overview', caption=tab['caption'][0]))

# Save the plot
g1.save('images/05_process_overview.png', width=8, height=6, dpi=100)

05 process overview

g2 = (ggplot(water, aes(x='temp')) + geom_histogram(bins=15, color='white', fill='grey') + theme_void() + coord_flip())

# Save the plot
g2.save('images/05_process_histogram.png', width=8, height=6, dpi=100)

05 process histogram

The histogram shows us the distribution of all temperature measurements, giving us information about overall process variation. This helps us understand if our process is focused and how much variation we are seeing.

Subgroup (within group) statistics

At SPC, we often work with subgroups – Small samples taken at regular intervals. This allows us to distinguish between common cause variation (inherent in the process) and special cause variation (due to specific events). Let’s calculate the statistics for each subgroup to see how the process behaves over time.

stat_s = (water.groupby('time').apply(lambda d: pd.Series({
  'xbar': d['temp'].mean(),
  'r': d['temp'].max() - d['temp'].min(),
  'sd': d['temp'].std(),
  'nw': len(d)
})).reset_index())
stat_s['df'] = stat_s['nw'] - 1
stat_s['sigma_s'] = ( (stat_s['df'] * (stat_s['sd']**2)).sum() / stat_s['df'].sum() )**0.5
stat_s['se'] = stat_s['sigma_s'] / (stat_s['nw']**0.5)
stat_s['upper'] = stat_s['xbar'].mean() + 3*stat_s['se']
stat_s['lower'] = stat_s['xbar'].mean() - 3*stat_s['se']
stat_s.head(3)

##    time    xbar    r        sd    nw    df   sigma_s        se      upper      lower
## 0     1  44.635  4.2  1.342533  20.0  19.0  1.986174  0.444122  46.182366  43.517634
## 1     3  45.305  7.9  2.001440  20.0  19.0  1.986174  0.444122  46.182366  43.517634
## 2     5  44.765  5.9  1.628133  20.0  19.0  1.986174  0.444122  46.182366  43.517634

Here we calculated the key statistics for each subgroup:

xbar:mean of each subgroup
R: Range (maximum – minimum) within each subgroup.
sd: standard deviation within each subgroup
sigma_s: clustered standard deviation within the subgroup.
From: Standard error for each subgroup mean

Overall statistics (between groups)

Let’s now calculate overall process statistics that summarize behavior across all subgroups:

stat_t = pd.DataFrame({
  'xbbar': [stat_s['xbar'].mean()],
  'rbar': [stat_s['r'].mean()],
  'sdbar': [stat_s['sd'].mean()],
  'sigma_s': [(stat_s['sd']**2).mean()**0.5],
  'sigma_t': [water['temp'].std()]
})
stat_t

##    xbbar    rbar    sdbar   sigma_s   sigma_t
## 0  44.85  7.2625  1.93619  1.986174  1.989501

These data give us:

xbbar: Grand mean (average of all subgroup means)
rbar:average range in subgroups
sdbar: average standard deviation across subgroups
sigma_s: clustered standard deviation within the subgroup.
sigma_t: total process standard deviation

Average and Standard Deviation Chart

Control charts are the heart of SPC. They help us monitor process stability over time and detect when a process goes out of control. We will create charts for both the subgroup mean (X-bar chart) and standard deviation (S chart).

labels = pd.DataFrame({
  'time': [stat_s['time'].max()]*3,
  'type': ['xbbar','upper','lower'],
  'name': ['mean','+3 s','-3 s'],
  'value': [stat_s['xbar'].mean(), stat_s['upper'].iloc[0], stat_s['lower'].iloc[0]]
})

control_chart = (ggplot(stat_s, aes(x='time', y='xbar')) +
  geom_hline(aes(yintercept=stat_s['xbar'].mean()), color='lightgrey', size=3) +
  geom_ribbon(aes(ymin='lower', ymax='upper'), fill='steelblue', alpha=0.2) +
  geom_line(size=1) + geom_point(size=5) +
  geom_label(data=labels, mapping=aes(x='time', y='value', label='name'), ha='right') +
  labs(x='Time (Subgroups)', y='Average', subtitle='Average and Standard Deviation Chart'))

# Save the plot
control_chart.save('images/05_control_chart.png', width=8, height=6, dpi=100)

05 control chart

This control chart shows:

mid line: grand mean (xbbar)
control limits: upper and lower 3-sigma limits based on standard error
personal points:mean of each subgroup plotted over time
shaded area: control border area

Points outside the control limits or showing non-random patterns indicate that the process may be out of control and requires investigation.

learning check 1

Question

Prepare similar process overview chart for pH,

[View Answer!]

def ggprocess(x, y, xlab='Subgroup', ylab='Metric'):
  import pandas as pd
  from plotnine import ggplot, aes, geom_hline, geom_jitter, geom_boxplot, labs
  d = pd.DataFrame({'x': x, 'y': y})
  g = (ggplot(d, aes(x='x', y='y', group='x')) +
       geom_hline(aes(yintercept=d['y'].mean()), color='lightgrey', size=3) +
       geom_jitter(height=0, width=0.25) +
       geom_boxplot() +
       labs(x=xlab, y=ylab, subtitle='Process Overview'))
  return g

ph_chart = ggprocess(water['time'], water['ph'])

# Save the plot
ph_chart.save('images/05_ph_chart.png', width=8, height=6, dpi=100)

05 ph chart

Moving Range Chart (n=1)

When we have individual measurements rather than subgroups, we use moving range charts. The moving range is the absolute difference between consecutive measurements, which helps us estimate process variation when we cannot calculate sub-group statistics.

indiv = water.iloc[[0,20,40,60,80,100,120,140]]
mr = (indiv['temp'].diff().abs().dropna())
mrbar = mr.mean()
import numpy as np
d2 = np.mean(np.abs(np.diff(np.random.normal(0,1,10000))))
sigma_s = mrbar / d2
se = sigma_s / (1**0.5)
upper = mrbar + 3*se
lower = 0

istat = pd.DataFrame({'time': indiv['time'].iloc[1:], 'mr': mr, 'mrbar': mrbar, 'upper': upper, 'lower': lower})
mr_chart = (ggplot(istat, aes(x='time', y='mr')) +
  geom_ribbon(aes(ymin='lower', ymax='upper'), fill='steelblue', alpha=0.25) +
  geom_hline(aes(yintercept=mr.mean()), size=3, color='darkgrey') +
  geom_line(size=1) + geom_point(size=5) +
  labs(x='Time (Subgroup)', y='Moving Range', subtitle='Moving Range Chart'))

# Save the plot
mr_chart.save('images/05_moving_range_chart.png', width=8, height=6, dpi=100)

05 moving range chart

The moving range chart shows:

mid line: Average Speed Limit (MRbar)
upper control limit: Estimated process based on standard deviation
lower control limit: Set to 0 (moving range cannot be negative)
personal points:each moving range value

This chart helps us monitor process variation when we have individual measurements rather than subgroups.

conclusion

You have successfully created SPC visuals and statistics in Python: process overview, subgroup statistics, and moving range logic. These tools help us understand process behavior, identify when a process is in control or out of control, and make data-driven decisions about process improvements.

<a href

16 Statistical Process Control in Python

launch

packages

custom work

our case

our data

Process Descriptive Statistics

process overview view

Subgroup (within group) statistics

Overall statistics (between groups)

Average and Standard Deviation Chart

learning check 1

Moving Range Chart (n=1)

conclusion

Like this:

Related

Leave a Comment Cancel reply

launch

packages

custom work

our case

our data

Process Descriptive Statistics

process overview view

Subgroup (within group) statistics

Overall statistics (between groups)

Average and Standard Deviation Chart

learning check 1

Moving Range Chart (n=1)

conclusion

Share this:

Like this:

Related

Leave a Comment Cancel reply