Industrial Use Case: Predictive Maintenance in Manufacturing (Sensor Anomaly Detection)
In heavy manufacturing, machines like CNC drills, wind turbines, and industrial pumps are equipped with IoT sensors monitoring telemetry such as vibration, temperature, rotational speed, and pressure.
When a component begins to fail, its sensor readings drift away from the established baseline of normal operation. Because equipment failures are rare events (typically < 2% of total runtime data), labeled failure examples are scarce, which makes this a classic unsupervised anomaly detection problem. Isolation Forest is well suited to it: the algorithm assumes anomalies are few and different, so random axis-aligned splits tend to isolate them in fewer steps than normal points.
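To make the isolation idea concrete, here is a minimal one-dimensional sketch. It is not scikit-learn's implementation: the helper names `isolation_depth` and `mean_isolation_depth` and the temperature values are assumptions chosen for illustration. It repeatedly splits the data at a random value and counts how many splits it takes before a given reading is alone in its partition.

```python
import random


def isolation_depth(values, target, rng, max_depth=50):
    """Count random splits until `target` is alone in its partition."""
    points = list(values)
    depth = 0
    while len(points) > 1 and depth < max_depth:
        lo, hi = min(points), max(points)
        if lo == hi:  # identical values cannot be separated further
            break
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that still contains the target.
        if target < split:
            points = [p for p in points if p < split]
        else:
            points = [p for p in points if p >= split]
        depth += 1
    return depth


def mean_isolation_depth(values, target, n_trees=200, seed=0):
    """Average the isolation depth over many independent random 'trees'."""
    rng = random.Random(seed)
    total = sum(isolation_depth(values, target, rng) for _ in range(n_trees))
    return total / n_trees


# Hypothetical pump temperatures: a dense cluster near 65 °C plus one overheating reading.
data_rng = random.Random(42)
temps = [data_rng.gauss(65.0, 3.0) for _ in range(255)]
typical = min(temps, key=lambda t: abs(t - 65.0))  # reading closest to the baseline
outlier = 92.0
temps.append(outlier)

depth_typical = mean_isolation_depth(temps, typical)
depth_outlier = mean_isolation_depth(temps, outlier)
print(f"Mean splits to isolate typical reading: {depth_typical:.2f}")
print(f"Mean splits to isolate overheating reading: {depth_outlier:.2f}")
```

The overheating reading sits far from the cluster, so a random split often separates it almost immediately, while the typical reading needs many more splits. Isolation Forest turns this difference in average path length into its anomaly score.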
Complete Python Implementation
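The script below generates a synthetic telemetry dataset for an industrial pump, injects anomalies that simulate mechanical failure, trains an Isolation Forest model on the unlabeled features, and compares its detections against the ground-truth labels. Training and evaluation use the same records, so the reported metrics describe in-sample detection.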
Python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
# =====================================================================
# 1. GENERATE SYNTHETIC INDUSTRIAL SENSOR DATA
# =====================================================================
def generate_industrial_sensor_data(num_records=10000, anomaly_ratio=0.015):
    """
    Generates telemetry data for an industrial pump.
    Features: Temperature (°C), Vibration (mm/s), Rotational Speed (RPM), Pressure (PSI)
    """
    np.random.seed(42)
    # Baseline normal operational behavior
    normal_records = int(num_records * (1 - anomaly_ratio))
    data_normal = {
        'temperature': np.random.normal(loc=65.0, scale=3.0, size=normal_records),  # Stable around 65C
        'vibration': np.random.normal(loc=1.8, scale=0.2, size=normal_records),  # Stable low vibration
        'rpm': np.random.normal(loc=1500.0, scale=50.0, size=normal_records),  # Nominal rotation speed
        'pressure': np.random.normal(loc=45.0, scale=2.5, size=normal_records)  # Normal system pressure
    }
    df_normal = pd.DataFrame(data_normal)
    df_normal['is_anomaly'] = 0  # Ground truth label for normal operation
    # Inject anomalous behavior (Mechanical failure patterns: high heat, intense vibration)
    anomaly_records = num_records - normal_records
    data_anomaly = {
        'temperature': np.random.normal(loc=92.0, scale=5.0, size=anomaly_records),  # Overheating
        'vibration': np.random.normal(loc=4.5, scale=0.8, size=anomaly_records),  # Severe structural wobble
        'rpm': np.random.normal(loc=1200.0, scale=150.0, size=anomaly_records),  # Dropping/erratic RPM due to friction
        'pressure': np.random.normal(loc=25.0, scale=8.0, size=anomaly_records)  # Pressure drop/leakage
    }
    df_anomaly = pd.DataFrame(data_anomaly)
    df_anomaly['is_anomaly'] = 1  # Ground truth label for mechanical anomaly
    # Combine and shuffle the data to simulate a live data stream
    df_industrial = pd.concat([df_normal, df_anomaly], ignore_index=True)
    df_industrial = df_industrial.sample(frac=1, random_state=42).reset_index(drop=True)
    return df_industrial
# Execute data generation
df_sensor = generate_industrial_sensor_data()
print("--- Dataset Summary ---")
print(f"Total Operational Records: {df_sensor.shape[0]}")
print(f"Normal Data Points: {df_sensor['is_anomaly'].value_counts()[0]}")
print(f"Anomalous Data Points (Failures): {df_sensor['is_anomaly'].value_counts()[1]}")
print("\nFirst 5 records of raw stream:")
print(df_sensor.head())
# =====================================================================
# 2. DATA PREPROCESSING
# =====================================================================
# Extract features for training (Unsupervised: We drop the ground truth labels)
X = df_sensor.drop(columns=['is_anomaly'])
y_true = df_sensor['is_anomaly']
# Scale features (optional: Isolation Forest's random splits are insensitive to
# per-feature linear rescaling; scaling is kept for consistency across sensor types)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# =====================================================================
# 3. TRAINING THE ISOLATION FOREST MODEL
# =====================================================================
# Define the expected contamination rate based on domain knowledge/historical metrics
contamination_rate = 0.015
iso_forest = IsolationForest(
    n_estimators=100,
    contamination=contamination_rate,
    random_state=42,
    n_jobs=-1
)
# Fit the unsupervised model on the features
iso_forest.fit(X_scaled)
# =====================================================================
# 4. ANOMALY DETECTION & INTERPRETATION
# =====================================================================
# Predict labels: -1 indicates an anomaly, 1 indicates normal
raw_predictions = iso_forest.predict(X_scaled)
# Map predictions to match our ground truth (0 = normal, 1 = anomaly)
y_pred = np.where(raw_predictions == -1, 1, 0)
# Extract continuous anomaly scores (higher means more anomalous).
# In sklearn, score_samples returns the negated anomaly score of the original
# Isolation Forest paper (lower = more abnormal), so negating it recovers a
# score in (0, 1] where values closer to 1 are highly anomalous.
anomaly_scores = iso_forest.score_samples(X_scaled)
df_sensor['anomaly_score'] = -anomaly_scores
df_sensor['predicted_anomaly'] = y_pred
# =====================================================================
# 5. EVALUATION METRICS
# =====================================================================
print("\n--- Model Performance Evaluation ---")
print("Confusion Matrix:")
print(confusion_matrix(y_true, y_pred))
print("\nClassification Report:")
print(classification_report(y_true, y_pred, target_names=['Normal Operation', 'Mechanical Failure']))
# Showcase high-risk items requiring maintenance dispatch
print("\n--- Sample of Flagged High-Risk Anomalies (Action Required) ---")
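# Rank flagged records by score so the most anomalous readings are shown first
high_risk_actions = df_sensor[df_sensor['predicted_anomaly'] == 1].nlargest(3, 'anomaly_score')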
print(high_risk_actions[['temperature', 'vibration', 'rpm', 'pressure', 'anomaly_score']])
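The script above scores the same records it was fitted on. In a plant, a model would more likely be fitted on a window of known-good operation and then applied to readings that arrive later. The sketch below is one hedged way to do that, not part of the original script: the helpers `sample_normal` and `sample_failure` are hypothetical and reuse the nominal sensor values from the generator above, and the batch sizes are arbitrary.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)


def sample_normal(n):
    """Hypothetical healthy readings: temperature, vibration, rpm, pressure."""
    return np.column_stack([
        rng.normal(65.0, 3.0, n),
        rng.normal(1.8, 0.2, n),
        rng.normal(1500.0, 50.0, n),
        rng.normal(45.0, 2.5, n),
    ])


def sample_failure(n):
    """Hypothetical failing readings: hot, vibrating, slow, low pressure."""
    return np.column_stack([
        rng.normal(92.0, 5.0, n),
        rng.normal(4.5, 0.8, n),
        rng.normal(1200.0, 150.0, n),
        rng.normal(25.0, 8.0, n),
    ])


# Fit on a baseline window that is assumed to contain only normal operation.
X_baseline = sample_normal(2000)
model = IsolationForest(n_estimators=100, contamination=0.015, random_state=0)
model.fit(X_baseline)

# Score a later batch that the model has never seen.
X_new = np.vstack([sample_normal(200), sample_failure(10)])
y_new = np.concatenate([np.zeros(200, dtype=int), np.ones(10, dtype=int)])

flagged = (model.predict(X_new) == -1).astype(int)   # -1 means anomaly in sklearn
new_scores = -model.score_samples(X_new)             # higher = more anomalous

recall = flagged[y_new == 1].mean()
false_alarm_rate = flagged[y_new == 0].mean()
print(f"Recall on unseen failures: {recall:.2f}")
print(f"False alarm rate on unseen normal readings: {false_alarm_rate:.3f}")
```

With a clean baseline, `contamination` no longer describes the share of failures in the training data. It only sets how much of the baseline falls beyond the decision threshold, so it acts as a tolerated false alarm rate.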
Key Takeaways from this Implementation
- No Labels Used for Training: The iso_forest.fit(X_scaled) step does not see the is_anomaly column. The model uncovers patterns entirely by carving up feature space randomly and isolating sparse vectors.
- Feature Distribution Dependency: The Isolation Forest flags anomalies effectively because the failure points sit significantly outside the dense cluster distributions of regular operation (e.g., temperature spikes to 92°C vs. a normal 65°C baseline).
- Actionable Out-of-Bounds Metrics: The output provides an anomaly_score. In an industrial plant setting, engineers can set alert thresholds on this score to trigger proactive maintenance cycles before catastrophic equipment damage occurs.
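As a rough illustration of the alert thresholds mentioned in the last takeaway, the sketch below derives a warning threshold from a high quantile of scores recorded during normal operation and pairs it with a fixed critical threshold. The function names, the score values, and both threshold choices are assumptions for illustration. It also assumes scores where higher means more anomalous.

```python
import math


def quantile_threshold(baseline_scores, quantile=0.9):
    """Nearest-rank quantile of scores observed during normal operation."""
    ordered = sorted(baseline_scores)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def alert_level(score, warn_threshold, critical_threshold):
    """Map an anomaly score (higher = more anomalous) to an alert tier."""
    if score >= critical_threshold:
        return "critical"
    if score >= warn_threshold:
        return "warning"
    return "ok"


# Hypothetical anomaly scores collected while the pump was known to be healthy.
baseline_scores = [
    0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49,
    0.50, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59,
]

warn = quantile_threshold(baseline_scores, quantile=0.9)
critical = 0.70  # assumed value; in practice tuned against maintenance history

for score in (0.45, 0.62, 0.75):
    print(f"score={score:.2f} -> {alert_level(score, warn, critical)}")
```

Two tiers let the warning level prompt an inspection while the critical level triggers an immediate maintenance dispatch. Where exactly to set them depends on how costly false alarms are relative to missed failures.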