CNN based Skin Lesion Classification for Early Melanoma Detection

Saba Shahrukh December 21, 2025 0

another impressive project similar to the previous ones, focusing on a different and impactful medical science use case. This time, let’s delve into Dermatology for the Real-Time Classification of Skin Lesions for Early Melanoma Detection.

AI-Powered Real-Time Skin Lesion Classification for Early Melanoma Detection

Revolutionizing Dermatology with AI-Powered Skin Cancer Screening

Abstract:

Melanoma is the deadliest form of skin cancer, but it is highly curable if detected early. Traditional dermatological examination relies on visual inspection and dermoscopy, followed by biopsy for definitive diagnosis. This process can be subjective, time-consuming, and may lead to delayed diagnosis, especially in primary care settings or areas with limited access to dermatologists. This article introduces a real-time AI-powered system designed for automated classification of skin lesions from dermoscopic images, with a primary focus on differentiating benign moles from malignant melanoma. Building upon the principles of efficient image classification using Convolutional Neural Networks (CNNs), this project aims to significantly enhance diagnostic accuracy, reduce unnecessary biopsies, and facilitate earlier intervention, thereby transforming skin cancer screening and improving patient outcomes globally.

1. Introduction

Skin cancer, particularly melanoma, represents a growing public health concern. While common, melanoma’s aggressiveness necessitates early detection for successful treatment and survival. Manual examination by dermatologists, often aided by dermoscopy (a non-invasive imaging technique that visualizes subsurface skin structures), is the current standard. However, the subtle visual differences between benign and malignant lesions can be challenging, even for experienced clinicians, leading to missed diagnoses or an abundance of benign lesion biopsies.

The integration of Artificial Intelligence, especially deep learning techniques like CNNs, offers a powerful avenue to augment dermatological expertise. CNNs are exceptionally adept at identifying intricate patterns in images, making them ideal for analyzing dermoscopic photographs. This project proposes a real-time AI system that automates the initial screening of skin lesions, classifying them into categories such as benign nevus (mole) and melanoma. By providing immediate, objective assessments, this system can empower general practitioners, aid tele-dermatology consultations, and allow dermatologists to prioritize their focus on high-risk or ambiguous cases, ultimately streamlining diagnosis and patient care pathways.

2. Project Objective

The primary objective of this project is to develop and deploy an efficient, accurate, and real-time AI system for the automated classification of skin lesions, specifically for distinguishing melanoma from benign nevi, using dermoscopic images.

The key goals are:

Accelerate Screening: Reduce the time and resources required for preliminary skin lesion assessment.
Improve Diagnostic Accuracy: Enhance the objectivity and consistency of melanoma detection, minimizing false positives (unnecessary biopsies) and, critically, false negatives (missed melanomas).
Expand Accessibility: Enable improved skin cancer screening in primary care and remote areas where specialized dermatological expertise is scarce.
Support Clinicians: Serve as a valuable “second opinion” tool for clinicians, augmenting their diagnostic capabilities.
Real-time Inference: Design the system to provide near-instantaneous classification results for new dermoscopic images.

3. Use Case in Medical Science: Skin Lesion Classification (Melanoma vs. Benign Nevus)

This project focuses on a critical binary classification task in dermatology:

Benign Nevus: A common, harmless mole.
Melanoma: A malignant skin cancer originating in melanocytes.

This particular use case is highly relevant and impactful due to several factors:

High Mortality if Late, High Curability if Early: Melanoma has a high mortality rate if diagnosed at advanced stages, but over 98% survival if detected early when it’s still confined to the epidermis. This makes early detection paramount.
Visual Diagnosis: Skin lesions are visually accessible, making them amenable to image-based AI diagnostics. Dermoscopy further enhances the visual features, improving AI’s potential.
Subjectivity Challenge: Even experienced dermatologists can face challenges distinguishing subtle cases. An objective AI tool can provide invaluable support.
Population-level Screening: The high prevalence of moles in the general population makes routine screening a resource-intensive task. AI can help prioritize which lesions need closer human examination.

In a real-time application, this AI system could be integrated into a handheld dermoscope or a mobile application. A general practitioner or even a trained health worker could capture a dermoscopic image of a suspicious lesion. The AI would then instantly classify it as “Likely Benign” or “Refer for Dermatologist Evaluation (Possible Melanoma),” along with a confidence score. This immediate feedback would guide the next steps in patient management, drastically shortening the time to diagnosis for high-risk lesions.

4. Data Understanding

For training a robust AI model for skin lesion classification, comprehensive datasets of dermoscopic images with expert-verified diagnoses are indispensable. The most widely used and influential dataset for this purpose is:

ISIC Archive (International Skin Imaging Collaboration)

Description: The ISIC Archive is the largest publicly available collection of dermoscopic images of skin lesions, encompassing various types including melanoma, nevus, basal cell carcinoma, squamous cell carcinoma, and benign keratosis.
- Diagnosis: Each image is accompanied by an expert-verified diagnosis (e.g., melanoma, nevus, seborrheic keratosis).
- Metadata: Often includes patient demographics, lesion location, and sometimes clinical images.
- Challenges: The ISIC challenges (e.g., ISIC 2017, ISIC 2018, ISIC 2019, ISIC 2020) have provided structured datasets with varying numbers of classes and levels of data quality.
Image Format: JPEG or PNG.
Challenges:
- Class Imbalance: Melanoma images are significantly fewer than benign nevus images, requiring careful handling during training (e.g., oversampling, weighted loss).
- Image Variability: Images come from different dermoscopes, with variations in lighting, resolution, and artifacts (e.g., hair, gel bubbles, ruler markings).
- Subtle Features: The visual distinction between benign and early malignant lesions can be very subtle, requiring the model to learn fine-grained features.
- Lesion Delineation: Often, the lesion itself needs to be segmented from the surrounding skin, though for classification, we typically use cropped images or let the CNN learn to focus.

For this project, we will specifically target the ISIC 2019 dataset, which focuses on classifying eight different lesion types, but we will simplify it to a binary classification for demonstrating real-time melanoma detection:

Benign Nevus (NV)
Melanoma (MEL)

We will preprocess the images to remove common artifacts and resize them to a consistent dimension.

5. Technical Implementation: Code Structure and Explanation

The Python code will follow a standard machine learning pipeline: data acquisition (simulated), preprocessing, visualization, model building, training, and robust evaluation. TensorFlow/Keras will be used for deep learning, scikit-learn for utilities, and Pillow, NumPy, Matplotlib, Seaborn for image and data handling.

5.1. Import Necessary Libraries

Python

# Import essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import zipfile
from PIL import Image # For image loading and manipulation
from sklearn.model_selection import train_test_split # For splitting data
from sklearn.preprocessing import LabelEncoder # For encoding categorical labels
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, roc_curve, auc # For model evaluation
from sklearn.utils import class_weight # For handling class imbalance

import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model # For building and loading models
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization # Core CNN layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator # For data augmentation
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint # Callbacks for training
from tensorflow.keras.optimizers import Adam # Optimizer
from tensorflow.keras.utils import to_categorical # For one-hot encoding labels

# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow Version: {tf.__version__}")
print(f"Keras Version: {tf.keras.__version__}")

5.2. Data Loading and Preprocessing

This section defines how to load the ISIC dataset images and their corresponding labels. Given the variability and potential for artifacts in dermoscopic images, preprocessing is crucial.

Python

# --- 1. Load the data ---

# Define the base directory for the ISIC 2019 dataset
# IMPORTANT: Adjust this path to where your ISIC 2019 dataset images and metadata are located.
# The ISIC 2019 dataset typically has 'ISIC_2019_Training_Input' folder for images
# and 'ISIC_2019_Training_Metadata.csv' or 'ISIC_2019_Training_GroundTruth.csv'
# For this project, we'll simplify by focusing on 'ISIC_2019_Training_Input' and 'ISIC_2019_Training_GroundTruth.csv'.

data_root_dir = 'ISIC_2019_data' # Example: Assuming 'ISIC_2019_data' is the extracted root folder
image_dir = os.path.join(data_root_dir, 'ISIC_2019_Training_Input')
ground_truth_csv = os.path.join(data_root_dir, 'ISIC_2019_Training_GroundTruth.csv')

# Check if data directories/files exist
if not os.path.exists(image_dir) or not os.path.exists(ground_truth_csv):
    print(f"Error: ISIC 2019 data not found. Please ensure '{data_root_dir}' contains "
          f"'ISIC_2019_Training_Input' and 'ISIC_2019_Training_GroundTruth.csv'.")
    print("Using dummy data for demonstration as actual dataset not found.")

    # Create dummy data for demonstration if actual data is not present
    dummy_df = pd.DataFrame({
        'image': [f'ISIC_{i:07d}' for i in range(200)],
        'MEL': np.random.randint(0, 2, 200), # 0 or 1
        'NV': np.random.randint(0, 2, 200),
        'BCC': np.zeros(200), 'AK': np.zeros(200), 'BKL': np.zeros(200),
        'DF': np.zeros(200), 'VASC': np.zeros(200), 'SCC': np.zeros(200)
    })
    # Ensure one-hot encoding for dummy data
    dummy_df['diagnosis'] = ''
    for i in range(len(dummy_df)):
        if dummy_df.loc[i, 'MEL'] == 1:
            dummy_df.loc[i, 'diagnosis'] = 'MEL'
        elif dummy_df.loc[i, 'NV'] == 1:
            dummy_df.loc[i, 'diagnosis'] = 'NV'
        else: # Fallback for dummy if neither MEL nor NV is 1
            dummy_df.loc[i, 'diagnosis'] = 'NV' if np.random.rand() > 0.5 else 'MEL' # Ensure variety
    
    # Filter for MEL and NV for binary task
    df_labels = dummy_df[dummy_df['diagnosis'].isin(['MEL', 'NV'])].copy()
    df_labels['target'] = df_labels['diagnosis'].map({'NV': 0, 'MEL': 1})
    
    print(f"Using {len(df_labels)} dummy entries for MEL/NV classification.")

    # Create dummy images (actual image loading will fail without real files)
    all_images = np.random.randint(0, 255, (len(df_labels), *TARGET_IMG_SIZE, 3), dtype=np.uint8)
    all_binary_labels = df_labels['target'].values

else:
    df_ground_truth = pd.read_csv(ground_truth_csv)

    # Filter for 'MEL' (Melanoma) and 'NV' (Nevus) for binary classification
    # Convert one-hot encoded diagnosis columns to a single 'diagnosis' column for easier mapping
    # Assuming the ground truth CSV has columns like 'MEL', 'NV', 'BCC', etc. which are one-hot encoded
    
    # Identify the actual diagnosis by finding the column with value 1
    # Create a mapping for diagnosis
    cols_to_check = ['MEL', 'NV', 'BCC', 'AK', 'BKL', 'DF', 'VASC', 'SCC']
    df_ground_truth['diagnosis'] = df_ground_truth[cols_to_check].idxmax(axis=1)

    df_labels = df_ground_truth[df_ground_truth['diagnosis'].isin(['MEL', 'NV'])].copy()
    df_labels['target'] = df_labels['diagnosis'].map({'NV': 0, 'MEL': 1})

    print(f"Loaded {len(df_labels)} entries filtered for MEL/NV from {ground_truth_csv}")

    # Define target image size for CNN input. Dermoscopic images can also be large.
    TARGET_IMG_SIZE = (224, 224) # Common size for many pre-trained models (e.g., ResNet, VGG)

    # Function to load and preprocess images (adapted for ISIC naming convention)
    def load_and_preprocess_isic_image(image_id, base_image_dir, target_size=(224, 224)):
        """
        Loads an ISIC image, converts it to RGB, resizes it, and returns it as a NumPy array.
        Handles .jpg as primary extension.
        """
        image_path = os.path.join(base_image_dir, f"{image_id}.jpg")
        if not os.path.exists(image_path):
            image_path = os.path.join(base_image_dir, f"{image_id}.png") # Try .png if .jpg not found
        if not os.path.exists(image_path):
            print(f"Image not found at {image_path}")
            return None

        try:
            img = Image.open(image_path)
            img = img.convert('RGB')
            img = img.resize(target_size, Image.LANCZOS)
            return np.array(img)
        except Exception as e:
            print(f"Error loading image {image_path}: {e}")
            return None

    # Load images and labels
    images_list = []
    labels_list = []

    # For quicker demo, load a subset if the dataset is large
    # For a real project, remove [:N_SAMPLES_TO_LOAD] to load all relevant images
    N_SAMPLES_TO_LOAD = 2000 # Loading a subset for quicker execution

    # Shuffle DataFrame and select a subset to ensure diversity if loading only N samples
    df_labels_subset = df_labels.sample(min(N_SAMPLES_TO_LOAD, len(df_labels)), random_state=42).reset_index(drop=True)

    for idx, row in df_labels_subset.iterrows():
        img_array = load_and_preprocess_isic_image(row['image'], image_dir, TARGET_IMG_SIZE)
        if img_array is not None:
            images_list.append(img_array)
            labels_list.append(row['target'])

    all_images = np.array(images_list)
    all_binary_labels = np.array(labels_list)

    print(f"Total images loaded: {len(all_images)}")
    print(f"Total labels loaded: {len(all_binary_labels)}")

# Normalize pixel values to [0, 1]
X = all_images.astype('float32') / 255.0

# Define class names and number of classes for binary task
class_names = ['Benign Nevus (NV)', 'Melanoma (MEL)']
num_classes = len(class_names)

# One-hot encode binary labels
y = to_categorical(all_binary_labels, num_classes=num_classes)

print(f"Shape of preprocessed images (X): {X.shape}")
print(f"Shape of one-hot encoded labels (y): {y.shape}")
print(f"Binary Class names: {class_names}")

# Calculate class weights for imbalance
class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(all_binary_labels),
    y=all_binary_labels
)
class_weights_dict = dict(enumerate(class_weights))
print(f"\nCalculated Class Weights: {class_weights_dict}")

5.3. Data Visualization

Visualizing the class distribution (which will likely be imbalanced) and sample images helps highlight the challenges and visual characteristics of the dataset.

Python

# --- 2.2 Data Visualisation ---
# 2.2.1 Create a bar plot to display the binary class distribution
plt.figure(figsize=(6, 4))
sns.countplot(x=all_binary_labels, palette='coolwarm')
plt.title('Distribution of Skin Lesion Classes (Melanoma vs. Benign Nevus)')
plt.xlabel('Class (0: Benign Nevus, 1: Melanoma)')
plt.ylabel('Number of Images')
plt.xticks(ticks=[0, 1], labels=class_names)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

print("\nBinary class distribution details:")
print(pd.Series(all_binary_labels).value_counts().sort_index())

# 2.2.2 Visualise some sample images
def plot_sample_skin_lesion_images(images, labels, class_names, num_samples=8):
    """
    Plots sample dermoscopic images with their corresponding binary labels.
    Ensures a mix of classes.
    """
    plt.figure(figsize=(18, 9))
    unique_labels_indices = [np.where(np.argmax(labels, axis=1) == i)[0] for i in range(len(class_names))]

    selected_indices = []
    # Try to pick at least a few from each class
    for indices_for_class in unique_labels_indices:
        if len(indices_for_class) > 0:
            selected_indices.extend(np.random.choice(indices_for_class, min(3, len(indices_for_class)), replace=False))

    # Fill up to num_samples with random images if needed
    while len(selected_indices) < num_samples:
        rand_idx = np.random.randint(0, len(images))
        if rand_idx not in selected_indices:
            selected_indices.append(rand_idx)

    for i, idx in enumerate(selected_indices[:num_samples]):
        ax = plt.subplot(2, num_samples // 2, i + 1)
        plt.imshow(images[idx])
        plt.title(f"{class_names[np.argmax(labels[idx])]}")
        plt.axis("off")
    plt.tight_layout()
    plt.show()

print("\nSample Dermoscopic Images from Dataset:")
plot_sample_skin_lesion_images(X, y, class_names, num_samples=10)

5.4. Data Splitting

Stratified splitting is exceptionally important here due to the significant class imbalance between benign and melanoma cases.

Python

# --- 2.4 Data Splitting ---
# 2.4.1 Split the dataset into training and validation sets
# Using 80% for training and 20% for validation
# Stratify by 'all_binary_labels' to maintain class distribution in both sets
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=all_binary_labels
)

print(f"Shape of X_train: {X_train.shape}")
print(f"Shape of X_val: {X_val.shape}")
print(f"Shape of y_train: {y_train.shape}")
print(f"Shape of y_val: {y_val.shape}")

# Verify the class distribution in training and validation sets
print("\nTraining set binary class distribution (encoded):")
print(pd.Series(np.argmax(y_train, axis=1)).value_counts().sort_index())

print("\nValidation set binary class distribution (encoded):")
print(pd.Series(np.argmax(y_val, axis=1)).value_counts().sort_index())

5.5. Model Building and Training (Baseline Model)

A deep CNN model with appropriate regularization will be built. Given the complexity of skin lesion features, a deeper architecture might be beneficial.

Python

# --- 3. Model Building and Evaluation ---
# 3.1 Model building and training
# 3.1.1 Build and compile the model (Baseline Model without augmentation)

def build_skin_lesion_cnn_model(input_shape, num_classes):
    """
    Builds a Sequential CNN model optimized for skin lesion classification.
    Incorporates Conv2D, MaxPooling2D, BatchNormalization, and Dropout layers.

    Args:
        input_shape (tuple): Shape of the input images (height, width, channels).
        num_classes (int): Number of output classes (e.g., 2 for binary).

    Returns:
        tf.keras.Model: Compiled Keras Sequential model.
    """
    model = Sequential([
        # Input Block - Larger kernels can be good for initial texture
        Conv2D(32, (7, 7), activation='relu', input_shape=input_shape, padding='same'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.2),

        # Second Block
        Conv2D(64, (5, 5), activation='relu', padding='same'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.25),

        # Third Block
        Conv2D(128, (3, 3), activation='relu', padding='same'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.3),

        # Fourth Block
        Conv2D(256, (3, 3), activation='relu', padding='same'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.35),

        # Flatten the output for the fully connected layers
        Flatten(),

        # Fully Connected Layers
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(0.5), # High dropout for FC layers

        # Output layer
        Dense(num_classes, activation='softmax')
    ])

    # Compile the model
    optimizer = Adam(learning_rate=0.0003) # Fine-tuned learning rate
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy', tf.keras.metrics.Precision(name='precision'),
                            tf.keras.metrics.Recall(name='recall'),
                            tf.keras.metrics.AUC(name='auc')]) # Add Precision, Recall, AUC
    return model

input_shape = (TARGET_IMG_SIZE[0], TARGET_IMG_SIZE[1], 3)
baseline_model = build_skin_lesion_cnn_model(input_shape, num_classes)
print("Baseline Skin Lesion Model Summary:")
baseline_model.summary()

# 3.1.2 Train the model (Baseline Model)
# Define callbacks
early_stopping = EarlyStopping(monitor='val_auc', patience=25, restore_best_weights=True, mode='max', verbose=1) # Monitor AUC
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=12, min_lr=0.0000001, verbose=1)
model_checkpoint = ModelCheckpoint('best_baseline_skin_lesion_model.keras', monitor='val_auc', save_best_only=True, mode='max', verbose=1)

print("\n--- Training Baseline Skin Lesion Model ---")
history_baseline = baseline_model.fit(
    X_train, y_train,
    epochs=150,
    batch_size=32,
    validation_data=(X_val, y_val),
    class_weight=class_weights_dict, # Use class weights for imbalance
    callbacks=[early_stopping, reduce_lr, model_checkpoint],
    verbose=1
)

# Plot training history for baseline model
def plot_training_history(history, title_suffix=""):
    plt.figure(figsize=(18, 6))

    # Plot accuracy
    plt.subplot(1, 3, 1)
    plt.plot(history.history['accuracy'], label='Train Accuracy')
    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
    plt.title(f'Model Accuracy {title_suffix}')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)

    # Plot loss
    plt.subplot(1, 3, 2)
    plt.plot(history.history['loss'], label='Train Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.title(f'Model Loss {title_suffix}')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)
    
    # Plot AUC
    plt.subplot(1, 3, 3)
    plt.plot(history.history['auc'], label='Train AUC')
    plt.plot(history.history['val_auc'], label='Validation AUC')
    plt.title(f'Model AUC {title_suffix}')
    plt.xlabel('Epoch')
    plt.ylabel('AUC')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

plot_training_history(history_baseline, "(Baseline Skin Lesion Model)")

5.6. Model Testing and Evaluation (Baseline Model)

For melanoma detection, high recall (sensitivity) for the melanoma class is critical to avoid missing true positives, even if it comes with a slight increase in false positives. AUC is also a key metric.

Python

# --- 3.2 Model Testing and Evaluation (Baseline Model) ---
# 3.2.1 Evaluate the model on validation dataset. Derive appropriate metrics.

print("\n--- Evaluating Baseline Skin Lesion Model on Validation Set ---")
baseline_eval_results = baseline_model.evaluate(X_val, y_val, verbose=1)
# The order of results matches the metrics defined in model.compile
baseline_loss = baseline_eval_results[0]
baseline_accuracy = baseline_eval_results[1]
baseline_precision = baseline_eval_results[2]
baseline_recall = baseline_eval_results[3]
baseline_auc = baseline_eval_results[4]

print(f"\nBaseline Skin Lesion Model Validation Loss: {baseline_loss:.4f}")
print(f"Baseline Skin Lesion Model Validation Accuracy: {baseline_accuracy:.4f}")
print(f"Baseline Skin Lesion Model Validation Precision: {baseline_precision:.4f}")
print(f"Baseline Skin Lesion Model Validation Recall: {baseline_recall:.4f}")
print(f"Baseline Skin Lesion Model Validation AUC: {baseline_auc:.4f}")

# Get predictions
y_pred_probs_baseline = baseline_model.predict(X_val)
y_pred_baseline = np.argmax(y_pred_probs_baseline, axis=1) # Predicted classes (0 or 1)
y_true_val = np.argmax(y_val, axis=1) # True classes (0 or 1)

# Classification Report
print("\nClassification Report (Baseline Skin Lesion Model):")
print(classification_report(y_true_val, y_pred_baseline, target_names=class_names))

# Confusion Matrix
conf_matrix_baseline = confusion_matrix(y_true_val, y_pred_baseline)
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix_baseline, annot=True, fmt='d', cmap='Reds',
            xticklabels=class_names, yticklabels=class_names)
plt.title('Confusion Matrix (Baseline Skin Lesion Model)')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

# ROC Curve
y_pred_proba_melanoma = y_pred_probs_baseline[:, 1] # Probability of Melanoma
fpr, tpr, thresholds = roc_curve(y_true_val, y_pred_proba_melanoma)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve (Baseline Skin Lesion Model)')
plt.legend(loc="lower right")
plt.grid(True)
plt.show()

5.7. Data Augmentation and Augmented Model Training

Data augmentation for dermoscopic images should include transformations that mimic variations in image capture (e.g., rotation, zoom, flips) and subtle changes in lesion appearance.

Python

# --- 4. Data Augmentation ---
# 4.1 Create a Data Augmentation Pipeline
# 4.1.1 Define augmentation steps for the datasets.

# Create an ImageDataGenerator for data augmentation
train_datagen_skin = ImageDataGenerator(
    rotation_range=20,          # Random rotation
    zoom_range=0.15,            # Random zoom
    width_shift_range=0.2,      # Random horizontal shift
    height_shift_range=0.2,     # Random vertical shift
    shear_range=0.15,           # Shear intensity
    horizontal_flip=True,       # Randomly flip inputs horizontally
    vertical_flip=True,         # Vertical flip can be appropriate for skin lesions
    fill_mode='nearest',        # Strategy for filling in new pixels
    brightness_range=[0.8, 1.2] # Adjust brightness
)

# For validation data, we only ensure normalization consistency
val_datagen_skin = ImageDataGenerator()

# Create augmented training and validation data generators
augmented_train_generator_skin = train_datagen_skin.flow(X_train, y_train, batch_size=32, shuffle=True)
validation_generator_skin = val_datagen_skin.flow(X_val, y_val, batch_size=32, shuffle=False)

print("Data augmentation pipeline defined and generators created for Skin Lesion.")

# 4.1.2 Train the model on the new augmented dataset.
# Re-build the model to ensure fresh weights for fair comparison with augmentation.
augmented_skin_lesion_model = build_skin_lesion_cnn_model(input_shape, num_classes)
print("\nAugmented Skin Lesion Model Summary:")
augmented_skin_lesion_model.summary()

# Define callbacks for augmented training
early_stopping_aug_skin = EarlyStopping(monitor='val_auc', patience=30, restore_best_weights=True, mode='max', verbose=1)
reduce_lr_aug_skin = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=15, min_lr=0.00000001, verbose=1)
model_checkpoint_aug_skin = ModelCheckpoint('best_augmented_skin_lesion_model.keras', monitor='val_auc', save_best_only=True, mode='max', verbose=1)

print("\n--- Training Augmented Skin Lesion Model ---")
history_augmented_skin = augmented_skin_lesion_model.fit(
    augmented_train_generator_skin,
    steps_per_epoch=len(X_train) // 32,
    epochs=250, # Increased epochs
    validation_data=validation_generator_skin,
    validation_steps=len(X_val) // 32,
    class_weight=class_weights_dict, # Use class weights
    callbacks=[early_stopping_aug_skin, reduce_lr_aug_skin, model_checkpoint_aug_skin],
    verbose=1
)

plot_training_history(history_augmented_skin, "(Augmented Skin Lesion Model)")

# --- 3.2 Model Testing and Evaluation (Augmented Model) ---
print("\n--- Evaluating Augmented Skin Lesion Model on Validation Set ---")
augmented_eval_results_skin = augmented_skin_lesion_model.evaluate(X_val, y_val, verbose=1)
augmented_loss_skin = augmented_eval_results_skin[0]
augmented_accuracy_skin = augmented_eval_results_skin[1]
augmented_precision_skin = augmented_eval_results_skin[2]
augmented_recall_skin = augmented_eval_results_skin[3]
augmented_auc_skin = augmented_eval_results_skin[4]

print(f"\nAugmented Skin Lesion Model Validation Loss: {augmented_loss_skin:.4f}")
print(f"Augmented Skin Lesion Model Validation Accuracy: {augmented_accuracy_skin:.4f}")
print(f"Augmented Skin Lesion Model Validation Precision: {augmented_precision_skin:.4f}")
print(f"Augmented Skin Lesion Model Validation Recall: {augmented_recall_skin:.4f}")
print(f"Augmented Skin Lesion Model Validation AUC: {augmented_auc_skin:.4f}")

# Get predictions for augmented model
y_pred_probs_augmented_skin = augmented_skin_lesion_model.predict(X_val)
y_pred_augmented_skin = np.argmax(y_pred_probs_augmented_skin, axis=1)

# Classification Report (Augmented Model)
print("\nClassification Report (Augmented Skin Lesion Model):")
print(classification_report(y_true_val, y_pred_augmented_skin, target_names=class_names))

# Confusion Matrix (Augmented Model)
conf_matrix_augmented_skin = confusion_matrix(y_true_val, y_pred_augmented_skin)
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix_augmented_skin, annot=True, fmt='d', cmap='Reds',
            xticklabels=class_names, yticklabels=class_names)
plt.title('Confusion Matrix (Augmented Skin Lesion Model)')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

# ROC Curve and AUC (Augmented Model)
fpr_aug_skin, tpr_aug_skin, thresholds_aug_skin = roc_curve(y_true_val, y_pred_probs_augmented_skin[:, 1])
roc_auc_aug_skin = auc(fpr_aug_skin, tpr_aug_skin)

plt.figure(figsize=(6, 5))
plt.plot(fpr_aug_skin, tpr_aug_skin, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc_aug_skin:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve (Augmented Skin Lesion Model)')
plt.legend(loc="lower right")
plt.grid(True)
plt.show()

5.8. Real-time Inference Simulation

This section demonstrates how the trained model would perform a rapid prediction on a new, unseen dermoscopic image, simulating its use in a real-time clinical screening scenario.

Python

# --- Real-time Inference Simulation ---

# Load the best augmented model for inference
try:
    final_skin_lesion_model = load_model('best_augmented_skin_lesion_model.keras')
    print("Loaded best augmented Skin Lesion model for inference.")
except Exception as e:
    print(f"Could not load 'best_augmented_skin_lesion_model.keras'. Using the last trained augmented model. Error: {e}")
    final_skin_lesion_model = augmented_skin_lesion_model # Fallback to the last trained model if checkpoint fails

# Define a helper function to predict from a preprocessed array (common in real-time)
def predict_from_array(model, img_array, class_names):
    img_array_expanded = np.expand_dims(img_array, axis=0) # Add batch dimension
    predictions = model.predict(img_array_expanded)
    predicted_class_idx = np.argmax(predictions)
    predicted_class_name = class_names[predicted_class_idx]
    return predicted_class_name, predictions[0] # Return probabilities for the single image

# Select a random image from the validation set for demonstration
random_idx_skin = np.random.randint(0, len(X_val))
sample_image_array_skin = X_val[random_idx_skin]
true_label_idx_skin = np.argmax(y_val[random_idx_skin])
true_label_name_skin = class_names[true_label_idx_skin]

# Perform prediction
predicted_class_skin, probabilities_skin = predict_from_array(final_skin_lesion_model, sample_image_array_skin, class_names)

print(f"\n--- Prediction for a Sample Dermoscopic Image ---")
print(f"True Label: {true_label_name_skin}")
print(f"Predicted Label: {predicted_class_skin}")
print(f"Prediction Probabilities: {probabilities_skin}")

# Visualize the sample image and its prediction
plt.figure(figsize=(7, 7))
plt.imshow(sample_image_array_skin)
plt.title(f"True: {true_label_name_skin}\nPredicted: {predicted_class_skin} (Conf: {probabilities_skin[np.argmax(probabilities_skin)]:.2f})")
plt.axis('off')
plt.show()

print("\n--- Real-time System Workflow Simulation for Skin Cancer Screening ---")
print("1. Clinician captures a dermoscopic image of a suspicious lesion using a connected device.")
print("2. The image is instantly transferred to the local AI inference engine.")
print("3. The AI model preprocesses and classifies the image in milliseconds.")
print("4. The result (e.g., 'Likely Benign', 'Refer for Specialist' with confidence) is displayed to the clinician.")
print("5. The clinician uses this real-time insight to decide on immediate patient management (e.g., reassure, monitor, or refer for biopsy/dermatologist consultation).")

6. Real-time Project Architecture for Clinical Deployment

For a robust, real-time skin lesion classification system in a clinical environment, the architecture would typically involve:

Dermoscope/Smartphone Integration:
- Directly Connected Dermoscope: Professional dermoscopes with digital output directly feed images to the processing unit.
- Smartphone App with Dermoscope Attachment: For wider accessibility, a smartphone app coupled with a dermoscope attachment can capture images. The app would then send images to the local inference engine.
Edge AI Processing Unit:
- A powerful, dedicated computing device (e.g., a mini-PC with an NVIDIA GPU like Jetson, or a high-end tablet) located in the clinic.
- Advantages of Edge: Minimal latency (no internet dependency), enhanced data privacy (images stay local), and reliability in areas with poor internet connectivity.
- Software Stack: Includes image acquisition drivers, a lightweight inference server (e.g., TensorFlow Lite, ONNX Runtime), and the trained CNN model.
Image Preprocessing Module: Automated steps to:
- Crop the lesion from the background.
- Remove artifacts (e.g., hair, ruler marks) using image processing techniques or a dedicated sub-model.
- Resize and normalize the image to the model’s input dimensions.
Real-time Inference Engine: The core of the system. The pre-trained best_augmented_skin_lesion_model.keras would be loaded into memory for rapid prediction.
User Interface (UI):
- A clean, intuitive display for clinicians to view the dermoscopic image, the AI’s classification result (e.g., “Benign Nevus,” “Melanoma”), and a confidence score.
- Crucially, it should also offer Explainable AI (XAI) visualizations (e.g., heatmaps or saliency maps) to show which parts of the lesion contributed to the AI’s decision, enhancing trust and clinical utility.
Secure Database & Reporting: Local or cloud-based secure storage for images, AI predictions, and clinician’s confirmed diagnoses. Generates customizable reports for patient records and referral purposes.
Integration with EHR/EMR: Seamless connectivity with existing Electronic Health Record/Electronic Medical Record systems for patient data integration.

Challenges in Real-time Deployment:

Model Generalization: Ensuring the model performs well on images from diverse dermoscopes, skin types, and lighting conditions.
Artifact Handling: Robustly dealing with hair, air bubbles, ink marks, and other common artifacts in dermoscopic images without degrading performance.
Regulatory Approval: Obtaining necessary medical device certifications (e.g., FDA, CE Mark) requires extensive clinical trials and validation.
Ethical Considerations: Clearly defining the AI’s role as a diagnostic aid, not a replacement for human dermatologists, and managing patient expectations.
Data Privacy (HIPAA/GDPR): Strict adherence to regulations when handling sensitive patient health information.

7. Conclusions and Future Work

This project demonstrates a robust framework for leveraging CNNs in the real-time classification of skin lesions, specifically for distinguishing melanoma from benign nevi.

Outcomes and Insights Gained:
- AI for Early Detection: CNNs prove highly effective in identifying subtle, crucial features in dermoscopic images indicative of melanoma. This directly supports the primary goal of early detection.
- Preprocessing and Augmentation are Key: Careful image preprocessing (e.g., resizing, normalization) and aggressive data augmentation are essential to make the model robust to variations in real-world dermoscopic images and to mitigate overfitting, especially given the typically imbalanced nature of melanoma datasets.
- Addressing Imbalance: The use of class_weight during training is crucial for ensuring the model learns equally well from both majority (benign) and minority (melanoma) classes, preventing bias towards the more common class.
- Critical Evaluation Metrics: Beyond accuracy, emphasis on metrics like Recall (Sensitivity) for the melanoma class (to minimize false negatives, which are clinically catastrophic) and AUC (Area Under the Receiver Operating Characteristic Curve) is paramount. A high AUC indicates excellent discriminative power between classes. (Upon execution, we would report specific metrics: “The augmented skin lesion model achieved a validation AUC of X and a recall for Melanoma of Y%, demonstrating strong performance in identifying malignant cases while maintaining reasonable specificity.”)
- Real-time Potential: The model’s architecture is designed for rapid inference, making it suitable for integration into real-time clinical workflows, enabling immediate feedback to clinicians.
Future Enhancements:
- Transfer Learning with Pre-trained Models: Employ advanced architectures like EfficientNet, ResNeXt, or Vision Transformers pre-trained on large image datasets, then fine-tune them. This can significantly boost performance and reduce training time.
- Multi-class Classification: Expand the model to classify all eight or more lesion types provided in ISIC datasets, offering a more comprehensive diagnostic aid. This would require more sophisticated handling of multi-class imbalance.
- Lesion Segmentation and Analysis: Integrate an initial segmentation step (e.g., U-Net) to precisely delineate the lesion from the background, potentially improving classification accuracy by focusing the CNN on relevant pixels.
- Ensemble Modeling: Combine predictions from multiple CNN models or different model architectures to further improve robustness and accuracy.
- Meta-learning/Few-shot Learning: For very rare skin conditions, explore techniques that allow the model to learn effectively from limited examples.
- Longitudinal Monitoring: Develop features to track changes in a specific lesion over time, aiding in the detection of evolving moles.
- Hardware Optimization: Optimize the model for deployment on specific edge computing hardware (e.g., using quantization with TensorFlow Lite) for maximal efficiency and speed in portable devices.

This AI-powered skin lesion classification system holds tremendous promise for transforming dermatological screening, empowering clinicians with a powerful tool for rapid, accurate, and potentially life-saving early melanoma detection.

Category:

Medical Imaging