Saba Shahrukh · January 14, 2026

Hyperparameter tuning is a crucial step in the model selection process. After choosing a machine learning model, its performance heavily depends on the hyperparameters, which are parameters set before the training process begins and control how the model learns. Finding the optimal set of hyperparameters is essential for achieving the best possible performance and generalization.

Here’s how hyperparameter tuning fits into model selection:

1. Optimizing Model Performance:

  • Fine-tuning the Learning Process: Hyperparameters directly influence the learning process of a model. For example, in a neural network, the learning rate determines the step size during weight updates. An inappropriate learning rate can lead to slow convergence or overshooting the optimal solution.
  • Controlling Model Complexity: Hyperparameters often control the complexity of the model, which directly impacts the bias-variance trade-off. For instance, the max_depth in a decision tree or the number of hidden layers in a neural network determine the model’s capacity to learn complex patterns. Tuning these helps in finding the right balance to avoid underfitting or overfitting.
  • Maximizing Evaluation Metrics: The goal of hyperparameter tuning is to find the hyperparameter configuration that maximizes the chosen evaluation metric (e.g., accuracy, F1-score, AUC) on a validation set. This ensures that the selected model performs well on unseen data.
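To make this concrete, here is a minimal sketch (not from the original article) of picking a hyperparameter by validation-set accuracy, assuming scikit-learn and an illustrative logistic regression with candidate `C` values:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative data; any binary classification set works the same way.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

best_C, best_score = None, -1.0
for C in [0.01, 0.1, 1, 10]:           # candidate regularization strengths
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_val, y_val)  # accuracy on held-out validation data
    if score > best_score:
        best_C, best_score = C, score

print("Selected C:", best_C, "validation accuracy:", best_score)
```

The point is simply that the winning configuration is the one that scores best on data the model did not train on; dedicated tools like GridSearchCV automate exactly this loop with cross-validation.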

2. Comparing Different Models:

  • Fair Comparison: To fairly compare different machine learning models, each model should be tuned to its optimal hyperparameter settings. Comparing models with their default or arbitrarily chosen hyperparameters might lead to incorrect conclusions about their relative performance.
  • Identifying the Best Potential: Hyperparameter tuning reveals the true potential of each model. A model that initially appears to perform worse than another might outperform it after proper tuning of its hyperparameters.

3. Avoiding Overfitting and Underfitting:

  • Regularization Strength: Many models have hyperparameters that control the strength of regularization techniques (e.g., alpha in Ridge/Lasso regression, dropout_rate in neural networks). Tuning these hyperparameters is critical for preventing overfitting by penalizing model complexity.
  • Early Stopping: For iterative training algorithms like gradient boosting or neural networks, early stopping is a hyperparameter that determines when to stop training based on the performance on a validation set. This helps in preventing overfitting by stopping the training before the model starts to learn noise in the training data.
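As an illustrative sketch of early stopping, assuming scikit-learn's GradientBoostingClassifier (the parameter names below are scikit-learn's, not taken from the article):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# n_iter_no_change + validation_fraction enable early stopping: training halts
# once the internal validation score stops improving for 10 consecutive rounds.
model = GradientBoostingClassifier(
    n_estimators=500,          # upper bound on boosting rounds
    n_iter_no_change=10,       # patience before stopping
    validation_fraction=0.1,   # held-out split used to monitor progress
    random_state=42,
)
model.fit(X, y)
print("Boosting rounds actually used:", model.n_estimators_)
```

The fitted attribute `n_estimators_` typically ends up well below the 500-round budget, which is the whole benefit: the model stops before it starts fitting noise.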

Common Hyperparameter Tuning Techniques:

  • Grid Search: Exhaustively searches through a predefined subset of the hyperparameter space. It evaluates the model’s performance for every possible combination of hyperparameter values. While thorough, it can be computationally expensive, especially with a large number of hyperparameters and a wide range of values.
  • Random Search: Randomly samples hyperparameter combinations from a defined search space and evaluates the model’s performance for each sampled combination. It’s often more efficient than grid search, especially when only some hyperparameters significantly affect performance.
  • Bayesian Optimization: Uses probabilistic models to guide the search for the optimal hyperparameters. It iteratively evaluates hyperparameter configurations and updates the probability model to suggest promising new configurations to explore. This method is often more efficient than grid or random search, especially for expensive-to-evaluate models.
  • Sequential Model-Based Optimization (SMBO): A framework that includes Bayesian Optimization and other techniques that sequentially build models to approximate the performance of different hyperparameter settings.
  • Population-Based Training (PBT): Trains a population of models in parallel, periodically exploiting the better performing models to improve the hyperparameters of the rest.
  • Hyperband: A more efficient approach to random search that adaptively allocates resources to promising configurations and aggressively eliminates poorly performing ones.
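For a hands-on flavor of the resource-allocation idea behind Hyperband, here is a hedged sketch using scikit-learn's successive-halving search — a related but distinct technique, and still experimental in scikit-learn, hence the `enable_` import; the ranges below are illustrative:

```python
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from scipy.stats import randint

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
param_distributions = {"max_depth": randint(2, 12),
                       "min_samples_split": randint(2, 11)}

# Successive halving: start many candidates on a small budget (few trees),
# then promote only the best performers to progressively larger budgets.
search = HalvingRandomSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    resource="n_estimators",  # the "budget" grown between rounds
    min_resources=10,
    max_resources=90,
    factor=3,                 # keep the top 1/3 of candidates each round
    cv=3,
    random_state=0,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
```

Poor configurations are discarded after seeing only 10 trees, so most of the compute goes to the few candidates that earn a 90-tree evaluation.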

In summary, hyperparameter tuning is an integral part of the model selection process. It ensures that each candidate model is evaluated at its peak performance, allowing for a more accurate and meaningful comparison. By finding the optimal hyperparameters, we can build models that generalize well to unseen data and effectively solve the intended machine learning problem.

Examples of Hyperparameter Tuning in Different ML Models:

Let’s dive into some examples of how hyperparameter tuning is done for different machine learning models, touching on common hyperparameters and how you might go about tuning them.

1. Support Vector Machines (SVM) with a Radial Basis Function (RBF) Kernel:

  • Key Hyperparameters:
    • C (Regularization parameter): Controls the trade-off between achieving a low training error and a low testing error. Smaller values of C emphasize a smooth decision surface (higher bias, lower variance), while larger values aim to classify all training examples correctly (lower bias, higher variance).
    • gamma (Kernel coefficient): Defines the influence of a single training example. Low values mean a larger radius of influence, so points far away can still have an effect. High values mean a smaller radius of influence, so the model focuses more on points close to the decision boundary.
  • Hyperparameter Tuning Process: You might use Grid Search or Random Search with cross-validation.
    • Grid Search Example (Conceptual):

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, random_state=42)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 'scale', 'auto']}
cv = StratifiedKFold(n_splits=5)
grid_search = GridSearchCV(SVC(), param_grid, cv=cv, scoring='accuracy')
grid_search.fit(X, y)
print("Best parameters:", grid_search.best_params_)
print("Best accuracy:", grid_search.best_score_)
```

    In this example, GridSearchCV trains and evaluates an SVM for every combination of C and gamma specified in param_grid using 5-fold stratified cross-validation, and reports the best combination by accuracy.
    • Random Search Example (Conceptual): You would define a range or distribution for C and gamma and randomly sample combinations to evaluate. This can be more efficient than grid search when the hyperparameter space is large and not all hyperparameters are equally important.
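A concrete version of this random-search idea, assuming scikit-learn and scipy (the log-uniform ranges below are illustrative choices, not from the article):

```python
from scipy.stats import loguniform
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, random_state=42)

# Log-uniform distributions suit C and gamma, whose useful values span
# several orders of magnitude.
param_distributions = {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e1)}
search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=20,
    cv=StratifiedKFold(n_splits=5), scoring="accuracy", random_state=42,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best accuracy:", search.best_score_)
```

Here only 20 sampled combinations are evaluated, versus the full Cartesian product a grid search would require over comparably fine ranges.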

2. Decision Trees:

  • Key Hyperparameters:
    • max_depth: The maximum depth of the tree. Limits the complexity of the tree (lower max_depth leads to higher bias, lower variance).
    • min_samples_split: The minimum number of samples required to split an internal node. Higher values prevent the tree from creating very specific splits based on small subsets of data (reduces variance).
    • min_samples_leaf: The minimum number of samples required to be at a leaf node. Similar to min_samples_split, it helps prevent overfitting.
    • max_features: The number of features to consider when looking for the best split. Reducing this can introduce more randomness and potentially reduce overfitting.
  • Hyperparameter Tuning Process: Again, Grid Search or Random Search with cross-validation are common.
    • Grid Search Example (Conceptual):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.datasets import load_iris

iris = load_iris()
param_grid = {'max_depth': [None, 5, 10, 15],
              'min_samples_split': [2, 5, 10],
              'min_samples_leaf': [1, 3, 5],
              'max_features': ['sqrt', 'log2', None]}
cv = KFold(n_splits=5)
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                           param_grid, cv=cv, scoring='accuracy')
grid_search.fit(iris.data, iris.target)
print("Best parameters:", grid_search.best_params_)
print("Best accuracy:", grid_search.best_score_)
```

    Here, we’re exploring different depths, minimum samples for splitting and at leaves, and the number of features to consider at each split.

3. Random Forest:

  • Key Hyperparameters:
    • n_estimators: The number of trees in the forest. Generally, more trees lead to better performance and stability, but with diminishing returns and increased computational cost.
    • max_depth: The maximum depth of individual trees (similar to decision trees).
    • min_samples_split: Minimum samples to split an internal node (similar to decision trees).
    • min_samples_leaf: Minimum samples at a leaf node (similar to decision trees).
    • max_features: The number of features to consider when looking for the best split in each tree.
    • bootstrap: Whether to use bootstrap samples when building trees.
  • Hyperparameter Tuning Process: Due to the larger number of hyperparameters, Random Search is often preferred over Grid Search for Random Forests to explore a wider range of values more efficiently. Bayesian Optimization can also be effective.
    • Random Search Example (Conceptual):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, KFold
from sklearn.datasets import load_iris
from scipy.stats import randint

iris = load_iris()
param_distributions = {'n_estimators': randint(50, 200),
                       'max_depth': [None, 10, 20, 30, 40, 50],
                       'min_samples_split': randint(2, 11),
                       'min_samples_leaf': randint(1, 6),
                       'max_features': ['sqrt', 'log2', None],
                       'bootstrap': [True, False]}
cv = KFold(n_splits=3)
random_search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                                   param_distributions, n_iter=10, cv=cv,
                                   scoring='accuracy', random_state=42)
random_search.fit(iris.data, iris.target)
print("Best parameters:", random_search.best_params_)
print("Best accuracy:", random_search.best_score_)
```

    Here, RandomizedSearchCV samples 10 different combinations of hyperparameters from the specified distributions.

4. Gradient Boosting Machines (e.g., XGBoost, LightGBM):

  • Key Hyperparameters:
    • n_estimators: The number of boosting rounds (number of trees to build).
    • learning_rate (eta in XGBoost): Step size shrinkage to prevent overfitting. Smaller values require more trees.
    • max_depth: Maximum depth of individual trees.
    • min_child_weight: Minimum sum of instance weight needed in a child. Controls overfitting.
    • subsample: Fraction of samples to be used for fitting the individual base learners.
    • colsample_bytree: Fraction of features to be considered when building each tree.
    • Regularization parameters (alpha for L1, lambda for L2).
  • Hyperparameter Tuning Process: Given the number of interacting hyperparameters, techniques like Random Search and Bayesian Optimization are often more effective than Grid Search. Libraries like Optuna or scikit-optimize are specifically designed for efficient hyperparameter optimization.
    • Example using Optuna (Conceptual):

```python
import optuna
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=100, n_features=10, random_state=42)
cv = StratifiedKFold(n_splits=3)

def objective(trial):
    xgb_params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 200),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        'subsample': trial.suggest_float('subsample', 0.7, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.7, 1.0),
        'random_state': 42
    }
    fold_accuracies = []
    for train_index, val_index in cv.split(X, y):
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]
        model = XGBClassifier(**xgb_params)
        model.fit(X_train, y_train)
        preds = model.predict(X_val)
        fold_accuracies.append(accuracy_score(y_val, preds))
    return sum(fold_accuracies) / len(fold_accuracies)

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
print("Best parameters:", study.best_params)
print("Best accuracy:", study.best_value)
```

    Optuna systematically explores the hyperparameter space based on the results of previous trials to find the optimal configuration.

5. Artificial Neural Networks (Deep Learning):

  • Key Hyperparameters:
    • Number of hidden layers.
    • Number of neurons per layer.
    • Activation functions (e.g., ReLU, sigmoid, tanh).
    • Learning rate for the optimizer (e.g., Adam, SGD).
    • Batch size during training.
    • Dropout rate for regularization.
    • Weight initialization methods.
    • Optimizer choice.
  • Hyperparameter Tuning Process: Tuning neural networks can be computationally expensive. Techniques like Random Search, Bayesian Optimization, and more advanced methods like Population-Based Training (PBT) and Neural Architecture Search (NAS) are used. Frameworks like Keras Tuner, Hyperopt, and Optuna provide tools for this.
    • Example using Keras Tuner (Conceptual):

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    for i in range(hp.Int('num_layers', 2, 5)):
        model.add(tf.keras.layers.Dense(units=hp.Int(f'units_{i}', 32, 128, step=32),
                                        activation='relu'))
    model.add(tf.keras.layers.Dense(10, activation='softmax'))
    optimizer = hp.Choice('optimizer', values=['adam', 'sgd', 'rmsprop'])
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

tuner = kt.RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=10,
    executions_per_trial=3,
    directory='my_dir',
    project_name='mnist_tuning')

(img_train, label_train), (img_test, label_test) = tf.keras.datasets.mnist.load_data()
img_train = img_train.astype('float32') / 255.0
img_test = img_test.astype('float32') / 255.0

tuner.search(img_train, label_train, epochs=5, validation_data=(img_test, label_test))
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print("Best hyperparameters found:", best_hps.values)
```

    Keras Tuner allows you to define a search space for the network architecture and training parameters, and it systematically explores this space to find the best configuration.

These examples illustrate how hyperparameter tuning is applied across different machine learning models. The specific hyperparameters and the tuning strategies employed depend on the model’s complexity, the size of the dataset, and the available computational resources. Remember that proper cross-validation is crucial during hyperparameter tuning to get a reliable estimate of the model’s generalization performance.
