# Uncomment this cell if running in Google Colab
# !pip install clinicadl==0.2.1

Train your own models

This section explains how to train a CNN on OASIS data that were processed in the previous sections.

Warning

If you do not have access to a GPU, training the CNNs below may take an unreasonably long time. However, you can execute this notebook on Colab to run it on a GPU.

If you already know the models implemented in clinicadl, you can directly jump to the last section to implement your own custom experiment!

import torch

# Check if a GPU is available
print('GPU is available:', torch.cuda.is_available())
Data used for training:

Because they are time-consuming, the preprocessing steps presented at the beginning of this tutorial were only run on a subset of OASIS-1, and two participants are obviously insufficient to train a network! To obtain more meaningful results, you should retrieve the whole OASIS-1 dataset and run the training based on the labels and splits performed in the previous section.

Of course, you can use another dataset, but then you will have to perform again the extraction of labels and data splits on this dataset.

Introduction

Input types

In this notebook the goal is to train a classifier to differentiate the AD label from the CN label based on the associated T1-MR images. These images were preprocessed according to the section on preprocessing.

According to the literature review done in (Wen et al., 2020), 4 main paradigms were implemented to train a CNN to perform a classification task between AD and CN labels based on T1-MRI. These paradigms depend on the input given to the network:

  • 2D slices,

  • 3D patches,

  • ROIs (3D patches focused on a region of interest),

  • 3D images.

Architectures

There is no consensus in the literature on the architecture of the models that should be used for each input category. Some of the studies reviewed in (Wen et al., 2020) used custom models, while others reused architectures that had led to good results on natural images (VGGNet, ResNet, GoogLeNet, DenseNet…).

We chose to find custom architectures by running a search over the main architecture components (more details can be found in section 4.3 of (Wen et al., 2020)). This architecture search was run only on the train + validation sets, and no test set was used to choose the best architectures. This is crucial, as using a test set during the architecture search can lead to biased, over-optimistic results.

Training networks with clinicadl

The training tasks can be performed using clinicadl:

clinicadl train <input_type> <network_type> <caps_directory> \
                <preprocessing> <tsv_path> <output_directory> <architecture>

where:

  • input_type is the type of input used. Must be chosen between image, patch, roi and slice.

  • network_type is the type of network trained. The possibilities depend on the input_type, but the complete list of options is autoencoder, cnn or multicnn.

  • caps_directory is the input folder containing the results of the t1-linear pipeline.

  • preprocessing corresponds to the preprocessing pipeline whose outputs will be used for training. The current version only supports t1-linear; t1-extensive will be implemented in a future version of clinicadl.

  • tsv_path is the folder containing the TSV file tree generated by clinicadl tsvtool {split|kfold}.

  • output_directory is the folder where the results are stored.

  • architecture is the name of the architecture used (for example, Conv5_FC3).
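For instance, a patch-level autoencoder training could be launched as follows (identical in form to the commands used later in this notebook, with <caps_directory> standing for your CAPS folder):

clinicadl train patch autoencoder <caps_directory> "t1-linear" data/labels_lists/train results/patch_autoencoder Conv4_FC3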

Computational resources:

Depending on your computational resources, you should adapt the parameters in the Computational resources argument group:

  • -cpu forces the system to use CPU instead of looking for a GPU.
  • --nproc N sets the number of workers (parallel processes) in the DataLoader to N.
  • --batch_size B will set the batch size to B. If the batch size is too high, it can raise memory errors.
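For example, these options can be appended to any of the training commands used later in this notebook to force a CPU run with two workers and a small batch size (paths are placeholders, as elsewhere in this notebook):

#!clinicadl train image cnn <caps_directory> "t1-linear" data/labels_lists/train results/image_cnn Conv5_FC3 -cpu --nproc 2 --batch_size 4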

Tip

You can increase the verbosity of the command by adding -v flag(s).

2D slice-level CNN

An advantage of the 2D slice-level approach is that existing CNNs which had huge success for natural image classification, e.g. ResNet (He et al., 2016) and VGGNet (Simonyan and Zisserman, 2014), can be easily borrowed and used in a transfer learning fashion. Other advantages are the increased number of training samples, as many slices can be extracted from a single 3D image, and a lower memory usage compared to using the full MR image as input. This paradigm can be divided into two different frameworks:

  • single-CNN: one CNN is trained on all slice locations.

  • multi-CNN: one CNN is trained per slice location. For multi-CNN the sample size is smaller (equivalent to the image-level framework); however, the CNNs may be more accurate as they are specialized for one slice location.

The 2D slice-level CNN in clinicadl is a slight modification of the ResNet-18 network trained to classify natural images from the ImageNet dataset:

  • One fully-connected layer was added at the end of the network to reduce the dimension of the output from 1000 to 2 classes (purple dotted box).

  • The last five convolutional layers and the last fully-connected layer of ResNet are fine-tuned (green dotted box).

  • All other layers have their weights and biases fixed.

During training, the gradient updates are performed based on the loss computed at the slice level. Final performance metrics are computed at the subject level by combining the outputs of all the slices of the same subject.
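Below is a minimal sketch of this subject-level aggregation (soft voting over slice-level softmax outputs; the column names are hypothetical, and clinicadl performs this step internally):

import pandas as pd

# Hypothetical slice-level softmax outputs for two subjects.
slice_df = pd.DataFrame({
    "participant_id": ["sub-01"] * 3 + ["sub-02"] * 3,
    "proba_AD": [0.9, 0.6, 0.7, 0.2, 0.4, 0.1],
    "proba_CN": [0.1, 0.4, 0.3, 0.8, 0.6, 0.9],
})

# Average the slice-level probabilities of each subject (soft voting)...
subject_df = slice_df.groupby("participant_id")[["proba_AD", "proba_CN"]].mean()
# ...and predict the class with the highest mean probability.
subject_df["prediction"] = subject_df.idxmax(axis=1).str.replace("proba_", "")
print(subject_df)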

Transfer learning:

The layers of this network (except the last one) are initialized with the weights of the ResNet-18 trained on ImageNet.

It is also possible to build autoencoders for other input types (see the 3D patch-level & ROI-based models section), but this option is not available for the only model provided in clinicadl for slices: ResNet-18.

# 2D-slice single-CNN training
!clinicadl train slice cnn -h
#!clinicadl train slice cnn <caps_directory> "t1-linear" data/labels_lists/train results/slice_cnn resnet18 --n_splits 5
# 2D-slice multi-CNN training
!clinicadl train slice multicnn -h
#!clinicadl train slice multicnn <caps_directory> "t1-linear" data/labels_lists/train results/slice_multicnn resnet18 --n_splits 5

3D patch-level & ROI-based models

The 3D patch-level models compensate for the absence of 3D information in the 2D slice-level approach while keeping some of its advantages (low memory usage and larger sample size).

ROI-based models are similar to 3D-patch models but take advantage of prior knowledge on Alzheimer’s disease. Indeed, most patches are not informative, as they contain parts of the brain that are not affected by the disease. Methods based on regions of interest (ROI) overcome this issue by focusing on regions which are known to be informative: here, the hippocampi. In this way, the complexity of the framework can be decreased, as fewer inputs are used to train the networks.

These two paradigms can be divided into two different frameworks:

  • single-CNN: one CNN is trained on all patch locations / all regions.

  • multi-CNN: one CNN is trained per patch location / per region. For multi-CNN the sample size is smaller (equivalent to the image-level framework); however, the CNNs may be more accurate as they are specialized for one patch location / one region.

The 3D patch-level and ROI-based CNNs in clinicadl have the same architecture, including:

  • 4 convolutional layers with kernel 3x3x3,

  • 4 max pooling layers with stride and kernel of 2 and a padding value that automatically adapts to the input feature map size,

  • 3 fully-connected layers.

As for the 2D slice-level model, the gradient updates are done based on the loss computed at the patch level. Final performance metrics are computed at the subject level by combining the outputs of the patches or the two hippocampi of the same subject.

Transfer learning:

It is possible for these categories to train an autoencoder derived from the CNN architecture. The encoder shares the same architecture as the CNN up to the fully-connected layers (see the background section for more details on autoencoder construction).

Then the weights of the encoder will be transferred to the convolutions of the CNN to initialize it before its training. This procedure is called autoencoder pretraining.
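A minimal PyTorch sketch of this transfer, with toy modules standing in for the real encoder and CNN (clinicadl handles this automatically through the --transfer_learning_path option used below):

import torch.nn as nn

# Toy convolutional block shared by the encoder and the CNN.
def conv_block():
    return nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2))

encoder = conv_block()       # stands for the trained encoder
cnn_features = conv_block()  # convolutional part of the CNN to initialize

# Copy the encoder weights into the CNN convolutions; strict=False ignores
# keys without a counterpart (e.g. decoder-only layers of a full autoencoder).
cnn_features.load_state_dict(encoder.state_dict(), strict=False)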

It is also possible to transfer weights between two CNNs with the same architecture.

For 3D-patch multi-CNNs specifically, it is possible to initialize each CNN of a multi-CNN:

  • with the weights of a single-CNN,
  • with the weights of the corresponding CNN of a multi-CNN.

Transferring weights between CNNs can be useful when performing two classification tasks that are similar. This is what has been done in (Wen et al., 2020): the sMCI vs pMCI classification network was initialized with the weights of the AD vs CN classification network.

Warning

Transferring weights between tasks that are not similar enough can hurt the performance!

3D-patch level models

See definition of patches in the background section on neuroimaging.

# 3D-patch autoencoder pretraining
!clinicadl train patch autoencoder -h
#!clinicadl train patch autoencoder <caps_directory> "t1-linear" data/labels_lists/train results/patch_autoencoder Conv4_FC3 --n_splits 5
# 3D-patch single-CNN training
!clinicadl train patch cnn -h

# With autoencoder pretraining
#!clinicadl train patch cnn <caps_directory> "t1-linear" data/labels_lists/train results/patch_single-cnn Conv4_FC3 --n_splits 5 --transfer_learning_path results/patch_autoencoder
# Without pretraining
#!clinicadl train patch cnn <caps_directory> "t1-linear" data/labels_lists/train results/patch_single-cnn Conv4_FC3 --n_splits 5
# 3D-patch multi-CNN training
!clinicadl train patch multicnn -h

# With autoencoder pretraining
#!clinicadl train patch multicnn <caps_directory> "t1-linear" data/labels_lists/train results/patch_multi-cnn Conv4_FC3 --n_splits 5 --transfer_learning_path results/patch_autoencoder
# With single-CNN pretraining
#!clinicadl train patch multicnn <caps_directory> "t1-linear" data/labels_lists/train results/patch_multi-cnn Conv4_FC3 --n_splits 5 --transfer_learning_path results/patch_single-cnn
# Without pretraining
#!clinicadl train patch multicnn <caps_directory> "t1-linear" data/labels_lists/train results/patch_multi-cnn Conv4_FC3 --n_splits 5

ROI-based models

ROI inputs correspond to two patches of size 50x50x50 manually centered on each hippocampus. They cannot be pre-extracted with clinicadl preprocessing extract-tensor and are computed from the whole MR volume.

Coronal views of ROI inputs
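As an illustration, cropping a 50x50x50 cube around a given center voxel is a simple tensor slicing operation (the center coordinates below are made up, not the actual hippocampus landmarks used by clinicadl):

import torch

image = torch.rand(1, 169, 208, 179)  # dummy volume with the t1-linear shape
center = (61, 96, 68)                 # hypothetical hippocampus center voxel
half = 25                             # half of the 50-voxel cube side

x, y, z = center
roi = image[:, x - half:x + half, y - half:y + half, z - half:z + half]
print(roi.shape)  # torch.Size([1, 50, 50, 50])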

# ROI-based autoencoder pretraining
!clinicadl train roi autoencoder -h
#!clinicadl train roi autoencoder <caps_directory> "t1-linear" data/labels_lists/train results/roi_autoencoder Conv4_FC3 --n_splits 5
# ROI-based CNN training
!clinicadl train roi cnn -h

# With autoencoder pretraining
#!clinicadl train roi cnn <caps_directory> "t1-linear" data/labels_lists/train results/roi_cnn Conv4_FC3 --n_splits 5 --transfer_learning_path results/roi_autoencoder
# Without pretraining
#!clinicadl train roi cnn <caps_directory> "t1-linear" data/labels_lists/train results/roi_cnn Conv4_FC3 --n_splits 5
# ROI-based multi-CNN training
!clinicadl train roi multicnn -h

# With autoencoder pretraining
#!clinicadl train roi multicnn <caps_directory> "t1-linear" data/labels_lists/train results/roi_multi-cnn Conv4_FC3 --n_splits 5 --transfer_learning_path results/roi_autoencoder
# With single-CNN pretraining
#!clinicadl train roi multicnn <caps_directory> "t1-linear" data/labels_lists/train results/roi_multi-cnn Conv4_FC3 --n_splits 5 --transfer_learning_path results/roi_cnn
# Without pretraining
#!clinicadl train roi multicnn <caps_directory> "t1-linear" data/labels_lists/train results/roi_multi-cnn Conv4_FC3 --n_splits 5

3D-image level model

In this approach, the whole MRI is used at once and the classification is performed at the image level. The advantage is that the spatial information is fully integrated, and it may allow the discovery of new knowledge on the disease. However, it requires more computational resources (especially GPU with higher memory capacity).

The 3D image-level CNN in clinicadl is designed as follows:

  • 5 convolutional layers with kernel 3x3x3,

  • 5 max pooling layers with stride and kernel of 2 and a padding value that automatically adapts to the input feature map size,

  • 3 fully-connected layers.

Depending on the preprocessing, the size of the fully-connected layers must be adapted. This is why two models exist in clinicadl:

  • Conv5_FC3 adapted to t1-linear preprocessing,

  • Conv5_FC3_mni adapted to t1-extensive preprocessing (coming in a future release of clinicadl).
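This dependence comes from the flattened size of the last feature map, which is set by the input dimensions. A quick way to check it is to pass a dummy input through the convolutions, as sketched below (the channel numbers are illustrative and padding is fixed instead of adaptive; 169x208x179 is the size of t1-linear cropped images):

import torch
import torch.nn as nn

# Five Conv3d + MaxPool3d blocks, loosely mimicking Conv5_FC3.
blocks, in_ch = [], 1
for out_ch in [8, 16, 32, 64, 128]:
    blocks += [nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2)]
    in_ch = out_ch
features = nn.Sequential(*blocks)

# The flattened size of the output dictates the first fully-connected layer.
with torch.no_grad():
    out = features(torch.zeros(1, 1, 169, 208, 179))
print(out.shape, "->", out.flatten(1).shape[1], "input features for the first FC layer")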

Transfer learning:

It is possible for this category to train an autoencoder derived from the CNN architecture, or to transfer weights between CNNs. See the section on patches for more details on this topic!

# 3D-image level autoencoder pretraining
!clinicadl train image autoencoder -h
#!clinicadl train image autoencoder <caps_directory> "t1-linear" data/labels_lists/train results/image_autoencoder Conv5_FC3 --n_splits 5
# 3D-image level CNN training
!clinicadl train image cnn -h

# With autoencoder pretraining
#!clinicadl train image cnn <caps_directory> "t1-linear" data/labels_lists/train results/image_cnn Conv5_FC3 --n_splits 5 --transfer_learning_path results/image_autoencoder
# Without pretraining
#!clinicadl train image cnn <caps_directory> "t1-linear" data/labels_lists/train results/image_cnn Conv5_FC3 --n_splits 5

Customize your experiment!

Do you want to train a custom architecture, use a custom input type or preprocessing, or work on other labels? Please fork and clone the GitHub repo to add your changes.

Add a custom model

Write your model class in clinicadl/tools/deep_learning/models and import it in clinicadl/tools/deep_learning/models/__init__.py.

Autoencoder transformation:

Your custom model can be transformed into an autoencoder in the same way as the predefined models. To make this possible, implement the convolutional part in features and the fully-connected layers in classifier. See the predefined models as examples.
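As an illustration, here is a hypothetical model following this convention (layer sizes are made up; the in_features of the linear layer must match the flattened feature map for your input size, cf. the dummy-input trick above):

import torch.nn as nn

class MyCustomCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        # Convolutional part: reused as the encoder when building an autoencoder.
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1),
            nn.BatchNorm3d(8),
            nn.ReLU(),
            nn.MaxPool3d(2),
        )
        # Fully-connected part: dropped when building the autoencoder.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(8 * 84 * 104 * 89, n_classes),  # for a 169x208x179 input
        )

    def forward(self, x):
        return self.classifier(self.features(x))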

Add a custom input type

Input types that are already provided in clinicadl are image, patch, roi and slice. To add a custom input type, please follow the steps detailed below:

  • Choose a mode name for this input type (for example, the default ones are image, patch, roi and slice).

  • Add your dataset class in clinicadl/tools/deep_learning/data.py as a child class of the abstract class MRIDataset (a skeleton is sketched after this list).

  • Create your dataset in return_dataset by adding:

elif mode == <mode_name>:
    return <dataset_class>(
        input_dir,
        data_df,
        preprocessing=preprocessing,
        transformations=transformations,
        <custom_args>
    )
  • Add your custom subparser to train and complete train_func in clinicadl/cli.py.
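Below is a hypothetical skeleton of such a dataset class. The exact abstract methods required by MRIDataset are defined in clinicadl/tools/deep_learning/data.py and may differ from this sketch, which only follows the generic PyTorch Dataset convention:

from torch.utils.data import Dataset

class MRIDatasetCustom(Dataset):  # in clinicadl, inherit from MRIDataset instead
    def __init__(self, input_dir, data_df, preprocessing="t1-linear",
                 transformations=None):
        self.input_dir = input_dir
        self.df = data_df
        self.preprocessing = preprocessing
        self.transformations = transformations

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        # Load the tensor of one sample, apply the transformations and
        # return it with its label and identifiers (left as an exercise).
        raise NotImplementedError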

Add a custom preprocessing

Define the path of your new preprocessing in the _get_path method of MRIDataset in clinicadl/tools/deep_learning/data.py. You will also have to add the name of your preprocessing pipeline in the general command line by modifying the possible choices of the preprocessing argument of train_pos_group in cli.py.
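For instance, a branch like the following could be added to _get_path (the t1-custom name, the folder layout and the file suffix are placeholders to adapt to your CAPS; variable names are indicative):

elif self.preprocessing == "t1-custom":
    # Placeholder path: adapt the folder layout and suffix to your pipeline.
    image_path = path.join(self.input_dir, "subjects", participant, session,
                           "t1_custom", participant + "_" + session + "_custom.pt")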

Change the labels

You can launch a classification task with clinicadl using any labels. The input TSV files must include the columns participant_id, session_id and diagnosis. If the diagnosis column does not contain the labels described in this tutorial (AD, CN, MCI, sMCI, pMCI), you can add your own label name, associated with a class value, in the diagnosis_code attribute of the MRIDataset class in clinicadl/tools/deep_learning/data.py.
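For example, assuming diagnosis_code is a plain label-to-integer dictionary (the existing entries shown here follow this tutorial; "MyLabel" and its class value are placeholders):

# In MRIDataset (clinicadl/tools/deep_learning/data.py):
self.diagnosis_code = {"CN": 0, "AD": 1, "sMCI": 0, "pMCI": 1, "MCI": 1, "MyLabel": 2}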

Suggestion!

Do not hesitate to ask for help on GitHub or propose a new pull request!