Note

If running in Colab, consider changing the runtime type before starting, in order to have access to GPU resources: Runtime -> Change Runtime Type, then choose GPU as hardware accelerator.

0. Classification of Alzheimer’s disease diagnosis

The goal of this lab session is to train a network that will perform a binary classification between control participants and patients affected by Alzheimer's disease. The input of the network is a neuroimaging modality: the T1-weighted MRI. In this project we use the PyTorch library.

import torch
import numpy as np
import pandas as pd
from torch import nn
from time import time
from os import path
from torchvision import transforms
import random
from copy import deepcopy

Database

In this session we use the images from a public research project: OASIS-1. Two labels exist in this dataset:

  • CN (Cognitively Normal) for healthy participants.

  • AD (Alzheimer’s Disease) for patients affected by Alzheimer’s disease.

The original images were preprocessed using Clinica: a software platform for clinical neuroimaging studies. Preprocessed images and other files are distributed in a tarball; run the following commands to download and extract them.

! wget --no-check-certificate --show-progress https://aramislab.paris.inria.fr/files/data/databases/DL4MI/OASIS-1-dataset_pt_new.tar.gz
--2022-01-20 18:25:26--  https://aramislab.paris.inria.fr/files/data/databases/DL4MI/OASIS-1-dataset_pt_new.tar.gz
Resolving aramislab.paris.inria.fr (aramislab.paris.inria.fr)... 128.93.101.229
Connecting to aramislab.paris.inria.fr (aramislab.paris.inria.fr)|128.93.101.229|:443... connected.
WARNING: cannot verify aramislab.paris.inria.fr's certificate, issued by 'CN=TERENA SSL CA 3,O=TERENA,L=Amsterdam,ST=Noord-Holland,C=NL':
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 1387416064 (1.3G) [application/octet-stream]
Saving to: 'OASIS-1-dataset_pt_new.tar.gz'

OASIS-1-dataset_pt_ 100%[===================>]   1.29G  91.4MB/s    in 15s

2022-01-20 18:25:41 (86.1 MB/s) - 'OASIS-1-dataset_pt_new.tar.gz' saved [1387416064/1387416064]
! tar xf OASIS-1-dataset_pt_new.tar.gz -C ./

One crucial step before training a neural network is to check the dataset. Are the classes balanced? Are there biases in the dataset that may differentiate the labels?

Here we will focus on the demographics (age, sex and level of education) and two cognitive scores:

  • The MMS (Mini Mental State), rated from 0 (no correct answer) to 30 (healthy subject).

  • The CDR (Clinical Dementia Rating), which is 0 if the participant is non-demented, and 0.5, 1, 2 and 3 for very mild, mild, moderate and severe dementia, respectively.

Let’s explore the data:

# Load the complete dataset
OASIS_df = pd.read_csv(
    'OASIS-1_dataset/tsv_files/lab_1/OASIS_BIDS.tsv', sep='\t',
    usecols=['participant_id', 'session_id', 'alternative_id_1', 'sex',
             'education_level', 'age_bl', 'diagnosis_bl', 'laterality', 'MMS',
             'cdr_global', 'diagnosis']
)
# Show first items of the table
print(OASIS_df.head())
# First visual inspection
_ = OASIS_df.hist(figsize=(16, 8))
   participant_id session_id alternative_id_1 sex  education_level  age_bl  \
0  sub-OASIS10001    ses-M00    OAS1_0001_MR1   F              2.0      74   
1  sub-OASIS10002    ses-M00    OAS1_0002_MR1   F              4.0      55   
2  sub-OASIS10003    ses-M00    OAS1_0003_MR1   F              4.0      73   
3  sub-OASIS10004    ses-M00    OAS1_0004_MR1   M              NaN      28   
4  sub-OASIS10005    ses-M00    OAS1_0005_MR1   M              NaN      18   

  diagnosis_bl laterality   MMS  cdr_global diagnosis  
0           CN          R  29.0         0.0        CN  
1           CN          R  29.0         0.0        CN  
2           AD          R  27.0         0.5        AD  
3           CN          R  30.0         NaN        CN  
4           CN          R  30.0         NaN        CN  
../_images/classification_7_1.png

These graphics give an overview of the distribution of the numerical variables. For example, the education level is well distributed among the participants of the study. Also, most of the subjects are young (around 20 years old) and healthy (MMS score equal to 30 and CDR score equal to 0).

The next cell creates (and runs) a function, characteristics_table, that summarizes the main characteristics of the population in the dataset. We will use it again later.

# Study the characteristics of the AD & CN populations (age, sex, MMS, cdr_global)
def characteristics_table(df, merged_df):
    """Creates a DataFrame that summarizes the characteristics of the DataFrame df"""
    diagnoses = np.unique(df.diagnosis.values)
    population_df = pd.DataFrame(index=diagnoses,
                                columns=['N', 'age', '%sexF', 'education',
                                         'MMS', 'CDR=0', 'CDR=0.5', 'CDR=1', 'CDR=2'])
    merged_df = merged_df.set_index(['participant_id', 'session_id'], drop=True)
    df = df.set_index(['participant_id', 'session_id'], drop=True)
    sub_merged_df = merged_df.loc[df.index]

    for diagnosis in population_df.index.values:
        diagnosis_df = sub_merged_df[df.diagnosis == diagnosis]
        population_df.loc[diagnosis, 'N'] = len(diagnosis_df)
        # Age
        mean_age = np.mean(diagnosis_df.age_bl)
        std_age = np.std(diagnosis_df.age_bl)
        population_df.loc[diagnosis, 'age'] = '%.1f ± %.1f' % (mean_age, std_age)
        # Sex
        population_df.loc[diagnosis, '%sexF'] = round((len(diagnosis_df[diagnosis_df.sex == 'F']) / len(diagnosis_df)) * 100, 1)
        # Education level
        mean_education_level = np.nanmean(diagnosis_df.education_level)
        std_education_level = np.nanstd(diagnosis_df.education_level)
        population_df.loc[diagnosis, 'education'] = '%.1f ± %.1f' % (mean_education_level, std_education_level)
        # MMS
        mean_MMS = np.mean(diagnosis_df.MMS)
        std_MMS = np.std(diagnosis_df.MMS)
        population_df.loc[diagnosis, 'MMS'] = '%.1f ± %.1f' % (mean_MMS, std_MMS)
        # CDR
        for value in ['0', '0.5', '1', '2']:
            population_df.loc[diagnosis, 'CDR=%s' % value] = len(diagnosis_df[diagnosis_df.cdr_global == float(value)])

    return population_df

population_df = characteristics_table(OASIS_df, OASIS_df)
population_df
       N          age %sexF  education         MMS CDR=0 CDR=0.5 CDR=1 CDR=2
AD    73   77.5 ± 7.4  63.0  2.7 ± 1.3  22.7 ± 3.6     0      45    26     2
CN   304  44.0 ± 23.3  62.2  3.5 ± 1.2  29.7 ± 0.6   124       0     0     0

Preprocessing

Theoretically, one of the main advantages of deep learning methods is their ability to work without extensive data preprocessing. However, as we have only a few images to train the network in this lab session, the preprocessing here is very extensive. More specifically, the images underwent:

  1. Non-linear registration.

  2. Segmentation of grey matter.

  3. Conversion to tensor format (.pt).

As mentioned above, to obtain the preprocessed images, we used some pipelines provided by Clinica and ClinicaDL in order to:

  1. Convert the original dataset to BIDS format (clinica convert oasis-to-bids).

  2. Perform the non-linear registration and segmentation of grey matter (pipeline t1-volume).

  3. Obtain the preprocessed images in tensor format (tensor extraction using ClinicaDL, clinicadl extract).

The preprocessed images are stored in the CAPS folder structure and all have the same size (121x145x121). You will find below a class called MRIDataset which allows easy browsing of the database.

from torch.utils.data import Dataset, DataLoader, sampler
from os import path

class MRIDataset(Dataset):

    def __init__(self, img_dir, data_df, transform=None):
        """
        Args:
            img_dir (str): path to the CAPS directory containing preprocessed images
            data_df (DataFrame): metadata of the population.
                Columns must include participant_id, session_id and diagnosis.
            transform (callable): transform applied on-the-fly to the images
                (possibly several transforms chained with torchvision.transforms.Compose).
        """
        self.img_dir = img_dir
        self.transform = transform
        self.data_df = data_df
        self.label_code = {"AD": 1, "CN": 0}

        self.size = self[0]['image'].shape

    def __len__(self):
        return len(self.data_df)

    def __getitem__(self, idx):

        diagnosis = self.data_df.loc[idx, 'diagnosis']
        label = self.label_code[diagnosis]

        participant_id = self.data_df.loc[idx, 'participant_id']
        session_id = self.data_df.loc[idx, 'session_id']
        filename = 'subjects/' + participant_id + '/' + session_id + '/' + \
          'deeplearning_prepare_data/image_based/custom/' + \
          participant_id + '_' + session_id + \
          '_T1w_segm-graymatter_space-Ixi549Space_modulated-off_probability.pt'

        image = torch.load(path.join(self.img_dir, filename))

        if self.transform:
            image = self.transform(image)

        sample = {'image': image, 'label': label,
                  'participant_id': participant_id,
                  'session_id': session_id}
        return sample

    def train(self):
        self.transform.train()

    def eval(self):
        self.transform.eval()

To facilitate the training and avoid overfitting due to the limited amount of data, the network won't use the full image but only a part of it (of size 30x40x30) centered on a specific neuroanatomical region: the hippocampus (HC). This structure is known to be linked to memory, and is atrophied in the majority of patients with Alzheimer's disease.

To improve the training and reduce overfitting, a random shift was added to the cropping function. This means that the bounding box around the hippocampus may be shifted by a limited number of voxels in each of the three directions.

class CropLeftHC(object):
    """Crops the left hippocampus of a MRI non-linearly registered to MNI"""
    def __init__(self, random_shift=0):
        self.random_shift = random_shift
        self.train_mode = True
    def __call__(self, img):
        if self.train_mode:
            x = random.randint(-self.random_shift, self.random_shift)
            y = random.randint(-self.random_shift, self.random_shift)
            z = random.randint(-self.random_shift, self.random_shift)
        else:
            x, y, z = 0, 0, 0
        return img[:, 25 + x:55 + x,
                   50 + y:90 + y,
                   27 + z:57 + z].clone()

    def train(self):
        self.train_mode = True

    def eval(self):
        self.train_mode = False

class CropRightHC(object):
    """Crops the right hippocampus of a MRI non-linearly registered to MNI"""
    def __init__(self, random_shift=0):
        self.random_shift = random_shift
        self.train_mode = True
    def __call__(self, img):
        if self.train_mode:
            x = random.randint(-self.random_shift, self.random_shift)
            y = random.randint(-self.random_shift, self.random_shift)
            z = random.randint(-self.random_shift, self.random_shift)
        else:
            x, y, z = 0, 0, 0
        return img[:, 65 + x:95 + x,
                   50 + y:90 + y,
                   27 + z:57 + z].clone()

    def train(self):
        self.train_mode = True

    def eval(self):
        self.train_mode = False
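
As a quick sanity check, whatever the value of the random shift, the crop always returns a patch of size 1x30x40x30. A minimal sketch, assuming a dummy preprocessed image of size 1x121x145x121 (the variable names here are illustrative):

# Sketch: the crop size is fixed whatever the random shift
crop = CropLeftHC(random_shift=2)
dummy_img = torch.rand(1, 121, 145, 121)
print(crop(dummy_img).shape)  # torch.Size([1, 30, 40, 30]), randomly shifted crop
crop.eval()                   # deterministic, centered crop
print(crop(dummy_img).shape)  # torch.Size([1, 30, 40, 30])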

Visualization

Here we visualize the raw, preprocessed and cropped data.

import matplotlib.pyplot as plt
import nibabel as nib
from scipy.ndimage import rotate

subject = 'sub-OASIS10003'
preprocessed_pt = torch.load(f'OASIS-1_dataset/CAPS/subjects/{subject}/ses-M00/' +
                    f'deeplearning_prepare_data/image_based/custom/{subject}_ses-M00_' +
                    'T1w_segm-graymatter_space-Ixi549Space_modulated-off_' +
                    'probability.pt')
raw_nii = nib.load(f'OASIS-1_dataset/raw/{subject}_ses-M00_T1w.nii.gz')

raw_np = raw_nii.get_fdata()

def show_slices(slices):
    """ Function to display a row of image slices """
    fig, axes = plt.subplots(1, len(slices))
    for i, slice in enumerate(slices):
        axes[i].imshow(slice.T, cmap="gray", origin="lower")

slice_0 = raw_np[:, :, 78]
slice_1 = raw_np[122, :, :]
slice_2 = raw_np[:, 173, :]
show_slices([slice_0, rotate(slice_1, 90), rotate(slice_2, 90)])
plt.suptitle(f'Slices of raw image of subject {subject}')
plt.show()

slice_0 = preprocessed_pt[0, 60, :, :]
slice_1 = preprocessed_pt[0, :, 72, :]
slice_2 = preprocessed_pt[0, :, :, 60]
show_slices([slice_0, slice_1, slice_2])
plt.suptitle(f'Center slices of preprocessed image of subject {subject}')
plt.show()

leftHC_pt = CropLeftHC()(preprocessed_pt)
slice_0 = leftHC_pt[0, 15, :, :]
slice_1 = leftHC_pt[0, :, 20, :]
slice_2 = leftHC_pt[0, :, :, 15]
show_slices([slice_0, slice_1, slice_2])
plt.suptitle(f'Center slices of left HC of subject {subject}')
plt.show()
../_images/classification_15_0.png ../_images/classification_15_1.png ../_images/classification_15_2.png

1. Cross-validation

In order to choose the hyperparameters, the set of images is divided into a training set (80%) and a validation set (20%). The data split was performed so as to ensure a similar distribution of diagnosis, age and sex between the subjects of the training set and those of the validation set. Moreover, the MMS distribution of each class is preserved.

train_df = pd.read_csv('OASIS-1_dataset/tsv_files/lab_1/train.tsv', sep='\t')
valid_df = pd.read_csv('OASIS-1_dataset/tsv_files/lab_1/validation.tsv', sep='\t')

train_population_df = characteristics_table(train_df, OASIS_df)
valid_population_df = characteristics_table(valid_df, OASIS_df)

print(f"Train dataset:\n {train_population_df}\n")
print(f"Validation dataset:\n {valid_population_df}")
Train dataset:
       N          age %sexF  education         MMS CDR=0 CDR=0.5 CDR=1 CDR=2
AD   58   77.4 ± 7.5  69.0  2.8 ± 1.4  22.6 ± 3.6     0      37    19     2
CN  242  43.4 ± 23.5  62.0  3.6 ± 1.2  29.8 ± 0.5    97       0     0     0

Validation dataset:
      N          age %sexF  education         MMS CDR=0 CDR=0.5 CDR=1 CDR=2
AD  15   78.2 ± 6.6  40.0  2.5 ± 1.0  22.9 ± 3.6     0       8     7     0
CN  62  46.3 ± 22.6  62.9  3.4 ± 1.3  29.6 ± 0.7    27       0     0     0

2. Model

We propose here to design a convolutional neural network that takes as input a patch of size 30x40x30 centered on the left hippocampus. The architecture of the network was found using a random search on the architecture and optimization hyperparameters.

Reminder on CNN layers

In a CNN everything is called a layer, though the operations performed by the different layers are very different. You will find below a summary of the different operations that may be performed in a CNN.

Feature maps

The outputs of the layers in a convolutional network are called feature maps. Their size is written with the format:

n_channels @ dim1 x dim2 x dim3

For a 3D CNN the feature maps are actually 5D tensors, as the first dimension is the batch size. This dimension is added by the DataLoader of PyTorch, which stacks the 4D tensors computed by a Dataset.

img_dir = path.join('OASIS-1_dataset', 'CAPS')
batch_size = 4

example_dataset = MRIDataset(img_dir, OASIS_df, transform=CropLeftHC())
example_dataloader = DataLoader(example_dataset, batch_size=batch_size, drop_last=True)
for data in example_dataloader:
    pass

print(f"Shape of Dataset output:\n {example_dataset[0]['image'].shape}\n")

print(f"Shape of DataLoader output:\n {data['image'].shape}")
Shape of Dataset output:
 torch.Size([1, 30, 40, 30])

Shape of DataLoader output:
 torch.Size([4, 1, 30, 40, 30])

Convolutions (nn.Conv3d)

The main arguments of this layer are the number of input channels, the number of output channels (i.e. the number of filters trained) and the size of the filter (or kernel). If an integer k is given, the kernel will be a cube of size k. It is possible to construct rectangular kernels by entering a tuple (but this is very rare).

You will find below an illustration of how a single filter produces its output feature map by parsing the input feature map. The size of the output feature map depends on the convolution parameters and can be computed with the following formula:

\(O_i = \frac{I_i-k+2P}{S} + 1\)

  • \(O_i\) the size of the output along the ith dimension

  • \(I_i\) the size of the input along the ith dimension

  • \(k\) the size of the kernel

  • \(P\) the padding value

  • \(S\) the stride value

In the following example, \(\frac{5-3+2*0}{1}+1 = 3\).
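
This can be checked directly with PyTorch; a minimal sketch with a kernel of size 3, no padding and a stride of 1 on a 5x5x5 input:

# Sketch: verify the output size formula with nn.Conv3d (k=3, P=0, S=1)
conv = nn.Conv3d(1, 1, kernel_size=3)
x = torch.rand(1, 1, 5, 5, 5)
print(conv(x).shape)  # torch.Size([1, 1, 3, 3, 3]) as (5 - 3 + 2*0) / 1 + 1 = 3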

To be able to parse all the feature maps of the input, one filter is actually a 4D tensor of size (input_channels, k, k, k). The set of all the filters included in one convolutional layer is then a 5D tensor of size (output_channels, input_channels, k, k, k) stacking all the filters.

Each filter is also associated with one bias value, a scalar added to all the feature maps it produces. The bias is thus a 1D vector of size output_channels.

from torch import nn

conv_layer = nn.Conv3d(8, 16, 3)
print('Weights shape\n', conv_layer.weight.shape)
print()
print('Bias shape\n', conv_layer.bias.shape)
Weights shape
 torch.Size([16, 8, 3, 3, 3])

Bias shape
 torch.Size([16])

Batch Normalization (nn.BatchNorm3d)

Learns to normalize feature maps according to (Ioffe & Szegedy, 2015). The following formula is applied on each feature map \(FM_i\):

\(FM^{normalized}_i = \frac{FM_i - mean(FM_i)}{\sqrt{var(FM_i) + \epsilon}} * \gamma_i + \beta_i\)

  • \(\epsilon\) is a hyperparameter of the layer (default=1e-05)

  • \(\gamma_i\) is the value of the scale for the ith channel (learnable parameter)

  • \(\beta_i\) is the value of the shift for the ith channel (learnable parameter)

This layer does not behave the same way during training and evaluation; this is why the model must be put in evaluation mode in the test function with the command .eval().

batch_layer = nn.BatchNorm3d(16)
print('Gamma value\n', batch_layer.state_dict()['weight'].shape)
print()
print('Beta value\n', batch_layer.state_dict()['bias'].shape)
Gamma value
 torch.Size([16])

Beta value
 torch.Size([16])
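
The formula above can also be checked numerically. A minimal sketch that reproduces the output of nn.BatchNorm3d in training mode by computing the per-channel batch statistics by hand (the variable names are illustrative):

# Sketch: reproduce the BatchNorm computation manually (training mode)
bn = nn.BatchNorm3d(2)
x = torch.randn(4, 2, 3, 3, 3)
out = bn(x)

fm = x[:, 0]  # all values of the first channel across the batch
manual = (fm - fm.mean()) / torch.sqrt(fm.var(unbiased=False) + bn.eps)
manual = manual * bn.weight[0] + bn.bias[0]
print(torch.allclose(out[:, 0], manual, atol=1e-5))  # True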

Activation function (nn.LeakyReLU)

In order to introduce non-linearity in the model, an activation function is introduced after the convolutions. It is applied on all intensities independently.

The graph of the Leaky ReLU is displayed below, \(\alpha\) being a hyperparameter of the layer (default=0.01):

Leaky ReLU graph
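
In short, Leaky ReLU computes \(f(x) = x\) if \(x \geq 0\) and \(f(x) = \alpha x\) otherwise. A one-line sketch:

# Sketch: negative values are multiplied by alpha (default 0.01)
activation = nn.LeakyReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.0])
print(activation(x))  # tensor([-0.0200, -0.0050,  0.0000,  1.0000])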

Pooling function (PadMaxPool3d)

The structure of the pooling layer is very similar to that of the convolutional layer: a kernel passes through the input with a defined size and stride. However, there are no learnable parameters in this layer; the kernel outputs the maximum value of the part of the feature map it covers.

Here is an example in 2D of the standard layer of PyTorch, nn.MaxPool2d:

nn.MaxPool2d behaviour

We can observe that the last column may not be used depending on the size of the kernel/input and stride value.
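
A minimal sketch of this loss of information: with a 5x5 input, a 2x2 kernel and a stride of 2, nn.MaxPool2d produces a 2x2 output, so the fifth row and column are never used:

# Sketch: the last row/column of an odd-sized input is ignored
pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.arange(25, dtype=torch.float32).view(1, 1, 5, 5)
print(pool(x).shape)  # torch.Size([1, 1, 2, 2]): floor((5 - 2) / 2) + 1 = 2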

This is why the custom module PadMaxPool3d was defined: it pads the input in order to exploit the information of the whole feature map.

class PadMaxPool3d(nn.Module):
    """A MaxPooling module which deals with odd sizes with padding"""
    def __init__(self, kernel_size, stride, return_indices=False, return_pad=False):
        super(PadMaxPool3d, self).__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        self.pool = nn.MaxPool3d(kernel_size, stride, return_indices=return_indices)
        self.pad = nn.ConstantPad3d(padding=0, value=0)
        self.return_indices = return_indices
        self.return_pad = return_pad

    def set_new_return(self, return_indices=True, return_pad=True):
        self.return_indices = return_indices
        self.return_pad = return_pad
        self.pool.return_indices = return_indices

    def forward(self, f_maps):
        coords = [self.stride - f_maps.size(i + 2) % self.stride for i in range(3)]
        for i, coord in enumerate(coords):
            if coord == self.stride:
                coords[i] = 0

        self.pad.padding = (coords[2], 0, coords[1], 0, coords[0], 0)

        if self.return_indices:
            output, indices = self.pool(self.pad(f_maps))

            if self.return_pad:
                return output, indices, (coords[2], 0, coords[1], 0, coords[0], 0)
            else:
                return output, indices

        else:
            output = self.pool(self.pad(f_maps))

            if self.return_pad:
                return output, (coords[2], 0, coords[1], 0, coords[0], 0)
            else:
                return output

Here is an illustration of the PadMaxPool3d behaviour: if the number of columns is odd, a column is added to avoid losing data.

Similarly, the formula to find the size of the output feature map is:

\(O_i = \left\lceil \frac{I_i-k+2P}{S} \right\rceil + 1\)
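
For example, with a 15x20x15 feature map, a kernel of 2 and a stride of 2, PadMaxPool3d pads the odd dimensions and outputs an 8x10x8 feature map. A quick sketch:

# Sketch: odd dimensions are padded so no information is lost
pad_pool = PadMaxPool3d(kernel_size=2, stride=2)
x = torch.rand(1, 1, 15, 20, 15)
print(pad_pool(x).shape)  # torch.Size([1, 1, 8, 10, 8]): ceil((15 - 2) / 2) + 1 = 8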

Dropout (nn.Dropout)

The aim of a dropout layer is to replace a fixed proportion of the input values by 0 during training only.

This layer does not behave the same way during training and evaluation; this is why the model must be put in evaluation mode in the test function with the command .eval().

dropout = nn.Dropout(0.5)
input_tensor = torch.rand(10)
output_tensor = dropout(input_tensor)
print("Input \n", input_tensor)
print()
print("Output \n", output_tensor)
Input 
 tensor([0.1311, 0.5546, 0.1405, 0.8767, 0.7725, 0.7001, 0.5156, 0.0886, 0.7068,
        0.1302])

Output 
 tensor([0.0000, 1.1091, 0.2810, 0.0000, 1.5450, 0.0000, 1.0311, 0.1773, 1.4136,
        0.0000])
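
Note that the surviving values above are exactly doubled: during training, PyTorch scales the kept values by \(\frac{1}{1-p}\) (inverted dropout, here \(p=0.5\)), so that the layer can simply be the identity at evaluation time. A minimal sketch, reusing the dropout layer and input_tensor defined above:

# Sketch: in evaluation mode, dropout is the identity
dropout.eval()
eval_output = dropout(input_tensor)
print(torch.equal(eval_output, input_tensor))  # True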

Fully-Connected Layers (nn.Linear)

The fully-connected layers take as input 2D tensors of size (batch_size, N). They have two mandatory arguments: the number of input values per sample and the number of output values per sample.

Each output neuron of a FC layer computes a linear combination of the inputs plus a bias.

fc = nn.Linear(16, 2)
print("Weights shape \n", fc.weight.shape)
print()
print("Bias shape \n", fc.bias.shape)
Weights shape 
 torch.Size([2, 16])

Bias shape 
 torch.Size([2])
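
This linear combination is simply \(x W^T + b\); a quick sketch reusing the fc layer defined above (the input tensor here is illustrative):

# Sketch: a Linear layer computes x @ W.T + b
x = torch.rand(4, 16)  # batch of 4 samples with 16 values each
print(torch.allclose(fc(x), x @ fc.weight.T + fc.bias))  # True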

TODO Network design

Construct here the network corresponding to the scheme and the following description:

Scheme of the network

The network includes 3 convolutional blocks, each composed of a convolutional layer (kernel size = 3, padding = 1, stride = 1), a batch normalization, a LeakyReLU activation and a MaxPooling layer. The 3 successive blocks include respectively 8, 16 and 32 filters.

Then, the feature maps array is flattened into a 1D vector to enter a fully-connected layer. Between the convolutional and the fully-connected layers, a dropout layer with a dropout rate of 0.5 is inserted.

# To complete
class CustomNetwork(nn.Module):

    def __init__(self):
        super(CustomNetwork, self).__init__()
        self.convolutions = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1),
            # Size 8@30x40x30
            nn.BatchNorm3d(8),
            nn.LeakyReLU(),
            PadMaxPool3d(2, 2),
            # Size 8@15x20x15

            nn.Conv3d(8, 16, 3, padding=1),
            # Size 16@15x20x15
            nn.BatchNorm3d(16),
            nn.LeakyReLU(),
            PadMaxPool3d(2, 2),
            # Size 16@8x10x8

            nn.Conv3d(16, 32, 3, padding=1),
            # Size 32@8x10x8
            nn.BatchNorm3d(32),
            nn.LeakyReLU(),
            PadMaxPool3d(2, 2),
            # Size 32@4x5x4

        )

        self.linear = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(32 * 4 * 5 * 4, 2)

        )

    def forward(self, x):
        x = self.convolutions(x)
        x = x.view(x.size(0), -1)
        x = self.linear(x)
        return x
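
A quick way to check the architecture is to run a forward pass on a dummy batch and verify the output shape; a minimal sketch, on CPU, with an illustrative random batch:

# Sketch: forward pass on a random batch of 2 images
model = CustomNetwork()
dummy_batch = torch.rand(2, 1, 30, 40, 30)
print(model(dummy_batch).shape)  # torch.Size([2, 2]): one score per class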

3. Train & Test

Complete the train method in order to iteratively update the weights of the network.

Here the model leading to the lowest loss on the training set at the end of an epoch is returned; however, we could choose instead the model leading to the highest balanced accuracy, or the one obtained at the last iteration.

In many deep learning studies, the validation set is used during training to choose when the training should stop (early stopping) but also to retrieve the best model (model selection).

As we don't have any test set to evaluate the final selected model in an unbiased way, we chose not to use the validation set during training in order to limit its bias. However, you can choose to implement early stopping and/or model selection based on the validation set; just remember that even if your results on the validation set are better, that doesn't mean that this would be the case on an independent test set.

def train(model, train_loader, criterion, optimizer, n_epochs):
    """
    Method used to train a CNN

    Args:
        model: (nn.Module) the neural network
        train_loader: (DataLoader) a DataLoader wrapping a MRIDataset
        criterion: (nn.Module) a method to compute the loss of a mini-batch of images
        optimizer: (torch.optim) an optimization algorithm
        n_epochs: (int) number of epochs performed during training

    Returns:
        best_model: (nn.Module) the trained neural network
    """
    best_model = deepcopy(model)
    train_best_loss = np.inf

    for epoch in range(n_epochs):
        model.train()
        train_loader.dataset.train()
        for i, data in enumerate(train_loader, 0):
            # Retrieve mini-batch and put data on GPU with .cuda()
            images, labels = data['image'].cuda(), data['label'].cuda()
            # Forward pass
            outputs = model(images)
            # Loss computation
            loss = criterion(outputs, labels)
            # Back-propagation (gradients computation)
            loss.backward()
            # Parameters update
            optimizer.step()
            # Erase previous gradients
            optimizer.zero_grad()

        _, train_metrics = test(model, train_loader, criterion)

        print(
            f"Epoch {epoch}: loss = {train_metrics['mean_loss']:.4f}, "
            f"balanced accuracy = {train_metrics['balanced_accuracy']:.4f}"
            )

        if train_metrics['mean_loss'] < train_best_loss:
            best_model = deepcopy(model)
            train_best_loss = train_metrics['mean_loss']

    return best_model

def test(model, data_loader, criterion):
    """
    Method used to test a CNN

    Args:
        model: (nn.Module) the neural network
        data_loader: (DataLoader) a DataLoader wrapping a MRIDataset
        criterion: (nn.Module) a method to compute the loss of a mini-batch of images

    Returns:
        results_df: (DataFrame) the label predicted for every subject
        results_metrics: (dict) a set of metrics
    """
    model.eval()
    data_loader.dataset.eval()
    columns = ["participant_id", "proba0", "proba1",
               "true_label", "predicted_label"]
    results_df = pd.DataFrame(columns=columns)
    total_loss = 0

    with torch.no_grad():
        for i, data in enumerate(data_loader, 0):
            images, labels = data['image'].cuda(), data['label'].cuda()
            outputs = model(images)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            probs = nn.Softmax(dim=1)(outputs)
            _, predicted = torch.max(outputs.data, 1)

            for idx, sub in enumerate(data['participant_id']):
                row = [sub,
                       probs[idx, 0].item(), probs[idx, 1].item(),
                       labels[idx].item(), predicted[idx].item()]
                row_df = pd.DataFrame([row], columns=columns)
                results_df = pd.concat([results_df, row_df])

    results_metrics = compute_metrics(results_df.true_label.values, results_df.predicted_label.values)
    results_df.reset_index(inplace=True, drop=True)
    results_metrics['mean_loss'] = total_loss / len(data_loader.dataset)

    return results_df, results_metrics


def compute_metrics(ground_truth, prediction):
    """Computes the accuracy, sensitivity, specificity and balanced accuracy"""
    tp = np.sum((prediction == 1) & (ground_truth == 1))
    tn = np.sum((prediction == 0) & (ground_truth == 0))
    fp = np.sum((prediction == 1) & (ground_truth == 0))
    fn = np.sum((prediction == 0) & (ground_truth == 1))

    metrics_dict = dict()
    metrics_dict['accuracy'] = (tp + tn) / (tp + tn + fp + fn)

    # Sensitivity
    if tp + fn != 0:
        metrics_dict['sensitivity'] = tp / (tp + fn)
    else:
        metrics_dict['sensitivity'] = 0.0

    # Specificity
    if fp + tn != 0:
        metrics_dict['specificity'] = tn / (fp + tn)
    else:
        metrics_dict['specificity'] = 0.0

    metrics_dict['balanced_accuracy'] = (metrics_dict['sensitivity'] + metrics_dict['specificity']) / 2

    return metrics_dict
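
A small sketch of compute_metrics on a toy example (the labels below are purely illustrative):

# Sketch: toy example with 2 positive and 3 negative ground-truth labels
y_true = np.array([1, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1])
print(compute_metrics(y_true, y_pred))
# {'accuracy': 0.6, 'sensitivity': 0.5, 'specificity': 0.6666666666666666,
#  'balanced_accuracy': 0.5833333333333333}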

Train Classification with Left HC

Here we will train a first network that will learn to perform the binary classification AD vs CN on a cropped image around the left hippocampus.

All hyperparameters may have an influence, but one of the most influential is the learning rate, which can lead to poor convergence if it is too high or too low. Try different learning rates between \(10 ^{-5}\) and \(10 ^{-3}\) and observe the differences in loss evolution during training.

To increase the training speed you can also increase the batch size. But be careful: if the batch size becomes a non-negligible fraction of the training set, it may have a negative impact on loss convergence (Keskar et al, 2016).

Construction of dataset objects:

img_dir = path.join('OASIS-1_dataset', 'CAPS')
transform = CropLeftHC(2)

train_datasetLeftHC = MRIDataset(img_dir, train_df, transform=transform)
valid_datasetLeftHC = MRIDataset(img_dir, valid_df, transform=transform)

# Try different learning rates
learning_rate = 10**-4
n_epochs = 30
batch_size = 4

# Put the network on GPU
modelLeftHC = CustomNetwork().cuda()
train_loaderLeftHC = DataLoader(train_datasetLeftHC, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True)
# A high batch size improves test speed
valid_loaderLeftHC = DataLoader(valid_datasetLeftHC, batch_size=32, shuffle=False, num_workers=8, pin_memory=True)
criterion = nn.CrossEntropyLoss(reduction='sum')
optimizer = torch.optim.Adam(modelLeftHC.parameters(), learning_rate)

best_modelLeftHC = train(modelLeftHC, train_loaderLeftHC, criterion, optimizer, n_epochs)

valid_resultsLeftHC_df, valid_metricsLeftHC = test(best_modelLeftHC, valid_loaderLeftHC, criterion)
train_resultsLeftHC_df, train_metricsLeftHC = test(best_modelLeftHC, train_loaderLeftHC, criterion)
print(valid_metricsLeftHC)
print(train_metricsLeftHC)
Epoch 0: loss = 0.3441, balanced accuracy = 0.5259
Epoch 1: loss = 0.2748, balanced accuracy = 0.6966
Epoch 2: loss = 0.3429, balanced accuracy = 0.5410
Epoch 3: loss = 0.2412, balanced accuracy = 0.7228
Epoch 4: loss = 0.2342, balanced accuracy = 0.7490
Epoch 5: loss = 0.2253, balanced accuracy = 0.7900
Epoch 6: loss = 0.2225, balanced accuracy = 0.7642
Epoch 7: loss = 0.2727, balanced accuracy = 0.8887
Epoch 8: loss = 0.2144, balanced accuracy = 0.7728
Epoch 9: loss = 0.2088, balanced accuracy = 0.7880
Epoch 10: loss = 0.2322, balanced accuracy = 0.8818
Epoch 11: loss = 0.2088, balanced accuracy = 0.7814
Epoch 12: loss = 0.2208, balanced accuracy = 0.8814
Epoch 13: loss = 0.2137, balanced accuracy = 0.7724
Epoch 14: loss = 0.2268, balanced accuracy = 0.8794
Epoch 15: loss = 0.2042, balanced accuracy = 0.7790
Epoch 16: loss = 0.2111, balanced accuracy = 0.8942
Epoch 17: loss = 0.1904, balanced accuracy = 0.8118
Epoch 18: loss = 0.1884, balanced accuracy = 0.8159
Epoch 19: loss = 0.1905, balanced accuracy = 0.7942
Epoch 20: loss = 0.2029, balanced accuracy = 0.7614
Epoch 21: loss = 0.1819, balanced accuracy = 0.7942
Epoch 22: loss = 0.1795, balanced accuracy = 0.8159
Epoch 23: loss = 0.1930, balanced accuracy = 0.7807
Epoch 24: loss = 0.2110, balanced accuracy = 0.9159
Epoch 25: loss = 0.1759, balanced accuracy = 0.8331
Epoch 26: loss = 0.1744, balanced accuracy = 0.8697
Epoch 27: loss = 0.1858, balanced accuracy = 0.7893
Epoch 28: loss = 0.1685, balanced accuracy = 0.8307
Epoch 29: loss = 0.2021, balanced accuracy = 0.9352
{'accuracy': 0.8831168831168831, 'sensitivity': 0.6, 'specificity': 0.9516129032258065, 'balanced_accuracy': 0.7758064516129033, 'mean_loss': 0.30914231670367254}
{'accuracy': 0.9066666666666666, 'sensitivity': 0.7068965517241379, 'specificity': 0.9545454545454546, 'balanced_accuracy': 0.8307210031347962, 'mean_loss': 0.16847280997782946}

If you obtained a balanced accuracy of about 0.85 or more, there may be something wrong… Are you absolutely sure that your dataset is unbiased?

If you didn't remove the youngest subjects of OASIS, your dataset is biased, as the AD and CN participants do not have the same age distribution. In practice, people who come to the hospital for a diagnosis of Alzheimer's disease are all about the same age (50-90). No one has Alzheimer's disease at 20! You should then check that the performance of the network is still good on the older population only.

Check the accuracy on older participants (age ≥ 62, to match the minimum of the AD age distribution):

valid_resultsLeftHC_df = valid_resultsLeftHC_df.merge(OASIS_df, how='left', on='participant_id', sort=False)
valid_resultsLeftHC_old_df = valid_resultsLeftHC_df[(valid_resultsLeftHC_df.age_bl >= 62)]
compute_metrics(valid_resultsLeftHC_old_df.true_label, valid_resultsLeftHC_old_df.predicted_label)
{'accuracy': 0.71875,
 'sensitivity': 0.6,
 'specificity': 0.8235294117647058,
 'balanced_accuracy': 0.7117647058823529}

If the accuracy on older participants is very different from the one you obtained before, this could mean that your network is inefficient on the target population (people older than 60). You would then have to think again about your framework and possibly retrain your network…

Train Classification with Right HC

Another network can be trained on an image cropped around the right HC. The same hyperparameters as before may be reused.

Construction of dataset objects

transform = CropRightHC(2)

train_datasetRightHC = MRIDataset(img_dir, train_df, transform=transform)
valid_datasetRightHC = MRIDataset(img_dir, valid_df, transform=transform)

learning_rate = 10**-4
n_epochs = 30
batch_size = 4

# Put the network on GPU
modelRightHC = CustomNetwork().cuda()
train_loaderRightHC = DataLoader(train_datasetRightHC, batch_size=batch_size, shuffle=True, num_workers=8, pin_memory=True)
valid_loaderRightHC = DataLoader(valid_datasetRightHC, batch_size=32, shuffle=False, num_workers=8, pin_memory=True)
criterion = nn.CrossEntropyLoss(reduction='sum')
optimizer = torch.optim.Adam(modelRightHC.parameters(), learning_rate)

best_modelRightHC = train(modelRightHC, train_loaderRightHC, criterion, optimizer, n_epochs)

valid_resultsRightHC_df, valid_metricsRightHC = test(best_modelRightHC, valid_loaderRightHC, criterion)
train_resultsRightHC_df, train_metricsRightHC = test(best_modelRightHC, train_loaderRightHC, criterion)
print(valid_metricsRightHC)
print(train_metricsRightHC)
Epoch 0: loss = 0.3113, balanced accuracy = 0.5517
Epoch 1: loss = 0.2573, balanced accuracy = 0.7683
Epoch 2: loss = 0.2494, balanced accuracy = 0.7228
Epoch 3: loss = 0.2357, balanced accuracy = 0.7424
Epoch 4: loss = 0.2302, balanced accuracy = 0.7531
Epoch 5: loss = 0.2250, balanced accuracy = 0.7662
Epoch 6: loss = 0.2557, balanced accuracy = 0.9032
Epoch 7: loss = 0.2136, balanced accuracy = 0.8097
Epoch 8: loss = 0.2138, balanced accuracy = 0.8356
Epoch 9: loss = 0.2458, balanced accuracy = 0.9032
Epoch 10: loss = 0.2129, balanced accuracy = 0.7662
Epoch 11: loss = 0.2698, balanced accuracy = 0.9080
Epoch 12: loss = 0.2073, balanced accuracy = 0.8700
Epoch 13: loss = 0.2030, balanced accuracy = 0.8031
Epoch 14: loss = 0.2302, balanced accuracy = 0.8946
Epoch 15: loss = 0.2261, balanced accuracy = 0.8946
Epoch 16: loss = 0.2265, balanced accuracy = 0.7528
Epoch 17: loss = 0.2078, balanced accuracy = 0.7680
Epoch 18: loss = 0.1867, balanced accuracy = 0.8352
Epoch 19: loss = 0.2024, balanced accuracy = 0.9135
Epoch 20: loss = 0.1891, balanced accuracy = 0.9111
Epoch 21: loss = 0.1808, balanced accuracy = 0.8373
Epoch 22: loss = 0.1827, balanced accuracy = 0.8873
Epoch 23: loss = 0.1852, balanced accuracy = 0.8983
Epoch 24: loss = 0.1799, balanced accuracy = 0.8221
Epoch 25: loss = 0.1748, balanced accuracy = 0.8287
Epoch 26: loss = 0.1837, balanced accuracy = 0.9221
Epoch 27: loss = 0.2252, balanced accuracy = 0.7569
Epoch 28: loss = 0.1756, balanced accuracy = 0.8483
Epoch 29: loss = 0.1927, balanced accuracy = 0.9114
{'accuracy': 0.9090909090909091, 'sensitivity': 0.6666666666666666, 'specificity': 0.967741935483871, 'balanced_accuracy': 0.8172043010752688, 'mean_loss': 0.2644238243629406}
{'accuracy': 0.9033333333333333, 'sensitivity': 0.7068965517241379, 'specificity': 0.9504132231404959, 'balanced_accuracy': 0.828654887432317, 'mean_loss': 0.17483099497389049}

Soft voting

To increase the accuracy of our system, the results of the two networks can be combined. Here we give both hippocampi the same weight.

def softvoting(leftHC_df, rightHC_df):
    df1 = leftHC_df.set_index('participant_id', drop=True)
    df2 = rightHC_df.set_index('participant_id', drop=True)
    results_df = pd.DataFrame(index=df1.index.values,
                              columns=['true_label', 'predicted_label',
                                       'proba0', 'proba1'])
    results_df.true_label = df1.true_label
    # Compute predicted label and probabilities
    results_df.proba1 = 0.5 * df1.proba1 + 0.5 * df2.proba1
    results_df.proba0 = 0.5 * df1.proba0 + 0.5 * df2.proba0
    results_df.predicted_label = (0.5 * df1.proba1 + 0.5 * df2.proba1 > 0.5).astype(int)

    return results_df

valid_results = softvoting(valid_resultsLeftHC_df, valid_resultsRightHC_df)
valid_metrics = compute_metrics(valid_results.true_label, valid_results.predicted_label)
print(valid_metrics)
{'accuracy': 0.8961038961038961, 'sensitivity': 0.6666666666666666, 'specificity': 0.9516129032258065, 'balanced_accuracy': 0.8091397849462365}

Keep in mind that the validation set was used to set the hyperparameters (learning rate, architecture), so the validation metrics are biased. To obtain unbiased results, the entire framework should be evaluated on an independent set (a test set).

4. Clustering on AD & CN populations

The classification results above were obtained in a supervised way: neurologists examined the participants of OASIS and gave a diagnosis depending on their clinical symptoms.

However, this label is often inaccurate (Beach et al, 2012). Hence, an unsupervised framework can be interesting to check what can be found in the data without being biased by a noisy label.

Model

A convenient architecture to extract features from an image with deep learning is the autoencoder (AE). This architecture is made of two parts:

  • the encoder, which learns to compress the image into a smaller vector, the code. It is composed of the same kind of operations as the convolutional part of the CNN seen before.

  • the decoder, which learns to reconstruct the original image from the code learnt by the encoder. It is composed of the transposed version of the operations used in the encoder.

You will find below CropMaxUnpool3d, the transposed version of PadMaxPool3d.

class CropMaxUnpool3d(nn.Module):
    def __init__(self, kernel_size, stride):
        super(CropMaxUnpool3d, self).__init__()
        self.unpool = nn.MaxUnpool3d(kernel_size, stride)

    def forward(self, f_maps, indices, padding=None):
        output = self.unpool(f_maps, indices)
        if padding is not None:
            x1 = padding[4]
            y1 = padding[2]
            z1 = padding[0]
            output = output[:, :, x1::, y1::, z1::]

        return output

To facilitate the reconstruction process, the pooling layers of the encoder return the positions of the maximum values. Hence the unpooling layer can replace the maximum values at the right place in the 2x2x2 sub-cube of the feature map. They also indicate if some zero padding was applied to the feature map, so that the unpooling layer can correctly crop its output feature map.
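A minimal sketch of this pool/unpool round trip: the indices and padding returned by PadMaxPool3d allow CropMaxUnpool3d to restore the original spatial size (the input tensor here is illustrative):

# Sketch: pooling then unpooling restores the original spatial size
pool = PadMaxPool3d(2, 2, return_indices=True, return_pad=True)
unpool = CropMaxUnpool3d(2, 2)
x = torch.rand(1, 1, 15, 20, 15)
pooled, indices, pad = pool(x)
restored = unpool(pooled, indices, pad)
print(pooled.shape, restored.shape)
# torch.Size([1, 1, 8, 10, 8]) torch.Size([1, 1, 15, 20, 15])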

class AutoEncoder(nn.Module):

    def __init__(self):
        super(AutoEncoder, self).__init__()

        # Initial size (30, 40, 30)

        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1),
            nn.BatchNorm3d(8),
            nn.LeakyReLU(),
            PadMaxPool3d(2, 2, return_indices=True, return_pad=True),
            # Size (15, 20, 15)

            nn.Conv3d(8, 16, 3, padding=1),
            nn.BatchNorm3d(16),
            nn.LeakyReLU(),
            PadMaxPool3d(2, 2, return_indices=True, return_pad=True),
            # Size (8, 10, 8)

            nn.Conv3d(16, 32, 3, padding=1),
            nn.BatchNorm3d(32),
            nn.LeakyReLU(),
            PadMaxPool3d(2, 2, return_indices=True, return_pad=True),
            # Size (4, 5, 4)

            nn.Conv3d(32, 1, 1),
            # Size (4, 5, 4)
        )

        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(1, 32, 1),
            # Size (4, 5, 4)

            CropMaxUnpool3d(2, 2),
            nn.ConvTranspose3d(32, 16, 3, padding=1),
            nn.BatchNorm3d(16),
            nn.LeakyReLU(),
            # Size (8, 10, 8)

            CropMaxUnpool3d(2, 2),
            nn.ConvTranspose3d(16, 8, 3, padding=1),
            nn.BatchNorm3d(8),
            nn.LeakyReLU(),
            # Size (15, 20, 15)

            CropMaxUnpool3d(2, 2),
            nn.ConvTranspose3d(8, 1, 3, padding=1),
            nn.BatchNorm3d(1),
            nn.Sigmoid()
            # Size (30, 40, 30)
        )

    def forward(self, x):
        indices_list = []
        pad_list = []
        for layer in self.encoder:
            if isinstance(layer, PadMaxPool3d):
                x, indices, pad = layer(x)
                indices_list.append(indices)
                pad_list.append(pad)
            else:
                x = layer(x)

        code = x.view(x.size(0), -1)
        for layer in self.decoder:
            if isinstance(layer, CropMaxUnpool3d):
                x = layer(x, indices_list.pop(), pad_list.pop())
            else:
                x = layer(x)

        return code, x

Train Autoencoder

The training function of the autoencoder is very similar to that of the CNN. The main difference is that the loss is not computed by comparing the output with the diagnosis values using the cross-entropy loss, but by comparing it with the original image using, for example, the Mean Squared Error (MSE) loss.

def trainAE(model, train_loader, criterion, optimizer, n_epochs):
    """
    Method used to train an AutoEncoder

    Args:
        model: (nn.Module) the neural network
        train_loader: (DataLoader) a DataLoader wrapping a MRIDataset
        criterion: (nn.Module) a method to compute the loss of a mini-batch of images
        optimizer: (torch.optim) an optimization algorithm
        n_epochs: (int) number of epochs performed during training

    Returns:
        best_model: (nn.Module) the trained neural network.
    """
    best_model = deepcopy(model)
    train_best_loss = np.inf

    for epoch in range(n_epochs):
        model.train()
        train_loader.dataset.train()
        for i, data in enumerate(train_loader, 0):
            # ToDo
            # Complete the training function in a similar way
            # as for the CNN classification training.
            # Retrieve mini-batch
            images, labels = data['image'].cuda(), data['label'].cuda()
            # Forward pass + loss computation
            _, outputs = model(images)
            loss = criterion(outputs, images)
            # Back-propagation
            loss.backward()
            # Parameters update
            optimizer.step()
            # Erase previous gradients
            optimizer.zero_grad()

        mean_loss = testAE(model, train_loader, criterion)

        print(f'Epoch {epoch}: loss = {mean_loss:.6f}')

        if mean_loss < train_best_loss:
            best_model = deepcopy(model)
            train_best_loss = mean_loss

    return best_model


def testAE(model, data_loader, criterion):
    """
    Method used to test an AutoEncoder

    Args:
        model: (nn.Module) the neural network
        data_loader: (DataLoader) a DataLoader wrapping a MRIDataset
        criterion: (nn.Module) a method to compute the loss of a mini-batch of images

    Returns:
        mean_loss: (float) the mean loss per voxel of the images of the data_loader
    """
    model.eval()
    data_loader.dataset.eval()
    total_loss = 0

    with torch.no_grad():
        for i, data in enumerate(data_loader, 0):
            images, labels = data['image'].cuda(), data['label'].cuda()
            _, outputs = model(images)
            loss = criterion(outputs, images)
            total_loss += loss.item()

    return total_loss / len(data_loader.dataset) / np.prod(data_loader.dataset.size)
learning_rate = 10**-3
n_epochs = 30
batch_size = 4

AELeftHC = AutoEncoder().cuda()
criterion = nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(AELeftHC.parameters(), learning_rate)

best_AELeftHC = trainAE(AELeftHC, train_loaderLeftHC, criterion, optimizer, n_epochs)
Epoch 0: loss = 0.092463
Epoch 1: loss = 0.057393
Epoch 2: loss = 0.046466
Epoch 3: loss = 0.039317
Epoch 4: loss = 0.035276
Epoch 5: loss = 0.031526
Epoch 6: loss = 0.028902
Epoch 7: loss = 0.026990
Epoch 8: loss = 0.025812
Epoch 9: loss = 0.023425
Epoch 10: loss = 0.022873
Epoch 11: loss = 0.020765
Epoch 12: loss = 0.019893
Epoch 13: loss = 0.019150
Epoch 14: loss = 0.018286
Epoch 15: loss = 0.017654
Epoch 16: loss = 0.017210
Epoch 17: loss = 0.017415
Epoch 18: loss = 0.016208
Epoch 19: loss = 0.015972
Epoch 20: loss = 0.015389
Epoch 21: loss = 0.015128
Epoch 22: loss = 0.014811
Epoch 23: loss = 0.014832
Epoch 24: loss = 0.014468
Epoch 25: loss = 0.014108
Epoch 26: loss = 0.014059
Epoch 27: loss = 0.013753
Epoch 28: loss = 0.013487
Epoch 29: loss = 0.013449

Visualization

The simplest way to check if the AE training went well is to visualize the output and compare it to the original image seen by the autoencoder.

import matplotlib.pyplot as plt
import nibabel as nib
from scipy.ndimage import rotate

subject = 'sub-OASIS10003'
preprocessed_pt = torch.load(f'OASIS-1_dataset/CAPS/subjects/{subject}/ses-M00/' +
                    'deeplearning_prepare_data/image_based/custom/' + subject +
                    '_ses-M00_'+
                    'T1w_segm-graymatter_space-Ixi549Space_modulated-off_' +
                    'probability.pt')
input_pt = CropLeftHC()(preprocessed_pt).unsqueeze(0).cuda()
_, output_pt = best_AELeftHC(input_pt)


slice_0 = input_pt[0, 0, 15, :, :].cpu()
slice_1 = input_pt[0, 0, :, 20, :].cpu()
slice_2 = input_pt[0, 0, :, :, 15].cpu()
show_slices([slice_0, slice_1, slice_2])
plt.suptitle(f'Center slices of the input image of subject {subject}')
plt.show()

slice_0 = output_pt[0, 0, 15, :, :].cpu().detach()
slice_1 = output_pt[0, 0, :, 20, :].cpu().detach()
slice_2 = output_pt[0, 0, :, :, 15].cpu().detach()
show_slices([slice_0, slice_1, slice_2])
plt.suptitle(f'Center slices of the output image of subject {subject}')
plt.show()
../_images/classification_58_0.png ../_images/classification_58_1.png

Clustering

Now that the AE has extracted the most salient parts of the image into a smaller vector, the features obtained can be used for clustering.

Here we give an example with the Gaussian Mixture Model (GMM) of scikit-learn. To use it, we first need to concatenate the features and the labels of all the subjects in two matrices X and Y. This is what is done in the compute_dataset_features function.

def compute_dataset_features(data_loader, model):

    concat_codes = torch.Tensor().cuda()
    concat_labels = torch.LongTensor()
    concat_names = []

    for data in data_loader:
        image = data['image'].cuda()
        labels = data['label']
        names = data['participant_id']

        code, _ = model(image)
        concat_codes = torch.cat([concat_codes, code.squeeze(1)], 0)
        concat_labels = torch.cat([concat_labels, labels])
        concat_names = concat_names + names

    concat_codes_np = concat_codes.cpu().detach().numpy()
    concat_labels_np = concat_labels.numpy()
    concat_names = np.array(concat_names)[:, np.newaxis]

    return concat_codes_np, concat_labels_np, concat_names
# train_codes, train_labels, names = compute_dataset_features(train_loaderBothHC, best_AEBothHC)
train_codes, train_labels, names = compute_dataset_features(train_loaderLeftHC, best_AELeftHC)

Then the model will fit the training codes and build two clusters. The labels found in this unsupervised way can be compared to the true labels.

from sklearn import mixture
from sklearn.metrics import adjusted_rand_score

n_components = 2
model = mixture.GaussianMixture(n_components)
model.fit(train_codes)
train_predict = model.predict(train_codes)

metrics = compute_metrics(train_labels, train_predict)
ari = adjusted_rand_score(train_labels, train_predict)
print(f"Adjusted random index: {ari}")
Adjusted random index: 0.33869103831169145

The adjusted Rand index may not be very good; this could mean that the framework clustered another characteristic than the one you tried to target.

What is actually expected is that the clustering differentiation is made on the level of atrophy, which is mostly correlated with age but also with the disease stage (which we can model with the MMS score).

data_np = np.concatenate([names, train_codes,
                          train_labels[:, np.newaxis],
                          train_predict[:, np.newaxis]], axis=1)
columns = ['feature %i' % i for i in range(train_codes.shape[1])]
columns = ['participant_id'] + columns + ['true_label', 'predicted_label']
data_df = pd.DataFrame(data_np, columns=columns).set_index('participant_id')

merged_df = data_df.merge(OASIS_df.set_index('participant_id'), how='inner', on='participant_id')

plt.title('Clustering values according to age and MMS score')
for component in range(n_components):
    predict_df = merged_df[merged_df.predicted_label == str(component)]
    plt.plot(predict_df['age_bl'], predict_df['MMS'], 'o', label=f"cluster {component}")
plt.legend()
plt.xlabel('age')
plt.ylabel('MMS')
plt.show()
../_images/classification_65_0.png

You can try to improve this clustering by adding the codes obtained on the right hippocampus, performing further dimension reduction, or removing the age effect as in (Moradi et al, 2015).