Split Learning in Health Care
Abstract
Samenvatting (Summary in Dutch)
Acknowledgements
Contents
1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Outline
2 Machine Learning in Health Care
2.1 The promise of machine learning in health care
2.2 Scientific fundamentals of machine learning
2.3 Main inhibiting factors
2.4 Conclusion
3 Privacy-Preserving Collaboration
3.1 Multi-center research
3.2 Secure Multi-Party Computation
3.3 Split Learning
3.4 Conclusions
4 Split Learning Feasibility
4.1 Aim
4.2 Methods
4.3 Results
4.4 Discussion
4.5 Conclusion
5 Split Learning Innovation
5.1 Aim
5.2 Methods
5.3 Results
5.4 Discussion
5.5 Conclusion
6 Conclusions
7 Bibliography
8 Appendix
8.1 Data set and implementation details
8.2 Split Learning Algorithm
8.3 Split Learning with Local Adapters Algorithm
List of Figures
List of Tables
List of Acronyms
CT: computed tomography
PET: positron emission tomography
MRI: magnetic resonance imaging
FLAIR: fluid-attenuated inversion recovery
AUROC: area under the receiver operating characteristic curve
List of Symbols
X, Y: input data and corresponding labels
Ŷ: model predictions
F: neural network model
F_front, F_center, F_back: front, center, and back model segments
L_n: n-th neural network layer
X_n: output activations of the n-th neural network layer
∇: gradient
G: objective function
N: number of model parameters
K: number of participating institutions
q: size of the interface layers
η: fraction of model parameters that resides locally
Ω: relative communication load, with the requirement Ω < 1
τ, v: quantities in the communication analysis of Chapter 4
1 Introduction
2 Machine Learning in Health Care
Figure 1: Visual examples of model fitting. Overfitted models do not generalize well to new data.
A model maps an input x to a prediction ŷ, which an objective function compares to the true label y.
Figure 2: Simplified graphical representation of a deep neural network with two hidden layers. Circles represent neurons, vertically aligned in layers. Lines denote inter-layer connectivity, with darker lines suggesting varying weights. Deeper layers capture higher semantic content, with examples provided below the graph. Input data is represented on the left, and forward propagation runs left to right. The objective function is computed on the right, and backpropagation runs right to left.
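To make the left-to-right flow of Figure 2 concrete, the sketch below traces a single forward pass through a two-hidden-layer network in plain NumPy. The layer sizes, the tanh activations, and the label y = 1 are illustrative assumptions, not values taken from the thesis.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: 4 input features, two hidden layers of 8 neurons, 1 output.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 8))
W3 = rng.normal(size=(8, 1))

x = rng.normal(size=(1, 4))            # input data (left side of Figure 2)
h1 = np.tanh(x @ W1)                   # first hidden layer
h2 = np.tanh(h1 @ W2)                  # second hidden layer
y_hat = h2 @ W3                        # prediction (right side of Figure 2)
loss = ((y_hat - 1.0) ** 2).item()     # objective comparing y_hat to the label y = 1
# Backpropagation would now run right to left, yielding gradients for W3, W2, W1.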
Figure 3: Examples of typical supervised learning tasks. a) Staging of diabetic retinopathy from fundus photographs [74]. b) Segmentation of anatomy from abdominal computed tomography (CT) scans [75]. c) Determining skeletal age from pediatric hand radiographs [76].
Figure 4: Examples of typical unsupervised learning tasks. a) Identifying sub-populations of patients with cardiovascular disease who may benefit from different medication [77]. b) Positron emission tomography (PET) image denoising [78].
“It’s not who has the best algorithm that wins.
It’s who has the most data.”
- Andrew Ng
3 Privacy-Preserving Collaboration
A full model $F = \{L_0, L_1, \dots, L_N\}$ is split at cut layers $n$ and $m$ into three segments:

$F_{front}, F_{center}, F_{back} \leftarrow \{L_{0 \to n}\}, \{L_{n+1 \to m}\}, \{L_{m+1 \to N}\}$

In the forward pass, an institution computes the cut-layer activations $X_n = F_{front}(X)$ from its local data $X$ and transmits them to the center; $F_{center}$ produces $X_m$, from which $F_{back}$ computes the prediction $\hat{Y}$. The objective $G(\hat{Y}, Y)$ is evaluated against the labels $Y$, and the gradient $\nabla$ travels the reverse path during backpropagation.
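To make these steps concrete, below is a minimal PyTorch sketch of a single Split Learning training step. The layer sizes, the mean-squared-error objective standing in for G, and the single SGD optimizer (which in practice would be three separate optimizers, one per party) are illustrative assumptions; the two detach() calls mark where activations would be transmitted across institutional boundaries.

import torch
import torch.nn as nn

# Hypothetical split of a small model at cut layers n and m.
F_front  = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # L_{0->n}, at the institution
F_center = nn.Sequential(nn.Linear(64, 64), nn.ReLU())   # L_{n+1->m}, at the central server
F_back   = nn.Linear(64, 1)                              # L_{m+1->N}, at the institution
opt = torch.optim.SGD(
    [*F_front.parameters(), *F_center.parameters(), *F_back.parameters()], lr=0.01
)

X, Y = torch.randn(16, 32), torch.randn(16, 1)   # local data and labels, never shared
X_n = F_front(X)                                 # institution computes cut-layer activations
X_n_tx = X_n.detach().requires_grad_()           # "transmitted" to the server
X_m = F_center(X_n_tx)
X_m_tx = X_m.detach().requires_grad_()           # "transmitted" back to the institution
Y_hat = F_back(X_m_tx)                           # prediction
loss = nn.functional.mse_loss(Y_hat, Y)          # objective G(Y_hat, Y)

loss.backward()                                  # gradients for F_back; fills X_m_tx.grad
X_m.backward(X_m_tx.grad)                        # gradient "transmitted" to the server
X_n.backward(X_n_tx.grad)                        # and on to the institution's front segment
opt.step(); opt.zero_grad()

Only the activations X_n and X_m and their gradients cross institutional boundaries; the raw data X and the labels Y stay with their holder.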
Figure 5: Diagram of Boomerang Split Learning. Three institutions, named hospital A, B, and C, hold their own data and labels to collaboratively train a model without sharing raw data. The training process iterates over the hospitals, of which hospital A is currently training.
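The round-robin schedule itself can be sketched as below, with synthetic stand-ins for each hospital's private data; passing the front and back weights from hospital to hospital is modeled here simply by letting all hospitals use the same module objects.

import torch
import torch.nn as nn

front, center, back = nn.Linear(32, 64), nn.Linear(64, 64), nn.Linear(64, 1)
opt = torch.optim.SGD(
    [*front.parameters(), *center.parameters(), *back.parameters()], lr=0.01
)
# Synthetic stand-ins for the hospitals' private data sets.
hospitals = {name: (torch.randn(16, 32), torch.randn(16, 1)) for name in "ABC"}

for epoch in range(3):                       # training iterates over the hospitals
    for name, (X, Y) in hospitals.items():   # this hospital is currently training
        a = front(X); a_tx = a.detach().requires_grad_()             # sent to the server
        b = center(a_tx.relu()); b_tx = b.detach().requires_grad_()  # sent back (boomerang)
        loss = nn.functional.mse_loss(back(b_tx.relu()), Y)          # labels stay local
        loss.backward()            # local back segment
        b.backward(b_tx.grad)      # gradient returns to the server's center segment
        a.backward(a_tx.grad)      # and finally to the hospital's front segment
        opt.step(); opt.zero_grad()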
4 Split Learning Feasibility
Figure 6: Example fundus photograph from the DRC data set, used to classify whether diabetic retinopathy is present.
Figure 7: Example FLAIR MRI from the BraTS data set, used for tumor segmentation.
Figure 8: Example chest X-ray from the CheXpert data set, from which the presence of several of fourteen findings is to be established.
Table 1: Summary of implemented medical imaging tasks.
Figure 9: Example of an elbow radiograph from the MURA data set.
Inference performance correlated strongly with log(K) (ρ = −0.98). The fraction of model parameters that resides locally is

$\eta = \frac{N_{front} + N_{back}}{N}$

The communication measure $\phi$ is defined in terms of the data set size $p$, the interface layer size $q$, the total parameter count $N$, the number of institutions $K$, and $\eta$. The relative communication load $\Omega$ is defined from $q$ and the quantities $v$ and $\tau$, with the requirement $\Omega < 1$.
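As a worked example of the locality fraction (the actual per-task counts appear in Table 2), the following assumes hypothetical parameter counts for the three segments:

# Hypothetical parameter counts; the thesis's measured values are in Table 2.
N_front, N_center, N_back = 1_200_000, 23_000_000, 5_000
N = N_front + N_center + N_back        # total number of model parameters
eta = (N_front + N_back) / N           # fraction of parameters residing locally
print(f"eta = {eta:.3f}")              # prints eta = 0.050: about 5% stays local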
Table 2: Tasks and implementation summaries. Number of parameters N, fraction of parameters that resides locally η, and size of the interface layers q.
Table 3: Effect of the number of participating institutions on performance and convergence.
Figure 10: Scatterplot of inference performance over log(K).
Figure 11: Scatterplots of convergence rates over log(K) for each implemented task, with linear trend lines.
Figure 12: The performance gain of collaboration. When a constant amount of data is split over a number of participating institutions, inference performance drops steeply without collaboration, while it remains constant when using Split Learning.
Table 4: Results on computational and communication requirements.
(Figure 12 plot: Accuracy and AUROC versus the number of institutions K; series: CheXpert and DRC, each with no collaboration and with Split Learning.)
5 Split Learning Innovation
Figure 13: Example of domain shift: Two semantically similar images from different scanners.
Table 5: Example of features (F) of several patients split horizontally. This is the case for most multi-center studies.
Table 6: Example of features (F) of several patients split vertically. This notion of partitioning is less common for medical data.
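A small NumPy sketch makes the two partitioning schemes concrete; the feature matrix and its sizes are purely illustrative.

import numpy as np

F = np.arange(24).reshape(6, 4)    # 6 patients (rows) x 4 features (columns)
# Horizontal partitioning (Table 5): centers hold different patients, same features.
center_1, center_2 = F[:3, :], F[3:, :]
# Vertical partitioning (Table 6): centers hold different features of the same patients.
center_a, center_b = F[:, :2], F[:, 2:]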
Figure 14: Diagram of data flow in Split Learning for vertically partitioned data.
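A minimal sketch of this vertical data flow, assuming two institutions that each hold half of the features for the same patients, and a label holder running the remainder of the network; all sizes are illustrative.

import torch
import torch.nn as nn

front_a = nn.Linear(2, 8)    # institution A's front segment, over its 2 features
front_b = nn.Linear(2, 8)    # institution B's front segment, over its 2 features
rest = nn.Sequential(nn.ReLU(), nn.Linear(16, 1))   # remainder, at the label holder

X_a, X_b = torch.randn(16, 2), torch.randn(16, 2)   # the same 16 patients, features split
Y = torch.randn(16, 1)
# Each institution transmits only its cut-layer activations, which are
# concatenated before the shared remainder of the model.
Y_hat = rest(torch.cat([front_a(X_a), front_b(X_b)], dim=1))
loss = nn.functional.mse_loss(Y_hat, Y)
loss.backward()    # gradients flow back to both institutions' front segments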
Figure 15: Example T2 (left) and FLAIR (right) MRI scans presenting domain shift. Visualization of glioblastoma in the T2 is based on the same physical properties as the FLAIR, but the images present a domain shift that is hard to correct using conventional preprocessing methods.
Table 7: Inference performance on trivial non-homogeneous data.
Table 8: Inference performance on real non-homogeneous data.
Figure 16: Performance for different weight sharing options.
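A minimal sketch of the local-adapter variant (the full algorithm is listed in Appendix 8.3), assuming each institution prepends a small, institution-specific layer that absorbs domain shift while the split model itself remains shared; module sizes and names are assumptions.

import torch
import torch.nn as nn

shared_front = nn.Sequential(nn.Linear(8, 32), nn.ReLU())   # shared across institutions
adapters = {name: nn.Linear(8, 8) for name in "AB"}         # local adapters, never shared

X = {"A": torch.randn(4, 8), "B": torch.randn(4, 8)}        # domain-shifted local data
for name, adapter in adapters.items():
    X_n = shared_front(adapter(X[name]))   # adapter maps the local domain to the shared one
    # X_n then continues through the shared center and back segments as before.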
6 Conclusions
7 Bibliography