Introduction

Digitalization of medical records and integration of genomic methods into clinical practice have resulted in an increasing amount of data.1 Machine learning (ML) and deep learning (DL), as subdomains of artificial intelligence (AI), can extract meaningful insights from complex data structures. Applications are steadily increasing, especially in hematologic diagnostics. Modern AI models have the potential to predict diseases based on genetic data,2,3 classify cells and tissue4–7 and establish completely new correlations between disease and possible causes.8,9 In this context, AI technologies, particularly ML and DL, are revolutionizing the field of hematology-oncology, providing innovative solutions for the prediction of diagnosis, risk assessment, screening and patient prognosis estimation, as well as early diagnosis and treatment selection.10

In this review, our attention is centered on the progress made in ML and DL, specifically targeted toward the prediction of diagnosis and early detection of hematologic malignancies. Rapid detection of hematologic cancers is important due to the complexities involved in identifying the early signs of blood cancers.

Advancement in AI-based tools for prediction and screening of hematologic cancers

For decades, the diagnosis of hematologic malignancies has relied almost solely on phenotypic assessment of stained peripheral blood (PB) and bone marrow (BM) samples. Cytomorphology is still crucial in hematologic diagnostics, providing an initial diagnosis and guiding further diagnostic methods such as cytogenetics, immunophenotyping or molecular genetics to substantiate the result.3 But even the most experienced hematologists may miss patterns, anomalies and correlations among the growing number of blood parameters that modern laboratories can measure.11 An immense quantity of clinical information is not fully comprehended by experienced hematologists, which supports the use of pattern recognition for processing and identifying patterns.12 ML algorithms can effortlessly manage hundreds of parameters and identify and use the interactions between these numerous attributes, making this medical field particularly suitable for ML applications.11

Machine learning: A brief look behind the scenes

In order to computationally address a problem, the input (e.g., microscopic images, mutation profiles) is processed into the desired output (e.g., cell type classification, prognostic score) by executing a series of instructions (i.e., algorithm). Different operations are needed, depending on the type of task.3 For most tasks in hematology, rigid programming of all rules and exceptions is too demanding. Defining rules for the automatic classification of blood cells would quickly become burdensome, while for other tasks, such as the interpretation of genetic variants, a lack of knowledge would prevent the definition of a comprehensive set of rules to solve the problem. Here, ML can play an important role by generating predictive models that are trained directly from the original data without the need for explicit instructions.13

ML-based models

ML as a potential tool to perform clinical predictions for hematologic diseases has been increasingly explored in recent years.11,14 The capability of hematologic predictive ML algorithms based on patient symptoms or blood samples has been demonstrated. Gunčar et al. (2018) hypothesized that the unique “signature” of a specific hematologic disease, as reflected in blood test results, could be enough for an ML-based predictive model to recommend a probable diagnosis.11 For the first time, this study showed that ML, specifically using a random forest smart blood analytics (SBA) algorithm trained on extensive sets of hematologic disease laboratory blood values, can predict disease with accuracy comparable to experienced hematology specialists (Figure 1). Moreover, the model even outperformed internal medicine specialists by a significant margin. Considering the five most likely disease predictions, the classification accuracy of the SBA model reached approximately 87%, which is striking because the predictive models were built from only a small subset of the data typically required for a routine diagnosis. The study also found that many laboratory tests were not essential for correct diagnosis, highlighting significant information redundancy and interdependency in current laboratory tests. Thus, ML could help in early hematologic disease diagnoses using fewer laboratory tests.

Figure 1. Comparison of the accuracy of internal medicine specialists with machine learning (ML)-predictive models (SBA-HEM61 and SBA-HEM181).

A, Accuracy of the six hematology specialists compared with both predictive models when considering the five most likely predicted diseases. B, Accuracy of the six hematology specialists compared to both predictive models when considering only the most likely predicted disease. C, Accuracy of the eight non-hematology internal medicine specialists compared with both predictive models when considering the most likely predicted disease. SBA, smart blood analytics. Adapted from Gunčar et al. 2018.11
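The distinction between panels A and B (five most likely diseases versus only the most likely disease) corresponds to what is commonly called top-k accuracy. The following is a minimal sketch of that metric, not the evaluation code of the cited study; the function name, the disease labels D1 to D3 and the toy rankings are hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label is among the k highest-ranked predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked list per true label")
    if not true_labels:
        raise ValueError("need at least one case")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical toy cases: each list is a model's ranking, most likely disease first.
ranked = [
    ["D1", "D2", "D3"],
    ["D2", "D1", "D3"],
    ["D3", "D2", "D1"],
    ["D2", "D3", "D1"],
]
truth = ["D1", "D1", "D1", "D3"]

top1 = top_k_accuracy(ranked, truth, k=1)  # only the most likely prediction counts
top2 = top_k_accuracy(ranked, truth, k=2)  # a hit anywhere in the first two counts
print(top1, top2)
```

Because a larger k accepts more candidate diagnoses per case, top-k accuracy can only stay equal or rise as k grows, which is why accuracy figures for different k should not be compared directly.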

Pan et al. (2017) also introduced an ML-based model in childhood acute lymphoblastic leukemia (ALL) to predict recurrences and categorize patients into standard, intermediate and high-risk groups using 103 clinical variables and four classification algorithms.14 The study reached acceptable practical application levels, as it demonstrated a high area under the curve (AUC) predictive level of 0.904. In addition, a single-center, retrospective cohort study of 74 Philadelphia chromosome-positive (Ph-positive) ALL patients who received allogeneic hematopoietic stem cell transplantation (allo-HCT) found that ML can predict post-transplant relapses in Ph-positive ALL.15 This pilot study reported by Afanaseva et al. (2023) showed that, in some cases, minor fluctuations of the BCR::ABL1 fusion gene transcript under tyrosine kinase inhibitor (TKI) prophylaxis do not cause a relapse, especially in patients with chronic graft-versus-host disease (GvHD). However, even minor BCR::ABL1 levels demand increased therapy if there is no chronic GvHD.
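For readers less familiar with the AUC, it can be read as the probability that a randomly chosen positive case (e.g., a patient who relapsed) receives a higher model score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; it is illustrative only, and the function name, scores and labels are hypothetical rather than data from the cited studies.

```python
def roc_auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores; label 1 = event occurred, 0 = no event.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 1, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value such as 0.904 means that about nine out of ten positive/negative pairs are ordered correctly.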

Hauser et al. (2021) conducted a study on 1,623 patients to assess whether chronic myelogenous leukemia (CML) could be predicted before diagnosis using complete blood count (CBC) test results and ML algorithms (e.g., XGBoost and LASSO).16 The study considered factors like CBC test results, patients’ age, gender and the frequency of their clinic visits. Findings from this study showed that blood cell counts taken up to five years before the diagnosis of CML could accurately anticipate the BCR::ABL1 test outcome, which implies that an ML-based model using blood cell counts might help detect CML earlier than traditional methods.
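LASSO is useful when many candidate predictors are available because its L1 penalty can shrink some coefficients exactly to zero, leaving a sparser model. The sketch below shows this effect in its simplest linear-regression form using coordinate descent. It is not the model of the cited study, and the function names, toy features and penalty value are assumptions chosen for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink a value toward zero by `penalty`; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=100):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1, one coefficient at a time."""
    n = len(rows)
    n_features = len(rows[0])
    coefs = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            rho = 0.0
            norm = 0.0
            for row, target in zip(rows, targets):
                prediction = sum(c * x for c, x in zip(coefs, row))
                # Residual with feature j's own contribution added back in.
                partial_residual = target - prediction + coefs[j] * row[j]
                rho += row[j] * partial_residual
                norm += row[j] ** 2
            rho /= n
            norm /= n
            coefs[j] = soft_threshold(rho, penalty) / norm if norm > 0 else 0.0
    return coefs


# Hypothetical toy data: the target depends on feature 0 only; feature 1 is irrelevant.
rows = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
targets = [2, -2, 2, -2]
coefs = lasso_coordinate_descent(rows, targets, penalty=0.5)
print(coefs)
```

In this toy case the irrelevant feature receives a coefficient of exactly zero, while the informative one is shrunk from 2 to 1.5 by the penalty.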

Furthermore, Padmanabhan et al. (2023) recently used ML to help understand the complex immune changes in chronic lymphocytic leukemia (CLL) reflected in routine CBC alone.17 The authors reported 97–98% accuracy, concluding that ML-based methods using blood test data could lead to quicker medical responses.

DL-based models

DL is a subset of ML that uses densely interconnected computing neural networks with many layers (deep neural networks) to analyze various factors and aspects of data to learn and make decisions, mimicking the human brain. DL networks are highly flexible regarding the data input and output, as well as the architectural and parameter design, and, hence, are able to fit vast quantities of heterogeneous and unstructured data.3 DL prediction methods have been evaluated in recent years to uncover non-linear multi-omic fingerprints associated with a wide range of clinical and molecular features of tumor samples, including hematologic cancers, associated with patient survival (Figure 2).18 To date, interpretation of the models remains limited, and the models are vulnerable to overfitting if they are not trained on large, representative data sets. The issues of limited transparency, fit and standardized assessment procedures are further reasons for the limited implementation of ML-based models in clinical practice.19–21 Nevertheless, despite its limitations, DL has opened the door to efficient analysis of these data and holds promise for progress in clinical hematology and oncology.13,22

Figure 2. Schematic representation of the deep learning prediction method that can be used to uncover non-linear multi-omic patterns associated with a wide range of clinical and molecular features of blood cancers that predict patient survival.

MAUI, multi-omics autoencoder integration. Adapted from Uyar et al. 2021.18

Matek et al. (2021) provided proof-of-concept for the classification problem of single BM cells in their work on differentiation of bone marrow cell morphologies using deep neural networks on large image datasets.22 They applied convolutional neural networks (CNNs) to a large data set of more than 170,000 microscopic cytological images taken from BM smears from 945 patients diagnosed with a broad range of hematologic diseases and trained high-quality classifiers of leukocyte cytomorphology that identify a wide range of diagnostically relevant cell species with high precision and recall. Their CNNs outcompete previous feature-based approaches and represent a reference for future artificial intelligence-based approaches to BM cytomorphology.
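Precision and recall, the two measures used above to describe classifier quality, are computed per cell class. The sketch below illustrates the definitions on a handful of made-up labels. It is not taken from the cited work, and the class names and predictions are hypothetical.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Precision and recall for one class, treating all other classes as negatives."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0  # how many predicted hits are real
    recall = tp / (tp + fn) if tp + fn else 0.0  # how many real cases were found
    return precision, recall


# Hypothetical expert annotations and classifier outputs for six single-cell images.
true_labels = ["blast", "blast", "lymphocyte", "lymphocyte", "blast", "monocyte"]
predicted = ["blast", "lymphocyte", "lymphocyte", "blast", "blast", "monocyte"]

blast_precision, blast_recall = precision_recall(true_labels, predicted, "blast")
print(round(blast_precision, 3), round(blast_recall, 3))
```

Reporting both values per class matters in cytomorphology because rare cell types can have poor recall even when overall accuracy looks high.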

Hassouneh et al. (2019) showed that a deep neural network (DNN) prediction model can outperform ML in predicting leukemia patients’ survival chances.23 The final DNN model consisted of six hidden layers with 45 neurons each and a 25% dropout activation, using leukemia patients’ records for modeling. Research conducted by Carreras et al. (2022) used a multilayer perceptron artificial neural network (ANN) to discover new biomarkers for predicting the overall survival of patients with mantle cell lymphoma (MCL).24 The ANN approach was able to identify ten genes, five of which were associated with lower survival rates and the other five linked to improved survival outcomes. In addition, a single-center study evaluated 882 cases (457 hematologic malignancy and 425 hematologic non-malignancy cases) using seven different ML models.25 Among these, the ANN model outperformed the others, achieving the highest accuracy, precision, recall and AUC values (82.8%, 82.8%, 84.9% and 93.5% ± 2.6, respectively). These results suggest that the ANN algorithm could be an effective tool for clinical laboratory screening of hematologic malignancies and underscore the potential of applying ML more broadly in clinical practice.
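To make the described architecture more concrete, the following sketch builds a network with six hidden layers of 45 neurons and a dropout rate of 25% in plain Python. Only those three numbers come from the text above. The ReLU activation, the sigmoid output, the random untrained weights and the choice of 10 input features are assumptions made for illustration, so this is a structural sketch rather than the published model.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights (He-style scaling, an assumption) and zero biases for one layer."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, rate, rng, training):
    """Inverted dropout: randomly zeroes units during training, identity at inference."""
    if not training:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def build_network(n_features, rng, n_hidden=6, width=45):
    sizes = [n_features] + [width] * n_hidden
    hidden = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    return hidden, make_layer(width, 1, rng)


def predict(network, x, rng, rate=0.25, training=False):
    hidden, output = network
    for layer in hidden:
        x = dropout(relu(dense(x, layer)), rate, rng, training)
    logit = dense(x, output)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid output is an assumption


rng = random.Random(0)
network = build_network(n_features=10, rng=rng)  # 10 input features is hypothetical
probability = predict(network, [0.5] * 10, rng)
print(len(network[0]), "hidden layers, output:", probability)
```

Because the weights are untrained, the output value itself is meaningless. The point is the shape of the model and the fact that dropout is applied only during training, where it randomly silences a quarter of the units to reduce overfitting.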

Sirinukunwattana et al. (2020) used a DL approach to detect and delineate megakaryocyte features using reactive/non-neoplastic bone marrow trephines (BMT).26 A DNN was used to predict the locations of these cells in a sample by creating bounding boxes and scoring the presence of megakaryocytes within each box, demonstrating a balance between accuracy, computational complexity and speed. Following detection, image segmentation was necessary to divide the images into different regions and identify the boundaries of the megakaryocytes. Using this method, tissue diagnosis of myeloproliferative neoplasms (MPN) was achievable with remarkable predictive accuracy (with an AUC equal to 0.95), demonstrating the promising ability to differentiate between significant MPN subtypes. This technique visually presented abstracted features of megakaryocytes in relation to the analyzed patient groups, which enhanced the understanding and tracking of samples, surpassing traditional methods. Furthermore, this automated BMT phenotyping method holds great promise as a supplementary tool to regular genetic and molecular testing in confirmed or potential MPN patients.26

The detection of genetic aberrations is crucial for early therapeutic decisions in acute myeloid leukemia (AML) and is recommended for all patients; however, genetic testing is expensive and time-consuming. Cost-effective, fast and broadly accessible tests are needed to predict these aberrations. Eckardt et al. (2022) were among the first to report a multistep DL approach that could accurately predict the diagnosis of AML and molecular mutations.27 Using only image data from bone marrow smears, it distinguished between AML samples and healthy controls with an area under the receiver operating characteristic curve (AUROC) of 0.9699 and predicted the mutation status of nucleophosmin 1 (NPM1) with an AUROC of 0.92. Furthermore, they observed previously unreported morphologic cell features such as a pattern of condensed chromatin and perinuclear lightening zones in myeloblasts of NPM1-mutated AML and prominent nucleoli in wild-type NPM1 AML, which enabled the DL model to provide accurate class predictions. More recently, Kockwelp et al. (2023) demonstrated that a fully automated end-to-end deep learning pipeline can predict genetic aberrations directly from single-cell images from scans of conventionally stained bone marrow smears alone, on the day of diagnosis.28 They used a pipeline to compile a multiterabyte dataset of >2,000,000 single-cell images from the diagnostic samples of 408 patients with AML. Using these images, CNNs were trained to predict various therapy-relevant genetic alterations. The models were able to significantly predict subgroups and hold the potential to be used as a fast and inexpensive automated tool to screen patients with AML for therapy-relevant genetic aberrations directly from bone marrow smears without time delay. This may lead to similar approaches to other bone marrow disorders in the future.

The use of DNNs for predicting and screening hematologic malignancies in everyday clinical applications is still in its infancy. However, the Lung Cancer Prediction Convolutional Neural Network (LCP-CNN) for forecasting malignancy in lung nodules found by computed tomography (CT) scans has already gained U.S. Food and Drug Administration (FDA) clearance and is the first AI-powered support software being implemented in a clinical context to help clinicians make optimal decisions in early-stage lung cancer diagnosis.29,30

Differential diagnosis of hematologic cancers

Since flow cytometry is the preferred method for the immunophenotypic characterization of leukemia and lymphoma, Moraes et al. (2019) proposed a decision-tree ML approach for the differential diagnosis of distinct World Health Organization (WHO) categories of B-cell chronic lymphoproliferative disorders using flow cytometry data.31 Their suggested decision tree consisted of nodes functioning through logistic operations that bifurcate across the tree into groups of (potential) unique leukemia/lymphoma diagnoses. This approach was tested on blood and bone marrow samples from 283 patients with mature lymphoid leukemias/lymphomas, achieving a 95% accuracy rate, with 61% providing a single diagnosis and 34% giving multiple possible diagnoses. Furthermore, this method was proven accurate in an out-of-sample validation and reached final diagnoses through up to seven binary transparent decision nodes.31
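The structure of such a transparent decision tree can be sketched as a chain of binary nodes, each testing one measurement, with leaves that hold either a single diagnosis or several candidate diagnoses. The example below is purely structural. The marker names, thresholds and diagnostic categories are abstract placeholders, not the rules of the cited study and not clinically meaningful.

```python
def classify(node, sample, path=None):
    """Walk binary decision nodes to a leaf; return (candidates, decisions taken)."""
    path = [] if path is None else path
    if "candidates" in node:  # leaf: one or several possible diagnoses
        return node["candidates"], path
    answer = sample[node["marker"]] >= node["threshold"]
    path.append((node["marker"], answer))
    return classify(node["yes"] if answer else node["no"], sample, path)


# Hypothetical two-level tree; all names and thresholds are placeholders.
tree = {
    "marker": "marker_a",
    "threshold": 0.5,
    "yes": {"candidates": ["category_1"]},
    "no": {
        "marker": "marker_b",
        "threshold": 0.2,
        "yes": {"candidates": ["category_2", "category_3"]},
        "no": {"candidates": ["category_4"]},
    },
}

candidates, path = classify(tree, {"marker_a": 0.1, "marker_b": 0.9})
print(candidates, path)
```

Returning the path alongside the result is what makes this kind of model transparent: every diagnosis can be traced back to the sequence of yes/no decisions that produced it.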

In a recent literature review, ML was shown to accurately differentiate between samples from individuals with mature B-cell cancers and healthy people using computer-aided flow cytometry analysis.32 Duetz et al. (2020) reported that by combining flow cytometry with clustering techniques (like FlowSOM) and ML methods (like random forest), diagnostic precision in different hematologic malignancies such as acute myeloid leukemia (AML) can be enhanced.32

The International Staging System (ISS) and the Revised International Staging System (R-ISS) are often used to predict outcomes in multiple myeloma (MM).33 Mosquera Orgueira et al. (2022) found that combining ML with the ISS or R-ISS offers a promising way to improve MM classification.33 The ML-based grouping was based on data from three clinical trials from the Spanish Myeloma Group, namely GEM05 under 65 years, GEM05 over 65 years and GEM2012 under 65 years. The biomarkers used included cytogenetic analysis/fluorescence in situ hybridization (FISH) on whole bone marrow or CD138-selected plasma cells, including t(4;14), t(14;16) and 17p deletion, immunoglobulin light and heavy chain type, Durie-Salmon staging, monoclonal spike in blood and urine, hemoglobin, creatinine, albumin, albumin-adjusted calcium, β2-microglobulin, elevated LDH and percentage of plasma cells in the bone marrow aspirate smear. The authors could show that ML-based grouping was more accurate than the ISS and R-ISS scores, particularly in classifying medium-risk MM patients. Furthermore, older patients in the low-risk group showed improved survival when treated with a certain medication.33

ML and DL: A new hope or mock giant?

While the enthusiasm for ML and DL in the field of hematology and oncology is generally high, there are still relevant challenges to address, such as the overfitting of data and poor model transportability. General limitations such as data quality and availability, generalization and overfitting, interpretability and explainability, as well as the integration in clinical workflows with its regulatory and ethical considerations, demand that AI models undergo rigorous validation and approval processes before being used in patient care. This includes diverse and representative data of the entire population to help minimize biases, as well as a multidisciplinary collaboration of hematologists, data scientists, ethicists and patients in the model development process.

Research by Chekroud et al. (2024) highlights a significant challenge: a model can perform very well in one dataset but often fails when applied to truly independent clinical trials.34 This indicates a crucial need for rigorous testing on diverse and independent patient samples to ensure that AI models in healthcare are reliable and generalizable. As an example in hematology, the use of computer vision for cell classifiers is influenced by preclinical handling, such as the preparation and staining of the blood smear and the microscopic setup. This can produce input data of varying quality, and the model must be able to adjust for it, for example by normalizing the brightness of the images.
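One very simple form of the brightness normalization mentioned above is to rescale every image so that its mean intensity matches a common target. The sketch below assumes grayscale images stored as nested lists with values from 0 to 255. The function name, the target value of 128 and the toy pixel values are hypothetical, and real pipelines typically use more elaborate stain and color normalization.

```python
def normalize_brightness(image, target_mean=128.0):
    """Scale pixel intensities so the image mean equals target_mean, clipped to 0..255."""
    pixels = [value for row in image for value in row]
    if not pixels:
        raise ValueError("image must contain at least one pixel")
    mean = sum(pixels) / len(pixels)
    if mean == 0:
        return [list(row) for row in image]  # an all-black image cannot be rescaled
    gain = target_mean / mean
    return [[min(255.0, max(0.0, value * gain)) for value in row] for row in image]


# Hypothetical example: the same field of view captured at two illumination settings.
dark_scan = [[20, 40], [60, 80]]
bright_scan = [[40, 80], [120, 160]]

normalized_dark = normalize_brightness(dark_scan)
normalized_bright = normalize_brightness(bright_scan)
print(normalized_dark)
print(normalized_bright)
```

After normalization the two scans become practically identical, so a downstream classifier sees the cell rather than the microscope's lamp setting.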

Conclusions

ML is emerging as a powerful tool for predicting hematologic diseases. Recent studies have shown that ML algorithms can analyze patient symptoms and blood records to accurately detect disease, often matching or exceeding the accuracy of experienced specialists. These algorithms can also identify redundant tests, potentially reducing the number of laboratory tests needed for diagnosis. In addition, ML models have successfully predicted disease recurrence, categorized risk groups and anticipated test outcomes. Furthermore, DL methods have been shown to outperform traditional ML methods in predicting patient survival rates in leukemia and discovering new biomarkers for MCL. One issue that remains to be solved is how to verify that algorithms are trained and validated correctly, so that biases in the training data do not lead to unequal or unfair results, especially when the data are not adequately tested for diversity and representativeness. The generalizability of models is an urgent need in the field of AI, and AI applications should be held to the same validation standards as new therapies or classic diagnostic procedures. As research continues, ML and DL technologies hold immense promise for significantly enhancing our ability to detect, diagnose and predict outcomes in hematologic malignancies, leading to earlier diagnoses and more personalized and effective treatment strategies.

Key message

AI methods such as ML and DL will substantially change the daily practice in hematology and oncology in the future.


Conflict of interest

The authors have declared that the manuscript was written in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Funding

The authors have declared that no financial support was received from any organization for the submitted work.

Author contributions

All authors have approved the final manuscript.