Breast Imaging Goes Deep: A Look At Deep Learning


Deep Learning graphic

Introduction: Definition of terms

Deep learning falls under the general category of artificial intelligence (AI). AI is the simulation of human intelligence processes by machines, particularly computer systems. While AI encompasses the idea of a machine that can mimic human intelligence, machine learning (ML) aims to teach a machine how to perform a specific task and yield accurate results by identifying patterns. Deep learning (DL), in turn, is a form of ML whose models can recognize complex patterns in pictures, text, sounds, and other data to produce accurate insights and predictions. A convolutional neural network (CNN) is a form of deep learning that is used primarily for image recognition, due to its ability to recognize patterns in images. In summary, AI is the largest circle and encompasses all of the other circles: ML, DL, and CNN. CNN is a subset of DL, which is a subset of ML, which is a subset of AI.

Of all the fields in medicine, radiology, and particularly breast imaging, is well suited for the development of algorithms for AI. Most breast imaging exams are classified in a binary fashion (e.g., benign vs malignant) and almost all studies have an easily identified truth (e.g., histopathology or negative imaging follow-up). Thus far, smaller studies have shown that AI tools increase diagnostic accuracy, improve breast cancer risk assessment, and predict response to cancer therapy. AI is also being used to improve image reconstruction so that higher quality images can be obtained with lower radiation doses, and to shorten acquisition times for breast MRI. Going forward, AI may be used to automate simple radiologic tasks such as removing completely normal exams from the radiology work list. Of the 20 million screening mammograms done each year in the US, over 99% are completely normal. DL models could serve as an independent reader of these ultra-normal mammograms, which would allow the radiologist more time to tackle more challenging cases. Fortunately, computer algorithms do not suffer from fatigue or distraction and are uniquely suited for basic repetitive tasks. “AI can identify complex patterns and imaging data that are not appreciated by the human eye…”

Deep Learning hierarchy graphic

“Traditional machine learning, a subfield of AI, was used in the 1990s and 2000s to develop computer-aided detection (CAD) software for mammography”. Though CAD received FDA approval in 1998 and became widely utilized, more recent, larger studies have shown that it produces many false positives and does not seem to improve diagnostic accuracy. Because of this, it has fallen largely out of favor. DL first gained attention in 2012. Since 2016, DL has exploded onto the scene in its application to radiology in general and to breast imaging in particular. “DL models not only classify input images as positive or negative, but they figure out which imaging features are needed to perform this classification without expert input”. Most DL algorithms for medical imaging use CNNs with millions of variables. Most DL models are trained using a supervised learning technique, which means that they are trained using many labeled examples. Once a CNN has been trained, it is tested on cases that were not used during the training period. Under ideal circumstances, the CNN's performance is further validated on a data set from an outside institution. In general, the more data loaded into an AI system, the better the results. “Since DL networks learn complex representations of images not appreciated by the human eye, they have the potential to identify new unseen patterns in data, transcending our current knowledge of disease diagnosis and treatment…”
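The train, test, and outside-validation workflow described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the exam IDs, labels, and the 80/20 split are hypothetical and do not come from any of the cited studies, and no actual CNN is trained here. The point is simply that test cases are held out from training, and that an outside institution's cases stay separate from both.

```python
import random

def split_cases(cases, test_fraction=0.2, seed=0):
    """Shuffle labeled cases and hold out a test set that training never sees."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # First n_test shuffled cases become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled exams from one institution: (exam_id, label),
# where 1 = malignant and 0 = benign. Real labels would come from
# histopathology or negative imaging follow-up.
internal_cases = [(f"site_a_exam_{i}", i % 2) for i in range(100)]

# Hypothetical exams from an outside institution, kept entirely apart
# so they can serve as an external validation set.
external_cases = [(f"site_b_exam_{i}", i % 2) for i in range(30)]

train_set, test_set = split_cases(internal_cases)

# No exam may appear in more than one of the three sets.
assert not set(train_set) & set(test_set)
assert not set(external_cases) & set(internal_cases)

print(len(train_set), len(test_set), len(external_cases))
```

Fixing the random seed makes the split reproducible, so the same exams stay in the test set every time the model is retrained.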

Though screening mammography decreases breast cancer mortality by 20-35%, it is far from perfect. The diagnostic accuracy of mammography varies widely even among experts, with sensitivity and specificity ranging from 67 to 99% and 71 to 97% respectively. “DL has the potential to improve these metrics, both increasing cancer detection rates and decreasing unnecessary callbacks. Several retrospective and reader studies have already shown AI model performance at or beyond the level of expert radiologists.” Digital breast tomosynthesis (DBT, also called 3D mammography) is much more complex but has the potential to improve AI performance even further. It takes the average radiologist 50% more time to interpret a DBT exam than a two-dimensional mammogram. Consequently, AI tools are being developed not just to detect more cancers but to improve clinical efficiency.
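For readers less familiar with the two metrics quoted above: sensitivity is the fraction of true cancers that a reader calls positive, and specificity is the fraction of cancer-free exams that a reader calls negative. The short Python sketch below shows the calculation. The ten exams in it are made-up illustrative values, not data from any study.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary labels, 1 = cancer, 0 = no cancer."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancers correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # unnecessary callbacks
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical reading of ten exams (illustrative values only).
truth     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy example one of four cancers is missed and two of six normal exams are called back. Raising sensitivity means fewer missed cancers, while raising specificity means fewer unnecessary callbacks.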

One of the most important AI studies of mammography and digital breast tomosynthesis to date was published by Lotter et al. (Nat Med 2021; 27:244-49). Their DL model outperformed five expert breast imagers in a reading study for both 2D and 3D mammography, and it was validated using data from several national sites and one international site. Calcifications are also important in mammography, yet more than half of calcifications classified as suspicious yield benign pathology. Several small studies have noted an increase in diagnostic accuracy using DL algorithms, though larger studies are needed.

Currently, risk assessment models such as Tyrer-Cuzick are used to determine whether a woman is at high risk for developing breast cancer. Radiologists have known for many years that certain breast patterns are associated with a higher risk of a woman developing breast cancer. Several studies, including Yala et al. (Sci Transl Med 2021; 13: 578), have developed DL models using mammograms or MR images that outperform the Tyrer-Cuzick model and have been validated on large sets of images from the US, Europe, and Asia. A prospective trial called ScreenTrustMRI is currently recruiting patients to predict the risk of future breast cancer. DL tools have also been developed to assess mammographic density and have been implemented at both academic and clinical radiologic centers.

Ultrasound graphic showing the difference between malignant and benign tumors.

Though breast ultrasound can be used as a supplemental screening modality, increasing cancer detection rates beyond those of mammography alone, in many cases it demonstrates low specificity and prompts unnecessary biopsies. The ability of DL to segment the breast by ultrasound or MRI is currently state-of-the-art. Segmentation of breast tissue by ultrasound or MRI is a two-step process: the breast area must first be separated from the chest wall, and the breast tissue is then segmented into fibroglandular tissue, fatty tissue, and tumor. Zhou et al. (Radiology 2020; 294:19-28) used ultrasound images of the primary tumor to predict the presence of axillary node metastases with an accuracy of approximately 90%.

Currently, several papers show that DL methods for breast MRI outperform traditional machine learning. However, three publications on DL for breast MRI have shown mixed results when comparing DL to a radiologist trained in breast imaging. Additional large studies are needed to determine how DL compares to human experts. To date, the largest studies include only a couple of thousand breast MRI exams. In work similar to that done with mammography, DL has been used to predict five-year breast cancer risk directly from breast MRI MIP images, outperforming the standard-of-care Tyrer-Cuzick model. In smaller studies, DL models using breast MRI images have been developed to predict breast tumor molecular subtypes, Oncotype Dx recurrence score, and axillary nodal status.

Finally, DL techniques to predict the response to neoadjuvant chemotherapy are an area of great interest. The I-SPY2 trial found that combinations of MRI features can predict pathological treatment response using basic logistic regression analysis (a classification algorithm used to predict binary outcomes such as benign versus malignant).
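To make the idea of logistic regression concrete, here is a minimal Python sketch that fits a two-feature model by plain gradient descent and outputs a probability for a binary outcome. Everything in it is an assumption made for illustration: the two feature names, the eight toy patients, and the training settings are hypothetical and are not the features, data, or method used in I-SPY2.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, x):
    """Probability of the positive class for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_probability(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical toy data: each row is (fractional tumor volume decrease,
# fractional enhancement decrease) on MRI; label 1 = responder, 0 = non-responder.
features = [(0.90, 0.80), (0.80, 0.70), (0.85, 0.60), (0.70, 0.90),
            (0.20, 0.10), (0.30, 0.30), (0.10, 0.20), (0.40, 0.20)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic(features, labels)
p_high = predict_probability(weights, bias, (0.9, 0.9))
p_low = predict_probability(weights, bias, (0.1, 0.1))
print(f"large decrease: {p_high:.2f}, small decrease: {p_low:.2f}")
```

The model combines the features into a single weighted score and converts it to a probability, so a patient with large decreases in both hypothetical features receives a probability above 0.5 and one with small decreases receives a probability below 0.5.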


“DL tools for breast imaging interpretation are being developed at a rapid pace and are likely to transform the clinical landscape of breast imaging over the coming years. Notably, DL mammography tools for breast cancer detection and breast cancer risk assessment demonstrate performance at or above human-level, and prospective trials are warranted to pave the way for clinical translation”.


Dr. Alan Stolier, MD, FACS, clinical breast oncologist, shares his expert medical perspective with a series of educational and scientific articles.