Abstract

Artificial intelligence (AI) has become a critical innovation in healthcare, offering new opportunities for diagnosis, treatment and patient care. This article explores the impact of AI in medical imaging, particularly in breast cancer screening, where AI-driven tools have shown promise in improving cancer detection rates while reducing radiologists’ workload. By analysing large data sets, AI can identify subtle patterns and early-stage cancers that may be missed by human experts. Recent studies have shown that AI-assisted screening improves detection rates without increasing the number of false positives and also helps to minimise unnecessary biopsies. The integration of AI into traditional risk assessment models further improves prediction accuracy and paves the way for more personalised and effective screening strategies. However, the adoption of AI in medicine faces challenges, including public scepticism, privacy concerns and ethical considerations. This article discusses these challenges and highlights the importance of building trust and transparency in AI applications to maximise their potential in healthcare. Future research should focus on refining AI algorithms, building patient and provider trust, and ensuring that ethical standards are upheld in AI-driven medical practices.

Introduction

AI is a rapidly advancing field that enables machines and computers to mimic human intelligence and so perform tasks that previously required human oversight. The term was first coined in the 1950s, but the technology has only become mainstream in recent years, with tools such as ChatGPT driving its popularity. AI has now infiltrated almost every area of modern life, and medicine is no exception. However, there is significant scepticism amongst both the general population and professionals regarding the reliability and safety of the technology. This poses a central question: how can we increase public trust in the software so that AI can be fully utilised in medicine?

AI is not a new concept. However, unlike previous breakthroughs, which focused on computer vision, the most recent leap forward is in natural language processing (NLP). This is the technology that gives AI its human-like character, enabling it not only to understand and generate human language but also to analyse and synthesise other data types, such as images and videos. This side of AI is visible across social media platforms and has done much to make the public question the safety of the software. There have already been several scandals surrounding the ethics of AI, as life-like text, images and even videos have been published, underlining how difficult it is to identify artificially generated content. Furthermore, many studies have questioned how private data will be stored and safeguarded. The application of AI in medicine is therefore controversial: medicine involves personal data at every level and depends on a deep trust between patient and doctor. It is vital that patients also trust any software being used, yet at present many remain fearful of it.

Cancer is a disease in which some of the body’s cells grow uncontrollably and spread to other parts of the body. Cancer can develop almost anywhere in the human body, which is made up of trillions of cells. Normally, human cells grow and multiply to form new cells, and when cells age or become damaged, they die and new cells take their place. Sometimes this orderly process breaks down, and abnormal or damaged cells grow and multiply even though they should not. These cells can form tumours, which are lumps of tissue and may be cancerous (malignant) or non-cancerous (benign). Understanding the nature of cancer and its impact on the body is crucial in advancing effective diagnostic methods.

One example of AI’s uses can be found in imaging analysis. AI-driven tools assist healthcare providers by using algorithms to examine CT scans, X-rays and MRIs, in the hope of identifying lesions and abnormalities that human radiologists might overlook. One area of application is cancer detection. In a recent study, AI-assisted mammogram readings detected 20% more breast cancers than those interpreted by radiologists alone. This randomised controlled trial examined the mammograms of 80,000 women in Sweden, assigning each either to the current standard screening, in which two radiologists check the images, or to the AI group, in which one of the radiologists was replaced with AI software. Crucially, the false positive rate was equal in both groups. Furthermore, the screen-reading workload was reduced by 44%, significantly speeding up processing for these patients. The study has not yet concluded, as the aim is to screen 100,000 women in total. Results regarding reductions in “interval cancers” (cancers diagnosed between screenings) have also not yet been analysed; these will show whether there is a true increase in detection.

The following sections of this article take a closer look at the integration of AI in medicine, focusing in particular on its application in breast cancer screening. The article explores the promising results of recent studies and the potential benefits of AI-human collaboration in improving diagnostic accuracy, as well as the challenges and ethical concerns that have fuelled public scepticism, including issues of misdiagnosis, privacy and the rapid pace of change in AI.

Another study analysed nearly 1.2 million mammograms and found that using AI alongside radiologists improved cancer detection rates by 2.6% compared to radiologists working independently. Additionally, AI is helping to minimise unnecessary biopsies. The iBRISK tool, highlighted in Radiology: Artificial Intelligence, has shown the ability to accurately predict whether suspicious areas identified during mammograms are likely to be benign or malignant. By refining the decision-making process, AI can reduce the number of benign biopsies, which account for approximately 80% of all breast biopsies. 
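As a rough, back-of-the-envelope illustration of what is at stake, the short Python sketch below applies the 80% benign share quoted above to a hypothetical batch of 1,000 biopsies. The 30% “safe deferral” rate is an assumption invented for this example, not a reported iBRISK result.

    # Hypothetical illustration only: the 80% benign share comes from the
    # article; the 30% deferral rate is an invented assumption, not an
    # iBRISK figure.
    total_biopsies = 1000
    benign = int(total_biopsies * 0.80)            # ~80% of breast biopsies are benign
    assumed_deferral_rate = 0.30                   # hypothetical share a tool could safely rule out
    avoided = int(benign * assumed_deferral_rate)
    print(f"benign biopsies: {benign}, potentially avoided: {avoided}")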

Another field AI is revolutionising is individual risk prediction. Traditional risk models, such as the Breast Cancer Surveillance Consortium (BCSC) risk model, use factors such as age, family history, breast density and previous biopsy results. Whilst the BCSC risk model has been described as a “well-validated risk assessment tool”, a study published in June 2023 found that AI outperformed it in predicting five-year breast cancer risk. Furthermore, when AI was combined with the BCSC model, predictive accuracy was greater still, offering a more personalised and better-performing assessment of risk.
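To make the idea of “combining” an AI score with a traditional risk model concrete, the minimal Python sketch below fits a logistic regression over a clinical risk score and an AI image score, then compares discrimination with and without the AI input. Everything here is assumed for illustration: the data are synthetic, the effect sizes invented, and scikit-learn is simply one convenient tool; this is not the method of the cited study.

    # Minimal sketch: combining a BCSC-style clinical score with an AI image
    # score via logistic regression. All data below are synthetic.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n = 5000
    y = rng.binomial(1, 0.05, n)              # five-year cancer outcome (synthetic)
    clinical = 0.8 * y + rng.normal(0, 1, n)  # clinical risk-factor score (synthetic)
    ai_score = 1.2 * y + rng.normal(0, 1, n)  # AI score from the mammogram (synthetic)

    X_clin = clinical.reshape(-1, 1)
    X_both = np.column_stack([clinical, ai_score])

    auc_clin = roc_auc_score(y, LogisticRegression().fit(X_clin, y).predict_proba(X_clin)[:, 1])
    auc_both = roc_auc_score(y, LogisticRegression().fit(X_both, y).predict_proba(X_both)[:, 1])
    print(f"clinical only AUC: {auc_clin:.3f}; clinical + AI AUC: {auc_both:.3f}")

On this synthetic data the combined model achieves the higher AUC, mirroring the qualitative finding of the study: the AI input carries information that the clinical factors alone do not.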

Trials have been carried out to assess the effectiveness of AI in breast screening, testing its ability not only to detect cancers but to do so with a low false positive rate. The systems most commonly used in this field are deep learning models, typically convolutional neural networks; architectures such as recurrent neural networks (RNNs) and generative adversarial networks (GANs) are far less widely used.
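For readers unfamiliar with what such a deep learning model looks like, the following is a deliberately tiny PyTorch sketch of a convolutional classifier. The layer sizes, input resolution and single-logit output are illustrative assumptions; production screening systems are vastly larger and trained on millions of images.

    # Toy convolutional classifier in the spirit of the deep learning systems
    # described above. All sizes are illustrative, not from any cited system.
    import torch
    import torch.nn as nn

    class TinyMammogramCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, 1),  # one logit: suspicion of malignancy after sigmoid
            )

        def forward(self, x):
            return self.head(self.features(x))

    model = TinyMammogramCNN()
    fake_batch = torch.randn(4, 1, 256, 256)       # four synthetic single-channel images
    suspicion = torch.sigmoid(model(fake_batch))   # per-image malignancy score in [0, 1]
    print(suspicion.shape)                         # torch.Size([4, 1])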

False positives, in the context of breast cancer screening, refer to findings initially believed to be cancerous that are later determined to be benign. They present significant challenges, leading to unnecessary anxiety, additional tests and invasive procedures. Moreover, false positives have been shown to reduce future compliance with screening services. AI has shown promise in addressing this issue: a 2020 study demonstrated that AI reduced the rate of false positives by nearly 6% in the U.S. and 1.2% in the U.K. This reduction could spare many women the stress and potential complications of further testing and biopsies, and may encourage greater trust in, and thus better adherence to, breast cancer screening globally.
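Because the studies discussed in this article report their results as sensitivity, specificity and false positive rates, the small Python sketch below makes those definitions explicit. The confusion-matrix counts are invented for the example.

    # Screening metrics from confusion-matrix counts (illustrative numbers).
    def screening_metrics(tp, fp, tn, fn):
        sensitivity = tp / (tp + fn)          # share of cancers correctly flagged
        specificity = tn / (tn + fp)          # share of healthy exams correctly cleared
        false_positive_rate = fp / (fp + tn)  # complement of specificity
        return sensitivity, specificity, false_positive_rate

    sens, spec, fpr = screening_metrics(tp=85, fp=300, tn=9600, fn=15)
    print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, false positive rate {fpr:.1%}")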

Increasing detection of early-stage breast cancer is crucial. Early-stage cancers are often subtle and difficult to identify, yet the earlier breast cancer is detected, the better patient outcomes are; it is therefore vital that we utilise technology that enables earlier detection and so improves survival. AI’s ability to recognise patterns in vast datasets allows it to spot early signs that might be missed by even the most experienced radiologists. As demonstrated, combining AI with human expertise enhances the accuracy of breast cancer diagnoses and supports the use of the software in clinical practice.

The Collaboration of Human and AI: The Decision Referral Approach

The leading proposal for AI’s inclusion in breast cancer screening is a partnership between AI and radiologists. Leibig and colleagues proposed a decision referral approach (Leibig et al., 2022), in which the algorithm acts on its predictions only when they are sufficiently certain: screenings where the AI has a high level of certainty are handled automatically, while those with lower certainty are referred to radiologists for further examination. They evaluated the AI’s performance under this approach and as a standalone system, while also comparing it with the decisions of radiologists. Two data sets of digital mammograms collected between 2007 and 2020 were used: an internal-test data set containing 19,997 normal mammography exams and 1,670 cancer exams, and an external-test data set containing 80,058 normal exams and 2,793 cancer exams. All screenings came from eight sites participating in the German national breast-cancer screening programme. Each method was evaluated by sensitivity and specificity. The AI on its own had an average sensitivity of 84.4% and an average specificity of 90.4% across the two data sets, lower than the average unaided radiologist, but the decision referral approach improved sensitivity by 2.6% and specificity by 1.0%. Further testing is required to determine the degree of AI assistance needed to surpass a lone radiologist, but there is great potential for improved efficiency in such a collaboration. The method could again reduce radiologists’ workload by letting the AI handle the exams it is confident about, so that uncertain cases can be reviewed more thoroughly.
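A minimal Python sketch of the triage rule just described may help: exams on which the model is confident are resolved automatically, and everything else is routed to a radiologist. The thresholds below are illustrative assumptions, not values from Leibig et al. (2022).

    # Decision referral triage (sketch). Thresholds are illustrative only.
    def decision_referral(ai_score, low=0.05, high=0.95):
        """Return the AI's call for confident exams; refer the rest to a human."""
        if ai_score >= high:
            return "recall (AI, confident)"
        if ai_score <= low:
            return "normal (AI, confident)"
        return "refer to radiologist"

    for score in (0.01, 0.40, 0.98):
        print(score, "->", decision_referral(score))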

A Swedish study by Dembrower et al. (2023) examined how AI affects cancer detection and false positive findings in paired-reader examinations, motivated by the lack of prospective studies. The roughly 56,000 participants were all women aged 40-74 taking part in population-based screening in the region, and all screenings were carried out over a year between 2021 and 2022. The study compared standard double reading by two radiologists against double reading by one radiologist with AI support, single reading by the AI alone, and triple reading by two radiologists plus the AI. In all cases, cancer detection with AI involvement was non-inferior to reading without it; the combination of one radiologist and AI in fact detected 4% more cancers than standard double reading. Since all comparisons were non-inferior, it can be inferred that integrating AI would be beneficial given its faster working speed. This method also keeps human judgement in the screening process, which can reassure patients who are unsure of the validity of the results.

A study by Lang et al. (2023) randomly assigned 80,020 participants aged 40-80 to either standard double reading by radiologists (the control group) or AI-supported screening. Of the 46,345 AI-supported screen readings, 0.526% resulted in a detected cancer and 1.86% in a recall. In the standard arm, 83,231 readings were taken, with 0.293% resulting in a detected cancer and 0.982% in a recall. Each arm involved roughly 40,000 participants with a mean age of 54, and the false positive rate was the same in both groups at 1.5%. The screen-reading workload was reduced by 44.3% with AI. Per reading, the percentages of cancers detected and of recalls in the AI-supported arm were roughly double those of the standard arm, achieved with about half the number of readings: the AI requires fewer readings to flag potential tumours, whereas standard double reading spends more reader effort examining each abnormality to confirm that it is cancerous. There is therefore promise in the use of AI for breast cancer screening, its inclusion reducing radiologists’ workload by cutting the number of readings required to diagnose tumours. The implementation of AI in breast cancer screening thus has great potential, being most effective when used in tandem with radiologists, with the capacity to reduce workloads and improve the efficiency of screen reading.
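The headline figures above can be checked with a few lines of arithmetic. In the Python sketch below the counts are back-calculated from the percentages reported in the study, so they should be read as approximate.

    # Reproducing the arithmetic behind the Lang et al. (2023) figures quoted
    # above; counts are back-calculated from percentages, hence approximate.
    ai_readings, std_readings = 46_345, 83_231
    print(f"workload reduction: {1 - ai_readings / std_readings:.1%}")  # ~44.3%
    print(f"recalls (AI arm): {round(ai_readings * 0.0186)}")           # ~862 exams
    print(f"recalls (standard arm): {round(std_readings * 0.00982)}")   # ~817 exams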

Why People Distrust AI: Misdiagnoses and Fast Changes in Medical Technology

In recent years, there has been growing scepticism around AI. Researchers at the National Institutes of Health (NIH) found that although AI could often diagnose medical conditions correctly, it could not describe or explain the logic behind its diagnoses. This has led to the belief that AI still requires substantial evaluation before its integration into healthcare. In a survey carried out by the Health Foundation on attitudes to integrating AI into healthcare, 11% of NHS staff believed that its integration would reduce care quality, raising issues such as the potential loss of human interaction and the possibility of inaccurate diagnoses. Some AI systems have already produced misdiagnoses that human doctors could easily have avoided. An article on the Agency for Healthcare Research and Quality’s PSNet, for example, argued that while AI holds potential for improving diagnostic accuracy, significant concerns remain around misdiagnosis. These concerns arise from the fact that AI algorithms are only as good as the data they are trained on: when systems are trained on limited or biased data sets, they can produce errors that lead to misdiagnosis. Human doctors can catch such errors by learning from their mistakes; AI, as it stands, does not (Hall & Fitall, 2019).

Various studies have shown that while AI is improving at a fast pace, many people worry about its accuracy and how little we know about its inner workings (Pew Research Center, 2023). A 2023 Pew Research Center survey, for instance, revealed that 60% of Americans feel uncomfortable with the idea of their healthcare provider relying on AI for diagnoses and treatment recommendations (Pew Research Center, 2023). One notable case of AI in healthcare is IBM’s Watson for Oncology (Watson), an AI system designed to help doctors with cancer treatment. Scientists at Southwest Medical University, China had very high expectations for the system, expecting it to revolutionise cancer care (Zho et al., 2021). Later reports, however, suggested that Watson provided inaccurate and unsafe treatment recommendations. In 2019, IEEE Spectrum published an in-depth evaluation of Watson for Oncology, discussing the ethical challenges the system faced. The report pointed out that Watson’s recommendations were often based on limited data sets, which led to dangerous recommendations in clinical practice: because the AI was not fully developed, it recommended chemotherapy for patients who were not suitable for that type of treatment. The National Academy of Medicine notes that the reliability of AI depends on the data it is trained on; any bias or lack of depth in the data can lead to unreliable results (National Academy of Medicine, 2023). To put this in perspective, before AI was introduced into healthcare, humans were assisted by other humans, and people felt more comfortable knowing a human was involved (Pew Research Center, 2023). The power of the human touch is also highlighted in the literature on physician-patient communication: “patients consistently value the humanistic qualities of their doctors – such as empathy, communication, and understanding of their individual needs over technical expertise alone” (Levinson et al., 1997). Even though implementing AI technology appears beneficial, doctors remain sceptical because AI was incorporated into the system all at once, without adequate preliminary testing, leading to issues that could easily have been avoided, as IBM’s Watson for Oncology illustrates.

Although there are many challenges in integrating AI into the healthcare system, which may lead some to question its usefulness, it is important to note that AI has been successfully integrated into other areas, such as smartphones. People use AI on their smartphones all the time, through assistants like Siri, Google Assistant and Alexa, knowing AI is powering them, yet few complain. Why? According to Lifewire, AI has become a trusted and integral part of smartphones, with features like Siri, Google Assistant and facial recognition used daily by millions (George, 2024). Dr. Andrew Ng, a prominent AI researcher and co-founder of Google Brain, argues that the main reason people do not complain about AI features is that they were introduced gradually and have improved over time, becoming a norm in daily life; each new AI feature arrived in mature form after years of testing (Ng, 2018). Ng characterises this as a gradual development of new iterations, starting simple and progressively becoming more complex. Demonstrable improvement over time showcases the reliability of the software and builds trust. Gradually, more AI functions were added, making people more comfortable with their use. AI-powered voice assistants, for example, could at first complete only simple tasks, such as setting an alarm or answering basic questions. Over time, extensive testing allowed them to handle more complicated tasks, like controlling smart home devices or understanding natural language queries, with very few errors.

There have also been successful cases of technology being integrated into the medical system. Initially, people needed time to get used to the concept of technology in surgery, but they grew more comfortable as it demonstrated precision, reliability and success in minimally invasive procedures (Barbash and Glied, 2010; Finkelstein et al., 2010; Moe, 2012). Patients and surgeons were hesitant about the Da Vinci Surgical System, for example, but over time, as the system delivered improved surgical outcomes with fewer complications, reduced pain and faster recovery times, it became accepted in the medical community. Rather than everything being introduced at once, claims were made carefully and the product was subjected to rigorous testing. The robot was initially designed to enable remote surgery on battlefields, with the goal of helping surgeons achieve improved precision (Intuitive Surgical, date unknown). Before the technology could be released to the public, it had to undergo extensive training and testing, including trials in which surgeries were performed under strict guidelines by professionally trained doctors and surgeons, followed by FDA approval. The FDA initially rejected the system and asked for stricter controls during the manufacturing process. In 2000, it finally received FDA approval, though at first only for a limited range of procedures; over time, as more data became available, its applications diversified. Dr. Atul Gawande, a prominent surgeon and author, wrote: “Better is possible. It does not take genius. It takes diligence. It takes moral clarity. It takes ingenuity. And above all, it takes a willingness to try.” This reflects the ongoing effort needed in healthcare innovation, including AI in breast cancer detection, where continuous improvement is key (Gawande, 2007). The system’s introduction was accompanied by extensive public education and seminars informing both medical professionals and the general public about the benefits and risks of robot-assisted surgery. According to Dr. Myriam Curet, Chief Medical Officer of Intuitive Surgical, robot-assisted surgery enables a wider range of possibilities than open or laparoscopic surgery (Curet, 2019). This enhanced capability was crucial in gaining the confidence of surgeons and patients alike, ultimately helping the technology achieve widespread acceptance.

Conclusion

AI has enormous potential across its applications, but it still requires a great deal of training, refinement and development before it can be used worldwide. There is much to be done in public perception and in the working relationship between AI and doctors, but the benefits to healthcare and patient wellbeing that such a collaboration could provide are undeniable. AI can aid human radiologists in their analysis, allowing both to achieve better results than either would alone, and saving more lives in the process.

Bibliography

Barbash, G.I. & Glied, S.A. (2010). New technology and health care costs—the case of robot-assisted surgery. The New England Journal of Medicine, 363(8), pp.701-704.

Curet, M. (2019). The evolution of the Da Vinci Surgical System: How surgical robotics is reshaping surgery. Intuitive Surgical Inc. Available at: www.intuitivesurgical.com.

Dembrower, K., Crippa, A., Colón, E., Eklund, M., et al. (2023). Artificial intelligence for breast cancer detection in screening mammography in Sweden: a prospective, population-based, paired-reader, non-inferiority study. The Lancet Digital Health.

Finkelstein, J., Mullett, T., Hardy, M. & Yu, X. (2010). Learning curve and surgical outcome for robotic-assisted laparoscopic prostatectomy. Surgical Endoscopy, 24(5), pp.1051-1055.

Gawande, A. (2007). Better: A Surgeon’s Notes on Performance. New York: Picador.

George, M. (2024). AI in Everyday Devices: A Growing Trust. Lifewire. Available at: www.lifewire.com.

Ibrahim, S.A., et al. (2021). Diagnostic Errors, Health Disparities, and Artificial Intelligence: A Combination for Health or Harm? JAMA Health Forum, 2(9), pp.1021-1030. doi:10.1001/jamahealthforum.2021.2430.

Lang, K., Josefsson, V., Larsson, A.-M., Larsson, S., et al. (2023). Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. The Lancet Oncology.

Leibig, C., Brehmer, M., Bunk, S., Byng, D., et al. (2022). Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis. The Lancet Digital Health.

Levinson, W., et al. (1997). Physician-Patient Communication: The Relationship With Malpractice Claims Among Primary Care Physicians and Surgeons. Journal of the American Medical Association, 277(7), pp.553-559.

Moe, J. (2012). The development and future of robotic surgery. Annals of the Academy of Medicine, Singapore, 41(2), pp.219-223.

National Institutes of Health. (2024). NIH findings shed light on risks and benefits of integrating AI into medical decision-making. Available at: https://www.nih.gov/news-events/news-releases/nih-findings-shed-light-risks-benefits-integrating-ai-into-medical-decision-making.

Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), pp. 447-453.

Pew Research Center. (2023). How Americans View Use of AI in Healthcare and Medicine by Doctors and Other Providers. Pew Research Center. Available at: www.pewresearch.org.

The Health Foundation. (2024). Majority of NHS staff support using AI in patient care, major polling finds.

National Cancer Institute. (n.d.). What is cancer? Available at: https://www.cancer.gov/about-cancer/understanding/what-is-cancer.

Woolf, S.H., Masters, R.K. and Aron, L.Y. (2021). Effect of the COVID-19 pandemic in 2020 on life expectancy across populations in the USA and other high-income countries: simulations of provisional mortality data. BMJ, 373, n1343.