AI in Healthcare: Counteracting Algorithmic Bias

Emma Stone


Instructor’s Note

In WR152: The Philosophy and Ethics of Artificial Intelligence, students reckon with the revolutionary potential that AI promises, as well as the new threats that it poses to political and social life. We consider how AI systems are already integrated into our daily experiences, whether these be mundane, like content streaming recommendations and dating apps, or serious, such as diagnosing cancer. Students explore how AI will challenge us to reinvent the global economy, rethink how education and learning happen in the classroom, and reconsider what it means to be human—that is, of course, if the “human” does not become an obsolete concept altogether. Emma’s essay, “AI in Healthcare: Counteracting Algorithmic Bias,” surveys some of the many ways that AI is used in healthcare while also exposing its great potential for harm through algorithmic bias; as Emma so beautifully explains, these systems have the potential to entrench and exacerbate the very inequalities they are meant to mitigate. Emma’s essay is a model research paper: it devotes itself to careful and accurate description of the problem in a way that makes the topic accessible to a broad audience, and yet it never compromises expertise or depth through oversimplification. The essay ends with a strong call to action about how we can motivate fairness in medical AI, bringing all stakeholders to the table to ensure a more equitable future.

Christopher McVey

From the Writer

The significance of AI algorithms, especially in healthcare, is often overlooked despite their prevalence in our daily lives. AI algorithms are revolutionizing patient care and treatment in many settings, from analyzing X-ray results for tumors to determining patient health risk, yet their application carries immense risk. Bias is a pervasive issue in AI systems, exacerbating pre-existing inequities in healthcare by introducing discriminatory predictions based on socioeconomic status and race. This paper aims to explore the mechanisms of bias in healthcare AI systems and propose concrete, actionable solutions to ensure equitable treatment and care for all patients.


AI in Healthcare: Counteracting Algorithmic Bias

I. Introduction

Artificial intelligence (AI) is being widely adopted across healthcare settings. Current applications of AI are changing how patients are managed, diagnosed, cared for, and treated. AI is being utilized to improve radiology and medical imaging, assist with diagnosis, improve clinician workflows, and enhance hospital administration and operation (Liefgreen et al. 2023). It is a revolutionary technology that is increasing efficiency and reducing healthcare costs across the medical field. Yet the rapid integration of AI into healthcare is deeply concerning. Inequities, disparities, and discrimination in patient care, treatment, and health outcomes are rampant in the current healthcare system, and biased AI algorithms have the potential to exacerbate these problems.

The World Health Organization reports that social determinants of health such as education, employment, food security, and income can account for up to 55% of health outcomes (Edwards 2022). Pervasive bias and inequity can arise when these social determinant variables are included in AI tools because algorithms function by finding correlations among variables to generate predictions. Health disparities already disproportionately affect people facing systemic barriers due to race, ethnicity, gender, socioeconomic status, sexual orientation, and geographic location (Edwards 2022). Fairness and equity must be inherent to the design of medical AI. If AI is not integrated into our healthcare system with great moral and ethical integrity and awareness of existing biases and inequities, people who are already marginalized in healthcare will experience even greater disparity. This paper introduces the mechanisms of algorithmic bias in medical AI, provides an ethical framework for examining and addressing such bias, recommends solutions, and suggests ways to motivate developers to leverage these solutions.

II. Mechanisms of Bias in AI

AI tools are built from algorithms that draw correlations from large datasets of many variables to generate more accurate and reliable decisions and predictions. However, AI algorithms and the datasets they are trained on tend to be very complex and opaque, which allows strong implicit biases and discrimination to enter AI-generated predictions. This phenomenon is called algorithmic bias. Algorithmic bias can be defined as inequality of algorithmic outcomes between two groups belonging to different morally relevant reference classes, such as gender, race, or ethnicity. Algorithmic bias occurs when the outcome of the algorithm’s decision-making treats one group better or worse without good cause.
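To make this definition concrete, the following minimal Python sketch, using fabricated predictions and group labels that come from none of the cited studies, measures one simple form of outcome inequality: the gap in the rate of favorable algorithmic decisions between two reference classes.

```python
import numpy as np

def outcome_gap(predictions, group_labels, group_a, group_b):
    """Gap in the rate of favorable outcomes between two reference classes.

    A value near zero means the algorithm's decisions fall out similarly for
    the two groups on this one metric; a large gap signals possible bias.
    """
    predictions = np.asarray(predictions)
    group_labels = np.asarray(group_labels)
    rate_a = predictions[group_labels == group_a].mean()
    rate_b = predictions[group_labels == group_b].mean()
    return rate_a - rate_b

# Fabricated binary decisions (1 = flagged for extra care) and group labels.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(outcome_gap(preds, groups, "A", "B"))  # 0.75 - 0.25 = 0.5
```

A single metric like this cannot capture fairness on its own, but it illustrates what “inequality of algorithmic outcomes” means in measurable terms.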

It is important to distinguish between types of bias in medical AI, so that developers, stakeholders, and clinicians are aware of the many ways bias can be introduced into these tools. Defining the differences in algorithmic bias allows for targeted and comprehensive prevention and mitigation. There are many mechanisms of bias in AI. This paper will focus on the most common forms of algorithmic bias: minority bias, missing data bias, technical bias, and label bias.

Minority bias occurs when minority groups are under- or overrepresented in a dataset (Ueda et al. 2023). When groups are represented disproportionately in the dataset, the algorithm lacks sufficient data to generate accurate correlations and predictions for the underrepresented groups. Minority bias is very common in healthcare applications of AI because large, diverse sets of health data are difficult to collect due to privacy protections and regulations (Liefgreen et al. 2023).
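One hedge against minority bias is to audit representation before training. The sketch below, with made-up record counts and an assumed population benchmark rather than any real figures, compares each group’s share of the training data to its share of the population the tool would serve.

```python
import pandas as pd

# Hypothetical training records with a self-reported race/ethnicity field.
train = pd.DataFrame({"race": ["white"] * 820 + ["black"] * 90
                              + ["asian"] * 60 + ["other"] * 30})

# Assumed (illustrative) share of each group in the population the tool serves.
population_share = pd.Series({"white": 0.60, "black": 0.13,
                              "asian": 0.06, "other": 0.21})

report = pd.DataFrame({
    "dataset": train["race"].value_counts(normalize=True),
    "population": population_share,
})
report["ratio"] = report["dataset"] / report["population"]
# Ratios far below or above 1.0 flag under- or overrepresented groups.
print(report.sort_values("ratio"))
```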

Missing data bias occurs when data is missing from a dataset in a nonrandom way (Ueda et al. 2023). This is particularly relevant for big data, which is data organized in matrix-like form containing many classifiers and inputs. If big data is missing values in a nonrandom way, the algorithm can draw correlations between the missing data and other classifiers, skewing its predictions. This can arise when human input is involved in building a dataset and there are consistent errors in data entry (Saxena et al. 2021).
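A quick audit for this kind of nonrandom missingness is to compare missingness rates across groups. The sketch below fabricates a dataset in which blood pressure readings go missing far more often for one clinic’s patients and then surfaces that pattern; the column names and rates are illustrative assumptions only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "clinic": rng.choice(["urban", "rural"], size=1000),
    "blood_pressure": rng.normal(120, 15, size=1000),
})
# Simulate nonrandom missingness: rural records lose readings far more often.
df.loc[(df["clinic"] == "rural") & (rng.random(1000) < 0.40), "blood_pressure"] = np.nan
df.loc[(df["clinic"] == "urban") & (rng.random(1000) < 0.05), "blood_pressure"] = np.nan

# Missingness rate per group: a large difference means values are not missing
# at random, so a model could learn the missingness pattern itself as a biased signal.
print(df["blood_pressure"].isna().groupby(df["clinic"]).mean())
```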

Technical bias refers to bias that occurs when the features of detection are not as reliable for some groups as they are for others (Ueda et al. 2023). This is mainly due to a failure or limitation of the detection tools or methods used to gather data. A familiar example is that melanoma is harder to detect in darker skin than in lighter skin because discoloration is easier to recognize on fair skin. When the bias of detection methods and tools is translated into an algorithm, it is considered technical bias.
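In practice, technical bias shows up as a gap in per-group detection performance. The sketch below, with fabricated labels and predictions for two skin-tone groups, compares sensitivity (the true-positive rate); the numbers are invented purely to illustrate the pattern.

```python
import numpy as np

def sensitivity(y_true, y_pred):
    """True-positive rate: the share of actual melanomas the model flags."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return (y_pred[y_true == 1] == 1).mean()

# Fabricated evaluation labels and predictions, split by skin-tone group.
results = {
    "lighter skin": ([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]),
    "darker skin":  ([1, 1, 1, 1, 0, 0], [1, 0, 0, 0, 0, 0]),
}
for group, (y_true, y_pred) in results.items():
    # 0.75 vs. 0.25: the detection method works far less reliably for one group.
    print(group, sensitivity(y_true, y_pred))
```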

Label bias refers to bias that leads to inaccurate decisions due to ill-defined labels (Ueda et al. 2023). AI algorithms use classifiers and labels to draw correlations and make predictions. If labels, classifiers, and parameters are not chosen appropriately, AI can generate biased outcomes, for example by using a label such as zip code as a proxy for a social classifier such as race. When labels are not thoughtfully considered, algorithms can mistake spurious correlations for causal predictors.

III. Fairness and Ethical Frameworks for AI

Fairness must be inherent to the design of medical applications of AI to prevent algorithmic bias and its downstream consequences of discrimination and inequity. Ueda et al. define fairness in healthcare as “a multidimensional concept that includes the equitable distribution of resources, opportunities, and outcomes among diverse patient populations” (Ueda et al. 2023). Fairness refers to striving for equity in all stages of healthcare, and it can be built into AI by applying globally accepted ethical principles. In a global meta-analysis of private and public guidelines for ethical AI, Jobin et al. found eleven recurring ethical principles for AI algorithms: transparency, justice and fairness, non-maleficence, responsibility, privacy, beneficence, freedom and autonomy, trust, dignity, sustainability, and solidarity (Jobin et al. 2019).

However, these principles alone do not create an adequate ethical framework for developing fair AI tools in healthcare. Medicine is an ever-evolving field, and much of its knowledge is still to be discovered, so medical decisions carry varying degrees of ambiguity. Medical practice frequently relies on correlation, not causation, among symptoms to diagnose and treat. Unlike fields such as mathematics, there are not always black-and-white truths or principles to support medical decisions about diagnosis, treatment, and patient care. Starke et al. explain how bias can be introduced in healthcare when the limitations of current medical knowledge require assumptions or reliance on correlations. They assert, “Most biological differences only become meaningful in medicine if they are correlated with symptoms and complaints—a process that is by definition highly conventional and ultimately also pragmatic” (Starke et al. 2021). We must therefore consider the limitations of scientific knowledge in our examination of biased AI. In many cases, AI algorithms must rely on correlations among classifiers that are not proven to be causal in order to predict medical outcomes, so each application of medical AI must be examined individually to determine which classifiers are necessary and useful for that specific purpose. Developers must balance a reliance on correlative data against the risk of introducing or perpetuating bias and inequity. Fairness must be integrated into healthcare AI tools through a versatile and adaptable ethical framework that pragmatically balances correlative prediction and bias while upholding the global principles for ethical AI.

IV. Bias Mitigation

I propose three actionable solutions to mitigate algorithmic bias and achieve fairness under a pragmatic ethical framework: 1) use datasets that sample from diverse populations, 2) pre-process big data, and 3) label datasets with suitable social category classifiers. Diverse datasets can be used to address and prevent minority and technical bias. Liefgreen et al. argue, “Training datasets should ideally be diverse along three dimensions: (i) individual, considering different biological factors, such as age, sex, and race; (ii) population, reflecting diverse disease prevalence, access to healthcare, and cultural factors; and (iii) technical, containing data originating from different types of medical machinery, using various acquisition or reconstruction parameters” (2023). It is difficult to gather data that samples from populations diverse on all of these dimensions, and data privacy laws and regulations protecting patients can hinder the collection of such a dataset. Still, training datasets should contain as much diversity as possible. When algorithms appropriately represent minority groups, the accuracy of predictions for those groups increases, and the general performance and accuracy of the entire algorithm improve as well. Diversity across all three dimensions may be difficult to achieve in every scenario, but developers can look to the specific application of their AI to determine what the scenario requires. Developers may have to compromise diversity along one dimension to ensure that diversity along another, more crucial dimension is upheld. For example, an algorithm used for detecting anomalies in X-rays may not need to focus on technical diversity if different brands of X-ray machines do not produce varying results. Where it does matter, diversity along the technical dimension also reduces technical bias: if datasets contain data gathered by diverse detection methods, such as different modes of measurement or classes or brands of tools, then algorithms are less likely to draw correlations driven by the method of detection.
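As a rough illustration of how such balance might be enforced, the sketch below stratifies a hypothetical imaging dataset across individual and technical dimensions and samples evenly from each cell. The column names, per-stratum count, and metadata are assumptions made for the example, not a prescription from the cited work.

```python
import pandas as pd

def stratified_sample(records, strata_cols, per_stratum):
    """Draw up to `per_stratum` records from every combination of strata,
    so no (age band, sex, scanner vendor) cell dominates the training set."""
    return (
        records.groupby(strata_cols, group_keys=False)
        .apply(lambda g: g.sample(min(len(g), per_stratum), random_state=0))
    )

# Hypothetical imaging metadata covering individual and technical dimensions.
records = pd.DataFrame({
    "age_band": ["18-40", "41-65", "65+"] * 200,
    "sex": ["F", "M"] * 300,
    "scanner_vendor": ["VendorA"] * 400 + ["VendorB"] * 200,
})
balanced = stratified_sample(records, ["age_band", "sex", "scanner_vendor"], per_stratum=30)
print(balanced.value_counts())  # every stratum now contributes equally
```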

Pre-processing big data addresses missing data bias. Big data is increasingly being used in AI tools to develop more robust and accurate predictive models. Williams et al. illustrate big data with the image of a piece of graph paper that represents the data for one person (2018). The columns represent different classifiers of that person, and the rows in each column represent all possible values for that classifier. The square that matches the person’s value for a classifier is checked off in each column, and the pattern of checked-off squares represents important information about that person. When the papers of every person in a dataset are pulled together, AI algorithms can draw correlations from the patterns on the pages to generate predictions (Williams et al. 2018). Big data is extremely useful for developing accurate AI tools, but it is very complex in scale and organization and requires careful management. Saxena et al. explain that big data is “so diverse and complex in scale that it cannot be managed and analyzed by existing data base management systems and thus requires new architectural framework, algorithms for its management” (Saxena et al. 2021). If big data is not managed appropriately, missing data bias can be introduced into AI algorithms; this can be mitigated by appropriately processing big data before training. If the data is not curated through filtering, cleaning, and labeling, AI can generate inaccurate and biased predictions. Saxena et al. argue, “The most common reasons for bias during data curation could be corruption of data, redundant or missing records, missing values, etc., which, cumulatively, increases over the process of structuring, processing, and analyzing which could result in false predictions” (Saxena et al. 2021). As these degradations accumulate, algorithms may use non-random corruptions of the data as correlations for generating decisions. This must be mitigated by inspecting and pre-processing data before it is used in algorithm training or deployment.
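The sketch below shows what a minimal curation pass might look like, assuming hypothetical age and BMI fields: deduplicating records, removing physiologically impossible values that suggest data-entry errors, and flagging missingness explicitly so it can later be audited for non-randomness. It illustrates the idea rather than reproducing any recipe from the cited sources.

```python
import pandas as pd

def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Illustrative curation pass run before training."""
    cleaned = raw.drop_duplicates()
    # Drop physiologically impossible values that point to data-entry errors,
    # but keep rows whose BMI is merely missing.
    valid_age = cleaned["age"].between(0, 120)
    valid_bmi = cleaned["bmi"].between(10, 80) | cleaned["bmi"].isna()
    cleaned = cleaned[valid_age & valid_bmi].copy()
    # Record missingness explicitly so a later audit can check whether it is random.
    cleaned["bmi_missing"] = cleaned["bmi"].isna()
    return cleaned

# Fabricated records: a duplicate row, an impossible age, and a missing BMI.
raw = pd.DataFrame({
    "age": [34, 34, 212, 67, 45],
    "bmi": [27.1, 27.1, 22.4, None, 31.0],
})
print(preprocess(raw))
```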

Organizing datasets with appropriate classifiers and parameters addresses label bias. A common way developers have addressed label bias in the past is to exclude social category data such as gender, race, and ethnicity in order to avoid discrimination and inequality in algorithmic outcomes. However, simply excluding social categories does not eliminate or reduce bias. The datasets that AI algorithms are trained on are collected in a biased and inequitable world. As mentioned in the introduction, health outcome disparities exist between groups of different social categories: there is unequal access to healthcare, discriminatory treatment, and uneven diagnosis among social categories. As such, datasets implicitly carry bias regardless of whether social category information is included, because they exist in the context of an unfair world. Due to this implicit bias, when social category classifiers are excluded, AI algorithms can use other classifiers, called proxy variables, as approximations for the excluded categories. An example of a proxy variable for race is zip code (Williams et al. 2018). Due to historic discrimination, zip code can serve as an accurate predictor of a person’s race. If zip code is included as a classifier in a training dataset but race is not, discrimination can be unintentionally built into the algorithm. Datasets must therefore be organized with appropriate social category classifiers so that discrimination can be explicitly avoided and bias reduced.
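A practical audit for proxy variables is to ask how well each retained feature predicts the excluded social category on its own. The sketch below does this with a small fabricated dataset (the zip codes, ages, and race labels are invented for illustration); accuracy well above the majority-class baseline flags a likely proxy.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Fabricated records: race is excluded from training features, but zip code remains.
df = pd.DataFrame({
    "zip_code": ["zip_A", "zip_A", "zip_A", "zip_B", "zip_B", "zip_B"] * 50,
    "age":      [40, 55, 63, 42, 58, 61] * 50,
    "race":     ["black", "black", "white", "white", "white", "black"] * 50,
})

# How well does each retained feature, on its own, predict the excluded category?
# Accuracy far above the 0.5 majority-class baseline marks a likely proxy variable.
for col in ["zip_code", "age"]:
    X = pd.get_dummies(df[[col]])  # one-hot encode categorical features
    score = cross_val_score(LogisticRegression(max_iter=1000), X, df["race"], cv=5).mean()
    print(f"{col}: {score:.2f}")
```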

The work of Obermeyer et al. illustrates the importance of using appropriate classifiers and labels to reduce bias. They studied the accuracy of a healthcare algorithm that excluded social category classifiers. Obermeyer et al. explain, “The algorithm’s stated goal is to predict complex health needs for the purpose of targeting an intervention that manages those needs” (Obermeyer et al. 2019). The algorithm is an example of the commercial risk prediction tools commonly used for high-risk care management programs; tools of this kind are applied to approximately 200 million people in the United States each year. Obermeyer et al. evaluated the model by comparing the risk score the algorithm assigned to each patient with the patient’s actual future health, drawn from medical records well after the risk score was assigned. Because social category information such as race was excluded from the data the algorithm was trained and deployed on, the algorithm unintentionally relied on healthcare cost as a proxy label for health need, a label that encodes racial inequity: black patients systemically have lower healthcare costs because access to care is unequal and less money is spent on their treatment. As the algorithm relied on healthcare cost as its metric for risk, black patients were assigned the same risk scores as white patients even when they were much sicker. When the researchers corrected the assignments using patients’ actual health outcomes, the share of black patients flagged for the care coordination programs this algorithm supports rose from 17.7% to 46.5%. Obermeyer et al. assert, “Because labels are the key determinant of both predictive quality and predictive bias, careful choice can allow us to enjoy the benefits of algorithmic predictions while minimizing their risks” (2019).
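The kind of audit Obermeyer et al. describe, comparing realized health to the algorithm’s score within each group, can be sketched as follows. The data are simulated only to mimic the reported pattern (a cost-based score that understates risk for black patients); the variable names and magnitudes are assumptions for illustration, not the study’s actual data or code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 2000
race = rng.choice(["black", "white"], size=n)
# Simulated health need (number of chronic conditions), independent of race here.
chronic_conditions = rng.poisson(3, size=n)
# Simulated cost-based risk score that understates risk for black patients.
risk_score = chronic_conditions + rng.normal(0, 1, size=n) - 1.5 * (race == "black")

audit = pd.DataFrame({"race": race,
                      "chronic_conditions": chronic_conditions,
                      "risk_score": risk_score})
audit["risk_decile"] = pd.qcut(audit["risk_score"], 10, labels=False)

# At the same risk decile, black patients carry more chronic conditions than
# white patients: the signature of the biased label described above.
print(audit.pivot_table(index="risk_decile", columns="race",
                        values="chronic_conditions", aggfunc="mean"))
```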

Finally, to address all forms of algorithmic bias, we must recognize that these AI tools do not function in a vacuum. Developers, stakeholders, and clinicians must all work together to mitigate and control algorithmic bias. In practice, doctors must make informed final decisions by combining their medical knowledge and training with AI predictions. In the high-risk care algorithm, “realized enrollment decisions largely reflect how doctors respond to algorithmic predictions, along with other administrative factors related to eligibility” (Obermeyer et al. 2019). Clinicians must be well educated on the algorithm, including its methods and its accuracy, to maintain integrity in their medical decision-making and to further reduce bias and discrimination.

V. Motivating Fairness in Medical AI

The mechanisms of algorithmic bias have been known to developers for some time; the groundbreaking study “How Algorithms Discriminate Based on Data They Lack: Challenges, Solutions, and Policy Implications” by Williams et al. was published in 2018. Unfortunately, raising awareness alone will not motivate stakeholders and developers to address the issue of algorithmic bias. Other methods must be employed in conjunction with raising awareness to drive stakeholders and developers to value fairness and accuracy highly and to establish low tolerances for bias. Liefgreen et al. explain, “to ensure sustained behavior change, one needs to engage with the target subjects in ways that elicit their own goals, interests, and plans so that they can develop their value for, and interest in, the positive behavior” (Liefgreen et al. 2023). To ensure algorithmic bias is addressed, mitigation must be aligned with the interests of developers and stakeholders. Fairness needs to be promoted in AI design by drawing from the values and norms of developers and stakeholders, in alignment with the framework of ethical principles and balanced correlative risk.

However, aligning developers’ and stakeholders’ values with algorithmic bias mitigation does not by itself guarantee action in line with those values. There is often a discrepancy between people’s values and their actions: a given situation may not prompt a person to recognize how their values apply, or its demands may not allow them to express those values, causing them to act in opposition to them. In addition to aligning bias mitigation with developer and stakeholder values, incentives must be employed to create fair and unbiased AI. Liefgreen et al. suggest, “An effective motivational strategy could involve highlighting the alignment of ethical principles of fairness and transparency with positive outcomes, such as increased trust in the AI system, enhanced reputation of the organization, and improved stakeholder satisfaction” (2023). Additionally, fairness can be financially advantageous: implementing fairness in the design phase is more cost-effective than retrofitting it during testing or deployment, and the cost of auditing for bias early is small compared to the cost of addressing limitations later in development. If developers and stakeholders recognize that fairness in their algorithms aligns with their values, improves accuracy, and increases revenue, then discrimination and bias in medical AI can be reduced in a meaningful way.

VI. Conclusion

Algorithmic bias in medical applications of AI is a nuanced issue. AI is incredibly advantageous in healthcare: it can reduce healthcare costs, enhance efficiency, and improve patient care, diagnostics, and treatment. Yet algorithmic bias in these tools threatens to exacerbate pre-existing social and racial inequities in healthcare if fairness is not inherent to AI design. The mechanisms of bias in AI tools and the corresponding mitigation strategies have been known for some time, but applying that knowledge requires the incentivized adoption of a versatile and adaptable ethical framework by developers and stakeholders. Preventing discrimination and inequitable outcomes due to medical AI requires a multifaceted, top-down approach.

Developers and stakeholders must be incentivized to adopt a design framework of global ethical principles and balanced correlative risk. Their values must be aligned with this framework by highlighting the positive outcomes of fair AI, such as increased trust, enhanced reputation, improved stakeholder satisfaction, and higher revenue. Developers and stakeholders will not act to mitigate algorithmic bias and prevent discrimination and inequity unless they are motivated by their own values and by external incentives. The question is no longer how to address algorithmic bias, but how to ensure that mitigation efforts are practically applied and that inequity and discrimination are actively and explicitly prevented in the healthcare system. Researchers, developers, and healthcare professionals must work together to ensure that AI is a force for good. AI should reduce disparities and improve the health outcomes of all individuals regardless of their gender, race, or socioeconomic background. Progress requires diligence, ethical commitment, and thoughtful consideration of AI’s impact on patients and communities.

Works Cited

Alowais, Shuroug A., et al. “Revolutionizing Healthcare: The role of Artificial Intelligence in Clinical Practice.” BMC Medical Education, vol. 23, no. 1, 2023, https://doi.org/10.1186/s12909-023-04698-z.

Edwards, Jennifer M. “Health Equality, Equity, and Justice: Know the Difference and Why They Matter.” Healthline, Healthline Media, 28 Nov. 2022, www.healthline.com/health/what-is-health-equality#health-equity.

Grote, Thomas, and Geoff Keeling. “On Algorithmic Fairness in Medical Practice.” Cambridge Quarterly of Healthcare Ethics, vol. 31, no. 1, 2022, pp. 83–94, https://doi.org/10.1017/S0963180121000839.

Jobin, Anna, et al. “The global landscape of AI ethics guidelines.” Nature Machine Intelligence, vol. 1, no. 9, 2019, pp. 389–399, https://doi.org/10.1038/s42256-019-0088-2.

Liefgreen, Alice, et al. “Beyond ideals: why the (medical) AI industry needs to motivate behavioural change in line with fairness and transparency values, and how it can do it.” AI & Society, 2023, https://doi.org/10.1007/s00146-023-01684-3.

Nebeker, Camille, et al. “Building the case for actionable ethics in digital health research supported by artificial intelligence.” BMC Medicine, vol. 17, no. 1, 2019, pp. 137–137, https://doi.org/10.1186/s12916-019-1377-7.

Obermeyer, Ziad, et al. “Dissecting racial bias in an algorithm used to manage the health of populations.” Science, vol. 366, no. 6464, 2019, pp. 447–453, https://doi.org/10.1126/science.aax2342.

Saxena, Ankur, and Shivani Chandra. Artificial Intelligence and Machine Learning in Healthcare. Springer, 2021.

Starke, Georg, et al. “Towards a pragmatist dealing with algorithmic bias in medical machine learning.” Medicine, Health Care, and Philosophy, vol. 24, no. 3, 2021, pp. 341–349, https://doi.org/10.1007/s11019-021-10008-5.

Ueda, Daiju, et al. “Fairness of artificial intelligence in healthcare: review and recommendations.” Japanese Journal of Radiology, 2023, https://doi.org/10.1007/s11604-023-01474-3.

Williams, Betsy Anne, et al. “How Algorithms Discriminate Based on Data They Lack: Challenges, Solutions, and Policy Implications.” Journal of Information Policy, vol. 8, 2018, pp. 78–115, https://doi.org/10.5325/jinfopoli.8.2018.0078.


Emma Stone is a junior studying Biomedical Engineering at Boston University’s College of Engineering. As an aspiring engineer, she is passionate about solving complex problems and proposing actionable solutions. She is particularly interested in the challenges posed by AI algorithms. While these algorithms have the potential to revolutionize many fields, including healthcare, their complexity and opacity can also introduce unique challenges, such as bias. With scrutiny and due diligence, she is confident that we can harness the transformative potential of AI across various fields, including healthcare.