Promoting ethnic diversity in health data: bridging the gap in healthcare inequities

Finding ways to diversify ethnic representation in clinical research is crucial for creating a more equitable, accessible, and impactful scientific future.


The COVID-19 pandemic has highlighted the importance of collecting and interpreting health data. Not only does it help shape our understanding of infection, but it also allows us to track health trends and devise effective disease management strategies that advance public health. But health policies and measures are only as effective as the diversity of the data that informs them. If certain demographic groups are excluded from or underrepresented in research datasets, they risk being underserved by the solutions, medications or technologies that the research is intended to produce.

Unfortunately, health inequities are perpetuated by a lack of diversity in clinical research, which is further compounded by unconscious biases in the development of new medical devices and technologies. However, there are measures that can be taken to ensure that no one is left behind. 

In this article, I will focus on ways to improve the ethnic diversity of clinical data. It's important to note that this focus on ethnicity does not diminish the importance of other identity determinants such as gender, sexuality, or neurodivergence. Each of these deserves its own dedicated piece, as there is much to explore on each topic.

What did COVID-19 teach us? 

The COVID-19 pandemic highlighted the uncomfortable truth of ethnic inequalities in healthcare. A recent UK analysis of 17 million primary care and death records revealed that individuals of Black or South Asian ancestry had a disproportionately higher risk of death, with an approximate twofold increase compared to those who identified as White (after adjusting for age and sex differences). 

While the magnitude of the disparities underscored by this COVID-19 dataset may be challenging to accept, collecting data along ethnicity lines has provided valuable insights into the underlying issue. It is hoped that this will lead to new policies and practices that address these inequalities.

At a surface level, the pandemic exposed the need for better data collection practices. A research study conducted by the Nuffield Trust revealed that ‘poor data about ethnicity has obscured the true extent of ethnic disparities in the impact of the pandemic’. The study further discovered that 13% of patient records in the UK do not contain a valid ethnic group classification, while 8.5% are categorised as 'not stated', and 8.8% as 'other', which are legal categories but not useful for analytical purposes.
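To make the coding gap concrete, here is a minimal sketch of how a dataset could be audited for unusable ethnicity codes. The function name, the category labels and the sample records are all hypothetical and chosen for illustration; they are not taken from the Nuffield Trust study or from any NHS coding standard.

```python
from collections import Counter

# Assumed labels for codes that are permitted but analytically unhelpful.
UNUSABLE = {"not stated", "other"}


def audit_ethnicity_coding(codes):
    """Return the share of records that are missing, unusable, or usable."""
    counts = Counter()
    for code in codes:
        if code is None or not code.strip():
            counts["missing"] += 1
        elif code.strip().lower() in UNUSABLE:
            counts["unusable"] += 1
        else:
            counts["usable"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: counts[key] / total for key in ("missing", "unusable", "usable")}


# Illustrative records only, not real patient data.
records = ["White British", None, "Not stated", "Other",
           "Bangladeshi", "", "Black Caribbean", "Indian"]
print(audit_ethnicity_coding(records))
```

Running a tally like this per provider or per year is one simple way to see where coding quality is weakest before drawing conclusions about disparities.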

Gaps in patient data like this have far-reaching implications, including the efficacy of clinical trials. Research has shown that ethnicity can play a significant role in the effectiveness of medicines. However, for COVID-19 vaccine trials, ethnicity data was not considered a priority. A study showed that out of 20,692 US-based trials with reported results representing ∼4.76 million enrollees, only 43% reported any ethnicity data. Moreover, where the data was available, the majority of enrollees were White (79.7%), while Black participants accounted for just 3% of major vaccine trial participation, despite representing 21% of COVID-19 related deaths. 

An all-too-familiar challenge

Unfortunately, the COVID-19 pandemic starkly revealed a longstanding and systematic failure within clinical data to accurately represent ethnicity. Let’s consider ischemic heart disease (IHD) as an example. IHD is the most common cause of premature death in the UK, and among those who die from IHD, diabetes is the main contributing factor. However, our understanding of these conditions is largely derived from seminal studies like the Framingham Heart Study, the Physicians' Health Study, and the Nurses' Health Study, wherein 90% of participants were categorised as ‘White’.

Perhaps more alarming is a review of 12 diabetes trials conducted in the UK. It revealed that despite comprising 11.2% of the UK diabetes population, South Asians accounted for only 5.5% of the participants. It is also concerning that ethnicity was not even reported in four of these studies. Diabetes is a complex condition that cannot be generalised from observations in one population, yet biased views of what is 'normal' are often established this way. To advance our understanding of the disease and mitigate the risk for a broader range of the world's population, it is crucial to consider diverse populations and avoid making universal assumptions.
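One simple way to express this gap is a representation ratio: a group's share of trial participants divided by its share of the affected population. The sketch below uses the two percentages quoted above; the function name is my own, and the metric is offered as an illustration rather than as the method used in the review.

```python
def representation_ratio(trial_share: float, population_share: float) -> float:
    """Share of trial participants divided by share of the affected population.

    A value of 1.0 means proportional representation; values below 1.0
    indicate underrepresentation.
    """
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return trial_share / population_share


# South Asians: 5.5% of trial participants, 11.2% of the UK diabetes population.
ratio = representation_ratio(5.5, 11.2)
print(f"{ratio:.2f}")  # prints 0.49
```

In other words, South Asian patients were enrolled at roughly half the rate that their share of the diabetes population would suggest.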

Similarly, the 'triglyceride paradox' vividly highlights an oversight within the metabolic syndrome criteria, a set of indicators used to identify individuals at risk of type 2 diabetes and heart disease. Typically, a diagnosis of metabolic syndrome requires at least three of the following biomarkers: elevated triglyceride levels, low high-density lipoprotein (HDL) levels, increased waist circumference, high blood pressure, and fasting hyperglycemia. However, during the formulation of these criteria, the unique physiology of Black individuals was disregarded.

As a consequence, the distinct lack of association between insulin resistance and hypertriglyceridemia in Black individuals went unrecognised. It is now understood that hypertriglyceridemia, a hallmark of insulin resistance, occurs less frequently in Black individuals, leading to a lower diagnosed prevalence of metabolic syndrome in this group. In essence, a critical diagnostic test for predicting the onset of type 2 diabetes and heart disease proves inadequate for Black individuals, resulting in under-diagnosis and heightened long-term health risks.
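The mechanics of the problem can be sketched with the 'at least three of five' rule described above. The class and function names are hypothetical, and each marker is assumed to have already been evaluated against its clinical cut-off, since those thresholds are not given here.

```python
from dataclasses import dataclass


@dataclass
class Markers:
    # Each flag is assumed to be pre-evaluated against a clinical threshold.
    elevated_triglycerides: bool
    low_hdl: bool
    increased_waist: bool
    high_blood_pressure: bool
    fasting_hyperglycemia: bool


def meets_metabolic_syndrome(m: Markers, required: int = 3) -> bool:
    """Apply the 'at least three of five markers' rule."""
    flags = [
        m.elevated_triglycerides,
        m.low_hdl,
        m.increased_waist,
        m.high_blood_pressure,
        m.fasting_hyperglycemia,
    ]
    return sum(flags) >= required


# Illustrative patient: two markers present, triglycerides within range.
patient = Markers(False, False, True, False, True)
print(meets_metabolic_syndrome(patient))  # prints False
```

If elevated triglycerides rarely accompany insulin resistance in a given group, people in that group effectively need three of the remaining four markers to be flagged, so the same rule sets a higher bar for them.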

The dangers of health data uniformity are not limited to diabetes and metabolic conditions. For instance, a primary preventative method for heart disease in the UK involves the use of a validated risk-prediction algorithm, the QRISK score, to guide treatment decisions for individuals. This score synthesises various risk factors, including blood pressure, lipids, and ethnicity, to generate a 'risk score' for developing heart disease, and people who reach a certain threshold are identified for treatment. However, the QRISK score is based on large epidemiological datasets and, while effective at predicting risk at a population level, it cannot account for all the unique factors that impact an individual's risk of disease.

Incorporating genetic data into risk scores can enable a more personalised and precise approach. Large genome-wide association studies (GWAS) conducted over the last two decades have confirmed that cardiovascular disease is polygenic, meaning it is caused by the coexistence of thousands of gene variants. However, the polygenic risk scores that have been developed so far are derived mainly from cohorts of European ancestry. This is because people of European ancestry represent 78% of GWAS participants, while only 2% are of African descent. Consequently, polygenic risk scores are 4.5 times more accurate for people of European ancestry than for those of African ancestry.

Race correction in algorithms

Ethnic and racial categories are socially constructed, and are poor proxies for the relationship between geographical ancestry and genetic makeup. Indeed, studies have shown that there is more genetic variation within ethnic groups than between them. When race is conflated with genetic makeup, it can lead to subtle biases in medical practices, resulting in widening health disparities and worsened outcomes for ethnically diverse people. This issue is particularly noticeable within diagnostic algorithms. 

A notable example can be observed in nephrology, where the CKD-EPI algorithm had been considered a reliable measure of estimated glomerular filtration rate (eGFR), an important marker of kidney health. The algorithm used race, along with age, sex, and creatinine levels, and regularly reported higher eGFR values for Black people, suggesting they had better kidney function. This adjustment was originally justified by evidence of higher-than-average serum creatinine concentrations among Black people, which was incorrectly attributed to the idea that they are more muscular. Studies have since cast doubt on this explanation, and one analysis found that removing race from the algorithm may increase the prevalence of chronic kidney disease among Black adults from 14.9% to 18.4%. After a review of the evidence, the National Institute for Health and Care Excellence (NICE) recommended the removal of race correction factors from the CKD-EPI algorithm.
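To show how a race correction operates in practice, the sketch below implements the 2009 CKD-EPI creatinine equation as I understand it from the published formula. The coefficients are reproduced from memory and should be checked against the original publication before being relied on; the function name and the example inputs are my own, and this is an illustration rather than a clinical tool.

```python
def ckd_epi_2009(scr_mg_dl: float, age: float, female: bool, black: bool) -> float:
    """Estimate GFR (mL/min/1.73 m^2) from serum creatinine in mg/dL.

    Sketch of the 2009 CKD-EPI creatinine equation, including the
    race coefficient that has since been recommended for removal.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # the race correction factor
    return egfr


# Same hypothetical patient, with and without the race flag.
with_race = ckd_epi_2009(1.2, 60, female=False, black=True)
without_race = ckd_epi_2009(1.2, 60, female=False, black=False)
print(f"{with_race / without_race:.3f}")  # prints 1.159
```

Because the correction is a flat multiplier, two patients with identical blood results receive estimates roughly 16% apart on the basis of recorded race alone, which can be enough to move a patient across a diagnostic or referral threshold.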

Algorithms must take into account the impact of socioeconomic factors on health conditions and outcomes. Social deprivation is often correlated with ethnicity and may be part of the causal nexus linking ethnicity to poor health. For instance, in the United States, the 'Vaginal Birth After Caesarean' (VBAC) algorithm is designed to evaluate the risk associated with attempting a vaginal delivery for women who have previously undergone a caesarean section. The algorithm predicts that, all other factors being equal, African American or Hispanic women have a lower chance of a successful delivery.

As a result, these women are less likely to choose (or be offered) a vaginal birth. However, in its predictive analysis, the algorithm neglects crucial factors such as marital status and medical insurance, both of which significantly influence a woman's overall health and her access to quality healthcare. This oversight can directly impact their birthing experience. Conversely, White women, who statistically belong to higher socioeconomic groups, are deemed at lower risk. This is particularly noteworthy because vaginal deliveries offer distinct health advantages, including reduced rates of surgical complications, shorter recovery periods, and fewer complications during subsequent pregnancies. By inadequately accounting for socioeconomic variables, the VBAC algorithm inadvertently exacerbates health disparities among different ethnic groups, rather than mitigating them.

Charting a more diverse pathway forward

  1. Analyse all the causes
    To address a problem, it is crucial to first understand it. In the post-COVID-19 era, there seems to be a growing appetite for analysing the gaps in healthcare data. A major piece of analysis conducted by the UK Office for National Statistics (ONS) in partnership with the Race Equality Foundation and Wellcome Trust highlights the extent of the challenge. Among the problems identified were the lack of standardised definitions for specific ethnicities, patient reluctance to provide their ethnicity, and the absence of a mechanism to audit the data. Regarding the first point, the Nuffield Trust has issued guidance on how to improve ethnicity coding in health data, emphasising the need for a collaborative effort among all NHS organisations, including providers, commissioners, GP practices, and the Care Quality Commission.

  2. Tackle the trust deficit
    Data coding is meaningless if people are not willing to share their ethnicity. It is important to acknowledge that ethnic minority groups often have a deep-rooted mistrust of public institutions. In its response to the Commission on Race and Ethnic Disparities, the UK government recognised that ‘there is clearly still a trust deficit which some groups have towards the UK and many of its institutions’. Education plays a significant role in addressing this issue. Health systems have a dual responsibility: to train healthcare providers and practitioners in the significance of ethnicity data, and to show them how to communicate this message to their patients in a culturally sensitive manner. Deloitte's recent white paper offers further insight on this point.

  3. Police the data
    It is important to recognise that quantity does not equal quality when it comes to ethnicity data. Just having access to this information does not necessarily mean it should be used. Clinicians must become better at determining when ethnicity data is relevant and avoid falling prey to biases and stereotypes. Instead, they should rely more on specific drivers of health and the patient's medical history, as well as that of their family.

  4. Encourage research participation
    Research plays a crucial role in addressing health disparities. As previously outlined, treatments and medications developed using an insufficiently diverse pool of participants risk being less effective for those ethnicities that are not sampled. Unfortunately, some groups are systematically excluded from research studies, making it difficult to obtain representative data. To address this issue, NHS England has published a useful guide for researchers engaging with underrepresented communities.

    Additionally, digital health technologies such as remote diagnostics are making strides in accessibility. By removing physical barriers to participation, remote diagnostics are powering a decentralisation trend within clinical trials, which has the potential to greatly improve the inclusivity of medical research.

  5. Stop being so WEIRD
    Genomic data primarily relies on samples from Western, Educated, Industrialised, Rich, and Democratic (WEIRD) countries, leading to the creation of biased and imbalanced datasets. This further widens the gap in equitable healthcare as not all clinical insights are equally accurate or relevant for people from different regions or ethnic backgrounds. This underscores the need for more diverse and inclusive genomic research to ensure that all individuals benefit from the advances in healthcare equally.

    However, it is equally crucial to emphasise the importance of augmenting diverse genomic datasets with additional contextual data. By factoring in people's conditions and varied life circumstances, we can gain a better understanding of how genetics and social factors interact with each other. Genomics England is at the forefront of this initiative and is an excellent resource for those seeking further information on the subject.

  6. Evaluate the use of race in clinical algorithms
    Instead of relying on race, algorithms should prioritise genetic datasets, provided they are diverse enough to avoid bias and imbalances. However, when race is used, clinicians must proceed with caution. They should consider upfront whether race or ethnicity is biologically linked to the clinical outcome of interest. If there is a biological link, it is critical to understand why this is the case. While true genetic differences could explain the link, this is unlikely given that there is more heterogeneity within groups than between them. More plausibly, social and socioeconomic factors affect certain groups and lead to variations in clinical outcomes. It is therefore crucial to be mindful of these factors when using race or ethnicity in clinical algorithms and to take a cautious and informed approach.

    The New England Journal of Medicine (NEJM) provides a helpful framework for determining how much weight race should be given. It suggests asking questions such as:

    • Is the need for race correction based on robust evidence and statistical analyses? 

    • Is there a plausible causal mechanism for the racial difference that justifies the race correction?

    • Would implementing this race correction relieve or exacerbate health inequities?

     

    The NEJM also rightly emphasises the need for clinicians to take responsibility for the algorithms they use and ‘discern whether the correction is likely to relieve or exacerbate inequities’. By critically evaluating the use of race in clinical algorithms, clinicians can ensure that they are not perpetuating health inequities and that they are providing equitable healthcare to all individuals, regardless of their race or ethnicity.

Conclusion

COVID-19 brought to light the failure of clinical data to accurately represent ethnicity. The implications of perpetuating these failures extend far beyond our ability to manage one pandemic. When ethnic groups are excluded from clinical research and go unrecorded in medical data, they are underserved by healthcare initiatives. They also risk missing out on advances such as polygenic risk scores, which have the potential to provide precise, personalised risk predictions.

It is essential to address these issues by ensuring that clinical data accurately represents ethnic diversity. By doing so, we can improve our understanding of how diseases manifest in different populations and tailor healthcare initiatives to meet the needs of all individuals equally.


Researchers and clinicians alike must be prepared to call out these shortcomings for what they really are: systemic racism. The current levels of healthcare inequality are unacceptable. We must hold ourselves accountable, ensure diverse representation in data and research practices, and advocate for policies that promote diversity and inclusion.

The good news is that we are not ignorant of the issues surrounding clinical data bias and inaccuracy. We know what needs to be done to address them, and the key is collaboration. The problems and solutions are multifaceted, and we need to bring together experts from various domains, including researchers, practitioners, data scientists, public health bodies, social inclusion organisations, and businesses with innovative incentives in the remote diagnostics space. Only by continuing to work together can we address these issues and promote diversity, equity, and inclusion in healthcare.

Get in touch with a member of the Thriva Solutions team today to learn more.
