Developing a risk prediction model for breast cancer: a Statistical Utility to Determine Affinity of Neoplasm (SUDAN-CA Breast)
European Journal of Medical Research volume 22, Article number: 35 (2017)
Breast cancer risk prediction models are widely used in clinical settings. Although most of the well-known models were built on data collected from western populations, they have been used for surveillance in many limited-resource countries. Given the genetic variation in risk factors between races, we aimed to develop and validate a tool for breast cancer risk assessment among Sudanese women.
Using a cross-sectional design, 153 subjects were eligible to participate in our study. Data were collected from the only two tertiary centers in Sudan and analyzed by multiple logistic regression, using the purposeful selection method to build the model. Various adjustments were made to determine significant predictors. Overall performance, calibration and discrimination were assessed by R², O/E ratio and c-statistic, respectively.
SUDAN predictors of breast cancer were: age, menarche, family history, weekly servings of vegetables and fruits, and the type of cereal that traditional cuisine is made of. Both Nagelkerke R² (0.495) and O/E ratio (0.78) were good. The c-statistic indicated excellent discriminatory power (0.864, p < 0.001, 95% CI 0.81–0.92).
Our findings suggest that SUDAN provides a simple, efficient and well-calibrated tool to predict and classify women’s lifetime risks of developing breast cancer. Input from our model could be deployed to guide utilization of the more advanced screening modalities in resource-limited settings to maximize cost effectiveness. Consequently, this might improve the stage at which the diagnosis is usually made.
Breast cancer (BC) is the world’s most common malignancy, with a constantly increasing incidence. Risk prediction models assess either group odds of developing breast cancer over time, as in the BCRAT (Gail) model, or individual risks of inheriting a mutant BRCA1/2, as in BRCAPRO, BOADICEA, and the Myriad II prevalence tables. In practice, the two objectives overlap in some models. For instance, IBIS (Cuzick-Tyrer) mainly estimates the risk of breast cancer over time, but it also reports the probability of inheriting a mutant BRCA1/2. BRCAPRO serves the same two goals with the priorities reversed. Cuzick-Tyrer represents the most accurate prediction tool.
Modern empirical models are used mainly to define an individual's risk of developing breast cancer. Their use is limited to clinical settings, since they require a detailed family history and, for some of them, genetic analysis. Their value in population screening is not fully understood.
Because group risk prediction models estimate breast cancer probability across large populations, they can help in offering screening recommendations that weigh benefits against harms, managing patients, and improving risk reduction strategies. Women with a 20–25% lifetime risk of BC are advised to undergo breast MRI according to American Cancer Society guidelines. US Preventive Services Task Force guidelines recommend BRCA1/2 mutation testing for women at high risk of breast cancer. The FDA recommends prophylactic tamoxifen for women with a 5-year BC risk of 1.67%.
Most of the known models are based on data retrieved from American and British studies. These data do not reflect the circumstances of other populations, which limits generalization to communities that differ. Moreover, models like Gail and Cuzick-Tyrer use findings of invasive and costly interventions, such as biopsies, to improve overall accuracy. By contrast, breast density, a proven major risk factor that can be detected by a simple, cheap and non-invasive mammographic scan, is not incorporated in any model. This study aims to identify the risk factors for BC among Sudanese patients.
Study design and subjects
This is a multicenter, observational, retrospective, cross-sectional study conducted in the breast clinics of Bashaier University Hospital (BUH) and Khartoum Center for Radiation and Isotopes (RICK), Khartoum, Sudan. Both are leading tertiary centers highly specialized in managing breast conditions among Sudanese and east African nationals. From October 2014 to September 2015, all eligible patients referred to either of the study facilities were considered. Inclusion criteria were: being a Sudanese female who had reached menarche, having a diagnosis confirmed by standard triple assessment, having BC of primary type (for cases), and accepting participation without compensation. Patients with uncertain diagnoses, breast metastasis from another primary focus, and comparison subjects with proliferative tumors were excluded. A total of 153 patients aged 32–74 years were enrolled in this study.
Variables and outcome
Data were collected using structured data forms. Height and weight were measured using a stadiometer of 1 mm accuracy and a calibrated medical weight scale sensitive to 0.01 kg, respectively. The 34 defined risk factors investigated include demographic (age, education, ethnicity, etc.); medical (past history of benign or malignant breast disease); family history of BC; reproductive (e.g., menarche, age at marriage and at birth of the first live child, durations of pregnancy and lactation, and menopause); pharmacological (use of hormonal contraception or of risk-reducing drugs such as aspirin, tamoxifen, and raloxifene); nutritional (traditional food cereals, meat and animal products, vegetables and fruits, and sugar servings); and lifestyle risks such as physical exercise, smoking and alcohol intake.
Statistical analysis was performed using the Statistical Package for Social Sciences (SPSS), version 23.0 (SPSS, Chicago, IL, USA). Categorical variables were reported as numbers and percentages (N, %), whereas continuous variables were expressed as mean ± standard deviation (SD) with a 95% confidence interval (CI).
Using SPSS, a dataset of the 153 participants was created. Contingency (2 × 2) and frequency tables for the categorical and numerical variables were produced to ensure occupancy of each cell; no empty cell was present. SUDAN-CA Breast was built according to the purposeful selection method. Each risk factor underwent univariate analysis, one at a time, using simple binary logistic regression. Only the 13 with a Wald χ² significant at p < 0.25 qualified for a second phase of analysis using multivariate logistic regression. Five variables with p < 0.05 were selected as predictors of the initial logit model. The parameter estimates of age, menarche, family history, vegetables and fruits, sorghum, and millet changed by 14.78, 8.88, 14.72, 7.67, 18.42, and 10.99%, respectively. Such levels argue against confounding, as none exceeded the 20% ceiling. Moreover, the estimates changed insignificantly when the variables dropped in phase 1 were subsequently added back to the main effects model. All predictors fulfilled the assumptions of normality, absence of multicollinearity, and linearity between the logit and age, menarche, and weekly servings of fruits and vegetables. Overall performance, discrimination and calibration of the final model were assessed using R², c-statistic, and the Hosmer–Lemeshow test, respectively.
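The change-in-estimate check described above can be sketched in a few lines. This is only an illustration, not the authors' SPSS procedure: the function names `percent_change` and `flag_confounding` and the example coefficients are hypothetical, whereas the six percentages are the ones reported in the text.

```python
from typing import Dict, List


def percent_change(full_coef: float, reduced_coef: float) -> float:
    """Percent change in a coefficient between two nested models."""
    return abs(reduced_coef - full_coef) / abs(full_coef) * 100.0


def flag_confounding(changes: Dict[str, float], ceiling: float = 20.0) -> List[str]:
    """Return the predictors whose estimate changed by more than the ceiling."""
    return [name for name, pct in changes.items() if pct > ceiling]


# Percent changes reported in the text for the six parameter estimates.
reported_changes = {
    "age": 14.78,
    "menarche": 8.88,
    "family_history": 14.72,
    "vegetables_fruits": 7.67,
    "sorghum": 18.42,
    "millet": 10.99,
}

flagged = flag_confounding(reported_changes)
print(flagged)  # empty list: no estimate exceeds the 20% ceiling

# Hypothetical coefficients, only to show how one percentage would be derived.
print(percent_change(full_coef=0.10, reduced_coef=0.115))
```

With the reported values, the largest change is 18.42% (sorghum), so no predictor crosses the 20% ceiling.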
Ethical clearance was obtained from the Research Ethics Committee of the National Academy of Health Sciences. Institutional approvals were also received prior to commencement of the research. Our study strictly followed the Declaration of Helsinki. Informed consent was obtained from all participants after ensuring anonymity.
Demographic and clinical features of participants
A total of 184 patients were referred to our facilities during the study period. After eligibility assessment, 153 females were included in the model (Fig. 1). Sixty-three (41.2%) subjects were diagnosed with breast cancer. The median age of our sample was 37.00 (± 14.30) years. It was 46.89 (± 14.99) years for breast cancer patients and 32.64 (± 10.41) years for those who did not have the disease. Illiteracy was more common among BC patients than among those without BC, 34.92% compared to 17.78%. Low socioeconomic status predominated among respondents (N = 132; 86.27%). Of our sample, 120 (78.43%) came from central Sudan, whereas only 4 were from eastern states. Almost half of our African subjects had breast cancer (N = 26; 48.15%), compared to 37 (37.37%) of the remaining ethnicities.
Interestingly, a positive past history of breast disease was seen in one-third of the comparison group, close to the 26.98% among BC patients. Twenty (31.75%) cancer patients had a familial pattern of the disease, compared with 12 (13.33%) in the comparison group. Menarche of BC patients was significantly earlier than the mean of the others, 14.77 ± 1.70 compared to 14.05 ± 1.92 (p = 0.016). Conversely, menopause among the former, at 48.09 (± 5.49) years, was insignificantly later than the counterparts’ mean (M = 26.11 ± 2.57, p = 0.312). Fifty-six (88.89%) BC patients were married, eight times the number of single patients (7; 11.11%). They married at 20.73 (± 6.98) years, about one and a half years earlier than unaffected participants (22.14 ± 6.21 years, p = 0.245). Most participants with cancer had given birth to one or more children (52; 82.54%), at the age of 21.25 (± 6.28) years.
Contraception was used by only 18 (28.57%) patients, 14 of whom took contraceptive pills. This was even lower than among those who used contraception and did not develop the disease (28; 31.11%). Prophylactic acetylsalicylic acid was taken regularly by only 4 (6.35%) and 5 (5.56%) subjects with and without malignancy, respectively. Tamoxifen and raloxifene were taken by none.
Two of the four nutritional patterns studied were significant factors for breast cancer: cereal type and servings of fruits and vegetables. Sudanese frequently eat traditional meals that are rich in starchy ingredients, so the types of cereal these meals are made of were investigated. Among the 23 (15.03%) subjects who consume millet, 69.57% had BC. Most of our sample ate vegetables and fruits throughout the week, with similar means in healthy and ill individuals, 6.62 (± 0.958) and 6.79 (± 0.679) servings per week, respectively.
Uni- and multivariate analysis
Table 1 contains the results of the univariate analysis of the 34 known risk factors, of which only 13 were significant at p < 0.25. When these were entered into a single multivariate logistic regression, the 5 with p < 0.05 were retained as predictors (Table 2).
A logistic regression was performed to assess the effects of age, menarche, family history, weekly servings of vegetables and fruits, and the type of cereal that traditional cuisine is made of on the likelihood of BC. The logistic regression model was statistically significant (χ² = 70.027, p < 0.001). SUDAN explained 49.5% of the variance between the two groups (Nagelkerke R² = 0.495). Age has a significant positive relation with breast cancer: each additional year of age multiplies the odds of BC by 1.103 relative to the preceding year. Conversely, menarche is inversely related to the disease, with each year of delay reducing lifetime risk by 0.273. A positive family history of BC triples the likelihood of developing malignancy during a lifetime. Each additional weekly serving of vegetables and fruits reduces the risk of BC by 0.232. Classical Sudanese gastronomy is based on three main cereals. Using wheat as the reference, subjects who depend on sorghum had a threefold increase in their odds of breast cancer, and millet raises the odds fivefold.
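Because odds ratios act multiplicatively on the odds, their effect over several units is a power, not a sum. The sketch below illustrates this with the per-year age odds ratio of 1.103 and the family history odds ratio of 3.13 reported in this paper. The helper names and the 10% baseline probability are assumptions chosen only for illustration; they are not part of the SUDAN model.

```python
def compounded_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio accumulated over several units of a predictor."""
    return or_per_unit ** units


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Convert a probability to odds, scale by the odds ratio, convert back."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


OR_AGE_PER_YEAR = 1.103    # reported odds ratio per year of age
OR_FAMILY_HISTORY = 3.13   # reported odds ratio for positive family history

# Ten additional years of age multiply the odds by 1.103 ** 10.
ten_year_or = compounded_odds_ratio(OR_AGE_PER_YEAR, 10)
print(round(ten_year_or, 2))  # 2.67

# Hypothetical baseline probability of 10%, for illustration only.
with_history = apply_odds_ratio(0.10, OR_FAMILY_HISTORY)
print(round(with_history, 3))
```

Note that tripling the odds does not triple the probability: under the assumed 10% baseline, an odds ratio of 3.13 gives a probability of roughly 26%.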
SUDAN was assessed in terms of overall performance, discrimination and calibration using R², c-statistic, and the Hosmer–Lemeshow goodness-of-fit (GOF) test, respectively. With a Nagelkerke R² of 0.495, the model explains almost half of the variance in breast cancer. Discriminatory power, measured by the c-statistic (AUC), was 0.864 (p < 0.001, 95% CI 0.81–0.92). The Hosmer–Lemeshow GOF test showed an insignificant difference between observed and predicted probabilities, indicating that the model is well calibrated (χ² = 7.159, 8 df, p = 0.520). Internal validation of the model revealed an O/E ratio of 0.78.
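The reported Hosmer–Lemeshow p value can be checked from the test statistic alone. The sketch below uses the closed-form upper-tail probability of the chi-square distribution, which holds for even degrees of freedom, so no statistics library is needed. The function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variate with even df.

    For df = 2k the tail equals exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Statistic and degrees of freedom as reported for the Hosmer-Lemeshow test.
p_value = chi2_sf_even_df(7.159, 8)
print(round(p_value, 3))  # 0.52
```

The result agrees with the reported p = 0.520, so the statistic, degrees of freedom and p value are mutually consistent.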
To the best of our knowledge, SUDAN is the first tool to assess the risk of developing BC in Sudan, Africa, and the Middle East. Moreover, it is the first of its kind to be built using the purposeful selection technique. It represents a multivariate logistic regression analysis of data collected in a cross-sectional study, which specified five predictors of breast cancer: age, menarche, family history, weekly servings of vegetables and fruits, and type of cereal used. The first three are shared by the Gail and Cuzick-Tyrer models. Claus, BRCAPRO, and BOADICEA share the same triad except menarche. None of our nutritional factors was a predictor in any of the previous models. However, the relationship between nutrition and BC has been established in many studies. Kamath et al. stated that strict vegetarians are about three times less likely to have the disease (OR 2.80, 95% CI 1.15–6.81). Another study found a significant association between starchy food, like cereals, and certain categories of BC, namely ER-negative subtypes. Increasing weekly servings of vegetables and fruits and replacing millet and sorghum with wheat would reduce risk, in favor of primary prevention.
The two most important predisposing factors for BC are increasing age and positive family history. The presence of menarche among the predictors can be explained at the molecular level. An association between reproductive risk factors of BC, including menarche, and ER+ tumors has been described previously [14, 15]. Fadl Elmoula et al. reported a similar prevalence of ER+ among Sudanese and western women. By contrast, since ER+ status is much less frequent among Asians [17,18,19], the risk prediction model for Thai women does not include menarche. As an exception, because the receptor status of Korean and western women is comparable, four of the KoBCRAT predictors were reproductive risks.
With an OR of 3.13, familial factors appear to play a key role in causing breast cancer. Genetically, breast cancer shows autosomal dominant inheritance, predominantly through the BRCA1/2 genes. Elnour et al. found BRCA1/2 mutations to be common in Sudan. Awadelkarim et al. reached similar conclusions; moreover, they highlighted that 28% of their patients’ germline mutations were novel. Furthermore, 24.3% of the point mutations identified by Biunno et al. are of unknown clinical significance, and 42.4% of them are unique to Africa.
The significance of family history, and consequently of BRCA1/2 predisposition, may be attributed to the widespread practice of consanguineous marriage in Sudan. The nation has the highest global rate of marriages between first cousins (45%). The rate was even higher in the study by Saha et al. (49.5%), with an additional 13% of participants married to more distant relatives.
SUDAN has a concordance index of 0.864, which is described as excellent. This is much higher than the c-statistic reported for the Gail model in a study among western women (meta-analyzed c-statistic = 0.63; 95% CI 0.59–0.67). KoBCRAT has a c-statistic of 0.63 for women younger than 50 years and 0.65 for others. This high discriminatory accuracy qualifies our model for screening purposes and for defining the high-risk population, which allows better use of our limited resources.
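The concordance index has a direct interpretation: it is the proportion of case and non-case pairs in which the case received the higher predicted risk, with ties counted as half. The sketch below computes it by brute force. The function name and the predicted risks are hypothetical toy values, not study data.

```python
from typing import Sequence


def c_statistic(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Proportion of case/control pairs ranked correctly (ties count 0.5)."""
    concordant = 0.0
    pairs = 0
    for case in case_scores:
        for control in control_scores:
            pairs += 1
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    if pairs == 0:
        raise ValueError("need at least one case and one control")
    return concordant / pairs


# Hypothetical predicted risks, for illustration only.
toy_cases = [0.9, 0.7, 0.6]
toy_controls = [0.8, 0.4, 0.3, 0.6]

toy_c = c_statistic(toy_cases, toy_controls)
print(round(toy_c, 3))
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect separation, which is why 0.864 is read as strong discrimination.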
The SUDAN model's O/E ratio of 0.78 makes it second only to Cuzick-Tyrer (0.81; 95% CI 0.62–1.08), according to Amir et al. Corresponding values for Claus, BRCAPRO, and Gail were 0.56 (95% CI 0.43–0.75), 0.49 (95% CI 0.37–0.65), and 0.48 (95% CI 0.37–0.64), respectively. Since our overall O/E ratio is less than 1.00, fewer subjects actually had the disease than the model expected. This raises false positives; however, it is acceptable because it widens the catchment for screening, which is the principal aim of our model. The O/E ratio should improve with a larger sample size, which allows predictors to be assessed more accurately.
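As a minimal sketch of how an O/E ratio is obtained, the observed number of cases is divided by the sum of the predicted probabilities, which is the number of cases the model expects. The function name and the five toy records below are hypothetical and were chosen only to give a ratio near 0.78; they are not study data.

```python
from typing import Sequence


def observed_expected_ratio(outcomes: Sequence[int], predicted_probs: Sequence[float]) -> float:
    """Observed cases divided by the expected cases (sum of predicted risks)."""
    if len(outcomes) != len(predicted_probs):
        raise ValueError("outcomes and predictions must have the same length")
    expected = sum(predicted_probs)
    if expected == 0:
        raise ValueError("expected number of cases is zero")
    return sum(outcomes) / expected


# Hypothetical outcomes (1 = case) and predicted risks, for illustration only.
toy_outcomes = [1, 0, 0, 1, 0]
toy_predictions = [0.8, 0.3, 0.4, 0.7, 0.36]

toy_ratio = observed_expected_ratio(toy_outcomes, toy_predictions)
print(round(toy_ratio, 2))  # 0.78
```

Here two cases are observed against 2.56 expected, so the ratio falls below 1 and the model overpredicts, which is the same direction as the reported 0.78.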
This study has some limitations. Firstly, since a cross-sectional design was used, causal relationships for significant variables could not be proven. Secondly, the model offers group predictions in the form of ORs; individual relative risks (RR) could not be calculated because the study did not include longitudinal follow-up. Thirdly, although there is no agreement on the sample size required for regression, ours is relatively small, though it fulfils the rule of ten.
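The rule of ten mentioned above asks for at least ten outcome events per estimated parameter. A quick check with the counts reported in this paper (63 cases among 153 participants, five predictors) is sketched below. The function name is hypothetical, and counting six parameters is an assumption that treats the three-level cereal variable as two dummy terms.

```python
def events_per_variable(n_events: int, n_nonevents: int, n_parameters: int) -> float:
    """Events per variable, using the smaller outcome group as the event count."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return min(n_events, n_nonevents) / n_parameters


N_TOTAL = 153   # participants included in the model
N_CASES = 63    # participants diagnosed with breast cancer
N_CONTROLS = N_TOTAL - N_CASES

# Five predictors as reported in the text.
epv_five = events_per_variable(N_CASES, N_CONTROLS, 5)

# Assumption: cereal type contributes two dummy terms (sorghum, millet vs wheat).
epv_six = events_per_variable(N_CASES, N_CONTROLS, 6)

print(epv_five, epv_six)  # 12.6 10.5
```

Under either count the ratio stays above ten, although only narrowly when the cereal variable is counted as two terms.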
Conclusion and recommendations
Breast cancer constitutes a real problem in Sudan. Early onset, late presentation, and limited resources characterize BC in our national context. This triad could be better addressed with a prediction tool. SUDAN considers risk factors in women of different ages and ranks them accordingly. This makes women aware of their odds and subsequently improves early detection. Cost effectiveness will be enhanced further by recommending screening measures for high-risk individuals. Our model showed good calibration and excellent discriminatory power.
The role of family history and the prevalent BRCA1/2 predisposition are results of the high rates of consanguineous marriage. However, further genetic studies on the penetrance of BRCA1/2 and predisposition to other mutant genes should be considered. Use of the purposeful selection method is advisable, particularly when the sample size is relatively small. The authors support previous recommendations on establishing a national registry for cancer in Sudan. This will offer a potential data source for externally validating this model and developing it further to better fit Sudanese women.
Abbreviations
- KoBCRAT: Korean Breast Cancer Risk Assessment Tool
- O/E ratio: observed-to-expected ratio
- SUDAN-CA Breast: Statistical Utility to Determine Affinity of Neoplasm (SUDAN Model)
Parkin DM, Bray F, Ferlay J, Pisani P. Global cancer statistics, 2002. CA Cancer J Clin. 2005;55:74–108.
Evans DG, Howell A. Breast cancer risk-assessment models. Breast Cancer Res. 2007;9(5):213–20.
Frank TS, Deffenbaugh AM, Reid JE, et al. Clinical characteristics of individuals with germline mutations in BRCA1 and BRCA2: analysis of 10,000 individuals. J Clin Oncol. 2002;20:1480–90.
Bellcross C. Approaches to applying breast cancer risk prediction models in clinical practice. Comm Oncol. 2009;6(8):373–82.
Nelson HD, Huffman LH, Fu R, et al. Genetic risk assessment and BRCA mutation testing for breast and ovarian cancer susceptibility: systematic evidence review for the U.S. Preventive Services Task Force. Ann Intern Med. 2005;143(5):362–79.
Saslow D, Boetes C, Burke W, et al. American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography. CA Cancer J Clin. 2007;57:75–89.
US Preventive Services Task Force (USPSTF). Genetic Risk Assessment and BRCA Mutation Testing for Breast and Ovarian Cancer Susceptibility. Ann Intern Med. 2005;143:355–61.
Fisher B, Costantino JP, Wickerham DL, et al. Tamoxifen for prevention of breast cancer: report of the National Surgical Adjuvant Breast and Bowel Project P-1 study. J Natl Cancer Inst. 1998;90:1371–88.
Gail MH, Mai PL. Comparing breast cancer risk assessment models. JNCI. 2010;102(10):665–8.
Friedenson B. Assessing and managing breast cancer risk: clinical tools for advising patients. Med Gen Med. 2004;6(1):8.
Kamath R, Mahajan KS, Ashok L, Sanal TS. A study on risk factors of breast cancer among patients attending the tertiary care hospital, Udupi district. Indian J Community Med. 2013;38(2):95–9. doi:10.4103/0970-0218.112440.
Lajous M, Boutron-Ruault MC, Fabre A, Clavel-Chapelon F, Romieu I. Carbohydrate intake, glycemic index, glycemic load, and risk of postmenopausal breast cancer in a prospective study of French women. Am J Clin Nutr. 2008;87(5):1384–91.
Dumitrescu RG, Cotarla I. Understanding breast cancer risk-where do we stand in 2005? J Cell Mol Med. 2005;9(1):208–21.
Althuis MD, Fergenbaum JH, Garcia-Closas M, Brinton LA, Madigan MP, Sherman ME. Etiology of hormone receptor—defined breast cancer: a systematic review of the literature. Cancer Epidemiol Biomarkers Prev. 2004;13:1558–68.
Tsakountakis N, Sanidas E, Stathopoulos E, Kafousi M, Anogiannaki N, Georgoulias V, Tsiftsis DD. Correlation of breast cancer risk factors with HER-2/neu protein overexpression according to menopausal and estrogen receptor status. BMC Women’s Health. 2005;5(1):1. doi:10.1186/1472-6874-5-1.
Fadl Elmoula RM, Farag Alla HEH. The association of ki67 proliferation marker with estrogen, progesterone and HER2 markers among Sudanese females with breast cancer. J Biomed Pharmaceut Res. 2014;3(6):125–30.
Chuthapisith S, Permsapaya W, Warnnissorn M, Akewanlop C, Sirivatanauksorn V, Osoth PP. Breast cancer subtypes identified by the ER, PR and HER-2 status in Thai women. Asian Pac J Cancer Prev. 2012;13:459–62.
Telli ML, Chang ET, Kurian AW, Keegan TH, McClure LA, Lichtensztajn D, Ford JM, Gomez SL. Asian ethnicity and breast cancer subtypes: a study from the California Cancer Registry. Breast Cancer Res Treat. 2011;127:471–8. doi:10.1007/s10549-010-1173-8.
Wiechmann L, Sampson M, Stempel M, Jacks LM, Patil SM, King T, Morrow M. Presenting features of breast cancer differ by molecular subtype. Ann Surg Oncol. 2009;16:2705–10. doi:10.1245/s10434-009-0606-2.
Bae YK, Gong G, Kang J, Lee A, Cho EY, Lee JS, Suh KS, Lee DW. Hormone receptor expression in invasive breast cancer among Korean women and comparison of 3 antiestrogen receptor antibodies: a multi-institutional retrospective study using tissue microarrays. Am J Surg Pathol. 2012;36(12):1817–25. doi:10.1097/PAS.0b013e318267b012.
Elnour AM, Elderdery AY, Mills J, Mohammed BA, ElbietAbdelaal D, Mohamed AO, Elhassan KE, Hady A, Wahab A, Cooper A. BRCA 1 & 2 mutations in Sudanese secondary school girls with known breast cancer in their families. Int J Health Sci (Qassim). 2012;6(1):63–71.
Awadelkarim KD, Aceto G, Veschi S, Elhaj A, Morgano A, Mohamedani AA, Eltayeb EA, Abuidris D, Di Dioacchino M, Battista P, Verginelli F, Cama A, Elwali NE, Mariani-Costantini R. BRCA1 and BRCA2 status in a Central Sudanese series of breast cancer patients: interactions with genetic, ethnic and reproductive factors. Breast Cancer Res Treat. 2007;102(2):189–99.
Biunno L, Aceto G, Awadelkarim KD, Morgano A, Elhaj A, Eltayeb EA, Abuidris DO, Elwali NE, Spinelli C, De Blasio P, Rovida E, Mariani-Costantini R. BRCA1 point mutations in premenopausal breast cancer patients from Central Sudan. Fam Cancer. 2014;13(3):437–44. doi:10.1007/s10689-014-9717-4.
Al-Gazali L, Hamamy H, Al-Arrayad S. Genetic disorders in the Arab world. BMJ. 2006;333(7573):831–4. doi:10.1136/bmj.38982.704931.AE.
Saha N, Hamad RE, Mohamed S. Inbreeding Effects on Reproductive Outcome in a Sudanese Population. Hum Hered. 1990;40(4):208–12. doi:10.1159/000153932.
Hosmer DW, Lemeshow S, Sturdivant RX. Applied logistic regression. Hoboken: John Wiley & Sons, Inc.; 2013. p. 177.
Meads C, Ahmed I, Riley RD. A systematic review of breast cancer incidence risk prediction models with meta-analysis of their performance. Breast Cancer Res Treat. 2012;132:365–77.
Park B, Ma SH, Shin A, Chang MC, Choi JY, Kim S, Han W, Noh DY, Ahn SH, Kang D, Yoo KY, Park SK. Korean risk assessment model for breast cancer risk prediction. PLoS ONE. 2013;8(10):e76736. doi:10.1371/journal.pone.0076736.
Gail MH. Twenty-five Years of Breast Cancer Risk Models and Their Applications. JNCI J Natl Cancer Inst. 2015;107(5):1–6. doi:10.1093/jnci/djv042.
Amir E, Evans DG, Shenton A, Lalloo F, Moran A, Boggis C, Wilson M, Howell A. An evaluation of breast cancer risk assessment packages in the family history evaluation and screening programme. J Med Genet. 2003;40:807–14.
AS conceptualized and designed the study, analyzed and interpreted data, and drafted the manuscript. AY helped with statistical interpretation. DA, MA, AY, and MN revised the draft. All authors read and approved the final manuscript.
The authors would like to thank Prof. Mustafa Numairi, Professor of Biostatistics, Faculty of Medicine, Neelain University, for his endless scientific contributions.
The authors declare that they have no competing interests.
Availability of data and materials
The authors will retain the data, as they are using them to develop application software.
Salih, A.M., Alam-Elhuda, D.M., Alfaki, M.M. et al. Developing a risk prediction model for breast cancer: a Statistical Utility to Determine Affinity of Neoplasm (SUDAN-CA Breast). Eur J Med Res 22, 35 (2017). https://doi.org/10.1186/s40001-017-0277-6