J Korean Acad Nurs : Journal of Korean Academy of Nursing

OPEN ACCESS

Research Paper
Development of a predictive model for exclusive breastfeeding at 3 months using machine learning : a secondary analysis of a cross-sectional survey
Hyun Kyoung Kim

DOI: https://doi.org/10.4040/jkan.25086
Published online: October 28, 2025

Department of Nursing, Kongju National University, Gongju, Korea

Corresponding author: Hyun Kyoung Kim Department of Nursing, Kongju National University, 56 Gongjudaehak-ro, Gongju 32588, Korea E-mail: hkk@kongju.ac.kr
• Received: June 23, 2025   • Revised: September 2, 2025   • Accepted: September 2, 2025

© 2025 Korean Society of Nursing Science

This is an Open Access article distributed under the terms of the Creative Commons Attribution NoDerivs License (http://creativecommons.org/licenses/by-nd/4.0) If the original work is properly cited and retained without any modification or reproduction, it can be used and re-distributed in any format and medium.

  • Purpose
    This study aimed to develop a machine learning model to predict exclusive breastfeeding during the first 3 months after birth and to explore factors affecting breastfeeding outcomes.
  • Methods
    Data from 2,579 participants in the Korean Early Childhood Education & Care Panel were analyzed between March 1 and June 3, 2025, using Python version 3.12.8 and Google Colab. The dataset was split into training and testing sets at an 80:20 ratio, and five classifiers (random forest, logistic regression, decision tree, AdaBoost, and XGBoost) were trained and evaluated using multiple performance metrics and feature importance analysis.
  • Results
    The confusion matrix of the random forest classifier model demonstrated strong performance, with a precision of 86.6%, accuracy of 84.8%, recall of 96.8%, F1-score of 91.9%, and an area under the curve of 86.0%. Twenty-one features were analyzed, from which feeding plan, breastfeeding at 1 month, marriage period, maternal prenatal weight, self-respect, alcohol consumption, grit, value placed on children, maternal age, and depression emerged as important predictors of exclusive breastfeeding in the first 3 months.
  • Discussion
    A robust model was developed to predict exclusive breastfeeding that identified feeding planning and breastfeeding at 1 month as the most influential predictors. The model could be implemented in clinical and community settings to guide tailored breastfeeding support strategies, coupled with the integration of maternal self-respect, grit, and the value placed on children in counseling programs to promote exclusive breastfeeding.
Exclusive breastfeeding (EBF) is defined as receiving only breast milk—no other liquids or solids—except for oral rehydration solutions, or drops or syrups containing vitamins, minerals, or medicines, meaning that breast milk serves as the sole source of nutrition for the infant during the early postpartum period [1,2]. The Healthy People 2030 Initiative set a goal to increase the 6-month EBF rate from 27.2% (based on 2021 data) to 42.4% [2]. However, South Korea has shown persistently low and declining breastfeeding rates in recent years. The EBF rate at 3 months fell from 30.5% in 2018 to 19.3% in 2024, reflecting a significant downward trend [1].
Breastfeeding offers substantial health benefits for both infants and mothers. It provides optimal nutrition for infants, supporting healthy growth and development. Additionally, breastfeeding reduces the risk of several short- and long-term illnesses. Breastfed children have a lower risk of developing asthma, obesity, type 1 diabetes mellitus, and sudden infant death syndrome. Breastfeeding is also associated with reduced maternal risk of type 2 diabetes mellitus, hypertension, breast cancer, and ovarian cancer. Accordingly, the American Academy of Pediatrics recommends EBF for the first 3 months of an infant’s life [3]. Research has underscored the importance of 3 months of EBF, linking it to a reduced risk of infantile eczema up to 2 years of age, a 60% reduction in sudden infant death syndrome, and a 62% decrease in postnatal mortality. Continuing breastfeeding until 3 months also lowered the risk of ulcerative colitis and childhood obesity [3].
Key predictors of exclusive breastfeeding at 3 months include feeding intention, initial feeding practices, high breastfeeding self-efficacy, positive attitudes toward breastfeeding, low confidence in formula feeding, and fewer concerns about insufficient milk supply [4]. EBF at 3 months has also shown positive associations with breastfeeding support following hospital discharge, perceptions that formula has limited nutritional value, experiences of mastitis, return to work, and vaginal delivery [5]. Additional influential factors include prenatal intention to breastfeed, early skin-to-skin contact, EBF at hospital discharge, maternal self-efficacy, postpartum professional support, delayed introduction of formula, and supportive partner involvement [6]. Moreover, childbirth method, prenatal decisions regarding breastfeeding, breastfeeding at 1 month, and participation in prenatal parenting education programs have been identified as significant predictors [7]. This study examined features differentiating EBF status by incorporating variables identified in previous research, using Bronfenbrenner’s ecological systems theory—a framework that conceptualizes the individual’s environment as an interconnected ecosystem. The analysis included a range of micro- to macro-level influences, including individual, social, and institutional factors [8].
The persistently low EBF rate in Korea has not been fully explained by existing research. While this study does not seek to explain causality behind low EBF rates, it focuses on distinguishing breastfeeding status through data-driven classification. In this context, machine learning techniques offer a promising approach by enabling the identification of features that contribute to accurate differentiation of EBF versus non-EBF cases. Only a few studies have applied machine learning methods: for instance, studies in China [9] and Turkey [10] reported EBF rates at 6 months of 83% and 41%, respectively, compared to only 21% in Korea, underscoring the need for a context-specific approach. Machine learning provides a robust computational tool to manage high-dimensional, nonlinear data and reveal complex patterns that traditional methods may miss [11]. Accordingly, this study aimed to develop a model to classify EBF status at 3 months postpartum using machine learning, and to identify key features that contribute to this model’s classification performance.
1. Study design
This study was a secondary analysis that used machine learning techniques to examine factors influencing EBF during the first 3 months. It adhered to the Strengthening the Reporting of Observational Studies in Epidemiology guidelines.
2. Datasets
Publicly available data from the Korean Early Childhood Education & Care Panel (K-ECEC-P) 2022, collected by the Korea Institute of Child Care and Education (KICCE) [12], were used. The target population comprised mothers of children born in 2022. When their infants were 3 months old, mothers completed an online survey. Cross-sectional data on 3,380 mothers were obtained from the 2022 K-ECEC-P dataset. Of these, 2,579 mothers reported their breastfeeding status at 3 months and were included in the analysis. Data were accessed for analysis on March 1, 2025. KICCE provided the survey codebook, instrument profile, user guide, and the dataset for public use. Survey items covered maternal demographic, obstetric, psychological, social, emotional, and behavioral data. Twenty-one features were selected for analysis: maternal age, marriage period, maternal prenatal weight, employment status, number of children, twin pregnancy, type of birth, smoking, alcohol consumption, use of rooming-in, time of first breastfeeding, skin contact with baby, use of nursery, use of babysitter, breastfeeding at 0 months, breastfeeding at 1 month, feeding plan, self-respect, grit, value placed on children, and depression.
3. Sampling
This study conducted a secondary analysis of the 2022 K-ECEC-P dataset, an anonymized public dataset provided by the KICCE and accessed without modification from its official website. KICCE identified the target population based on the most recent national census data available at the time of the sampling design, utilizing the 2019 birth and death statistics from Statistics Korea, and employed quota sampling using a list of medical institutions with delivery records as the sampling frame. The sampling approach included both appropriate sampling units and frames to ensure representative selection. The sample included 143 obstetric clinics and hospitals nationwide, covering both regional and local institutions. Pregnant women were recruited through in-person interviews during visits to these facilities. Baseline survey I was administered using tablet-assisted personal interviews at initial contact.
Participants were mothers of children born between January and August 2022 who had completed Baseline Survey I of the K-ECEC-P. Exclusion criteria included childbirth prior to 2022 (i.e., preterm births), cases of miscarriage or infant death, and situations where the mother’s health was severely compromised. Additional exclusions applied to participants who refused to participate, demonstrated repeated non-cooperation, or were unable to complete the survey due to missing or invalid contact information. Of the 3,380 eligible participants, 594 were excluded due to non-response to the postpartum survey, 11 were excluded for not responding to the main survey, and 196 were removed because of excessive missing data. The final analytic sample comprised 2,579 participants, representing 76.3% of the initial eligible population. Within the final sample, 495 mothers reported EBF, while 2,084 mothers did not exclusively breastfeed, including 1,611 who used formula feeding only and 473 who practiced mixed feeding (Figure 1). For the purposes of the machine learning analysis, the final dataset was dichotomized into two groups: EBF (n=495) and non‑EBF (n=2,084).
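The participant flow above can be checked with a few lines of arithmetic. The sketch below only restates the counts quoted in this section; the variable names and dictionary keys are illustrative.

```python
# Participant flow figures quoted in the Sampling section.
eligible = 3380
excluded = {
    "no response to postpartum survey": 594,
    "no response to main survey": 11,
    "excessive missing data": 196,
}

final_n = eligible - sum(excluded.values())
retention_pct = round(100 * final_n / eligible, 1)

# Feeding groups within the final sample.
ebf = 495
formula_only = 1611
mixed = 473
non_ebf = formula_only + mixed

print(final_n, retention_pct)          # 2579 76.3
print(ebf + non_ebf == final_n)        # True
print(round(100 * ebf / final_n, 1))   # 19.2 (share of EBF mothers, %)
```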
4. Measurements

1) Label: exclusive breastfeeding

EBF was measured using a single question: “Please indicate the feeding method according to your baby’s age in months,” with response options “1=exclusive breastfeeding,” “2=mixed (breast and formula),” and “3=formula only.” Responses of 2 and 3 were recoded to 0 for dichotomization.
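The recoding of the three response options into a binary label could look like the following minimal sketch. The column names `feeding_3mo` and `ebf` are hypothetical; the text does not give the actual K-ECEC-P variable names.

```python
import pandas as pd

# Toy responses: 1 = exclusive breastfeeding, 2 = mixed, 3 = formula only.
# "feeding_3mo" is a hypothetical column name used only for illustration.
df = pd.DataFrame({"feeding_3mo": [1, 2, 3, 1, 3]})

# Keep 1 as the positive class; recode 2 and 3 to 0.
df["ebf"] = (df["feeding_3mo"] == 1).astype(int)

print(df["ebf"].tolist())  # [1, 0, 0, 1, 0]
```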

2) Features

(1) Self-respect

Self-respect was assessed using the Korean version of the Self-Respect Scale [13], adapted from the original Rosenberg Self-Esteem Scale [14]. This 10-item scale uses a 4-point Likert response format ranging from “not at all” (1) to “very true” (4). Cronbach’s alpha was .77 in the original and .83 in this study.

(2) Grit

Grit was measured using the Short Grit Scale (GRIT–S), developed by Duckworth and Quinn [15] and validated in Korean [16]. This 8-item scale uses a 5-point Likert format (“not at all”=1 to “very true”=5). Cronbach’s alpha was .77 in the original and .83 in this study.

(3) Value placed on children

The Value Placed on Children Scale assesses the importance parents place on children, with eight items covering two subdomains: emotional value (four items) and instrumental value (four items) [17]. The scale uses a 5-point Likert response from “not at all true” (1) to “very true” (5). Cronbach’s alpha was .88 in the original and .95 in this study.

(4) Depression

Depression was measured using the Korean version of the Edinburgh Postnatal Depression Scale (K-EPDS) [18], based on the 10-item EPDS [19]. Each item is scored on a 4-point Likert scale (0–3). Cronbach’s alpha was .85 in the original and .79 in this study.

(5) General and obstetric characteristics

Personal characteristics included maternal age (years), marriage period (months), maternal prenatal weight (kg), employment status (employed/unemployed), smoking (yes/no), and alcohol consumption (yes/no), all self-reported. Obstetric characteristics included number of children, twin pregnancy (yes/no), skin contact with baby (yes/no), type of birth (normal spontaneous vaginal delivery / cesarean section, combining elective and emergency cases), time of first breastfeeding (hours after birth), use of rooming-in (yes/no), use of nursery (yes/no), use of babysitter (yes/no), breastfeeding at 0 months, breastfeeding at 1 month, and feeding plan (yes/no), all self-reported.
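The Cronbach's alpha values reported for the scales above follow the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score), where k is the number of items. A minimal sketch is shown below; the function name and the toy response matrix are illustrative and are not taken from the study data.

```python
import numpy as np

def cronbach_alpha(items) -> float:
    """Cronbach's alpha for a 2-D array (rows = respondents, columns = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                                # number of items
    item_var_sum = items.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)         # variance of total scores
    return float(k / (k - 1) * (1 - item_var_sum / total_var))

# Toy 4-item scale answered by five hypothetical respondents.
responses = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 4, 4],
])
print(round(cronbach_alpha(responses), 2))
```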
5. Analysis
Supervised machine learning models were used to predict EBF, employing Python ver. 3.12.8 (https://www.python.org/) and the Google Colab environment ver. 1.2.0 (Colaboratory Chrome Extension; Google LLC). The Python libraries used for exploratory data analysis (EDA) and data processing were Pandas (pd), NumPy (np), Matplotlib (matplotlib.pyplot as plt), Seaborn (sns), Plotly Express (px), and Scikit-Learn (sklearn). EDA included examining variable data types, assessing the distribution and normality of continuous variables, and standardizing the variable “marriage period” with the StandardScaler preprocessing method. The process also involved identifying missing values, detecting outliers coded as “9999” for maternal prenatal weight, and computing frequency counts for categorical variables. The K-ECEC-P provided a pre-cleaned dataset. Missing and outlier values were handled according to variable type: numerical values were imputed with the mean, and categorical values with the mode where applicable [20]. No categorical variables in this dataset contained missing values; the 18 missing values for marriage period were imputed with the mean. One case with a missing value for the 3-month EBF variable was excluded from the analysis. After preprocessing, the final analytic sample comprised 2,579 participants. The dataset was split into training and test sets in an 80:20 ratio.
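The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration on toy data, not the author's code: the column names (`marriage_period`, `prenatal_weight`, `ebf`) are hypothetical, and treating the `9999` code as missing before mean imputation is an assumption, since the text does not spell out how those values were finally handled. The `random_state` value is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data with hypothetical column names (not the K-ECEC-P variables).
df = pd.DataFrame({
    "marriage_period": [12.0, 24.0, np.nan, 36.0, 48.0, 18.0, 30.0, 60.0, 6.0, 24.0],
    "prenatal_weight": [60.0, 9999.0, 72.0, 55.0, 80.0, 65.0, 70.0, 58.0, 90.0, 62.0],
    "ebf": [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

# Assumption: 9999 is a sentinel code, so it is treated as missing.
df["prenatal_weight"] = df["prenatal_weight"].replace(9999.0, np.nan)

# Mean imputation for numerical features.
num_cols = ["marriage_period", "prenatal_weight"]
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Standardize marriage period (zero mean, unit variance).
df["marriage_period"] = StandardScaler().fit_transform(df[["marriage_period"]]).ravel()

# 80:20 split into training and test sets.
X = df[num_cols]
y = df["ebf"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```

The order here (impute and scale, then split) follows the description in the text. Fitting the imputer and scaler on the training split only is a common alternative that avoids information from the test set leaking into preprocessing.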
After preprocessing, the cleaned dataset was used to train and evaluate random forest, logistic regression, decision tree, AdaBoost, and XGBoost classifier models. The performance of the five models was evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC), as well as confusion matrices. Accuracy was defined as the proportion of correct predictions (both positive and negative) out of all predictions. Precision was calculated as the proportion of true positives among all positive predictions. Recall was calculated as the proportion of true positives among all actual positives. The F1-score balances precision and recall, representing their harmonic mean. AUC-ROC assesses the model's ability to discriminate between classes [21,22]. In addition, 95% confidence intervals (CIs) for model performance metrics were estimated using the bootstrapping method. GridSearchCV was employed during model development to perform hyperparameter tuning by systematically searching for the optimal combination of parameter values to improve model performance. After model selection, important factors influencing EBF were assessed using absolute feature importance, derived differently across models: for the decision tree (Supplementary Figure 1) and random forest, importance was calculated based on the reduction in Gini impurity; for logistic regression, it was determined using the absolute values of standardized coefficients; for AdaBoost, it was obtained from the weighted impurity decrease across weak learners; and for XGBoost, it was evaluated using the ‘gain’ metric, representing each variable’s contribution to improving model performance [11,22].
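As a rough illustration of the tuning and evaluation workflow described above, the sketch below runs `GridSearchCV` on a random forest and computes the five metrics on a held-out test set. It uses synthetic data from `make_classification` as a stand-in for the 21-feature dataset. The parameter grid, `cv=3`, and `scoring="f1"` are assumptions, because the text does not list the tuned hyperparameters or the selection criterion, and only one of the five classifiers is shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 21-feature dataset (not the K-ECEC-P data).
X, y = make_classification(n_samples=500, n_features=21, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical grid; the tuned parameters are not reported in the text.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="f1")
search.fit(X_train, y_train)

model = search.best_estimator_
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# Confusion matrix cells: true/false negatives and positives.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),                     # (tp + tn) / all
    "precision": precision_score(y_test, y_pred, zero_division=0),  # tp / (tp + fp)
    "recall": recall_score(y_test, y_pred, zero_division=0),        # tp / (tp + fn)
    "f1": f1_score(y_test, y_pred, zero_division=0),                # harmonic mean
    "auc_roc": roc_auc_score(y_test, y_prob),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```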
6. Ethical statements
This study received approval from the Institutional Review Board (IRB) of KICCE (approval no., KICCEIRB-2022-01), which permitted the use of the dataset without additional IRB review. All study procedures conformed to the Declaration of Helsinki, and informed consent was obtained from all participants.
1. Data characteristics
The study population had a mean maternal age of 33.5±4.16 years, with an average marriage period of 24.33±13.91 months, a mean maternal prenatal weight of 68.05±12.17 kg, and an average number of children of 1.45±0.65. Employed mothers accounted for 50.7%, twin births for 5.1%, cesarean sections for 61.3%, alcohol consumption for 48.0%, smoking for 2.9%, and use of rooming-in for 70.6%. The timing of first breastfeeding was categorized as follows: ≤1 hour (4.6%), >1–24 hours (24.3%), >24–48 hours (20.8%), >48 hours–7 days (35.9%), and none (14.4%). Skin contact with the baby was reported by 47.3%. Use of nursery was reported by 86.2% of mothers and 52.2% reported use of a babysitter. Breastfeeding only at 0 months accounted for 24.6%, breastfeeding only at 1 month for 23.3%, and feeding plan for 24.2%. The mean score was 30.18±5.01 for self-respect, 22.26±4.25 for grit, 25.84±4.86 for value placed on children, and 7.78±5.65 for depression. Breastfeeding at 3 months included EBF (19.2%) and non-EBF (80.8%) (Table 1).
2. Model performance comparison
Confusion matrices were generated to compare five machine learning models (decision tree, random forest, logistic regression, AdaBoost, and XGBoost). Model performance metrics included accuracy, precision, recall, F1-score, and AUC-ROC. The highest accuracy was observed with the random forest model (84.8%; 95% CI, 83.4–86.2), while the lowest was with the decision tree classifier (73.5%; 95% CI, 72.2–75.6). AdaBoost demonstrated the highest precision (88.3%; 95% CI, 87.1–89.6), with random forest showing the lowest (86.6%; 95% CI, 85.3–87.9). The random forest achieved the highest recall (96.8%; 95% CI, 96.1–97.5), while the decision tree had the lowest recall (78.7%; 95% CI, 77.1–80.3). The highest F1-score was also seen in the random forest (91.9%; 95% CI, 89.0–95.7), with the lowest in the decision tree (82.5%; 95% CI, 80.8–86.2). The highest AUC-ROC was recorded by both XGBoost and random forest (86.0%; 95% CI, 84.7–87.3). Thus, except for precision, overall model performance was superior in the random forest (Table 2).
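Bootstrapped 95% CIs such as those reported above can be obtained with a percentile bootstrap. The sketch below is one common variant and rests on assumptions: it resamples test-set predictions with replacement, uses 1,000 replicates, and works on toy labels. The text does not state which bootstrap procedure or number of replicates was used, and the helper names are illustrative.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a metric computed on paired labels."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)

def accuracy(y_true, y_pred):
    """Proportion of correct predictions."""
    return float(np.mean(y_true == y_pred))

# Toy labels: 100 cases, 15 of which are misclassified (accuracy = 0.85).
y_true = np.array([1] * 80 + [0] * 20)
y_pred = y_true.copy()
y_pred[:10] = 0    # 10 false negatives
y_pred[80:85] = 1  # 5 false positives

point = accuracy(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy = {point:.3f} (95% CI, {ci_low:.3f} to {ci_high:.3f})")
```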
3. Feature importance
The top 10 features important for predicting EBF were identified using the random forest. The most influential predictors at 3 months were feeding plan (.12; 95% CI, 0.09–0.14), breastfeeding at 1 month (.11; 95% CI, 0.09–0.13), marriage period (.06; 95% CI, 0.05–0.07), maternal prenatal weight (.06; 95% CI, 0.05–0.07), self-respect (.05; 95% CI, 0.05–0.06), alcohol consumption (.05; 95% CI, 0.04–0.07), grit (.05; 95% CI, 0.04–0.05), value placed on children (.05; 95% CI, 0.05–0.06), maternal age (.05; 95% CI, 0.04–0.05), and depression (.04; 95% CI, 0.04–0.05) (Table 3).
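A ranking like the one above can be read from a fitted random forest through its impurity-based `feature_importances_` attribute. The sketch below uses synthetic data and placeholder feature names (`feature_0` to `feature_20`), so the values it prints are unrelated to Table 3.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with 21 features and placeholder names.
feature_names = [f"feature_{i}" for i in range(21)]
X, y = make_classification(n_samples=500, n_features=21, n_informative=6,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Mean decrease in Gini impurity, normalized to sum to 1 across features.
importances = pd.Series(rf.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
top10 = importances.head(10)
print(top10.round(3))
```

Impurity-based importances are non-negative and carry no direction. A signed value, such as the "Real value" column of Table 3, would therefore have to come from a separate step that the text does not describe.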
This study developed a predictive model to classify mothers practicing EBF at 3 months postpartum and identified key determinants of EBF. Grounded in Bronfenbrenner’s ecological systems theory, this study framework identified key contributors to distinguishing between EBF and non-EBF groups. The most influential predictors were feeding plan and breastfeeding at 1 month. Microsystem-level factors included marriage period, maternal age, and employment status, while health-related variables such as maternal prenatal weight and alcohol consumption also contributed substantially. Macro-system attributes—including grit, self-respect, the value placed on children, and depression—further improved the model’s classification performance.
In this study, the random forest demonstrated superior performance among the five machine learning models. Feature importance differed between models; in the random forest, importance was calculated by measuring how much each feature reduces impurity at each node. The random forest is widely recognized for its superior predictive accuracy and robustness, particularly in handling complex, high-dimensional data. It reduces overfitting by aggregating multiple decision trees through bootstrap sampling and random feature selection, thereby enhancing generalizability across datasets [23].
Existing studies have primarily addressed breastfeeding outcomes at 6 months or during the immediate postpartum period. Previous machine learning studies for breastfeeding prediction have reported accuracies ranging from 0.70 to 0.90 [9,10,24-26]. For instance, Açikgöz et al. [9] and Choi et al. [24] developed models to predict EBF at 6 months using the random forest algorithm, with reported accuracies of 0.72 and 0.76, respectively. Liu et al. [10] also focused on 6-month outcomes using the random forest algorithm, reporting accuracies ranging from 0.77 to 0.90, and further examined predictors including breastfeeding self-efficacy, intention, social support, and postpartum depression. Oliver-Roig et al. [25] and Walle et al. [26] analyzed breastfeeding initiation and early cessation during the in-hospital postpartum stay, reporting accuracies of 0.84 and 0.83, using XGBoost and random forest, respectively. Therefore, this study addresses a gap in the literature by developing a machine learning model specifically aimed at predicting EBF at 3 months postpartum. It further contributes by developing a predictive model tailored to the context of South Korea, where the 3-month EBF rate remains relatively low [1].
The top predictors spanned both micro-system factors, such as personal and demographic characteristics, and macro-system factors, such as values [8]. Unlike a previous study during the in-hospital postpartum period, which found older maternal age and normal body mass index increased EBF [25], this study identified longer marriage duration—often accompanying delayed marriage and advanced maternal age in South Korea—as associated with lower EBF rates. Although no prior studies have directly examined marriage duration, an analysis of Korean data showed that women aged 40–49 years had significantly lower odds of breastfeeding than those aged 19–29 years (odds ratio, 0.47) [27]. Similarly, research in the United Kingdom has linked maternal obesity to reduced breastfeeding outcomes, potentially due to physiological and psychological factors such as delayed lactogenesis and low body confidence [28]. Feeding plan and breastfeeding at 1 month were the strongest predictors, suggesting that early plans and experiences critically influence sustained EBF. This underscores the importance of prenatal counseling and immediate postpartum support.
This study also identified unique predictors not emphasized in prior machine learning research [9,10,24-26], including alcohol consumption, maternal grit, value placed on children, self-respect, and depression. Compared to non-drinkers, those who quit, reduced, or resumed drinking had lower odds of breastfeeding [29]. Depression negatively affects breastfeeding self-efficacy, while social support and positive attitudes enhance it [30]. Although self-respect and value placed on children have been less studied, alignment with family-centered norms has been linked to stronger breastfeeding beliefs and empowerment [31]. Grit, marked by persistence despite adversity, has emerged as a distinguishing trait among breastfeeding mothers [32]. Drawing on emotional availability theory, maternal psychological well-being—including self-respect and depression—may influence breastfeeding through emotional attunement and sensitivity toward the infant [33]. Macro-system characteristics such as self-respect, grit, and value placed on children were also critical predictors, highlighting the need for emotionally attuned, resilience-focused breastfeeding interventions. Previous machine learning studies have analyzed various predictors of breastfeeding, including maternal health problems and drinking water access [9], self-efficacy [10], maternal diabetes mellitus [24], neonatal weight, skin contact with baby, and prior maternal breastfeeding experience [25], as well as maternal age, cesarean section, and access to healthcare facilities [26]. While sociodemographic variables were commonly included across studies, the present study additionally incorporated behavioral factors (feeding plan, breastfeeding at 1 month) and psychological factors (self-respect, grit, value placed on children), which demonstrated strong predictive power for EBF at 3 months postpartum. These findings suggest the model’s effectiveness in early identification of mothers at risk of early breastfeeding cessation. 
The inclusion of such multidimensional factors highlights the strengths of machine learning in capturing complex interactions beyond traditional biomedical predictors, supporting the appropriateness of the variable selection in this study.
This study has several limitations. First, although the K-ECEC-P dataset is large and nationally representative, its cross-sectional design limits causal inference between predictors and EBF outcomes. Longitudinal data would allow for more robust predictive modeling and temporal interpretation. Second, EBF was measured using a single self-reported item at 3 months postpartum, which may introduce recall and social desirability bias, potentially affecting classification accuracy. Third, although this study incorporated a range of psychosocial, demographic, and obstetric variables, key predictors identified in previous research (such as breastfeeding self-efficacy, workplace breastfeeding support, and history of lactation consultation) were not available in the secondary dataset, potentially limiting the model’s scope and comprehensiveness. Fourth, the imputation of missing values using means or modes, although necessary, may have introduced bias. Finally, these findings are contextually based on Korean mothers and healthcare settings, so caution should be exercised when generalizing to other populations. Further external validation using diverse and longitudinal cohorts is needed to confirm the robustness and applicability of the model.
This study developed and evaluated machine learning models to predict EBF at 3 months postpartum using data from the K-ECEC-P. Among the five models tested, the random forest demonstrated the best overall performance, with the highest accuracy, recall, F1-score, and AUC-ROC, making it a suitable tool for identifying key predictors of EBF. The analysis showed that early breastfeeding behaviors, particularly feeding plan and breastfeeding at 1 month, were the strongest predictors of EBF. Additionally, maternal psychological factors such as self-respect, grit, and value placed on children were important predictors of sustained breastfeeding. These findings underscore the importance of early intervention during the prenatal and early postpartum periods to support and encourage exclusive breastfeeding. Healthcare providers should prioritize enhancing maternal psychological readiness and reinforcing positive breastfeeding intentions and behaviors immediately after birth. By applying machine learning to maternal and infant health data, this study provides a data-driven framework for targeted interventions aimed at improving breastfeeding outcomes. Future research should employ longitudinal models and diverse populations to enhance generalizability and support the development of personalized breastfeeding support programs.

Conflicts of Interest

No potential conflict of interest relevant to this article was reported.

Acknowledgements

None.

Funding

This work was supported by the research grant of Kongju National University in 2025 and the National Research Foundation of Korea (NRF) Grant funded by the Korea government (MSIT) (No. RS-2023-00239284).

Data Sharing Statement

Please contact the corresponding author for data availability.

Supplementary Data

Supplementary data to this article can be found online at https://doi.org/10.4040/jkan.25086.

Supplementary Figure 1. Visualization of the model using a decision tree classifier. EBF, exclusive breastfeeding.


Author Contributions

HKK was responsible for the conception and design of the study, data acquisition, drafting of the first and final manuscript, and funding acquisition.

Fig. 1. Flow of study participants.
Table 1. Characteristics of datasets (N=2,579)

| Characteristic | Category | Value |
|---|---|---|
| Maternal age (yr) | | 33.5±4.16 (17–49) |
| Marriage period (mo) | | 24.33±13.91 (6–62) |
| Maternal prenatal weight (kg) | | 68.05±12.17 (45–125) |
| Employment status | Employed | 1,308 (50.7) |
| | Unemployed | 1,271 (49.3) |
| No. of children | | 1.45±0.65 (1–4) |
| Twin pregnancy | Yes | 131 (5.1) |
| | No | 2,448 (94.9) |
| Type of birth | Normal delivery | 997 (38.7) |
| | Cesarean section | 1,582 (61.3) |
| Alcohol consumption | Yes | 1,237 (48.0) |
| | No | 1,342 (52.0) |
| Smoking | Yes | 75 (2.9) |
| | No | 2,504 (97.1) |
| Use of rooming-in | Yes | 1,820 (70.6) |
| | No | 759 (29.4) |
| Time of first breastfeeding | ≤1 hr | 119 (4.6) |
| | >1–24 hr | 627 (24.3) |
| | >24–48 hr | 537 (20.8) |
| | >48 hr–7 day | 925 (35.9) |
| | None | 371 (14.4) |
| Skin contact with baby | Yes | 1,221 (47.3) |
| | No | 1,358 (52.7) |
| Use of nursery | Yes | 2,222 (86.2) |
| | No | 357 (13.8) |
| Use of babysitter | Yes | 1,346 (52.2) |
| | No | 1,233 (47.8) |
| Breastfeeding at 0 mo | Breastfeeding only | 635 (24.6) |
| | Mixed | 1,602 (62.1) |
| | Formula only | 342 (13.3) |
| Breastfeeding at 1 mo | Breastfeeding only | 600 (23.3) |
| | Mixed | 1,171 (45.4) |
| | Formula only | 808 (31.3) |
| Feeding plan | Yes | 625 (24.2) |
| | No | 1,954 (75.8) |
| Self-respect | | 30.18±5.01 (12–40) |
| Grit | | 22.26±4.25 (11–40) |
| Value placed on children | | 25.84±4.86 (9–40) |
| Depression | | 7.78±5.65 (0–29) |
| Breastfeeding at 3 mo | EBF | 495 (19.2) |
| | Non-EBF | 2,084 (80.8) |

Values are presented as mean±standard deviation (minimum–maximum) or number (%).

EBF, exclusive breastfeeding.

Table 2. Comparison of the performance of machine learning models (N=2,579)

| Model | Precision | Accuracy | Recall | F1-score | AUC-ROC |
|---|---|---|---|---|---|
| AdaBoost | 88.3 (87.1–89.6) | 82.4 (89.5–91.7) | 90.6 (89.5–91.7) | 89.6 (86.9–90.3) | 84.0 (82.6–85.4) |
| XGBoost | 87.5 (86.2–88.8) | 84.6 (83.2–86.0) | 93.6 (92.7–94.6) | 90.7 (88.5–92.9) | 86.0 (84.7–87.3) |
| Decision tree | 87.0 (85.7–88.3) | 73.5 (72.2–75.6) | 78.7 (77.1–80.3) | 82.5 (80.8–86.2) | 85.0 (83.6–86.4) |
| Random forest | 86.6 (85.3–87.9) | 84.8 (83.4–86.2) | 96.8 (96.1–97.5) | 91.9 (89.0–95.7) | 86.0 (84.7–87.3) |
| Logistic regression | 87.4 (86.1–88.7) | 83.6 (82.2–85.1) | 93.5 (92.6–94.5) | 86.7 (83.4–88.4) | 85.0 (83.6–86.4) |

Values are presented as % (95% confidence interval).

F1-score, harmonic mean of precision and recall; AUC-ROC, area under the receiver operating characteristic curve.

Table 3. Top 10 feature importance values from the random forest (N=2,579)

| Feature | Absolute importance (95% CI) | Real value |
|---|---|---|
| Feeding plan | .12 (0.09–0.14) | 0.12 |
| Breastfeeding at 1 mo | .11 (0.09–0.13) | 0.11 |
| Marriage period | .06 (0.05–0.07) | –0.06 |
| Maternal prenatal weight | .06 (0.05–0.07) | –0.06 |
| Self-respect | .05 (0.05–0.06) | 0.05 |
| Alcohol consumption | .05 (0.04–0.07) | –0.05 |
| Grit | .05 (0.04–0.05) | 0.05 |
| Value placed on child | .05 (0.05–0.06) | 0.05 |
| Maternal age | .05 (0.04–0.05) | –0.05 |
| Depression | .04 (0.04–0.05) | –0.05 |

CI, confidence interval.
