Hyun Kyoung Kim

doi:10.4040/jkan.25086

Research Paper
Development of a predictive model for exclusive breastfeeding at 3 months using machine learning : a secondary analysis of a cross-sectional survey: Hyun Kyoung Kim; Journal of Korean Academy of Nursing 2025;55(4):519-527.
DOI: https://doi.org/10.4040/jkan.25086
Published online: October 28, 2025

Department of Nursing, Kongju National University, Gongju, Korea

Corresponding author: Hyun Kyoung Kim Department of Nursing, Kongju National University, 56 Gongjudaehak-ro, Gongju 32588, Korea E-mail: hkk@kongju.ac.kr

• Received: June 23, 2025 • Revised: September 2, 2025 • Accepted: September 2, 2025

This is an Open Access article distributed under the terms of the Creative Commons Attribution NoDerivs License (http://creativecommons.org/licenses/by-nd/4.0). If the original work is properly cited and retained without any modification or reproduction, it can be used and redistributed in any format and medium.


Abstract

Purpose
This study aimed to develop a machine learning model to predict exclusive breastfeeding during the first 3 months after birth and to explore factors affecting breastfeeding outcomes.
Methods
Data from 2,579 participants in the 2022 Korean Early Childhood Education & Care Panel were analyzed between March 1 and June 3, 2025, using Python version 3.12.8 and Google Colab. The dataset was split into training and testing sets at an 80:20 ratio, and five classifiers (random forest, logistic regression, decision tree, AdaBoost, and XGBoost) were trained and evaluated using multiple performance metrics and feature importance analysis.
Results
The random forest classifier showed the strongest overall performance, with a precision of 86.6%, accuracy of 84.8%, recall of 96.8%, F1-score of 91.9%, and an area under the receiver operating characteristic curve of 86.0%. Of the 21 features analyzed, feeding plan, breastfeeding at 1 month, marriage period, maternal prenatal weight, self-respect, alcohol consumption, grit, value placed on children, maternal age, and depression emerged as important predictors of exclusive breastfeeding in the first 3 months.
Discussion
A robust model was developed to predict exclusive breastfeeding. It identified feeding plan and breastfeeding at 1 month as the most influential predictors. The model could be implemented in clinical and community settings to guide tailored breastfeeding support strategies. Counseling programs to promote exclusive breastfeeding could also integrate maternal self-respect, grit, and the value placed on children.
Key words: Birth; Breast feeding; Machine learning; Pregnancy; Women

Introduction

Exclusive breastfeeding (EBF) is defined as the infant receiving only breast milk, with no other liquids or solids except oral rehydration solutions or drops or syrups containing vitamins, minerals, or medicines. Breast milk thus serves as the infant’s sole source of nutrition during the early postpartum period [1,2]. The Healthy People 2030 initiative set a goal to increase the 6-month EBF rate from 27.2% (based on 2021 data) to 42.4% [2]. However, South Korea has shown persistently low and declining breastfeeding rates in recent years. The EBF rate at 3 months fell from 30.5% in 2018 to 19.3% in 2024, a marked downward trend [1].

Breastfeeding offers substantial health benefits for both infants and mothers. It provides optimal nutrition for infants, supporting healthy growth and development, and it reduces the risk of several short- and long-term illnesses. Breastfed children have a lower risk of developing asthma, obesity, type 1 diabetes mellitus, and sudden infant death syndrome. Breastfeeding is also associated with reduced maternal risk of type 2 diabetes mellitus, hypertension, breast cancer, and ovarian cancer. Accordingly, the American Academy of Pediatrics recommends EBF for approximately the first 6 months of an infant’s life [3]. Research has also underscored the importance of the first 3 months of EBF. It has been linked to a reduced risk of infantile eczema up to 2 years of age, a 60% reduction in sudden infant death syndrome, and a 62% decrease in postnatal mortality. Breastfeeding for at least 3 months has also been associated with a lower risk of ulcerative colitis and childhood obesity [3].

Key predictors of EBF at 3 months include feeding intention, initial feeding practices, high breastfeeding self-efficacy, positive attitudes toward breastfeeding, low confidence in formula feeding, and fewer concerns about insufficient milk supply [4]. EBF at 3 months has also shown positive associations with breastfeeding support following hospital discharge, perceptions that formula has limited nutritional value, experiences of mastitis, return to work, and vaginal delivery [5]. Additional influential factors include prenatal intention to breastfeed, early skin-to-skin contact, EBF at hospital discharge, maternal self-efficacy, postpartum professional support, delayed introduction of formula, and supportive partner involvement [6]. Moreover, childbirth method, prenatal decisions regarding breastfeeding, breastfeeding at 1 month, and participation in prenatal parenting education programs have been identified as significant predictors [7]. This study examined features differentiating EBF status by incorporating variables identified in previous research. It did so within the framework of Bronfenbrenner’s ecological systems theory, which conceptualizes the individual’s environment as an interconnected ecosystem. The analysis therefore covered influences from the micro to the macro level, spanning individual, social, and institutional factors [8].

The persistently low EBF rate in Korea has not been fully explained by existing research. This study does not seek to explain the causes of low EBF rates. Instead, it focuses on distinguishing breastfeeding status through data-driven classification. In this context, machine learning techniques offer a promising approach because they can identify the features that contribute to accurately differentiating EBF from non-EBF cases. Only a few studies have applied machine learning methods, and their settings differ markedly from Korea. For instance, studies in China [10] and Turkey [9] reported EBF rates at 6 months of 83% and 41%, respectively, compared with only 21% in Korea, underscoring the need for a context-specific approach. Machine learning provides a robust computational tool to manage high-dimensional, nonlinear data and reveal complex patterns that traditional methods may miss [11]. Accordingly, this study aimed to develop a machine learning model to classify EBF status at 3 months postpartum and to identify the key features that contribute to the model’s classification performance.

Methods

1. Study design

This study was a secondary analysis that used machine learning techniques to examine factors influencing EBF during the first 3 months after birth. It adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.

2. Datasets

Publicly available data from the Korean Early Childhood Education & Care Panel (K-ECEC-P) 2022, collected by the Korea Institute of Child Care and Education (KICCE) [12], were used. The target population comprised mothers of children born in 2022. When their infants were 3 months old, mothers completed an online survey. Cross-sectional data on 3,380 mothers were obtained from the 2022 K-ECEC-P dataset. Of these, 2,579 mothers reported their breastfeeding status at 3 months and were included in the analysis. Data were accessed for analysis on March 1, 2025. KICCE provided the survey codebook, instrument profile, user guide, and the dataset for public use. Survey items covered maternal demographic, obstetric, psychological, social, emotional, and behavioral data. Twenty-one features were selected for analysis: maternal age, marriage period, maternal prenatal weight, employment status, number of children, twin pregnancy, type of birth, smoking, alcohol consumption, use of rooming-in, time of first breastfeeding, skin contact with baby, use of nursery, use of babysitter, breastfeeding at 0 months, breastfeeding at 1 month, feeding plan, self-respect, grit, value placed on children, and depression.

3. Sampling

This study conducted a secondary analysis of the 2022 K-ECEC-P dataset, an anonymized public dataset provided by KICCE and accessed without modification from its official website. KICCE identified the target population based on the most recent national census data available at the time of the sampling design, using the 2019 birth and death statistics from Statistics Korea. It employed quota sampling, with a list of medical institutions with delivery records as the sampling frame. The sampling design specified appropriate sampling units and frames to support representative selection. The sample included 143 obstetric clinics and hospitals nationwide, covering both regional and local institutions. Pregnant women were recruited through in-person interviews during visits to these facilities. Baseline Survey I was administered using tablet-assisted personal interviews at initial contact.

Participants were mothers of children born between January and August 2022 who had completed Baseline Survey I of the K-ECEC-P. Exclusion criteria included childbirth prior to 2022 (i.e., preterm births), cases of miscarriage or infant death, and situations where the mother’s health was severely compromised. Additional exclusions applied to participants who refused to participate, demonstrated repeated non-cooperation, or were unable to complete the survey due to missing or invalid contact information. Of the 3,380 eligible participants, 594 were excluded due to non-response to the postpartum survey, 11 were excluded for not responding to the main survey, and 196 were removed because of excessive missing data. The final analytic sample comprised 2,579 participants, representing 76.3% of the initial eligible population. Within the final sample, 495 mothers reported EBF, while 2,084 mothers did not exclusively breastfeed, including 1,611 who used formula feeding only and 473 who practiced mixed feeding (Figure 1). For the purposes of the machine learning analysis, the final dataset was dichotomized into two groups: EBF (n=495) and non‑EBF (n=2,084).
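The exclusion counts reported above can be cross-checked with a few lines of Python, the language used for the analysis. The numbers are copied from this section. The constant and function names are illustrative only.

```python
# Participant-flow counts as reported for the 2022 K-ECEC-P analytic sample.
ELIGIBLE = 3380
EXCLUDED = {
    "no response to postpartum survey": 594,
    "no response to main survey": 11,
    "excessive missing data": 196,
}
EBF, FORMULA_ONLY, MIXED = 495, 1611, 473


def analytic_sample(eligible: int, excluded: dict[str, int]) -> int:
    """Return the number of participants remaining after all exclusions."""
    return eligible - sum(excluded.values())


final_n = analytic_sample(ELIGIBLE, EXCLUDED)
non_ebf = FORMULA_ONLY + MIXED        # both feeding types collapse into non-EBF
retention = 100 * final_n / ELIGIBLE  # percentage of eligible mothers retained

print(final_n, non_ebf, round(retention, 1))  # 2579 2084 76.3
print(f"EBF share: {100 * EBF / final_n:.1f}%")  # EBF share: 19.2%
```

The last line reproduces the 19.2% EBF prevalence shown in Table 1. This is the class imbalance that the classifiers must handle.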

4. Measurements

1) Label: exclusive breastfeeding

EBF was measured using a single question: “Please indicate the feeding method according to your baby’s age in months,” with response options “1=exclusive breastfeeding,” “2=mixed (breast and formula),” and “3=formula only.” For dichotomization, responses of 2 and 3 were recoded to 0 (non-EBF), and a response of 1 was retained as 1 (EBF).

2) Features

(1) Self-respect

Self-respect was assessed using the Korean version of the Self-Respect Scale [13], adapted from the original Rosenberg Self-Esteem Scale [14]. This 10-item scale uses a 4-point Likert response format ranging from “not at all” (1) to “very true” (4). Cronbach’s alpha was .77 in the original and .83 in this study.
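Each scale in this section reports internal consistency as Cronbach’s alpha:

alpha = k/(k − 1) × (1 − Σ s_i² / s_T²)

Here k is the number of items, s_i² is the variance of item i, and s_T² is the variance of the total score. Below is a minimal, dependency-free sketch of that standard formula. The toy responses are invented for illustration and are not K-ECEC-P data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals),
    using sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance(item) for item in zip(*scores)]  # one per column
    total_var = variance([sum(row) for row in scores])      # per-respondent sums
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented 4-point responses: 5 respondents x 4 items.
toy = [
    [4, 4, 3, 4],
    [2, 2, 2, 1],
    [3, 3, 4, 3],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
]
print(round(cronbach_alpha(toy), 3))  # 0.945
```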

(2) Grit

Grit was measured using the Short Grit Scale (GRIT–S), developed by Duckworth and Quinn [15] and validated in Korean [16]. This 8-item scale uses a 5-point Likert format (“not at all”=1 to “very true”=5). Cronbach’s alpha was .77 in the original and .83 in this study.

(3) Value placed on children

The Value Placed on Children Scale assesses the importance parents place on children, with eight items covering two subdomains: emotional value (four items) and instrumental value (four items) [17]. The scale uses a 5-point Likert response from “not at all true” (1) to “very true” (5). Cronbach’s alpha was .88 in the original and .95 in this study.

(4) Depression

Depression was measured using the Korean version of the Edinburgh Postnatal Depression Scale (K-EPDS) [18], based on the 10-item EPDS [19]. Each item is scored on a 4-point Likert scale (0–3). Cronbach’s alpha was .85 in the original and .79 in this study.
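The possible range of each total score follows from the item count and response format described above, and the observed ranges in Table 1 fall within these bounds. The sketch below sums item responses, with an optional set of reverse-keyed items. This section does not state which items are reverse-keyed in each Korean instrument. The `reverse_items` argument is therefore a hypothetical parameter, to be filled from the instrument’s own scoring guide.

```python
def scale_total(responses: list[int], low: int, high: int,
                reverse_items: frozenset[int] = frozenset()) -> int:
    """Sum Likert responses, reverse-scoring the 0-based item indices given."""
    total = 0
    for i, r in enumerate(responses):
        if not low <= r <= high:
            raise ValueError(f"item {i}: response {r} outside {low}-{high}")
        # A reverse-keyed response r maps to (low + high - r).
        total += (low + high - r) if i in reverse_items else r
    return total


# (number of items, lowest response, highest response) as described above.
SCALES = {
    "Self-respect": (10, 1, 4),
    "Grit (GRIT-S)": (8, 1, 5),
    "Value placed on children": (8, 1, 5),
    "Depression (K-EPDS)": (10, 0, 3),
}

for name, (k, low, high) in SCALES.items():
    print(f"{name}: possible total {k * low}-{k * high}")
# possible totals: 10-40, 8-40, 8-40, 0-30
```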

(5) General and obstetric characteristics

Personal characteristics included maternal age (years), marriage period (months), maternal prenatal weight (kg), employment status (employed/unemployed), smoking (yes/no), and alcohol consumption (yes/no), all self-reported. Obstetric characteristics included number of children, twin pregnancy (yes/no), skin contact with baby (yes/no), type of birth (normal spontaneous vaginal delivery / cesarean section, combining elective and emergency cases), time of first breastfeeding (hours after birth), use of rooming-in (yes/no), use of nursery (yes/no), use of babysitter (yes/no), breastfeeding at 0 months, breastfeeding at 1 month, and feeding plan (yes/no), all self-reported.

5. Analysis

Supervised machine learning models were used to predict EBF, employing Python ver. 3.12.8 (https://www.python.org/) and the Google Colab environment ver. 1.2.0 (Colaboratory Chrome Extension; Google LLC). Exploratory data analysis (EDA) and data processing used pandas (imported as pd), NumPy (np), Matplotlib (matplotlib.pyplot as plt), seaborn (sns), Plotly Express (plotly.express as px), and scikit-learn (sklearn). EDA included examining variable data types and assessing the distribution and normality of continuous variables. The variable “marriage period” was standardized using the StandardScaler preprocessing method. The process also involved identifying missing values, detecting outliers coded as “9999” for maternal prenatal weight, and computing frequency counts for categorical variables. The K-ECEC-P provided a pre-cleaned dataset. Remaining missing and outlier values were handled according to variable type: numerical values were imputed with the mean, and categorical values with the mode where applicable [20]. No categorical variables in this dataset contained missing values. Among numerical variables, 18 missing values for marriage period were imputed with the mean. One case with a missing value for the 3-month EBF variable was excluded from the analysis, so the final analytic sample comprised 2,579 cases. The dataset was split into training and test sets in an 80:20 ratio.
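The preprocessing steps can be sketched without external libraries. This is an illustration under stated assumptions, not the study’s code:

- The sentinel 9999 is treated as missing before mean imputation. The text reports the code but not exactly how it was handled.
- Standardization uses the population standard deviation, the convention scikit-learn’s StandardScaler follows.
- The split is a simple seeded random split, because the text does not say whether it was stratified by EBF status. The seed value is arbitrary.

```python
import math
import random
from statistics import fmean, mode, pstdev

SENTINEL = 9999  # code flagged as an outlier for maternal prenatal weight


def impute_mean(values: list, sentinel: int = SENTINEL) -> list:
    """Treat None and the sentinel code as missing, then fill with the observed mean."""
    observed = [v for v in values if v is not None and v != sentinel]
    fill = fmean(observed)
    return [fill if v is None or v == sentinel else v for v in values]


def impute_mode(values: list) -> list:
    """Fill missing categorical values with the most frequent category."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def standardize(values: list[float]) -> list[float]:
    """z-scores using the population SD (ddof=0), as StandardScaler does."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def split_80_20(n_rows: int, seed: int = 42) -> tuple[list[int], list[int]]:
    """Shuffle row indices and return (train_idx, test_idx) in an 80:20 ratio."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # seed 42 is an arbitrary assumption
    n_test = math.ceil(0.2 * n_rows)
    return idx[n_test:], idx[:n_test]


weights = [60.0, 9999, 70.0, None, 80.0]  # invented values
print(impute_mean(weights))               # [60.0, 70.0, 70.0, 70.0, 80.0]
train_idx, test_idx = split_80_20(2579)
print(len(train_idx), len(test_idx))      # 2063 516
```

Here the mean and standard deviation are computed on all rows. To keep test-set information out of training, they would normally be estimated on the training indices only and then applied unchanged to the test rows.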

After preprocessing, the cleaned dataset was used to train and evaluate random forest, logistic regression, decision tree, AdaBoost, and XGBoost classifier models. The performance of the five models was evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC), as well as confusion matrices. Accuracy was defined as the proportion of correct predictions (both positive and negative) out of all predictions. Precision was calculated as the proportion of true positives among all positive predictions. Recall was calculated as the proportion of true positives among all actual positives. The F1-score balances precision and recall, representing their harmonic mean. AUC-ROC assesses the model's ability to discriminate between classes [21,22]. In addition, 95% confidence intervals (CIs) for model performance metrics were estimated using the bootstrapping method. GridSearchCV was employed during model development to perform hyperparameter tuning by systematically searching for the optimal combination of parameter values to improve model performance. After model selection, important factors influencing EBF were assessed using absolute feature importance, derived differently across models: for the decision tree (Supplementary Figure 1) and random forest, importance was calculated based on the reduction in Gini impurity; for logistic regression, it was determined using the absolute values of standardized coefficients; for AdaBoost, it was obtained from the weighted impurity decrease across weak learners; and for XGBoost, it was evaluated using the ‘gain’ metric, representing each variable’s contribution to improving model performance [11,22].
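The metric definitions above can be written out directly from confusion-matrix counts, as in the minimal sketch below. The text does not state which class was treated as positive, so `positive` is a parameter. AUC is computed with the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting one half.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 (harmonic mean of precision and recall)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and predictions: TP=3, FP=1, FN=1, TN=3.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(metrics(y_true, y_pred))  # all four metrics equal 0.75 here
print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```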

6. Ethical statements

This study received approval from the Institutional Review Board (IRB) of KICCE (approval no., KICCEIRB-2022-01), which permitted the use of the dataset without additional IRB review. All study procedures conformed to the Declaration of Helsinki, and informed consent was obtained from all participants.

Results

1. Data characteristics

The study population had a mean maternal age of 33.5±4.16 years, a mean marriage period of 24.33±13.91 months, a mean maternal prenatal weight of 68.05±12.17 kg, and a mean number of children of 1.45±0.65. Employed mothers accounted for 50.7%, twin births for 5.1%, cesarean sections for 61.3%, alcohol consumption for 48.0%, smoking for 2.9%, and use of rooming-in for 70.6%. The timing of first breastfeeding was categorized as follows: ≤1 hour (4.6%), >1–24 hours (24.3%), >24–48 hours (20.8%), >48 hours–7 days (35.9%), and none (14.4%). Skin contact with the baby was reported by 47.3% of mothers, use of a nursery by 86.2%, and use of a babysitter by 52.2%. Breastfeeding only was reported by 24.6% of mothers at 0 months and 23.3% at 1 month, and 24.2% reported having a feeding plan. Mean scores were 30.18±5.01 for self-respect, 22.26±4.25 for grit, 25.84±4.86 for value placed on children, and 7.78±5.65 for depression. At 3 months, 19.2% of mothers practiced EBF and 80.8% did not (Table 1).
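The percentages reported here and in Table 1 are simple proportions of the analytic sample (N=2,579). A few of them can be reproduced with the snippet below, using counts copied from Table 1. The dictionary keys are shortened labels chosen for this example.

```python
N = 2579

# Counts copied from Table 1.
counts = {
    "Employed": 1308,
    "Twin pregnancy": 131,
    "Cesarean section": 1582,
    "Alcohol consumption": 1237,
    "Smoking": 75,
    "Use of rooming-in": 1820,
    "EBF at 3 months": 495,
}
first_breastfeeding = {"<=1 h": 119, ">1-24 h": 627, ">24-48 h": 537,
                       ">48 h-7 d": 925, "None": 371}


def pct(count: int, n: int = N) -> float:
    """Percentage of the analytic sample, rounded to one decimal as in Table 1."""
    return round(100 * count / n, 1)


for label, count in counts.items():
    print(f"{label}: {pct(count)}%")

# The categories of a single variable should account for every participant.
assert sum(first_breastfeeding.values()) == N
print({k: pct(v) for k, v in first_breastfeeding.items()})
```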

2. Model performance comparison

Five machine learning models (decision tree, random forest, logistic regression, AdaBoost, and XGBoost) were compared using confusion matrices and the performance metrics accuracy, precision, recall, F1-score, and AUC-ROC. The highest accuracy was observed with the random forest model (84.8%; 95% CI, 83.4–86.2), and the lowest with the decision tree classifier (73.5%; 95% CI, 72.2–75.6). AdaBoost showed the highest precision (88.3%; 95% CI, 87.1–89.6), and random forest the lowest (86.6%; 95% CI, 85.3–87.9). The random forest achieved the highest recall (96.8%; 95% CI, 96.1–97.5), whereas the decision tree had the lowest (78.7%; 95% CI, 77.1–80.3). The highest F1-score was also seen in the random forest (91.9%; 95% CI, 89.0–95.7), with the lowest in the decision tree (82.5%; 95% CI, 80.8–86.2). The highest AUC-ROC was shared by XGBoost and random forest (86.0%; 95% CI, 84.7–87.3). Thus, the random forest performed best, or tied for best, on every metric except precision (Table 2).
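The 95% CIs reported here were obtained by bootstrapping. The text does not specify the number of resamples or the interval method. The sketch below therefore assumes a percentile bootstrap over test-set cases with 2,000 resamples. The simulated labels are invented and only illustrate the mechanics.

```python
import random


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Proportion of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, level=0.95, seed=0):
    """Return (point estimate, lower, upper) from a percentile bootstrap.

    Each replicate resamples test cases with replacement, keeping every
    true label paired with its own prediction, and recomputes the metric.
    """
    rng = random.Random(seed)
    n = len(y_true)
    replicates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        replicates.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    replicates.sort()
    tail = (1 - level) / 2
    lower = replicates[int(tail * n_boot)]
    upper = replicates[min(n_boot - 1, int((1 - tail) * n_boot))]
    return metric(y_true, y_pred), lower, upper


# Invented example: 100 test cases, 85 predicted correctly.
y_true = [1] * 50 + [0] * 50
y_pred = y_true[:]
for i in range(0, 30, 2):  # flip 15 predictions so they are wrong
    y_pred[i] = 1 - y_pred[i]
est, lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy {est:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Percentile intervals need not be symmetric around the point estimate, which is one reason bootstrapped CIs can look lopsided.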

3. Feature importance

The top 10 features important for predicting EBF were identified using the random forest. The most influential predictors at 3 months were feeding plan (0.12; 95% CI, 0.09–0.14), breastfeeding at 1 month (0.11; 95% CI, 0.09–0.13), marriage period (0.06; 95% CI, 0.05–0.07), maternal prenatal weight (0.06; 95% CI, 0.05–0.07), self-respect (0.05; 95% CI, 0.05–0.06), alcohol consumption (0.05; 95% CI, 0.04–0.07), grit (0.05; 95% CI, 0.04–0.05), value placed on children (0.05; 95% CI, 0.05–0.06), maternal age (0.05; 95% CI, 0.04–0.05), and depression (0.04; 95% CI, 0.04–0.05) (Table 3).

Discussion

This study developed a predictive model to classify mothers practicing EBF at 3 months postpartum and identified key predictors of EBF. Grounded in Bronfenbrenner’s ecological systems theory, the study framework identified key contributors to distinguishing between the EBF and non-EBF groups. The most influential predictors were feeding plan and breastfeeding at 1 month. Microsystem-level factors included marriage period, maternal age, and employment status, while health-related variables such as maternal prenatal weight and alcohol consumption also contributed substantially. Macro-system attributes, including grit, self-respect, the value placed on children, and depression, also contributed to the model’s classification performance.

In this study, the random forest demonstrated the best overall performance among the five machine learning models. Feature importance was computed differently across models. In the random forest, it reflects how much the splits on each feature reduce Gini impurity across the nodes of all trees. The random forest is widely recognized for its predictive accuracy and robustness, particularly with complex, high-dimensional data. It reduces overfitting by aggregating multiple decision trees built with bootstrap sampling and random feature selection, thereby enhancing generalizability across datasets [23].
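To make the random forest importance measure concrete, the sketch below computes mean decrease in impurity (MDI) for hand-specified splits. Each split is credited with its Gini impurity decrease, weighted by the share of training samples reaching the node. Per-tree totals are normalized to sum to 1, and the forest averages the normalized values across trees. This follows the usual MDI definition in simplified form. It does not grow trees, and the features and labels in the example are invented.

```python
from collections import Counter


def gini(labels) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity_decrease(parent, left, right, n_total) -> float:
    """Impurity decrease of one split, weighted by the node's share of samples."""
    n = len(parent)
    child = (len(left) * gini(left) + len(right) * gini(right)) / n
    return n / n_total * (gini(parent) - child)


def mdi_importance(splits, n_total) -> dict:
    """Sum decreases per feature for one tree, then normalize to sum to 1.

    `splits` is a list of (feature, parent_labels, left_labels, right_labels).
    """
    raw = Counter()
    for feature, parent, left, right in splits:
        raw[feature] += weighted_impurity_decrease(parent, left, right, n_total)
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()} if total else dict(raw)


def forest_importance(per_tree: list[dict]) -> dict:
    """Average the normalized per-tree importances across the forest."""
    features = {f for tree in per_tree for f in tree}
    return {f: sum(t.get(f, 0.0) for t in per_tree) / len(per_tree) for f in features}


# One invented tree: a root split on "feeding plan", then a split on "grit".
root = [1, 1, 1, 1, 0, 0, 0, 0]
tree = [
    ("feeding plan", root, [1, 1, 1, 0], [1, 0, 0, 0]),
    ("grit", [1, 1, 1, 0], [1, 1, 1], [0]),
]
imp = mdi_importance(tree, n_total=len(root))
print(imp)  # feeding plan about 0.4, grit about 0.6
```

MDI values are non-negative by construction. In scikit-learn, a fitted `RandomForestClassifier` exposes the analogous quantity as `feature_importances_`.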

Existing studies have primarily addressed breastfeeding outcomes at 6 months or during the immediate postpartum period. Previous machine learning studies for breastfeeding prediction have reported accuracies ranging from 0.70 to 0.90 [9,10,24-26]. For instance, Açikgöz et al. [9] and Choi et al. [24] developed models to predict EBF at 6 months using the random forest algorithm, with reported accuracies of 0.72 and 0.76, respectively. Liu et al. [10] also focused on 6-month outcomes using the random forest algorithm, reporting accuracies ranging from 0.77 to 0.90, and further examined predictors including breastfeeding self-efficacy, intention, social support, and postpartum depression. Oliver-Roig et al. [25] and Walle et al. [26] analyzed breastfeeding initiation and early cessation during the in-hospital postpartum stay, reporting accuracies of 0.84 and 0.83, using XGBoost and random forest, respectively. Therefore, this study addresses a gap in the literature by developing a machine learning model specifically aimed at predicting EBF at 3 months postpartum. It further contributes by developing a predictive model tailored to the context of South Korea, where the 3-month EBF rate remains relatively low [1].

The top predictors spanned both micro-system factors, such as personal and demographic characteristics, and macro-system factors, such as values [8]. A previous study of the in-hospital postpartum period found that older maternal age and normal body mass index were associated with higher EBF [25]. In contrast, this study identified longer marriage duration as associated with lower EBF rates; in South Korea, longer marriage duration often accompanies delayed marriage and advanced maternal age. Although no prior studies have directly examined marriage duration, an analysis of Korean data showed that women aged 40–49 years had significantly lower odds of breastfeeding than those aged 19–29 years (odds ratio, 0.47) [27]. Higher maternal prenatal weight was likewise associated with lower EBF in this study. Similarly, research in the United Kingdom has linked maternal obesity to poorer breastfeeding outcomes, potentially due to physiological and psychological factors such as delayed lactogenesis and low body confidence [28]. Feeding plan and breastfeeding at 1 month were the strongest predictors, suggesting that early plans and experiences critically influence sustained EBF. This underscores the importance of prenatal counseling and immediate postpartum support.

This study also identified predictors not emphasized in prior machine learning research [9,10,24-26], including alcohol consumption, maternal grit, value placed on children, self-respect, and depression. Compared with non-drinkers, mothers who quit, reduced, or resumed drinking had lower odds of breastfeeding [29]. Depression negatively affects breastfeeding self-efficacy, whereas social support and positive attitudes enhance it [30]. Although self-respect and value placed on children have been less studied, alignment with family-centered norms has been linked to stronger breastfeeding beliefs and empowerment [31]. Grit, marked by persistence despite adversity, has emerged as a distinguishing trait among breastfeeding mothers [32]. From the perspective of emotional availability theory, maternal psychological well-being, including self-respect and depression, may influence breastfeeding through emotional attunement and sensitivity toward the infant [33]. Macro-system characteristics such as self-respect, grit, and value placed on children were also important predictors, highlighting the need for emotionally attuned, resilience-focused breastfeeding interventions. Previous machine learning studies have analyzed various predictors of breastfeeding. These include maternal health problems and drinking water access [9], self-efficacy [10], maternal diabetes mellitus [24], neonatal weight, skin contact with baby, and prior maternal breastfeeding experience [25], as well as maternal age, cesarean section, and access to healthcare facilities [26]. Sociodemographic variables were commonly included across these studies. The present study additionally incorporated behavioral factors (feeding plan, breastfeeding at 1 month) and psychological factors (self-respect, grit, value placed on children), which showed strong predictive power for EBF at 3 months postpartum. These findings suggest that the model may help identify mothers at risk of early breastfeeding cessation at an early stage. The inclusion of such multidimensional factors illustrates how machine learning can capture complex interactions beyond traditional biomedical predictors, supporting the variable selection used in this study.

This study has several limitations. First, although the K-ECEC-P dataset is large and nationally representative, its cross-sectional design limits causal inference between predictors and EBF outcomes. Longitudinal data would allow more robust predictive modeling and temporal interpretation. Second, EBF was measured using a single self-reported item at 3 months postpartum, which may introduce recall and social desirability bias and potentially affect classification accuracy. Third, this study incorporated a range of psychosocial, demographic, and obstetric variables. Even so, several key predictors identified in previous research, such as breastfeeding self-efficacy, workplace breastfeeding support, and history of lactation consultation, were not available in the secondary dataset, potentially limiting the model’s scope and comprehensiveness. Fourth, imputing missing values with means or modes, although necessary, may have introduced bias. Finally, these findings are based on Korean mothers and healthcare settings, so caution should be exercised when generalizing to other populations. External validation using diverse and longitudinal cohorts is needed to confirm the robustness and applicability of the model.

Conclusion

This study developed and evaluated machine learning models to predict EBF at 3 months postpartum using data from the K-ECEC-P. Among the five models tested, the random forest demonstrated the best overall performance. It had the highest accuracy, recall, and F1-score and the highest AUC-ROC (shared with XGBoost), making it a suitable tool for identifying key predictors of EBF. The analysis showed that early breastfeeding behaviors, particularly feeding plan and breastfeeding at 1 month, were the strongest predictors of EBF. Additionally, maternal psychological factors such as self-respect, grit, and value placed on children were important predictors of sustained breastfeeding. These findings underscore the importance of early intervention during the prenatal and early postpartum periods to support and encourage exclusive breastfeeding. Healthcare providers should prioritize enhancing maternal psychological readiness and reinforcing positive breastfeeding intentions and behaviors immediately after birth. By applying machine learning to maternal and infant health data, this study provides a data-driven framework for targeted interventions aimed at improving breastfeeding outcomes. Future research should employ longitudinal models and diverse populations to enhance generalizability and support the development of personalized breastfeeding support programs.

Article Information

Conflicts of Interest

Hyun Kyoung Kim is a member of the editorial board of the Journal of Korean Academy of Nursing. However, she was not involved in the editorial handling, peer review, or decision-making process for this manuscript.

Acknowledgements

None.

Funding

This work was supported by the research grant of Kongju National University in 2025 and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00239284).

Data Sharing Statement

Please contact the corresponding author for data availability.

Supplementary Data

Supplementary data to this article can be found online at https://doi.org/10.4040/jkan.25086.

Supplementary Figure 1. Visualization of the model using a decision tree classifier. EBF, exclusive breastfeeding.


Author Contributions

HKK contributed to the conception and design of the study, data acquisition, drafting of the first and final manuscripts, and funding acquisition.

Fig. 1.

Flow of study participants.

Table 1.

Characteristics of datasets (N=2,579)

Characteristic	Category	Value
Maternal age (yr)		33.5±4.16 (17–49)
Marriage period (mo)		24.33±13.91 (6–62)
Maternal prenatal weight (kg)		68.05±12.17 (45–125)
Employment status	Employed	1,308 (50.7)
Employment status	Unemployed	1,271 (49.3)
No. of children		1.45±0.65 (1–4)
Twin pregnancy	Yes	131 (5.1)
Twin pregnancy	No	2,448 (94.9)
Type of birth	Normal delivery	997 (38.7)
Type of birth	Cesarean section	1,582 (61.3)
Alcohol consumption	Yes	1,237 (48.0)
Alcohol consumption	No	1,342 (52.0)
Smoking	Yes	75 (2.9)
Smoking	No	2,504 (97.1)
Use of rooming-in	Yes	1,820 (70.6)
Use of rooming-in	No	759 (29.4)
Time of first breastfeeding	≤1 hr	119 (4.6)
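Time of first breastfeeding	>1–24 hr	627 (24.3)
Time of first breastfeeding	>24–48 hr	537 (20.8)
Time of first breastfeeding	>48 hr–7 day	925 (35.9)
Time of first breastfeeding	None	371 (14.4)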
Skin contact with baby	Yes	1,221 (47.3)
Skin contact with baby	No	1,358 (52.7)
Use of nursery	Yes	2,222 (86.2)
Use of nursery	No	357 (13.8)
Use of babysitter	Yes	1,346 (52.2)
Use of babysitter	No	1,233 (47.8)
Breastfeeding at 0 mo	Breastfeeding only	635 (24.6)
Breastfeeding at 0 mo	Mixed	1,602 (62.1)
Breastfeeding at 0 mo	Formula only	342 (13.3)
Breastfeeding at 1 mo	Breastfeeding only	600 (23.3)
Breastfeeding at 1 mo	Mixed	1,171 (45.4)
Breastfeeding at 1 mo	Formula only	808 (31.3)
Feeding plan	Yes	625 (24.2)
Feeding plan	No	1,954 (75.8)
Self-respect		30.18±5.01 (12–40)
Grit		22.26±4.25 (11–40)
Value placed on children		25.84±4.86 (9–40)
Depression		7.78±5.65 (0–29)
Breastfeeding at 3 mo	EBF	495 (19.2)
Breastfeeding at 3 mo	Non-EBF	2,084 (80.8)

Values are presented as mean±standard deviation (minimum–maximum) or number (%).

EBF, exclusive breastfeeding.

Table 2.

Comparison of the performance of machine learning models (N=2,579)

Model	Precision	Accuracy	Recall	F1-score	AUC-ROC
AdaBoost	88.3 (87.1–89.6)	82.4 (89.5–91.7)	90.6 (89.5–91.7)	89.6 (86.9–90.3)	84.0 (82.6–85.4)
XGBoost	87.5 (86.2–88.8)	84.6 (83.2–86.0)	93.6 (92.7–94.6)	90.7 (88.5–92.9)	86.0 (84.7–87.3)
Decision tree	87.0 (85.7–88.3)	73.5 (72.2–75.6)	78.7 (77.1–80.3)	82.5 (80.8–86.2)	85.0 (83.6–86.4)
Random forest	86.6 (85.3–87.9)	84.8 (83.4–86.2)	96.8 (96.1–97.5)	91.9 (89.0–95.7)	86.0 (84.7–87.3)
Logistic regression	87.4 (86.1–88.7)	83.6 (82.2–85.1)	93.5 (92.6–94.5)	86.7 (83.4–88.4)	85.0 (83.6–86.4)

Values are presented as % (95% confidence interval).

F1-score, harmonic mean of precision and recall; AUC-ROC, area under the receiver operating characteristic curve.

Table 3.

Top 10 feature importance values from the random forest (N=2,579)

Feature	Absolute importance (95% CI)	Real value
Feeding plan	0.12 (0.09–0.14)	0.12
Breastfeeding at 1 mo	0.11 (0.09–0.13)	0.11
Marriage period	0.06 (0.05–0.07)	–0.06
Maternal prenatal weight	0.06 (0.05–0.07)	–0.06
Self-respect	0.05 (0.05–0.06)	0.05
Alcohol consumption	0.05 (0.04–0.07)	–0.05
Grit	0.05 (0.04–0.05)	0.05
Value placed on children	0.05 (0.05–0.06)	0.05
Maternal age	0.05 (0.04–0.05)	–0.05
Depression	0.04 (0.04–0.05)	–0.05

CI, confidence interval.

References

1. Korea Institute for Health and Social Affairs. The 2024 National Family and Fertility Survey [Internet]. Korea Institute for Health and Social Affairs; 2024 [cited 2025 Mar 9]. Available from: https://www.kihasa.re.kr/publish/report/research/view?seq=68528
2. US Department of Health and Human Services. Healthy People 2030: increase the proportion of infants who are breastfed exclusively through age 6 months-MICH‑15 [Internet]. US Department of Health and Human Services; 2021 [cited 2025 Mar 9]. Available from: https://odphp.health.gov/healthypeople/objectives-and-data/browse-objectives/infants/increase-proportion-infants-who-are-breastfed-exclusively-through-age-6-months-mich-15
3. Meek JY, Noble L; Section on Breastfeeding. Policy statement: breastfeeding and the use of human milk. Pediatrics. 2022;150(1):e2022057988. https://doi.org/10.1542/peds.2022-057988
4. Davie P, Chilcot J, Chang YS, Norton S, Hughes LD, Bick D. Effectiveness of social-psychological interventions at promoting breastfeeding initiation, duration and exclusivity: a systematic review and meta-analysis. Health Psychol Rev. 2020;14(4):449-485. https://doi.org/10.1080/17437199.2019.1630293
5. Gianni ML, Bettinelli ME, Manfra P, Sorrentino G, Bezze E, Plevani L, et al. Breastfeeding difficulties and risk for early breastfeeding cessation. Nutrients. 2019;11(10):2266. https://doi.org/10.3390/nu11102266
6. Zhang J, Li Y, Zhu L, Shang Y, Yan Q. The effectiveness of online breastfeeding education and support program on mothers of preterm infants: a quasi-experimental study. Midwifery. 2024;130:103924. https://doi.org/10.1016/j.midw.2024.103924
7. Wu HL, Lu DF, Tsay PK. Rooming-in and breastfeeding duration in first-time mothers in a modern postpartum care center. Int J Environ Res Public Health. 2022;19(18):11790. https://doi.org/10.3390/ijerph191811790
8. Bronfenbrenner U. The ecological model of human development in international encyclopedia of education. 2nd ed. Elsevier; 1994.
9. Açikgöz A, Çakirli M, Şahin BM, Çelik Ö. Predicting mothers’ exclusive breastfeeding for the first 6 months: interface creation study using machine learning technique. J Eval Clin Pract. 2024;30(6):1000-1007. https://doi.org/10.1111/jep.14009
10. Liu Y, Xiang J, Yan P, Liu Y, Chen P, Song Y, et al. Trajectory of breastfeeding among Chinese women and risk prediction models based on machine learning: a cohort study. BMC Pregnancy Childbirth. 2024;24(1):858. https://doi.org/10.1186/s12884-024-07010-z
11. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. Springer; 2009.
12. Korea Institute of Child Care and Education. Korean Early Childhood Education & Care Panel (K-ECEC-P) 2022 survey [Internet]. Korea Institute of Child Care and Education; 2024 [cited 2025 Mar 1]. Available from: https://panel.kicce.re.kr/kececp/module/rawDataManage/index.do?menu_idx=52
13. Korea Institute of Child Care and Education. Scale profile of self-respect for parents [Internet]. Korea Institute of Child Care and Education; 2024 [cited 2025 Feb 21]. Available from: https://panel.kicce.re.kr/pskc/board/view.do?menu_idx=42&board_idx=44530&manage_idx=161
14. Rosenberg M. Society and the adolescent self-image [Internet]. Wesleyan University Press; 1989 [cited 2023 Jun 5]. Available from: https://socy.umd.edu/about-us/using-rosenberg-self-esteem-scale
15. Duckworth AL, Quinn PD. Development and validation of the short grit scale (grit-s). J Pers Assess. 2009;91(2):166-174. https://doi.org/10.1080/00223890802634290
16. Kim HM, Hwang MH. Validation of the Korean grit scale for children. J Educ. 2015;35(3):63-74. https://doi.org/10.25020/je.2015.35.3.63
17. Lee SS, Jung YS, Kim HK, Choi EY, Park SK, Jo NH, et al. 2005 National Survey on Dynamics of Marriage and Fertility [Internet]. Korea Institute for Health and Social Affairs; 2005 [cited 2025 Feb 23]. Available from: https://repository.kihasa.re.kr/en/bitstream/201002/608/1/%ec%97%b0%ea%b5%ac%eb%b3%b4%ea%b3%a0%ec%84%9c%202005-30-1.pdf
18. Han K, Kim M, Park JM. The Edinburgh Postnatal Depression Scale, Korean version: reliability and validity. J Korean Soc Biol Ther Psychiatry. 2004;10(2):201-207.
19. Cox JL, Holden JM, Sagovsky R. Detection of postnatal depression: development of the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1987;150:782-786. https://doi.org/10.1192/bjp.150.6.782
20. Austin PC, White IR, Lee DS, van Buuren S. Missing data in clinical research: a tutorial on multiple imputation. Can J Cardiol. 2021;37(9):1322-1331. https://doi.org/10.1016/j.cjca.2020.11.010
21. Krishnan R, Rajpurkar P, Topol EJ. Self-supervised learning in medicine and healthcare. Nat Biomed Eng. 2022;6(12):1346-1352. https://doi.org/10.1038/s41551-022-00914-1
22. Sagu A, Gill NS. Machine learning decision tree classifier and logistic regression model. Int J Adv Trends Comput Sci Eng. 2020;9(1.4):163-166. https://doi.org/10.30534/ijatcse/2020/2491.42020
23. Salman HA, Kalakech A, Steit A. Random forest algorithm overview. Babylonian J Mach Learn. 2024;2024:69-79. https://doi.org/10.58496/BJML/2024/007
24. Choi ES, Lee JS, Lee H, Lee KS, Ahn KH. Association between breastfeeding duration and diabetes mellitus in menopausal women: a machine-learning analysis using population-based retrospective study. Int Breastfeed J. 2024;19(1):33. https://doi.org/10.1186/s13006-024-00642-z
25. Oliver-Roig A, Rico-Juan JR, Richart-Martínez M, Cabrero-García J. Predicting exclusive breastfeeding in maternity wards using machine learning techniques. Comput Methods Programs Biomed. 2022;221:106837. https://doi.org/10.1016/j.cmpb.2022.106837
26. Walle AD, Abebe Gebreegziabher Z, Ngusie HS, Kassie SY, Lambebo A, Zekarias F, et al. Prediction of delayed breastfeeding initiation among mothers having children less than 2 months of age in East Africa: application of machine learning algorithms. Front Public Health. 2024;12:1413090. https://doi.org/10.3389/fpubh.2024.1413090
27. Huh Y, Kim YN, Kim YS. Trends and determinants in breastfeeding among Korean women: a nationwide population-based study. Int J Environ Res Public Health. 2021;18(24):13279. https://doi.org/10.3390/ijerph182413279
28. Dalrymple KV, Briley AL, Tydeman FA, Seed PT, Singh CM, Flynn AC, et al. Breastfeeding behaviours in women with obesity; associations with weight retention and the serum metabolome: a secondary analysis of UPBEAT. Int J Obes (Lond). 2024;48(10):1472-1480. https://doi.org/10.1038/s41366-024-01576-6
29. Washio Y, Raines AL, Lv M, Pei S, Taylor SN, Zhang Z. The association of maternal smoking and drinking changes during pregnancy and postpartum breastfeeding pattern and duration. Breastfeed Med. 2023;18(6):449-461. https://doi.org/10.1089/bfm.2022.0130
30. Mercan Y, Tari Selcuk K. Association between postpartum depression level, social support level and breastfeeding attitude and breastfeeding self-efficacy in early postpartum women. PLoS One. 2021;16(4):e0249538. https://doi.org/10.1371/journal.pone.0249538
31. Dehghani M, Kazemi A, Heidari Z, Mohammadi F. The relationship between women’s breastfeeding empowerment and conformity to feminine norms. BMC Pregnancy Childbirth. 2023;23(1):287. https://doi.org/10.1186/s12884-023-05628-z
32. Woods Barr A. “It needs to become a norm again and not make it feel like it’s something so foreign”: (re)normalizing and reclaiming breastfeeding in African American families. J Perinat Neonatal Nurs. 2025;39(2):118-128. https://doi.org/10.1097/JPN.0000000000000901
33. Kim CY, Smith NP, Teti DM. Associations between breastfeeding, maternal emotional availability, and infant-mother attachment: the role of coparenting. J Hum Lact. 2024;40(3):455-463. https://doi.org/10.1177/08903344241247207

Fig. 1. Flow of study participants.
