Abstract:
Introduction: Sexually Transmitted Infections
Objective: The objective of this study is to investigate the spatial distribution and machine
learning prediction of sexually transmitted infection (STI) and its associated factors among
sexually active men and women in Ethiopia, using evidence from the 2016 Ethiopian Demographic
and Health Survey (EDHS).
Methods: A community-based cross-sectional study was conducted from January 18, 2016, to
June 27, 2016. The total sample size for spatial analysis after weighting the EDHS data was
20,740. Spatial autocorrelation (global Moran’s I) was used to detect clustering of
sexually transmitted infection. Spatial scan statistics based on the Bernoulli model were
computed with SaTScan™ version 9.6 to identify locally significant clusters. Supervised
machine learning models (C5.0 decision tree, Random Forest, Support Vector Machine, Naïve
Bayes, and logistic regression) were applied to the 2016 EDHS dataset of 20,799 unweighted
records, and their performance was compared. Association rules were mined using an
unsupervised machine learning algorithm.
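To make the global Moran’s I statistic concrete, the following is a minimal sketch in plain Python. It is not the study's own code: the function name `morans_i`, the toy values, and the binary adjacency weights are assumptions chosen only for illustration, and the sketch does not compute the p value.

```python
def morans_i(values, weights):
    """Global Moran's I for n observations and an n x n spatial weight matrix.

    values  : list of numbers, one per spatial unit (e.g. cluster-level STI proportion)
    weights : list of lists, weights[i][j] > 0 when units i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]              # deviations from the mean
    s0 = sum(sum(row) for row in weights)         # sum of all spatial weights
    num = sum(weights[i][j] * dev[i] * dev[j]     # weighted cross-products
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)                 # total squared deviation
    return (n / s0) * (num / den)


# Hypothetical example: four units in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], chain)       # similar neighbours, positive I
alternating = morans_i([1, 0, 1, 0], chain)  # dissimilar neighbours, negative I
```

Values of I above zero indicate that neighbouring units tend to have similar values (clustering), values near zero indicate a random pattern, and values below zero indicate dispersion.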
Results: The spatial distribution of STI in Ethiopia was clustered across the country, with a
global Moran’s index of 0.06 (p value = 0.04). The Random Forest algorithm performed best for
STI prediction, with 69.48% balanced accuracy and 68.50% area under the curve. The Random
Forest model showed that region, wealth, age category, educational level, age at first sex,
working status, marital status, media access, alcohol drinking, khat (chat) chewing, and sex of
the respondent were the top 11 predictors of STI in Ethiopia.
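Balanced accuracy is the mean of sensitivity and specificity, which makes it more informative than plain accuracy when the outcome is rare. The sketch below shows the calculation in plain Python; the function name `balanced_accuracy` and the toy labels are assumptions for illustration and are not taken from the EDHS data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced example: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
score = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case plain accuracy is higher than balanced accuracy because the majority class dominates the count of correct predictions.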
Key words: STI, spatial distribution, machine learning prediction, Ethiopia.