mirage

Application of variable selection and dimension reduction on predictors of MSE’s development

DSpace Repository

Show simple item record

dc.contributor.author Tilaye Wubetie, Habtamu
dc.date.accessioned 2019-06-25T05:27:49Z
dc.date.available 2019-06-25T05:27:49Z
dc.date.issued 2019-07-10
dc.identifier.uri http://hdl.handle.net/123456789/2239
dc.description.abstract Nature create variables using its character component, and variables are sharing characters from a vary small to relatively large scale. This results, variables to have from a vary different to a more similar character, and leads to have a relation ship. Literature suggested different relation measures based on the nature of variable and type of relation ship exist. Today, due to having high variety of frequently produced large data size, currently suggested variable filtering and selection methods have gaps to full fill the need. This research desires to fill this gap by comparing literature suggested methods to finding out a better variable selection and dimension reduction methods. The result from regression analysis using all literature suggested factors shows that none of the predictors for development status of enterprise are significant, and only 10 predictors for number of employer in an enterprise are significant out of 81 factors. Since, variable selection and dimension reduction methods are applied to find out predictors of a response by removing variable redundancy, and complexity of incorporating large number variable. Based on statistical power, for the results from variable selection methods, specially association and correlation methods showed that, CANOVA more efficiently detects non-linear or non-monotonic correlation between a continuous–continuous and a continuous-categorical variables. Spearman’s correlation coefficient more efficiently detects a monotonic correlation between a continuous with a continuous, and a continuous with a categorical variable. Pearson correlation coefficient more efficiently detects the linear correlation between continuous variables. MIC efficiently detects non-linear or non-monotonic relation between continuous variables. Chi-square test of independence efficiently detects relation between a continuous with a continuous, and categorical with categorical variables, but the non linear or non monotonic relation between a continuous with a categorical are not well detected. On the other hand, the result from lasso and stepwise methods reveals that, the relation between the predictor and response due to interaction effect not detected by correlation and association methods are detected by stepwise variable selection method, and the multicollinearity is detected and removed by lasso method. Regressing the response variable “number of employer in an enterprise” based on variables selected by lasso and stepwise method does bring greater model fitness (based on adjusted R-squared value) than variables selected by association and correlation methods. Similarly, regressing the response variable “development status of an enterprise” based on variables selected by association and correlation methods does bring en_US
dc.language.iso en en_US
dc.title Application of variable selection and dimension reduction on predictors of MSE’s development en_US
dc.type Article en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search in the Repository


Advanced Search

Browse

My Account