PROPOSED FORECASTING MODEL FOR THE STUDENTS ACADEMIC PERFORMANCE OF BSCS STUDENTS IN NEW ERA UNIVERSITY
Teddy Eddie Q. Dispo Jr.
Libis Dike 1, Brgy. Balite St., Montalban, Rodriguez Rizal
teqdispojr@gmail.com

Re#iel Kenneth P. Tugano
#4 Manansala St., Krus na Ligas, Diliman, Quezon City
rkenneth_tugano@yahoo.com

ABSTRACT

Data mining is accepted as a decision-making tool that can facilitate better resource utilization in terms of students' performance. It is essential for decision-makers to obtain early feedback on academic performance and on the effectiveness of different learning strategies. In this paper, data from Computer Science students were taken and various data mining methods were applied in order to improve students' academic performance and to address the decreasing population of Computer Science students from first year to fourth year. A descriptive method was used to analyze the data and to forecast whether Bachelor of Science in Computer Science students will finish the course within the four-year span and graduate on time. To ensure impartiality of the data, the researchers used all elements of the population as the sample, making it more inclusive and representative, so that the study has sufficient and adequate data for greater statistical efficiency. The aim of this study is to apply different data mining techniques and to identify the model that best fits the forecasting of students' academic performance. The two decision tree methods used in the study produce rules that are easy to interpret, and the ID3 algorithm gives 98.48% accurate results.

Keywords: Data Mining, Classification, Forecasting, Decision tree, Regression, Performance

INTRODUCTION

The ability to predict a student's academic performance is very important in educational environments. Prediction models that include all personal, social, psychological and other environmental variables are necessary for the effective prediction of students' performance.
Predicting student performance with high accuracy helps to identify which students need special attention in their studies. The identified students should receive more assistance from the teacher so that their performance will improve in the future [1]. Data mining extracts interesting, non-trivial, implicit, previously unknown and potentially useful information or patterns from data. It can be applied to a number of different applications, such as data summarization, learning classification rules, finding associations, analyzing changes and detecting anomalies [2]. Data mining is a data analysis methodology used to identify hidden knowledge in large volumes of data in databases, and it has been used successfully in different areas, including the educational environment. Data mining methodology is used to study students' performance and provides many tasks that can be used in predicting and forecasting academic performance. The reasons for the good or bad performance of students should be one of the main interests of teachers. Teachers can plan and customize their teaching program based on feedback from the students [3]. Data mining is one of the powerful analytical approaches that can provide effective assistance in revealing the complex relationships behind students' grades and performance [4].
Figure 1: Framework for Academic Performance

The data from the student or applicant will be stored in a database. The system will get the data from the database and from flat files and combine the data needed to determine which indicators or predictors will be used. The large data set will be filtered through cleansing and transformation so that the predictors supply the input values for a forecasting model that predicts the probability that a student will finish the Bachelor of Science in Computer Science course within four years, and identifies which students will not. The decision variables serve as the independent variables in this study, and the probability of graduating is the dependent variable. Pattern recognition provides a reasonable answer for all possible inputs, and the decision makers are involved in the visualization and validation of the results for the probability of graduating. As a whole, the decision makers can influence decisions and can iterate the process of the proposed study to make the model more efficient and accurate.
The researchers identified the predictors to be considered, which are the subject codes of the mathematics and science subjects, major subjects, and general education subjects. This makes it easy to visualize the subjects in the Computer Science curriculum from first year to fourth year, such as CS-TECH, NSTP1, ENGL-1, CS-231 GWA, MAT-172, CS-442, CS-132 GWA, CS-242, PE-3, NSTP2, ENGL-8, ENGL-2, MAT-171, CS-344, PHILO-1, CS-335 GWA, PE-2, CS-433, CS-434, MAT-341, CS-332, CS-242 GWA, PE-4, FIL-2A, POL-SCI-2, CS-142 GWA, PHY-2 GWA, CS-341 GWA, CS-141 GWA, ENGL-4, VALUES, CS-341 GWA, CS-241 GWA, PHY-1 GWA, CS-233 GWA, FIL-1, CS-331 GWA, LIT-1, CS-333, CS-432, CS-232 GWA and CS-342 GWA. These variables can be considered to have an influence on the performance of students [6].

4.2 CORRELATIONS OF THE PREDICTORS TO THE ACADEMIC PERFORMANCE OF BSCS STUDENTS

Correlation describes the degree of correspondence between two or three variables. This type of bivariate correlation test requires that both variables have a scale level of measurement, so that the values can be ordered and the distance between the values can be determined [7]. The researchers simplified the predictor variables into 3
Figure 2: Model Summary in Multiple Linear Regression

R is the multiple correlation coefficient reported in the model summary of the Multiple Linear Regression model. R square measures how close the data are to the fitted regression line. Based on the historical data of the respondents, the value of R is equal to .780 and R square is equal to .609, which means that the model explains about 60.9% of the variability of the response data around its mean. The model summary reports R = .780, R square = .609 and adjusted R square = .419; in general, the higher the R-squared, the better the model fits the data. An R square of 0% would mean that the model explains none of the variability of the response data around its mean, and 100% would mean that it explains all of it.

The standard error of the estimate is closely related to the standard deviation. Here the standard error of the estimate is equal to .063, which the researchers interpret as an error of about 6%, that is, an accuracy between 94% and 100%. The researchers consider 6% acceptable because the true value of the standard deviation is usually unknown. In such cases it is important to be clear about what has been done and to take proper account of the fact that the standard error is only an estimate. The researchers tested the accuracy of the Multiple Linear Regression model using MAPE (mean absolute percentage error).

The normal probability plot compares the distribution of the residuals to a normal distribution in order to assess whether or not a data set is approximately normally distributed. The data are plotted against a theoretical normal distribution in a
Pearson's correlation between variables is a measure of how well they are related. The most common measure of correlation in statistics is the Pearson correlation (technically called the Pearson Product Moment Correlation, or PPMC). It shows the linear relationship between two sets of data. There is a strong relationship between the variables if the correlation coefficient is close to 1, which means that changes in one variable are strongly correlated with changes in the second variable. The Sig. (2-tailed) value tells whether there is a statistically significant correlation between the variables. If the Sig. (2-tailed) value is less than .01, it can be concluded that there is a significant correlation between the variables. In this case, the correlation coefficient for Major subjects is equal to .650, for Math & Sci. subjects it is equal to .449, and for GenEd subjects it is .559, which means that the Major and GenEd subjects show a moderate association, while the Math & Sci. subjects are weakly correlated. The Sig. (2-tailed) value for Major, Math & Sci. and GenEd subjects is .000, which means that there is a significant correlation between the Major, Math & Sci. and GenEd subjects.

4.3 DATA MINING TECHNIQUES AND ALGORITHMS

4.3.1 REGRESSION

Regression analysis is a statistical technique for studying linear relationships among variables and for predicting a continuous dependent variable from a number of independent variables. The researchers used regression analysis to help understand how the typical value of the dependent variable changes
Figure 3: Normal Probability Plot Using Multiple Linear Regression

The expected cumulative probability plot shows an S-shaped pattern that matches the pattern of a set of paired data. The researchers believe that this indicates a long-tailed rather than a normal distribution, because the curve starts below the normal line, bends to follow it and ends above it. This means that there is more variance than would be expected in a normal distribution, and the researchers agree that the normal distribution can be improved upon as a model for testing.

4.3.2 DECISION TREE

A decision tree creates a tree-based classification model. It classifies cases into groups or predicts values of a dependent variable based on the values of independent variables. The procedure provides a validation tool for exploratory and confirmatory classification analysis [8].
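The way a tree-based classifier picks the variable to split on can be illustrated with the information gain criterion used by ID3, the algorithm named in the abstract. The following is a minimal sketch under stated assumptions: the attribute names (`major`, `gened`, `on_time`) and the four toy records are invented for illustration and are not data from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, attributes, target):
    """ID3 chooses the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Toy records, invented for illustration only (hypothetical attribute names).
students = [
    {"major": "high", "gened": "high", "on_time": "yes"},
    {"major": "high", "gened": "low", "on_time": "yes"},
    {"major": "low", "gened": "high", "on_time": "no"},
    {"major": "low", "gened": "low", "on_time": "no"},
]

print(best_split(students, ["major", "gened"], "on_time"))
```

In this toy data the `major` attribute separates the two classes perfectly, so it has the larger gain and would become the root of the tree. A full ID3 implementation would repeat the same choice recursively on each subset.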
Figure 4: Model Summary Using CHAID Method

The researchers used the CHAID method to categorize each predictor, merging categories that are not significantly different with respect to the dependent variable. Figure 4 indicates that only one of the selected independent variables, OJT-441, made a contribution significant enough to be included in the model.
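CHAID (Chi-squared Automatic Interaction Detection) decides whether a predictor is significantly related to the dependent variable by means of chi-square tests on contingency tables. The following is a minimal sketch of the Pearson chi-square statistic only, assuming a table of counts with predictor categories as rows and outcome classes as columns. The counts are invented for illustration, and the sketch leaves out the category merging and p-value adjustment that a full CHAID procedure performs.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the predictor and the outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = grade category, columns = (on time, not on time).
related = [[10, 20], [30, 40]]
unrelated = [[10, 20], [20, 40]]

print(chi_square_statistic(related))
print(chi_square_statistic(unrelated))
```

A larger statistic means a stronger departure from independence. When the rows are proportional, as in `unrelated`, the statistic is zero and the predictor carries no information about the outcome.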
Figure 5: Model Summary Produced by J48 Decision Tree

J48 shows the error level when the classifier is applied to the training data. The most important figures in the model summary are the numbers of correctly and incorrectly classified instances. Using the J48 classifier, correctly classified instances are equal to 62%, while incorrectly classified instances are equal to 38%. The mean absolute error is equal to 0.0152, which measures how close the forecasts or predictions are to the eventual outcomes. The CHAID method gives higher results than the J48 classifier.
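The summary figures quoted above can be computed from a list of actual outcomes and a list of predictions. The following is a minimal sketch with invented values. The function names are hypothetical, and it does not reproduce how any particular tool derives its model summary.

```python
def classification_summary(actual, predicted):
    """Percentage of correctly and incorrectly classified instances."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = len(actual)
    correct = sum(a == p for a, p in zip(actual, predicted))
    return {
        "correct_pct": 100 * correct / total,
        "incorrect_pct": 100 * (total - correct) / total,
    }


def mean_absolute_error(actual, forecast):
    """Average absolute difference between outcomes and forecasts."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Invented example: class labels for five students.
actual_labels = ["yes", "yes", "no", "no", "yes"]
predicted_labels = ["yes", "no", "no", "no", "yes"]
summary = classification_summary(actual_labels, predicted_labels)

# Invented example: outcomes coded 0/1 against predicted probabilities.
mae = mean_absolute_error([1, 0, 1, 1], [0.9, 0.2, 0.7, 1.0])

print(summary, mae)
```

The two percentages always add up to 100, which is why 62% correctly classified instances implies 38% incorrectly classified instances.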
Table 4 shows the accuracy and efficiency of the models. The ID3 technique has the lowest percentage error, 0.0152 or 1.52%, which indicates that the accuracy level of the model is 98.48% out of 100% [10]. The CHAID method also showed an acceptable level of accuracy. For Multiple Linear Regression, the researchers used the Mean Absolute Percentage Error (MAPE) to calculate the efficiency of the model, which resulted in a percentage error of 2.93%. This means that the accuracy level using regression analysis is 97.07%. The multi-layer feed-forward algorithm showed the highest percentage error.

CONCLUSION

This study could be of great help to Computer Science students and to teachers in improving students' academic performance, trimming down the failure rate, better understanding students' behavior, and improving teaching. This study can help build confidence in data mining techniques so that present education systems may adopt them as a strategic management tool. Grade point average (GPA) is used in higher learning institutions to discover knowledge from education data, and students' performance plays an important role in producing the best quality graduates. Academic achievement and grades are the main factors that can secure a stable job in life, and all students must give their greatest effort. When the variables are simplified into three categories, namely Mathematics & Science, Major, and General Education subjects, there are significant relationships between them. The results of this study indicate that data mining techniques provide effective tools for improving students' academic performance. The study shows how useful data mining can be in higher learning institutions, especially Decision tree and Regression, particularly to predict a number and to estimate the value of the target as a function of the predictors for each case in the build data. Also SPSS gives an entire analytical process from
Table 3 shows the accuracy of the CHAID, ID3, and multi-layer feed-forward algorithms for classification applied to the data. The CHAID technique has the highest accuracy, 72.6%, compared to the other methods. The ID3 algorithm also showed an acceptable level of accuracy, while the multi-layer feed-forward algorithm has the lowest accuracy, 50.8% [9].
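The accuracy levels quoted with Table 4 follow from the percentage errors by subtraction: accuracy = 100% - percentage error. The following is a minimal sketch of MAPE and of that conversion. The function names and the sample series are assumptions made for illustration, and only the 2.93% and 1.52% error figures come from the text.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a)
                     for a, f in zip(actual, forecast)) / len(actual)


def accuracy_from_error(percentage_error):
    """Accuracy level implied by a percentage error."""
    return 100 - percentage_error


# Invented series, only to show the calculation.
example_error = mape([100, 200], [90, 220])

# Percentage errors reported in the text.
regression_accuracy = accuracy_from_error(2.93)
id3_accuracy = accuracy_from_error(1.52)

print(example_error, round(regression_accuracy, 2), round(id3_accuracy, 2))
```

With the reported errors, the conversion gives 97.07% for Multiple Linear Regression and 98.48% for ID3, matching the accuracy levels stated for Table 4.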