|
- 2018
A TEXT MINING APPLICATION ON THE DETERMINATION OF SPAM-CONTENTED E-MAIL: POLARIZATION OF TERMS BASED ON THE GAMA RELATIONSHIP COEFFICIENT
Keywords: Text Mining, Data Mining, Generalized Linear Model, Polarity, Gamma Relationship Coefficient, Classification, Communication, Spam Content
Abstract: Advances in technology have changed both the level and the form of communication. Closed, two-party channels (telephone, letter, telegraph, etc.) have been replaced by models in which communication originates from a single point and is open to the world (Facebook, Twitter, Instagram, etc.). This makes it impossible for individuals to set the limits of their own communication, and it also creates many personal communication channels that cannot be concealed (e-mail address, WhatsApp number, etc.). This situation carries many risks; for example, a single e-mail can put private data stored on a computer into the hands of unwanted parties. To prevent this, a great deal of antivirus software is being developed, which helps detect risky elements encountered in electronic environments. However, some risky elements appear as ordinary text rather than in the form of a virus. In such cases, the content of the text must be examined to decide whether it is risky. In this study, e-mails with spam and ham content are identified and classified using a text mining algorithm. For this purpose, a composite polarity variable based on the gamma relationship coefficient was created, and generalized linear models were fitted to this variable. The average classification success of the models is approximately 81.2%.
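The abstract does not say how the polarity variable is assembled or which link function the generalized linear models use, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes whitespace tokenization, a Goodman-Kruskal gamma computed per term from a 2x2 table of term presence against the spam/ham label, a message polarity equal to the mean gamma of its known terms, and a logistic (binomial) model fitted by plain gradient descent. All function names and the toy messages are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

def gamma_2x2(a, b, c, d):
    """Goodman-Kruskal gamma for a 2x2 table.
    a: spam with term, b: ham with term, c: spam without term, d: ham without term."""
    concordant, discordant = a * d, b * c
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total

def term_gammas(messages, labels):
    """Gamma of every term against the label (1 = spam, 0 = ham)."""
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    spam_df, ham_df = Counter(), Counter()
    for text, y in zip(messages, labels):
        for term in set(text.lower().split()):  # assumed tokenization
            (spam_df if y == 1 else ham_df)[term] += 1
    return {
        t: gamma_2x2(spam_df[t], ham_df[t], n_spam - spam_df[t], n_ham - ham_df[t])
        for t in set(spam_df) | set(ham_df)
    }

def polarity(text, gammas):
    """Assumed composite polarity: mean gamma of the message's known terms."""
    scores = [gammas[t] for t in set(text.lower().split()) if t in gammas]
    return sum(scores) / len(scores) if scores else 0.0

def fit_logistic(x, y, lr=0.5, epochs=2000):
    """One-predictor logistic regression fitted by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += p - yi
            g1 += (p - yi) * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def predict(text, gammas, coef):
    """Return 1 (spam) when the fitted linear predictor is positive, else 0 (ham)."""
    b0, b1 = coef
    return 1 if b0 + b1 * polarity(text, gammas) > 0 else 0

# Hypothetical toy corpus, for illustration only.
messages = ["win free prize now", "free cash win", "claim your free prize",
            "meeting at noon", "lunch at noon tomorrow", "project meeting tomorrow"]
labels = [1, 1, 1, 0, 0, 0]

gammas = term_gammas(messages, labels)
coef = fit_logistic([polarity(m, gammas) for m in messages], labels)
print(predict("free prize inside", gammas, coef))
print(predict("meeting at noon", gammas, coef))
```

In this sketch a term that occurs only in spam gets a gamma of 1 and a term that occurs only in ham gets -1, so the polarity of a message falls between -1 and 1. A real corpus would need a held-out test set to estimate classification success; the toy data here is far too small for that.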
|