|
MALURLS: A Lightweight Malicious Website Classification Based on URL FeaturesDOI: 10.4304/jetwi.4.2.128-133 Keywords: malicious websites , machine learning , genetic algorithm , classification Abstract: Surfing the World Wide Web (WWW) is becoming a dangerous everyday task with the Web becoming rich in all sorts of attacks. Websites are a major source of many scams, phishing attacks, identity theft, SPAM commerce and malwares. However, browsers, blacklists and popup blockers are not enough to protect users. That requires fast and accurate systems with the ability to detect new malicious content. We propose a lightweight system to detect malicious websites online based on URL lexical and host features and call it MALURLs. The system relies on Na ve Bayes classifier as a probabilistic model to detect if the target website is a malicious or benign. It introduces new features and employs self learning using Genetic Algorithm to improve the classification speed and precision. A small dataset is collected and expanded through GA mutations to learn the system over short time and with low memory usage. A completely independent testing dataset is automatically gathered and verified using different trusted web sources. They algorithm achieves an average precision of 87%.
|