Detection of Malicious Android Applications: Classical Machine Learning vs. Deep Neural Network Integrated with Clustering

Hemant Rathore, Sanjay K. Sahay, Shivin Thukral, Mohit Sewak

February 2021

Abstract

Today, the anti-malware community faces challenges due to the ever-increasing sophistication and volume of malware attacks developed by adversaries. Traditional malware detection mechanisms cannot cope with next-generation malware attacks. Therefore, in this paper, we propose effective and efficient Android malware detection models based on machine learning and deep learning integrated with clustering. We performed a comprehensive study of different feature reduction, classification, and clustering algorithms over various performance metrics to construct the Android malware detection models. Our experimental results show that malware detection models built with Random Forest outperformed the deep neural network and other classifiers on the majority of performance metrics. The baseline Random Forest model without any feature reduction achieved the highest AUC of 99.4%. Also, segregating the vector space using clustering integrated with Random Forest further boosted the AUC to 99.6% in one cluster and enabled direct detection of Android malware in another cluster, thus reducing the curse of dimensionality. Additionally, we found that feature reduction improves model efficiency (training and testing time) manyfold without much penalty on the effectiveness of the detection model.
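The idea of combining clustering with a classifier, where one cluster is handled by a trained model and another allows direct detection, can be illustrated with a small sketch. This is not the authors' implementation: the function name `plan_clusters`, the `purity_threshold` parameter, and the toy data are assumptions made for illustration, and the abstract does not state which clustering algorithm was used or how direct detection was decided. The sketch only shows one plausible rule: if every training sample in a cluster shares a label, new samples landing in that cluster can be labelled directly; otherwise a classifier (Random Forest in the paper) would be trained for that cluster.

```python
from collections import Counter, defaultdict


def plan_clusters(cluster_ids, labels, purity_threshold=1.0):
    """Decide, per cluster, between direct labelling and a per-cluster classifier.

    cluster_ids: cluster assignment for each training sample (from any
                 clustering algorithm; the choice is left open here).
    labels:      1 = malware, 0 = benign.
    purity_threshold: hypothetical cut-off; 1.0 means the cluster must be
                 entirely one class to qualify for direct detection.
    """
    # Group the labels by the cluster they were assigned to.
    by_cluster = defaultdict(list)
    for cid, y in zip(cluster_ids, labels):
        by_cluster[cid].append(y)

    plan = {}
    for cid, ys in by_cluster.items():
        majority_label, count = Counter(ys).most_common(1)[0]
        purity = count / len(ys)
        if purity >= purity_threshold:
            # Pure cluster: label new samples directly, no classifier needed.
            plan[cid] = ("direct", majority_label)
        else:
            # Mixed cluster: a classifier would be trained on its samples.
            plan[cid] = ("classifier", None)
    return plan


# Toy data (made up): cluster 0 contains only malware, cluster 1 is mixed.
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
labels = [1, 1, 1, 0, 1, 0, 1]
plan = plan_clusters(cluster_ids, labels)
print(plan)
```

In this toy run, cluster 0 qualifies for direct detection and cluster 1 is routed to a classifier. Because each classifier only sees the samples of its own cluster, it works on a smaller, more homogeneous region of the feature space.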

Type

Conference paper

Publication

Broadnets 2020

Shivin Thukral

Machine Learning Engineer

Working as an MLE on building recommendation systems using ML and NLP techniques