Transcript
AI Based Antivirus: Detecting Android Malware Variants With a Deep Learning System
Thomas Lei Wang
@thomaslwang

About me
• My first (boring) job was a virus analyst in 2004.
• I had a dream…

Virus Analysis VS Image Recognition
[Image provided by the MNIST handwritten database]
An experienced virus analyst is sometimes doing image recognition!

Sample increase VS signature efficiency decrease
[Chart: Number of Malicious Android Apps]
Dowgin: A Variant-Rich Android Adware Family
[Chart: New Dowgin Samples VS Average Dowgin Samples Hit Per Signature]
Malicious apps, Dowgin samples and Dowgin signatures are counted from our database.

Our evolution
• Signature based rules
• Behavioral based rules
• Opcode based rules
• AI based deep learning system

Training
Feature Extraction
• Structural type
• Statistical type
• Empirical type
• Continuous value
• 0-1 value
Feature Normalization
• Standard score normalization
• Cutting technique
• Quantile normalization
Training in Deep Neural Network
• PaddlePaddle platform
• Residual layer
• AutoEncoder
• Configuration tunings
Models
• Malware model
• PUA model
Prediction
Input APK → features → Model → Output

Feature extraction
Numeralization (N=1235)
Structural features
• Num of uses-permission in AndroidManifest
• Number of picture files in /res
• Size of /res
• Number of classes starting with Lcom/
• Num of classes starting with Ljava/
• Num of fields of type boolean
• Num of methods which have parameters > 20
Statistical features
• Count certificate fields in samples to get 100 strings with discriminative info. E.g.
  [email protected] malicious/benign = 52
Empirical features
• Has executable file in /res
• Has apk file in /assets
• Registers DEVICE_ADMIN_ENABLED broadcast and has sendSMSMessage permission
[Diagram: example feature vector, split into Continuous value (N=571) and 0-1 value (N=664)]
• To make features more discriminative
• Precision increased by 9%

Feature normalization
Continuous value → [-1,1]
• Standard score normalization: Gaussian distribution
• Cutting technique: Noise problem
• Cutting technique: Multimodal distribution
• Quantile normalization: Long-tailed distribution

Training in deep neural network
[Diagram: Network Architecture. Input layer: normalized continuous value (n=571) and 0-1 value (n=664). Hidden layers 1 to 3 (n=256, Tanh and ReLU activations), AutoEncoder, Residual layer (identity). Output layer: Softmax, outputs 0 / 1.]
Configurations:
• Hidden layer activation function: Tanh and ReLU
• Cost function: Multiclass cross entropy
• Learning method: ADADELTA
• Final layer activation function: Softmax
• Passes: 20–30
Trained on PaddlePaddle platform with 15M+ samples

Prediction & Evaluation
Apk features → Models → Prediction
Production deployment
• Extract apk features on the phone
• Send features to the cloud
• Predict in the cloud
• Return prediction to the phone
• Perf: 140ms/apk
• Traffic: 1kB/apk
Detection performance
[Chart: Detection performance as ROC curve. x-axis: False Positive Rate (0 to 0.1); y-axis: True Positive Rate (0.95 to 0.995)]
The ROC curve is tested against AV-TEST July’s samples: 7613 Android malware, 3020 legitimate Android apps, total 10633.
Model lifetime
[Chart: The lifetime of model trained on Jan 2016. x-axis: Jan 2016, Mar 2016, May 2016, Jul 2016; y-axis: Recall (0.88 to 0.97)]
The model is trained on Jan 2016 and tested against AV-TEST Jan, Mar, May and July’s samples. Recall rate dropped by 7.6% in 6 months.

Limitations / Advantages
Limitations
• Can’t provide explanations for its detection results
• Can’t understand code meaning.
• Built on static analysis; lacks dynamic inspection.
• Can’t self-learn; needs continuous training with labeled data.
Advantages
• More difficult to evade
• Fixed-size

Conclusion
• Feature extraction is the key step
• Virus analyst experience can help to find valuable features.
• An AutoEncoder neural network can be used to extract the most valuable features from a large number of features.
• This system is designed to detect Android malware, but these methods can also be used to detect malware on other platforms.
• Our system learns in an image recognition way. It’s effective only in detecting malware variants.

Thank you
• Feel free to contact me
• Twitter: @thomaslwang
• Email: [email protected]
• Cooperation and partnership with us are welcome
• Acknowledgement
• Baidu IDL: Lyv Qin, Xiao Zhou, Jie Zhou, Errui Ding, Yuanqing Lin, Andrew Ng
• Partner: Liuping Hou, Jinke Liu, Zhijun Jia, Yanyan Ji
• PaddlePaddle platform http://paddlepaddle.org
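
The feature normalization slide names three techniques (standard score normalization, the cutting technique and quantile normalization) and a target range of [-1,1], but the transcript does not show how they were implemented. The following is a minimal NumPy sketch under stated assumptions: the function names are hypothetical, "cutting" is read here as clipping to chosen bounds followed by linear rescaling, and quantile normalization is read as a rank transform. It is not the talk's actual code.

```python
import numpy as np

def standard_score(x):
    """Z-score: subtract the mean and divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros_like(x)  # constant feature carries no information
    return (x - x.mean()) / std

def cut(x, low, high):
    """Assumed reading of the 'cutting technique':
    clip to [low, high], then rescale linearly to [-1, 1]."""
    x = np.clip(np.asarray(x, dtype=float), low, high)
    return 2.0 * (x - low) / (high - low) - 1.0

def quantile_normalize(x):
    """Replace each value by its empirical rank, rescaled to [-1, 1].
    Ties are broken by position because the sort is stable."""
    x = np.asarray(x, dtype=float)
    if x.size < 2:
        return np.zeros_like(x)
    order = np.argsort(x, kind="stable")
    ranks = np.argsort(order, kind="stable")  # rank of each element, 0..n-1
    return 2.0 * ranks / (x.size - 1) - 1.0
```

For a long-tailed feature such as a file size, `quantile_normalize([10, 1000, 1e9])` spreads the three values evenly over [-1, 1], whereas a z-score would be dominated by the largest value. Note that `standard_score` alone does not bound its output; it would need a clipping step to stay inside [-1,1].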
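
The network slide lists the ingredients of the model (571 normalized continuous inputs, 664 0-1 inputs, hidden layers of 256 units with Tanh and ReLU, a residual identity connection, a Softmax output and a multiclass cross entropy cost), but the exact wiring cannot be recovered from the transcript. The sketch below shows one plausible forward pass in plain NumPy. The layer wiring, the random placeholder weights and every name are assumptions made for illustration; the AutoEncoder and the ADADELTA training loop are left out, and the real system was trained on PaddlePaddle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes taken from the slides; everything else below is hypothetical.
N_CONT, N_BIN, N_HIDDEN, N_CLASSES = 571, 664, 256, 2

def dense(n_in, n_out):
    """Random placeholder weights; a real model would learn these."""
    return rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean multiclass cross entropy for integer class labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

# Assumed wiring: one Tanh branch per input group, a ReLU layer,
# a residual (identity) connection, then Softmax.
W_cont, b_cont = dense(N_CONT, N_HIDDEN)
W_bin, b_bin = dense(N_BIN, N_HIDDEN)
W_h, b_h = dense(N_HIDDEN, N_HIDDEN)
W_out, b_out = dense(N_HIDDEN, N_CLASSES)

def forward(x_cont, x_bin):
    h1 = np.tanh(x_cont @ W_cont + b_cont) + np.tanh(x_bin @ W_bin + b_bin)
    h2 = np.maximum(0.0, h1 @ W_h + b_h)  # ReLU
    h3 = h2 + h1                          # residual: add the identity path
    return softmax(h3 @ W_out + b_out)

# Four synthetic samples, only to exercise the shapes.
x_cont = rng.uniform(-1, 1, size=(4, N_CONT))
x_bin = rng.integers(0, 2, size=(4, N_BIN)).astype(float)
probs = forward(x_cont, x_bin)
loss = cross_entropy(probs, np.array([0, 1, 0, 1]))
```

Each row of `probs` holds two class probabilities that sum to 1, matching the 0 / 1 output of the slide's Softmax layer. With untrained weights the loss value itself is meaningless; the block only illustrates how the listed components fit together.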