Cybersecurity. Machine Learning and Algebraic Approach in Malware Detection

The project was started in 2019 by Oleksandr Letychevskyi and Tetiana Polhul. It is based on the Ph.D. thesis of T. Polhul and on an algebraic approach to malware detection that uses behavior algebra.

Tetiana Polhul. Conceptual Model of an Intelligent System for Detecting Fraud During Mobile Applications Installation // IEEE International Conference on Dependable Systems, Services and Technologies (DESSERT 2019)

Suspicious behavior may go undetected when the formalized data available to the algebraic approach are insufficient. In this case, we use machine learning to build a classification model that accepts similar data related to the defined time slot of the attack. A corresponding neural network can be generated for this purpose.

Let us describe a blockchain attack as a set of behavior algebra expressions. While monitoring the blockchain, we receive actions from the network, convert them to behavior algebra specifications, and match them against the attack model. Once the attack is detected at the level of behavior algebra expressions, we check the satisfiability of the conjunction of the action's precondition in the attack model with the environment obtained from the network.
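
The matching step can be illustrated with a minimal sketch. It assumes the attack model is a small system of behavior-algebra-style equations with action prefixing and choice, and that unrelated actions in the observed trace may be skipped. The action names (probe, fork, double_spend), the behavior names, and the function matches_attack are hypothetical and do not come from the project.

```python
# Hypothetical attack model written as behavior-algebra-style equations:
#   B0 = probe.B1
#   B1 = fork.B2 + probe.B1
#   B2 = double_spend.Delta
# Each behavior maps to its alternatives as (action, next behavior) pairs.
DELTA = "Delta"  # successful termination: the attack pattern is complete

ATTACK_MODEL = {
    "B0": [("probe", "B1")],
    "B1": [("fork", "B2"), ("probe", "B1")],
    "B2": [("double_spend", DELTA)],
}


def matches_attack(trace, model, start="B0"):
    """Return True if the observed trace contains a run of the attack model.

    Assumption: actions that do not advance the model are ignored, so the
    attack may be interleaved with ordinary network activity.
    """
    active = {start}  # behaviors the attack could currently be in
    for action in trace:
        reached = {
            nxt
            for behavior in active
            for (expected, nxt) in model.get(behavior, [])
            if expected == action
        }
        if DELTA in reached:
            return True
        active |= reached  # keep earlier behaviors so other actions are skipped
    return False
```

For example, the trace transfer, probe, probe, transfer, fork, double_spend matches this model, while probe, double_spend does not, because the fork action is missing. A match at this level only says that the sequence of actions fits the attack; the precondition check against the environment still has to follow.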

Let us consider a sequence of actions in the network. Every action is connected to a time slot. The environment of the sequence is checked for consistency with the sequence of predicates in the description of the attack, especially in the action’s precondition.

Let t1, t2, … be the set of time slots, and let E(t1), E(t2), … be the corresponding environments, that is, the datasets obtained from the network. P1(t1), P2(t2), … is the set of predicates that define the attack or fraudulent actions. If this set of predicates defines the fraudulent actions unambiguously, no additional classification by the neural network is needed. Otherwise, we use the classification model to refine the detection of the fraudulent action.

Therefore, the final algorithm for the detection of a fraudulent action is the following:

  1. We obtain a sequence of actions in the corresponding time slots, matched to the behavior expression, together with the set of corresponding data E(t1), E(t2), … . This sequence can be taken from network logs or collected online as the actions occur.
  2. We match the given data to the environmental conditions of the actions in the model description. If E(ti) & Pi(ti) is satisfiable, then an attack is possible, or the malicious actions are caused by fraudulent behavior.
  3. Otherwise, we classify the given dataset E(t1), E(t2), … with the neural network that recognizes fraudulent behavior. If the data are classified as fraudulent actions, the attack is recognized.
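
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation. It assumes that each environment E(ti) is a concrete assignment of values, so that checking the satisfiability of E(ti) & Pi(ti) reduces to evaluating the predicate on that assignment, and that the check must hold in every time slot. The attribute names (tx_count, fee), the predicates, and toy_classifier, which stands in for a trained neural network, are all hypothetical.

```python
from typing import Callable, Dict, Sequence

Environment = Dict[str, float]             # E(ti): values observed in time slot ti
Predicate = Callable[[Environment], bool]  # Pi(ti): precondition of an attack action


def detect(envs: Sequence[Environment],
           predicates: Sequence[Predicate],
           classifier: Callable[[Sequence[Environment]], bool]) -> str:
    """Step 1 is assumed done: envs holds E(t1), E(t2), ... for a matched trace."""
    if len(envs) != len(predicates):
        raise ValueError("one predicate is expected per time slot")
    if not envs:
        return "no attack detected"
    # Step 2: with concrete environments, E(ti) & Pi(ti) is satisfiable
    # exactly when the predicate holds on the observed values.
    if all(p(e) for e, p in zip(envs, predicates)):
        return "attack: matched by the algebraic model"
    # Step 3: fall back to the classification model.
    if classifier(envs):
        return "attack: recognized by the classifier"
    return "no attack detected"


# Hypothetical attack description: one precondition per time slot.
PREDICATES = [
    lambda e: e["tx_count"] > 100,
    lambda e: e["fee"] == 0,
]


def toy_classifier(envs: Sequence[Environment]) -> bool:
    # Stand-in for a trained neural network; a simple threshold for illustration.
    return sum(e["tx_count"] for e in envs) > 150
```

If the environments contained symbolic or partially known values, the evaluation in step 2 would have to be replaced by a call to a constraint solver; the rest of the flow would stay the same.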