## Cybersecurity. Algebraic Fuzzing

The project started at Glushkov Institute of cybernetics in 2019.

The main idea is to create a quick and efficient fuzzing engine based on algebraic modeling. It implements a symbolic execution engine of binary code of x86 instructions performing memory penetration analysis and detection of data that can corrupt memory or entail the other anomalies. It is anticipated that the traversal of paths of code execution uses symbolic modeling. During the traversal, the cycles are detected and analyzed for estimation of the size and content of external input with respect of memory crash or other malicious actions.

The traversal can be customized from the point of possible intrusion (command line, file reading, socket or keyboard).

It also detects the points of a possible crash and provides modeling of the writing of memory that causes this crash.

The example of fuzzing is below the figure.

We start fuzzing from the input point instructions (command line, file reading, socket or keyboard).

The input instructions contain (for example in Linux) the call of input function with some parameters that are in registers. The output of the function will be located also in registers and then written to the memory. The bytes can be input in a cycle. We can estimate the cycle or if it has no restrictions then we denote it as N, where N – the number of input bytes. We mark this memory as penetrated by the description of the predicate

addr can be computed from the instructions or it also can consider as arbitrary address.

Moving via program tree before the first possible memory crash point occurrence we can also get in different cycles where formula F(i) can change up to the cycle analysis.

Having such formula in the point of possible memory crash or point of gaining control, for example writing to the top of the stack, we should check the satisfiability of the following expression:

F(i)  &&  (i = j) && (j = topStack) || (j =0))

j is the address where MOV instruction writes the bytes (it could be top of the stack or equal Null-address).

If the formula is satisfiable then we can get one of the solutions by backward symbolic modeling from this point to the point of input and create an exploit.

Estimation of cycles.

During the estimation of a cycle we shall define the possible value of bytes in memory that can be penetrated. We use the conversion of assembler instructions to behavior algebra expressions.

In behavior algebra, we have the cycle a1,a2,….an,a1 as the sequence of actions corresponding to the set of instructions. Every action is defined as pre- and postcondition of action. Using predicate transformer definition we can get some term that corresponds to the cycle:

C = Pt(p1,(Pt(p2…..Pt(pn, Pt(p1,Env))…)

That corresponds to the path in the cycle.

First, we shall detect the class of the cycle. Whether the cycle corresponds to the class where we can estimate some variables or create some invariant for this variable. It shall be for example cycle with monotonous environments. The next we shall derive the changing of formula F(i) from the C.

Such formal methods of cycle estimations are described in the works of authors that will be presented in future conferences.