 #29570  by nosecure
 Thu Nov 17, 2016 3:58 pm
Hi everyone,

I'm a newbie in this field and I would like to ask about a particular feature of the packer detector DiE (or, for those who don't like acronyms, the software Detect It Easy): what does the Entropy option do? What can I do with this information, and how can I use it in the reversing field? Let me know, thanks in advance. If you have any books or PDFs to recommend, please feel free to send them to me!

Best regards,

Nosecure
 #29571  by Vrtule
 Thu Nov 17, 2016 4:32 pm
Entropy is a term from information theory that, in simple terms, indicates how random the contents of your packet are. The higher the entropy, the more random the packet data (meaning that the probability that a given byte of the packet has value X approaches 1/256). If you wish to see what high-entropy data looks like, take a look at encrypted data – high entropy may be a good indicator of encrypted contents.
 #29572  by nosecure
 Thu Nov 17, 2016 4:43 pm
Hi Vrtule, for example, what can I understand or see in this graph? I have downloaded "Elements of Information Theory" by Thomas M. Cover & Joy A. Thomas – what do you think about it? Let me know, thanks in advance! (nosecure)
Attachment: entropy_TeslaCrypt.jpg (entropy graph of a TeslaCrypt sample)
 #29577  by Vrtule
 Fri Nov 18, 2016 2:07 pm
Hello,

I know nothing about the book, so I can't tell you anything about it. It would probably be best to start with Wikipedia; it gives you the definition at least.

Let's say we have a random variable X that can take one of the following values: x_1, ..., x_n. Then, the entropy of X (in the Shannon sense) is defined as:
Code: Select all
H(X) = -sum_{i=1}^{n} (p_i * log_2(p_i))
where p_i is the probability that X = x_i.
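
If you prefer code, here is the formula as a minimal Python sketch (Python is just my choice for illustration here, nothing DiE-specific):
Code: Select all
import math

def shannon_entropy(probs):
    # H(X) = -sum(p_i * log2(p_i)); zero probabilities are skipped,
    # since log2(0) is undefined (and p * log2(p) tends to 0 as p -> 0 anyway)
    return sum(-p * math.log2(p) for p in probs if p > 0)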

This is just a definition, so let's see how to use it in practice – how to compute the entropy of a file, for example. In that case, we can define X as a random variable modeling one byte of the file contents – X \in {0, ..., 255}. Then, p_i would be
Code: Select all
p_i = count(i) / file_size
where count(i) is the number of times a byte with value i occurs in the file.
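
Putting the two together, a small sketch reusing shannon_entropy from above (again, just an illustration):
Code: Select all
from collections import Counter

def file_entropy(path):
    # p_i = count(i) / file_size, plugged into the entropy formula
    with open(path, "rb") as f:
        data = f.read()
    counts = Counter(data)  # count(i) for each byte value i present in the file
    return shannon_entropy(c / len(data) for c in counts.values())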

See this for more details:
http://stackoverflow.com/questions/9904 ... -of-a-file

Let's take a 1 MB file that contains all zeroes. The entropy of such a file would be zero, since p_i = 1 for i = 0 and p_i = 0 for i ≠ 0. Since log 0 is not defined for real numbers, we do not count zero probabilities into the entropy formula.
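
You can check this yourself with the sketch above (zeros.bin is just a hypothetical file name):
Code: Select all
# a 1 MB file of all zero bytes
with open("zeros.bin", "wb") as f:
    f.write(bytes(1024 * 1024))
print(file_entropy("zeros.bin"))  # 0.0 — only p_0 = 1 contributes, and log2(1) = 0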

Let's again take a 1 MB file, this time with content following a uniform distribution (meaning all byte values have the same probability). Then p_i = 1/256 for i \in {0, ..., 255}, so the entropy is
Code: Select all
H(X) = -256 * (1/256 * log2(1/256)) = -log2(1/256) = 8
The more random the file content looks, the higher the entropy.
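
Again easy to check (random.bin is a hypothetical name; os.urandom gives approximately uniform bytes):
Code: Select all
import os

# 1 MB of (approximately) uniformly distributed bytes
with open("random.bin", "wb") as f:
    f.write(os.urandom(1024 * 1024))
print(file_entropy("random.bin"))  # ~7.9998, very close to the maximum of 8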

------------------------------

You can also compute the entropy of different regions of the file, because the entropy of the whole file is just one number that might not tell you anything interesting.
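
I don't know how DiE builds its graph internally, but a per-region entropy profile could look roughly like this sketch (the window size of 256 and the division by 8 are my assumptions, not anything from the tool):
Code: Select all
from collections import Counter
import math

def entropy_profile(path, window=256):
    # entropy of each fixed-size window of the file; real tools pick their own window
    with open(path, "rb") as f:
        data = f.read()
    profile = []
    for off in range(0, len(data), window):
        chunk = data[off:off + window]
        counts = Counter(chunk)
        h = sum(-(c / len(chunk)) * math.log2(c / len(chunk)) for c in counts.values())
        profile.append(h / 8)  # normalize to [0, 1], as entropy graphs often do
    return profile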

Well, now about the graph you posted.
I don't know how the graph was created, so I am only guessing. It seems to me that the program took a series of packets as its input and computed the entropy of the byte values present at individual offsets of the packets. Then it normalized all the computed entropies so that they lie between 0 and 1.

So, here is what I would deduce from the graph.
The first part of the packets (from offset 0 to offset 100) has quite a few low-entropy areas, which means the data at those offsets do not vary much. So, it might be some metadata informing the receiver about the packet content (maybe there are IP and TCP/UDP headers visible too). Starting at offset 100, there is a region with quite high entropy, which indicates quite random content (maybe some sort of encryption – but I would expect higher entropy in that case). At about offset 300, there is a small region with really low entropy; this looks to me like footer metadata that is similar across all the captured packets.

Hope this helps

Vrtule