A forum for reverse engineering, OS internals and malware analysis 

All off-topic discussion goes here.
 #22772  by TeleZed
 Thu May 01, 2014 10:21 am
Attention: this post is not intended to be personal or to question anyone's skills in malware research.
I'm a newbie in malware analysis (although I have 9 years of experience in ITSEC), and I found a great community here, with a great resource of information.
We are all on the same side, fighting malware and making the lives of malware writers/operators harder.
Please take my post as constructive criticism.

Now, after this long prologue, the reason I started this post:
I was so pissed off that a lot of malware research and analysis is published with only the MD-5 hash of the malware. And this is wrong. Very wrong.
Although it was practically safe to use MD-5 hashes 10 years ago to identify malware, since 2006 it is not.
Even a script kiddie can create two different malware samples with the same MD-5 hash.
The best would be to ban MD-5 from malware analysis altogether. Using MD-5 to identify malware is almost the same as using CRC-32. Although CRC-32 has the advantage that it takes less storage :geek:

I don't remember any case where criminals took advantage of this problem, but it might be that we as a community did not even notice it...

If you are interested in the details, I wrote a blog post about this issue.
http://jumpespjump.blogspot.com/2014/03 ... 5-now.html

TL;DR : Don't use MD-5 to identify malware samples. Believe me, it is a bad idea. Use SHA-256 or a stronger hash function.
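
For reports that still need to list MD-5 for compatibility with older tooling, all three digests can be computed in a single pass over the file. The following is a minimal sketch using Python's standard hashlib module; the function name sample_digests and the chunk size are assumptions made for illustration, not something from the linked blog post.

```python
import hashlib


def sample_digests(path, chunk_size=1 << 20):
    """Return MD5, SHA-1 and SHA-256 hex digests of a file, reading it once."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as f:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Publishing the SHA-256 value alongside (or instead of) the MD-5 value lets readers identify the exact sample even if a second file with the same MD-5 exists.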
 #22775  by Xylitol
 Thu May 01, 2014 5:41 pm
TeleZed wrote:Even a script kiddie can create two different malware samples with the same MD-5 hash.
TeleZed wrote:I don't remember any case where criminals took advantage of this problem
Conclusion: no one cares.

MD5 is still reliable for sharing samples....
Bad guys have no time to waste on several hours of computation just to produce a lame hash collision on an executable.
It may have been unsafe since 2006, but I have never seen malware with the hash of calc.exe or any other legit Windows executable.
And it isn't worth it: if a researcher gets the sample, he gets the sample, end of story. MD5 jamming is useless.

The only thing I wouldn't recommend about MD5 is using it alone in an authentication system.
 #22776  by Vrtule
 Thu May 01, 2014 6:31 pm
Hello,

So, if I understand you correctly, the following attack on MD5 is now practical:
Input: Fixed message M, H = MD5(M)
Output: Message M', different from M but having certain strong properties (like containing a certain executable or code), MD5(M') = H.

I know that the second pre-image attack on MD5 is possible; however, I thought that M', the pre-image generated by the attacker, is either quite random or has to be quite similar to the original message. In other words, I thought that the attacker still does not have enough control to abuse this attack for the things you are suggesting.

Well, my knowledge on this subject is quite old, I admit. So, I would be very happy if someone corrects me.
 #22783  by R136a1
 Thu May 01, 2014 7:38 pm
I think TeleZed is absolutely right with his request to finally stop using MD-5 hashes for malware identification. Especially if you claim to be a professional security company, it is essential that you use at least SHA-1 for sample identification. It says a lot about a security company if it still uses MD-5 as the only means of malware identification, as already stated by TeleZed.

What annoys me the most is that every security company uses either MD-5, SHA-1 or SHA-256. What is the problem with using only SHA-256?
 #22784  by TeleZed
 Thu May 01, 2014 9:45 pm
Xylitol wrote: Bad guys have no time to waste on several hours of computation just to produce a lame hash collision on an executable.
Criminals stealing our money, I agree, they won't bother. Nation state attackers? Who knows ...
Xylitol wrote: It may have been unsafe since 2006, but I have never seen malware with the hash of calc.exe or any other legit Windows executable.
The real threat is not that malware has been created with the same hash as calc.exe. That attack is still theoretical to my knowledge.
The problem is when a malware developer creates 2 (or more) different binaries having the same MD-5 hash.
All of the binaries have totally different functions, but they are analyzed only once because they have the same (MD-5) hash.
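
The "analyzed only once" problem can be sketched as a sample index that deduplicates on SHA-256 and keeps MD-5 only as a secondary lookup key. This is a hypothetical illustration: the SampleIndex class and its method names are invented here and do not describe any existing vendor pipeline.

```python
from collections import defaultdict


class SampleIndex:
    """Tracks samples by SHA-256 and groups them under their MD5 value."""

    def __init__(self):
        # MD5 hex digest -> set of SHA-256 hex digests seen with that MD5.
        self._by_md5 = defaultdict(set)

    def register(self, md5_hex, sha256_hex):
        """Record a sample; return True if its SHA-256 was not seen before."""
        seen = self._by_md5[md5_hex]
        is_new = sha256_hex not in seen
        seen.add(sha256_hex)
        return is_new

    def md5_collisions(self):
        """Return MD5 values that map to more than one distinct SHA-256."""
        return {m: sorted(s) for m, s in self._by_md5.items() if len(s) > 1}
```

A pipeline keyed on MD-5 alone would skip the second binary as already analyzed; keyed this way, the second binary is treated as new and the shared MD-5 shows up in md5_collisions().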
Xylitol wrote: The only thing I wouldn't recommend about MD5 is using it alone in an authentication system.
I might be wrong here, but the biggest problem with storing password hashes computed with MD-5 only is not MD-5 as a hash function, but the lack of salts and the lack of iterations. Having a (different and random) salt and 10,000 iterations with MD-5 is still a better password hash than a single, unsalted SHA-256.
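
As a rough sketch of "random salt plus 10,000 iterations", the example below uses PBKDF2 from Python's hashlib with MD5 as the underlying hash. PBKDF2 is swapped in here as one standard way to combine a salt with an iteration count; the post does not name a specific construction, and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # iteration count taken from the post above


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest) using PBKDF2-HMAC-MD5 with a per-password salt."""
    if salt is None:
        salt = os.urandom(16)  # different random salt for every password
    digest = hashlib.pbkdf2_hmac("md5", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

The salt makes precomputed tables useless and the iteration count slows down each guess, which is the point being made: those two properties matter more for password storage than the choice between MD-5 and SHA-256.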

My message is that there are a lot of problems with MD-5, and I can't see any reason not to use SHA-256 (or stronger).
Does it take more time to calculate? Is it longer? I don't see any of this as a real problem.
Just forget MD-5 forever, and everybody will be happier than before. Especially me :)
 #22787  by TeleZed
 Fri May 02, 2014 6:55 am
Buster_BSA wrote:I guess speed is the reason why some companies are not using SHA-256.
The first Google result for a speed comparison: http://atodorov.org/blog/2013/02/05/per ... 56-sha512/
MD5 10.275190830230713, 10.155328989028931, 10.250311136245728
SHA1 11.985718965530396, 11.976419925689697, 11.86873197555542
SHA256 16.662450075149536, 21.551337003707886, 17.016510963439941
SHA512 18.339390993118286, 18.11187481880188, 18.085782051086426
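
A comparison like this is easy to repeat on your own hardware. The sketch below is not the benchmark from the linked article (its method is not reproduced here); it is a minimal assumed version that times each algorithm over the same in-memory buffer with Python's hashlib, and the function name time_hashes is made up for illustration.

```python
import hashlib
import time


def time_hashes(data, algorithms=("md5", "sha1", "sha256", "sha512"), repeats=3):
    """Return {algorithm: [seconds, ...]} for hashing `data` `repeats` times."""
    results = {}
    for name in algorithms:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            hashlib.new(name, data).hexdigest()
            timings.append(time.perf_counter() - start)
        results[name] = timings
    return results
```

Calling time_hashes(b"\x00" * (64 * 1024 * 1024)) hashes a 64 MiB buffer three times per algorithm; absolute numbers will differ by machine and Python build, so only the ratios between algorithms are worth comparing.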

I believe that if someone was able to run MD-5 in 2006 for a specific task, computing power should now be enough to calculate even SHA-512. Unless the company is still using the same hardware ...
 #22790  by Xylitol
 Fri May 02, 2014 11:57 am
About hash collisions on samples: just check on VTMIS, for example. Across the whole engine only 19 files are detected with a collision, and all of them are repacks of the two samples I attached on the forum.
I searched yesterday for colliding samples, but still... I've found nothing (just some theory articles).
For sample tracking, Mandiant developed Imphash as a solution: https://www.mandiant.com/blog/tracking- ... t-hashing/
It works pretty well for unpacked samples, but when most of them are packed... (note that this one can be used for both (group and single))
TeleZed wrote:All of the binaries have totally different functions, but they are analyzed only once because they have the same (MD-5) hash.
Who the hell bases a detection just on MD5 nowadays :|
Anyway, even for sharing, I still think MD5 collisions are not something we should be worried about.
I will change my opinion on MD5 when someone matches the MD5 hash of a 100kb malware with that of a 1000kb malware with a completely different structure.
Like a Zbot (Citadel is 221kb unpacked) having the same MD5 as an Atrax (+1Mb). If you can do it, TeleZed, I would appreciate an example.
 #22794  by TeleZed
 Fri May 02, 2014 3:25 pm
Xylitol wrote: I will change my opinion on MD5 when someone matches the MD5 hash of a 100kb malware with that of a 1000kb malware with a completely different structure.
Like a Zbot (Citadel is 221kb unpacked) having the same MD5 as an Atrax (+1Mb). If you can do it, TeleZed, I would appreciate an example.
Challenge accepted! It's on my TODO list.