MD5

MD5

MD5 (Message-Digest algorithm 5) is a cryptographic hash function invented by Ronald Rivest in 1991. Recently, collisions have been detected in MD5. First, a team of Chinese cryptanalysts led by Xiaoyun Wang and Hongbo Yu found a collision, but they did not initially share their method. Others were able to reverse engineer this attack, and the Wang Yu team did eventually publish a report with their method.

Implementation

This implementation of MD5 was built (with help from my partner on this project, Ying Zhang) following notes from a manuscript by Professor Mark Stamp. This implementation can process strings, files of any size, and hexadecimal representations of bytes. There is also a “-test” option, which will run all of the test cases from the reference implementation. In addition, we have added 3 more cases. One is a string exactly 64 characters in length. The other two are byte representations that produce a collision.

Practical MD5 Attacks

The Wang/Yu attack produces collisions with only minor bit differences, and only on seemingly random values. Also, since the attack requires the ability to control both sets of inputs, it is easy to think that this is of little real-world use.

The weakness of MD5 is that it only makes one pass over the data. (In fact, it appears that an easy way to foil the Wang/Yu attack would be to simply rehash the file using the output of the first hash as the new initialization value. This would prevent the attack since a collision with one initialization value is highly unlikely to still be a collision with a different initialization value). As a result, if h(M) == h(M'), then h(M+X) == h(M'+X).

This seems less than useful, since M and M' are only random 1024 blocks of bits. What, therefore, is the use of appending the same bits to these values?

Simply put, the very fact that there is even a single bit difference can be exploited by inventive attackers. The idea is to include conditional processing in X that is appended to both values. Then, although the files are nearly identical, they will behave entirely differently.

Colliding Postscript Files

Of all the demonstrations I have seen, this attack by Stefan Lucks at the University of Mannheim is the most visually impressive. In this attack, two seemingly entirely different postscript files are presented. In the story, Alice’s boss signs one, but then Alice is able to use that same signature for the other file.

This attack is easy to replicate. In fact, here is a recommendation letter and an authorization letter that we had created for this project. It is worth noting, that while we could have done this from scratch, there was no need to. By using the documents produced by Stefan Lucks as a template, we were able to perform this attack with a simple text editor. Their hash values are the same:

$ md5Hash -f recommendation.ps

MD5(recommendation.ps) =

f9a4617288a52541a6f641c3d4f462d6

$ md5Hash -f authorization2.ps

MD5(authorization2.ps) =

f9a4617288a52541a6f641c3d4f462d6

This is a brief description of how this attack was performed. First, the beginning of the file contains the needed postscript header information and a ‘(‘. Whitespace is included before the ‘(‘ to pad out this section to exactly 64 bytes. This then serves as the initialization value for the next block.

The next step is to calculate colliding 1024-bit blocks M and M' with the needed initialization value.

After this, we put in a ‘,’ and in both documents add M plus “)eq”. In postcript, this indicates an equality comparison. Using this, we can then append and ifelse statement with the two messages we want.

Here is an excerpt from one of the postscript files. ^M’s have been left in to illustrate the whitespace padding. The section in blue serves as the IV value. The ‘)(‘ and ‘)eq’ in pink highlight the break between M' and M. Message 1 is in green. Message 2 is in red:

%!PS-Adobe-1.0^M

%%BoundingBox: 0 0 612 792 ^M

(?B?j?M^@??~I_??~\??/ʷ~W

F~??^CT>??^?^HEm3^E^A?S?[?| ~C^S?xR?Z3?6~Y^M~\^GnEZR~Dy?/?^O~U?W?v:쿪^O^K?ʵsY?2?}^K,??v?6^^ ??;~C??}??6^\S~F??

k??~T?Na^_|~D~@`?o~T?^C~P)(?B?j?M^@??~I_??~\??/ʷ~W

F~??^CT>??^?^HEm3^E^A?S?[?| ~C^S?xR?Z3?6~Y^M~\^GnEZR~Dy?/?^O~U?W?v:쿪^O^K?ʵsY?2?}^K,??v?6^^ ??;~C??}??6^\S~F??

k??~T?Na^_|~D~@`?o~T?^C~P)eq{/Times-Roman findfont 20 scalefont setfont ^M

300 700 moveto (Prof. Mark Stamp) show^M

300 680 moveto (One Washington Square) show^M

300 660 moveto (San Jose, CA 95192) show^M

25 500 moveto (March 20, 2006) show^M

25 450 moveto (To Whom it May Concern:) show^M

25 400 moveto ^M

(Tom Austin and Ying Zhang have demonstrated decent programming)^M

show^M

25 380 moveto^M

(ability. They should do OK in any programming position, provided that)^M

…

(Sincerely,)^M

show^M

25 150 moveto^M

(Mark Stamp) ^M

show^M

}{/Times-Roman findfont 20 scalefont setfont ^M

300 700 moveto (Prof. Mark Stamp) show^M

300 680 moveto (One Washington Square) show^M

300 660 moveto (San Jose, CA 95192) show^M

25 500 moveto (March 20, 2006) show^M

25 450 moveto (To Bank of America:) show^M

25 400 moveto ^M

(Tom Austin and Ying Zhang are authorized access to all of my account)^M

show^M

25 380 moveto^M

(information and may make withdrawals or deposits.)^M

show^M

25 300 moveto^M

(Sincerely,)^M

show^M

25 250 moveto^M

(Mark Stamp) ^M

show^M

}ifelse^M

showpage^M

This attack could be easily detected simply by viewing the file in a text editor. However, most users would not be likely to do that. Also, while this attack specifically was done with postscript, it would seem that it could be applied to any document format that supported conditional processing. While inserting a macro into a Word document to do this attack might be a great deal more complex, it would also be much less likely to be detected or questioned.

Any digital signature to the first message would apply to both.