The MD5 checksum of a file is a 128-bit value that acts like a fingerprint of the file. The probability that two different files accidentally produce the same checksum is extremely small. This makes the checksum useful both for comparing files and for checking their integrity.
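To make the 128-bit size concrete, here is a minimal sketch using the `hashlib` module from Python's standard library. Python is an assumption, since the text does not name a language, and the helper name `md5_hex` is hypothetical. The digest of `abc` shown in the comment is the value listed in the test suite of RFC 1321.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()

# The raw digest is 16 bytes long, which is 128 bits.
raw_digest = hashlib.md5(b"abc").digest()
print(len(raw_digest) * 8)   # 128

# In hexadecimal form the same value is written as 32 characters.
print(md5_hex(b"abc"))       # 900150983cd24fb0d6963f7d28e17f72

# Changing a single character of the input gives an unrelated-looking digest.
print(md5_hex(b"abd"))
```

The hexadecimal form is the one usually published next to a download, so this is the string you would normally compare.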
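Let us imagine a situation that shows how the checksum is used.
Alice and Bob each have a huge file, and the two files look similar. How can they find out whether the files differ without sending them to each other? Each of them calculates the checksum of their own file, and then they compare the two checksums. If the checksums differ, the files are different; if the checksums match, the files are almost certainly identical.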
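The comparison Alice and Bob perform could be sketched as follows, again assuming Python and its standard `hashlib` module. The function names `file_md5` and `same_checksum`, the 1 MiB chunk size, and the file names in the usage note are illustrative assumptions, not part of any standard tool.

```python
import hashlib

def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 checksum of the file at path as a hex string."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a huge file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def same_checksum(checksum_a: str, checksum_b: str) -> bool:
    """Compare two hex checksums, ignoring letter case and surrounding whitespace."""
    return checksum_a.strip().lower() == checksum_b.strip().lower()
```

Alice would run something like `file_md5("alice_copy.bin")` on her machine and send only the resulting 32-character string to Bob. Bob computes the checksum of his own file and passes both strings to `same_checksum`. Only a few dozen bytes cross the network, regardless of how large the files are.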
RFC 1321 describes the MD5 checksum (the MD5 message digest) as follows:
The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given prespecified target message digest.
The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.
The MD5 algorithm is designed to be quite fast on 32-bit machines. In addition, the MD5 algorithm does not require any large substitution tables; the algorithm can be coded quite compactly.
You can read the full text of RFC 1321 here...