[ietf-dkim] canonicalized null body and dkim

Charles Lindsey chl at clerew.man.ac.uk
Mon Jan 8 03:51:13 PST 2007


On Sun, 07 Jan 2007 18:49:50 -0000, Eric Allman <eric+dkim at sendmail.org>  
wrote:

> I have (finally) managed to slog my way through all the messages on this  
> topic.  Let me start out by saying that I don't see the ambiguity in the  
> current text:
>
>         If there is no trailing CRLF on the message, a CRLF is added.
>         It makes no other changes to the message body. In more formal
>         terms, the "simple" body canonicalization algorithm converts
>         "0*CRLF" at the end of the body to a single "CRLF".
>
> So if the message ends without a CRLF (which should only be possible  
> using CHUNKING) one gets added.  In particular, this is important  
> because if a message is sent using CHUNKING through one relay and DATA  
> through another, the CRLF will have to be added to get the <CRLF>.<CRLF>.

Indeed there is no ambiguity in that, but that is because you have only  
quoted half the text. The full text is:

    The "simple" body canonicalization algorithm ignores all empty lines
    at the end of the message body.  An empty line is a line of zero
    length after removal of the line terminator.  If there is no trailing
    CRLF on the message, a CRLF is added.  It makes no other changes to
    the message body.  In more formal terms, the "simple" body
    canonicalization algorithm converts "0*CRLF" at the end of the body
    to a single "CRLF".

Observe carefully that the text sometimes tells you to consider the
"message", and sometimes the "message body" (which I take to mean exactly
the <body>, if any, defined by RFC 2822).

Consider the example, in DATA format:

    Field: foobar<CRLF>.
    <CRLF>
    <CRLF>
    .<CRLF>

The ".<CRLF>" is evidently not a part of either the message or of the  
message body. The "message body" consists of "<CRLF>". Let us apply the  
sentences of 3.4.3 one by one.

    The "simple" body canonicalization algorithm ignores all empty lines
    at the end of the message body.

To see what an "empty line" is we need one more sentence:

    An empty line is a line of zero
    length after removal of the line terminator.

So the line "<CRLF>" (which is the whole of the message body) IS an empty
line. That empty line is at the end of the message body, so we ignore it.
That leaves the message

    Field: foobar<CRLF>.
    <CRLF>

Take the next sentence:

    If there is no trailing
    CRLF on the message, a CRLF is added.

There IS a trailing CRLF on the message (NB, it does not say "message  
body" there), so we add nothing. We still have:

    Field: foobar<CRLF>.
    <CRLF>

Take the next sentence:

    It makes no other changes to
    the message body.

So we are finished. We have a message with an empty <body>, so that empty  
<body> is what we hash.

Take the next sentence:

    In more formal terms, the "simple" body
    canonicalization algorithm converts "0*CRLF" at the end of the body
    to a single "CRLF".

That is supposed to produce the same result, so start over with the  
original message:

    Field: foobar<CRLF>.
    <CRLF>
    <CRLF>
    .<CRLF>

of which we have already seen that "<CRLF>" is the body.

Indeed it contains "0*CRLF" at its end (in fact, it contains 1*CRLF), so
we convert it to "<CRLF>", and that is what we hash.

Therefore, the description in the first four sentences produces a
different result to the supposedly identical description in the fifth
sentence.

Q.E.D.
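
The discrepancy can be reproduced mechanically. What follows is only a
sketch of the two readings as I have walked through them above, not code
from any implementation. The function names are invented for illustration,
and the example message is assumed to be one header field, the separator
CRLF, and a body consisting of a single CRLF.

```python
CRLF = b"\r\n"


def by_sentences_1_to_4(header_and_separator: bytes, body: bytes) -> bytes:
    """Literal reading of the first four sentences (hypothetical helper)."""
    # Sentences 1-2: ignore every empty line at the end of the message body.
    # The last line is empty if the body is exactly CRLF or ends in CRLF CRLF.
    while body == CRLF or body.endswith(CRLF + CRLF):
        body = body[:-2]
    # Sentence 3: it speaks of the *message*, not the message body, so the
    # test is applied to header fields + separator + what is left of the body.
    if not (header_and_separator + body).endswith(CRLF):
        body += CRLF
    # Sentence 4: no other changes.
    return body


def by_sentence_5(body: bytes) -> bytes:
    """Reading of the fifth sentence: 0*CRLF at the end becomes one CRLF."""
    while body.endswith(CRLF):
        body = body[:-2]
    return body + CRLF


header_and_separator = b"Field: foobar" + CRLF + CRLF
body = CRLF

first_reading = by_sentences_1_to_4(header_and_separator, body)
second_reading = by_sentence_5(body)
print(first_reading, second_reading)
```

Under these assumptions the first reading leaves an empty body to hash and
the second leaves a single CRLF, so the two body hashes cannot match. For a
body that contains at least one non-empty line the two readings agree.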

Now it appears that some implementations have followed one interpretation  
and some the other, so something needs to be fixed. My suggested wording  
is:

    The "simple" body canonicalization removes empty lines from the end of
    the body until either the last line is non-empty, or no lines remain.
    An empty line is a line of zero length after removal of any terminating
    CRLF. If the body is not now empty and the last line is not already
    terminated by CRLF, a CRLF is added to it.

       INFORMATIVE NOTE: Following [RFC 2822], the CRLF which separates the
       header fields from the body is NOT part of the body, and therefore is
       never presented to the signing or verification algorithm. In the case
       of a pure binary message (such as one with a Content-Transfer-Encoding
       of 'binary') the concept of "lines" may not be meaningful.
       Nevertheless, wherever the pair of octets that represent CRLF happens
       to occur, that is to be considered as the end of a "line" for the
       purposes of this canonicalization algorithm.

That follows what I consider to be both the spirit and the letter of the
first four sentences, at the expense of ignoring and removing the fifth
sentence.

In particular, it leads to the easily remembered invariant:

    "After canonicalization, there will NEVER be an empty line at the end
     of what remains to be hashed."
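
For concreteness, the suggested wording might be sketched as follows. This
is an illustration of my proposed text only, with an invented function name,
and it assumes the caller passes in exactly the <body> of RFC 2822 (that is,
without the CRLF that separates it from the header fields).

```python
CRLF = b"\r\n"


def canonicalize_simple_proposed(body: bytes) -> bytes:
    """Sketch of the suggested wording (hypothetical name, not from any spec)."""
    # Remove empty lines from the end of the body until either the last
    # line is non-empty or no lines remain.
    while body == CRLF or body.endswith(CRLF + CRLF):
        body = body[:-2]
    # If the body is not now empty and its last line is not already
    # terminated by CRLF, add one.
    if body and not body.endswith(CRLF):
        body += CRLF
    return body


def ends_with_empty_line(data: bytes) -> bool:
    """True if the last line of data is empty (helper for the invariant)."""
    return data == CRLF or data.endswith(CRLF + CRLF)


samples = [b"", CRLF, CRLF * 3, b"abc", b"abc" + CRLF * 3, b"\x00\x01" + CRLF * 2]
results = [canonicalize_simple_proposed(s) for s in samples]
print(results)
```

Every pair of CRLF octets is treated as a line end whatever the content, so
the same loop serves for the binary case mentioned in the note, and none of
the results ends with an empty line.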

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131                          Web: http://www.cs.man.ac.uk/~chl
Email: chl at clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

