This is the fifth post in a series on developing a POP3 client in C#.
MIME - Multipurpose Internet Mail Extensions
With the POP3 client developed so far, we are able to download mail messages from the server. We receive these messages as plain text, which is not very comfortable to read and definitely not suitable for displaying in an application. At first these messages may look a bit chaotic, but in fact they are structured in a somewhat object-oriented way.
I'll now give a short overview of the structure of MIME messages. If you'd like more detail, I recommend reading the MIME article on Wikipedia and the RFCs related to MIME, starting with RFC 2045.
A MIME message consists of one or several parts called entities. Each entity has some headers and a body, which holds either content (such as the message text or an attachment) or other entities.
The one header you will find on practically every entity is Content-Type. If it is missing, RFC 2045 says to assume text/plain with the US-ASCII charset. Its value can be any kind of Internet Media Type like text/html, image/gif or audio/mpeg. There are two types you may not be familiar with but which are important for MIME messages: multipart/mixed and multipart/alternative.
Multipart/mixed defines an entity which contains other entities of various content and content types. Multipart/alternative is used to offer alternative views of the same content, for example the same message as plain text and as HTML.
The sample MIME message shown in the figure below is a message with the content type "multipart/mixed" which contains a multipart/alternative part with a plain text and an HTML view, plus two attachments:
So I said that MIME is sort of object oriented. Each box in the figure is an object with some headers and either some content or a collection of other objects of the same type (maybe not the same content type, but always with headers and either content or a collection of further objects of the same type :-).
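To make the "object" idea a bit more concrete, here is a minimal sketch of how one box in the figure could be modelled in C#. The class name MimeEntity and its members are my own assumptions for illustration, not part of the client developed in the earlier posts, and storing one value per header name is a simplification (real messages can repeat headers such as Received).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one MIME entity: headers plus either content or child entities.
public class MimeEntity
{
    // Header names are case-insensitive, so the dictionary ignores case.
    public Dictionary<string, string> Headers { get; } =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Raw (still encoded) body of a leaf entity; null for multipart entities.
    public string Content { get; set; }

    // Child entities of a multipart entity; empty for leaf entities.
    public List<MimeEntity> Children { get; } = new List<MimeEntity>();

    // True when the Content-Type header starts with "multipart/".
    public bool IsMultipart
    {
        get
        {
            string type;
            return Headers.TryGetValue("Content-Type", out type)
                && type.TrimStart().StartsWith("multipart/", StringComparison.OrdinalIgnoreCase);
        }
    }
}
```

A multipart/mixed message would then be one MimeEntity whose Children hold the multipart/alternative entity and the attachments.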
MIME Headers
The headers section of a MIME entity is composed as follows (example):
From: <sender@example.org>
To: "Recipient" <recipient@example.org>
Subject: Here comes the subject of the email
Date: Fri, 16 May 2008 20:23:48 +0100
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="-----=_NextPart_000_001A_123849.12A98DE"
X-Priority: 3
...
It's always the name of the header followed by a colon and a value. Maybe you have noticed the indentation of the seventh line. A header value can carry additional parameters: here the Content-Type value is followed by a semicolon, and the boundary parameter continues on the next line. A line that starts with whitespace always belongs to the header on the line before it.
So parsing the headers should really not be very difficult.
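As a rough sketch of what that could look like: first join the continuation lines back onto their header ("unfolding"), then split each line at the first colon. The class MimeHeaderParser and both method names are hypothetical, and the sketch assumes one value per header name and ignores edge cases such as quoted semicolons inside parameter values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MimeHeaderParser
{
    // Parses a raw header block into name/value pairs (hypothetical helper).
    public static Dictionary<string, string> Parse(string headerBlock)
    {
        // Unfold: a line break followed by whitespace continues the previous header.
        string unfolded = Regex.Replace(headerBlock, @"\r?\n[ \t]+", " ");

        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in unfolded.Split('\n'))
        {
            int colon = line.IndexOf(':');
            if (colon <= 0) continue; // skip empty or malformed lines

            string name = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            headers[name] = value; // simplification: a repeated header overwrites the earlier one
        }
        return headers;
    }

    // Extracts a parameter such as boundary or charset from a header value.
    // Returns null when the parameter is not present.
    public static string GetParameter(string headerValue, string parameterName)
    {
        Match m = Regex.Match(
            headerValue,
            @";\s*" + Regex.Escape(parameterName) + @"\s*=\s*(?:""(?<v>[^""]*)""|(?<v>[^;\s]+))",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["v"].Value : null;
    }
}
```

For the sample headers above, GetParameter(headers["Content-Type"], "boundary") would yield the boundary string without its surrounding quotes.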
Multipart/...
Entities with a Content-Type of "multipart/..." must define a boundary string which is used to separate the different parts of the multipart entity. It can consist of almost any characters, but it must not occur anywhere else in the message. The parts are always delimited by lines containing this boundary string preceded by "--". So for two parts there are three boundary lines: one before the first part, one between the parts and one after the second. The last one is additionally followed by "--" to mark the end of the multipart entity.
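The boundary rule can be sketched in a few lines of C#. This is only an illustration under simple assumptions: the class MultipartSplitter is a hypothetical name, the whole body is already in memory as a string, and anything before the first boundary line or after the closing one is ignored.

```csharp
using System;
using System.Collections.Generic;

public static class MultipartSplitter
{
    // Splits the body of a multipart entity into its raw parts (hypothetical helper).
    public static List<string> Split(string body, string boundary)
    {
        string delimiter = "--" + boundary;        // separates the parts
        string closeDelimiter = delimiter + "--";  // ends the multipart entity

        var parts = new List<string>();
        List<string> current = null; // null until the first delimiter is seen

        foreach (string rawLine in body.Split('\n'))
        {
            string line = rawLine.TrimEnd('\r', ' ', '\t');
            if (line == delimiter || line == closeDelimiter)
            {
                if (current != null) parts.Add(string.Join("\r\n", current));
                if (line == closeDelimiter) { current = null; break; }
                current = new List<string>();
            }
            else if (current != null)
            {
                current.Add(rawLine.TrimEnd('\r'));
            }
            // Lines before the first delimiter (the preamble) are ignored.
        }

        // Tolerate a missing closing delimiter: keep what was collected so far.
        if (current != null) parts.Add(string.Join("\r\n", current));
        return parts;
    }
}
```

Each returned part is itself an entity (headers, an empty line, then the body), so a nested multipart/alternative part can be split again with its own boundary.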
...
Content-Type: multipart/mixed;
    boundary="-----=_NextPart_000_001A_123849.12A98DE"
...

-------=_NextPart_000_001A_123849.12A98DE
Content-Type: multipart/alternative;
    boundary="-----=_NextPart_001_001B_159832.124F987"

-------=_NextPart_001_001B_159832.124F987
Content-Type: text/plain;
    charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

This is the plain text content of the message.
-------=_NextPart_001_001B_159832.124F987
Content-Type: text/html;
    charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<html><body>
<p>This is the html text content of the message.</p>
</body></html>
-------=_NextPart_001_001B_159832.124F987--
-------=_NextPart_000_001A_123849.12A98DE
Content-Type: image/gif;
    name="image.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
    filename="image.gif"

Nt2YcPes8V1lfZc7GvoeAMEBrhps15YA9K6EABRe8VivewAIvwIAbO08iC/n4raleCZIXtuFi7wB
Aj8QFwx+8JXjSN8K0DB5JT5ecpk/QOK15bgleD7rjOtcOTBPOQCE7geaD3zoO1f6AAjw8AsIAADL
HQPSmc4anN/26QAo7wkKAAACVP2xHbc2ym8bAAB0HdXlDTvIQxB1rDNH67slLG99LnconL3ub996
/3fRq371rG+9618P+9jLfva0r73tb4/73Ot+97zvve9/D/zgC3/4xC++8Y+P/ORDPAEAOw==
-------=_NextPart_000_001A_123849.12A98DE--
Content-Transfer-Encoding
quoted-printable
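Quoted-printable is an encoding that uses only printable ASCII characters. Bytes that are not printable ASCII, as well as the "=" sign itself, are encoded as "=" followed by two hexadecimal digits which represent the byte's value. A "=" at the very end of a line is a soft line break and is dropped together with the line break. For more details have a look at Quoted-printable on Wikipedia. To decode quoted-printable text we remove the soft line breaks, then search the text for =xx, where xx is a hexadecimal number, and replace it with the appropriate character:
//...
// Remove soft line breaks ("=" at the end of a line) before decoding.
content = Regex.Replace(content, @"=\r?\n", string.Empty);
// Replace every =XX sequence with the character it encodes.
Regex hexRegex = new Regex(@"=([0-9A-F]{2})", RegexOptions.IgnoreCase);
content = hexRegex.Replace(content, new MatchEvaluator(HexMatchEvaluator));
//...
// Converts the two hex digits of a match to a character. This maps each byte
// directly to a code point, which is correct for ISO-8859-1 but not for
// multi-byte charsets such as UTF-8.
private static string HexMatchEvaluator(Match m)
{
    int dec = Convert.ToInt32(m.Groups[1].Value, 16);
    char character = Convert.ToChar(dec);
    return character.ToString();
}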
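The regex approach treats every encoded byte as one character. For messages in a multi-byte charset such as UTF-8, a possible alternative is to decode to bytes first and only then convert to a string using the charset parameter from the Content-Type header. The following is a sketch under that assumption; the class QuotedPrintable and its Decode method are hypothetical names, and on .NET Core some legacy charsets may need an additional encoding provider to be registered before Encoding.GetEncoding knows them.

```csharp
using System;
using System.IO;
using System.Text;

public static class QuotedPrintable
{
    // Decodes quoted-printable text to bytes first, then to a string using the
    // given charset (hypothetical helper; input is assumed to be plain ASCII).
    public static string Decode(string input, string charset)
    {
        using (var bytes = new MemoryStream())
        {
            for (int i = 0; i < input.Length; i++)
            {
                char c = input[i];
                if (c != '=')
                {
                    bytes.WriteByte((byte)c); // literal character
                    continue;
                }
                // Soft line break: "=" directly before a line break is dropped.
                if (i + 1 < input.Length && input[i + 1] == '\n') { i += 1; continue; }
                if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n') { i += 2; continue; }
                // Encoded byte: "=" followed by two hex digits.
                if (i + 2 < input.Length && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
                {
                    bytes.WriteByte(Convert.ToByte(input.Substring(i + 1, 2), 16));
                    i += 2;
                    continue;
                }
                bytes.WriteByte((byte)'='); // malformed sequence: keep the "=" as is
            }
            return Encoding.GetEncoding(charset).GetString(bytes.ToArray());
        }
    }
}
```

With this, the same German word decodes correctly whether it arrives as "Gr=FC=DFe" in iso-8859-1 or as "Gr=C3=BC=C3=9Fe" in utf-8.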
base64
Base64 encoding is mostly used for attachments. For more details on how it works, read the Base64 article on Wikipedia. I'll just show you the code to decode a base64 encoded string. The easiest way is Convert.FromBase64String, which returns the raw bytes of the attachment:
public static byte[] Base64Decode(string s)
{
    // Convert.FromBase64String ignores white space such as the line breaks
    // between the encoded lines, so the body can be passed in as it is.
    return Convert.FromBase64String(s);
}
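What happens with the decoded bytes depends on the entity. As a small sketch, an attachment could be written to disk, while a base64 encoded text body could be converted to a string with the charset from its Content-Type header. The class Base64Body and its method names are assumptions made for this illustration only.

```csharp
using System;
using System.IO;
using System.Text;

public static class Base64Body
{
    // Decodes a base64 attachment body and writes it to disk (hypothetical helper).
    public static void SaveAttachment(string base64Body, string targetPath)
    {
        byte[] data = Convert.FromBase64String(base64Body);
        File.WriteAllBytes(targetPath, data);
    }

    // Decodes a base64 text body using the charset from the Content-Type header.
    public static string DecodeText(string base64Body, string charset)
    {
        byte[] data = Convert.FromBase64String(base64Body);
        return Encoding.GetEncoding(charset).GetString(data);
    }
}
```

If the target path is built from the filename parameter of the Content-Disposition header, it seems wise to reduce it to a bare file name first (for example with Path.GetFileName), since that value comes from the sender.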
Parsing MIME messages
These are the basics you have to know about MIME messages. Now we can begin to parse the messages and bring them into a usable format - and this will be the topic of my next post. Just have a little patience. :-)