This post was motivated by my frustrations trying to set up a working, public-facing email server. The sheer number of protocols, moving parts, and obscure error messages involved here is quite something, and probably an order of magnitude more complex than, say, a web server. I’m attempting to teach myself as much as anyone else, so a lot of this post is lifted from various Wikipedia articles.

What I’m aiming to explain in this post is what all of these moving parts do, the problems they solve, and how they talk to each other. I’m teaching myself as I go, so if any of the details are incorrect I can only apologise and ask that you send in a correction.

Now, let’s get into it. Suppose you want to send an email to someone. How does that message make its way from your server to their server? How can they be sure that email came from you? How is that email encrypted in transit, and decrypted on the recipient’s server? When will I stop asking questions and start writing down the answers?

SMTP

Simple Mail Transfer Protocol (hereafter referred to as SMTP) is an application layer protocol used for email transmission only. It can be used in several parts of the email lifecycle, for example, when a sender’s mail client submits an email to their own server, or when the sender’s server submits that email to the recipient’s server. Note that it’s also possible for the sender’s client to connect directly to the recipient’s server - see the section below on Mail User Agents (MUAs) for more information.

SMTP is text-based and connection-oriented. In SMTP parlance, the client is the initiating agent, i.e. whichever side is submitting an email (a mail program or another mail server), and the server is the side receiving it. Once a client has connected to a server, a session is opened and two things happen: session parameters are exchanged, and some number of SMTP transactions are performed. Clients submitting mail on behalf of a user typically have to authenticate with a username and password, although, as we’ll see below, this wasn’t part of the original protocol.

As an aside, there are some small differences between how mail clients talk to an SMTP server and how mail servers talk to each other: servers relaying mail to one another use port 25, but a mail client submitting mail to its own server will normally use either port 587 or port 465. That choice exists because port 465 was briefly registered as an “smtps” port, i.e. SMTP over TLS. That registration was withdrawn in favour of the STARTTLS mechanism (RFC 2487), only for implicit TLS on port 465 to be reinstated for mail submission by RFC 8314.

Here’s an example SMTP session, using S: for messages the server sends, and C: for messages sent by the client. This is ripped shamelessly from Wikipedia.

S: 220 smtp.example.com ESMTP Postfix
C: HELO relay.example.com
S: 250 smtp.example.com, I am glad to meet you
C: MAIL FROM:<bob@example.com>
S: 250 Ok
C: RCPT TO:<alice@example.com>
S: 250 Ok
C: RCPT TO:<theboss@example.com>
S: 250 Ok
C: DATA
S: 354 End data with <CR><LF>.<CR><LF>
C: From: "Bob Example" <bob@example.com>
C: To: Alice Example <alice@example.com>
C: Cc: theboss@example.com
C: Date: Sun, 25 Aug 2019 17:09:42 +1000
C: Subject: Test message
C:
C: Hello Alice,
C:
C: This is a test message with five header fields and some text in the message
C: body.
C:
C: Your friend,
C: Bob
C: .
S: 250 Ok: queued as 12345
C: QUIT
S: 221 Bye

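This protocol is fairly straightforward. The only subtlety is the end-of-data marker: because a line containing just a dot ends the message body, the client prepends an extra dot to any body line that starts with a dot (so a line consisting of a single dot is sent as two dots), and the server strips that extra dot again on receipt. This is known as “dot-stuffing”.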
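Here’s a minimal sketch of dot-stuffing in Python, assuming the body already uses CRLF line endings and ends with a CRLF. The names dot_stuff and dot_unstuff are made up for illustration; in practice Python’s smtplib does this for you when it sends the DATA section.

```python
def dot_stuff(body: str) -> str:
    """Prepare a CRLF-terminated body for the SMTP DATA command.

    Any line starting with "." gets an extra "." prepended, and the
    "<CRLF>.<CRLF>" end-of-data marker is appended.
    """
    lines = body.split("\r\n")
    # A trailing CRLF leaves an empty final element; drop it so we don't
    # emit a spurious blank line before the terminator.
    if lines and lines[-1] == "":
        lines.pop()
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    return "\r\n".join(stuffed) + "\r\n.\r\n"


def dot_unstuff(data: str) -> str:
    """Receiving side: strip the terminator and the extra leading dots."""
    terminator = "\r\n.\r\n"
    if not data.endswith(terminator):
        raise ValueError("missing end-of-data marker")
    lines = data[: -len(terminator)].split("\r\n")
    unstuffed = [line[1:] if line.startswith(".") else line for line in lines]
    return "\r\n".join(unstuffed) + "\r\n"
```

Because every body line that starts with a dot gains a second one, the only place a lone dot can appear on its own line is the final terminator.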

ESMTP

There are a number of problems with the original version of SMTP as outlined above:

  1. No authentication
  2. Plaintext data transmission
  3. Only textual data can be transferred
  4. No UTF-8 support

An update to the SMTP specification called “Extended SMTP” (hereafter referred to as “ESMTP”) aims to address these problems. It does this using several mechanisms, the first of which is called “SMTP Authentication”.

SMTP authentication aims to solve the problem of “open relays”: in the original rendition of SMTP, mail relays had no mechanism for determining who was trying to send an email, which quickly became a spam problem. This extension, as with all other SMTP extensions, is advertised in the server’s response to the client’s greeting. To signal that it understands these extensions, the client greets the server with “EHLO” instead of “HELO”.
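To make the advertisement mechanism concrete, here’s a minimal Python sketch of how a client might split the EHLO reply into extension keywords. It assumes the reply lines have already been read from the socket with their CRLFs stripped; parse_ehlo_reply is a hypothetical helper, not part of any library.

```python
def parse_ehlo_reply(lines: list[str]) -> tuple[str, dict[str, list[str]]]:
    """Split a multi-line 250 reply to EHLO into (greeting, extensions).

    Every line except the last has a hyphen after the reply code ("250-");
    the last has a space ("250 "). The first line is the server's greeting,
    and each later line is an extension keyword plus optional parameters.
    """
    greeting = ""
    extensions: dict[str, list[str]] = {}
    for i, line in enumerate(lines):
        code, marker, text = line[:3], line[3:4], line[4:]
        expected_marker = " " if i == len(lines) - 1 else "-"
        if code != "250" or marker != expected_marker:
            raise ValueError(f"unexpected EHLO reply line: {line!r}")
        if i == 0:
            greeting = text
        else:
            keyword, *params = text.split()
            extensions[keyword.upper()] = params
    return greeting, extensions
```

A client would then check, say, whether "STARTTLS" is in the returned dictionary before deciding to upgrade the connection.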

SMTP authentication is frequently used together with STARTTLS, which the server advertises in its EHLO response and the client then sends as a command, indicating that a TLS negotiation should proceed. From that point on, the connection can be considered secure. However, this method is prone to a downgrade attack: the STARTTLS advertisement is transmitted in plaintext, and could be stripped by a malicious intermediary, leaving the client to carry on unencrypted. There are workarounds for this, including configuring SMTP clients to require TLS for outgoing connections. There has also been work on a protocol that relies on the certificate authority system, MTA-STS, which has since been published as RFC 8461.

Another keyword involved in the ESMTP connection negotiation is 8BITMIME, which allows for 8-bit data transmission, rather than the 7-bit ASCII permitted in vanilla SMTP. This is a prerequisite for the SMTPUTF8 keyword, which indicates that UTF-8 is supported in mailbox names and header fields. Tying all of these extensions together, here’s a worked example (again, shamelessly ripped from Wikipedia):

S: 220 smtp.example.com ESMTP Server
C: EHLO client.example.com
S: 250-smtp.example.com Hello client.example.com
S: 250-SIZE 35882577
S: 250-8BITMIME
S: 250-AUTH GSSAPI DIGEST-MD5
S: 250-ENHANCEDSTATUSCODES
S: 250-SMTPUTF8
S: 250 STARTTLS
C: STARTTLS
S: 220 Ready to start TLS
    ... TLS negotiation proceeds.
     Further commands protected by TLS layer ...
C: EHLO client.example.com
S: 250-smtp.example.com Hello client.example.com
S: 250 AUTH GSSAPI DIGEST-MD5 PLAIN
C: AUTH PLAIN dGVzdAB0ZXN0ADEyMzQ=
S: 235 2.7.0 Authentication successful
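The AUTH PLAIN argument in that exchange isn’t encrypted, just Base64-encoded: it decodes to an authorisation identity, a username and a password separated by NUL bytes (here test, test and 1234), as defined by the SASL PLAIN mechanism (RFC 4616). That’s why the server above only offers PLAIN once TLS is up. A sketch of building the argument in Python, with a hypothetical helper name:

```python
import base64


def auth_plain_argument(username: str, password: str, authzid: str = "") -> str:
    """Build the Base64 argument for "AUTH PLAIN" (SASL PLAIN, RFC 4616).

    The decoded form is: authzid NUL username NUL password. An empty authzid
    means "act as the user I authenticated as".
    """
    raw = "\0".join([authzid, username, password]).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")
```

Python’s smtplib may build this itself as part of SMTP.login(), depending on which mechanisms the server advertises.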

SPF

You might have noticed previously that when a client sends an email, the client is responsible for putting the From address in the envelope (via the MAIL FROM command). So what stops a malicious client from claiming an email is from, say, a legitimate bank? One mechanism for stopping this is called “Sender Policy Framework”, hereafter referred to as “SPF”. To understand SPF, we need to take a quick foray into how servers look up IP addresses from hostnames.

DNS

“Domain Name System”, hereafter referred to as DNS, is a naming system for computers (or other resources) connected to the internet. It publicly lists mappings from a hostname to a set of IP addresses, so that when a computer needs to know the IP address of, say, example.com, it can perform a DNS lookup on example.com.

You can interact with this system yourself by running nslookup:

$ nslookup example.com
Server:		192.168.1.1
Address:	192.168.1.1#53

Non-authoritative answer:
Name:	example.com
Address: 93.184.216.34
Name:	example.com
Address: 2606:2800:220:1:248:1893:25c8:1946

As you can see, we’ve asked a DNS server “what IP addresses are associated with example.com?” and got two replies: an IPv4 address of 93.184.216.34, and an IPv6 address of 2606:2800:220:1:248:1893:25c8:1946.

These records are sorted into “categories” - an IPv4 address lookup would look for an “A” record, and an IPv6 lookup would look for an “AAAA” record. Some other categories are “MX” records (for mail servers) and “TXT”, which lets domain owners associate arbitrary plaintext information with their domain.
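Python’s standard library has no direct way to ask for TXT or MX records (the third-party dnspython package can), but socket.getaddrinfo will return the addresses behind A and AAAA records via the operating system’s resolver. A small sketch, with lookup_addresses as a made-up helper; note that the system resolver may also answer from /etc/hosts or a cache rather than querying DNS.

```python
import socket


def lookup_addresses(hostname: str) -> dict[str, set[str]]:
    """Return the IPv4 ("A") and IPv6 ("AAAA") addresses for hostname."""
    result: dict[str, set[str]] = {"A": set(), "AAAA": set()}
    # proto=IPPROTO_TCP avoids getting one duplicate entry per socket type.
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        hostname, None, proto=socket.IPPROTO_TCP
    ):
        if family == socket.AF_INET:
            result["A"].add(sockaddr[0])
        elif family == socket.AF_INET6:
            result["AAAA"].add(sockaddr[0])
    return result
```

Calling lookup_addresses("example.com") on a networked machine should give results along the lines of the nslookup output above.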

… and back to SPF

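Now that we know how DNS works, it’s fairly straightforward to use it to check whether an incoming email is forging its sender: the domain owner publishes a TXT record listing the IP addresses that are allowed to send email for that domain, and the receiving server looks up that TXT record for the domain the email claims to be from and checks the connecting IP address against it. (Strictly speaking, SPF checks the domain in the envelope’s MAIL FROM address, not the From header the user sees.)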

You might ask, why should a server maintain a separate TXT record for this whitelist? Why can’t we just use the “A” and “AAAA” entries in a DNS record? The answer to that is because a mail server might be distinct from, say, a web server that needs to have correct “A” and “AAAA” entries. A mail server might also use a third-party service for relaying email, for example a service like SendGrid.

Let’s check the SPF record for a domain that sends a lot of email, like gmail.com:

$ nslookup -query=txt gmail.com
Server:		192.168.1.1
Address:	192.168.1.1#53

Non-authoritative answer:
gmail.com	text = "v=spf1 redirect=_spf.google.com"

OK, so the record is using the redirect modifier to point us at _spf.google.com. Let’s try a TXT record lookup on that domain:

$ nslookup -query=txt _spf.google.com
Server:		192.168.1.1
Address:	192.168.1.1#53

Non-authoritative answer:
_spf.google.com	text = "v=spf1 include:_netblocks.google.com include:_netblocks2.google.com include:_netblocks3.google.com ~all"

So there are a few include mechanisms there that should be fairly self-explanatory, and we’ll dive into them shortly. But what’s that ~all doing at the end of the record? There are two things going on here: the all and the tilde. The all mechanism matches every sender IP address, so putting it at the end makes it a fallback. So, we’re going to check those three netblocks entries, and then do something else for all addresses not covered there.

That “something else” depends on the qualifier in front of all:

  • + is a PASS. In our example, this would let other servers allow any IP address to send emails as gmail.com, which clearly isn’t optimal.
  • ? is a NEUTRAL. This would be the same as leaving the all off entirely.
  • ~ is a SOFTFAIL. This tells servers that messages failing an SPF check for gmail.com should be accepted, but somehow tagged to differentiate them. In practice, this would put them in the user’s “Spam” folder, or something similar.
  • - is a FAIL, and would tell the server performing the SPF check for gmail.com to completely reject the incoming mail.

Now, as mentioned, let’s get back to that include directive - we’ll just do the one here.

$ nslookup -query=txt _netblocks.google.com
Server:		192.168.1.1
Address:	192.168.1.1#53

Non-authoritative answer:
_netblocks.google.com	text = "v=spf1 ip4:35.190.247.0/24 ip4:64.233.160.0/19 ip4:66.102.0.0/20 ip4:66.249.80.0/20 ip4:72.14.192.0/18 ip4:74.125.0.0/16 ip4:108.177.8.0/21 ip4:173.194.0.0/16 ip4:209.85.128.0/17 ip4:216.58.192.0/19 ip4:216.239.32.0/19 ~all"

OK, we’ve finally got some IP addresses. This entry happens to list allowed IPv4 addresses, but the entry for _netblocks2 contains IPv6 addresses.
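Once the includes have been resolved down to literal ip4/ip6 ranges like the ones above, checking a connecting IP is a CIDR membership test applied left to right, with the first matching mechanism deciding the result. Here’s a simplified Python sketch using the standard ipaddress module; evaluate_spf is a hypothetical helper that deliberately only understands ip4, ip6 and all. A real implementation (see RFC 7208) also has to follow include, redirect, a and mx, and enforce the limit of 10 DNS-querying terms.

```python
import ipaddress

# Map SPF qualifiers to the result they produce when their mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record: str, client_ip: str) -> str:
    """Evaluate an SPF record that only uses ip4, ip6 and all mechanisms.

    Simplified sketch: include, redirect, a, mx etc. need further DNS
    lookups and raise NotImplementedError here.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # a mechanism without a qualifier defaults to pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6"):
            # strict=False tolerates host bits set in the published range
            network = ipaddress.ip_network(value, strict=False)
            if ip in network:  # always False if the IP versions differ
                return QUALIFIERS[qualifier]
            continue
        raise NotImplementedError(f"not handled in this sketch: {term}")
    return "neutral"  # no mechanism matched
```

Run against the _netblocks.google.com record above, an address inside 74.125.0.0/16 comes back as a pass, while an unrelated address falls through to ~all and comes back as a softfail.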

DKIM

While SPF ensures that an incoming email is from a domain authorised to send it, there are still some missing guarantees - for example, how can I be sure that the contents of a message, or its attachments, have not been modified since it was sent?

DomainKeys Identified Mail, hereafter referred to as DKIM, aims to solve this problem (among others). It uses a similar DNS-based lookup to find the signing domain’s public key, but adds elements of public key cryptography for verifying email bodies (and selected header fields): messages are signed with a DKIM-Signature header field, which looks like this (example shamelessly stolen from Wikipedia):

DKIM-Signature: v=1; a=rsa-sha256; d=example.net; s=brisbane;
     c=relaxed/simple; q=dns/txt; t=1117574938; x=1118006938;
     h=from:to:subject:date:keywords:keywords;
     bh=MTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3ODkwMTI=;
     b=dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSbav+yuU4zGeeruD00lszZ
              VoG4ZHRNiYzR

I’m not going to dig too deep into the protocol here, but the two most important tags for our purposes are bh and b. bh is short for “body hash”: the message body is canonicalised (normalised according to the c tag) and hashed, and the Base64-encoded result is stored in the bh tag.

A second hash is then computed over the header fields listed in the h tag, plus the DKIM-Signature header itself (with b left empty). Because that header contains bh, this second hash effectively covers the body too. It is signed using the sender’s private key, encoded in Base64, and stored as the b tag.
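In the example header, c=relaxed/simple means “relaxed” canonicalisation for headers and “simple” for the body. Here’s a Python sketch of the body-hash half for a=rsa-sha256 with simple body canonicalisation, assuming the body already uses CRLF line endings; the function names are made up, and the header hash and RSA signature are left out because they need a third-party crypto library. (The bh value in the example above is a placeholder: it decodes to the ASCII digits 1234567890… rather than a real digest.)

```python
import base64
import hashlib


def simple_body_canonicalize(body: bytes) -> bytes:
    """DKIM "simple" body canonicalisation (RFC 6376, section 3.4.3).

    Trailing empty lines are removed, and the result always ends in exactly
    one CRLF (so an empty body becomes a single CRLF).
    """
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"


def body_hash(body: bytes) -> str:
    """Compute a bh= value: Base64 of SHA-256 over the canonicalised body."""
    digest = hashlib.sha256(simple_body_canonicalize(body)).digest()
    return base64.b64encode(digest).decode("ascii")
```

Because trailing blank lines are ignored, a relay that appends an extra empty line won’t break the signature, but changing a single character of the body will.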

When an email is received, the verifier looks up the public key using the domain in the d tag and the selector in the s tag, via the method specified in q. For the example above, this is a DNS TXT record for brisbane._domainkey.example.net. (The _domainkey string is a fixed part of the specification.)

This will result in a record that looks something like this:

"v=DKIM1; h=sha256; k=rsa; t=s; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDDmzRmJ
RQxLEuyYiyMg4suA2SyMwR5MGHpP9diNT1hRiwUd/mZp1ro7kIDTKS8ttkI6z6eTRW9e9dDOxzSxNuX
mume60Cjbu08gOyhPG3GfWdg7QkdN6kR4V75MFlw624VY35DaXBvnlTJTgRg/EW72O1DiYVThkyCgpS
YS8nmEQIDAQAB"
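Both the DKIM-Signature header and the key record use the same semicolon-separated tag=value syntax, so one small parser covers both. A rough Python sketch (parse_tag_list and key_query_name are hypothetical names) that works out which DNS name to query for the example signature; it ignores some corner cases of the RFC 6376 grammar, expects the surrounding quotes of the TXT record to be stripped already, and doesn’t do the DNS lookup or RSA verification.

```python
def parse_tag_list(text: str) -> dict[str, str]:
    """Parse a DKIM tag=value list (RFC 6376, section 3.2).

    All whitespace is removed from values, which handles folded Base64 in
    the b=, bh= and p= tags.
    """
    tags: dict[str, str] = {}
    for part in text.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        name, _, value = part.partition("=")  # Base64 "=" padding stays in value
        tags[name.strip()] = "".join(value.split())
    return tags


def key_query_name(signature_tags: dict[str, str]) -> str:
    """Build the DNS TXT name holding the public key: <s>._domainkey.<d>."""
    return f"{signature_tags['s']}._domainkey.{signature_tags['d']}"
```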

The p tag in this response holds the sender’s public key. The receiver uses it to verify the signature in the b tag, and separately recomputes the body hash to check it against bh. If both checks succeed, this cryptographically shows that the body and the signed header fields were not tampered with after the message was signed.

In contrast to SPF, DKIM does not allow the sender to specify what to do if signature verification fails. Instead, the DKIM specification (RFC 6376) leaves that decision to the receiver, and the outcome, including the precise reason verification failed, is typically recorded for downstream processing in an Authentication-Results header field (RFC 7001, since superseded by RFC 8601).

For this reason, DKIM is used alongside SPF despite DKIM’s stronger verification that an email came from who it claims to.

DMARC

Joining the club of DNS-based email authentication protocols is the catchily named “Domain-based Message Authentication, Reporting and Conformance” protocol, hereafter referred to as DMARC. Rather than protecting email servers from spam, however, DMARC is designed to protect domain owners by giving them the ability to prevent a domain from being used in business email compromise attacks, phishing emails, and other attacks.

The chief difference between DMARC and SPF/DKIM is that DMARC is not a purely technical measure - the DNS TXT record published is a list of policies for how to handle SPF or DKIM failures. Let’s take a look at an example to make this a bit more concrete:

$ nslookup -type=txt _dmarc.gmail.com
Server:		192.168.1.1
Address:	192.168.1.1#53

Non-authoritative answer:
_dmarc.gmail.com	text = "v=DMARC1; p=none; sp=quarantine; rua=mailto:mailauth-reports@google.com"

Let’s take a look at these tags:

  • v=DMARC1 is simply the version of DMARC to use
  • p=none is the policy, which will be discussed in more detail later
  • sp=quarantine is the subdomain policy, which will also be discussed later
  • rua is the URI to which aggregate reports (summaries of SPF and DKIM results for mail claiming to be from the domain) should be sent

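DMARC operates by checking that either SPF or DKIM passes on a given message, and that the passing domain is aligned with (i.e. matches) the domain in the message’s From header, and taking the action specified in the DMARC DNS entry if neither does. In the example above, the domain owner asks receivers to take no action on failing mail from gmail.com itself (p=none) while collecting reports, but to quarantine failing mail claiming to be from subdomains of gmail.com (sp=quarantine), since it doesn’t expect legitimate email from them.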
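To pin down that decision logic, here’s a simplified Python sketch. The helper names are made up, and it assumes the caller has already worked out whether SPF and DKIM passed with alignment and whether the From domain is a subdomain of the domain that published the record (which in reality involves finding the “organisational domain”).

```python
from typing import Optional


def parse_dmarc(record: str) -> dict[str, str]:
    """Split a DMARC TXT record such as "v=DMARC1; p=none; ..." into tags."""
    tags: dict[str, str] = {}
    for part in record.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags


def dmarc_action(
    tags: dict[str, str],
    *,
    from_is_subdomain: bool,
    spf_aligned_pass: bool,
    dkim_aligned_pass: bool,
) -> Optional[str]:
    """Return None if DMARC passes, else the policy the domain owner asked for.

    DMARC passes when SPF or DKIM passes *and* is aligned with the domain in
    the From header; the caller is assumed to have worked that out already.
    """
    if spf_aligned_pass or dkim_aligned_pass:
        return None
    if from_is_subdomain and "sp" in tags:
        return tags["sp"]  # subdomain policy, when one is published
    return tags["p"]  # "none", "quarantine" or "reject"
```

With the gmail.com record above, a failing message from gmail.com itself maps to "none", while a failing message from a subdomain maps to "quarantine".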

IMAP

So far we’ve only looked at the protocols involved in transferring mail between servers, but for most users this step is an implementation detail. All most people need to care about is whether they can read emails, and whether other people can read emails they send. To this effect, “Internet Message Access Protocol” (hereafter referred to as “IMAP”) is a protocol used by email clients to retrieve email messages from a mail server.

IMAP was designed to allow multiple mail clients to manage an email box on a server, so clients typically leave messages lying around on the server until the user deletes them. IMAP has broadly replaced POP, which won’t be covered here since it’s more or less obsolete.

IMAP is a connection-oriented protocol, with clients normally opening a single persistent TCP connection to the mail server. This connection is used by the server for pushing notifications of new email, and by the client for downloading messages, searching, and other common things users do with email.

IMAP connections can be cryptographically protected using TLS, which is referred to as IMAPS. Servers typically listen on port 993 for IMAPS, as opposed to port 143 for regular IMAP.

Here’s an example IMAP session, ripped straight from the IMAP RFC (RFC 3501):

C: <open connection>
S:   * OK IMAP4rev1 Service Ready
C:   a001 login mrc secret
S:   a001 OK LOGIN completed
C:   a002 select inbox
S:   * 18 EXISTS
S:   * FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
S:   * 2 RECENT
S:   * OK [UNSEEN 17] Message 17 is the first unseen message
S:   * OK [UIDVALIDITY 3857529045] UIDs valid
S:   a002 OK [READ-WRITE] SELECT completed
C:   a003 fetch 12 full
S:   * 12 FETCH (FLAGS (\Seen) INTERNALDATE "17-Jul-1996 02:44:25 -0700"
      RFC822.SIZE 4286 ENVELOPE ("Wed, 17 Jul 1996 02:23:25 -0700 (PDT)"
      "IMAP4rev1 WG mtg summary and minutes"
      (("Terry Gray" NIL "gray" "cac.washington.edu"))
      (("Terry Gray" NIL "gray" "cac.washington.edu"))
      (("Terry Gray" NIL "gray" "cac.washington.edu"))
      ((NIL NIL "imap" "cac.washington.edu"))
      ((NIL NIL "minutes" "CNRI.Reston.VA.US")
      ("John Klensin" NIL "KLENSIN" "MIT.EDU")) NIL NIL
      "<B27397-0100000@cac.washington.edu>")
      BODY ("TEXT" "PLAIN" ("CHARSET" "US-ASCII") NIL NIL "7BIT" 3028
      92))
S:   a003 OK FETCH completed
C:   a004 fetch 12 body[header]
S:   * 12 FETCH (BODY[HEADER] {342}
S:   Date: Wed, 17 Jul 1996 02:23:25 -0700 (PDT)
S:   From: Terry Gray <gray@cac.washington.edu>
S:   Subject: IMAP4rev1 WG mtg summary and minutes
S:   To: imap@cac.washington.edu
S:   cc: minutes@CNRI.Reston.VA.US, John Klensin <KLENSIN@MIT.EDU>
S:   Message-Id: <B27397-0100000@cac.washington.edu>
S:   MIME-Version: 1.0
S:   Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
S:
S:   )
S:   a004 OK FETCH completed
C:   a005 store 12 +flags \deleted
S:   * 12 FETCH (FLAGS (\Seen \Deleted))
S:   a005 OK +FLAGS completed
C:   a006 logout
S:   * BYE IMAP4rev1 server terminating connection
S:   a006 OK LOGOUT completed
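Every command carries a client-chosen tag (a001, a002 and so on), the server sends any number of untagged “*” lines, and the command is finished when a line starting with the same tag arrives. A Python sketch of that bookkeeping, with hypothetical helper names; it assumes lines have had their CRLFs stripped and ignores literals like the {342} block above, whose contents would need to be read by byte count instead.

```python
import re
from typing import Iterable

# A tagged completion looks like "a002 OK [READ-WRITE] SELECT completed".
TAGGED = re.compile(r"^(?P<tag>[^\s*+]+) (?P<status>OK|NO|BAD)(?: (?P<text>.*))?$", re.IGNORECASE)


def classify(line: str) -> tuple[str, ...]:
    """Classify one server line as untagged, continuation or tagged."""
    if line.startswith("* "):
        return ("untagged", line[2:])
    if line.startswith("+"):
        return ("continuation", line[1:].strip())
    match = TAGGED.match(line)
    if match is None:
        raise ValueError(f"unrecognised response line: {line!r}")
    return ("tagged", match["tag"], match["status"].upper(), match["text"] or "")


def wait_for_completion(lines: Iterable[str], tag: str) -> tuple[str, list[str]]:
    """Collect untagged data until the tagged completion for `tag` arrives."""
    untagged: list[str] = []
    for line in lines:
        kind, *rest = classify(line)
        if kind == "untagged":
            untagged.append(rest[0])
        elif kind == "tagged" and rest[0] == tag:
            return rest[1], untagged
    raise ValueError(f"connection ended before {tag} completed")
```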

JMAP

IMAP as it stands has several issues: the protocol is stateful, which is unsuitable for mobile use, where network connectivity may be intermittent. It can’t be used for sending messages, which means clients need to implement two protocols (IMAP and SMTP). It is also quite limited in the actions a client can batch together: for example, setting two different flags on two different messages can’t be done in a single command.

To alleviate these issues, among others, work started in 2014 on a replacement protocol based on HTTPS and JSON. The new protocol, called “JSON Meta Application Protocol” (hereafter referred to as “JMAP”), has since been standardised (RFC 8620 for the core protocol, RFC 8621 for email) and is implemented in several mail servers and clients.

JMAP is a fundamentally different protocol to IMAP, representing emails as structured JSON objects rather than just plaintext. It also elects not to embed attachments in Base64 for efficiency reasons, instead having the client download them with a separate HTTPS request.

Here’s an example from the documentation. First, the client fetches all mailboxes:

[[ "Mailbox/get", {
  "accountId": "u33084183",
  "ids": null
}, "0" ]]

And the server might respond with:

[[ "Mailbox/get", {
  "accountId": "u33084183",
  "state": "78540",
  "list": [{
    "id": "MB23cfa8094c0f41e6",
    "name": "Inbox",
    "parentId": null,
    "role": "inbox",
    "sortOrder": 10,
    "totalEmails": 16307,
    "unreadEmails": 13905,
    "totalThreads": 5833,
    "unreadThreads": 5128,
    "myRights": {
      "mayAddItems": true,
      "mayRename": false,
      "maySubmit": true,
      "mayDelete": false,
      "maySetKeywords": true,
      "mayRemoveItems": true,
      "mayCreateChild": true,
      "maySetSeen": true,
      "mayReadItems": true
    },
    "isSubscribed": true
  }, {
    "id": "MB674cc24095db49ce",
    "name": "Important mail",
    ...
  }, ... ],
  "notFound": []
}, "0" ]]
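In a real request, the method call above is wrapped in a request object that also lists the capabilities the client is using, and is POSTed as JSON to the server’s API URL (discovered from the JMAP session resource). A Python sketch of building that request and reading the response, assuming the capability URNs from RFC 8620 and RFC 8621; the helper names are made up, and the HTTP call itself is left out.

```python
import json

# Capability URNs for the core protocol (RFC 8620) and mail (RFC 8621).
USING = ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"]


def mailbox_get_request(account_id: str) -> str:
    """Wrap a Mailbox/get call in a full JMAP request object (JSON text).

    "ids": None asks for every mailbox; the "0" call id lets the client
    match this call with its entry in the response.
    """
    request = {
        "using": USING,
        "methodCalls": [["Mailbox/get", {"accountId": account_id, "ids": None}, "0"]],
    }
    return json.dumps(request)


def unread_counts(response_json: str) -> dict[str, int]:
    """Map mailbox name -> unreadEmails from a JMAP response's Mailbox/get result."""
    response = json.loads(response_json)
    counts: dict[str, int] = {}
    for name, args, _call_id in response["methodResponses"]:
        if name == "Mailbox/get":
            for mailbox in args["list"]:
                counts[mailbox["name"]] = mailbox["unreadEmails"]
    return counts
```

For the response above, this would map “Inbox” to 13905 (plus entries for the mailboxes elided from the example).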