Parsing MIME e-mail messages in Rust

Today we released mail-parser, an e-mail parsing library written in Rust that fully conforms to the Internet Message Format standard (RFC 5322), the Multipurpose Internet Mail Extensions (MIME; RFC 2045–2049) as well as other internet messaging RFCs.

It also supports decoding messages in 41 different character sets including obsolete formats such as UTF-7. All Unicode (UTF-*) and single-byte character sets are handled internally by the library while support for legacy multi-byte encodings of Chinese and Japanese languages such as BIG5 or ISO-2022-JP is provided by the optional dependency encoding_rs.

In general, this library abides by the Postel’s law or Robustness Principle which states that an implementation must be conservative in its sending behavior and liberal in its receiving behavior. This means that mail-parser will make a best effort to parse non-conformant e-mail messages as long as these do not deviate too much from the standard.

Unlike other e-mail parsing libraries that return nested representations of the different MIME parts in a message, this library conforms to RFC 8621, Section 4.1.4 and provides a more human-friendly representation of the message contents consisting of just text body parts, html body parts and attachments. Additionally, conversion to/from HTML and plain text inline body parts is done automatically when the alternative version is missing.

Performance and memory safety were two important factors while designing mail-parser:

  • Zero-copy: Practically all strings returned by this library are Cow<str> references to the input raw message.
  • High performance Base64 decoding based on Chromium’s decoder (the fastest non-SIMD decoder).
  • Fast parsing of message header fields, character set names and HTML entities using perfect hashing.
  • Written in 100% safe Rust with no external dependencies.
  • Every function in the library has been fuzzed and meticulously tested with MIRI.
  • Thoroughly battle-tested with millions of real-world e-mail messages dating from 1995 until today.

The library conforms to all internet messaging RFCs:

And supports 41 different character set encodings:

  • UTF-8
  • UTF-16, UTF-16BE, UTF-16LE
  • UTF-7
  • US-ASCII
  • ISO-8859–1
  • ISO-8859–2
  • ISO-8859–3
  • ISO-8859–4
  • ISO-8859–5
  • ISO-8859–6
  • ISO-8859–7
  • ISO-8859–8
  • ISO-8859–9
  • ISO-8859–10
  • ISO-8859–13
  • ISO-8859–14
  • ISO-8859–15
  • ISO-8859–16
  • CP1250
  • CP1251
  • CP1252
  • CP1253
  • CP1254
  • CP1255
  • CP1256
  • CP1257
  • CP1258
  • KOI8-R
  • KOI8_U
  • MACINTOSH
  • IBM850
  • TIS-620
  • SHIFT_JIS
  • BIG5
  • EUC-JP
  • EUC-KR
  • GB18030
  • GBK
  • ISO-2022-JP
  • WINDOWS-874
  • IBM-866

Using the library is straightforward:

The mail-parser library is available on Crates.io (https://crates.io/crates/mail-parser) and the documentation at https://docs.rs/mail-parser/.

Please follow us on Twitter for updates!

--

--

We develop distributed apps in Rust.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store