Generating Gmail addresses with steganography

Alternate title: Oh the things you have time for on sabbatical

Many, many forms that accept e-mail addresses do not comply with the RFC 3696 specification for e-mail addresses:

Without quotes, local-parts may consist of any combination of alphabetic characters, digits, or any of the special characters

 ! # $ % & ' * + - / = ?  ^ _ ` . { | } ~

period (".") may also appear, but may not be used to start or end the local part, nor may two or more consecutive periods appear. Stated differently, any ASCII graphic (printing) character other than the at-sign ("@"), backslash, double quote, comma, or square brackets may appear without quoting

This commonly shows up when trying to take advantage of Gmail's task-specific e-mail addresses using the '+' sign; the example below shows bayareafastrak.org's non-compliance.

A perfectly valid e-mail address rejected by bayareafastrak.org

Using '+' unlocks virtually unlimited Gmail addresses, so each service can get its own address, which helps track down leaks. Fortunately, Gmail offers another way to create a finite set of e-mail addresses: dots don't matter in Gmail addresses.

Note the RFC verbiage on using dots/periods: the address cannot start with, end with, or contain repeated dots. So a single period can go between any two adjacent characters. If the "local" (username) part of a Gmail address has 12 characters, there are 11 such gaps, and therefore 2^11 = 2048 unique e-mail addresses that can be created by leveraging how Gmail handles periods.
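As a rough sketch of that count, the snippet below enumerates every dotted variant of a local part by treating each gap between characters as an on/off switch. The function name `dot_variants` is made up for illustration, and it assumes a non-empty local part that contains no periods to begin with.

```python
from itertools import product


def dot_variants(local):
    """Yield every way of placing at most one period in each gap of `local`."""
    gaps = len(local) - 1
    # Each gap is either left empty (False) or filled with a period (True)
    for pattern in product([False, True], repeat=gaps):
        pieces = [local[0]]
        for use_dot, char in zip(pattern, local[1:]):
            pieces.append('.' + char if use_dot else char)
        yield ''.join(pieces)


variants = list(dot_variants("bsilvereagle"))
print(len(variants))  # 2048 for a 12-character local part
```

The undotted address itself is one of the 2048 variants, so 2047 of them actually contain a period.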

The issue becomes handling these finite e-mail addresses. How do we make sure every e-mail address is unique? One option is to start sticking periods in the first available slot and checking a password manager to make sure that the addresses are unique.

Using Bitwarden to prevent e-mail address collisions

Manually inserting periods works just fine, but requires a password manager to map from a given e-mail address to a service. The '+' method makes the mapping very easy: bsilvereagle+fastrak is clearly used for Fastrak, but b.silve.reagle has no human-readable element. Fortunately, using some steganography we can embed human-understandable elements into the e-mail address.

The plan is to limit the information to lowercase letters and attempt to embed meaningful 2-3 character representations into the e-mail address. For example, "Hacker News" may be "hn", "Reddit" may be "r", etc.

It takes 5 bits to represent the 26 lowercase letters (log2(26) ≈ 4.7, rounded up), which can be done by mapping each letter to its zero-based position in the alphabet, i.e. 'a' is 0, 'b' is 1, and so on up to 'z' at 25.

Next, we can convert the digits to a binary representation and then stuff them into the e-mail address. "hn" becomes "00111 01101" which can be represented in period form with "--...-..-.".
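The letter-to-bits step can be sketched in a few lines. This is only an illustration: `to_period_form` is a hypothetical helper name, and it assumes the identifier contains only lowercase ASCII letters.

```python
def to_period_form(identifier, width=5):
    """Map lowercase letters to 0-25, then to fixed-width bits drawn as '-' (0) and '.' (1)."""
    bits = ''.join(format(ord(ch) - ord('a'), '0{}b'.format(width)) for ch in identifier)
    return bits, bits.replace('0', '-').replace('1', '.')


bits, pattern = to_period_form("hn")
print(bits)     # 0011101101
print(pattern)  # --...-..-.
```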

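After that, the period/binary string can be slotted into the e-mail address: each "." puts a period into the corresponding gap between characters, and each "-" leaves the gap empty. With "hn", "bsilvereagle" becomes "bsi.l.v.er.e.ag.le".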
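A minimal sketch of the slotting step, assuming the pattern uses "." for a filled gap and "-" for an empty one, and that the local part has no periods yet. `place_periods` is an illustrative name, not part of the script at the end of the post.

```python
def place_periods(local, pattern):
    """Insert a period after local[i] wherever pattern[i] is '.'; '-' leaves the gap empty."""
    if len(pattern) > len(local) - 1:
        raise ValueError("pattern needs more gaps than the local part has")
    out = []
    for i, char in enumerate(local):
        out.append(char)
        # Gaps beyond the end of the pattern are left empty
        if i < len(pattern) and pattern[i] == '.':
            out.append('.')
    return ''.join(out)


print(place_periods("bsilvereagle", "--...-..-."))  # bsi.l.v.er.e.ag.le
```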

Note that the amount of encodable information is limited by the length of the original e-mail address. "bsilvereagle" has 12 characters, so it can only fit 11 bits of period-encoded information and is limited to two characters, or 26^2 (676) unique combinations. Many of those combinations will not be meaningful ("zj", "xq", "xv", etc.), so the useful range is significantly smaller. An e-mail address that has 15 slots (at least 16 characters) opens up 26^3 (17576) combinations.
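The arithmetic above can be checked with a small helper. `capacity` is a hypothetical name, and the sketch assumes 5 bits per letter and a local part without periods.

```python
def capacity(local, width=5, alphabet=26):
    """Return (slots, characters, combinations) for a dot-free local part."""
    slots = len(local) - 1          # one gap between each pair of adjacent characters
    characters = slots // width     # whole letters that fit at `width` bits each
    return slots, characters, alphabet ** characters


print(capacity("bsilvereagle"))  # (11, 2, 676)
print(capacity("a" * 16))        # (15, 3, 17576)
```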

Fortunately, the script at the end of this post can automate the encoding and decoding of the period e-mails.

This steganographic method does not really improve the usability of a password manager handling the mapping of e-mails to services and is largely impractical.




from itertools import chain, zip_longest


def alternate(list1, list2):
    # Merges two lists in an alternate fashion
    # ['A','B','C'], ['a','b'] -> ['A','a','B','b','C']
    return [x for x in chain.from_iterable(zip_longest(list1, list2)) if x]


def int2binlist(integer):
    # Converts an integer into a list of 5 bits, most significant bit first,
    # e.g. 5 -> [0, 0, 1, 0, 1]. Out-of-range values are clamped to 31.
    if (integer >= 31) or (integer < 0):
        return [1, 1, 1, 1, 1]

    return [int(x) for x in '{0:05b}'.format(integer)]


def encode(email, identifier, width=5):
    payload = "abcdefghijklmnopqrstuvwxyz"
    local, domain = email.split('@')
    # Determine the number of identifier characters that can be encoded, using one character per 5 available "slots"
    identifier_count = (len(local) - 1) // width
    # Look up identifier in payload, get binary representation, then convert to "." binary encoding
    encoding = ['.' if x else '' for x in chain(*map(int2binlist, map(payload.index, identifier[:identifier_count])))]
    return ''.join(alternate(local, encoding)) + '@' + domain


def decode(email, width=5):
    payload = "abcdefghijklmnopqrstuvwxyz"
    local, domain = email.split('@')
    encoded = []
    # Skip only the first character: each slot is read from what follows a letter.
    # The last character must stay so that an empty final slot is recorded as 0.
    local_iter = iter(local[1:])
    for c in local_iter:
        if c == '.':
            encoded.append(1)
            next(local_iter, None)  # Skip the letter after the period and move to the next slot
        else:
            encoded.append(0)
    identifier_count = len(encoded) // width
    result = ''
    for i in range(identifier_count):
        bits = encoded[i*width:(i+1)*width]
        # Unused trailing slots decode as 'a' (00000); values above 25 raise IndexError
        result += payload[int(''.join(str(bit) for bit in bits), 2)]

    return result


print(encode("aaaaaaaaaaa@gmail.com", "secretkey"))
# a.aaa.aaaa.aaa@gmail.com
print(decode("a.aaa.aaaa.aaa@gmail.com"))
# se