LOAF is a simple extension to email that lets you append your entire address book to outgoing mail message without compromising your privacy. Correspondents can use this information to prioritize their mail, and learn more about their social networks. The LOAF home page is at http://loaf.cantbedone.org.
LOAF creates and maintains a database of all your correspondents, defined as people to whom you have sent email at least once. Every time you send an email message, LOAF appends this information to the email message, using a format described further below. LOAF-enabled correspondents collect and store this information in their own local databases.
When you receive an email from an address you have not previously written to, LOAF checks to see if the email address is known to any of your existing correspondents. This essentially sorts incoming email into three categories:
Mail in category (2) can be further classified by counting how many correspondents you and the sender have in common. If the originating email appears in the address books of several of your correspondents, this may indicate a person with whom you have many connections. Insert standard social network theory here.
Let's say Alice sends email to Bob about an exciting business opportunity in Nigerian bridge futures. Bob has never received email from Alice before, and is very skeptical about messages from unknown sources.
Bob takes Alice's email address and checks it against a database he maintains on his machine. This database contains an entry for each of Bob's correspondents, with a specially hashed list of their own corresponents. Bob checks Alice's email address against each of his correspondents' lists. If any of them have written Alice before, the search will result in a hit.
Let's say Bob finds Alice in a correspondent's list, and decides to read her email message. He is intrigued by the business opportunity and writes back to Alice asking for more information.
Because Bob has now written to Alice, all future emails Bob writes will list Alice as one of his correspondents. If Bob now sends a message to his old friend Carol, Carol will update her database entry for Bob with the LOAF information she finds in his email, wihch now includes the reference to Alice. When Alice writes to Carol a week later, Carol will see that Alice has previously interacted with Bob (and perhaps some other people in Carol's list of correspondents).
LOAF uses a special data structure called a Bloom filter, described below. To the unaided eye, the filter looks like a string of characters. It contains a specially hashed representation of the sender's correspondents list, a few thousand bytes in length.
It is computationally trivial to check whether a given address appears in the filter, but reconstructing the list of email address in the filter requires a brute-force attack. It is possible to infer the rough size of the address list from the proportion of 'on' bits in the filter, but this will give an inexact estimate.
Outgoing email is munged by an outgoing mail filter to include the LOAF filter, in the form of a MIME attachment. Every address on an outgoing message is automatically added to the database of correspondents, unless the user expressly flags the message.
Incoming email is analyzed for the attachment, and messages that fall into category (2) above are marked with the special flashing LOAF banner.
Every time an email arrives from a known correspondent, the corresponding entry in the local database is updated with the latest LOAF filter.
A Bloom filter is compact way of testing to see whether you have seen something, requiring less memory than a full list of the items you have seen. The tradeoff is a risk of false positives (Bloom filters never give a false negative) that is a function of the length of the filter and the number of items stored in it. The smaller the filter, and the more items it contains, the greater the risk that it will give a false positive.
The filter consists of two parts, a series of bits and a set of hashing functions. To add a string to a Bloom filter of length m, you run it through each one of the hashing functions, each with range (m), and turn on whatever bit of the filter corresponds to the output of the hashing function.
To see if a string has been seen by the filter, you run it through the same set of hashing functions and XOR the on bits in output with the corresponding bits in the filter. If the XOR gives a non-zero result, you are guaranteed that the string has not been seen before. If the result is zero, you can be certain with a known probability that the string is contained in the filter.
The false positive rate for Bloom filters is determined by the number of hashing functions, the size of the filter, and the number of entries in the filter, given by the approximate formula:
( 1 - e^(kn/m) )^k
Where k is the number of hashing functions, n is the number of keys in the filter, and m is the length of the filter in bits.
For the detailed math, see http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html#SECTION00053000000000000000
- Don't use LOAF.
- Create a blacklist of addresses for LOAF to pass over when generating a filter.
- Set a false positive rate high enough to give you plausible deniability: ``Oh, honey, don't be ridiculous. I certainly never wrote to X, that must be a false positive'' will work, but you must be sure to read the caveat about keeping a constant filter size in Dictionary attack below.
Sending an overloaded filter does not help you get accepted by new correspondents, but once you are added to their list, it will make you appear to know everyone. One possible solution to this spoofing problem is to impose a maximum density.
This implies that the truly paranoid should use a presized filter large enough to contain as many correspondents as they ever expect to have on record, and an invariant set of salts. Under those conditions, collecting multiple filters will not change the false positive rate. A mostly empty large filter might have an unacceptably low false positive rate, so you would want to pad the list of real emails out with random data, to maintain a constant ratio of on/off bits as well.
The tradeoff with a high false positive rate is that the filter will be less useful to legitimate recipients. An intriguing possibility is that of sending out very inaccurate filters that are updated on a regular basis (for example weekly) so that a user has to accumulate a certain number of the filters in order to run queries with a good degree of certitude. This spreads private information over several filters and ensures that an eavesdropper who intercepts only one file will find it of very limited value.
Of course, the truly paranoid would be crazy to use LOAF.
For example, let's say Alice has a copy of Bob's filter. Charlie also has Bob's filter, and he takes it and replaces Bob's name with his own, hashing in some of his own email addresses so the filters are not completely identical.
Let's say in the normal course of things Alice adds Charlie as a correspondent. Now if Alice does some social network analysis of her stored bloom filters, she will see that Charlie and Bob are very closely related, sharing almost all correspondents.
It's unclear what the goal of such an attack would be, but there are conceivable scenarios where it might be beneficial to claim close similarity to another user. The defense against this attack is to include the email of the filter owner as an input in every hashing operation, so that changing the email address of the filter owner will effectively scramble the filter contents.
Standard salts would allow people to pool their LOAF information through bitwise OR (match if there are any hits) or bitwise AND (match only on a hit against everyone).
What are the security implications of standard salts?
(c) 2004 Joshua Schachter, Maciej Ceglowski <maciej @ceglowski.com>