I’m uncomfortable with several years’ worth of my email sitting only on Google’s servers, especially as they have lost data before. If anything ever goes wrong, I want to be able to restore all my data or move providers.
I decided to implement a fairly simple backup routine that downloads any newly received mail each day. The downloaded mail then goes into the normal rsync backups I’ve previously blogged about.
First up, I’m using getmail to do all the work. You can install it on Ubuntu as follows:
sudo apt-get install getmail4
Next, decide where you’re going to download the mail to and create the top-level folder:
mkdir /email@example.com
cd /firstname.lastname@example.org
Getmail needs three subfolders in there and won’t create them for you, so let’s create them next:
mkdir cur new tmp
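If you prefer, the folder creation steps above can be collapsed into a single command. This is a minimal shell sketch; make_maildir is a hypothetical helper name, and the path you pass it stands in for your own backup folder:

```shell
# Hypothetical helper: create a backup folder plus the cur, new and tmp
# subfolders that getmail's Maildir destination expects.
make_maildir() {
  # -p creates any missing parent folders and does not fail if the
  # folders already exist, so it is safe to re-run
  mkdir -p "$1/cur" "$1/new" "$1/tmp"
}
```

For example, make_maildir "$HOME/gmail-backup" (an illustrative path) creates the top-level folder and all three subfolders in one go.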
We’re going to connect using IMAP over SSL; it gives a little more flexibility if you’re only interested in mail with certain labels. IMAP has to be enabled in your Gmail account first: head over to your Gmail Settings, open the “Forwarding and POP/IMAP” tab, select “Enable IMAP” and leave the other settings at their defaults.
Now we need to configure getmail. Create a config file called getmail.config in the top-level folder and enter the following configuration:
[retriever]
username = email@example.com
password = mysecret
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
mailboxes = ("[Google Mail]/All Mail",)

[destination]
type = Maildir
path = /firstname.lastname@example.org/

[options]
# print messages about each action (verbose = 2)
# Other options:
# 0 prints only warnings and errors
# 1 prints messages about retrieving and deleting messages only
verbose = 2
message_log = /email@example.com/message_log.log
delivered_to = false
received = false
read_all = false
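Because getmail.config holds your Gmail password in plain text, it’s worth making the file readable only by you. The sketch below assumes a POSIX shell and writes the same configuration as above (minus the comments) with owner-only permissions. write_getmail_config is a hypothetical helper, and the folder, username and password arguments are placeholders; note that passing the password on the command line leaves it in your shell history.

```shell
# Hypothetical helper: write getmail.config into the backup folder and
# restrict it to the current user, since it stores the password in plain text.
# Arguments (placeholders): backup folder, Gmail username, Gmail password.
write_getmail_config() {
  local dir="$1" user="$2" pass="$3"
  local cfg="$dir/getmail.config"
  # umask 077 inside a subshell: a newly created file gets owner-only permissions
  (
    umask 077
    cat > "$cfg" <<EOF
[retriever]
username = $user
password = $pass
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
mailboxes = ("[Google Mail]/All Mail",)

[destination]
type = Maildir
path = $dir/

[options]
verbose = 2
message_log = $dir/message_log.log
delivered_to = false
received = false
read_all = false
EOF
  )
  # A file that already existed keeps its old mode, so tighten it explicitly too
  chmod 600 "$cfg"
}
```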
The username and password obviously need updating, along with path and message_log. The mailboxes setting tells getmail what to download; depending on your account, “All Mail” will be either [Google Mail]/All Mail or [Gmail]/All Mail. I haven’t confirmed the latter myself, but mine is the former.
If you only want certain labels, you can instead give a comma-separated list of label names here, with no prefix needed: mail labelled “Family” is simply “Family”.
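The mailboxes value is written as a tuple of quoted names, which is why the single-mailbox example in the config ends with a trailing comma. As a sketch, this hypothetical shell function builds the line from a list of label names (Family and Work in the test are just examples); it assumes label names contain no double quotes.

```shell
# Hypothetical helper: build a getmail "mailboxes" line from label names.
# A one-item tuple needs a trailing comma, e.g. ("Family",).
mailboxes_line() {
  local out="" label
  for label in "$@"; do
    if [ -z "$out" ]; then
      out="\"$label\""
    else
      out="$out, \"$label\""
    fi
  done
  if [ "$#" -eq 1 ]; then
    out="$out,"
  fi
  printf 'mailboxes = (%s)\n' "$out"
}
```

For example, mailboxes_line Family Work prints mailboxes = ("Family", "Work"), which you can paste over the existing mailboxes line.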
The last three settings do the following:
- delivered_to = if set, getmail adds a Delivered-To: header field to the message. If unset, it will not do so. Default: True. Note that this field will contain the envelope recipient of the message if the retriever in use is a multidrop retriever; otherwise it will contain the string “unknown”.
- received = if set, getmail adds a Received: header field to the message. If unset, it will not do so. Default: True.
- read_all = if set, getmail retrieves all available messages. If unset, getmail only retrieves messages it has not seen before. Default: True.
With read_all set to false, this performs an incremental download of newly arrived mail. To test it, run the following. Leave off the -q flag for this test run, since it makes getmail print only warnings and errors; without it you’ll see a message for every action because of the verbose setting in getmail.config. Once it’s all working, you can drop verbose down to 0.
getmail -g "/firstname.lastname@example.org" -r getmail.config
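To confirm the test run actually delivered something, you can count the message files. In a Maildir each message is a single file, kept in new/ until a mail client has seen it and in cur/ afterwards, so a minimal sketch (count_messages is a hypothetical helper) is:

```shell
# Hypothetical helper: count messages delivered to a Maildir.
# tmp/ only holds deliveries in progress, so it is not counted.
count_messages() {
  find "$1/new" "$1/cur" -type f | wc -l
}
```

Running count_messages against your backup folder before and after a getmail run should show the number going up as new mail arrives.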
Next, schedule the backup as a cron job to keep on top of it; I chose daily at 4am. Open your crontab:
crontab -e
then add the following lines:
# Backup Gmail every day at 4am
0 4 * * * getmail -g "/email@example.com" -q -r getmail.config
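If you’d rather not edit the crontab by hand, you can pipe it through a small filter that appends the line only if it isn’t already there, so re-running your setup won’t schedule the backup twice. This is a sketch: add_cron_line is a hypothetical helper, and it relies on crontab -l printing the current table and crontab - installing a new one from standard input.

```shell
# Hypothetical helper: read an existing crontab on stdin and write it to
# stdout with the given line appended, unless that exact line is present.
add_cron_line() {
  local line="$1" existing
  existing="$(cat)"
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # -F: literal match (no regex), -x: whole line, -q: no output
  if ! printf '%s\n' "$existing" | grep -Fxq -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Usage, installing the line from this post:
#   crontab -l 2>/dev/null | add_cron_line '0 4 * * * getmail -g "/email@example.com" -q -r getmail.config' | crontab -
```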
That’s it, done. You’ll find the downloaded mail in the “new” folder, and you can open it with any email client that supports Maildir.
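As a quick sanity check without a mail client, you can print the Subject header of each backed-up message. This is a rough sketch (list_subjects is a hypothetical helper): it stops at the blank line that ends each message’s headers, shows only the first line of a subject that is folded over several lines, and prints MIME-encoded subjects undecoded.

```shell
# Hypothetical helper: print the first Subject: header of every message
# in a Maildir's new/ and cur/ folders.
list_subjects() {
  local msg
  for msg in "$1"/new/* "$1"/cur/*; do
    # An empty folder leaves the glob unexpanded, so skip non-files
    [ -f "$msg" ] || continue
    # -n: print nothing by default; quit at the blank line ending the
    # headers, or print and quit at the first Subject: header
    sed -n '/^$/q; /^Subject:/{p;q;}' "$msg"
  done
}
```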