How to back up Googlemail in Linux

I’m uncomfortable with several years’ worth of my email sitting only on Google’s servers, especially when they have lost data before. If anything ever goes wrong, I want to be able to restore all my data or move providers.

I decided to implement a pretty simple backup routine that would download any newly received mail each day, then feed it into the normal rsync backups I’ve previously blogged about.

First up, I’m using getmail to do all the work. You can install it on Ubuntu as follows:

sudo apt-get install getmail4

Next, decide where you’re going to be downloading the mail to and create the top-level folder:

mkdir /path/to/backup/user@domain.com
cd /path/to/backup/user@domain.com

Getmail needs 3 folders created inside it (it won’t create them for you), so let’s create them next:

mkdir cur new tmp
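
If you would rather do both steps in one go, something like the following should work. This is only a sketch: make_maildir is a helper name I made up for illustration, not part of getmail.

```shell
# Create the three Maildir sub-folders (cur, new, tmp) under the given
# directory, creating the directory itself if it does not exist yet.
# make_maildir is a hypothetical helper name, not a getmail command.
make_maildir() {
    mkdir -p "$1/cur" "$1/new" "$1/tmp"
}

# Example:
#   make_maildir /path/to/backup/user@domain.com
```

Because of mkdir -p, running it a second time on an existing folder is harmless.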

We’re going to be using IMAP to connect over SSL; it gives a little more flexibility if you’re only interested in mail with certain labels. To do this you need to enable it in your Gmail account. Head on over to your Gmail Settings and go into the “Forwarding and POP/IMAP” tab. Select “Enable IMAP” and leave the other settings as default.

Now we need to configure getmail, so create a config file:

nano getmail.config

Now enter the following configuration:

[retriever]
username = user@gmail.com
password = mysecret
 
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
mailboxes = ("[Google Mail]/All Mail",)
 
[destination]
type = Maildir
path = /path/to/backup/user@domain.com/
 
[options]
# print messages about each action (verbose = 2)
# Other options:
# 0 prints only warnings and errors
# 1 prints messages about retrieving and deleting messages only
verbose = 2
message_log = /path/to/backup/user@domain.com/message_log.log 
delivered_to = false
received = false
read_all = false

The username and password obviously need updating, along with the path and message_log. The mailboxes setting tells getmail what to download; depending on your account, “All Mail” will be either [Google Mail]/All Mail or [Gmail]/All Mail. I’m not totally sure about the latter, but mine is the first.
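
Since the config file holds your password in plain text, it is probably worth making it readable by your own user only. A minimal sketch, where lock_down is a hypothetical helper name and the path is the placeholder used throughout this post:

```shell
# Restrict a file to its owner (read/write for you, nothing for anyone else).
# lock_down is a made-up helper name for illustration.
lock_down() {
    chmod 600 "$1"
}

# Example:
#   lock_down /path/to/backup/user@domain.com/getmail.config
```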

You can, if you want, list several labels here, separated by commas; no prefix is needed. Mail labelled “Family” would be just that: “Family”.
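
As an illustration, a retriever section that pulls two labels instead of everything might look like the fragment below. “Family” and “Receipts” are example label names only; substitute your own. Note that both labels would land in the same Maildir.

```
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = user@gmail.com
password = mysecret
# "Family" and "Receipts" are example labels, not real defaults
mailboxes = ("Family", "Receipts")
```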

The last three settings work as follows:

  • delivered_to = if set, getmail adds a Delivered-To: header field to the message. If unset, it will not do so. Default: True. Note that this field will contain the envelope recipient of the message if the retriever in use is a multidrop retriever; otherwise it will contain the string “unknown”.
  • received = if set, getmail adds a Received: header field to the message. If unset, it will not do so. Default: True.
  • read_all = if set, getmail retrieves all available messages. If unset, getmail only retrieves messages it has not seen before. Default: True.

This will perform an incremental download of newly arrived mail. To test it, run the following. The -q (quiet) flag is left off for the test run, since it limits output to warnings and errors and would hide the full debugging from our verbose setting in getmail.config. Once it’s all working, you can drop the verbose setting down to 0.

getmail -g "/path/to/backup/user@domain.com" -r getmail.config

Next, just throw it into a cron job to keep on top of it; I chose daily at 4am.

crontab -e

Then add the following line:

# Backup Gmail every day at 4am
0 4 * * * getmail -g "/path/to/backup/user@domain.com" -q -r getmail.config
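
If you want a record of whether each nightly run succeeded, one option is to have cron call a small wrapper instead of getmail directly. The following is a rough sketch under my own assumptions: run_backup and the log file argument are hypothetical names, and the getmail call simply mirrors the cron line above.

```shell
# Run getmail against the given getmail directory and append a
# timestamped "ok" or "failed" line to a log file.
# run_backup and its arguments are hypothetical, for illustration only.
run_backup() {
    getmail_dir="$1"
    log_file="$2"

    # Same invocation as the cron line; -q keeps the output quiet.
    if getmail -g "$getmail_dir" -q -r getmail.config; then
        status="ok"
    else
        status="failed"
    fi

    printf '%s getmail backup %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" >> "$log_file"

    # Return success only if getmail itself succeeded.
    [ "$status" = "ok" ]
}

# Example:
#   run_backup /path/to/backup/user@domain.com /path/to/backup/backup_status.log
```

Saved as a script with the example call at the bottom, it could replace the getmail command in the cron line; the script and log locations are up to you.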

That’s it, done. You can see all the mail downloaded in the “new” folder and can access it through multiple email clients that support Maildir.
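
For a quick sanity check that mail is actually arriving, you can count the files in the Maildir, since each message is stored as one file. A small sketch, with count_messages being a made-up helper name:

```shell
# Count the message files in a Maildir's new/ and cur/ folders.
# count_messages is a hypothetical helper, not part of getmail.
count_messages() {
    find "$1/new" "$1/cur" -type f | wc -l
}

# Example:
#   count_messages /path/to/backup/user@domain.com
```

Running it before and after a getmail run should show the number going up by however many new messages were fetched.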

Categorized: Geeky