Monday, 15 September 2014

Scraping multiple emails from gmail

I have a Gmail address that has received a lot of data from multiple people, all in messages with similar subjects (for example, they all start with the string "123").

Each e-mail contains a table that looks like this:

#, user, arbitrary number

    user1 5
    user2 3
    user3 4 etc.

How would I write a program that finds all of these messages and then extracts the table data from each one?

Would a mail client make it easier? What sort of technology or programming language should I use?

I'm not really sure what to look into or how to start this.

2 Answers

You can use Perl modules for this task. See the question "How can I read messages in a Gmail account from Perl?" to learn how to read messages from a Gmail account through a POP client. Once you have read them, the messages can easily be processed with Perl regular expressions. For example:

if ($msg_subject =~ /^123/) {
    # The subject starts with "123": add your logic for such mails here
}

This is a really broad question that breaks down into three smaller ones: "How to get emails from Gmail?", "How to filter the emails?" and "How to parse structured data from an email?"

How to get emails from Gmail?

You can fetch emails from Gmail using the IMAP protocol. In Python, this can be done with the imaplib module from the standard library.

Another Stack Overflow user posted a snippet that does exactly that (it uses imaplib to fetch mail from a Gmail account): How can I download emails from Gmail?
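As a rough sketch of that approach (not the linked snippet itself), the function below logs in over IMAP, asks the server for messages whose subject contains "123", and parses each one with the standard email package. It assumes IMAP access works for the account and that you authenticate with an app password; the function name, its parameters and the credentials are hypothetical.

```python
import email
import imaplib
from email import policy


def fetch_messages(user, password, subject_text="123", mailbox="INBOX"):
    """Fetch and parse every message whose Subject contains subject_text.

    Hypothetical helper: assumes IMAP access is available for the account
    and that `password` is an app password (placeholder values only).
    """
    conn = imaplib.IMAP4_SSL("imap.gmail.com")
    try:
        conn.login(user, password)
        # readonly=True opens the mailbox with EXAMINE, so fetching
        # the messages does not mark them as read.
        conn.select(mailbox, readonly=True)
        # IMAP SUBJECT search is a case-insensitive substring match, so an
        # exact "starts with 123" check still has to happen client-side.
        status, data = conn.search(None, f'(SUBJECT "{subject_text}")')
        if status != "OK":
            return []
        messages = []
        for num in data[0].split():
            status, msg_data = conn.fetch(num, "(RFC822)")
            if status != "OK":
                continue
            # msg_data[0] is a (response info, raw message bytes) tuple
            raw_bytes = msg_data[0][1]
            messages.append(email.message_from_bytes(raw_bytes, policy=policy.default))
        return messages
    finally:
        conn.logout()
```

Usage would look like `fetch_messages("you@gmail.com", "your-app-password")`, where both arguments are placeholders. Letting the server do a coarse SUBJECT search first keeps you from downloading the whole mailbox.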

How to filter the emails?

You can easily filter your mails (for instance, keeping those whose subject starts with '123') with something like the following:

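# get_emails() stands for your own fetching code; it is assumed to
# return message objects that have a .subject attribute
emails = get_emails()
filtered_emails = [msg for msg in emails if msg.subject.startswith('123')]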
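The list comprehension above assumes message objects with a ready-made `.subject` attribute. With messages parsed by Python's standard email package, the subject is read from the `Subject` header instead, and the `policy.default` policy decodes RFC 2047 encoded subjects (for example `=?utf-8?q?...?=`) before the prefix check. The helper names and the sample messages below are hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def subject_starts_with(msg: EmailMessage, prefix: str = "123") -> bool:
    """True if the message's decoded Subject header starts with prefix."""
    subject = msg["subject"]  # None when the message has no Subject header
    return subject is not None and str(subject).strip().startswith(prefix)


def filter_by_subject(messages, prefix="123"):
    """Keep only the messages whose subject starts with prefix."""
    return [m for m in messages if subject_starts_with(m, prefix)]


# Hypothetical sample messages; the third one has an encoded subject.
raw_messages = [
    b"Subject: 123 weekly numbers\r\n\r\nuser1 5\r\n",
    b"Subject: Re: lunch\r\n\r\nsee you\r\n",
    b"Subject: =?utf-8?q?123_encoded?=\r\n\r\nuser2 3\r\n",
]
messages = [message_from_bytes(raw, policy=policy.default) for raw in raw_messages]
print([str(m["subject"]) for m in filter_by_subject(messages)])
```

Using `str.startswith` rather than a substring test keeps replies such as "Re: 123 ..." out of the results. Drop the `strip()` call if leading whitespace should count.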

How to parse data from the emails?

You know that each line of the table in your mail has the format ID USERNAME SOME_NUMBER, so you only have to split each line on whitespace.

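# body is the plain-text body of one message, as a string
for line in body.splitlines():
    row = line.split()  # split on any run of whitespace
    if len(row) != 3:
        continue  # skip lines that are not table rows
    row_id, username, number = row
    # Do something with that info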
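Putting the loop above into a function might look like the sketch below. It assumes the message was parsed with `policy.default` (so `get_body()` and `get_content()` are available) and that each row has an integer ID and an integer number; the function name `parse_table` is hypothetical. Lines that do not split into exactly three fields, such as the "#, user, arbitrary number" header, are skipped.

```python
from email import message_from_bytes, policy


def parse_table(msg):
    """Return (id, username, number) tuples from a message's plain-text body.

    Assumes each table row is ID USERNAME NUMBER separated by whitespace,
    with integer ID and NUMBER; every other line is ignored.
    """
    part = msg.get_body(preferencelist=("plain",))  # text/plain part, or None
    if part is None:
        return []
    rows = []
    for line in part.get_content().splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # blank lines, the header line, signatures, ...
        row_id, username, number = fields
        try:
            rows.append((int(row_id), username, int(number)))
        except ValueError:
            continue  # three fields, but not an ID/number row


    return rows


# Hypothetical sample message shaped like the table in the question.
raw = (b"Subject: 123 numbers\r\n"
       b"Content-Type: text/plain; charset=utf-8\r\n\r\n"
       b"#, user, arbitrary number\r\n"
       b"1 user1 5\r\n2 user2 3\r\n3 user3 4\r\n")
print(parse_table(message_from_bytes(raw, policy=policy.default)))
```

Converting the fields to `int` right away means malformed rows are skipped here rather than causing errors later, when the numbers are summed or compared across emails.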

Source: http://stackoverflow.com/questions/14434170/scraping-multiple-emails-from-gmail
