Parsing "From" addresses from email text

2024-03-10

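I'm trying to extract email addresses from plain-text transcripts of emails. I've cobbled together some code that finds the addresses themselves, but I don't know how to make it tell them apart; right now it just prints every email address in the file. I'd like it to print only the addresses that are preceded by "From:" plus some wildcard characters and that end with ">" (because the emails are formatted as "From: [name] <[email]>").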

Here is the code so far:

import re  # allows the program to use regular expressions

foundemail = []  # starts as an empty list

# One word character or hyphen, then more word characters, hyphens or dots,
# then "@", then the same shape for the domain, ending in 1-4 letters
# (roughly "[stuff]@[stuff][1-4 letters]").
mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')

# "line" is set to a single line read from the file ("text.txt")
with open("text.txt") as f:
    for line in f:
        # findall() returns every match on this line; extend() appends them all
        foundemail.extend(mailsrch.findall(line))

print(foundemail)
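A minimal sketch of the filter the question describes, not taken from the thread: instead of running findall() over every line, a single pattern anchored at "From:" uses a capture group to keep only the text between "<" and ">". It assumes each sender line has the form From: [name] <[email]>; the helper name sender_addresses and the sample lines are hypothetical.

```python
import re

# A line that begins (after optional whitespace) with "From:", followed by
# anything (the sender's name), then an address enclosed in "<" and ">".
# The capture group keeps only the address itself.
from_line = re.compile(r'\s*From:.*<([^<>\s]+@[^<>\s]+)>')


def sender_addresses(lines):
    """Return the bracketed address from every "From:" line in `lines`."""
    found = []
    for line in lines:
        match = from_line.match(line)  # match() only tries the start of the line
        if match:
            found.append(match.group(1))
    return found


# Hypothetical sample transcript lines.
sample = [
    "From: Alice Example <alice@example.com>\n",
    "To: Bob Example <bob@example.com>\n",
    "Reply to alice@example.com if you have questions.\n",
]
print(sender_addresses(sample))  # ['alice@example.com']
```

To scan the real transcript, pass the open file instead of the sample, for example sender_addresses(open("text.txt")). Lines without angle brackets (such as From: someone@example.com) are skipped by design, matching the requirement that the address end with ">".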

Try this:

>>> from email.utils import parseaddr

>>> parseaddr('From: [email protected]')
('', '[email protected]')

>>> parseaddr('From: Van Gale <[email protected]>')
('Van Gale', '[email protected]')

>>> parseaddr('    From: Van Gale <[email protected]>   ')
('Van Gale', '[email protected]')

>>> parseaddr('blah abdf    From: Van Gale <[email protected]>   and this')
('Van Gale', '[email protected]')

Unfortunately, it only finds the first email address on each line because it expects header lines, but maybe that's OK?
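A minimal sketch, not part of the original answer, that works around that limitation under one assumption: sender lines in the transcript start with "From:" (leading whitespace allowed). It keeps only those lines, drops the "From:" prefix, and passes the rest to email.utils.getaddresses(), which returns a (name, address) pair for every comma-separated address instead of only the first. The helper from_header_addresses and the sample lines are hypothetical.

```python
from email.utils import getaddresses


def from_header_addresses(lines):
    """Collect (name, address) pairs from lines that start with "From:"."""
    values = []
    for line in lines:
        stripped = line.strip()
        # Keep only header-style lines; drop the "From:" prefix so the
        # parser does not read "From" as part of the address list.
        if stripped.startswith("From:"):
            values.append(stripped[len("From:"):])
    # getaddresses() parses every address in the header values, so a line
    # with several comma-separated senders yields several pairs.
    return getaddresses(values)


# Hypothetical sample transcript lines.
sample = [
    "From: Alice Example <alice@example.com>\n",
    "To: Bob Example <bob@example.com>\n",
    "From: carol@example.com, Dave <dave@example.com>\n",
]
print(from_header_addresses(sample))
# [('Alice Example', 'alice@example.com'), ('', 'carol@example.com'), ('Dave', 'dave@example.com')]
```

For the real transcript, pass the open file object, e.g. from_header_addresses(open("text.txt")). Unlike a regex that requires "<" and ">", this also returns bare addresses.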
