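This is a follow-up to this question:
Python DictReader - Skipping rows with missing columns?
It turns out I was being silly and using the wrong ID field.
I'm using Python 3.x here, by the way.
I have a dict of employees, keyed by a string, "directory_id". Each value is a nested dict of employee attributes (phone number, surname, etc.). One of these attributes is a secondary ID called "internal_id", and another is the employee's manager, called "manager_internal_id". The "internal_id" field is optional, so not every employee has one.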
{'6443410501': {'manager_internal_id': '989634', 'givenName': 'Mary', 'phoneNumber': '+65 3434 3434', 'sn': 'Jones', 'internal_id': '434214'},
 '8117062158': {'manager_internal_id': '180682', 'givenName': 'John', 'phoneNumber': '+65 3434 3434', 'sn': 'Ashmore', 'internal_id': ''},
 '9227629067': {'manager_internal_id': '347394', 'givenName': 'Wright', 'phoneNumber': '+65 3434 3434', 'sn': 'Earl', 'internal_id': '257839'},
 '1724696976': {'manager_internal_id': '907239', 'givenName': 'Jane', 'phoneNumber': '+65 3434 3434', 'sn': 'Bronte', 'internal_id': '629067'},
}
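(I've simplified the fields a little, both to make it easier to read and for privacy/compliance reasons.)
The problem is that we index (key) each employee by their directory_id, but when we look up their manager, we need to find the manager by the manager's "internal_id".
Previously, when the dict used internal_id as the key, employees.keys() was a list of internal_ids, and I did a membership check against it. Now the last part of my if statement won't work, because the internal_ids are part of the dict values rather than the keys themselves.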
def lookup_supervisor(manager_internal_id, employees):
    if manager_internal_id is not None and manager_internal_id != "" and manager_internal_id in employees.keys():
        return (employees[manager_internal_id]['mail'], employees[manager_internal_id]['givenName'], employees[manager_internal_id]['sn'])
    else:
        return ('Supervisor Not Found', 'Supervisor Not Found', 'Supervisor Not Found')
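So the first question is: how do I fix the if statement so it checks whether manager_internal_id is present among the dict's internal_ids?
I tried substituting employees.values() for employees.keys(), but that didn't work. I'm also hoping for something a little more efficient; I'm not sure whether there's a way to get a subset of the values, specifically all the entries for employees[directory_id]['internal_id'].
Hopefully there's some Pythonic way of doing this, without a massive heap of nested for/if loops.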
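To make the check above concrete: employees.values() yields the nested dicts themselves, so an ID string is never "in" it. One possible approach, as a minimal sketch, is to collect the non-empty internal_id values into a set and test membership against that. The helper name known_internal_ids is made up for illustration, and the data is a trimmed copy of the sample above.

```python
# Trimmed copy of the sample data from the question.
employees = {
    '6443410501': {'manager_internal_id': '989634', 'givenName': 'Mary', 'sn': 'Jones', 'internal_id': '434214'},
    '8117062158': {'manager_internal_id': '180682', 'givenName': 'John', 'sn': 'Ashmore', 'internal_id': ''},
}


def known_internal_ids(employees):
    """Return the set of non-empty internal_id values across all employees.

    Hypothetical helper, not part of the original code.
    """
    return {data['internal_id'] for data in employees.values() if data.get('internal_id')}


ids = known_internal_ids(employees)
print('434214' in employees.values())  # False: values() holds dicts, not ID strings
print('434214' in ids)                 # True
print('989634' in ids)                 # False: that manager is not in this data
```

A set gives constant-time membership checks on average, but it only answers "does this ID exist?"; fetching the manager's attributes still needs a mapping from internal_id to the record.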
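My second question is: how do I then cleanly return the required employee attributes (mail, givenName, surname, etc.)? My for loop iterates over each employee and calls lookup_supervisor. I'm feeling a bit stupid/stumped here.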
def tidy_data(employees):
    for directory_id, data in employees.items():
        # We really shouldn't be passing employees back and forth like this - hmm, classes?
        data['SupervisorEmail'], data['SupervisorFirstName'], data['SupervisorSurname'] = lookup_supervisor(data['manager_internal_id'], employees)
Should I redesign my data structure? Or is there another way?
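One option that keeps the existing structure is a secondary index: build a second dict once that maps internal_id to the same employee record, and do supervisor lookups against that. The following is a minimal sketch under some assumptions. The names build_internal_index and NOT_FOUND are invented for illustration, the sample rows are made up so that one lookup succeeds, and 'mail' is read with .get() because the simplified sample data does not include it.

```python
NOT_FOUND = ('Supervisor Not Found',) * 3


def build_internal_index(employees):
    """Map internal_id -> employee record, skipping employees without one."""
    return {
        data['internal_id']: data
        for data in employees.values()
        if data.get('internal_id')
    }


def lookup_supervisor(manager_internal_id, by_internal_id):
    """Return (mail, givenName, sn) of the manager, or the NOT_FOUND triple."""
    manager = by_internal_id.get(manager_internal_id) if manager_internal_id else None
    if manager is None:
        return NOT_FOUND
    return (manager.get('mail', ''), manager['givenName'], manager['sn'])


def tidy_data(employees):
    by_internal_id = build_internal_index(employees)  # built once, not per employee
    for data in employees.values():
        (data['SupervisorEmail'],
         data['SupervisorFirstName'],
         data['SupervisorSurname']) = lookup_supervisor(data['manager_internal_id'], by_internal_id)


# Made-up sample: Mary (internal_id 434214) manages John; Mary's own manager is not in the data.
employees = {
    '6443410501': {'internal_id': '434214', 'manager_internal_id': '989634',
                   'givenName': 'Mary', 'sn': 'Jones', 'mail': 'mary@example.com'},
    '8117062158': {'internal_id': '', 'manager_internal_id': '434214',
                   'givenName': 'John', 'sn': 'Ashmore', 'mail': 'john@example.com'},
}
tidy_data(employees)
print(employees['8117062158']['SupervisorFirstName'])  # Mary
print(employees['6443410501']['SupervisorFirstName'])  # Supervisor Not Found
```

The index holds references to the same nested dicts, so it costs little memory. If two employees share an internal_id, the later one silently wins in this sketch.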
EDIT: I've tweaked the code slightly. It now looks like this:
import csv
import re
import sys


class Employees:
    def import_gd_dump(self, input_file="test.csv"):
        # Index every row of the CSV dump by its directory_id.
        with open(input_file, newline='') as f:
            gd_extract = csv.DictReader(f, dialect='excel')
            self.employees = {row['directory_id']: row for row in gd_extract}

    def write_gd_formatted(self, output_file="gd_formatted.csv"):
        gd_output_fieldnames = ('internal_id', 'mail', 'givenName', 'sn', 'dbcostcenter', 'directory_id', 'manager_internal_id', 'PHFull', 'PHFull_message', 'SupervisorEmail', 'SupervisorFirstName', 'SupervisorSurname')
        try:
            out = open(output_file, 'w', newline='')
        except IOError:
            print('Unable to open file, IO error (Is it locked?)')
            sys.exit(1)
        with out:
            gd_formatted = csv.DictWriter(out, fieldnames=gd_output_fieldnames, extrasaction='ignore', dialect='excel')
            gd_formatted.writeheader()
            for data in self.employees.values():
                gd_formatted.writerow(data)

    def tidy_data(self):
        for directory_id, data in self.employees.items():
            data['PHFull'], data['PHFull_message'] = self.clean_phone_number(data['telephoneNumber'])
            data['SupervisorEmail'], data['SupervisorFirstName'], data['SupervisorSurname'] = self.lookup_supervisor(data['manager_internal_id'])

    def clean_phone_number(self, original_telephone_number):
        standard_format = re.compile(r'^\+(?P<intl_prefix>\d{2})\((?P<area_code>\d)\)(?P<local_first_half>\d{4})-(?P<local_second_half>\d{4})')
        extra_zero = re.compile(r'^\+(?P<intl_prefix>\d{2})\(0(?P<area_code>\d)\)(?P<local_first_half>\d{4})-(?P<local_second_half>\d{4})')
        missing_hyphen = re.compile(r'^\+(?P<intl_prefix>\d{2})\(0(?P<area_code>\d)\)(?P<local_first_half>\d{4})(?P<local_second_half>\d{4})')
        if standard_format.search(original_telephone_number):
            result = standard_format.search(original_telephone_number)
            return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), ''
        elif extra_zero.search(original_telephone_number):
            result = extra_zero.search(original_telephone_number)
            return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), 'Extra zero in area code - ask user to remediate. '
        elif missing_hyphen.search(original_telephone_number):
            result = missing_hyphen.search(original_telephone_number)
            return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), 'Missing hyphen in local component - ask user to remediate. '
        else:
            return '', "Number didn't match format. Original text is: " + original_telephone_number

    def lookup_supervisor(self, manager_internal_id):
        # NOTE: self.employees is keyed by directory_id, so indexing it with an
        # internal_id below raises KeyError. This is the part I'm stuck on.
        if manager_internal_id is not None and manager_internal_id != "":  # and manager_internal_id in self.employees.values():
            return (self.employees[manager_internal_id]['mail'], self.employees[manager_internal_id]['givenName'], self.employees[manager_internal_id]['sn'])
        else:
            return ('Supervisor Not Found', 'Supervisor Not Found', 'Supervisor Not Found')


if __name__ == '__main__':
    our_employees = Employees()
    our_employees.import_gd_dump('test.csv')
    our_employees.tidy_data()
    our_employees.write_gd_formatted()
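I guess (1) I'm looking for a better way to structure/store Employee/Employees, and (2) I'm specifically having trouble with lookup_supervisor().
Should I create an Employee class and nest these inside Employees?
And should I even be doing what I'm doing with tidy_data(), calling clean_phone_number() and lookup_supervisor() in a for loop over the dict's items? Urgh. Confused.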
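For reference, the "Employee nested in Employees" idea could look roughly like the sketch below. This is only an illustration under assumptions, not a definitive design: the field names on Employee, the add() and supervisor_of() methods, and the two-index layout are all invented here, and the sample values are made up so that one lookup resolves.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Employee:
    directory_id: str
    internal_id: str          # optional in the source data; '' when missing
    manager_internal_id: str
    given_name: str
    surname: str
    mail: str = ''


class Employees:
    """Container that indexes the same Employee objects two ways."""

    def __init__(self) -> None:
        self.by_directory_id: Dict[str, Employee] = {}
        self.by_internal_id: Dict[str, Employee] = {}

    def add(self, employee: Employee) -> None:
        self.by_directory_id[employee.directory_id] = employee
        if employee.internal_id:  # only index employees that have one
            self.by_internal_id[employee.internal_id] = employee

    def supervisor_of(self, employee: Employee) -> Optional[Employee]:
        # Returns None when the manager is unknown or manager_internal_id is ''.
        return self.by_internal_id.get(employee.manager_internal_id)


staff = Employees()
staff.add(Employee('6443410501', '434214', '989634', 'Mary', 'Jones'))
staff.add(Employee('8117062158', '', '434214', 'John', 'Ashmore'))

john = staff.by_directory_id['8117062158']
boss = staff.supervisor_of(john)
print(boss.given_name if boss else 'Supervisor Not Found')  # Mary
```

Returning None for a missing supervisor leaves the choice of placeholder text ('Supervisor Not Found') to whatever writes the CSV, instead of baking it into the lookup.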