Imagine a tab-delimited file like this:
9606 1 GO:0002576 TAS - platelet degranulation - Process
9606 1 GO:0003674 ND - molecular_function_z - Function
9606 1 GO:0003674 OOO - molecular_function_z - Function
9606 1 GO:0005576 IDA - extracellular region - Component
9606 1 GO:0005576 TAS - extracellular region - Component
9606 1 GO:0005576 OOO - extracellular region - Component
9606 1 GO:0005615 HDA - extracellular spaces - Component
9606 1 GO:0008150 ND - biological_processes - Process
9606 1 GO:0008150 OOO - biological_processes - Process
9606 1 GO:0008150 HHH - biological_processes - Process
9606 1 GO:0008150 YYY - biological_processes - Process
9606 1 GO:0031012 IDA - extracellular matrix - Component
9606 1 GO:0043312 TAS - neutrophil degranulat - Process
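I want to write a function that takes the numbers of the columns holding the information to keep and returns a "special" dictionary. I say "special" because in my case the information is always categorical, but it can have a varying number of levels, and I am tired of repeatedly writing the logic that adds the information for each level. (Maybe there is another way to do this that I was unable to find by searching, so apologies in advance for my ignorance.)
Suppose the specified columns are 8, 2 and 3, where 8 is the column with the highest-level category and 3 the lowest. The expected dictionary can be obtained like this: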
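import pprint

# 1-based column numbers from the user, converted to 0-based indices.
# Built as a list (not a bare map()) so it can also be subscripted on Python 3.
three_userinput = "8:2:3"
three = [int(x) - 1 for x in three_userinput.split(":")]
DICT3 = {}
for line in file_handle:
    # Strip the newline so the last column (8) does not keep a trailing "\n".
    info = line.rstrip("\n").split("\t")
    if info[three[0]] in DICT3:
        if info[three[1]] in DICT3[info[three[0]]]:
            DICT3[info[three[0]]][info[three[1]]].add(info[three[2]])
        else:
            DICT3[info[three[0]]][info[three[1]]] = set([info[three[2]]])
    else:
        DICT3[info[three[0]]] = {info[three[1]]: set([info[three[2]]])}
pprint.pprint(DICT3)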
Output:
{'Component': {'1': set(['GO:0005576', 'GO:0005615', 'GO:0031012'])},
 'Function': {'1': set(['GO:0003674'])},
 'Process': {'1': set(['GO:0002576', 'GO:0008150', 'GO:0043312'])}}
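As an aside, the same three-level grouping can probably be written more compactly with `dict.setdefault`, which returns the existing value for a key or inserts a default first. This is only a minimal sketch: `rows` is a hypothetical in-memory subset of the file above, and the names `cols` and `nested` are illustrative.

```python
# Hypothetical sample: a few rows of the file above, tab-separated.
rows = [
    "9606\t1\tGO:0002576\tTAS\t-\tplatelet degranulation\t-\tProcess\n",
    "9606\t1\tGO:0003674\tND\t-\tmolecular_function_z\t-\tFunction\n",
    "9606\t1\tGO:0005576\tIDA\t-\textracellular region\t-\tComponent\n",
    "9606\t1\tGO:0005576\tTAS\t-\textracellular region\t-\tComponent\n",
]

# 1-based column numbers "8:2:3" converted to 0-based indices.
cols = [int(x) - 1 for x in "8:2:3".split(":")]

nested = {}
for line in rows:
    info = line.rstrip("\n").split("\t")
    top, mid, leaf = (info[i] for i in cols)
    # setdefault creates the missing level on first sight, so no if/else is needed.
    nested.setdefault(top, {}).setdefault(mid, set()).add(leaf)
```

This removes the branching, but it still hard-codes the number of levels.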
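Now take four columns: 8, 2, 3 and 4, where 8 is the column with the highest-level category and 4 the lowest. The expected dictionary can be obtained like this: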
import pprint

# Same conversion as before: 1-based column numbers to 0-based indices.
four_userinput = "8:2:3:4"
four = [int(x) - 1 for x in four_userinput.split(":")]
DICT4 = {}
# file_handle must be reopened or rewound if it was already consumed by an earlier loop.
for line in file_handle:
    info = line.rstrip("\n").split("\t")
    if info[four[0]] in DICT4:
        if info[four[1]] in DICT4[info[four[0]]]:
            if info[four[2]] in DICT4[info[four[0]]][info[four[1]]]:
                DICT4[info[four[0]]][info[four[1]]][info[four[2]]].add(info[four[3]])
            else:
                DICT4[info[four[0]]][info[four[1]]][info[four[2]]] = set([info[four[3]]])
        else:
            DICT4[info[four[0]]][info[four[1]]] = {info[four[2]]: set([info[four[3]]])}
    else:
        DICT4[info[four[0]]] = {info[four[1]]: {info[four[2]]: set([info[four[3]]])}}
pprint.pprint(DICT4)
Output:
{'Component': {'1': {'GO:0005576': set(['IDA', 'OOO', 'TAS']),
                     'GO:0005615': set(['HDA']),
                     'GO:0031012': set(['IDA'])}},
 'Function': {'1': {'GO:0003674': set(['ND', 'OOO'])}},
 'Process': {'1': {'GO:0002576': set(['TAS']),
                   'GO:0008150': set(['HHH', 'ND', 'OOO', 'YYY']),
                   'GO:0043312': set(['TAS'])}}}
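Now, when I face five levels of information (five columns), the code becomes almost unreadable and really tedious to write... I could create a specific function for each number of levels, but... is there a way to design a single function that can handle any number of levels?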
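To make the goal concrete, here is a rough sketch of the kind of function I have in mind. It walks down one dictionary level per key column and stores the last column in a set. The names `build_nested`, `column_spec`, `sep` and `sample` are hypothetical, and it assumes every non-blank line has at least as many columns as the largest number requested.

```python
def build_nested(lines, column_spec, sep="\t"):
    """Group lines into nested dicts keyed by the given columns, with a set at the bottom.

    column_spec is a string of 1-based column numbers such as "8:2:3:4",
    ordered from the highest-level category to the lowest.
    """
    cols = [int(x) - 1 for x in column_spec.split(":")]
    if len(cols) < 2:
        raise ValueError("need at least two columns: one key and one value")
    key_cols, leaf_col = cols[:-1], cols[-1]

    root = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        info = line.rstrip("\n").split(sep)
        node = root
        # Descend through every key level except the last, creating dicts as needed.
        for col in key_cols[:-1]:
            node = node.setdefault(info[col], {})
        # The last key level holds the set of leaf values.
        node.setdefault(info[key_cols[-1]], set()).add(info[leaf_col])
    return root


# Hypothetical sample: a few rows of the file above, tab-separated.
sample = [
    "9606\t1\tGO:0002576\tTAS\t-\tplatelet degranulation\t-\tProcess\n",
    "9606\t1\tGO:0003674\tND\t-\tmolecular_function_z\t-\tFunction\n",
    "9606\t1\tGO:0005576\tIDA\t-\textracellular region\t-\tComponent\n",
    "9606\t1\tGO:0005576\tTAS\t-\textracellular region\t-\tComponent\n",
]

three_levels = build_nested(sample, "8:2:3")
four_levels = build_nested(sample, "8:2:3:4")
```

Since any iterable of lines works, an open file handle could be passed in place of `sample`. The number of levels is then taken from the column string alone.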
If I have not explained myself clearly, please feel free to ask.