I am using something like the following simplified script to parse snippets of Python out of a larger file:
import io
import tokenize
src = 'foo="bar"'
src = bytes(src.encode())
src = io.BytesIO(src)
src = list(tokenize.tokenize(src.readline))
for tok in src:
    print(tok)
src = tokenize.untokenize(src)
The code is not identical in Python 2.x, but it uses the same idiom and works fine. However, running the snippet above under Python 3.0, I get this output:
(57, 'utf-8', (0, 0), (0, 0), '')
(1, 'foo', (1, 0), (1, 3), 'foo="bar"')
(53, '=', (1, 3), (1, 4), 'foo="bar"')
(3, '"bar"', (1, 4), (1, 9), 'foo="bar"')
(0, '', (2, 0), (2, 0), '')
Traceback (most recent call last):
  File "q.py", line 13, in <module>
    src = tokenize.untokenize(src)
  File "/usr/local/lib/python3.0/tokenize.py", line 236, in untokenize
    out = ut.untokenize(iterable)
  File "/usr/local/lib/python3.0/tokenize.py", line 165, in untokenize
    self.add_whitespace(start)
  File "/usr/local/lib/python3.0/tokenize.py", line 151, in add_whitespace
    assert row <= self.prev_row
AssertionError
I have searched for references to this error and its cause, but have been unable to find any. What am I doing wrong, and how can I correct it?
[edit]
After partisann's (https://stackoverflow.com/users/54982/partisann) observation that appending a newline to the source makes the error go away, I started messing with the list I was untokenizing. It seems that the EOF token causes the error if it is not immediately preceded by a newline, so removing it gets rid of the error. The following script runs without error:
import io
import tokenize
src = 'foo="bar"'
src = bytes(src.encode())
src = io.BytesIO(src)
src = list(tokenize.tokenize(src.readline))
for tok in src:
    print(tok)
src = tokenize.untokenize(src[:-1])
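Based on the same observation, an alternative to slicing off the last token is to terminate the source itself with a newline, so the tokenizer emits a NEWLINE token before the EOF marker. A minimal sketch (assuming a modern Python 3, where tokenize.untokenize returns bytes when the ENCODING token is included):

```python
import io
import tokenize

# Terminate the source with a newline so a NEWLINE token precedes
# ENDMARKER; untokenize can then reconstruct the source round-trip.
src = 'foo="bar"\n'
toks = list(tokenize.tokenize(io.BytesIO(src.encode("utf-8")).readline))
out = tokenize.untokenize(toks)  # bytes, since the ENCODING token is present
```

With the trailing newline present, the full token list (EOF marker included) round-trips cleanly through untokenize.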