Fixing the UnicodeEncodeError beginner scrapers run into: 'gbk' codec can't encode character '\xa0' in position 7084: illegal multibyte sequence
The original code:
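from urllib.request import urlopen

html = urlopen("URL")  # the target page address was omitted in the original
# No encoding given: open() falls back to the system default (GBK on Chinese-locale Windows)
with open("xzcf.html", mode="w") as f:
    f.write(html.read().decode("UTF-8"))  # the error is raised here, while writing
print("over")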
This raises: UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 7084: illegal multibyte sequence
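To see which step actually fails, here is a minimal offline reproduction. It assumes the page contained a non-breaking space (U+00A0), as the error message reports. The sample string is made up and stands in for the real page. Decoding the UTF-8 bytes succeeds. Encoding the resulting text to GBK is what fails, and that encoding is what f.write() does implicitly when the file was opened with a GBK default.

```python
# Made-up sample text standing in for the downloaded page (assumption: it contains U+00A0).
page_bytes = "Name:\xa0value".encode("utf-8")  # plays the role of html.read()

# Step 1: decoding the UTF-8 bytes works, with or without 'ignore'.
text = page_bytes.decode("utf-8")
text_ignore = page_bytes.decode("utf-8", "ignore")
assert text == text_ignore  # 'ignore' changes nothing: there was no decode error

# Step 2: encoding to GBK fails, because GBK has no mapping for U+00A0.
# This mirrors what f.write() does when open() defaults to GBK.
try:
    text.encode("gbk")
    gbk_error = None
except UnicodeEncodeError as exc:
    gbk_error = exc  # keep a reference; 'exc' is cleared after the except block

print(type(gbk_error).__name__, gbk_error.reason)
```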
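I tried the fix often suggested online, titleUni = titleHtml.decode("UTF-8", 'ignore'), but it produced the same error. The decode step was never the problem. The error comes from f.write(). When open() is called in text mode without an encoding argument, Python uses the system's default encoding, which is GBK on Chinese-locale Windows. GBK cannot represent '\xa0' (a non-breaking space). The fix was to add encoding='utf-8' inside the parentheses of with open("xzcf.html", mode="w") as f:. The corrected code: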
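from urllib.request import urlopen

html = urlopen("URL")  # the target page address was omitted in the original
# Explicit encoding: the file is written as UTF-8 regardless of the system locale
with open("xzcf.html", mode="w", encoding='utf-8') as f:
    f.write(html.read().decode("UTF-8"))
print("over")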
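The following is a self-contained check of the fix that needs no network access. The sample text is made up, and a temporary directory stands in for the real working directory. It prints the default encoding open() would use on the current machine. It then writes text containing '\xa0' with an explicit encoding='utf-8' and reads it back to confirm the round trip is lossless.

```python
import locale
import tempfile
from pathlib import Path

# The encoding open() uses when none is given; on Chinese-locale Windows this is
# typically 'cp936', which Python treats as an alias of the 'gbk' codec.
print("default text encoding:", locale.getpreferredencoding(False))

# Made-up stand-in for html.read().decode("UTF-8").
text = "Name:\xa0value"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "xzcf.html"
    # Passing encoding="utf-8" makes the write independent of the OS locale.
    with open(path, mode="w", encoding="utf-8") as f:
        f.write(text)
    # Read back with the same encoding to confirm nothing was lost.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

assert round_trip == text
print("over")
```

Two alternatives are also worth knowing. You can skip the decode/encode round trip by opening the file with mode="wb" and writing html.read() directly as bytes. On Python 3.7+, you can enable UTF-8 Mode with PYTHONUTF8=1 or -X utf8, which makes open() default to UTF-8. Passing encoding explicitly is still the most portable choice.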