该小说网站的全部内容都是经过一个JS的加密,要想爬取这个网站那么将其内容解析是不可避免的,本文将讲解如何对其进行JS的逆向解析。网站:shuqi
随便点开一本书,打开浏览器自带的抓包工具:
点击第一个包,但是在这里找不到我们想要的数据,说明不在这个包里,退出去XHR类找一下:
从名称中看,第二个包有书的ID、页码等信息,大概率就是这个,打开看一下
发现这里有一份加密的字符ChapterContent,下面我们来对这个关键字进行逆向的解析:
点击右上角的全局搜索:
发现只搜索到一个,在源码中打开,继续搜索:
得到两个结果,我们看不出哪个是,我们先将这两个行打上断点(点击代码那一行对应的数字),并刷新
这里发现,第一个是得到这个加密字符的函数,这个暂时对我们没用,我们直接get之前那个包也能直接得到,这里我们点击这个小按钮进行单步调试。
点着点着发现进入了一个函数的循环,且这个函数对我们没什么用处,于是我们点击向上的这个小箭头,跳出当前函数,并继续一直点一直点,遇到循环就跳出,没有循环就单步调试。
直到我们遇到这个界面!!激动人心的时刻到啦!!
从这里我们就可以看出文章的内容是通过i._decodeCont(t.ChapterContent)这个函数得到的,把鼠标移动到这个函数上
到这里很显然,这个函数的结构是 i._decodeCont(加密的密文),点击i._decodeCont进入查看它的源代码。
新建一个js文件,把这个一整个函数复制到你建的js文件中,(注意:一定不要漏复制)
复制完后修改一下函数,将头改成function _decodeCont(t)这种格式,
(这个格式是由js的格式决定的,同时也是固定的)
function _decodeCont(t) {
return t = function(t) {
return t.split("").map(function(t) {
var e, i;
return t.match(/[A-Za-z]/) ? (e = Math.floor(t.charCodeAt(0) / 97),
i = (t.toLowerCase().charCodeAt(0) - 83) % 26 || 26,
String.fromCharCode(i + (0 == e ? 64 : 96))) : t
}).join("")
}(t),
function(t) {
var e, i, a, n, r, o, c, s = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=", d = "", l = 0;
for (t = t.replace(/[^A-Za-z0-9\+\/\=]/g, ""); l < t.length; )
n = s.indexOf(t.charAt(l++)),
r = s.indexOf(t.charAt(l++)),
o = s.indexOf(t.charAt(l++)),
c = s.indexOf(t.charAt(l++)),
e = n << 2 | r >> 4,
i = (15 & r) << 4 | o >> 2,
a = (3 & o) << 6 | c,
d += String.fromCharCode(e),
64 != o && (d += String.fromCharCode(i)),
64 != c && (d += String.fromCharCode(a));
return function(t) {
for (var e, i = "", a = 0, n = 0, r = 0; a < t.length; )
n = t.charCodeAt(a),
n < 128 ? (i += String.fromCharCode(n),
a++) : n > 191 && n < 224 ? (r = t.charCodeAt(a + 1),
i += String.fromCharCode((31 & n) << 6 | 63 & r),
a += 2) : (r = t.charCodeAt(a + 1),
e = t.charCodeAt(a + 2),
i += String.fromCharCode((15 & n) << 12 | (63 & r) << 6 | 63 & e),
a += 3);
return i
}(d)
}(t)
}
到这里之后,我们只需要直接调用这个函数就可以将密文解密出来啦
t = '密文'
console.log(_decodeCont(t));
输出结果:
当然如果不想用JS,我们在py上传入参数也可以的(123.js就是前面放js函数的文件):
# -*- coding: gbk -*- # 防止出现乱码等格式错误
import execjs
def get_data():
# '123.js'是用来放前面的js的
with open('123.js','r',encoding='utf-8')as f:
read_js=f.read()
return_js = execjs.compile(read_js)
t = '44PN44PN4bPp54v577lZ5bvE5YvA5L675cJM5M2X5L+477lZ5bvE6VB95ozl5eF777lO5bvE5L+i5Yhy5LJ75n6277lO4bPqCTWlYm48LaViChBNtBBNtBvXfrJjw+vqvrvigBrqtBrpiBJpvBJjfrr6bhF6uh+8tGkvpv8+CTWlYm7wtVQwtVQzy4UbieaazbGyiXQydLoyeMQcanYzaVaxhV3ztdoiiWevtWmxinQxhVQxhXexhnUxhVixhXiycYGaa6KctMCyynKiiVmzyMayaLeyw7wyw6/xhV3zzX/bfVUzt7CyweiyfYUbt73yweiazbGyxdsiiVmxinQyw6/yvXixhV3aa6Kyco3zeoaiiVUvtW08LaViCwkvpv8+44PN44PN6Y+M6Y6V5n2D55dR6Vdk5oPC6W2W5L+i6VB95YvA55+y6LTG5cJM5M2X5L+45cvi5Yvd5YhN5YzV5dP355dR5Mlj5cn577lZ5Y2T5YvX6Y6V5n2D5nJ55L+i5cvi5evS5dJn55dR5o6V44PPCTWlYm48LaViChBNtBBNtBJyhrnpeBnLe+JWwrJxdhJZh+zMbhzMbhJVcBrnuBJgzrJyf++8wBF5t+nLe+ngb+r7w+F6hhJhghrnuBzKhhJyf++8wBJGdhnoihnQf+F7whJjw+nPb+npvrvPhhrKdB+8wBF4wrJVfBJAtrF6yBJltrJjfrngh+F6uhBNtwkvpv8+CTWlYm7wtVQwtVQzfdUzz77zt7Cyw4wcu43ayW/yvYQxhboxhVQxhXeydoGyuY/bhdixhVeiiVmadoezaVaxhVQbhdiywYizaX/zfdUycVGzye3yfMKiiVmbi5abveUyeeoadosazbGbi57bwn/zaMQct73xhoQxhV3bgosiiVmzz7GyvXibeeexhVQycMscxbwatowwtVV8LaViCwkvpv8+44PN44PN6Y+M5Ydo5oz05Mlb6Vdk5n6277lZ5nJ55eXu5clW5MPQ6Y+U5YvN6nT/6nJk6nJg77lZ5ozm5ov46YJj6Yri6LB955h05bzG6nBL77lOCTWlYm48LaViChBNtBBNtBF7vhJ5gBJ/irrRghJxc+nKfr+8wBJSdBJhghF6hhzQirn0h+F4wrF4v+JBh+F6uh+8wBJyhrrVhrJjfrnQf+vztrnXvhJyhrJAyhnBvrBNtwkvpv8+CTWlYm7wtVQwtVQbv6KzzX/ywMoyvYQyco3xheeyeeoycoaxhM/xhV3be7Gxh4QxhLwiiVmyw6/ytL/ytL/zzX/zyMayaLeyw7wwtVV8LaViCwkvpv8+44PN44PN5cJM5M2X5L+46X+05cvi5MPQ6nnM55dR44PO5Mnq6Y6w55dR77lZ5Lvj5nF05c2y6Y+L5YvA5cvi5Yv65YdT5L+J5bXz55F35Yd644PPCTWlYm48LaViChBNtBBNtBXNaBJ8bBJcghJgxB+8wBnVxrvQirJ5fhn0h++8wBF9bBrpv+vQirF4wrvQirnXvhnVxrJAyhr7zrJVd+F6hhJ9x+F4d+zfa++8a+nVxrF7tBF5vBzQirJCe+F7crJ5fhBNthXNarvXfrJjw+vqvrF9whrqtBJxgBv9e+v9e+rnuBvigBF6uhF4tBJCcrBNtwkvpv8+CTWlYm7wtVQwtVQyiXQydLoyeMQyynsyynsxhXGyb7QiiVmvtWmxinQbe7GazbGytWYzzX/bioiyg6siiVmbi5ayhoGycYGbfVUyeeobi5wayXwyicsbgosxhXiceW/yxdswtVYvtW08LaViCwkvpv8+44PN44PN4bPp6Vhy5cvi6Y+U5YdT5L2O5YdH5oXO77lZ5Lve5Yd65LJ06X646Y+L6VB95Yzj5Mhr5n625o2G5ndm5nnU77lZ5Y2t546j5Mlb6Y+r5LJe5oXO6LB95YvA5Lvj77lZ5YvA566u5L2J5Lvj6YPO5n626LB96Y+L6XnO5LnA5MPQ5YvQ5LJe5oz06Mrl6nJg77lZ6YPO5Yln6XnO5MXa77lZ5YvA5L675cJM5M2X5L+45L675MBd77ls4bPqCTWlYm48LaViChBNtBBNtBXNaBnVxrzQirJ3fhr7w+v3a+F6hhJhghvigBJyirJFc++8wBJJw++8wBnIzrJqvhJChBrnuBF6hhJ3fhr7w+nqcrF6uh+8wBF9bBv1ghr0c+JBh+n0y+nXvhvRhB+8wBJFfrF7eBv/zrJjfrv1fB+8trXNaGkvpv8+CTWlYm7wtVQwtVQyiXQydLoyeMQbe7GaaLQxhVQzwVsbi5mycVGiiVmzenCyco3zaVaxhXGxhXeayYsxheezaW3bi5abieabgoQiiVmyuooxhX3xhVQxhXexheebuYwxhVebi5wzaVaxhVQctMCyvVQaydGiiVmyw6/yxWCxheeyxdsiiVmbveUyfV/baLaaaVixhVQaaYmyi4CycYGyfYUaz7GadbUadbUwtVV8LaViCwkvpv8+44PN44PN4bPp5YvA77lZ5bvE5YvA5L6777lZ5eTP5Y2t5Yhf5cF+6Y+U5bvE5MPa4bPz4bPz4bPqCTWlYm48LaViChBNtBBNtBvXfrJjw+vqvrnkthJXdrF8iBrnuBrpv+JDxrvUdhJ3frrnuBrVhr+8wBvNtrvXfrJxgBJjuhJxgBnWerJDxrF4tBv+hr+8wBJDc+JKxhJDc+JKxhnXirrqtBnKfrrQa+BNtwkvpv8+CTWlYm7wtVQwtVQvtWmxinQbi5axhXiycYGyxbibi5axhLwavc/yxdsiiVmyweizyMayaLeyw7wxhV3ze5GyaXwyeeocu4myhoYzgYiyiYewtVYxinQxh6mxhXGxhXebgooagXszvbexheeyiVGbgoQiiVmaeLayvYQxhbozyMayaLeyw7wiiVmyw6/yfYUayYUxhV3yicsycoaxhbovtXovtXovtW08LaViCwkvpv8+44PN44PN6Vdk5oPC6W2W55l85eBd6LB95eJO5YdT5Lr65c2y77lZ5nJ55oPk5cvi5d2777lZ5Yzs5YvA6XnO5L676LXw56rA5Mlj5cn577lO5nJ55evS5evS55z955z955dR5eF75YdT5Yvx6Y6V5n2D77lZ5L+A5d2w5Yzs5eF75nFs5YdT77lOCTWlYm48LaViChBNtBBNtBv/aBv/aBJpfBrpv+vatrnWx+rHfBJsthrnuBF4aBv+hrnqcrF6uhF4tBv+uhzcgBv9ch+8wBvXfrJjw+vqvrJ/irrRghJCxrrYbBJpfBnparzPb+zcgBv9chv3xrJBh++8wBF4tBJxgBnFahJpdBF6uhv9chv+yrF4vhBNtwkvpv8+CTWlYm7wtVQwtVQbveUyfV/baLazxc7yaXwxheeyeeocdoGbinoxhVeiiVmzvbeycXsyeeoyxWCxhboxhVQbg7CiiVmyiXQydLoyeMQyaXwyxV7canYbg5UazbGaz7GyycwzfWGiiWb8LaViCwkvpv8+44PN44PN4bPp5d275YdT5eXu5clW4bPz4bPz5o+e5L6755lY55lY5d275YdT5eXu5clW4bPz4bPz4bPqCTWlYm48LaViChBNtBBNtBzcgBv9chzUwBnVi+nqwhnjw+JpdBv9chF4vhrnfrF6uhrnfrrpvr+8wBzQirnNdhJyhrF7vhJxdrJUhhzKdBnlbrrpv+z7uBJBuh+8wBJkurrRghvhdrJyhrrvfBJVfBv/zrrawrF6v+BNtwkvpv8+CTWlYm7wtVQwtVQyw6ezzX/xhV3aa6KctMCctdCxhXiycYGztV7xhLwyz57xhbiyuY/iiVmyfV/yfV/yhoGahdeyfYUbcbUye7izeoiwtVV8LaViCwkvpv8+44PN44PN4bPp6Y+L5clW5ePH77lZ5Yd66Y+L5eXu5d274bPz4bPz4bPqCTWlYm48LaViChBNtBBNtBJ8bBJcuhJgxBJDeBvatrviar+8wBnqihF6uhJCb+njyB+8zhXNaBnlbrngh+Jjfrv1ghr0c+J4chJoahnqcr+8wBv/zBvQirJAyhF4dhJyirF7g+zFfr+8wBv/zrJoahF4tBJhzhvztrrpv+JyirJJir+8wBngh+F6uhJCe+JjfrF4wrJNiBzFfrF6uhXNchXNchXNaGkvpv8+CTWlYm7wtVQwtVQzvY/zaL7zfV/yaXwbinoxhVeyxXmyvYQbi5acu4miiVmyi4CxhViyid7yid7xhVQyvdwiiVmycoabi5azeXUyvYQbi5azanKzenCzzX/zanKzvo7bi5ayiXQydLoyeMQazbGiiVmzt7CbcbUxhoQxhVQxhXeyfV/xhXiycYGyz57yweiwtVV8LaViCwkvpv8+44PN44PN4bPp5nFJ6M2v5L+i5cvi5olt5nzT5n2D77ls4bPqCTWlYm48LaViChBNtBBNtBnVi+nqwhnjw+nBtBJ8tBv9chJ4zBJFwBJ8bBJcuhJgxBvigBF6uhJUbBJCcrviar+8wBJ8bBJcuhJgxBrpvrJ8tBrpiBrfxrBNtwkvpv8+CTWlYm7wtVQwtVQbinmbi4sbhdiiiVmyiXQydLoyeMQyueYbtVUbveUycYGaeWUctMCiiWb8LaViCwkvpv8+44PN44PN4bPp6VPO6Vdk5nF077lZ5nJ95YdY77lZ5nFc5nFa55dR5nJ95YdY77lZ5Y2t6X+06X+05bPB5YzV5Yln5clW6Y+M5YzV5ora55dR5YdY4bPz4bPz5Y2t54lp6Y+M6nz06Y2z5YvX5cvi6YPO77lZ5nJ55L+i5cvi6MdH5nBO5dnT6MXk6MJU5bv/5n6255dR5b6Z5n625Yd644PP4bPqCTWlYm48LaViChBNtBBNtBXNaBnJhrnWwrJyhrv3a+nVxrvigBvztrr7zrJyhrJhghF6yBrohBJSeBF5fBF4tBF4dhv0gBv6d+F4d+zfa++8wBF9bBJhghJxc+F4d+JxgBnpvrn0h+v3e+JFc++8trXNaGkvpv8+CTWlYm7wtVQwtVQbtVUbveUycYGyycmyu7ezaWiycWowtVV8LaViCwkvpv8+44PN44PN5Lrt5Yd66YJ25o+M5b6D6Vdk5oPC6W2W55dR5Yd65Yvg77lZ5bdX5Yd657hM5olR6LnF44PPCTWlYm48LaViChBNtBBNtBvXfrJjw+vqvrnSbhnSbhrqtrJ8tBrpiBrqz++8wBF4tBrpv+vatrvNtrvXfrJxgBJjfrrscrzOx+vUdhJ3frv/zrnLe+nlbrngh+nVxB+8wBJ/t+zUwBF4tBnPfh+8wBrpiBnmdhJjfrnBvrF6uhF4v+nqcr+8wBrueBJMwhzOx++8zwkvpv8+CTWlYm7wtVQwtVQvtWmavYaiiVmzfLYxinQyvXiyxXszvWUywMoah5azyMayaLeyw7wiiVmzvWUyfYUzzX/zeoiiiVmxhM/xhV3xiWeyweiazbGvtXovtXovtW08LaViCwkvpv8+44PN44PN6VPO6Vdk5nF05bhW552N6Vdk5oPC6W2W55dR5bzY77lZ5YvN6Mv15o+Q6LJ477lZ6YJ257Fa5bdX5bv/5c2B5ePC6XnO5Yzj5nJ55LTn5Yve6nls55dR5YdY6X+05YdT77lZ6Mrh5nJ55bF/5YvA5bF/5bFC5L675bv/5n625o2G5Yve6nls44PPCTWlYm48LaViChBNtBBNtBvXfrJjw+vqvrzuhhrqtBvNtrvXfrJxgBnWv+nZu+rnuBJpfBnJhrrpv+JBh++8wBnraBrpa+rpv+JVfBF4tBF4dhzqbhroehnSvBrycrrnuBJzu+F6hh+8wBzPb+Jzu+F6hhnparJyhrnYz+nYz+nWv++8wBF4tBvRhBrnuBnSvBrpvrJJuBroeh+8wQkvpv8+CTWlYm7wtVQwtVQvtWmxhXiycYGiiVmyvXiyeeCztWKiiVmycXsydWwxhV3xiWexiXGyeeCxinQiiVmzvWUzzX/bcbUxhoQxinQyz57yweibg5/zvWUyeeoxhcGaz7wyunmytMexhXexiYGiiVUvtW08LaViCwkvpv8+44PN44PN6Vdk5oPC6W2W6Y+L5clW54X56Y+357BX77lZ5nF055n855dR5Yzs5L6W5n6m77lZ5MPf6VPO6Vdk5nF05bdX6X+q6X+05n6Z77lZ55+y6LTG5YvA5cvi5bdX5nJ55L2J5Lvj5cJM5M2X5L+477lZ5oPk54X55YdT54X55nF044PPCTWlYm48LaViChBNtBBNtBXNaBnVxrnRi+nRw++8tr+8trXNaGkvpv8+'
data = return_js.call('_decodeCont',t) # 传入参数t进入函数_decondeCont中
print(data)
if __name__ == '__main__':
get_data()
这个时候可能会遇到下面这个错误:
经过一番搜索之后才找到了这个问题的解决方案:去subprocess.py下面修改它的源代码:
将encoding修改前一定要备份!!万一弄错了好改回来。敲代码最忌不懂,乱改还不保留,这样很招人烦的,尤其团队合作时。
接下来点击运行就可以看见结果啦!
到这里解析就完成啦,剩下的将数据进行简单的处理和保存就搞定了
其他JS逆向案例:JS逆向解析案例-巨潮证券市场数据库