Ensemble Learning
Ensemble methods fall into two categories:
Sequential methods: the individual learners depend strongly on one another, so they must be generated serially; the representative algorithm is Boosting.
Parallel methods: the individual learners have no strong mutual dependencies and can be generated in parallel; the representatives are Bagging and Random Forest.
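These two families can be sketched with scikit-learn's stock implementations; the dataset and hyperparameters below are illustrative only, not part of the original post:

```python
# Minimal sketch: parallel (bagging) vs. sequential (boosting) ensembles.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: independent learners fit on bootstrap samples (parallelizable)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
# Boosting: learners built sequentially, each focusing on previous errors
boost = AdaBoostClassifier(n_estimators=50, random_state=0)
# Random forest: bagging plus random feature subsets at each split
rf = RandomForestClassifier(n_estimators=50, random_state=0)

for name, clf in [("bagging", bag), ("boosting", boost), ("random forest", rf)]:
    clf.fit(X_tr, y_tr)
    print(name, clf.score(X_te, y_te))
```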
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008102804796.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008102903269.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
bagging
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008103122886.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
boosting
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008103254987.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008103502975.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
stacking
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008103848771.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
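A minimal stacking sketch using scikit-learn's `StackingClassifier`; the choice of base learners and meta-learner here is illustrative:

```python
# Stacking: level-1 predictions (obtained via internal cross-validation)
# become the inputs of a meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
estimators = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
              ("svc", SVC(probability=True, random_state=0))]
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression())
stack.fit(X, y)
print(stack.score(X, y))
```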
2) Click-through-rate prediction
Using GBDT+LR for click-through-rate (CTR) estimation. Reference:
https://blog.csdn.net/Snoopy_Yuan/article/details/80703175?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task
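The idea can be sketched as follows, using scikit-learn's `GradientBoostingClassifier` as a stand-in GBDT (data and hyperparameters are made up for illustration): each sample's leaf indices in the trained trees are one-hot encoded and fed to a logistic regression.

```python
# GBDT+LR sketch: leaf indices as categorical features for LR.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbdt.fit(X_tr, y_tr)

# apply() returns, for each sample, the index of the leaf it lands in
# for every tree: shape (n_samples, n_trees, 1) for binary classification.
leaves_tr = gbdt.apply(X_tr)[:, :, 0]
leaves_te = gbdt.apply(X_te)[:, :, 0]

enc = OneHotEncoder(handle_unknown="ignore")
lr = LogisticRegression(max_iter=1000)
lr.fit(enc.fit_transform(leaves_tr), y_tr)
print(lr.score(enc.transform(leaves_te), y_te))
```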
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008135921618.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008140242281.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Derivation of XGBoost
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008142410651.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Example:
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008142535914.png#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008142621370.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
XGBoost is an additive model
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008142658860.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008143403370.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Here T is the number of leaf nodes and w_j is the weight of leaf j.
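For reference, the regularized objective that the screenshots above describe can be written in standard XGBoost notation, with T and w_j as just defined:

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2}
```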
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008143509729.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008143625644.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008143718453.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008143841909.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008143942544.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
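The derivation in the screenshots arrives at a closed-form optimal leaf weight and a split gain. A small numeric sketch (the G/H values below are made up for illustration):

```python
# Closed-form results of the second-order Taylor expansion:
# G and H are the sums of first- and second-order gradients over a leaf.
def leaf_weight(G, H, lam):
    # w* = -G / (H + lambda): the weight minimizing the leaf's objective
    return -G / (H + lam)

def split_gain(GL, HL, GR, HR, lam, gamma):
    # Gain = 1/2 [ GL^2/(HL+lam) + GR^2/(HR+lam) - (GL+GR)^2/(HL+HR+lam) ] - gamma
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR)
                  - score(GL + GR, HL + HR)) - gamma

print(leaf_weight(G=-2.0, H=4.0, lam=1.0))  # 0.4
print(split_gain(GL=-1.5, HL=2.0, GR=-0.5, HR=2.0, lam=1.0, gamma=0.0))
```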
XGBoost: important parameters
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008144003258.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008144204192.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
The General parameters specify which base model to use, usually a decision tree or a linear model. They can normally be left at their defaults and are not part of parameter tuning.
The main parameters fall into the following groups:
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008144323412.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008144353450.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
The Booster parameters control the weak learners. They need careful tuning, as they directly affect model performance.
eta: the learning rate
gamma: the minimum loss reduction required to make a further split
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008144545631.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008144654926.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008144739150.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008145153367.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
The Learning task parameters depend on the task at hand; once set, they usually do not need further tuning.
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008145337283.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
XGBoost tuning tips
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008145415936.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
XGBoost's advantages over GBDT
![在这里插入图片描述](https://img-blog.csdnimg.cn/2020100814562588.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Which xgboost interface to choose
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008150939429.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
**Hands-on code**:
```python
## Get the data
import xgboost as xgb                      # the original library by Tianqi Chen
from xgboost.sklearn import XGBClassifier  # the sklearn-style API
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import metrics

data = pd.read_table('C:/Users/lb/Desktop/test/gouwu.txt', sep='\t',
                     engine="python", encoding='utf-8')
data.columns.values
data.head()
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008155607251.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
```python
data.shape
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008155642494.png#pic_center)
Splitting the dataset

```python
from sklearn.model_selection import train_test_split

data_x = data.drop(['留存标签'], axis=1)   # features (drop the retention label)
data_y = data['留存标签']                  # target: the retention label
train_x, test_x, train_y, test_y = train_test_split(data_x, data_y,
                                                    test_size=0.3, random_state=5)

# Reset the row index after splitting -- remember to add this step
for i in [train_x, test_x]:
    i.index = range(i.shape[0])

train_x.head()
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008155820668.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Training the model

```python
from sklearn.metrics import accuracy_score

model = XGBClassifier()
# Train the model
model.fit(train_x, train_y)
# Predict with the model
pred_y = model.predict(test_x)
metrics.accuracy_score(test_y, pred_y)
```

Classification accuracy:
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008160637104.png#pic_center)
Advanced xgboost usage
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008160752879.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Monitoring performance at every boosting round

```python
# Monitor the model's performance at each boosting round
# eval_set=[(x_test, y_test)]  evaluation set(s), a list of tuples
# eval_metric                  evaluation metric (mlogloss for multiclass, auc for binary)
# early_stopping_rounds=10     stop early if the metric has not improved in 10 rounds
# verbose=True                 print progress (False to silence)
eval_set = [(test_x, test_y)]
model.fit(train_x, train_y, early_stopping_rounds=10, eval_metric="auc",
          eval_set=eval_set, verbose=True)
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008200841746.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Plotting feature importances

```python
# Plot feature importances, derived from the trained trees
from xgboost import plot_importance
from matplotlib import pyplot

plt.rcParams['font.sans-serif'] = ['SimHei']   # display Chinese labels
plt.rcParams['axes.unicode_minus'] = False     # display minus signs correctly
plot_importance(model)
pyplot.show()
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/2020100816104471.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Feature importance alone does not tell us whether a feature pushes predictions up or down; for that, we use SHAP.
Explaining the XGBoost model with SHAP

```python
import shap

shap.initjs()
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(train_x)
print(shap_values)
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/2020100816231191.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
```python
shap_values.shape
train_x.shape
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008162336926.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
```python
plt.rcParams['font.sans-serif'] = ['SimHei']   # display Chinese labels
plt.rcParams['axes.unicode_minus'] = False     # display minus signs correctly
shap.summary_plot(shap_values, train_x, max_display=30)
# red = high feature value, blue = low feature value
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008162401728.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
```python
shap.summary_plot(shap_values, train_x, plot_type="bar")
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008162441718.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
A quick sanity check:

```python
pd.DataFrame(abs(shap_values), columns=train_x.columns).mean().sort_values(ascending=False)
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008163942136.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
```python
# Interactions between pairs of variables
shap_interaction_values = shap.TreeExplainer(model).shap_interaction_values(train_x)
shap.summary_plot(shap_interaction_values, train_x, max_display=4)
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008164554335.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Dependence plots
A single-variable dependence plot shows that variable's effect on the target: the x-axis is the feature value and the y-axis is the SHAP value.

```python
shap.dependence_plot('购物距离注册时长',
                     shap_values, train_x,
                     interaction_index=None,
                     show=False)
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008165104195.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
The plot shows that the time from registration to purchase ('购物距离注册时长') has a positive effect on user retention.
A similar case for reference:
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008165159214.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/2020100816522656.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
What about two variables? A reference example from someone else's code:

```python
shap.dependence_plot('Pclass', shap_values[1], X_test,
                     interaction_index='Sex_encoding', show=False)
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008201551407.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
A useful rule of thumb: focus on the direction of the effect rather than on the colors alone.
[Another tutorial worth consulting](https://blog.csdn.net/weixin_43615654/article/details/103436632?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param)
Inspecting how each feature contributes to a single row's prediction:

```python
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[5], train_x.iloc[5])
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008202450769.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Hyperparameter tuning

```python
from sklearn.model_selection import GridSearchCV
# Cross-validation here can also be k-fold
from sklearn.model_selection import StratifiedKFold

# Search ranges for XGBoost's six main parameters.
# Note that this is a dictionary.
param_dist = {
    'n_estimators': range(80, 150, 5),
    'max_depth': range(3, 10, 1),
    'learning_rate': np.linspace(0.01, 0.1, 20),
    'subsample': np.linspace(0.7, 0.9, 20),
    'colsample_bytree': np.linspace(0.5, 0.98, 10),
    'min_child_weight': range(1, 9, 1)
}

# Grid search
grid = GridSearchCV(model, param_dist, cv=3, scoring='neg_log_loss',
                    n_jobs=-1)
# Fit on the training set
grid.fit(train_x, train_y)
# Retrieve the best parameters
grid.best_params_
model_best = XGBClassifier(**grid.best_params_)   # note the ** to unpack the dict
```
The full grid above is too slow, so instead tune one parameter at a time:

```python
model1 = XGBClassifier()
learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.12, 0.15, 0.2, 0.3]
# Wrap the candidate values in a dictionary
param_grid = dict(learning_rate=learning_rate)
param_grid
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008205500596.png#pic_center)
How to choose the scoring metric in different scenarios
Note the difference between StratifiedKFold and KFold in sklearn for generating cross-validation splits.
Cross-validation with k-fold splitting:
![在这里插入图片描述](https://img-blog.csdnimg.cn/2020100821033172.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008205938648.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
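The KFold vs. StratifiedKFold difference can be seen on a small imbalanced toy label (illustrative data, not from the post):

```python
# Plain KFold ignores the labels, so folds can have badly skewed class
# ratios; StratifiedKFold preserves the class ratio in every fold.
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 15 + [1] * 5)   # imbalanced: 75% class 0, 25% class 1

kf_ratios = [y[test].mean() for _, test in KFold(n_splits=5).split(X)]
skf_ratios = [y[test].mean() for _, test in StratifiedKFold(n_splits=5).split(X, y)]

print("KFold fold positive rates:          ", kf_ratios)
print("StratifiedKFold fold positive rates:", skf_ratios)
# Without shuffling, KFold piles all positives into the last folds, while
# StratifiedKFold keeps every test fold at the overall 25% positive rate.
```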
```python
model1 = XGBClassifier()
learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.12, 0.15, 0.2, 0.3]
param_grid = dict(learning_rate=learning_rate)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model1, param_grid, scoring="neg_log_loss",
                           n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(train_x, train_y)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/2020100820553785.png#pic_center)
```python
model2 = XGBClassifier()
n_estimators = range(80, 200, 5)
param_grid = dict(n_estimators=n_estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model2, param_grid, scoring="neg_log_loss",
                           n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(train_x, train_y)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008205604907.png#pic_center)
```python
# Refit with the tuned parameters
model3 = XGBClassifier(n_estimators=110, learning_rate=0.12)
model3.fit(train_x, train_y)
# Predict with the tuned model
pred_y = model3.predict(test_x)
metrics.accuracy_score(test_y, pred_y)
```

The accuracy improves to 0.7625979843225084.
Saving the model

```python
# Save the model
import pickle
pickle.dump(model, open("C:/Users/lb/Desktop/test/tl.dat", "wb"))

# Load it back (e.g. after restarting the Jupyter kernel)
import pickle
load_file = open("C:/Users/lb/Desktop/test/tl.dat", "rb")
load_game_data = pickle.load(load_file)
load_file.close()
load_game_data
```
![在这里插入图片描述](https://img-blog.csdnimg.cn/20201008205823175.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1NTMxNTk0,size_16,color_FFFFFF,t_70#pic_center)
Then predict with the loaded model:

```python
load_game_data.predict(train_x)
```