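Next time, please provide your X/y data, or some dummy data; it makes answering faster and gets you a solution specific to your case. For now I have created a dummy equation of the form y = X**4 + X**3 + X + 1.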
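There are many ways to improve on this, but a quick way to find the best degree is simply to fit your data at each degree and pick the degree with the best performance (e.g., the lowest RMSE).
You can also decide how to hold out your train/test/validation data.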
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
X = np.arange(100).reshape(100, 1)
y = X**4 + X**3 + X + 1
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
rmses = []
degrees = np.arange(1, 10)
min_rmse, min_deg = 1e10, 0
for deg in degrees:
    # Train features
    poly_features = PolynomialFeatures(degree=deg, include_bias=False)
    x_poly_train = poly_features.fit_transform(x_train)
    # Linear regression
    poly_reg = LinearRegression()
    poly_reg.fit(x_poly_train, y_train)
    # Compare with test data (transform only; the transformer was fitted on the training data)
    x_poly_test = poly_features.transform(x_test)
    poly_predict = poly_reg.predict(x_poly_test)
    poly_mse = mean_squared_error(y_test, poly_predict)
    poly_rmse = np.sqrt(poly_mse)
    rmses.append(poly_rmse)
    # Keep track of the best degree so far
    if min_rmse > poly_rmse:
        min_rmse = poly_rmse
        min_deg = deg
# Plot and present results
print('Best degree {} with RMSE {}'.format(min_deg, min_rmse))
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(degrees, rmses)
ax.set_yscale('log')
ax.set_xlabel('Degree')
ax.set_ylabel('RMSE')
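This will print something like the following (the exact RMSE varies between runs because the train/test split is random):
Best degree 4 with RMSE 1.27689038706e-08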
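If you want to hold out a separate validation set, as mentioned above, one possible arrangement is sketched here. The 60/20/20 proportions, the fixed `random_state`, the `rmse` helper, and the use of `make_pipeline` are my own assumptions for illustration, not part of the code above. The degree is chosen on the validation set, and the test set is touched only once at the end.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same dummy data as above
X = np.arange(100).reshape(100, 1)
y = (X**4 + X**3 + X + 1).ravel()

# Assumed proportions: 60% train, 20% validation, 20% test
x_train, x_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
x_val, x_test, y_val, y_test = train_test_split(
    x_rest, y_rest, test_size=0.5, random_state=0)


def rmse(model, x, y_true):
    """Hypothetical helper: root mean squared error of a fitted model."""
    return np.sqrt(mean_squared_error(y_true, model.predict(x)))


models, val_rmses = {}, {}
for deg in range(1, 10):
    # The pipeline applies the polynomial expansion, then the linear fit
    model = make_pipeline(
        PolynomialFeatures(degree=deg, include_bias=False),
        LinearRegression())
    model.fit(x_train, y_train)
    models[deg] = model
    val_rmses[deg] = rmse(model, x_val, y_val)

# Pick the degree on the validation set, report once on the test set
best_deg = min(val_rmses, key=val_rmses.get)
test_rmse = rmse(models[best_deg], x_test, y_test)
print('Best degree {} with test RMSE {}'.format(best_deg, test_rmse))
```

Selecting on the validation set keeps the reported test RMSE from being biased by the degree search itself.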
Alternatively, you could build a new class that carries out the polynomial fit and pass it to GridSearchCV together with a set of parameters.
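A rough sketch of that idea follows. Instead of writing a new class, it swaps in a scikit-learn `Pipeline` that chains `PolynomialFeatures` and `LinearRegression`, which gives `GridSearchCV` a single estimator whose degree it can tune. The step names `poly` and `reg`, the shuffled 5-fold split, and the scoring choice are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same dummy data as above
X = np.arange(100).reshape(100, 1)
y = (X**4 + X**3 + X + 1).ravel()

# One estimator that expands the features and then fits a linear model
pipeline = Pipeline([
    ('poly', PolynomialFeatures(include_bias=False)),
    ('reg', LinearRegression()),
])

# 'poly__degree' addresses the `degree` parameter of the 'poly' step
param_grid = {'poly__degree': np.arange(1, 10)}

# Shuffle the folds, since X is sorted (assumed choice)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(pipeline, param_grid,
                      scoring='neg_mean_squared_error', cv=cv)
search.fit(X, y)

best_degree = search.best_params_['poly__degree']
# best_score_ is the mean negative MSE across folds
best_rmse = np.sqrt(-search.best_score_)
print('Best degree {} with CV RMSE {}'.format(best_degree, best_rmse))
```

Because the search cross-validates every candidate degree, the result depends less on one particular random split than the single train/test loop does.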