Data:
import pandas as pd
data = pd.DataFrame({'classes':[1,1,1,2,2,2,2],'b':[3,4,5,6,7,8,9], 'c':[10,11,12,13,14,15,16]})
My code:
import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X = np.array(data[['b','c']])
y = np.array(data['classes'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=4)
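Question:
train_test_split picks the test set at random across all classes. Is there a way to get the same number of test samples from each class? (For example, two samples from class 1 and two from class 2. Note that the classes do not have the same total number of samples.)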
Expected result:
y_test
array([1, 2, 2, 1], dtype=int64)
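
For reference, one possible way to get that result is to sample the test indices per class with NumPy instead of calling train_test_split. This is only a minimal sketch: `split_equal_per_class`, `n_test_per_class`, and `seed` are hypothetical names introduced here, not part of scikit-learn. Passing `stratify=y` to train_test_split is not quite the same thing, since it preserves class proportions rather than forcing equal counts.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'classes': [1, 1, 1, 2, 2, 2, 2],
                     'b': [3, 4, 5, 6, 7, 8, 9],
                     'c': [10, 11, 12, 13, 14, 15, 16]})
X = np.array(data[['b', 'c']])
y = np.array(data['classes'])


def split_equal_per_class(X, y, n_test_per_class, seed=None):
    """Hypothetical helper: draw the same number of test rows from every class."""
    rng = np.random.default_rng(seed)
    test_idx = []
    for label in np.unique(y):
        # Row positions that belong to this class
        label_idx = np.flatnonzero(y == label)
        if len(label_idx) < n_test_per_class:
            raise ValueError(f"class {label} has only {len(label_idx)} samples")
        # Sample without replacement so no row is picked twice
        test_idx.extend(rng.choice(label_idx, size=n_test_per_class, replace=False))
    # Shuffle so the test rows are not grouped by class
    test_idx = rng.permutation(test_idx)
    # Everything that was not picked for testing goes to the training set
    train_mask = np.ones(len(y), dtype=bool)
    train_mask[test_idx] = False
    return X[train_mask], X[test_idx], y[train_mask], y[test_idx]


X_train, X_test, y_train, y_test = split_equal_per_class(
    X, y, n_test_per_class=2, seed=0)
```

With the data above, y_test always contains two 1s and two 2s (in random order), and the remaining three rows end up in the training set.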