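I was recently using the entropy weight method to select the best indicators and found that every computed weight came out as nan. Tracing through the calculation revealed the cause.
Data preview: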
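Steps of the entropy weight method:
step 1: normalize the data
step 2: compute the information entropy of each indicator
step 3: compute the difference coefficient
step 4: compute the weights
step 5: compute the composite score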
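The formulas behind these steps are not written out in the text. A common formulation, which appears to match the Python code in this post, can be sketched as follows. The symbols ($x_{ij}$ for the raw value of sample $i$ on indicator $j$, $x'_{ij}$, $p_{ij}$, $E_j$, $d_j$, $w_j$, $s_i$, with $m$ samples and $n$ indicators) are notation assumed here, not taken from the original.

```latex
\begin{align*}
x'_{ij} &= \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}
  && \text{(step 1: min-max normalization)} \\
p_{ij} &= \frac{x'_{ij}}{\sum_{i=1}^{m} x'_{ij}},
\qquad
E_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}
  && \text{(step 2: information entropy)} \\
d_j &= 1 - E_j
  && \text{(step 3: difference coefficient)} \\
w_j &= \frac{d_j}{\sum_{k=1}^{n} d_k}
  && \text{(step 4: weight)} \\
s_i &= \sum_{j=1}^{n} w_j \, p_{ij}
  && \text{(step 5: composite score)}
\end{align*}
```

Mathematically, the term $p_{ij} \ln p_{ij}$ is usually taken to be 0 when $p_{ij} = 0$, since that is its limit as $p_{ij} \to 0^{+}$. Floating-point arithmetic does not apply this convention automatically.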
Python implementation:
# Imports
import pandas as pd
from sklearn import preprocessing

# Load the data
data = pd.read_excel(r'data\data.xlsx', sheet_name=None)  # sheet_name=None returns a dict of all sheets
df5 = data['2021']
df = df5.drop('class', axis=1)  # axis defaults to 0 (rows); axis=1 drops the 'class' column
df.head()

# Min-max normalization
min_max_normalizer = preprocessing.MinMaxScaler(feature_range=(0, 1))
# feature_range sets the target minimum and maximum, default (0, 1)
scaled_data = min_max_normalizer.fit_transform(df)
# Scale (map) each column onto the fixed range
df_normalized = pd.DataFrame(scaled_data)
# Convert the scaled array back into a DataFrame
df_normalized.columns = [f'X{i}' for i in range(1, 65)]  # X1 ... X64
df_normalized.to_excel('2021年标准化数据.xlsx', index=False)  # file name means "2021 normalized data"
df_normalized
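import copy
import numpy as np

m, n = df_normalized.shape  # m samples (rows), n indicators (columns)

# Compute the information entropy
df_normalized = np.array(df_normalized)
p = copy.deepcopy(df_normalized)
for j in range(n):
    # Share of each sample within indicator j
    p[:, j] = df_normalized[:, j] / np.sum(df_normalized[:, j])
print(p)
p = np.nan_to_num(p)  # a column that sums to 0 gives 0/0 = nan; replace it with 0
# This is the line that produced the nan weights: p contains 0.0,
# np.log(0.0) is -inf, and 0.0 * -inf is nan. A tiny value has to be
# added to p inside the log to avoid this.
E = (-1 / np.log(m)) * np.sum(p * np.log(p), axis=0)
print(E)

# Compute the weights
w = (1 - E) / np.sum(1 - E)
print(w)

# Shape checks used while debugging
print('np.log(p) shape:', np.log(p).shape)
print('p shape:', p.shape)
print('p*np.log(p) shape:', (p * np.log(p)).shape)

# Compute the composite score ('综合得分' means "composite score")
score = np.dot(p, w).round(5)
print(score)
score = pd.DataFrame(score, index=df.index, columns=['综合得分']).sort_values(by=['综合得分'], ascending=False)
score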
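The problem: the weights of all 64 indicators came out as nan.
Inspection showed that the normalized data contains 0.0 values. With min-max scaling this is unavoidable, because the minimum of every column is mapped to 0.
Taking the logarithm of 0 yields negative infinity (-inf), not an infinitesimal, and multiplying 0 by -inf yields nan.
The nan propagates through the sums, so every weight produced by the weight formula is nan (Not a Number).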
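The chain from a 0.0 in the normalized data to a nan entropy can be reproduced on a single toy column. This is a minimal sketch: the three values in `x` are made up for illustration and are not the post's data, and the min-max scaling is written out by hand instead of using `MinMaxScaler`.

```python
import numpy as np

# Hypothetical toy column (not the post's data)
x = np.array([3.0, 5.0, 9.0])

# Min-max scaling by hand: the column minimum always maps to 0.0
scaled = (x - x.min()) / (x.max() - x.min())
p = scaled / scaled.sum()  # proportions within the column
m = len(p)

# Silence the NumPy warnings so the values can be inspected directly
with np.errstate(divide="ignore", invalid="ignore"):
    log_p = np.log(p)   # log(0.0) is -inf
    terms = p * log_p   # 0.0 * -inf is nan
    E = -np.sum(terms) / np.log(m)  # one nan term makes the whole sum nan

print("scaled:", scaled)
print("log_p[0]:", log_p[0], "terms[0]:", terms[0], "E:", E)
```

Because one nan entropy is enough to make `sum(1 - E)` nan, a single affected column would already turn every weight into nan.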
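Fix:
Add a tiny value to p inside the logarithm when computing the entropy.
E = (-1 / np.log(m)) * np.sum(p * np.log(p + 1e-10), axis=0)
With this change the problem was solved.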
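For reuse, the fixed calculation could be wrapped in a small function. The sketch below is one possible packaging, not the author's code: the function name `entropy_weights`, the `eps` parameter, and the toy matrix are assumptions introduced here. It expects non-negative, already normalized data.

```python
import numpy as np

def entropy_weights(X, eps=1e-10):
    """Entropy-method weights for an (m, n) array of non-negative indicators.

    Hypothetical helper: eps is added inside the log, as in the fix above.
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    col_sums = X.sum(axis=0)
    # For an all-zero column, divide by 1 so that p stays 0 instead of 0/0 = nan
    # (same effect as np.nan_to_num in the post's code)
    p = X / np.where(col_sums == 0, 1.0, col_sums)
    E = -np.sum(p * np.log(p + eps), axis=0) / np.log(m)  # entropy per indicator
    d = 1.0 - E                                           # difference coefficient
    return d / d.sum()                                    # weights sum to 1

# Toy normalized data containing zeros (made up for illustration)
X = np.array([[0.0, 0.2, 1.0],
              [0.5, 0.0, 0.0],
              [1.0, 1.0, 0.4]])
w = entropy_weights(X)
print(w, w.sum())
```

Adding `eps` only inside the logarithm keeps the zero entries of p contributing exactly 0 to the entropy, since 0 times a finite number is 0.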