You can follow this article, https://towardsdatascience.com/import-csv-files-as-pandas-dataframe-with-skiprows-skipfooter-usecols-index-col-and-header-fbf67a2f92a, which explains the difference between the header and skiprows parameters using examples from the Olympics dataset, which can be downloaded here: https://github.com/rashida048/Datasets/blob/master/olympics.csv.
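To summarize: the default behavior of pd.read_csv() is to read every row, which for this dataset includes an unnecessary first row of numbers that pandas then treats as the column names.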
import pandas as pd
df = pd.read_csv('olympics.csv')
df.head()
0 1 2 3 4 ... 11 12 13 14 15
0 NaN № Summer 01 ! 02 ! 03 ! ... № Games 01 ! 02 ! 03 ! Combined total
1 Afghanistan (AFG) 13 0 0 2 ... 13 0 0 2 2
2 Algeria (ALG) 12 5 2 8 ... 15 5 2 8 15
3 Argentina (ARG) 23 18 24 28 ... 41 18 24 28 70
4 Armenia (ARM) 5 1 2 9 ... 11 1 2 9 12
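However, the skiprows parameter lets you skip one or more rows when reading the .csv file. Passing an integer skips that many lines from the top of the file: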
df1 = pd.read_csv('olympics.csv', skiprows = 1)
df1.head()
Unnamed: 0 № Summer 01 ! 02 ! ... 01 !.2 02 !.2 03 !.2 Combined total
0 Afghanistan (AFG) 13 0 0 ... 0 0 2 2
1 Algeria (ALG) 12 5 2 ... 5 2 8 15
2 Argentina (ARG) 23 18 24 ... 18 24 28 70
3 Armenia (ARM) 5 1 2 ... 1 2 9 12
4 Australasia (ANZ) [ANZ] 2 3 4 ... 3 4 5 12
If you want to skip several specific rows, pass a list of 0-based line indices instead (note the missing countries):
df2 = pd.read_csv('olympics.csv', skiprows = [0, 2, 3])
df2.head()
Unnamed: 0 № Summer 01 ! 02 ! ... 01 !.2 02 !.2 03 !.2 Combined total
0 Argentina (ARG) 23 18 24 ... 18 24 28 70
1 Armenia (ARM) 5 1 2 ... 1 2 9 12
2 Australasia (ANZ) [ANZ] 2 3 4 ... 3 4 5 12
3 Australia (AUS) [AUS] [Z] 25 139 152 ... 144 155 181 480
4 Austria (AUT) 26 18 33 ... 77 111 116 304
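To try the list form without downloading anything, here is a minimal sketch. The `csv_text` string is a made-up miniature stand-in for olympics.csv (a numbering row, the real header row, then four countries), not the actual file. It also shows that skiprows accepts a callable that receives each line index and returns True for lines to skip.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for olympics.csv (assumption, not the real data):
# line 0 is a numbering row, line 1 is the real header, the rest are data rows.
csv_text = """0,1,2
,Summer,Gold
Afghanistan (AFG),13,0
Algeria (ALG),12,5
Argentina (ARG),23,18
Armenia (ARM),5,1
"""

# Skip file lines 0, 2 and 3: the numbering row and the first two countries.
skip_list = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 2, 3])

# The same selection expressed as a callable evaluated against each line index.
skip_func = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i in {0, 2, 3})

print(skip_list)
print(skip_list.equals(skip_func))
```

Because the indices refer to lines in the file rather than to rows of the resulting DataFrame, line 1 survives and becomes the header, while Afghanistan and Algeria are dropped.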
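The header parameter tells pandas which row to use for the column names; the data is read from the row after it, and any rows before it are discarded. In the following case it has the same effect as skiprows = 1: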
# this gives the same result as df1 = pd.read_csv('olympics.csv', skiprows = 1)
df4 = pd.read_csv('olympics.csv', header = 1)
df4.head()
Unnamed: 0 № Summer 01 ! 02 ! ... 01 !.2 02 !.2 03 !.2 Combined total
0 Afghanistan (AFG) 13 0 0 ... 0 0 2 2
1 Algeria (ALG) 12 5 2 ... 5 2 8 15
2 Argentina (ARG) 23 18 24 ... 18 24 28 70
3 Armenia (ARM) 5 1 2 ... 1 2 9 12
4 Australasia (ANZ) [ANZ] 2 3 4 ... 3 4 5 12
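If you want to confirm the equivalence rather than compare printed output by eye, a small check along these lines should work. As before, `csv_text` is a made-up stand-in for olympics.csv, and the variable names are only illustrative.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for olympics.csv (assumption, not the real data).
csv_text = """0,1,2
,Summer,Gold
Afghanistan (AFG),13,0
Algeria (ALG),12,5
Argentina (ARG),23,18
Armenia (ARM),5,1
"""

# Drop the first line, then let pandas use the next line as the header.
by_skiprows = pd.read_csv(io.StringIO(csv_text), skiprows=1)

# Use line 1 as the header directly; line 0 is discarded.
by_header = pd.read_csv(io.StringIO(csv_text), header=1)

# DataFrame.equals compares column labels, dtypes and values.
print(by_skiprows.equals(by_header))
```

Both calls drop the numbering row and take the second line of the file as the column names.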
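However, you cannot use the header parameter to skip an arbitrary set of rows, so you would not be able to replicate df2 using header. Hope this clarifies things.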