I imported a CSV file (with both text and numeric columns):
x <- fread('myfile.csv', header = TRUE, verbose = TRUE, na.strings = c("null", "'null'", ""))
But after the import, when I run summary(x), every column is treated as character:
mycolumn
Length:100000
Class :character
Mode :character
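Is there a way to get it to recognize the numeric columns as numeric? Below is the verbose output (from a run limited with nrows, to keep it fast):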
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 10.162 GB
File is opened and mapped ok
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Looking for supplied sep '\t' on line 30 (the last non blank line in the first 'autostart') ... found ok
Found 166 columns
First row with 166 fields occurs on line 1 (either column names or first row of data)
'header' changed by user from 'auto' to TRUE
Count of eol after first data row: 6513865
Subtracted 1 for last eol and any trailing empty lines, leaving 6513864 data rows
nrow limited to nrows passed in (100000)
Type codes: 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444441444444444444444444444444444444444444444414444444444444444444444444444444444 (first 5 rows)
Type codes: 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444441444444444444444444444444444444444444444414444444444444444444444444444444444 (+middle 5 rows)
Type codes: 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444441444444444444444444444444444444444444444414444444444444444444444444444444444 (+last 5 rows)
Type codes: 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444441444444444444444444444444444444444444444414444444444444444444444444444444444 (after applying colClasses and integer64)
Type codes: 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444441444444444444444444444444444444444444444414444444444444444444444444444444444 (after applying drop or select (if supplied)
Allocating 166 column slots (166 - 0 NULL)
Read 100000 rows and 166 (of 166) columns from 10.162 GB file in 00:00:04
0.564s ( 15%) Memory map (rerun may be quicker)
0.001s ( 0%) sep and header detection
1.613s ( 43%) Count rows (wc -l)
0.030s ( 1%) Column type detection (first, middle and last 5 rows)
0.015s ( 0%) Allocation of 100000x166 result (xMB) in RAM
1.437s ( 38%) Reading data
0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered
0.000s ( 0%) Coercing data already read in type bumps (if any)
0.080s ( 2%) Changing na.strings to NA
3.739s Total
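The way to specify column classes manually is the colClasses argument. But fread should be able to guess numeric columns on its own, which makes me think some entries in your numeric columns are not numbers. Perhaps you haven't caught every kind of NA value? If so, the uncaught NA values are read as strings, and that causes the whole column to be set to type character.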
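A minimal sketch of that diagnosis, assuming a data.table release recent enough to accept fread(text = ...) (the verbose log above appears to come from an older version). The inline data, the column names and the stray upper-case "NULL" token are hypothetical stand-ins for the real file. The snippet reads the data with the question's na.strings, lists the entries that stop a column from parsing as numeric, then re-reads with the missing token added and pins the types with colClasses.

```r
library(data.table)

# Hypothetical sample: tab-separated like the questioner's file. The "score"
# column is numeric except for one "NULL" (upper case), a missing-value token
# that the original na.strings vector does not list.
txt <- paste(
  "id\tname\tscore",
  "1\talpha\t3.5",
  "2\tbeta\tnull",
  "3\tgamma\tNULL",
  "4\tdelta\t7.25",
  sep = "\n"
)

# Same na.strings as in the question. "NULL" does not match "null" exactly,
# so the whole score column falls back to character.
dt <- fread(text = txt, na.strings = c("null", "'null'", ""))
sapply(dt, class)

# Distinct values that are neither NA nor parseable as numbers: these are
# the entries forcing a would-be numeric column to character.
non_numeric <- function(v) {
  parsed <- suppressWarnings(as.numeric(v))
  unique(v[!is.na(v) & is.na(parsed)])
}
non_numeric(dt$score)

# Scan every character column (first 5 offenders each). Genuine text columns
# such as "name" list ordinary words; a numeric column that lists only a few
# odd tokens is the culprit.
lapply(Filter(is.character, as.list(dt)), function(v) head(non_numeric(v), 5))

# Re-read with the missing token added: score is now detected as numeric.
dt2 <- fread(text = txt, na.strings = c("null", "'null'", "", "NULL"))
sapply(dt2, class)

# Once the NA tokens are all listed, colClasses can state the intended types.
dt3 <- fread(text = txt, na.strings = c("null", "'null'", "", "NULL"),
             colClasses = list(character = "name", numeric = "score"))
str(dt3)
```

On the real file, running the scan on a read limited with nrows is usually enough to spot the extra tokens. Matching in na.strings is exact, so variants such as null, NULL and Null each need their own entry.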