剧透:是的。见下文。
尝试优化字母计数器以匹配 C。我已经与它斗争到了 2 倍的赤字。
letterCount :: B.ByteString -> V.Vector Int
letterCount bs =
V.accumulate
(\a _ -> a + 1)
(V.replicate 256 0)
letters1
where
len = B.length bs
letters1 = V.generate len (\i -> (fromIntegral $! B.index bs i, ()))
一些注意事项:
- 真的很慢直到我改变
Data.Vector
to Data.Vector.Unboxed
。这是为什么?
- 我以为大部分时间都会花在
accumulate
。我错了。
- 70%的时间花在
generate
.
- Haskell 代码不得不毫无意义地将 Word8 转换为 Int;还有,无用的军队
()
实际上可能会或可能不会被创建。
完整清单:
import qualified Data.ByteString as B
import qualified Data.Vector.Unboxed as V
import System.Environment
import Text.Printf
letterCount :: B.ByteString -> V.Vector Int
letterCount bs =
V.accumulate
(\a _ -> a + 1)
(V.replicate 256 0)
letters1
where
len = B.length bs
letters1 = V.generate len (\i -> (fromIntegral $! B.index bs i, ()))
printCounts :: V.Vector Int -> IO ()
printCounts cs =
mapM_
(uncurry $ printf "%c: %d\n")
(zip (map toEnum [0..255] :: String) (V.toList cs))
main :: IO ()
main = do
filename <- fmap head getArgs
f <- B.readFile filename
let counts = letterCount f
printCounts counts
竞争C代码:
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <stdlib.h>
int letcnt [256];
int* letter_count(unsigned char *s, unsigned int len)
{
int i;
memset(letcnt, 0, 256 * sizeof(int));
for(i = 0; i < len; i++){
letcnt[*(s + i)]++;
}
return (letcnt);
}
void print_counts() {
int i;
for(i = 0; i < 256; i++) {
printf("'%c': %d\n", (unsigned char) i, letcnt[i]);
}
}
// st_size
int main(int argc, char **argv)
{
assert(argc == 2);
FILE* f = fopen(argv[1], "r");
struct stat st;
stat(argv[1], &st);
off_t len = st.st_size;
unsigned char* contents = calloc(len, 1);
fread(contents, len, 1, f);
fclose(f);
letter_count(contents, len);
print_counts();
return 0;
}
Timings;
$ time ./a.out /usr/share/dict/words > /dev/null
real 0m0.012s
user 0m0.005s
sys 0m0.005s
$ time ./lettercount /usr/share/dict/words > /dev/null
real 0m0.017s
user 0m0.009s
sys 0m0.007s
Update
我认为性能上限归因于这个错误:runST 不是免费的 https://ghc.haskell.org/trac/ghc/ticket/5916。并不是说我认为不可能进一步优化,但只要 runST 施加一些开销,就不太可能接近 C。
此外,还根据@Zeta 的评论修复了 C 代码。