intelvtune

MKL performance on Intel Xeon Phi

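I have a routine that performs a few MKL calls on small matrices (50-100 x 1000 elements) to fit a model, which I then call for different models. In pseudocode: double doModelFit(int model) ... while (!done) ... cblas_dgemm ... cblas...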

c openmp intelmkl intelvtune intelmic

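I am currently developing a C module for a Java application that needs some performance improvements (see "Improving performance of network coding": https://stackoverflow.com/questions/7737488/improving-performance-of-networ...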

Java c Optimization SSE intelvtune

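I am using Intel VTune Amplifier XE 2011 to analyze the performance of my program. I would like to be able to view source code in the analysis results. The documentation says I need to supply symbol information. Unfortunately, it does not say how to generate that symbol information when compiling my program. In VTune's Windows...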

performance intel intelvtune profiling

This is a follow-up to an existing thread: http://stackoverflow.com/questions/12724887/caching-in-a-high-performance-financial-application. I found that this is not what is holding back my application...

c profiling intelvtune

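I am writing some templated code to benchmark a numerical algorithm using both floats and doubles, in order to compare against a GPU implementation. I found that my float code is slower, and after investigating with Intel's VTune Amplifier I found that g++ is generating extra x86 instructions c...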

c templates g floatingpointprecision intelvtune