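Profiling Truffle Interpreters

There is no shortage of tools for profiling interpreters written with Truffle. When running in JVM mode, you can use standard JVM tooling such as VisualVM, Java Flight Recorder, and Oracle Developer Studio. When running in Native Image, you can use callgrind from the Valgrind tool suite and other system tools such as strace. Because the interpreter is a language running on GraalVM, the other GraalVM tools are available as well. For a broad enough definition of profiling, you can also use the Ideal Graph Visualizer (IGV) and the C1 Visualizer to inspect the compiler output.

This guide is less about how to use each tool and more about how to extract the most useful information from them, assuming you already have a basic knowledge of their usage.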

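Profiling with CPU Sampler

The simplest way to profile at the application level (for example, to find the guest-language functions in which most of the time is spent) is to use CPU Sampler, which is part of the /tools suite and ships with GraalVM. Pass --cpusampler to your language launcher: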

language-launcher --cpusampler --cpusampler.Delay=MILLISECONDS -e 'p :hello'

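You will probably want to set a sampling delay with --cpusampler.Delay=MILLISECONDS so that profiling starts only after warmup. That way, you can easily identify which functions get compiled and which do not get compiled yet still take a significant share of the execution time.

See language-launcher --help:tools for more --cpusampler options.

Getting Compilation Data from the CPU Sampler

By default, CPU Sampler does not show how much time is spent in compiled code. This is at least partly a consequence of multi-tier compilation, under which "compiled code" is no longer descriptive enough. The --cpusampler.ShowTiers option lets you control whether compilation data is shown at all, and exactly which compilation tiers the report should include. For example, adding --cpusampler.ShowTiers=true shows every compilation tier encountered during execution, as shown below.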

-----------------------------------------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 553 samples with period 10ms.
  Self Time: Time spent on the top of the stack.
  Total Time: Time spent somewhere on the stack.
  T0: Percent of time spent in interpreter.
  T1: Percent of time spent in code compiled by tier 1 compiler.
  T2: Percent of time spent in code compiled by tier 2 compiler.
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Thread[main,5,main]
 Name              ||             Total Time    |   T0   |   T1   |   T2   ||              Self Time    |   T0   |   T1   |   T2   || Location
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 accept            ||             4860ms  87.9% |  31.1% |  18.3% |  50.6% ||             4860ms  87.9% |  31.1% |  18.3% |  50.6% || ../primes.js~13-22:191-419
 :program          ||             5530ms 100.0% | 100.0% |   0.0% |   0.0% ||              360ms   6.5% | 100.0% |   0.0% |   0.0% || ../primes.js~1-46:0-982
 next              ||             5150ms  93.1% |  41.7% |  39.4% |  18.8% ||              190ms   3.4% | 100.0% |   0.0% |   0.0% || ../primes.js~31-37:537-737
 DivisibleByFilter ||              190ms   3.4% |  89.5% |  10.5% |   0.0% ||              100ms   1.8% |  80.0% |  20.0% |   0.0% || ../primes.js~7-23:66-421
 AcceptFilter      ||               30ms   0.5% | 100.0% |   0.0% |   0.0% ||               20ms   0.4% | 100.0% |   0.0% |   0.0% || ../primes.js~1-5:0-63
 Primes            ||               40ms   0.7% | 100.0% |   0.0% |   0.0% ||                0ms   0.0% |   0.0% |   0.0% |   0.0% || ../primes.js~25-38:424-739
-----------------------------------------------------------------------------------------------------------------------------------------------------------

Alternatively, --cpusampler.ShowTiers=0,2 shows only the time spent in the interpreter and in code compiled by the tier 2 compiler, as shown below.

-----------------------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 620 samples with period 10ms.
  Self Time: Time spent on the top of the stack.
  Total Time: Time spent somewhere on the stack.
  T0: Percent of time spent in interpreter.
  T2: Percent of time spent in code compiled by tier 2 compiler.
-----------------------------------------------------------------------------------------------------------------------------------------
Thread[main,5,main]
 Name              ||             Total Time    |   T0   |   T2   ||              Self Time    |   T0   |   T2   || Location
-----------------------------------------------------------------------------------------------------------------------------------------
 accept            ||             5510ms  88.9% |  30.9% |  52.3% ||             5510ms  88.9% |  30.9% |  52.3% || ../primes.js~13-22:191-419
 :program          ||             6200ms 100.0% | 100.0% |   0.0% ||              320ms   5.2% | 100.0% |   0.0% || ../primes.js~1-46:0-982
 next              ||             5870ms  94.7% |  37.3% |  20.6% ||              190ms   3.1% |  89.5% |  10.5% || ../primes.js~31-37:537-737
 DivisibleByFilter ||              330ms   5.3% | 100.0% |   0.0% ||              170ms   2.7% | 100.0% |   0.0% || ../primes.js~7-23:66-421
 AcceptFilter      ||               20ms   0.3% | 100.0% |   0.0% ||               10ms   0.2% | 100.0% |   0.0% || ../primes.js~1-5:0-63
 Primes            ||               20ms   0.3% | 100.0% |   0.0% ||                0ms   0.0% |   0.0% |   0.0% || ../primes.js~25-38:424-739
-----------------------------------------------------------------------------------------------------------------------------------------
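When a report has many rows, it can help to rank functions by how much self time they still spend in the interpreter. The following Python sketch parses histogram rows like the ones above. The function names parse_section, parse_histogram, and interpreter_self_ms are illustrative and not part of any GraalVM tooling, and the text layout is assumed to match the sample output shown here; for scripts, the --cpusampler.Output=json format is likely a sturdier input.

```python
import re

# Three rows copied from the --cpusampler.ShowTiers=true report above.
SAMPLE = """\
 accept            ||             4860ms  87.9% |  31.1% |  18.3% |  50.6% ||             4860ms  87.9% |  31.1% |  18.3% |  50.6% || ../primes.js~13-22:191-419
 :program          ||             5530ms 100.0% | 100.0% |   0.0% |   0.0% ||              360ms   6.5% | 100.0% |   0.0% |   0.0% || ../primes.js~1-46:0-982
 next              ||             5150ms  93.1% |  41.7% |  39.4% |  18.8% ||              190ms   3.4% | 100.0% |   0.0% |   0.0% || ../primes.js~31-37:537-737
"""


def parse_section(section):
    """Parse '4860ms  87.9% |  31.1% | ...' into milliseconds and tier percentages."""
    cells = [cell.strip() for cell in section.split("|")]
    ms_text, _share = cells[0].split()
    return {
        "ms": int(ms_text[:-2]),  # drop the trailing "ms"
        "tiers": [float(cell.rstrip("%")) for cell in cells[1:]],
    }


def parse_histogram(text):
    """Return one dict per data row; legend, header, and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("||")]
        # Data rows have four '||'-separated parts and start their time column with digits.
        if len(parts) != 4 or not re.match(r"\d+ms", parts[1]):
            continue
        name, total, self_time, location = parts
        rows.append({
            "name": name,
            "total": parse_section(total),
            "self": parse_section(self_time),
            "location": location,
        })
    return rows


def interpreter_self_ms(row):
    """Self time attributed to the first tier column (T0, the interpreter)."""
    return row["self"]["ms"] * row["self"]["tiers"][0] / 100.0


if __name__ == "__main__":
    for row in sorted(parse_histogram(SAMPLE), key=interpreter_self_ms, reverse=True):
        print(f'{row["name"]:<12} {interpreter_self_ms(row):8.1f} ms in interpreter')
```

The sketch assumes T0 is the first tier column, so it only makes sense for reports that include tier 0 (for example ShowTiers=true or ShowTiers=0,2).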

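Creating a Flame Graph from CPU Sampler

The histogram output from CPU Sampler can be quite large, which makes it hard to analyze. Moreover, because it is a flat format, it cannot be used to analyze the call graph: that information is simply not encoded in the output. A flame graph shows the entire call graph, and its structure makes it considerably easier to see where the application spends its time.

Creating a flame graph is a multi-stage process. First, we need to profile the application with the JSON formatter: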

language-launcher --cpusampler --cpusampler.SampleInternal --cpusampler.Output=json -e 'p :hello' > simple-app.json

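Use the --cpusampler.SampleInternal=true option if you want to profile internal sources, such as standard library functions.

The JSON formatter encodes call graph information that is not available in the histogram format. To build a flame graph from this output, however, we need to transform it into a format that folds each call stack sample into a single line. This can be done with stackcollapse-graalvm.rb from Benoit Daloze's fork of FlameGraph.

If you have not done so already, clone this fork of FlameGraph into the parent directory. You can then run the script to transform the output and pipe it into the script that generates the SVG data: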

../FlameGraph/stackcollapse-graalvm.rb simple-app.json | ../FlameGraph/flamegraph.pl > simple-app.svg
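To make the target format concrete: flamegraph.pl reads one line per distinct call stack, with frames joined by semicolons from root to leaf and followed by a sample count. The Python sketch below produces such lines from a small hand-written call tree. The tree layout (name, self_hits, children) and the hit counts are hypothetical and do not reflect the actual CPU Sampler JSON schema, which stackcollapse-graalvm.rb handles for you.

```python
def collapse(node, prefix=()):
    """Yield folded-stack lines ("root;child;leaf count") for a call tree.

    `node` is a hypothetical dict with "name", "self_hits", and "children";
    it is NOT the real CPU Sampler JSON layout.
    """
    stack = prefix + (node["name"],)
    hits = node.get("self_hits", 0)
    if hits > 0:
        # Only stacks that were actually on top of the stack get a line.
        yield ";".join(stack) + " " + str(hits)
    for child in node.get("children", []):
        yield from collapse(child, stack)


# Illustrative tree using function names from the report above; counts are made up.
TREE = {
    "name": ":program",
    "self_hits": 36,
    "children": [
        {
            "name": "next",
            "self_hits": 19,
            "children": [
                {"name": "accept", "self_hits": 486, "children": []},
            ],
        },
    ],
}

if __name__ == "__main__":
    for line in collapse(TREE):
        print(line)
```

Each emitted line carries only the self hits of its leaf frame; flamegraph.pl derives the width of parent frames by summing the lines that share a prefix.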

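At this point, open the SVG file in a Chromium-based web browser. Your system may have a different image application configured as the default for SVG files. Such an application may well render the graph, but it is unlikely to handle the interactive components of the flame graph. Firefox works too, but Chromium-based browsers currently seem to offer better support and performance for flame graph files.

Profiling with Oracle Developer Studio

Oracle Developer Studio includes a Performance Analyzer that can be used with GraalVM. Developer Studio can be downloaded from the Oracle Technology Network (OTN), and the current version at the time of writing (12.6) comes with a perpetual no-cost license for production use and for the development of commercial applications.

Using the Developer Studio Performance Analyzer is straightforward. Add the directory containing the Developer Studio binaries to your PATH, and then prefix your normal command line with collect. For example: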

collect js mybenchmark.js

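When the command finishes, an "experiment" (.er) directory is created that contains the profiling data for the run; by default it is named test.1.er. To view the profiling results, use the analyzer tool: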

analyzer test.1.er

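The analyzer GUI lets you view the captured profiling data in several ways, such as a timeline of your application, a flat function list, a call tree, and a flame graph. There is also a command-line tool, er_print, which outputs the profiling data in textual form for further analysis.

For full details, see the Performance Analyzer documentation.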
