1億イベントのtreeデータのグラフ化の速度をPythonとROOTで比較する

この記事の Awkward のバージョンは1.X.Xです

データはここのを使う。64バイト(double 8個)×1億イベント(エントリー)である。

phst.hateblo.jp

結果のまとめ

言語	ライブラリ	ファイル形式	処理時間	スレッド数
Python	awkward	parquet	46 sec	1
C++	ROOT TTree	ROOT	100 sec	1
C++	ROOT RDataFrame	ROOT	22 sec (不安定)	8
Python	uproot+awkward	ROOT	76 sec	1

まずはPythonコード。Pythonでは標準であろうparquet形式を読み込み、テンポラリーなメンバー da を作成する。さらに、カット条件を適用したテンポラリー tree_cutも作成する。matplotlibで描画する。手元のPCでは約46秒かかった。

import time
import awkward as ak
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter

start = time.time()

tree = ak.from_parquet("tree.parquet", lazy=True)

tree["da"] = ((tree["ax0"]-tree["ax1"])**2+(tree["ay0"]-tree["ay1"])**2)**0.5
tree_cut = tree[tree["da"]<0.4]

fig, axes = plt.subplots(ncols=2,nrows=2)
ax=axes.flat[0]
ax.hist2d(tree["x"],tree["y"],bins=[300,300],range=[[0,300],[0,300]])

ax=axes.flat[1]
ax.hist2d(tree["ax"],tree["ay"],bins=[150,150],range=[[-1.5,1.5],[-1.5,1.5]])

ax=axes.flat[2]
ax.hist(tree["da"], bins=1000,range=[0,2])
ax.yaxis.set_major_formatter(ScalarFormatter(useMathText=True))
ax.ticklabel_format(axis="y", style='scientific', scilimits=[3,3])
ax.set_xlim(0,2.0)

ax=axes.flat[3]
ax.hist(tree_cut["da"], bins=1000,range=[0,2])
ax.yaxis.set_major_formatter(ScalarFormatter(useMathText=True))
ax.ticklabel_format(axis="y", style='scientific', scilimits=[3,3])
ax.set_xlim(0,2.0)

plt.savefig("plot_python.pdf")

plt.clf()
plt.cla()
plt.close()

end = time.time()
print(f"{(end-start)*1000:.0f} [msec]")

次に、ROOTで書く。普通の TFile TTreeを使った書き方である。手元のPCでは約100秒 (Pythonの約2倍)かかった。

#include <TFile.h>
#include <TTree.h>
#include <TH2D.h>
#include <TH1D.h>
#include <TCanvas.h>

#include <iostream>

void makeGraph1() {

    TFile* f = new TFile("output.root");
    TTree* tree = (TTree*)f->Get("tree");

    TH2D* h1 = new TH2D("h1", "pos. dist.", 300, 0, 300, 300, 0, 300);
    TH2D* h2 = new TH2D("h2", "ang. dist.", 150, -1.5, 1.5, 150, -1.5, 1.5);

    TH1D* h3 = new TH1D("h3", "before cut", 1000, 0, 2);
    TH1D* h4 = new TH1D("h4", "after cut", 1000, 0, 2);

    TCanvas* c1 = new TCanvas();
    c1->Divide(2, 2);
    c1->cd(1); tree->Draw("y:x>>h1", "", "colz");
    c1->cd(2); tree->Draw("ay:ax>>h2", "", "colz");
    c1->cd(3); tree->Draw("sqrt((ax0-ax1)*(ax0-ax1)+(ay0-ay1)*(ay0-ay1))>>h3");
    c1->cd(4); tree->Draw("sqrt((ax0-ax1)*(ax0-ax1)+(ay0-ay1)*(ay0-ay1))>>h4", "(ax0-ax1)*(ax0-ax1)+(ay0-ay1)*(ay0-ay1)<0.4*0.4");
    c1->Print("plot_root1.pdf");
}

RDataFrameを使うと高速化できるらしい。ただし、手元の環境では、コンパイルには成功するものの、実行すると、正常終了したり、コード -1073740940 で終了しました。 というエラーメッセージが出て終了してしまったりと、挙動が安定しない。成功すると、全スレッド使って約22秒で終了する。数スレッド並列にするだけで30秒台になったので、有効なのは間違いないが、Pythonは並列なしで46秒なのでコアあたりの速度はPythonの方がこの程度ではまだ優位である。

環境

root_v6.30.04.win64.vc17
VS 2022 17.9.1
Windows 11 Pro 23H2

#include <ROOT/RDataFrame.hxx>
#include <TCanvas.h>

void makeGraph2() {

    ROOT::EnableImplicitMT();
    ROOT::RDataFrame df("tree", "output.root");

    auto dang_cut = [](double ax0, double ay0, double ax1, double ay1) {return (ax0 - ax1) * (ax0 - ax1) + (ay0 - ay1) * (ay0 - ay1) < 0.4 * 0.4; };
    auto df_cut = df.Filter(dang_cut, { "ax0","ay0","ax1","ay1" });

    auto h1 = df.Histo2D({ "h1","pos. dist.",300,0,300,300,0,300 }, "x", "y");
    auto h2 = df.Histo2D({ "h2","ang. dist.",150,-1.5,1.5,150,-1.5,1.5 }, "ax", "ay");
    auto h3 = df.Define("da", "sqrt((ax0-ax1)*(ax0-ax1)+(ay0-ay1)*(ay0-ay1))").Histo1D({ "h3", "before cut", 1000, 0, 2 }, "da");
    auto h4 = df_cut.Define("da", "sqrt((ax0-ax1)*(ax0-ax1)+(ay0-ay1)*(ay0-ay1))").Histo1D({ "h4", "after cut", 1000, 0, 2 }, "da");

    TCanvas* c1 = new TCanvas();
    c1->Divide(2, 2);
    c1->cd(1); h1->Draw("colz");
    c1->cd(2); h2->Draw("colz");
    c1->cd(3); h3->Draw();
    c1->cd(4); h4->Draw();
    c1->Print("plot_root_MT.pdf");
}

コード -1073740940 はDWORDで C0000374。ヒープ破損の例外が飛んだことを意味する。

最後に、Pythonでrootファイルを読む。uprootというライブラリを使うと良い。読んだ後の形式は awkward=ak とする。awkward から直接 parquetファイルを読んだ場合よりも 30秒余分にかかったので、(少なくともデフォルトの)ROOTファイルのパフォーマンスはparquetより低いことが分かる。

import uproot
tree = uproot.concatenate("output.root:tree",library ="ak")

物理の駅 Physics station by 現役研究者

テクノロジーは共有されてこそ栄える

1億イベントのtreeデータのグラフ化の速度をPythonとROOTで比較する