\(
\newcommand{\P}[]{\unicode{xB6}}
\newcommand{\AA}[]{\unicode{x212B}}
\newcommand{\empty}[]{\emptyset}
\newcommand{\O}[]{\emptyset}
\newcommand{\Alpha}[]{Α}
\newcommand{\Beta}[]{Β}
\newcommand{\Epsilon}[]{Ε}
\newcommand{\Iota}[]{Ι}
\newcommand{\Kappa}[]{Κ}
\newcommand{\Rho}[]{Ρ}
\newcommand{\Tau}[]{Τ}
\newcommand{\Zeta}[]{Ζ}
\newcommand{\Mu}[]{\unicode{x039C}}
\newcommand{\Chi}[]{Χ}
\newcommand{\Eta}[]{\unicode{x0397}}
\newcommand{\Nu}[]{\unicode{x039D}}
\newcommand{\Omicron}[]{\unicode{x039F}}
\DeclareMathOperator{\sgn}{sgn}
\def\oiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x222F}\,}{\unicode{x222F}}{\unicode{x222F}}{\unicode{x222F}}}\,}\nolimits}
\def\oiiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x2230}\,}{\unicode{x2230}}{\unicode{x2230}}{\unicode{x2230}}}\,}\nolimits}
\)
Apache Beam is a library for building parallel data pipelines.
Such pipelines are executed on a runner such as Apache Spark. Apache Beam is originally developed by Google.
Usage
Programming guide, examples in Python.
Background
Data are referred to as PCollection
Create
Map
ParDo
Pardo allows you to pass in a function and generate multiple items.
If you are yielding many items though, you should do a beam.Reshuffle()
afterwards to split and get more parallelism.