# Apache Beam


Apache Beam is a library for building parallel data processing pipelines.
A pipeline is executed by a runner, such as Apache Flink. Apache Beam was originally developed by Google.

## Usage

These notes follow the Beam programming guide; the examples are in Python.

### Background

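A data set in a pipeline is represented as a `PCollection`. Each transform takes one or more `PCollection`s as input and produces new ones as output.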

### ParDo

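`ParDo` applies a function that you supply to every element of a `PCollection`; the function may emit zero, one, or many output items per input element.
If each input yields many items, apply `beam.Reshuffle()` afterwards so the outputs are redistributed across workers and the following steps get more parallelism.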