Apache Beam

[https://beam.apache.org/ Apache Beam] is a library for building parallel data pipelines.<br>
Such pipelines are executed on a runner such as Apache Flink. Apache Beam was originally developed by Google.


==Usage==
<code>ParDo</code> lets you pass in a function that can generate multiple output items for each input element.<br>
If you are yielding many items, though, you should add a <code>beam.Reshuffle()</code> afterwards to redistribute the items and get more parallelism.
===GroupByKey===
==Administration==
How to set up Apache Beam running on Flink and Kubernetes.
===Resources===
* [https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md Apache Beam Python Jobs with Flink K8s operator]
** [https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/tree/master/examples/beam/with_job_server Flink on K8s YAML]
* [https://python.plainenglish.io/apache-beam-flink-cluster-kubernetes-python-a1965f37b7cb Beam+Flink+Kubernetes+Python]
* [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#getting-started Flink on native Kubernetes]