Apache Beam: Difference between revisions
[https://beam.apache.org/ Apache Beam] is a library for building parallel data pipelines.<br>
Such pipelines are executed on a runner such as Apache Flink. Apache Beam was originally developed by Google.
==Usage==
<code>ParDo</code> lets you pass in a function that can generate multiple output items for each input element.<br>
If you are yielding many items, though, you should apply <code>beam.Reshuffle()</code> afterwards to redistribute them and get more parallelism.
===GroupByKey===
==Administration==
How to set up Apache Beam running on Flink and Kubernetes.
===Resources===
* [https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md Apache Beam Python Jobs with Flink K8s operator]
** [https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/tree/master/examples/beam/with_job_server Flink on K8s YAML]
* [https://python.plainenglish.io/apache-beam-flink-cluster-kubernetes-python-a1965f37b7cb Beam+Flink+Kubernetes+Python]
* [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#getting-started Flink on native Kubernetes]