Skip to main content

Posts

Showing posts from March, 2019

Developing data processing job using Apache Beam - Windowing

Talking about Streaming data processing, it's definitively worth to mention "Windowing".  An example, showed in the previous post about streaming, was quite simple and didn't incorporate any aggregation operations like GroupByKey or Combine. But what if we need to perform such data transforms on unbounded stream? The answer is not as obvious as for bounded data collections because, in this case. we don't have an exact moment on a time line when we can conclude that all data has been arrived. So, for this purpose, most of the data processing engines use a conception of Windowing that allows to split unbounded data stream into time "windows" and perform such operation inside it. Beam is not an exception and it provides rich API for doing that. Problem description Let's take another example to see how windowing works in practice. Just to recall, the previous task was the following – there is unlimited stream of geo coordinates (X and Y) of some obje