flink Window的Timestamps/Watermarks和allowedLateness的区别

2021-03-30 08:26

阅读：533

标签：ast data- eth ring 部分 reg after example apach

Watermartks是通过additional的时间戳来控制窗口激活的时间，allowedLateness来控制窗口的销毁时间。

注：因为此特性包括官方文档在1.3～1.5版本均未做改变，所以此处使用1.5版的文档

在EventTime的情况下，

1. 一条记录的事件时间来控制此条记录属于哪一个窗口，Watermarks来控制这个窗口什么时候激活。

2. 假如一个窗口时间为00:00:00～00:00:05，Watermarks为5秒，那么当flink收到事件事件为00:00:10秒的数据时，即Watermarks到达00:00:05，激活这个窗口。

3. Watermarks激活窗口的方式，官方文档推荐为复写AssignerWithPeriodicWatermarks，与我们当前项目实现方式一致[https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/event_timestamps_watermarks.html]

4. 或者也可以使用我们项目中已经用到的在env级别下的config中设置watermark的方式

env.getConfig().setAutoWatermarkInterval(applConfig.getWatermarkInterval());

当窗口被激活且运行完毕以后，此时这个窗口不一定被销毁，窗口状态有可能会被继续保持，这一点取决于allowedLateness

[https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/windows.html#allowed-lateness]

In a nutshell, a window is created as soon as the first element that should belong to this window arrives, and the window is completely removed when the time (event or processing time) passes its end timestamp plus the user-specified allowed lateness (see Allowed Lateness). Flink guarantees removal only for time-based windows and not for other types, e.g. global windows (see Window Assigners). For example, with an event-time-based windowing strategy that creates non-overlapping (or tumbling) windows every 5 minutes and has an allowed lateness of 1 min, Flink will create a new window for the interval between 12:00 and 12:05 when the first element with a timestamp that falls into this interval arrives, and it will remove it when the watermark passes the 12:06 timestamp.

In addition, each window will have a Trigger (see Triggers) and a function (ProcessWindowFunction, ReduceFunction, AggregateFunction or FoldFunction) (see Window Functions) attached to it. The function will contain the computation to be applied to the contents of the window, while the Trigger specifies the conditions under which the window is considered ready for the function to be applied. A triggering policy might be something like “when the number of elements in the window is more than 4”, or “when the watermark passes the end of the window”. A trigger can also decide to purge a window’s contents any time between its creation and removal. Purging in this case only refers to the elements in the window, and not the window metadata. This means that new data can still be added to that window.

Apart from the above, you can specify an Evictor (see Evictors) which will be able to remove elements from the window after the trigger fires and before and/or after the function is applied.[https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/windows.html#window-lifecycle]

1. 假如设置allowedLateness为60秒，那么窗口的状态会一直保持到事件时间为00:01:05的数据到达，或者如果最后一条数据早于00:01:05秒，则等到最后一条数据到达后再等待此数据于00:01:05的差值时间。

2. 那么在窗口被销毁前，可以通过一些方式再次激活。注意，allowedLateness只能控制窗口销毁行为，并不能控制窗口再次激活的行为，这是独立的两部分行为。

3. 官方文档推荐的方式为Getting late data as a side output，可以单独获得再次被激活的窗口流https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/windows.html#getting-late-data-as-a-side-output

目前不确定原始流内是否也包含了再次被激活的窗口数据，待测试，从代码上看应该也包含在内。

已确认，原始流内窗口也会被重新激活一次

final OutputTag lateOutputTag = new OutputTag("late-data"){};

DataStream input = ...;

SingleOutputStreamOperator result = input

.keyBy()

.window()

.allowedLateness()

.sideOutputLateData(lateOutputTag)

.();

DataStream lateStream = result.getSideOutput(lateOutputTag);

4. 或者复写Triggers[https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/windows.html#triggers]

flink Window的Timestamps/Watermarks和allowedLateness的区别

标签：ast data- eth ring 部分 reg after example apach

原文地址：https://www.cnblogs.com/jiang-it/p/9280946.html

上一篇：用laravel dingo/api创建简单的api

下一篇：window下的redis安装配置

文章来自：搜素材网的编程语言模块，转载请注明文章出处。
文章标题：flink Window的Timestamps/Watermarks和allowedLateness的区别
文章链接：http://soscw.com/index.php/essay/69905.html

亲，登录后才可以留言！

flink Window的Timestamps/Watermarks和allowedLateness的区别

评论

热门文章

推荐文章

最新文章

置顶文章