Dataflow is a distributed, stream-based batch processing engine for Big Data applications. You write tasks to be executed on a dataset. Each task is compiled into execution graphs, which are passed as JSON commands to the corresponding worker servers for execution.
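To make the compile-and-dispatch idea concrete, here is a toy sketch in plain Java. It does not use the Dataflow API: `ToyDataflowPlan`, `Node`, `compile`, `sampleGraph` and the worker names are all hypothetical, and the JSON layout is invented for illustration rather than taken from Dataflow's actual wire format. It only shows the general shape of the process, assuming each graph node is pinned to one worker and each worker receives a single JSON command listing its nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; none of these names come from the Dataflow module.
class ToyDataflowPlan {

    // One node of an execution graph: an operation assigned to a worker,
    // consuming the outputs of the nodes listed in `inputs`.
    static final class Node {
        final int id;
        final String op;
        final String worker;
        final List<Integer> inputs;

        Node(int id, String op, String worker, List<Integer> inputs) {
            this.id = id;
            this.op = op;
            this.worker = worker;
            this.inputs = inputs;
        }

        // Renders the node as a JSON object (invented layout).
        String toJson() {
            return "{\"id\":" + id + ",\"op\":\"" + op + "\",\"inputs\":" + inputs + "}";
        }
    }

    // Groups the graph's nodes by worker and builds one JSON command per worker.
    static Map<String, String> compile(List<Node> graph) {
        Map<String, List<String>> byWorker = new LinkedHashMap<>();
        for (Node node : graph) {
            byWorker.computeIfAbsent(node.worker, w -> new ArrayList<>()).add(node.toJson());
        }
        Map<String, String> commands = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : byWorker.entrySet()) {
            commands.put(e.getKey(), "{\"nodes\":[" + String.join(",", e.getValue()) + "]}");
        }
        return commands;
    }

    // A small task: read two partitions, map each one, merge the results on worker-1.
    static List<Node> sampleGraph() {
        return Arrays.asList(
                new Node(0, "read", "worker-1", Collections.<Integer>emptyList()),
                new Node(1, "read", "worker-2", Collections.<Integer>emptyList()),
                new Node(2, "map", "worker-1", Arrays.asList(0)),
                new Node(3, "map", "worker-2", Arrays.asList(1)),
                new Node(4, "merge", "worker-1", Arrays.asList(2, 3)));
    }

    public static void main(String[] args) {
        // Print the command that would be sent to each worker.
        compile(sampleGraph()).forEach((worker, json) -> System.out.println(worker + " <- " + json));
    }
}
```

In a real deployment the commands would be sent over the network to the worker servers instead of being printed. The sketch keeps everything in one process so that the graph-to-commands step is easy to follow.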

You can add the Dataflow module to your project by adding the following dependency to your pom.xml:

<dependency>
    <groupId>io.datakernel</groupId>
    <artifactId>datakernel-dataflow</artifactId>
    <version>3.0.0-SNAPSHOT</version>
</dependency>

The source code of this module is available in the GitHub repository.