PROCESSORS

Processors are built-in operators that process streaming data by performing various transformations and analytical operations.

For Spark pipelines, you can use the following processors:

 Processor                 Description
 Window                    Collects input stream data over a time window or range.
 Sort                      Sorts input stream values in ascending or descending order.
 Alert                     Generates alerts based on specified rules.
 Enrich                    Looks up data in internal and external stores to enrich the stream.
 Join                      Joins two or more input streams.
 Distinct                  Removes duplicate values from an input stream.
 Group                     Groups input streams by a key.
 Union                     Merges two or more input streams into a single stream.
 Filter                    Filters input stream values based on specified rules.
 SQL                       Runs SQL queries on streaming data.
 Intersection              Detects common and unique values across two or more input streams.
 Aggregation               Aggregates input stream values using min, max, count, sum, or avg.
 Associative Aggregation   Performs aggregation functions over batch data or grouped input values.
 Cumulative Aggregation    Performs aggregation functions on the input streams cumulatively.
 FlatMap                   Produces multiple output records for a single input record.
 MapToPair                 Returns a paired DStream containing a dataset of (key, value) pairs.
 Scala Processor           Implements your custom logic written in Scala in a pipeline.
 Persist                   Stores Spark RDD data in memory.
 Repartition               Reshuffles the data in an RDD to balance it across partitions.
 TransformByKey            Performs a TransformByKey operation on the dataset.
 Take                      Performs a take(n) operation on the dataset.
 Custom                    Implements your custom logic in a pipeline.
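
Several of these processors correspond directly to Spark Streaming (DStream) operations. The sketch below is written against the public Spark Streaming API rather than taken from a generated pipeline; it shows roughly how Window, Filter, FlatMap, MapToPair, Aggregation, and Persist behave. The socket source, host, port, and word-count logic are illustrative assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ProcessorSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("processor-sketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Source: a socket text stream standing in for any channel feeding the pipeline.
    val lines = ssc.socketTextStream("localhost", 9999)

    // FlatMap: one input record produces multiple output records (words).
    val words = lines.flatMap(_.split("\\s+"))

    // Filter: keep only the values that satisfy a rule.
    val nonEmpty = words.filter(_.nonEmpty)

    // MapToPair: build a (key, value) paired stream.
    val pairs = nonEmpty.map(word => (word, 1))

    // Window + Aggregation: sum the counts per key over a 30-second window
    // that slides every 10 seconds.
    val counts = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    // Persist: keep the windowed result in memory for reuse by downstream operators.
    counts.persist()

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that reduceByKeyAndWindow combines the windowing and aggregation steps in one call; an explicit window() followed by reduceByKey() is equivalent.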

 

For Storm pipelines, you can use the following processors:

 Processor                 Description
 Timer                     Collects input stream data over a time window or range.
 Alert                     Generates alerts based on specified rules.
 Enrich                    Looks up data in internal and external stores to enrich the stream.
 Aggregation               Performs min, max, count, sum, or avg operations on incoming message fields.
 Filter                    Filters input stream values based on specified rules.
 Custom                    Implements your custom logic in a pipeline.
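
Storm processors apply their logic inside bolts. Below is a minimal Scala sketch of the kind of bolt a Filter (or Custom) processor amounts to, written against the public Storm API; the class name, the "status" and "message" fields, and the rule itself are illustrative assumptions, not product-generated code.

```scala
import org.apache.storm.topology.base.BaseBasicBolt
import org.apache.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer}
import org.apache.storm.tuple.{Fields, Tuple, Values}

// Forwards only the tuples whose "status" field matches the rule and drops
// everything else, which is the essence of the Filter processor.
class StatusFilterBolt extends BaseBasicBolt {

  override def execute(input: Tuple, collector: BasicOutputCollector): Unit = {
    val status = input.getStringByField("status")
    if (status == "ERROR") {
      collector.emit(new Values(input.getStringByField("message"), status))
    }
  }

  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit = {
    declarer.declare(new Fields("message", "status"))
  }
}
```

A bolt implementing aggregation or alerting logic would follow the same execute/declareOutputFields pattern, with the rule replaced by a running computation or a notification side effect.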