Data Pipeline

StreamAnalytix provides a powerful visual toolkit for developing real-time streaming analytics applications with minimal coding. Its interactive UI with drag-and-drop controls lets you easily build and modify a data processing pipeline to suit your use case.

A data pipeline is a structured flow of data that collects, processes, and analyzes high-volume data to generate real-time insights.

Put another way, pipelines are applications for processing data flows, built from three kinds of components: Channels, Processors, and Emitters.

Types of Pipelines

A pipeline is of type Spark or Storm, depending on which stream processing engine you choose to build your application.

Storm Pipelines use the core Apache Storm streaming APIs (Spouts and Bolts) to process streams one tuple at a time.

Spark Pipelines use the Apache Spark Streaming APIs to process streaming data in micro-batches, enabling scalable, high-throughput, fault-tolerant processing of live data streams.
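The difference between the two engines is the unit of work: Storm hands each tuple to the next component as it arrives, while Spark Streaming slices the live stream into small batches and processes each batch as a whole. The micro-batch idea can be illustrated in plain Python (this is a conceptual sketch, not the Spark API):

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group an unbounded stream into fixed-size micro-batches,
    analogous to how Spark Streaming slices a live stream into
    small batches processed as units."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Each batch is handed to downstream operators as a whole,
# rather than one tuple at a time as in Storm.
events = range(7)
batches = list(micro_batches(events, 3))
# batches == [[0, 1, 2], [3, 4, 5], [6]]
```

In the real engine the batch boundary is a time interval (the batch duration) rather than a fixed count, but the processing model is the same.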

Pipeline Configuration

A data pipeline is a structured flow of data created using Channels, Processors, and Emitters, which together collect, process, and analyze high-volume data to generate real-time insights.

The general progression in a pipeline is:

  • Pre-run operations: any actions required before the pipeline can actually run, such as configuring Workspaces, Groups, and Messages to define the streaming data schema.
  • Data Ingestion: obtaining data from a source or sources by configuring Channels.
  • Data Transformation: manipulating the data acquired from the sources using Processors.
  • Data Analytics: applying math or analytical operations on the data of interest.
  • Data Publishing: saving the results in a data store or in a data visualization/BI tool for analysis using Emitters.
  • Post-run operations: receiving real-time notifications so you can act immediately on unwanted activity or interesting events.
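The middle four stages of this progression can be sketched as a chain of small functions. The names below (`channel`, `to_fahrenheit`, `rolling_average`, `emitter`) are illustrative stand-ins for StreamAnalytix components, not the product's API:

```python
def channel():
    """Data Ingestion: a Channel yields records from a source."""
    yield from [{"sensor": "t1", "celsius": 21.0},
                {"sensor": "t1", "celsius": 23.0},
                {"sensor": "t1", "celsius": 25.0}]

def to_fahrenheit(records):
    """Data Transformation: a Processor reshapes each record."""
    for r in records:
        yield {**r, "fahrenheit": r["celsius"] * 9 / 5 + 32}

def rolling_average(records, key):
    """Data Analytics: apply a math operation on the field of interest."""
    total = count = 0
    for r in records:
        total += r[key]
        count += 1
        yield {**r, "avg": total / count}

def emitter(records, sink):
    """Data Publishing: an Emitter writes results to a store."""
    for r in records:
        sink.append(r)

results = []
emitter(rolling_average(to_fahrenheit(channel()), "fahrenheit"), results)
```

In the product these stages are connected visually on the canvas rather than composed in code, but the data flows through them in the same source-to-sink order.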

Group and Messages

A Group lets you map similar fields from multiple messages to a common field alias. Grouping similar types of messages into a single message group improves efficiency and performance, resulting in faster, more efficient retrieval of data.

A Message defines the structure, or schema, of the streaming data.
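As a rough illustration of the relationship between the two (plain Python dictionaries, not the product's configuration format): each Message declares the fields of one record type, and a Group maps differently named fields from several messages onto one common alias.

```python
# Two hypothetical Messages: each declares a record schema.
order_message = {"fields": {"order_id": "string", "amount": "double"}}
invoice_message = {"fields": {"invoice_no": "string", "total": "double"}}

# A hypothetical Group: common alias -> field name per message type.
billing_group = {
    "doc_id": {"order": "order_id", "invoice": "invoice_no"},
    "value":  {"order": "amount",   "invoice": "total"},
}

def normalize(message_type, record, group):
    """Rename a record's fields to the group's common aliases,
    so downstream components see one uniform schema."""
    return {alias: record[mapping[message_type]]
            for alias, mapping in group.items()}

result = normalize("invoice", {"invoice_no": "INV-7", "total": 99.5},
                   billing_group)
# result == {"doc_id": "INV-7", "value": 99.5}
```

Because both record types arrive downstream under the same aliases, a single processor can handle them, which is the efficiency gain grouping provides.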

Visual Pipeline Designer

StreamAnalytix provides a visual designer interface for building data pipelines.

Using the visual designer interface, you can:

  • Design, import, and export real-time data pipelines.
  • Drag, drop and connect operators to create applications.
  • Monitor detailed metrics of each task and each instance.
  • Run PMML-based scripts in real-time on every incoming message.
  • Explore real-time data using interactive charts and dashboards.
  • Integrate with third party applications by publishing data to Kafka, WebSocket, or any other service.