Amazon Kinesis Data Firehose now also supports dynamic partitioning, which continuously groups in-transit data using dynamically or statically defined keys and delivers the data into individual Amazon S3 prefixes by key. This reduces time-to-insight by minutes, lowers cost, and simplifies the overall architecture.
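As a rough sketch of what dynamic partitioning looks like in practice, the snippet below builds a Firehose ExtendedS3 destination configuration (as you might pass to boto3's `create_delivery_stream`) that partitions incoming JSON records by a field. The bucket and role ARNs and the `customer_id` key are illustrative assumptions, not values from the article:

```python
# Sketch: Firehose ExtendedS3 destination config with dynamic partitioning.
# ARNs and the partition key name below are placeholders.
import json

def firehose_dynamic_partitioning_config(bucket_arn, role_arn, partition_key):
    """Build an ExtendedS3 destination config that partitions by a JSON field."""
    return {
        "BucketARN": bucket_arn,
        "RoleARN": role_arn,
        # Keys extracted in flight are substituted into the S3 prefix.
        "Prefix": f"data/{partition_key}=!{{partitionKeyFromQuery:{partition_key}}}/",
        "ErrorOutputPrefix": "errors/",
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                # MetadataExtraction pulls the partition key out of each record
                # using a JQ expression.
                "Type": "MetadataExtraction",
                "Parameters": [
                    {"ParameterName": "MetadataExtractionQuery",
                     "ParameterValue": f"{{{partition_key}:.{partition_key}}}"},
                    {"ParameterName": "JsonParsingEngine",
                     "ParameterValue": "JQ-1.6"},
                ],
            }],
        },
    }

config = firehose_dynamic_partitioning_config(
    "arn:aws:s3:::example-bucket",
    "arn:aws:iam::123456789012:role/firehose-role",
    "customer_id")
print(json.dumps(config["Prefix"]))
```

With this configuration, records with different `customer_id` values land under separate S3 prefixes, ready for per-key querying.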
Working with Streaming Data
For working with streaming data using Apache Flink, AWS offers the Amazon Kinesis Data Analytics service. Like Amazon Kinesis Data Streams and Kinesis Data Firehose, it is a fully managed, serverless Apache Flink environment for stateful processing with sub-second latency. It integrates with several AWS services, supports custom connectors, and offers a notebook interface called Kinesis Data Analytics Studio (KDA Studio), a managed Apache Zeppelin notebook that lets you interact with streaming data.
Similar to Kinesis Data Analytics for Apache Flink, Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service for running highly available, event-driven Apache Kafka applications.
Amazon MSK operates, maintains, and scales Apache Kafka clusters, provides enterprise-grade security features, supports Kafka Connect, and offers multiple built-in AWS integrations.
Architecture for Real-Time Reporting
Here we derive insights from data arriving from diverse sources and generate near real-time dashboards. In the architecture below, you stream near real-time data from source systems such as social media applications through Amazon MSK, Lambda, and Kinesis Data Firehose into Amazon S3. You can then use AWS Glue for data processing, and load and transform data into Amazon Redshift using an AWS Glue development endpoint such as an Amazon SageMaker notebook. Once the data is in Amazon Redshift, you can create customer-centric business reports using Amazon QuickSight.
Real-time reporting
Architecture for Monitoring Streaming Data with Machine Learning
This architecture helps identify and act on deviations from forecasted data in near real time. In the architecture below, data is collected from multiple sources using Kinesis Data Streams and persisted in Amazon S3 by Kinesis Data Firehose. Initial data aggregation and preparation is done using Amazon Athena, and the results are stored back in Amazon S3. Amazon SageMaker is used to train a forecasting model and create behavioral predictions. As new data arrives, it is aggregated and prepared in real time by Kinesis Data Analytics, and the results are compared to the previously generated forecast. Amazon CloudWatch stores the forecast and actual values as metrics, and when the actual value deviates, a CloudWatch alarm triggers an incident in AWS Systems Manager Incident Manager.
Monitoring streaming data
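The comparison step in the architecture above can be sketched in a few lines. The threshold and function names here are illustrative assumptions; in a real deployment both values would be published as CloudWatch metrics (for example with boto3's `cloudwatch.put_metric_data`) and a CloudWatch alarm on the deviation would open the incident:

```python
# Sketch: compare an actual metric value against its forecast and decide
# whether the deviation should raise an alarm. The 20% threshold is illustrative.

def deviation_pct(forecast: float, actual: float) -> float:
    """Absolute deviation of actual from forecast, as a percentage."""
    if forecast == 0:
        return float("inf") if actual != 0 else 0.0
    return abs(actual - forecast) / abs(forecast) * 100.0

def should_alarm(forecast: float, actual: float, threshold_pct: float = 20.0) -> bool:
    return deviation_pct(forecast, actual) > threshold_pct

# A 25% deviation exceeds the 20% threshold.
print(should_alarm(forecast=1000, actual=1250))  # prints True
```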
Conclusion
Here are the key considerations when working with AWS streaming services and streaming applications, and when choosing a particular service or building a solution:
Usage Patterns
Kinesis Data Streams is for collecting and storing data; Kinesis Data Firehose is primarily for loading and transforming data streams into AWS data stores and several SaaS endpoints; and Kinesis Data Analytics analyzes streaming data.
Throughput
Kinesis Data Streams scales with shards and supports payloads of up to 1 MB. As mentioned earlier, it offers a provisioned mode and an on-demand mode for scaling shard capacity. Kinesis Data Firehose automatically scales to match the throughput of your data. The maximum streaming throughput a single Kinesis Data Analytics for SQL application can process is approximately 100 Mbps.
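In provisioned mode, shard count is sized from the workload. The sketch below applies the published per-shard write quotas (1 MB/s or 1,000 records/s); the workload numbers in the example are illustrative:

```python
import math

# Sketch: estimate provisioned-mode shard count from expected ingest volume,
# using the per-shard Kinesis Data Streams write quotas.

def shards_needed(mb_per_sec: float, records_per_sec: float) -> int:
    by_throughput = math.ceil(mb_per_sec / 1.0)     # 1 MB/s write per shard
    by_records = math.ceil(records_per_sec / 1000)  # 1,000 records/s per shard
    return max(by_throughput, by_records, 1)

# 7.5 MB/s needs 8 shards, but 12,000 records/s needs 12, so 12 wins.
print(shards_needed(mb_per_sec=7.5, records_per_sec=12_000))  # prints 12
```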
Latency
Kinesis Data Streams supports data delivery from producers to consumers in under 70 milliseconds.
Ease of use and cost
All the streaming services on AWS are managed, and most offer serverless options (including Amazon MSK Serverless). This allows for ease of use by abstracting away the infrastructure management overhead. Of course, also consider the pricing model of each service for your unique use case.
With AWS, we’ve reduced our root cause analysis time by 80%, allowing us to focus on building better features instead of being bogged down by system failures.
Ashtutosh Yadav
Sr. Data Architect
Building Modern Data Streaming Architecture on AWS
For small and mid-sized businesses (SMBs), documents are the backbone of operations. Contracts, invoices, insurance claims, and loan applications hold critical information but often come buried under layers of text, handwriting, and complex formatting. Traditionally, extracting and processing this data meant hours of manual work. But with Intelligent Document Processing (IDP), that’s changing.
More than just automation, modern IDP solutions use generative AI and large language models (LLMs) to quickly extract, understand, and process information with high accuracy. This means faster workflows, fewer errors, and less manual effort.
With the IDP market projected to grow from $10.57 billion in 2025 to $66.68 billion by 2032, businesses are adopting these solutions to stay competitive. But what makes IDP so effective, and why should SMBs invest in it? Let’s take a closer look.
What is Intelligent Document Processing (IDP)?
Intelligent Document Processing (IDP) is an AI-powered technology that automates the extraction, classification, and validation of data from structured, semi-structured, and unstructured documents. It utilizes optical character recognition (OCR), natural language processing (NLP), machine learning, and large language models (LLMs) to interpret context, enhance accuracy, and minimize human intervention.
How is IDP Helpful to SMBs?
Small and mid-sized businesses (SMBs) handle hundreds of invoices daily, whether printed, handwritten, PDFs, or images, often relying on manual data extraction for vendor names, amounts, and due dates.
Intelligent Document Processing (IDP) automates this process by scanning invoices, extracting key details, and updating records instantly, minimizing errors and saving valuable time. Businesses looking to automate these processes also often turn to Robotic Process Automation (RPA).
RPA (Robotic Process Automation) uses software bots to handle repetitive, rule-based tasks, like copying data, filling out forms, or clicking through systems, just like a human would.
IDP primarily serves as a key enabler for RPA by converting unstructured data (emails with order details, handwritten invoices, contracts, or customer support chat logs) and semi-structured data (invoices in different formats, bank statements, shipping labels, or survey responses) into usable information.
In simple terms, IDP extracts and organizes data from unstructured documents. This enables RPA to seamlessly execute follow-up actions, allowing SMBs to automate entire workflows with speed and accuracy.
How Does IDP Work?
IDP follows a simple step-by-step process:
Document Upload: Files are scanned or uploaded into a system, like Amazon S3.
Classification: IDP sorts them into categories (e.g., invoices, legal contracts).
Data Extraction: It pulls out key information using AI-powered tools such as Amazon Textract (for text and handwriting recognition) and Amazon Comprehend (for natural language understanding), among others.
Data Enrichment: The extracted information is processed further, removing sensitive details, tagging relevant data, or cross-checking with existing records.
Validation & Review: If IDP is uncertain about some data, it sends it to a human reviewer for verification.
Integration: The final, validated data is sent to business applications for processing, storage, or analysis.
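The steps above can be sketched end to end in plain Python. This is only a stand-in: in practice the classification and extraction steps would call services such as Amazon Textract and Amazon Comprehend, and the keywords, regex patterns, and review rule below are illustrative assumptions:

```python
import re

# Sketch of the IDP flow: classify -> extract -> validate/route for review.

def classify(text: str) -> str:
    """Step 2: sort the document into a category by simple keyword matching."""
    if "invoice" in text.lower():
        return "invoice"
    if "agreement" in text.lower():
        return "legal_contract"
    return "unknown"

def extract_invoice_fields(text: str) -> dict:
    """Step 3: pull key fields (a real pipeline would use Amazon Textract)."""
    amount = re.search(r"total[:\s]+\$?([\d,]+\.\d{2})", text, re.I)
    due = re.search(r"due date[:\s]+(\d{4}-\d{2}-\d{2})", text, re.I)
    return {
        "amount": amount.group(1) if amount else None,
        "due_date": due.group(1) if due else None,
    }

def process(text: str) -> dict:
    """Steps 2-5: classify, extract, and flag uncertain results for a human."""
    doc_type = classify(text)
    fields = extract_invoice_fields(text) if doc_type == "invoice" else {}
    needs_review = doc_type == "unknown" or None in fields.values()
    return {"type": doc_type, "fields": fields, "needs_review": needs_review}

doc = "INVOICE #42\nTotal: $1,250.00\nDue date: 2025-07-01"
print(process(doc))
```

Anything the pipeline cannot classify or extract is routed to human review, mirroring the Validation & Review step.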
This process enables companies to manage large volumes of paperwork effortlessly, unlocking faster decision-making, cost savings, and better compliance.
What are the Top 4 Use Cases of IDP?
Every industry has its share of paper-heavy processes. Some are buried in stacks of contracts. Others are drowning in forms that never seem to end. While SMBs have embraced digital transformation, many still struggle with legacy systems, compliance requirements, and unstructured data, making manual workflows a persistent challenge.
Here are some common use cases of how Intelligent Document Processing (IDP) is making a difference across industries:
1. Healthcare
Hospitals and clinics generate mountains of paperwork every single day. From patient insurance claims to lab results, prescriptions, and discharge summaries, it’s a never-ending cycle. Manually entering this data slows down treatment, delays claim settlements, and increases the chances of human error.
With IDP, whether a small clinic or a hospital chain, paperwork can be scanned and automatically read, sorted, and organized, with no manual effort needed. For example, when a patient files an insurance claim for the treatment they received, IDP can read their documents, extract policy numbers, treatment details, and amounts in minutes.
IDP also works hand in hand with electronic medical records (EMRs), feeding structured data directly into EMR systems. This ensures faster documentation, better accuracy, and real-time access to patient information.
Here’s What You Can Expect with IDP in Healthcare:
Claims processed in hours, not weeks: Reduce claims turnaround time from 4–6 weeks to just 24–48 hours.
Fewer documentation errors: Boost accuracy from 75% to 99.8%, minimizing costly mistakes.
Faster compliance checks: Cut down review time from months to just 2 weeks, helping you stay audit-ready.
Better fraud detection: Use AI-powered alerts to catch 50% more fraudulent claims before they’re approved.
2. Insurance
Insurance companies deal with multiple claims every single day. Many of these come in the form of handwritten documents, scanned medical records, EMRs, or multi-page reports. Reading through these manually is time-consuming and often leads to missed information or delays in approvals.
IDP can automatically scan all your incoming claim documents, understand the context, extract key data like diagnosis, dates, and claim amounts, and flag anything suspicious. This speeds up the process significantly and helps detect any foul or fraud activities early.
Here’s What You Can Expect with IDP in Insurance:
No more manual data entry: Instantly extract claim details, policyholder info, and coverage terms with high accuracy.
Faster claim approvals: AI models evaluate claim validity, cutting approval times from 4–6 weeks to just 24–48 hours.
Fewer disputes, less fraud: Real-time cross-checking against contracts, medical records, and past claims helps reduce errors and flag fraudulent activity.
3. Legal
Legal work involves handling large volumes of documents that require precision and attention to detail. Lawyers and paralegals often dedicate significant time to reviewing contracts, agreements, and case files to ensure accuracy and compliance.
Traditionally, these processes took weeks: teams had to sift through hundreds of pages, flag inconsistencies, and ensure compliance. With IDP, the same work takes just a few hours.
Here’s how IDP helps with Legal work:
Rapid Document Analysis: Scans and analyzes 300+ pages of contracts in minutes.
Accurate Clause Extraction: Extracts key clauses, obligations, and financial risks with 99% accuracy.
Error and Discrepancy Detection: Flags missing signatures and discrepancies automatically.
Accelerated Legal Review: With documents verified instantly, legal teams can move straight to negotiation, saving over 80% of the usual review time.
Automated Draft Creation: Auto-generates first drafts of legal documents using pre-set templates, reducing drafting time by up to 72%.
For legal teams at small and mid-sized firms, this means less grunt work and more strategy. For clients? Faster deals, lower costs, and airtight compliance.
4. Logistics
Small and medium-sized logistics companies handle a vast array of documents like shipping manifests, bills of lading, invoices, and customs declarations. Managing these manually can lead to delays, errors, and increased operational costs.
By integrating IDP, logistics businesses can reduce manual data entry, expedite shipment processing, and improve compliance with regulatory requirements.
Here’s how IDP helps Small or Medium-sized Logistics businesses:
Faster shipment processing: IDP can reduce document processing times by 50–70%, accelerating shipment handling and delivery schedules.
Improved accuracy in documentation: Automation minimizes manual data entry errors, enhancing data quality and reducing discrepancies.
Enhanced compliance: IDP ensures accurate data capture and storage, aiding adherence to regulatory requirements and reducing compliance risks.
Cost savings: By automating document processing, businesses can achieve average savings of 25–40%, lowering operational expenses.
Conclusion
For SMBs drowning in paperwork, IDP isn’t just a convenience. It’s a competitive edge. By eliminating bottlenecks, reducing errors, and speeding up workflows, it transforms how organizations handle information.
Cloudtech is a platform that provides tailored cloud solutions that optimize IDP performance, ensuring seamless automation, secure data processing, and reduced operational costs. With expertise in serverless computing, data modernization, and resilient cloud architectures, Cloudtech helps SMBs process documents faster, improve security, and cut costs.
Cloudtech delivers custom-built cloud solutions that drive real business impact. Explore how Cloudtech can modernize your cloud environment and power intelligent automation at scale.
FAQs
Is IDP only for large enterprises? Not at all. Intelligent Document Processing (IDP) is equally valuable for small and medium businesses. In fact, SMBs benefit even more by reducing manual work, saving time, and lowering operational costs with limited teams.
How does IDP actually help my small team? IDP automates repetitive tasks like data entry, document sorting, and verification. This helps your small team focus on more meaningful work such as sales, support, or operations. It also improves accuracy and speeds up business processes.
Is IDP secure enough for sensitive business documents? Yes. Most IDP solutions use strong encryption, access controls, and secure storage to protect your documents. Many are also compliant with data protection regulations to keep your information safe.
How is Intelligent Document Processing (IDP) different from Robotic Process Automation (RPA)? Robotic Process Automation (RPA) handles simple, rule-based tasks like copying, pasting, or clicking through screens. However, it cannot understand unstructured or messy data. Intelligent Document Processing (IDP) steps in to read, understand, and pull out useful information from documents. It then passes the clean data to RPA so it can do its job. In short, IDP handles the thinking, and RPA does the doing.
In the rapidly evolving cloud computing landscape, AWS Step Functions has emerged as a cornerstone for developers looking to orchestrate complex, distributed applications seamlessly in serverless implementations. The recent expansion of AWS SDK integrations marks a significant milestone, introducing support for 33 additional AWS services, including cutting-edge tools like Amazon Q, AWS B2B Data Interchange, Amazon Bedrock, Amazon Neptune, and Amazon CloudFront KeyValueStore. This enhancement not only broadens the horizon for application development but also opens new avenues for serverless data processing.
Serverless computing has revolutionized the way we build and scale applications, offering a way to execute code in response to events without the need to manage the underlying infrastructure. With the latest updates to AWS Step Functions, developers now have at their disposal a more extensive toolkit for creating serverless workflows that are not only scalable but also cost-efficient and less prone to errors.
In this blog, we will delve into the benefits and practical applications of these new integrations, with a special focus on serverless data processing. Whether you're managing massive datasets, streamlining business processes, or building real-time analytics solutions, the enhanced capabilities of AWS Step Functions can help you achieve more with less code. By leveraging these integrations, you can create workflows that directly invoke more than 11,000 API actions from over 220 AWS services, simplifying the architecture and accelerating development cycles.
Practical Applications in Data Processing:
This AWS SDK integration with 33 new services not only broadens the scope of potential applications within the AWS ecosystem but also streamlines the execution of a wide range of data processing tasks. These integrations empower businesses with automated AI-driven data processing, streamlined EDI document handling, and enhanced content delivery performance.
Amazon Q Integration: Amazon Q is a generative AI-powered enterprise chat assistant designed to enhance employee productivity in various business operations. The integration of Amazon Q with AWS Step Functions enhances workflow automation by leveraging AI-driven data processing. This integration allows for efficient knowledge discovery, summarization, and content generation across various business operations. It enables quick and intuitive data analysis and visualization, particularly beneficial for business intelligence. In customer service, it provides real-time, data-driven solutions, improving efficiency and accuracy. It also offers insightful responses to complex queries, facilitating data-informed decision-making.
AWS B2B Data Interchange: Integrating AWS B2B Data Interchange with AWS Step Functions streamlines and automates electronic data interchange (EDI) document processing in business workflows. This integration allows for efficient handling of transactions including order fulfillment and claims processing. The low-code approach simplifies EDI onboarding, enabling businesses to utilize processed data in applications and analytics quickly. This results in improved management of trading partner relationships and real-time integration with data lakes, enhancing data accessibility for analysis. The detailed logging feature aids in error detection and provides valuable transaction insights, essential for managing business disruptions and risks.
Amazon CloudFront KeyValueStore: This integration enhances content delivery networks by providing fast, reliable access to data across global networks. It's particularly beneficial for businesses that require quick access to large volumes of data distributed worldwide, ensuring that the data is always available where and when it's needed.
Neptune Data: This integration allows the processing of graph data in a serverless environment, ideal for applications that require complex relationships and data patterns like social networks, recommendation engines, and knowledge graphs. For instance, Step Functions can orchestrate a series of tasks that ingest data into Neptune, execute graph queries, analyze the results, and then trigger other services based on those results, such as updating a dashboard or triggering alerts.
Amazon Timestream Query & Write: The integration is useful in serverless architectures for analyzing high-volume time-series data in real-time, such as sensor data, application logs, and financial transactions. Step Functions can manage the flow of data from ingestion (using Timestream Write) to analysis (using Timestream Query), including data transformation, anomaly detection, and triggering actions based on analytical insights.
Amazon Bedrock & Bedrock Runtime: AWS Step Functions can orchestrate complex data streaming and processing pipelines that ingest data in real time, perform transformations, and route data to various analytics tools or storage systems. Step Functions can manage the flow of data across different Bedrock tasks, handling error retries and parallel processing efficiently.
AWS Elemental MediaPackage V2: Step Functions can orchestrate video processing workflows that package, encrypt, and deliver video content, including invoking MediaPackage V2 actions to prepare video streams, monitoring encoding jobs, and updating databases or notification systems upon completion.
AWS Data Exports: With Step Functions, you can sequence tasks such as triggering data export actions, monitoring their progress, and executing subsequent data processing or notification steps upon completion. It can automate data export workflows that aggregate data from various sources, transform it, and then export it to a data lake or warehouse.
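To make the SDK-integration pattern concrete, the sketch below builds a minimal Amazon States Language definition that calls one of these services directly (Bedrock Runtime's InvokeModel, using the `arn:aws:states:::aws-sdk:service:action` resource format) and then writes the result to S3. The model ID, bucket name, and state names are illustrative assumptions:

```python
import json

# Sketch: an ASL state machine that invokes a Bedrock model via the generic
# AWS SDK integration and stores the response in S3. All names are placeholders.
definition = {
    "StartAt": "InvokeModel",
    "States": {
        "InvokeModel": {
            "Type": "Task",
            # Generic SDK integration: arn:aws:states:::aws-sdk:<service>:<action>
            "Resource": "arn:aws:states:::aws-sdk:bedrockruntime:invokeModel",
            "Parameters": {
                "ModelId": "anthropic.claude-v2",
                "Body": {"prompt": "Summarize the attached report."},
            },
            "Next": "StoreResult",
        },
        "StoreResult": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:s3:putObject",
            "Parameters": {
                "Bucket": "example-results-bucket",
                "Key": "summaries/latest.json",
                "Body.$": "$.Body",  # pass the model output through
            },
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2)[:40])
```

The same two-state shape generalizes to the other new integrations: swap the task `Resource` and `Parameters`, and Step Functions handles sequencing, retries, and error handling.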
Benefits of the New Integrations
The recent integrations within AWS Step Functions bring forth a multitude of benefits that collectively enhance the efficiency, scalability, and reliability of data processing and workflow management systems. These advancements simplify the architectural complexity, reduce the necessity for custom code, and ensure cost efficiency, thereby addressing some of the most pressing challenges in modern data processing practices. Here's a summary of the key benefits:
Simplified Architecture: The new service integrations streamline the architecture of data processing systems, reducing the need for complex orchestration and manual intervention.
Reduced Code Requirement: With a broader range of integrations, less custom code is needed, facilitating faster deployment, lower development costs, and reduced error rates.
Cost Efficiency: By optimizing workflows and reducing the need for additional resources or complex infrastructure, these integrations can lead to significant cost savings.
Enhanced Scalability: The integrations allow systems to easily scale, accommodating increasing data loads and complex processing requirements without the need for extensive reconfiguration.
Improved Data Management: These integrations offer better control and management of data flows, enabling more efficient data processing, storage, and retrieval.
Increased Flexibility: With a wide range of services now integrated with AWS Step Functions, businesses have more options to tailor their workflows to specific needs, increasing overall system flexibility.
Faster Time-to-Insight: The streamlined processes enabled by these integrations allow for quicker data processing, leading to faster time-to-insight and decision-making.
Enhanced Security and Compliance: Integrating with AWS services ensures adherence to high security and compliance standards, which is essential for sensitive data processing and regulatory requirements.
Easier Integration with Existing Systems: These new integrations make it simpler to connect AWS Step Functions with existing systems and services, allowing for smoother digital transformation initiatives.
Global Reach: Services like Amazon CloudFront KeyValueStore enhance global data accessibility, ensuring high performance across geographical locations.
As businesses continue to navigate the challenges of digital transformation, these new AWS Step Functions integrations offer powerful solutions to streamline operations, enhance data processing capabilities, and drive innovation. At Cloudtech, we specialize in serverless data processing and event-driven architectures. Contact us today and ask how you can realize the benefits of these new AWS Step Functions integrations in your data architecture.
In today's digital landscape, user experience is paramount, and search engines play a pivotal role in shaping it. Imagine a world where your search engine not only understands your preferences and needs but anticipates them, delivering results that resonate with you on a personal level. This transformative user experience is made possible by the fusion of Amazon Personalize and Amazon OpenSearch Service.
Understanding Amazon Personalize
Amazon Personalize is a fully-managed machine learning service that empowers businesses to develop and deploy personalized recommendation systems, search engines, and content recommendation engines. It is part of the AWS suite of services and can be seamlessly integrated into web applications, mobile apps, and other digital platforms.
Key components and features of Amazon Personalize include:
Datasets: Users can import their own data, including user interaction data, item data, and demographic data, to train the machine learning models.
Recipes: Recipes are predefined machine learning algorithms and models that are designed for specific use cases, such as personalized product recommendations, personalized search results, or content recommendations.
Customization: Users have the flexibility to fine-tune and customize their machine learning models, allowing them to align the recommendations with their specific business goals and user preferences.
Real-Time Recommendations: Amazon Personalize can generate real-time recommendations for users based on their current behavior and interactions.
Batch Recommendations: Businesses can also generate batch recommendations for users, making it suitable for email campaigns, content recommendations, and more.
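As a sketch of how an application consumes these real-time recommendations, the snippet below builds the request an app would send to a deployed campaign through boto3's `personalize-runtime` client. The campaign ARN and user ID are placeholders; the actual call requires a trained, deployed campaign and AWS credentials:

```python
# Sketch: building a real-time recommendation request for Amazon Personalize.
# The ARN and user ID below are illustrative placeholders.

def build_recommendation_request(campaign_arn: str, user_id: str,
                                 num_results: int = 10) -> dict:
    """Assemble the keyword arguments for get_recommendations."""
    return {
        "campaignArn": campaign_arn,
        "userId": user_id,
        "numResults": num_results,
    }

request = build_recommendation_request(
    "arn:aws:personalize:us-east-1:123456789012:campaign/example-campaign",
    "user-42")

# With credentials configured, this would be sent as:
#   import boto3
#   runtime = boto3.client("personalize-runtime")
#   response = runtime.get_recommendations(**request)
#   items = [item["itemId"] for item in response["itemList"]]
print(request["numResults"])  # prints 10
```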
Benefits of Amazon Personalize
Amazon Personalize offers a range of benefits for businesses looking to enhance user experiences and drive engagement.
Improved User Engagement: By providing users with personalized content and recommendations, Amazon Personalize can significantly increase user engagement rates.
Higher Conversion Rates: Personalized recommendations often lead to higher conversion rates, as users are more likely to make purchases or engage with desired actions when presented with items or content tailored to their preferences.
Enhanced User Satisfaction: Personalization makes users feel understood and valued, leading to improved satisfaction with your platform. Satisfied users are more likely to become loyal customers.
Better Click-Through Rates (CTR): Personalized recommendations and search results can drive higher CTR as users are drawn to content that aligns with their interests, increasing their likelihood of clicking through to explore further.
Increased Revenue: The improved user engagement and conversion rates driven by Amazon Personalize can help cross-sell and upsell products or services effectively.
Efficient Content Discovery: Users can easily discover relevant content, products, or services, reducing the time and effort required to find what they are looking for.
Data-Driven Decision Making: Amazon Personalize provides valuable insights into user behavior and preferences, enabling businesses to make data-driven decisions and optimize their offerings.
Scalability: As an AWS service, Amazon Personalize is highly scalable and can accommodate businesses of all sizes, from startups to large enterprises.
Understanding Amazon OpenSearch Service
Amazon OpenSearch Service is a fully managed, open-source search and analytics engine developed to provide fast, scalable, and highly relevant search results and analytics capabilities. It is based on the open-source Elasticsearch and Kibana projects and is designed to efficiently index, store, and search through vast amounts of data.
Benefits of Amazon OpenSearch Service in Search Enhancement
Amazon OpenSearch Service enhances search functionality in several ways:
High-Performance Search: OpenSearch Service enables organizations to rapidly execute complex queries on large datasets to deliver a responsive and seamless search experience.
Scalability: OpenSearch Service is designed to be horizontally scalable, allowing organizations to expand their search clusters as data and query loads increase, ensuring consistent search performance.
Relevance and Ranking: OpenSearch Service allows developers to customize ranking algorithms to ensure that the most relevant search results are presented to users.
Full-Text Search: OpenSearch Service excels in full-text search, making it well-suited for applications that require searching through text-heavy content such as documents, articles, logs, and more. It supports advanced text analysis and search features, including stemming and synonym matching.
Faceted Search: OpenSearch Service supports faceted search, enabling users to filter search results based on various attributes, categories, or metadata.
Analytics and Insights: Beyond search, OpenSearch Service offers analytics capabilities, allowing organizations to gain valuable insights into user behavior, query performance, and data trends to inform data-driven decisions and optimizations.
Security: OpenSearch Service offers access control, encryption, and authentication mechanisms to safeguard sensitive data and ensure secure search operations.
Open-Source Compatibility: While Amazon OpenSearch Service is a managed service, it remains compatible with open-source Elasticsearch, ensuring that organizations can leverage their existing Elasticsearch skills and applications.
Integration Flexibility: OpenSearch Service can seamlessly integrate with various AWS services and third-party tools, enabling organizations to ingest data from multiple sources and build comprehensive search solutions.
Managed Service: Amazon OpenSearch Service is a fully-managed service, which means AWS handles the operational aspects, such as cluster provisioning, maintenance, and scaling, allowing organizations to focus on developing applications and improving user experiences.
Amazon Personalize and Amazon OpenSearch Service Integration
When you use Amazon Personalize with Amazon OpenSearch Service, Amazon Personalize re-ranks OpenSearch Service results based on a user's past behavior, any metadata about the items, and any metadata about the user. OpenSearch Service then incorporates the re-ranking before returning the search response to your application. You control how much weight OpenSearch Service gives the ranking from Amazon Personalize when applying it to OpenSearch Service results.
With this re-ranking, results can be more engaging and relevant to a user's interests. This can lead to an increase in the click-through rate and conversion rate for your application. For example, you might have an ecommerce application that sells cars. If your user enters a query for Toyota cars and you don't personalize results, OpenSearch Service would return a list of cars made by Toyota based on keywords in your data. This list would be ranked in the same order for all users. However, if you were to use Amazon Personalize, OpenSearch Service would re-rank these cars in order of relevance for the specific user based on their behavior so that the car that the user is most likely to click is ranked first.
When you personalize OpenSearch Service results, you control how much weight (emphasis) OpenSearch Service gives the ranking from Amazon Personalize to deliver the most relevant results. For instance, if a user searches for a specific type of car from a specific year (such as a 2008 Toyota Prius), you might want to put more emphasis on the original ranking from OpenSearch Service than from Personalize. However, for more generic queries that result in a wide range of results (such as a search for all Toyota vehicles), you might put a high emphasis on personalization. This way, the cars at the top of the list are more relevant to the particular user.
How the Amazon Personalize Search Ranking plugin works
The following diagram shows how the Amazon Personalize Search Ranking plugin works.
You submit your customer's query to your Amazon OpenSearch Service cluster.
OpenSearch Service sends the query response and the user's ID to the Amazon Personalize search ranking plugin.
The plugin sends the items and user information to your Amazon Personalize campaign for ranking. It uses the recipe and campaign Amazon Resource Name (ARN) values within your search process to generate a personalized ranking for the user. This is done using the GetPersonalizedRanking API operation for recommendations. The user's ID and the items obtained from the OpenSearch Service query are included in the request.
Amazon Personalize returns the re-ranked results to the plugin.
The plugin organizes and returns these search results to your OpenSearch Service cluster. It re-ranks the results based on the feedback from your Amazon Personalize campaign and the emphasis on personalization that you've defined during setup.
Finally, your OpenSearch Service cluster sends the finalized results back to your application.
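The flow above is configured through an OpenSearch search pipeline with a `personalized_search_ranking` response processor, which is also where the personalization weight is set. The sketch below builds such a pipeline body; the ARNs, region, and weight are placeholder assumptions:

```python
import json

# Sketch: OpenSearch search-pipeline body for the Amazon Personalize Search
# Ranking plugin. ARNs, region, and weight below are illustrative.
pipeline = {
    "description": "Re-rank results with Amazon Personalize",
    "response_processors": [{
        "personalized_search_ranking": {
            "campaign_arn": "arn:aws:personalize:us-east-1:123456789012:campaign/example",
            "item_id_field": "ITEM_ID",
            "recipe": "aws-personalized-ranking",
            # 0.0 keeps the original OpenSearch order; 1.0 fully defers to
            # Personalize. Generic queries warrant a higher weight.
            "weight": "0.3",
            "iam_role_arn": "arn:aws:iam::123456789012:role/personalize-access",
            "aws_region": "us-east-1",
        }
    }],
}

print(json.dumps(pipeline["response_processors"][0], indent=2)[:40])
```

Raising or lowering `weight` per query type is how the car-search example above is realized: low weight for a specific "2008 Toyota Prius" query, high weight for a broad "Toyota" query.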
Benefits of Amazon Personalize and Amazon OpenSearch Service Integration
Combining Amazon Personalize and Amazon OpenSearch Service maximizes user satisfaction through highly personalized search experiences:
Enhanced Relevance: The integration ensures that search results are tailored precisely to individual user preferences and behavior. Users are more likely to find what they are looking for quickly, resulting in a higher level of satisfaction.
Personalized Recommendations: Amazon Personalize's machine learning capabilities enable the generation of personalized recommendations within search results. This feature exposes users to items or content they may not have discovered otherwise, enriching their search experience.
User-Centric Experience: Personalized search results demonstrate that your platform understands and caters to each user's unique needs and preferences. This fosters a sense of appreciation and enhances user satisfaction.
Time Efficiency: Users can efficiently discover relevant content or products, saving time and effort in the search process.
Reduced Information Overload: Personalized search results also filter out irrelevant items to reduce information overload, making decision-making easier and more enjoyable.
Increased Engagement: Users are more likely to engage with content or products that resonate with their interests, leading to longer session durations and a greater likelihood of conversions.
Conclusion
Integrating Amazon Personalize and Amazon OpenSearch Service transforms user experiences, drives user engagement, and unlocks new growth opportunities for your platform or application. By embracing this innovative combination and encouraging its adoption, you can lead the way in delivering exceptional personalized search experiences in the digital age.
Get started on your cloud modernization journey today!
Let Cloudtech build a modern AWS infrastructure that’s right for your business.