Understanding Shard Count in Kinesis Streams
Before looking at what happens when the shard count changes, let’s briefly review what a shard is and how it functions within a Kinesis stream. A shard is a uniquely identified sequence of data records in a stream. Each shard has a fixed capacity: it accepts writes of up to 1 MB per second or 1,000 records per second, and serves reads of up to 2 MB per second, shared by all consumers that do not use enhanced fan-out. For a provisioned stream, the number of shards therefore determines the overall throughput capacity of the stream.
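The relationship between shard count and capacity is simple multiplication. The sketch below assumes the per-shard limits of a provisioned stream with standard consumers (1 MB/s or 1,000 records/s in, 2 MB/s out); the names `StreamCapacity`, `stream_capacity`, and `shards_needed` are illustrative helpers, not part of any AWS SDK.

```python
from dataclasses import dataclass
from math import ceil

# Assumed per-shard limits (provisioned mode, standard shared-throughput reads).
WRITE_MB_PER_SHARD = 1
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2


@dataclass(frozen=True)
class StreamCapacity:
    write_mb_per_sec: int
    write_records_per_sec: int
    read_mb_per_sec: int


def stream_capacity(shard_count: int) -> StreamCapacity:
    """Total capacity of a stream is the per-shard limit times the shard count."""
    if shard_count < 1:
        raise ValueError("a stream needs at least one shard")
    return StreamCapacity(
        write_mb_per_sec=shard_count * WRITE_MB_PER_SHARD,
        write_records_per_sec=shard_count * WRITE_RECORDS_PER_SHARD,
        read_mb_per_sec=shard_count * READ_MB_PER_SHARD,
    )


def shards_needed(write_mb_per_sec: float,
                  records_per_sec: float,
                  read_mb_per_sec: float) -> int:
    """Smallest shard count that satisfies all three limits at once."""
    return max(
        1,
        ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        ceil(read_mb_per_sec / READ_MB_PER_SHARD),
    )
```

Note that the tightest of the three limits decides the result: a workload of many small records can need more shards than its byte rate alone suggests.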
Impact on Existing Data:
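- Shard Splitting: When the shard count is increased (for example, through the UpdateShardCount API), Kinesis splits existing shards to reach the new configuration. Each split closes a parent shard to new writes and creates two child shards that divide the parent’s hash key range between them. Records that were already written are not moved: they stay in the parent shard, which remains readable until those records pass the stream’s retention period.
Example: Suppose we have a Kinesis stream with 2 shards and decide to double the shard count to 4. Each of the 2 original shards is split into 2 child shards, and each child covers half of its parent’s hash key range. The 2 parents keep their existing records but accept no new ones, while new records are written to the 4 children.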
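- Data Redistribution: Splitting or merging shards does not redistribute records that are already stored. What changes is where new records go. Kinesis hashes each record’s partition key with MD5 to a 128-bit hash key and writes the record to the open shard whose hash key range contains that value. How evenly new records spread across the shards therefore depends on the distribution of partition keys, not on a separate balancing step.
Example: If a shard is split into two child shards, the records already in the original shard stay there. Each new record goes to whichever child’s hash key range contains the MD5 hash of its partition key, so all new records with the same partition key land in the same child.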
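To make the routing concrete, here is a minimal sketch of how a partition key maps to a shard before and after a split. It assumes the MD5 digest is read as a big-endian 128-bit integer and that each split is made at the midpoint of the parent's range; the shard labels (`parent-0`, `parent-0/low`, and so on) are hypothetical and do not follow the real shard ID format.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # hash keys span 0 .. 2^128 - 1


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def split_range(start: int, end: int):
    """Split an inclusive hash key range at its midpoint into two child ranges."""
    mid = (start + end) // 2
    return (start, mid), (mid + 1, end)


def split_all(shards: dict) -> dict:
    """Split every shard once, doubling the shard count."""
    children = {}
    for shard_id, (start, end) in shards.items():
        low, high = split_range(start, end)
        children[f"{shard_id}/low"] = low
        children[f"{shard_id}/high"] = high
    return children


def route(partition_key: str, open_shards: dict) -> str:
    """Return the open shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for shard_id, (start, end) in open_shards.items():
        if start <= h <= end:
            return shard_id
    raise LookupError("no open shard covers this hash key")


# Two parents that each cover half of the hash key space, then a 2 -> 4 split.
parents = {
    "parent-0": (0, MAX_HASH_KEY // 2),
    "parent-1": (MAX_HASH_KEY // 2 + 1, MAX_HASH_KEY),
}
children = split_all(parents)
```

After the split, `route("user-1", children)` always returns a child of the shard that `route("user-1", parents)` returned, which is why ordering per partition key can be preserved as long as the parent is read first.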
Impact on Consumers:
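- Rebalancing: When the shard count is modified, Kinesis Streams consumers (e.g., Kinesis Client Library applications or AWS Lambda functions) have to pick up the new shards. The Kinesis Client Library (KCL) tracks shards through leases and redistributes those leases among its workers. To preserve the order of records for a partition key, a consumer should finish reading a closed parent shard before it starts on that shard’s children.
Example: If a Kinesis Client Library application is consuming data from a Kinesis Stream with 2 shards and the shard count is increased to 4, the application ends up with one record processor for each of the 4 child shards once the parents have been fully read. The existing workers can take these leases themselves, but because each shard is processed by only one worker at a time, adding workers (up to 4 in this case) lets the application use the extra parallelism.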
- Scaling: Modifying the shard count may necessitate scaling adjustments in downstream systems that consume data from the Kinesis Stream. More shards raise the maximum rate at which producers can write, so consumers must be able to scale their processing capacity to match. The actual load only grows if producers send more data.
Example: If a stream’s shard count is doubled and producers use the extra capacity, downstream systems can see up to twice the data rate. Managed services such as Amazon Kinesis Data Firehose handle much of this scaling themselves, within their service quotas, while Amazon Kinesis Data Analytics applications or custom consumers may need their parallelism or instance count increased.
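For a custom consumer, the main thing to get right after a reshard is the order in which shards are read. The sketch below is one way to derive a parent-before-child order from a shard listing. It assumes each shard is a dict with the `ShardId`, `ParentShardId`, and `AdjacentParentShardId` fields that the ListShards API returns; the function name `processing_order` and the short shard IDs in the usage example are hypothetical. KCL applications do not need this, since the library handles the ordering itself.

```python
def processing_order(shards: list) -> list:
    """Return shard IDs so that every shard comes after its parents.

    Parents that no longer appear in the listing (for example because they
    have expired) are ignored.
    """
    known = {s["ShardId"] for s in shards}
    # For each shard, the set of parents that still have to be read first.
    pending = {
        s["ShardId"]: {
            p
            for p in (s.get("ParentShardId"), s.get("AdjacentParentShardId"))
            if p in known
        }
        for s in shards
    }
    order = []
    while pending:
        done = set(order)
        # A shard is ready once all of its listed parents have been emitted.
        ready = sorted(sid for sid, parents in pending.items() if parents <= done)
        if not ready:
            raise ValueError("shard lineage contains a cycle")
        order.extend(ready)
        for sid in ready:
            del pending[sid]
    return order


# Listing after a 2 -> 4 split, deliberately given with children first.
listing = [
    {"ShardId": "shard-2", "ParentShardId": "shard-0"},
    {"ShardId": "shard-3", "ParentShardId": "shard-0"},
    {"ShardId": "shard-4", "ParentShardId": "shard-1"},
    {"ShardId": "shard-5", "ParentShardId": "shard-1"},
    {"ShardId": "shard-0"},
    {"ShardId": "shard-1"},
]
order = processing_order(listing)
```

Shards in the same round (here the two parents, then the four children) are independent of each other and could be read in parallel by separate workers.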