ShadowSync: Latency Long Tail Caused by Hidden Synchronization in Real-Time LSM-Tree Based Stream Processing Systems

Shungeng Zhang, Qingyang Wang, Yasuhiko Kanemasa, Julius Michaelis, Jianshu Liu, Calton Pu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Mission-critical, real-time, continuous stream processing applications that interact with the real world have stringent latency requirements. For example, e-commerce websites like Amazon improve their marketing strategy by performing real-time advertising based on customers' behavior, and latency long tail can cause significant revenue loss. Recent work [39] showed a positive correlation between latency long tail and variance in the execution time of synchronous invocation chains (critical paths) in microservices benchmarks. This paper shows that asynchronous, very short but intense resource demands (called millibottlenecks) outside of critical paths can also cause significant latency long tail.

Using a traffic analysis stream processing application benchmark, we evaluated the impact of asynchronous workload bursts generated by a multi-layer data structure called LSM-tree (log-structured merge-tree) for continuous checkpointing. Outside of the critical path, LSM-tree relies on maintenance operations (e.g., flushing/compaction during a checkpoint) to reorganize LSM-tree in memory and on disk to keep data access latency short. Although asynchronous, such recurrent maintenance operations can cause frequent millibottlenecks, particularly when they overlap, a problem we call ShadowSync. For scheduling and statistical reasons, significant latency long tail can arise from ShadowSync caused by asynchronous recurrent operations. Our experimental results show that with typical settings of benchmark components such as RocksDB, ShadowSync can prolong request message latency by up to 2 seconds. We show that effective mitigation methods can alleviate both scheduled and statistical ShadowSync, reducing the latency long tail to less than 20% of the original at the 99.9th percentile.
Original language: American English
Title of host publication: Middleware '22: Proceedings of the 23rd ACM/IFIP International Middleware Conference
State: Published - 2022
Externally published: Yes

Keywords

  • performance instability
  • stream processing
  • synchronization

EGS Disciplines

  • Computer Sciences
