Spell: Online Streaming Parsing of Large Unstructured System Logs
IEEE Transactions on Knowledge and Data Engineering (), October 2018.
© Copyright 2018 IEEE
System event logs have been frequently used as a valuable resource in data-driven approaches to enhance system health and stability. A typical procedure in system log analytics is to first parse unstructured logs to structured data, and then apply data mining and machine learning techniques and/or build workflow models from the resulting structured data. Previous work on parsing system event logs focused on offline, batch processing of raw log files. But increasingly, applications demand online monitoring and processing. As a result, a streaming method to parse unstructured logs is needed. We propose an online streaming method Spell, which utilizes a longest common subsequence based approach, to parse system event logs. We show how to dynamically extract log patterns from incoming logs and how to maintain a set of discovered message types in streaming fashion. Enhancement to find more accurate message types is also proposed. We also propose and evaluate a method to automatically discover semantic meanings for parameter fields identified by Spell. We compare Spell against state-of-the-art methods to extract patterns from system event logs on large real data. The results demonstrate that, compared with other log parsing alternatives, Spell shows its superiority in terms of both efficiency and effectiveness.