In the last blog, “Introducing Azure Anomaly Detector API”, I didn't provide many details on one of the algorithms because its paper was still in the publishing process. The paper has since been accepted by KDD 2019 for an oral presentation. This blog gives an overview of the SR-CNN algorithm; readers who want more detail can always refer to the paper. We also have a 2-minute video here.
Before we go into details, let us revisit the problem definition of time series anomaly detection.
Any time-series anomaly detection system operating in production at large scale faces quite a few challenges, especially in the three areas below:
1. Lack of labels – Signals are generated every second by clients, services, and sensors, and the sheer volume of data makes manual labeling infeasible.
2. Generalization – Real-world time series come in many different types with different characteristics, which makes it hard to generalize or to find a single silver bullet that solves every problem. Some examples can be found in the figure below.
3. Efficiency – For any online anomaly detection system, efficiency is one of the key challenges. The system is expected to have low compute cost and low latency for serving.
In the computer vision domain, there is a concept called “visual saliency detection”. Saliency is what “stands out” in a photo or scene, enabling our eye-brain system to quickly focus on the most important regions, as shown in the figures below.
Fig. Original image
Fig. The salient part of the original image
When we look at a time series chart, the parts that stand out most are usually the anomalies. This similarity inspired our approach, and it turned out to produce great results.
Our solution borrows the Spectral Residual (SR) algorithm from the visual saliency detection domain and then applies a convolutional neural network (CNN) to the output of the SR model.
As you can see from the algorithm architecture, the SR transformation magnifies the anomalies, and the transformed signal is easier to generalize over. This lets us train the CNN with synthetic data.
The spectral residual algorithm consists of three major steps:
- Fourier Transform to get the log amplitude spectrum
- Calculation of the spectral residual
- Inverse Fourier Transform that transforms the sequence back to the spatial domain
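The three SR steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the production implementation: the function name `spectral_residual`, the averaging width `q`, and the `eps` guard against log(0) are our own choices. With A(f) and P(f) the amplitude and phase of the Fourier transform of x, and L(f) = log A(f), the averaged spectrum AL(f) is L(f) convolved with a length-q averaging filter, the residual is R(f) = L(f) − AL(f), and the saliency map is S(x) = |F⁻¹(exp(R(f) + iP(f)))|.

```python
import numpy as np


def spectral_residual(x, q=3, eps=1e-8):
    """Saliency map of a 1-D series via spectral residual (illustrative sketch).

    q:   width of the moving-average filter on the log amplitude spectrum (assumed value)
    eps: small constant added before taking the log to avoid log(0) (assumed value)
    """
    x = np.asarray(x, dtype=float)

    # Step 1: Fourier transform, then amplitude, phase, and log amplitude spectrum.
    spectrum = np.fft.fft(x)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    log_amplitude = np.log(amplitude + eps)

    # Step 2: spectral residual = log amplitude minus its local (moving) average.
    kernel = np.ones(q) / q
    avg_log_amplitude = np.convolve(log_amplitude, kernel, mode="same")
    residual = log_amplitude - avg_log_amplitude

    # Step 3: inverse Fourier transform back to the original domain, keeping the
    # original phase; the magnitude of the result is the saliency map.
    return np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))


# Usage: a clean sine wave with one injected spike at index 120.
t = np.arange(200)
series = np.sin(2 * np.pi * t / 20)
series[120] += 5.0
saliency = spectral_residual(series)
print(int(np.argmax(saliency)))  # prints 120
```

The spike spreads its energy evenly across all frequencies, while the sine wave occupies a single pair of frequency bins. Subtracting the smooth part of the log spectrum suppresses the sine's peak and leaves the spike dominant in the saliency map, which is the "magnifies the anomalies" effect the architecture relies on.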
Combining SR with a CNN has several advantages:
- SR is unsupervised, efficient, and generalizes well.
- The problem becomes much easier when working on the output of the SR model.
- We can train the CNN on the SR output using fully synthetic data generated with a simple rule: randomly select several points in the saliency map, calculate an injection value for each, and replace the original point with it.
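A minimal sketch of the synthetic-data rule, assuming NumPy. The exact formula for the injection value is given in the paper; the rule below (replace a point with the mean of its preceding window plus or minus a few standard deviations), the function name `inject_anomalies`, and its parameters `n_points`, `window`, and `scale` are hypothetical stand-ins. The function works on any 1-D array, such as the saliency map described above, and returns labels alongside the modified data.

```python
import numpy as np


def inject_anomalies(x, n_points=3, window=20, scale=5.0, seed=0):
    """Hypothetical synthetic-anomaly rule, for illustration only.

    Randomly picks `n_points` positions and replaces each value with the mean of
    the preceding `window` values plus or minus `scale` standard deviations.
    Returns the modified copy and a 0/1 label array marking injected points.
    """
    rng = np.random.default_rng(seed)
    original = np.asarray(x, dtype=float)
    injected = original.copy()
    labels = np.zeros(len(original), dtype=int)

    # Only positions with a full preceding window are eligible.
    positions = rng.choice(np.arange(window, len(original)), size=n_points, replace=False)
    for i in positions:
        past = original[i - window:i]  # statistics from the unmodified data
        sign = rng.choice([-1.0, 1.0])
        # The 1e-3 floor keeps the offset non-zero even on perfectly flat windows.
        injected[i] = past.mean() + sign * scale * (past.std() + 1e-3)
        labels[i] = 1
    return injected, labels


# Usage: generate one labeled synthetic sequence.
values = np.sin(np.arange(300) / 5.0)
synthetic, labels = inject_anomalies(values, n_points=4)
print(np.flatnonzero(labels))
```

Because every injected position is known, each synthetic sequence comes with exact labels, which is how this approach sidesteps the lack-of-labels problem described earlier.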
We have performed both online and offline experiments, and SR-CNN consistently outperformed state-of-the-art methods on open datasets as well as on internal production datasets.