
Bucketization

Bucketing input data is an important concept to understand in ML. A key job-level parameter called bucket_span controls how the input data from the datafeed (described next) is collected into mini batches for processing. Think of the bucket span as a pre-analysis aggregation interval: the window of time over which a portion of the data is aggregated for the purposes of analysis. The shorter the bucket_span, the more granular the analysis, but also the higher the potential for noisy artifacts in the data.
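As a rough illustration of where this parameter lives, the following sketch builds a minimal job body as a Python dictionary. The bucket_span setting sits inside analysis_config, as in the Elasticsearch anomaly detection job API. The 15m value, the count detector, and the @timestamp field name are assumptions chosen for illustration, not recommendations.

```python
import json

# Hypothetical minimal job body, for illustration only.
# bucket_span belongs to analysis_config; the other values are placeholders.
job_config = {
    "analysis_config": {
        "bucket_span": "15m",                  # assumed value: the aggregation interval
        "detectors": [{"function": "count"}],  # assumed detector: event count per bucket
    },
    "data_description": {
        "time_field": "@timestamp",            # assumed name of the time field
    },
}

# Print the JSON that would form the body of a job-creation request.
print(json.dumps(job_config, indent=2))
```

The printed JSON would be sent as the body of the job-creation request. The exact endpoint path depends on the Elastic Stack version in use.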

The following graph shows the same dataset aggregated over three different intervals:

Aggregations of the same data over three different time intervals

Notice that the prominent anomalous spike seen in the version aggregated over the 5-minute interval is all but lost when the data is aggregated over a 60-minute interval, because the spike is so short (less than 2 minutes). In fact, at the 60-minute interval, the spike no longer seems particularly anomalous.
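The dilution effect can be reproduced with a small sketch. The data here is synthetic (one reading per minute, a flat baseline, and a 2-minute spike) and is not the dataset behind the graph. The helper names bucket_sums and peak_ratio are hypothetical, and the 15-minute interval is an assumed middle value, since the text names only the 5-minute and 60-minute intervals.

```python
from collections import defaultdict

def bucket_sums(events, bucket_span_s):
    """Sum (timestamp_s, value) pairs into fixed-width buckets keyed by bucket start."""
    buckets = defaultdict(float)
    for ts, value in events:
        start = (ts // bucket_span_s) * bucket_span_s  # floor to the bucket boundary
        buckets[start] += value
    return dict(sorted(buckets.items()))

def peak_ratio(events, bucket_span_s):
    """Ratio of the largest bucket to the median bucket: a crude 'how spiky' score."""
    values = sorted(bucket_sums(events, bucket_span_s).values())
    median = values[len(values) // 2]
    return max(values) / median

# Synthetic series: 3 hours of per-minute readings, baseline 10,
# with a 2-minute spike of 100 at minutes 90 and 91.
events = [(m * 60, 100.0 if m in (90, 91) else 10.0) for m in range(180)]

for minutes in (5, 15, 60):
    ratio = peak_ratio(events, minutes * 60)
    print(f"bucket_span={minutes:>2}m  peak/median={ratio:.1f}")
```

With this synthetic series, the peak-to-median ratio falls from 4.6 at 5 minutes to 2.2 at 15 minutes and 1.3 at 60 minutes: the same spike, progressively averaged away by the wider bucket.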

This is a practical consideration when choosing bucket_span. On one hand, a shorter aggregation period is helpful because it increases the frequency of the analysis (and thus reduces the delay before you are notified of something anomalous), but making it too short may highlight features in the data that you don't really care about. If the brief spike shown in the preceding graph is a meaningful anomaly for you, then the 5-minute view of the data is suitable. If, however, a very brief perturbation of the data seems like an unnecessary distraction, then avoid a low value of bucket_span.

Some additional practical considerations can be found on Elastic's blog: https://www.elastic.co/blog/explaining-the-bucket-span-in-machine-learning-for-elasticsearch.
