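0. Article series links

SLS Machine Learning Introduction (01): Time-Series Statistical Modeling (Yunqi Community, Alibaba Cloud)

SLS Machine Learning Introduction (02): Time-Series Clustering Modeling (Yunqi Community, Alibaba Cloud)

SLS Machine Learning Introduction (03): Time-Series Anomaly Detection Modeling (Yunqi Community, Alibaba Cloud)

SLS Machine Learning Introduction (04): Rule and Pattern Mining (Yunqi Community, Alibaba Cloud)

SLS Machine Learning Introduction (05): Time-Series Forecasting (Yunqi Community, Alibaba Cloud)

See Hundreds of Millions of Logs at a Glance: SLS Intelligent Clustering (LogReduce) Released https://www.atatech.org/articles/125117

SLS Machine Learning Best Practices: Time-Series Anomaly Detection and Alerting (Yunqi Community, Alibaba Cloud)

SLS Machine Learning Best Practices: Time-Series Forecasting (Yunqi Community, Alibaba Cloud)

SLS Machine Learning Best Practices: Log Clustering + Anomaly Alerting (Yunqi Community, Alibaba Cloud)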

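1. Common detection scenarios

1.1 Scenario 1

A cluster has N machines, and each machine reports M time-series metrics (CPU, memory, IO, traffic, and so on). Modeling every curve separately means hand-writing a lot of repetitive SQL and puts a heavy compute load on the platform. How can SQL be applied more effectively to this requirement?

1.2 Scenario 2

After running anomaly detection on the N time-series curves in the system, how can we quickly find out which of those curves are anomalous?

2. Platform experiments

2.1 Solution 1

For the problem described in Scenario 1, we assume the following data constraints. The data is stored in a Log Service LogStore with this structure: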

timestamp : unix_time_stamp

machine: name1

metricName: cpu0

metricValue: 50

---

timestamp : unix_time_stamp

machine: name1

metricName: cpu1

metricValue: 50

---

timestamp : unix_time_stamp

machine: name1

metricName: mem

metricValue: 50

---

timestamp : unix_time_stamp

machine: name2

metricName: mem

metricValue: 60

From this LogStore, we first obtain the time series of the N metrics, averaged per minute:

* | select timestamp - timestamp % 60 as time, machine, metricName, avg(metricValue) from log group by time, machine, metricName
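As a rough illustration of what this query computes, the following Python sketch aligns each timestamp to the start of its minute with `timestamp - timestamp % 60` and averages `metricValue` per (time, machine, metricName). The function name `bucket_by_minute` and the sample records are hypothetical and are not part of Log Service.

```python
from collections import defaultdict


def bucket_by_minute(records, window=60):
    """Average metricValue per (window-aligned time, machine, metricName)."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for r in records:
        # Same arithmetic as the SQL: drop the remainder to align to the window start.
        t = r["timestamp"] - r["timestamp"] % window
        key = (t, r["machine"], r["metricName"])
        sums[key][0] += r["metricValue"]
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sorted(sums.items())}


# Made-up sample records in the LogStore layout shown earlier.
records = [
    {"timestamp": 120, "machine": "name1", "metricName": "cpu0", "metricValue": 50},
    {"timestamp": 150, "machine": "name1", "metricName": "cpu0", "metricValue": 70},
    {"timestamp": 185, "machine": "name1", "metricName": "cpu0", "metricValue": 40},
    {"timestamp": 130, "machine": "name2", "metricName": "mem", "metricValue": 60},
]

result = bucket_by_minute(records)
for key, value in result.items():
    print(key, value)
```

The two `cpu0` samples at 120 and 150 fall into the same minute and are averaged, while the sample at 185 starts a new bucket at 180.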

Next, we run a batch time-series anomaly detection algorithm on that per-minute result and obtain detection results for the N metrics:

* |
select machine, metricName, ts_predicate_arma(time, value, 5, 1, 1) as res
from (
  select
    timestamp - timestamp % 60 as time,
    machine, metricName,
    avg(metricValue) as value
  from log
  group by time, machine, metricName )
group by machine, metricName

This SQL returns one row per (machine, metricName) pair, with the following structure:

| machine | metricName | [[time, src, pred, upper, lower, prob]] |
| ------- | ---------- | --------------------------------------- |
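The batch pattern itself (group the points by machine and metric, then run one detector per group) can be sketched in Python. This is only an illustration of the grouping step: `detect` is a hypothetical stand-in that flags points outside mean ± k standard deviations, and it does not reproduce the platform's prediction model. All names and sample values here are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def detect(points, k=2.0):
    """Stand-in detector (not the platform's model): flag points outside mean +/- k * std.

    Returns rows shaped like [time, src, pred, upper, lower, prob].
    """
    points = sorted(points)
    values = [v for _, v in points]
    mu, sigma = mean(values), pstdev(values)
    upper, lower = mu + k * sigma, mu - k * sigma
    return [
        [t, v, mu, upper, lower, 1.0 if (v > upper or v < lower) else 0.0]
        for t, v in points
    ]


def detect_all(rows):
    """rows: iterable of (time, machine, metricName, value) tuples."""
    series = defaultdict(list)
    for t, machine, metric, value in rows:
        series[(machine, metric)].append((t, value))
    # One detector run per (machine, metricName) group, like the outer GROUP BY.
    return {key: detect(points) for key, points in series.items()}


# Made-up data: a flat cpu0 series with one spike, and a constant mem series.
rows = [(i * 60, "name1", "cpu0", 50) for i in range(9)]
rows.append((540, "name1", "cpu0", 100))
rows += [(i * 60, "name2", "mem", 60) for i in range(3)]

res = detect_all(rows)
for key, detected in res.items():
    flagged = [row[0] for row in detected if row[5] > 0]
    print(key, "anomalous times:", flagged)
```

A single pass over the data produces one result per series, which is the point of the batch query: no per-metric SQL needs to be written by hand.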

We then apply a matrix transpose to the detection result so that each field becomes its own column. The SQL is as follows:

* |
select
  machine, metricName,
  res[1] as ts, res[2] as ds, res[3] as preds, res[4] as uppers, res[5] as lowers, res[6] as probs
from (
  select machine, metricName, array_transpose(ts_predicate_arma(time, value, 5, 1, 1)) as res
  from (
    select
      timestamp - timestamp % 60 as time,
      machine, metricName,
      avg(metricValue) as value
    from log
    group by time, machine, metricName )
  group by machine, metricName )

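After transposing the two-dimensional array, we split each row of the transposed array into its own column and obtain the expected result, in the following format: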

| machine | metricName | ts | ds | preds | uppers | lowers | probs |
| ------- | ---------- | -- | -- | ----- | ------ | ------ | ----- |
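The effect of the transpose can be shown with a small Python sketch. The helper name `transpose` and the numbers are made up for illustration; only the row layout [time, src, pred, upper, lower, prob] comes from the text above. Note that SQL arrays are indexed from 1 (`res[1]` is the time column), whereas Python lists are indexed from 0.

```python
def transpose(rows):
    """Turn a list of equal-length rows into a list of columns."""
    return [list(column) for column in zip(*rows)]


# One detection result: each inner list is [time, src, pred, upper, lower, prob].
res = [
    [1546300800, 50.0, 49.5, 55.0, 44.0, 0.0],
    [1546300860, 90.0, 50.2, 56.0, 44.5, 1.0],
]

# After the transpose, each field is a whole column that can be named separately.
ts, ds, preds, uppers, lowers, probs = transpose(res)
print(ts)
print(probs)
```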

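2.2 Solution 2

How can we quickly filter the batch detection results down to the curves that contain a specific kind of anomaly? Log Service provides a filter operation for anomaly detection results: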

select ts_anomaly_filter(lineName, ts, ds, preds, probs, nWatch, anomalyType)

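The anomalyType parameter takes the following values:

0: watch all anomalies

1: watch rising-edge anomalies

-1: watch falling-edge anomalies

The nWatch parameter is the number of most recent observation points to examine, counted back from the last valid observation in the actual time series. The function is used as follows: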

* |
select
  ts_anomaly_filter(lineName, ts, ds, preds, probs, cast(5 as bigint), cast(1 as bigint))
from (
  select
    concat(machine, '-', metricName) as lineName,
    res[1] as ts, res[2] as ds, res[3] as preds, res[4] as uppers, res[5] as lowers, res[6] as probs
  from (
    select machine, metricName, array_transpose(ts_predicate_arma(time, value, 5, 1, 1)) as res
    from (
      select
        timestamp - timestamp % 60 as time,
        machine, metricName,
        avg(metricValue) as value
      from log
      group by time, machine, metricName )
    group by machine, metricName ) )

The query above returns a value of Row type. Its individual fields can be extracted as follows:

* |
select
  res.name, res.ts, res.ds, res.preds, res.probs
from (
  select
    ts_anomaly_filter(lineName, ts, ds, preds, probs, cast(5 as bigint), cast(1 as bigint)) as res
  from (
    select
      concat(machine, '-', metricName) as lineName,
      res[1] as ts, res[2] as ds, res[3] as preds, res[4] as uppers, res[5] as lowers, res[6] as probs
    from (
      select
        machine, metricName, array_transpose(ts_predicate_arma(time, value, 5, 1, 1)) as res
      from (
        select
          timestamp - timestamp % 60 as time,
          machine, metricName, avg(metricValue) as value
        from log
        group by time, machine, metricName )
      group by machine, metricName ) ) )

With these steps, the results of batch anomaly detection can be filtered, which makes it easier for users to configure alerts in bulk.
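To make the roles of nWatch and anomalyType concrete, here is a minimal Python sketch of one plausible reading of the filter. The article does not spell out the exact rule the platform applies, so the rule below is an assumption: a point counts as an anomaly when its probability is greater than 0, as a rising edge when the observed value is above the prediction, and as a falling edge when it is below. The names `has_anomaly` and `anomaly_filter` and the sample data are hypothetical.

```python
def has_anomaly(ds, preds, probs, n_watch, anomaly_type):
    """True if any of the last n_watch points is an anomaly of the watched kind.

    Assumed rule (not taken from the platform): prob > 0 marks an anomaly;
    observed > predicted is a rising edge, observed < predicted a falling edge.
    """
    if n_watch <= 0:
        return False
    recent = list(zip(ds, preds, probs))[-n_watch:]
    for observed, predicted, prob in recent:
        if prob <= 0:
            continue
        direction = 1 if observed > predicted else -1 if observed < predicted else 0
        if anomaly_type == 0 or direction == anomaly_type:
            return True
    return False


def anomaly_filter(lines, n_watch, anomaly_type):
    """Keep only the series that show a watched anomaly in their recent points."""
    return [
        line for line in lines
        if has_anomaly(line["ds"], line["preds"], line["probs"], n_watch, anomaly_type)
    ]


# Made-up detection results, one dict per curve (name = machine-metricName).
lines = [
    {"name": "name1-cpu0", "ts": [0, 60, 120], "ds": [50, 50, 90],
     "preds": [50, 50, 51], "probs": [0.0, 0.0, 1.0]},
    {"name": "name1-mem", "ts": [0, 60, 120], "ds": [50, 10, 50],
     "preds": [50, 50, 50], "probs": [0.0, 1.0, 0.0]},
    {"name": "name2-mem", "ts": [0, 60, 120], "ds": [60, 60, 60],
     "preds": [60, 60, 60], "probs": [0.0, 0.0, 0.0]},
]

rising = [line["name"] for line in anomaly_filter(lines, 5, 1)]
print(rising)
```

Under these assumptions, watching the last 5 points for rising-edge anomalies keeps only `name1-cpu0`, so an alert rule would only need to check whether the filtered result is non-empty.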

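3. Advertising time

3.1 Going further with logs

Demonstrations of the various Log Service features, including an overall introduction to Log Service and a collection of demos: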

https://promotion.aliyun.com/ntms/act/logdoclist.html

For more advanced material on logs, see the Log Service learning path:

https://help.aliyun.com/learn/learningpath/log.html

Author: 悟冥

SLS Machine Learning Best Practices: Batch Time-Series Anomaly Detection
