
Anomaly Detection Strategies for IoT Sensors

by Sharmistha Chatterjee, May 3rd, 2020

Motivation - Algorithms for IoT sensors

IoT has become a massive body of work in recent years. The growth of the Internet of Things over the last few years is evident from the graph below.

The above figure shows the growth of the IoT market in billions of dollars. Source: https://softwarestrategiesblog.com/2018/01/01/roundup-of-internet-of-things-forecasts-and-market-estimates-2018/

IoT spans multiple domains and diverse industries. Thousands of new jobs have been created thanks to advances in IoT technology.

The interpretability of IoT data creates a path for:

  • Defining ML models
  • Increasing the accuracy of predicting events
  • Raising alarms, among other things

My aim in this article is to highlight the growing importance of IoT devices in different fields, to understand the discrepancies in the data generated by different classes of devices, and to present techniques that make the data interpretable in all cases.

This article is structured into the following areas:

  • Use of IoT sensors
  • Sources of anomaly
  • Imbalanced data sampling strategies
  • Role of supervised, semi-supervised and unsupervised algorithms in predicting anomaly events
  • Using IoT sensors on the human body during cycling and showing anomaly trends with unsupervised and semi-supervised algorithms
  • Use case of an IoT sensor deployment

IoT sensors or sensor networks are deployed in a variety of places, such as:

  • Industry, to understand the level of functionality and the state of machinery
  • The human body, for health monitoring and detecting human activity
  • The battlefield, to detect any incoming threat
  • The environment, to study pollution levels, weather, climate change or incoming disasters such as earthquakes
  • Transportation systems, as a preventive mechanism to control accidents and handle traffic congestion

Each of the sensors deployed in any of the above scenarios sends signal information at regular intervals to convey the state of its surroundings. However, sensors can behave inconsistently and transmit different signal strength information during the sudden breakdown of a machine, at the onset of a health condition, or when attacked by an enemy.

Moreover, sensors can suffer from their own internal problems. The ability to detect untoward events from sensor signals is often compromised, because the majority class dominates the prediction. This diminishes the accuracy on the minority class and raises Type I errors (false positives) and Type II errors (false negatives).

Certain external conditions can make sensors send incorrect signal strength values that are predicted as anomalies, which raises the number of false positives. Similarly, true anomaly events may be ignored because sensors can malfunction, which raises the number of false negatives in the system.

With the increasing variety of IoT devices and the use of supervised, unsupervised and semi-supervised algorithms in training IoT data models, it becomes important to understand and interpret the different classes of sensors and their acceptable ranges of signal information. This information can be obtained from labelled data, unlabelled data, or a combination of both.
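To make the two error types concrete, here is a minimal sketch that counts false positives and false negatives on made-up labels (1 = anomaly, 0 = normal). The data and the helper name `confusion_counts` are illustrative assumptions and do not come from a real sensor.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels where 1 marks an anomaly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I errors
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II errors
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Made-up readings: 18 normal samples and 2 anomalies.
y_true = [0] * 18 + [1] * 2
# A degenerate detector that always predicts the majority class.
y_pred = [0] * 20

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.2f} false_positives={fp} "
      f"false_negatives={fn} recall={recall:.2f}")
```

In this toy case the detector reaches high accuracy while missing every anomaly, which is why accuracy alone is a poor measure on imbalanced sensor data.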

Sources of Anomaly

Once a source of anomaly enters the system, it disrupts the system data widely until preventive actions come into play. Imbalanced data arises from different attacks on individual IoT devices or on sets of IoT devices in sensor networks. The sources of attacks can be any of the following:

Intrusion detection: As IoT devices get connected to the internet, they remain vulnerable to security-related attacks. Such attacks include denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks, which inflict heavy damage on IoT services and smart environment applications.

Fraud detection: IoT networks remain susceptible to theft of credit card information, bank account details, or other sensitive information during logins or online payments.

Data leakage: Sensitive information from databases, file servers and other information sources can leak to an external entity. Such leakage not only results in loss of information, but also creates a threat in which the attacker can destroy confidential information in the system. Proper encryption mechanisms can prevent such leaks.

Anomalies in an IoT system can be detected according to their type: point-wise, contextual or collective.

Point-wise anomalies from individual devices are detected by stochastic descriptors. This approach is used when the evolution of the series is not predictable.

Collective anomalies can be detected from typical time series patterns such as shapes, recurring patterns or residuals from multiple IoT devices.

Contextual anomalies are detected when prior information or context, such as the day of the week, is taken into account.
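The difference between a point-wise and a contextual check can be sketched as follows. This is a minimal illustration that uses a plain z-score as a stand-in descriptor. The threshold of 2.0, the function names and the readings are assumptions chosen for the example.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=2.0):
    """Return indices whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def contextual_outliers(values, contexts, threshold=2.0):
    """Apply the same z-score test separately within each context group."""
    groups = {}
    for i, ctx in enumerate(contexts):
        groups.setdefault(ctx, []).append(i)
    flagged = []
    for idxs in groups.values():
        if len(idxs) < 3:
            continue  # too few readings to estimate a spread
        local = zscore_outliers([values[i] for i in idxs], threshold)
        flagged.extend(idxs[j] for j in local)
    return sorted(flagged)


# Hypothetical readings: high on weekdays, low on weekends.
weekday = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50]
weekend = [10, 11, 9, 10, 50, 10, 11, 9]
values = weekday + weekend
contexts = ["weekday"] * len(weekday) + ["weekend"] * len(weekend)

print("global check:", zscore_outliers(values))
print("contextual check:", contextual_outliers(values, contexts))
```

The weekend reading of 50 looks ordinary against the whole series, so the global check does not flag it. It stands out only once the day-of-week context is taken into account.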

Imbalanced Data Sampling Strategies

Supervised classification problems (binary or multi-class) often encounter a non-uniform distribution of data among the different classes. Classes with a high proportion of the data (≥ 40%) are called majority classes, while those with a low proportion of the data (≤ 10%) are called minority classes.

The imbalance ratio can vary from 1:100 to 1:1000 or even 1:5000. Such distributions need to be handled with statistical sampling techniques that raise the weight of the minority class. As a result, the minority class carries more influence during training, which gives added insight into the problem or class label that the minority class represents.

Resampling: Imbalanced IoT datasets can be processed with sampling strategies such as under-sampling or over-sampling. Both measures aim to increase the accuracy on the minority class, either by removing samples from the majority class (under-sampling) or by adding more examples to the minority class (over-sampling). However, over-sampling can result in overfitting and under-sampling can cause loss of information.

Random under-sampling: This mechanism down-samples the majority class by randomly removing observations from it. The main purpose of this sampling technique is to remove the dominance of the majority class and then combine the samples of the majority and minority classes. This is achieved by re-sampling without replacement until the number of samples of the majority class equals that of the minority class. Another method used for under-sampling is generating cluster centroids to group or condense similar data.

Random over-sampling: This mechanism up-samples the minority class by randomly duplicating observations to strengthen its impact. The minority class is resampled with replacement and the samples are then combined with the majority class. Over-sampling can also be achieved by generating new synthetic minority-class data by interpolation, through the popular techniques SMOTE and ADASYN.
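The two random strategies can be sketched in plain Python as below. This is a simplified illustration and not the imbalanced-learn implementation. The helper names and the toy dataset are assumptions.

```python
import random
from collections import Counter


def split_by_class(X, y):
    """Group feature rows by their class label."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def random_under_sample(X, y, seed=0):
    """Shrink every class to the minority size, sampling without replacement."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_res, y_res = [], []
    for label, rows in groups.items():
        for features in rng.sample(rows, target):
            X_res.append(features)
            y_res.append(label)
    return X_res, y_res


def random_over_sample(X, y, seed=0):
    """Grow every class to the majority size by duplicating rows at random."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_res, y_res = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_res.append(features)
            y_res.append(label)
    return X_res, y_res


# Toy dataset: 17 majority rows (label 0) and 3 minority rows (label 1).
X = [[float(i)] for i in range(20)]
y = [0] * 17 + [1] * 3

X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)
print("under-sampled:", Counter(y_under))
print("over-sampled:", Counter(y_over))
```

Under-sampling discards 14 of the 17 majority rows here, while over-sampling only repeats the same three minority rows. That shows the information loss and overfitting risks in miniature.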

ADASYN generates samples next to the original samples that are wrongly classified by a k-Nearest Neighbors classifier.

SMOTE connects existing minority instances and generates synthetic samples anywhere between an existing minority instance and its K closest minority-class neighbors. In other words, the new interpolated points lie between the marginal outliers and the inliers.
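The interpolation step behind SMOTE can be sketched as follows. This is a simplified illustration of the idea and not the imbalanced-learn implementation. The function name `smote_sketch`, the default `k=2` and the sample points are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points on segments between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbors of the base point, excluding itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical minority-class points in two dimensions.
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.0), (1.2, 1.8)]
new_points = smote_sketch(minority, n_new=5)
for point in new_points:
    print(point)
```

Every synthetic point lies on a line segment between two existing minority points, so the new data stays inside the region the minority class already occupies.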

The above figure illustrates regular SMOTE.

Regular SMOTE

The SMOTE algorithm comes in three flavors. Regular SMOTE randomly generates samples without any restriction. Borderline-SMOTE offers two variants, “Borderline-1” and “Borderline-2”, and classifies each sample into one of three cases:

  a. All of its nearest neighbors are in a different class than the sample.
  b. Half of its nearest neighbors are in the same class as the sample.
  c. All of its nearest neighbors are in the same class as the sample.

“Borderline-1” interpolates each selected sample with a neighbor belonging to the same class, while “Borderline-2” may interpolate it with a neighbor belonging to any class.

SMOTE Borderline-1

SMOTE “Borderline-1” and “Borderline-2” work as follows:

The above figure illustrates SMOTE Borderline-1

  • The m nearest neighbors (NNs) of every sample in the minority class are selected.
  • Minority samples whose m nearest neighbors all belong to the majority class are surrounded by majority samples and are considered noisy.
  • Samples with at most m/2 NNs from the majority class are considered safe.
  • Both safe and noisy samples are excluded from the synthetic sample generation process.
  • Samples for which the number of NNs from the majority class is greater than m/2 (but not m) are considered in danger (near the borderline) and are used to generate synthetic samples.

As highlighted in the “Borderline-1” and “Borderline-2” figures, “Borderline-2” creates synthetic samples from the nearest majority neighbors as well as from the nearest minority neighbors. However, the samples it creates from majority neighbors lie closer to the minority sample than those created from minority neighbors.
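The noisy, safe and in-danger rules above can be sketched as a small function. The thresholds follow the description in this article. The function name `borderline_category`, the default `m` and the toy points are assumptions made for illustration.

```python
import math


def borderline_category(sample, X, y, minority_label, m=5):
    """Label one minority sample as 'noise', 'danger' or 'safe'.

    `sample` must be one of the rows of X, since it is excluded by identity.
    """
    others = [
        (math.dist(sample, x), label)
        for x, label in zip(X, y)
        if x is not sample
    ]
    others.sort(key=lambda pair: pair[0])
    majority_count = sum(
        1 for _, label in others[:m] if label != minority_label
    )
    if majority_count == m:
        return "noise"   # surrounded by majority samples only
    if majority_count > m / 2:
        return "danger"  # near the borderline, used for generation
    return "safe"


# Toy data: label 1 is the minority class, label 0 the majority class.
X = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),   # minority cluster
    (5.0, 5.0), (5.1, 5.0),                             # minority, borderline
    (5.0, 5.1), (4.9, 5.0),                             # majority near border
    (10.0, 10.0),                                       # isolated minority
    (10.0, 10.1), (10.1, 10.0), (9.9, 10.0),            # majority around it
]
y = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

for index in (0, 4, 8):
    print(X[index], borderline_category(X[index], X, y, minority_label=1, m=3))
```

Only the samples labelled "danger" would be passed on to the interpolation step in Borderline-SMOTE.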

The above figure illustrates SMOTE Borderline-2

The third type of SMOTE, known as SVM SMOTE, uses an SVM classifier to generate samples. It relies on the proximity ratio of the different types of samples, and the parameter C of the SVM classifier controls the classification boundary. The Borderline and SVM varieties of SMOTE define “m_neighbors” to determine how a sample is categorized and whether it falls into a., b. or c.

ADASYN generates synthetic samples for each data point in proportion to the number of samples in a given neighborhood that are not from the same class as that point.

Combining over-sampling and under-sampling with imbalanced-learn: Over-sampling techniques that generate new synthetic data using SMOTE can be pipelined with under-sampling techniques that clean and condense the generated data. Two such combination algorithms are SMOTE-Tomek and SMOTE-ENN.

The above figure illustrates SMOTETomek

A Tomek link is a connection between a pair of neighboring samples that belong to different classes and are each other's nearest neighbor. Under-sampling is performed by removing from the over-sampled dataset either both samples of each Tomek link or only the majority-class sample.

In Edited Nearest Neighbor (ENN), a majority-class instance whose class differs from that of the majority of its k nearest neighbors is removed.
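Tomek links can be found with a brute-force nearest-neighbor search, as in the minimal sketch below. The function names and the toy data are assumptions, and the quadratic search is only suitable for small datasets.

```python
import math


def nearest_index(i, X):
    """Index of the sample closest to X[i], excluding X[i] itself."""
    return min(
        (j for j in range(len(X)) if j != i),
        key=lambda j: math.dist(X[i], X[j]),
    )


def tomek_links(X, y):
    """Return index pairs that are mutual nearest neighbors with different labels."""
    links = []
    for i in range(len(X)):
        j = nearest_index(i, X)
        if i < j and y[i] != y[j] and nearest_index(j, X) == i:
            links.append((i, j))
    return links


def drop_majority_of_links(X, y, majority_label):
    """Remove only the majority-class member of every Tomek link."""
    to_drop = {
        index
        for pair in tomek_links(X, y)
        for index in pair
        if y[index] == majority_label
    }
    kept = [k for k in range(len(X)) if k not in to_drop]
    return [X[k] for k in kept], [y[k] for k in kept]


# Toy data on a line: label 0 is the majority class.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.0), (3.0, 0.0), (3.2, 0.0)]
y = [0, 0, 0, 1, 1, 1]

print("Tomek links:", tomek_links(X, y))
X_clean, y_clean = drop_majority_of_links(X, y, majority_label=0)
print("labels after cleaning:", y_clean)
```

Removing the majority member of each link widens the gap between the two classes near the decision boundary.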

The above figure illustrates SMOTEENN

How to use over-sampling methods:

# The columns of the sma_avg dataframe are explained in the moving-average
# snippet in the Constrained Clustering section of this article.
# y is assumed to hold one class label per row of X.
from collections import Counter
from imblearn.over_sampling import SMOTE, BorderlineSMOTE, SVMSMOTE, ADASYN

# Use the three moving-average signal strength columns as features
X = sma_avg.iloc[:, [0, 2, 4]].dropna().values
print('Original dataset shape %s' % Counter(y))

sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X, y)
print('Resampled dataset shape %s' % Counter(y_res))

sm = BorderlineSMOTE(random_state=42, kind='borderline-1', k_neighbors=50)
X_res, y_res = sm.fit_resample(X, y)
print('Resampled dataset shape %s' % Counter(y_res))

sm = BorderlineSMOTE(random_state=42, kind='borderline-2', k_neighbors=50)
X_res, y_res = sm.fit_resample(X, y)
print('Resampled dataset shape %s' % Counter(y_res))

sm = SVMSMOTE(random_state=42, k_neighbors=50)
X_res, y_res = sm.fit_resample(X, y)
print('Resampled dataset shape %s' % Counter(y_res))

sm = ADASYN(random_state=42)
X_res, y_res = sm.fit_resample(X, y)
print('Resampled dataset shape %s' % Counter(y_res))

Unsupervised Anomaly Detection Algorithms

Unsupervised anomaly detection algorithms are used with unlabelled data to determine anomalies in the system. They are also used within semi-supervised algorithms to label the data with an anomaly score that can be combined with active learning to improve prediction accuracy. Commonly used unsupervised algorithms, together with two semi-supervised approaches that build on them, are:

  • kNNo (k-nearest neighbor outlier detection) is a distance-based unsupervised outlier detection technique. The outlier score of a data point is computed from the distance to its kth nearest neighbor in the data set.
  • LOF (Local Outlier Factor) is a density-based unsupervised outlier detection technique. The metric measures the ratio of the average local density of the k nearest neighbors of a data point to the local density of the point itself.
  • CBLOF (Cluster-Based Local Outlier Factor) is a cluster-based unsupervised outlier detection technique.
  • SSAD is a semi-supervised anomaly detection approach based on one-class SVM.
  • SSDO is a semi-supervised anomaly detection algorithm that uses constrained clustering along with active learning.
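The distance-based score used by kNNo can be sketched in a few lines. This is a brute-force illustration under the assumption that a larger distance to the kth nearest neighbor means a more anomalous point. The function name and the sample points are made up.

```python
import math


def knn_outlier_scores(X, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(X):
        dists = sorted(math.dist(p, q) for j, q in enumerate(X) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points on a unit square plus one far-away point.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_outlier_scores(X, k=2)
most_anomalous = scores.index(max(scores))
print("scores:", [round(s, 2) for s in scores])
print("most anomalous index:", most_anomalous)
```

The isolated point receives by far the largest score, because even its second nearest neighbor is far away.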

Semi-Supervised Anomaly Detection

Semi-supervised algorithms have emerged because of certain limitations of supervised and unsupervised algorithms. The first is the unavailability of ground truth information. The second is the limitation of data collection techniques in capturing the different types of anomalous behavior in the system. The third is that unsupervised anomaly detection models are less robust.

The importance of semi-supervised algorithms becomes clear when the labels used by supervised algorithms are noisy and the imbalanced data overlaps between different classes. Incorrect labelling may further increase the imbalance ratio and hide the actual ratios after rebalancing with over-sampling or under-sampling. In such scenarios, the semi-supervised approach of active learning is used to detect the most anomalous instances and filter them out.

The above figure illustrates SSDO (Semi-Supervised Detection of Outliers).

SSDO (Semi-Supervised Detection of Outliers) works on data F = {f0, . . . , fn}, with each example fi represented in the standard feature-vector format. The algorithm works as follows:

1. It assigns an anomaly score to each example based on the example’s position with respect to the data distribution found using constraint-based clustering.

The above figure illustrates Anomaly Score Detector on Individual features

2. Instances with known labels propagate their labels to other neighboring instances. Thus each instance’s anomaly score is updated based on nearby instances’ known labels.

3. The algorithm employs an active learning strategy to acquire more labels, and the process is repeated whenever additional data or labels are provided.

Semi-supervised active learning strategies also incorporate user feedback and feedback from authentic external sources to assign correct labels on the learned decision boundaries. The algorithm further uses Constrained Clustering, which identifies anomalies by using distance metrics and cluster size. The larger the distance and the smaller the cluster, the greater the probability that the point is anomalous. The metrics used are:

  • Distance of the instance from the cluster centroid.
  • Distance of a cluster centroid from other cluster centroids.
  • Size of the cluster.

Constrained Clustering reduces to standard k-means when no constraints are provided.

Mathematically,

With this clustering, each instance f ∈ F has the following anomaly score:

The formula is taken from Source — https://link.springer.com/article/10.1007/s41019-018-0080-6

Here, the point deviation measures the deviation of instance f from its cluster center. Its normalization coefficient is the maximum deviation from the center over all data points within the same cluster.

The formula is taken from Source — https://link.springer.com/article/10.1007/s41019-018-0080-6

For each instance f, the cluster centroid c(f) is used to compute the cluster deviation. The cluster deviation measures how far this centroid lies from the other cluster centroids, weighted by the maximum inter-cluster distance.

The formula is taken from Source — https://link.springer.com/article/10.1007/s41019-018-0080-6

The cluster size term for an instance f is the size of the cluster it belongs to, relative to the size of the largest cluster found:

The formula is taken from Source — https://link.springer.com/article/10.1007/s41019-018-0080-6
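The three quantities described above can be sketched in code as follows. This is an illustration built only from the verbal descriptions in this article, not a reproduction of the formulas in the cited source. In particular, multiplying the point deviation by the cluster deviation and dividing by the relative cluster size is an assumption, as are the use of the closest other centroid and the function name.

```python
import math


def cluster_anomaly_scores(X, labels, centers):
    """Combine point deviation, cluster deviation and relative cluster size."""
    n_clusters = len(centers)
    sizes = [labels.count(c) for c in range(n_clusters)]
    largest = max(sizes)
    # Largest point-to-center distance inside each cluster (normalization).
    max_dev = [0.0] * n_clusters
    for x, c in zip(X, labels):
        max_dev[c] = max(max_dev[c], math.dist(x, centers[c]))
    # Largest distance between any two centroids.
    max_inter = max(
        math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    scores = []
    for x, c in zip(X, labels):
        # Assumption: a cluster with zero spread gets point deviation 1.0.
        if max_dev[c] > 0:
            point_dev = math.dist(x, centers[c]) / max_dev[c]
        else:
            point_dev = 1.0
        # Assumption: compare against the closest other centroid.
        nearest_other = min(
            math.dist(centers[c], other)
            for j, other in enumerate(centers)
            if j != c
        )
        cluster_dev = nearest_other / max_inter
        cluster_size = sizes[c] / largest
        # Assumption: larger deviations and smaller clusters raise the score.
        scores.append(point_dev * cluster_dev / cluster_size)
    return scores


# Toy clustering: a larger cluster around (0, 0) and a small one around (10, 0).
X = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -1.0), (10.0, 0.5), (10.0, -0.5)]
labels = [0, 0, 0, 0, 1, 1]
centers = [(0.0, 0.0), (10.0, 0.0)]

scores = cluster_anomaly_scores(X, labels, centers)
print([round(s, 2) for s in scores])
```

In this toy case, points far from their own center score higher than points near it, and points in the small cluster score highest. That matches the intuition that larger distance and smaller cluster size increase the chance of a point being anomalous.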

Constrained Clustering with IoT Sensor Data:

The following figure depicts Constrained Clustering built with 5 clusters on 3 features (average signal strength received from IoT devices during cycling at an interval of 120 secs).

A moving average of the sensor readings is computed with a window size of 10 and fed to the Constrained Clustering model. The model has must-link and cannot-link constraints, which constrain a pair of data points to belong to the same cluster (must-link) or to different clusters (cannot-link), or which simply describe whether a pair of data points is similar or dissimilar.
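The way must-link and cannot-link constraints restrict cluster assignment can be sketched with a small check. This is a simplified illustration of the idea and not the code of any clustering library. The function names and the partial assignment are assumptions, while the constraint pairs are the ones used in this article.

```python
def partner(pair, index):
    """Return the other member of a constraint pair, or None if index is absent."""
    a, b = pair
    if a == index:
        return b
    if b == index:
        return a
    return None


def violates_constraints(index, cluster, assignment, must_link, cannot_link):
    """Return True if placing point `index` in `cluster` breaks a constraint.

    `assignment` maps already-assigned point indices to their cluster ids.
    """
    for pair in must_link:
        other = partner(pair, index)
        if other is not None and other in assignment:
            if assignment[other] != cluster:
                return True  # must-link partner sits in another cluster
    for pair in cannot_link:
        other = partner(pair, index)
        if other is not None and assignment.get(other) == cluster:
            return True  # cannot-link partner sits in the same cluster
    return False


must_link = [(0, 10), (0, 20), (0, 30)]
cannot_link = [(1, 10), (2, 10), (3, 10)]
# Hypothetical partial assignment: point 0 is in cluster 2, point 1 in cluster 4.
assignment = {0: 2, 1: 4}

for candidate in (2, 3, 4):
    broken = violates_constraints(10, candidate, assignment, must_link, cannot_link)
    print(f"point 10 -> cluster {candidate}: violates={broken}")
```

A constrained k-means variant would run a check of this kind before assigning each point and pick the nearest cluster that passes it.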

# Moving average on signal strength of IoT sensors
import pandas as pd

# file is the path to a CSV with the following columns:
# time, avg_rss12, var_rss12, avg_rss13, var_rss13, avg_rss23, var_rss23
df = pd.read_csv(file)

signal_columns = ["avg_rss12", "var_rss12", "avg_rss13", "var_rss13", "avg_rss23", "var_rss23"]
sma_avg = pd.DataFrame()
for column in signal_columns:
    # Rolling mean over 10 readings; rows without a full window are NaN and dropped
    df["ma_" + column] = df[column].rolling(window=10).mean()
    sma_avg["ma_" + column] = df["ma_" + column].dropna()

# Constrained Clustering on 3 features of IoT
import numpy as np
import matplotlib.pyplot as plt
from copkmeans.cop_kmeans import cop_kmeans

# Pairs of row indices that must (ml) or must not (cl) share a cluster
must_link = [(0, 10), (0, 20), (0, 30)]
cannot_link = [(1, 10), (2, 10), (3, 10)]
feature_values = sma_avg.iloc[:, [0, 2, 4]].dropna().values
clusters, centers = cop_kmeans(dataset=feature_values, k=5, ml=must_link, cl=cannot_link)
cluster_centers = np.array(centers)

fig = plt.figure()
# Axes3D(fig) no longer attaches the axes to the figure in recent Matplotlib versions
ax = fig.add_subplot(projection='3d')
ax.scatter(feature_values[:, 0], feature_values[:, 1], feature_values[:, 2], c=clusters, cmap='viridis',
           edgecolor='k')
ax.scatter(cluster_centers[:, 0], cluster_centers[:, 1], cluster_centers[:, 2], marker='*', c='r', s=1000,
           label='Centroid', linewidths=15)
ax.autoscale(enable=True, axis='x', tight=True)
plt.show()

The above figure illustrates Constrained Clustering with 5 clusters with IoT sensors on human body

Research and Insights

  • One of the foremost ways to detect anomaly trends is to extract every unusual pattern occurring in the system, develop new algorithms and store them in the cloud.
  • Algorithms need to be designed to track the characteristics and structure of the minority class and to learn new incoming trends online by comparing them with past historic data.
  • New algorithms need to be developed for multi-class imbalanced learning, to understand the relationships between different classes.

References:

  1. https://link.springer.com/article/10.1007/s13748-016-0094-0
  2. https://people.cs.kuleuven.be/~vincent.vercruyssen/publications/2018/ICDM_conference_manuscript.pdf
  3. https://efficientgov.com/blog/2017/04/12/column-what-is-needed-for-iot-security/
  4. https://www.forbes.com/sites/louiscolumbus/2017/12/10/2017-roundup-of-internet-of-things-forecasts/#12cdaaa21480