
Kats - Time Series Analysis Framework

Kats aims to provide a one-stop shop for time series analysis, from understanding key statistics and detecting anomalies to forecasting trends, feature extraction/embedding, multivariate analysis, and more.

Time series analysis is a fundamental domain in data science and machine learning. It has a wide range of applications in sectors such as e-commerce, finance, capacity planning, supply chain management, medicine, weather, energy, and astronomy, among many others.

Time Series Analysis

Time series analysis is a set of statistical techniques for examining and modeling time-dependent data. Common features of time series analysis tools include:

  • Time series decomposition: the ability to break down a time series into its component parts, such as trend, seasonality, and residuals
  • Forecasting: the ability to predict future values of a time series based on past data
  • Anomaly detection: the ability to identify unusual or unexpected behavior in a time series
  • Multivariate analysis: the ability to analyze multiple time series simultaneously, taking into account the relationships between them
  • Feature extraction/embedding: the ability to extract meaningful features from time series data, or to represent the data in a lower-dimensional space for further analysis
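The list above opens with decomposition. Below is a minimal, library-free Python sketch of classical additive decomposition that makes the idea concrete. It assumes the seasonal period is known and estimates the trend with a centered moving average. The function names are hypothetical, and this is not how Kats itself implements decomposition.

```python
from statistics import mean


def centered_moving_average(values, period):
    """Trend estimate: centered moving average (a 2 x period MA when period is even)."""
    n = len(values)
    half = period // 2
    trend = [None] * n  # no estimate for the first and last `half` points
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        if period % 2 == 1:
            trend[i] = mean(window)
        else:
            # end points get half weight so the window spans exactly one period
            trend[i] = (sum(window[1:-1]) + 0.5 * (window[0] + window[-1])) / period
    return trend


def additive_decompose(values, period):
    """Split values into trend + seasonal + residual (classical additive model)."""
    trend = centered_moving_average(values, period)
    # collect detrended values by their position in the seasonal cycle
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(values, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    raw = [mean(b) for b in buckets]
    # center the seasonal pattern so it sums to zero over one cycle
    offset = mean(raw)
    pattern = [s - offset for s in raw]
    seasonal = [pattern[i % period] for i in range(len(values))]
    residual = [
        None if t is None else y - t - s
        for y, t, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, residual


# usage: a linear trend plus a period-4 pattern that sums to zero
values = [10 + 0.5 * i + [2, -1, 0, -1][i % 4] for i in range(16)]
trend, seasonal, residual = additive_decompose(values, period=4)
```

For this clean input, the recovered seasonal pattern matches the one used to build the series, and the residuals are zero wherever a trend estimate exists.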

These are just a few examples of the types of functionality that may be included in a time series analysis tool. Let’s see what Kats can provide us with.

Kats is a one-stop shop

Kats is a lightweight, easy-to-use, and generalizable framework for generic time series analysis, including forecasting, anomaly detection, multivariate analysis, and feature extraction/embedding.

Its developers describe Kats as the first comprehensive Python library for generic time series analysis, offering both classical and advanced techniques for modeling time series data.

Kats connects these domains of time series analysis. Users can explore the basic characteristics of their time series data, predict future values, monitor anomalies, and incorporate the results into their ML models and pipelines.

What it does

Kats provides a set of algorithms and models for four domains in time series analysis: forecasting, detection, feature extraction and embedding, and multivariate analysis.

  • Forecasting: Kats provides a full set of tools for forecasting that includes 10+ individual forecasting models, ensembling, a self-supervised learning (meta-learning) model, backtesting, hyperparameter tuning, and empirical prediction intervals.

  • Detection: Kats supports functionalities to detect various patterns in time series data, including seasonalities, outliers, change points, and slow trend changes.

  • Feature extraction and embedding: The time series feature (TSFeature) extraction module in Kats can produce 65 features with clear statistical definitions. These features can be incorporated into most machine learning (ML) models, such as classification and regression models.

  • Useful utilities: Kats also provides a set of useful utilities, such as time series simulators.
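"Features with clear statistical definitions" can be illustrated with a rough sketch. The plain-Python code below computes four simple features: mean, variance, lag-1 autocorrelation, and least-squares trend slope. The feature names and helper functions are hypothetical, not the names TSFeatures uses. The point is only that a series of any length becomes a fixed-length set of numbers that a classifier or regressor can consume.

```python
from statistics import mean, pvariance


def lag_autocorrelation(values, lag=1):
    """Sample autocorrelation at `lag` (assumes a non-constant series)."""
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    num = sum((values[i] - mu) * (values[i + lag] - mu) for i in range(len(values) - lag))
    return num / denom


def trend_slope(values):
    """Ordinary least-squares slope of the values against their index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


def extract_features(values):
    """Map a series of any length to a fixed-length dict of named features."""
    return {
        "mean": mean(values),
        "variance": pvariance(values),
        "lag1_autocorr": lag_autocorrelation(values, lag=1),
        "trend_slope": trend_slope(values),
    }


# usage: series of different lengths yield feature dicts with the same keys
short_features = extract_features([1, 2, 3, 4, 5])
long_features = extract_features([3, 1, 4, 1, 5, 9, 2, 6])
```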

Installation in Python

Kats is on PyPI, so you can use pip to install it.

pip install --upgrade pip
pip install kats

Forecasting Example

Using the Prophet model to forecast the air_passengers dataset:

import pandas as pd

from kats.consts import TimeSeriesData
from kats.models.prophet import ProphetModel, ProphetParams

# take `air_passengers` data as an example; rename the columns so that
# TimeSeriesData finds its default "time" column
air_passengers_df = pd.read_csv(
    "../kats/data/air_passengers.csv",
    header=0,
    names=["time", "passengers"],
)

# convert to TimeSeriesData object
air_passengers_ts = TimeSeriesData(air_passengers_df)

# create a model param instance
# note that additive mode gives worse results
params = ProphetParams(seasonality_mode='multiplicative')

# create a prophet model instance
m = ProphetModel(air_passengers_ts, params)

# fit model simply by calling m.fit()
m.fit()

# make prediction for the next 30 months ("MS" = month-start frequency)
fcst = m.predict(steps=30, freq="MS")

Detection Examples

Kats supports the following types of detection:

  • Outlier Detection: detects anomalous spikes or drops in the time series.
  • Change Point Detection: detects abrupt changes in the time series. Kats offers three algorithms for this:
    • CUSUM Detection
    • Bayesian Online Change Point Detection (BOCPD)
    • Robust Stat Detection
  • Trend Change Detection: detects changes in the trend of the time series using the Mann-Kendall (MK) test.

Outlier Detection

Outlier detection requires at least 24 data points (rows).

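import pandas as pd
from kats.consts import TimeSeriesData
from kats.detectors.outlier import OutlierDetector

# `df` is the sales DataFrame used earlier (columns: time, Product, VisitsCount, SalesCount)
fake_df = df[df["Product"] == "Product E"].drop(["VisitsCount", "Product"], axis=1)

# pad the series with one extra row (the rounded mean of the last three values), e.g. to meet
# the 24-point minimum; pd.concat replaces DataFrame.append, which was removed in pandas 2.0
extra_row = pd.DataFrame({"time": ["2021-12-31"], "SalesCount": [round(fake_df["SalesCount"].iloc[-3:].mean())]})
fake_df = pd.concat([fake_df, extra_row]).reset_index(drop=True)
fake_df["time"] = pd.to_datetime(fake_df["time"])

# run the detector with a multiplicative decomposition
outlier_ts = TimeSeriesData(fake_df)
ts_outlierDetection = OutlierDetector(outlier_ts, 'multiplicative')
ts_outlierDetection.detector()

# outliers found for the first (here, only) value column
ts_outlierDetection.outliers[0]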
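The example above leaves the scoring rule inside Kats. A common rule for flagging unusually high or low points is the interquartile-range (IQR) fence: any value below Q1 - k * IQR or above Q3 + k * IQR counts as an outlier. The plain-Python sketch below applies this rule directly to raw values, with k = 3 as an assumed default. A real detector would typically apply it to the residuals left after removing trend and seasonality. The helper `iqr_outliers` is hypothetical and illustrates the idea only; it is not Kats's implementation.

```python
from statistics import quantiles


def iqr_outliers(values, iqr_mult=3.0):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR], with k = iqr_mult."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    iqr = q3 - q1
    lower, upper = q1 - iqr_mult * iqr, q3 + iqr_mult * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# usage: one obvious spike in otherwise stable data
sales = [10, 11, 9, 10, 12, 10, 11, 50, 10, 9]
spikes = iqr_outliers(sales)
```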

Kats can also clean the outliers it detects, using one of two methods:

  • No interpolation: replaces outliers with NaN without interpolating.
  • With interpolation: replaces outliers with linearly interpolated values.
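Both cleaning strategies fit in a few lines of plain Python. The hypothetical `remove_outliers` helper below takes the indices flagged as outliers. It either sets those points to NaN or replaces them by linear interpolation between the nearest non-outlier neighbours. For an outlier at either end of the series, the sketch copies the nearest known value. That edge rule is an assumption of this sketch and not necessarily what Kats does.

```python
import math


def remove_outliers(values, outlier_idx, interpolate=False):
    """Replace flagged points with NaN, or fill them by linear interpolation.

    Assumes at least one point is not an outlier.
    """
    flagged = set(outlier_idx)
    cleaned = [math.nan if i in flagged else float(v) for i, v in enumerate(values)]
    if not interpolate:
        return cleaned
    known = [i for i in range(len(cleaned)) if i not in flagged]
    for i in sorted(flagged):
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None or right is None:
            # outlier at an edge: no neighbour on one side, so copy the nearest known value
            cleaned[i] = cleaned[right if left is None else left]
        else:
            # straight line between the closest non-outlier neighbours
            weight = (i - left) / (right - left)
            cleaned[i] = cleaned[left] + weight * (cleaned[right] - cleaned[left])
    return cleaned


# usage: index 2 was flagged by a detector
series = [1, 2, 100, 4, 5]
as_nan = remove_outliers(series, [2])
filled = remove_outliers(series, [2], interpolate=True)
```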

Change Point Detection

Kats offers three different algorithms for detecting change points in a time series:

  • CUSUMDetector
  • BOCPDetector
  • RobustStatDetector

Using the CUSUM detection algorithm on a simulated dataset:

import numpy as np
import pandas as pd

from kats.consts import TimeSeriesData
from kats.detectors.cusum_detection import CUSUMDetector

# simulate a 60-day time series whose mean increases from 1 to 2 after 30 days
np.random.seed(10)
df_increase = pd.DataFrame(
    {
        'time': pd.date_range('2019-01-01', '2019-03-01'),
        'increase': np.concatenate([np.random.normal(1, 0.2, 30), np.random.normal(2, 0.2, 30)]),
    }
)

# convert to TimeSeriesData object
timeseries = TimeSeriesData(df_increase)

# run detector and find change points
change_points = CUSUMDetector(timeseries).detector()
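The core idea behind CUSUM can be shown without Kats. Accumulate each point's deviation from the overall mean; the cumulative sum drifts furthest from zero just before the mean shifts. The hypothetical `cusum_change_point` function below estimates a single change point this way. A full detector would typically also test whether the shift is statistically significant, and this sketch omits that step.

```python
from statistics import mean


def cusum_change_point(values):
    """Estimate a single mean-shift location from the cumulative sum of deviations.

    Needs at least two points. Returns (index of the first point after the change,
    "increase" or "decrease").
    """
    mu = mean(values)
    cumsum, running = [], 0.0
    for v in values:
        running += v - mu
        cumsum.append(running)
    # the last cumulative sum is always ~0, so only earlier positions are candidates;
    # the sum is furthest from zero right before the mean shifts
    k = max(range(len(values) - 1), key=lambda i: abs(cumsum[i]))
    direction = "increase" if cumsum[k] < 0 else "decrease"
    return k + 1, direction


# usage: the mean jumps from 1 to 2 after 30 points, like the simulated data above (without noise)
step = [1.0] * 30 + [2.0] * 30
location, direction = cusum_change_point(step)
```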

Trend Change Detection

Kats can also detect the direction of a trend in a series using MKDetector. This detector is based on the Mann-Kendall test, a non-parametric test for monotonic trends.

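from kats.detectors.trend_mk import MKDetector

# `cpd_ts` is assumed to be a TimeSeriesData object created earlier
detector = MKDetector(data=cpd_ts, threshold=.8)

# look for upward trends using a window of 10 data points
detected_time_points = detector.detector(direction='up', window_size=10)
detector.plot(detected_time_points)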
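For intuition, here is a plain-Python sketch of the Mann-Kendall test itself, applied to a whole series rather than a moving window. It computes the S statistic (concordant minus discordant pairs), Kendall's tau, and a normal-approximation z-score with its two-sided p-value. It applies no correction for tied values. The function name is hypothetical, and this is not MKDetector's code.

```python
import math
from statistics import NormalDist


def mann_kendall(values):
    """Mann-Kendall trend test without tie correction.

    Returns (S, tau, z, two-sided p-value).
    """
    n = len(values)
    # S counts increasing pairs minus decreasing pairs over all i < j
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # continuity correction moves S one step toward zero
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    tau = s / (n * (n - 1) / 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, tau, z, p


# usage: a mostly rising series with two small dips
s, tau, z, p = mann_kendall([3, 4, 6, 5, 7, 9, 8, 11, 12, 14])
```

A positive S and z with a small p-value point to an upward trend. A sliding-window detector would repeat this calculation over successive windows.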

Conclusion

Beyond its individual models, Kats includes a meta-learning method that identifies the most appropriate model, and the corresponding parameters, for a given time series. It uses the metadata obtained with TSFeatures and applies the Random Forest algorithm to select the best model based on that metadata. This feature allows users to build their own automated machine learning (AutoML) tools.