SlideShare a Scribd company logo
1 of 42
Download to read offline
R and Time
Series Data

Time Series
Decomposi-
                Time Series Analysis and Mining with R
tion

Time Series
Forecasting

Time Series
                              Yanchang Zhao
Clustering

Time Series                    RDataMining.com
Classification
                         http://www.rdatamining.com/
R Functions &
Packages for
Time Series
                               18 July 2011
Conclusions




                                                         1/42
Outline


                1   R and Time Series Data
R and Time
Series Data

Time Series     2   Time Series Decomposition
Decomposi-
tion

Time Series     3   Time Series Forecasting
Forecasting

Time Series
Clustering      4   Time Series Clustering
Time Series
Classification
                5   Time Series Classification
R Functions &
Packages for
Time Series
                6   R Functions & Packages for Time Series
Conclusions


                7   Conclusions


                                                             2/42
R



R and Time
Series Data
                    a free software environment for statistical computing and
Time Series
Decomposi-          graphics
tion

Time Series
                    runs on Windows, Linux and MacOS
Forecasting
                    widely used in academia and research, as well as industrial
Time Series
Clustering          applications
Time Series
Classification
                    over 3,000 packages
R Functions &       CRAN Task View: Time Series Analysis
Packages for
Time Series         http://cran.r-project.org/web/views/TimeSeries.html
Conclusions




                                                                                3/42
Time Series Data in R



R and Time
Series Data

Time Series
Decomposi-
                    class ts
tion
                    represents data which has been sampled at equispaced
Time Series
Forecasting         points in time
Time Series
Clustering          frequency=7: a weekly series
Time Series         frequency=12: a monthly series
Classification

R Functions &       frequency=4: a quarterly series
Packages for
Time Series

Conclusions




                                                                           4/42
Time Series Data in R

                > a <- ts(1:20, frequency=12, start=c(2011,3))
                > print(a)
R and Time
Series Data          Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov
Time Series     2011           1   2   3   4   5   6   7   8   9
Decomposi-
tion            2012 11 12 13 14 15 16 17 18 19 20
Time Series          Dec
Forecasting

Time Series
                2011 10
Clustering      2012
Time Series
Classification   > str(a)
R Functions &    Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8
Packages for
Time Series
                > attributes(a)
Conclusions
                $tsp
                [1] 2011.167 2012.750   12.000

                $class                                           5/42
Outline


                1   R and Time Series Data
R and Time
Series Data

Time Series     2   Time Series Decomposition
Decomposi-
tion

Time Series     3   Time Series Forecasting
Forecasting

Time Series
Clustering      4   Time Series Clustering
Time Series
Classification
                5   Time Series Classification
R Functions &
Packages for
Time Series
                6   R Functions & Packages for Time Series
Conclusions


                7   Conclusions


                                                             6/42
What is Time Series Decomposition



R and Time
Series Data

Time Series     To decompose a time series into components:
Decomposi-
tion
                    Trend component: long term trend
Time Series
Forecasting         Seasonal component: seasonal variation
Time Series
Clustering          Cyclical component: repeated but non-periodic
Time Series         fluctuations
Classification

R Functions &
                    Irregular component: the residuals
Packages for
Time Series

Conclusions




                                                                    7/42
Data AirPassengers

                Data AirPassengers: Monthly totals of Box Jenkins
                international airline passengers, 1949 to 1960. It has
R and Time      144(=12×12) values.
Series Data

Time Series     > plot(AirPassengers)
Decomposi-
tion

Time Series
Forecasting
                                 600




Time Series
Clustering
                                 500




Time Series
Classification
                 AirPassengers

                                 400




R Functions &
Packages for
Time Series
                                 300




Conclusions
                                 200
                                 100




                                       1950   1952   1954   1956   1958   1960
                                                                                 8/42
Decomposition

                >           apts <- ts(AirPassengers, frequency = 12)
                >           f <- decompose(apts)
R and Time
Series Data     >           # seasonal figures
Time Series     >           plot(f$figure,type="b")
Decomposi-
tion

Time Series
Forecasting                                               q   q
                            60




Time Series
Clustering
                            40




                                                      q


Time Series
                            20




Classification                                                     q
                 f$figure




R Functions &
                            0




                                          q
                                                  q
Packages for                                  q

Time Series
                            −20




                                                                      q
                                  q
                                                                               q
Conclusions                           q
                            −40




                                                                           q



                                      2       4       6       8       10       12

                                                      Index


                                                                                    9/42
Decomposition

                > plot(f)
                                         Decomposition of additive time series
R and Time
Series Data
                            500
                 observed




Time Series
Decomposi-
                            300




tion
                              100




Time Series
                            450




Forecasting
                            350
                 trend




Time Series
                            250




Clustering
                            150




Time Series
Classification
                            40
                 seasonal




R Functions &
                            0




Packages for
                            60 −40




Time Series

Conclusions
                 random
                            0 20
                            −40




                                     2        4        6          8    10        12

                                                           Time
                                                                                      10/42
Outline


                1   R and Time Series Data
R and Time
Series Data

Time Series     2   Time Series Decomposition
Decomposi-
tion

Time Series     3   Time Series Forecasting
Forecasting

Time Series
Clustering      4   Time Series Clustering
Time Series
Classification
                5   Time Series Classification
R Functions &
Packages for
Time Series
                6   R Functions & Packages for Time Series
Conclusions


                7   Conclusions


                                                             11/42
Time Series Forecasting



R and Time
Series Data

Time Series
Decomposi-          To forecast future events based on known past data
tion

Time Series
                    E.g., to predict the opening price of a stock based on its
Forecasting
                    past performance
Time Series
Clustering          Popular models
Time Series             Autoregressive moving average (ARMA)
Classification
                        Autoregressive integrated moving average (ARIMA)
R Functions &
Packages for
Time Series

Conclusions




                                                                                 12/42
Forecasting



R and Time
                >   # build an ARIMA model
Series Data     >   fit <- arima(AirPassengers, order=c(1,0,0),
Time Series
Decomposi-
                +          list(order=c(2,1,0), period=12))
tion
                >   fore <- predict(fit, n.ahead=24)
Time Series
Forecasting
                >   # error bounds at 95% confidence level
Time Series     >   U <- fore$pred + 2*fore$se
Clustering
                >   L <- fore$pred - 2*fore$se
Time Series
Classification   >   ts.plot(AirPassengers, fore$pred, U, L,
R Functions &   +           col=c(1,2,4,4), lty = c(1,1,2,2))
Packages for
Time Series     >   legend("topleft", col=c(1,2,4), lty=c(1,1,2),
Conclusions     +          c("Actual", "Forecast",
                +            "Error Bounds (95% Confidence)"))


                                                                    13/42
Forecasting

                              Actual
                              Forecast
R and Time          700       Error Bounds (95% Confidence)
Series Data
                    600


Time Series
Decomposi-
tion
                    500




Time Series
Forecasting
                    400




Time Series
Clustering

Time Series
                    300




Classification

R Functions &
Packages for
                    200




Time Series

Conclusions
                    100




                          1950    1952   1954    1956    1958   1960   1962

                                                 Time                         14/42
Outline


                1   R and Time Series Data
R and Time
Series Data

Time Series     2   Time Series Decomposition
Decomposi-
tion

Time Series     3   Time Series Forecasting
Forecasting

Time Series
Clustering      4   Time Series Clustering
Time Series
Classification
                5   Time Series Classification
R Functions &
Packages for
Time Series
                6   R Functions & Packages for Time Series
Conclusions


                7   Conclusions


                                                             15/42
Time Series Clustering



R and Time
Series Data         To partition time series data into groups based on
Time Series         similarity or distance, so that time series in the same
Decomposi-
tion                cluster are similar
Time Series         Measure of distance/dissimilarity
Forecasting

Time Series
                        Euclidean distance
Clustering              Manhattan distance
Time Series             Maximum norm
Classification
                        Hamming distance
R Functions &
Packages for
                        The angle between two vectors (inner product)
Time Series             Dynamic Time Warping (DTW) distance
Conclusions             ...




                                                                              16/42
Dynamic Time Warping (DTW)

                DTW finds optimal alignment between two time series.
                > library(dtw)
R and Time
Series Data
                > idx <- seq(0, 2*pi, len=100)
Time Series
                > a <- sin(idx) + runif(100)/10
Decomposi-
tion
                > b <- cos(idx)
Time Series
                > align <- dtw(a, b, step=asymmetricP1, keep=T)
Forecasting     > dtwPlotTwoWay(align)
Time Series
Clustering
                                            1.0




Time Series
Classification
                                            0.5




R Functions &
                              Query value




Packages for
Time Series
                                            0.0




Conclusions
                                            −0.5
                                            −1.0




                                                   0   20   40           60   80   100

                                                                 Index                   17/42
Synthetic Control Chart Time Series


                    The dataset contains 600 examples of control charts
R and Time
Series Data         synthetically generated by the process in Alcock and
Time Series         Manolopoulos (1999).
Decomposi-
tion                Each control chart is a time series with 60 values.
Time Series
Forecasting
                    Six classes:
Time Series              1-100 Normal
Clustering               101-200 Cyclic
Time Series
Classification
                         201-300 Increasing trend
                         301-400 Decreasing trend
R Functions &
Packages for             401-500 Upward shift
Time Series
                         501-600 Downward shift
Conclusions
                    http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_
                    control.html


                                                                                    18/42
Synthetic Control Chart Time Series



R and Time
Series Data
                >   # read data into R
Time Series
Decomposi-      >   # sep="": the separator is white space, i.e., one
tion
                >   # or more spaces, tabs, newlines or carriage returns
Time Series
Forecasting     >   sc <- read.table("synthetic_control.data",
Time Series     +                    header=F, sep="")
Clustering
                >   # show one sample from each class
Time Series
Classification   >   idx <- c(1,101,201,301,401,501)
R Functions &   >   sample1 <- t(sc[idx,])
Packages for
Time Series     >   plot.ts(sample1, main="")
Conclusions




                                                                   19/42
Six Classes




                           36




                                                                             30
                           34
                           32




                                                                             20
                                                                       301
                     1
                           30
R and Time




                                                                             10
Series Data
                           28
                           26
Time Series




                                                                             0
Decomposi-
                           45 24




                                                                             45
tion




                                                                             40
Time Series
                           35




Forecasting
                     101




                                                                       401
                                                                             35
Time Series
                           25




Clustering




                                                                             30
                           15




Time Series




                                                                             35 25
Classification
                           45




R Functions &




                                                                             30
Packages for
                           40




                                                                             25
                     201




                                                                       501
Time Series
                           35




                                                                             20
Conclusions
                                                                             15
                           30




                                                                             10
                           25




                                   0   10   20    30    40   50   60                 0   10   20    30    40   50   60

                                                 Time                                              Time
                                                                                                                         20/42
Hierarchical Clustering with Euclidean distance



R and Time
Series Data     >   # sample n cases from every class
Time Series     >   n <- 10
Decomposi-
tion            >   s <- sample(1:100, n)
Time Series     >   idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
Forecasting

Time Series
                >   sample2 <- sc[idx,]
Clustering      >   observedLabels <- c(rep(1,n), rep(2,n), rep(3,n),
Time Series
Classification
                +                       rep(4,n), rep(5,n), rep(6,n))
R Functions &
                >   # hierarchical clustering with Euclidean distance
Packages for    >   hc <- hclust(dist(sample2), method="ave")
Time Series

Conclusions
                >   plot(hc, labels=observedLabels, main="")




                                                                   21/42
Hierarchical Clustering with Euclidean distance



R and Time                140
                          120
Series Data

Time Series
Decomposi-
                          100



tion

Time Series
Forecasting
                 Height

                          80




Time Series
Clustering

Time Series
                          60




Classification




                                                      22
R Functions &
Packages for
                                             5
                                            6




Time Series
                                            6
                          40




                                        2 2
                                        2 2
Conclusions
                                          4




                                     5 2
                                         2
                                       66




                                    5 5
                                     66
                                       6
                                     64




                                    55
                                  3 5




                                      2
                                      2
                                  55
                                   44


                                  33




                                    1
                                  44
                                   4



                                33
                                   4
                                   4




                                   3
                                   3
                          20




                                11
                                  5
                                  3
                                  3
                                  4




                                11
                                 6
                                 6




                                11
                                3




                                1


                                1
                                1
                                                                  22/42
Hierarchical Clustering with Euclidean distance



R and Time      > # cut tree to get 8 clusters
Series Data
                > memb <- cutree(hc, k=8)
Time Series
Decomposi-      > table(observedLabels, memb)
tion

Time Series                   memb
Forecasting
                observedLabels 1     2   3   4   5   6 7 8
Time Series
Clustering                   1 10    0   0   0   0   0 0 0
Time Series                  2 0     3   1   1   3   2 0 0
Classification

R Functions &
                             3 0     0   0   0   0   0 10 0
Packages for
Time Series
                             4 0     0   0   0   0   0 0 10
Conclusions
                             5 0     0   0   0   0   0 10 0
                             6 0     0   0   0   0   0 0 10


                                                                  23/42
Hierarchical Clustering with DTW Distance


                >   myDist <- dist(sample2, method="DTW")
R and Time
                >   hc <- hclust(myDist, method="average")
Series Data
                >   plot(hc, labels=observedLabels, main="")
Time Series
Decomposi-      >   # cut tree to get 8 clusters
tion
                >   memb <- cutree(hc, k=8)
Time Series
Forecasting     >   table(observedLabels, memb)
Time Series
Clustering                    memb
Time Series     observedLabels 1     2   3   4   5   6 7 8
Classification
                             1 10    0   0   0   0   0 0 0
R Functions &
Packages for                 2 0     4   3   2   1   0 0 0
Time Series
                             3 0     0   0   0   0   6 4 0
Conclusions
                             4 0     0   0   0   0   0 0 10
                             5 0     0   0   0   0   0 10 0
                             6 0     0   0   0   0   0 0 10
                                                               24/42
Hierarchical Clustering with DTW Distance



R and Time                1000
                          800
Series Data

Time Series
Decomposi-
tion
                          600




Time Series
Forecasting
                 Height




Time Series
Clustering
                          400




Time Series
Classification

R Functions &
Packages for
Time Series
                          200




Conclusions




                                          2
                                      2 2

                                        22
                                         2
                                       22
                                       2
                                      2
                                  4 6
                                    66
                                   33
                                  35
                                  35




                                  46
                                  33




                                   6
                                 55
                                   3




                                   4
                                   6
                                   6
                                 35




                                   1
                                   1
                                  4

                                  6
                                  6
                                  6
                                 55
                                  5
                                 33




                                  4
                                  4
                                  4
                                  4




                                  1
                                  5
                                  5




                                  1
                                  1
                                 1
                                 1
                                 4
                                 4




                                 1
                                 1
                                 1
                          0




                                                            25/42
Outline


                1   R and Time Series Data
R and Time
Series Data

Time Series     2   Time Series Decomposition
Decomposi-
tion

Time Series     3   Time Series Forecasting
Forecasting

Time Series
Clustering      4   Time Series Clustering
Time Series
Classification
                5   Time Series Classification
R Functions &
Packages for
Time Series
                6   R Functions & Packages for Time Series
Conclusions


                7   Conclusions


                                                             26/42
Time Series Classification

                Time Series Classification
                    To build a classification model based on labelled time
R and Time
Series Data         series
Time Series
Decomposi-
                    and then use the model to predict the lable of unlabelled
tion                time series
Time Series
Forecasting     Feature Extraction
Time Series
Clustering
                    Singular Value Decomposition (SVD)
Time Series         Discrete Fourier Transform (DFT)
Classification

R Functions &       Discrete Wavelet Transform (DWT)
Packages for
Time Series         Piecewise Aggregate Approximation (PAA)
Conclusions
                    Perpetually Important Points (PIP)
                    Piecewise Linear Representation
                    Symbolic Representation
                                                                                27/42
Decision Tree (ctree)



R and Time
Series Data     ctree from package party
Time Series
Decomposi-      >   classId <- c(rep("1",100), rep("2",100),
tion
                +                rep("3",100), rep("4",100),
Time Series
Forecasting     +                rep("5",100), rep("6",100))
Time Series     >   newSc <- data.frame(cbind(classId, sc))
Clustering
                >   library(party)
Time Series
Classification   >   ct <- ctree(classId ~ ., data=newSc,
R Functions &   +               controls = ctree_control(minsplit=20,
Packages for
Time Series     +                          minbucket=5, maxdepth=5))
Conclusions




                                                                   28/42
Decision Tree

                > pClassId <- predict(ct)
                > table(classId, pClassId)
R and Time
Series Data
                       pClassId
Time Series
Decomposi-      classId   1   2    3   4    5   6
tion
                      1 100   0    0   0    0   0
Time Series
Forecasting           2   1 97     2   0    0   0
Time Series
Clustering
                      3   0   0   99   0    1   0
Time Series
                      4   0   0    0 100    0   0
Classification         5   4   0    8   0   88   0
R Functions &
Packages for
                      6   0   3    0 90     0   7
Time Series

Conclusions
                > # accuracy
                > (sum(classId==pClassId)) / nrow(sc)
                [1] 0.8183333

                                                        29/42
DWT (Discrete Wavelet Transform)

                    Wavelet transform provides a multi-resolution
                    representation using wavelets.
R and Time
Series Data         Haar Wavelet Transform – the simplest DWT
Time Series
Decomposi-
                    http://dmr.ath.cx/gfx/haar/
tion

Time Series
Forecasting

Time Series
Clustering

Time Series
Classification

R Functions &
Packages for
Time Series

Conclusions


                    DFT (Discrete Fourier Transform): another popular
                    feature extraction technique
                                                                        30/42
DWT (Discrete Wavelet Transform)



R and Time      >   # extract DWT (with Haar filter) coefficients
Series Data

Time Series
                >   library(wavelets)
Decomposi-
tion
                >   wtData <- NULL
Time Series
                >   for (i in 1:nrow(sc)) {
Forecasting     +     a <- t(sc[i,])
Time Series
Clustering
                +     wt <- dwt(a, filter="haar", boundary="periodic")
Time Series     +     wtData <- rbind(wtData,
Classification
                +         unlist(c(wt@W, wt@V[[wt@level]])))
R Functions &
Packages for    +   }
Time Series
                >   wtData <- as.data.frame(wtData)
Conclusions
                >   wtSc <- data.frame(cbind(classId, wtData))



                                                                   31/42
Decision Tree with DWT

                > ct <- ctree(classId ~ ., data=wtSc, controls =
                +             ctree_control(minsplit=20,
R and Time
Series Data
                +             minbucket=5, maxdepth=5))
Time Series     > pClassId <- predict(ct)
Decomposi-
tion            > table(classId, pClassId)
Time Series            pClassId
Forecasting

Time Series
                classId 1 2 3 4 5 6
Clustering            1 98 2 0 0 0 0
Time Series
Classification
                      2 1 99 0 0 0 0
R Functions &
                      3 0 0 81 0 19 0
Packages for          4 0 0 0 74 0 26
Time Series

Conclusions
                      5 0 0 16 0 84 0
                      6 0 0 0 3 0 97
                > (sum(classId==pClassId)) / nrow(wtSc)
                [1] 0.8883333
                                                                   32/42
> plot(ct, ip_args=list(pval=FALSE), ep_args=list(digits=0))
                                                                                             1
                                                                                         V57


                                                                   ≤ 117                               > 117
                                           2                                                                         9
                                       W43                                                                       V57


                                ≤ −4                 > −4                                                       ≤ 140                                > 140
                    3                                               6                                                                                                                    11
                   W5                                              W31                                                                                                                   V57


                                                                                                                                                                ≤ 178                                > 178
                                                                                                                                          12                                                                     17
                                                                                                                                         W22                                                                    W31


            ≤ −8         > −8                               ≤ −6           > −6                                                         ≤ −6         > −6                                                     ≤ −15         > −15
                                                                                                                                                                    14                                                                   19
                                                                                                                                                                W31                                                                      W43


                                                                                                                                                            ≤ −13         > −13                                                     ≤3          >3

  Node 4 (n = 68)         Node 5 (n = 6)         Node 7 (n = 9)            Node 8 (n = 86)        Node 10 (n = 31)        Node 13 (n = 80)      Node 15 (n = 9)           Node 16 (n = 99)      Node 18 (n = 12)      Node 20 (n = 103)        Node 21 (n = 97)
  1                      1                      1                          1                      1                       1                     1                         1                     1                      1                       1
 0.8                    0.8                    0.8                      0.8                      0.8                     0.8                   0.8                       0.8                   0.8                    0.8                     0.8
 0.6                    0.6                    0.6                      0.6                      0.6                     0.6                   0.6                       0.6                   0.6                    0.6                     0.6
 0.4                    0.4                    0.4                      0.4                      0.4                     0.4                   0.4                       0.4                   0.4                    0.4                     0.4
 0.2                    0.2                    0.2                      0.2                      0.2                     0.2                   0.2                       0.2                   0.2                    0.2                     0.2
  0                      0                      0                          0                      0                       0                     0                         0                     0                      0                       0
       123456                 123456                 123456                    123456                  123456                  123456                123456                    123456                123456                 123456                   123456
k-NN Classification

                     find the k nearest neighbours of a new instance
R and Time
                     label it by majority voting
Series Data
                     needs an efficient indexing structure for large datasets
Time Series
Decomposi-      >   k <- 20
tion

Time Series
                >   newTS <- sc[501,] + runif(100)*15
Forecasting
                >   distances <- dist(newTS, sc, method="DTW")
Time Series
Clustering      >   s <- sort(as.vector(distances), index.return=TRUE)
Time Series     >   # class IDs of k nearest neighbours
Classification
                >   table(classId[s$ix[1:k]])
R Functions &
Packages for
Time Series
                 4 6
Conclusions      3 17

                Results of Majority Voting
                Label of newTS ← class 6
                                                                              34/42
Outline


                1   R and Time Series Data
R and Time
Series Data

Time Series     2   Time Series Decomposition
Decomposi-
tion

Time Series     3   Time Series Forecasting
Forecasting

Time Series
Clustering      4   Time Series Clustering
Time Series
Classification
                5   Time Series Classification
R Functions &
Packages for
Time Series
                6   R Functions & Packages for Time Series
Conclusions


                7   Conclusions


                                                             35/42
Functions - Construction, Plot & Smoothing



R and Time      Construction
Series Data

Time Series            ts() create time-series objects (stats)
Decomposi-
tion
                Plot
Time Series
Forecasting
                       plot.ts() plot time-series objects (stats)
Time Series
Clustering
                Smoothing & Filtering
Time Series
Classification
                       smoothts() time series smoothing (ast)
R Functions &
Packages for
Time Series
                       sfilter() remove seasonal fluctuation using moving
Conclusions
                       average (ast)




                                                                           36/42
Functions - Decomposition & Forecasting

                Decomposition
                    decomp() time series decomposition by square-root filter
R and Time
Series Data
                    (timsac)
Time Series         decompose() classical seasonal decomposition by moving
Decomposi-
tion                averages (stats)
Time Series         stl() seasonal decomposition of time series by loess
Forecasting
                    (stats)
Time Series
Clustering          tsr() time series decomposition (ast)
Time Series         ardec() time series autoregressive decomposition
Classification
                    (ArDec)
R Functions &
Packages for
Time Series
                Forecasting
Conclusions         arima() fit an ARIMA model to a univariate time series
                    (stats)
                    predict.Arima() forecast from models fitted by arima
                    (stats)
                                                                          37/42
Packages

                Packages
                    timsac time series analysis and control program
R and Time
Series Data         ast time series analysis
Time Series
Decomposi-
                    ArDec time series autoregressive-based decomposition
tion
                    ares a toolbox for time series analyses using generalized
Time Series
Forecasting         additive models
Time Series
Clustering
                    dse tools for multivariate, linear, time-invariant, time
Time Series
                    series models
Classification
                    forecast displaying and analysing univariate time series
R Functions &
Packages for        forecasts
Time Series
                    dtw Dynamic Time Warping – find optimal alignment
Conclusions
                    between two time series
                    wavelets wavelet filters, wavelet transforms and
                    multiresolution analyses
                                                                                38/42
Online Resources

                    An R Time Series Tutorial
                    http://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.htm

R and Time          Time Series Analysis with R
Series Data
                    http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_
Time Series
Decomposi-          intro.pdf
tion
                    Using R (with applications in Time Series Analysis)
Time Series
Forecasting         http://people.bath.ac.uk/masgs/time%20series/TimeSeriesR2004.pdf

Time Series         CRAN Task View: Time Series Analysis
Clustering
                    http://cran.r-project.org/web/views/TimeSeries.html
Time Series
Classification       R Functions for Time Series Analysis
R Functions &       http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
Packages for
Time Series         R Reference Card for Data Mining;
Conclusions         R and Data Mining: Examples and Case Studies
                    http://www.rdatamining.com/

                    Time Series Analysis for Business Forecasting
                    http://home.ubalt.edu/ntsbarsh/stat-data/Forecast.htm
                                                                                                      39/42
Outline


                1   R and Time Series Data
R and Time
Series Data

Time Series     2   Time Series Decomposition
Decomposi-
tion

Time Series     3   Time Series Forecasting
Forecasting

Time Series
Clustering      4   Time Series Clustering
Time Series
Classification
                5   Time Series Classification
R Functions &
Packages for
Time Series
                6   R Functions & Packages for Time Series
Conclusions


                7   Conclusions


                                                             40/42
Conclusions

                    Time series decomposition and forecasting: many R
                    functions and packages available
R and Time
Series Data         Time series classification and clustering: no R functions or
Time Series
Decomposi-
                    packages specially for this purpose; have to work it out by
tion
                    yourself
Time Series
Forecasting         Time series classification: extract and build features, and
Time Series         then apply existing classification techniques, such as SVM,
Clustering

Time Series
                    k-NN, neural networks, regression and decision trees
Classification
                    Time series clustering: work out your own
R Functions &
Packages for        distance/similarity metrics, and then use existing clustering
Time Series
                    techniques, such as k-means and hierarchical clustering
Conclusions
                    Techniques specially for classifying/clustering time series
                    data: a lot of research publications, but no R
                    implementations (as far as I know)
                                                                               41/42
The End

                 Email:               yanchangzhao@gmail.com
                 RDataMining:         http://www.rdatamining.com
R and Time
Series Data      Twitter:             http://twitter.com/rdatamining
Time Series
Decomposi-
                 Group on Linkedin:   http://group.rdatamining.com
tion             Group on Google:     http://group2.rdatamining.com
Time Series
Forecasting

Time Series
Clustering

Time Series
Classification

R Functions &
Packages for
Time Series

Conclusions




                                                                       42/42

More Related Content

What's hot

5. R basics
5. R basics5. R basics
5. R basicsFAO
 
The Ring programming language version 1.3 book - Part 27 of 88
The Ring programming language version 1.3 book - Part 27 of 88The Ring programming language version 1.3 book - Part 27 of 88
The Ring programming language version 1.3 book - Part 27 of 88Mahmoud Samir Fayed
 
Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014PyData
 
10. Getting Spatial
10. Getting Spatial10. Getting Spatial
10. Getting SpatialFAO
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with RYanchang Zhao
 
Scilab presentation
Scilab presentation Scilab presentation
Scilab presentation Nasir Ansari
 
Building Machine Learning Algorithms on Apache Spark with William Benton
Building Machine Learning Algorithms on Apache Spark with William BentonBuilding Machine Learning Algorithms on Apache Spark with William Benton
Building Machine Learning Algorithms on Apache Spark with William BentonSpark Summit
 
Jamie Pullar- Cats MTL in action
Jamie Pullar- Cats MTL in actionJamie Pullar- Cats MTL in action
Jamie Pullar- Cats MTL in actionRyan Adams
 
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...Dr. Volkan OBAN
 
Gremlin's Graph Traversal Machinery
Gremlin's Graph Traversal MachineryGremlin's Graph Traversal Machinery
Gremlin's Graph Traversal MachineryMarko Rodriguez
 
PythonScripting
PythonScriptingPythonScripting
PythonScriptingSait Elmas
 
Continuation Passing Style and Macros in Clojure - Jan 2012
Continuation Passing Style and Macros in Clojure - Jan 2012Continuation Passing Style and Macros in Clojure - Jan 2012
Continuation Passing Style and Macros in Clojure - Jan 2012Leonardo Borges
 
Time series clustering presentation
Time series clustering presentationTime series clustering presentation
Time series clustering presentationEleni Stamatelou
 
linear transformation and rank nullity theorem
linear transformation and rank nullity theorem linear transformation and rank nullity theorem
linear transformation and rank nullity theorem Manthan Chavda
 
All You Need is Fold in the Key of C#
All You Need is Fold in the Key of C#All You Need is Fold in the Key of C#
All You Need is Fold in the Key of C#Mike Harris
 

What's hot (20)

5. R basics
5. R basics5. R basics
5. R basics
 
RHadoop, R meets Hadoop
RHadoop, R meets HadoopRHadoop, R meets Hadoop
RHadoop, R meets Hadoop
 
The Ring programming language version 1.3 book - Part 27 of 88
The Ring programming language version 1.3 book - Part 27 of 88The Ring programming language version 1.3 book - Part 27 of 88
The Ring programming language version 1.3 book - Part 27 of 88
 
Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014
 
10. Getting Spatial
10. Getting Spatial10. Getting Spatial
10. Getting Spatial
 
ScalaMeter 2012
ScalaMeter 2012ScalaMeter 2012
ScalaMeter 2012
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with R
 
ABA Problem
ABA ProblemABA Problem
ABA Problem
 
Scilab presentation
Scilab presentation Scilab presentation
Scilab presentation
 
Building Machine Learning Algorithms on Apache Spark with William Benton
Building Machine Learning Algorithms on Apache Spark with William BentonBuilding Machine Learning Algorithms on Apache Spark with William Benton
Building Machine Learning Algorithms on Apache Spark with William Benton
 
Jamie Pullar- Cats MTL in action
Jamie Pullar- Cats MTL in actionJamie Pullar- Cats MTL in action
Jamie Pullar- Cats MTL in action
 
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
 
Gremlin's Graph Traversal Machinery
Gremlin's Graph Traversal MachineryGremlin's Graph Traversal Machinery
Gremlin's Graph Traversal Machinery
 
Test (S) on R
Test (S) on RTest (S) on R
Test (S) on R
 
PythonScripting
PythonScriptingPythonScripting
PythonScripting
 
MUMS Undergraduate Workshop - Parameter Selection and Model Calibration for a...
MUMS Undergraduate Workshop - Parameter Selection and Model Calibration for a...MUMS Undergraduate Workshop - Parameter Selection and Model Calibration for a...
MUMS Undergraduate Workshop - Parameter Selection and Model Calibration for a...
 
Continuation Passing Style and Macros in Clojure - Jan 2012
Continuation Passing Style and Macros in Clojure - Jan 2012Continuation Passing Style and Macros in Clojure - Jan 2012
Continuation Passing Style and Macros in Clojure - Jan 2012
 
Time series clustering presentation
Time series clustering presentationTime series clustering presentation
Time series clustering presentation
 
linear transformation and rank nullity theorem
linear transformation and rank nullity theorem linear transformation and rank nullity theorem
linear transformation and rank nullity theorem
 
All You Need is Fold in the Key of C#
All You Need is Fold in the Key of C#All You Need is Fold in the Key of C#
All You Need is Fold in the Key of C#
 

Viewers also liked

Time series data mining techniques
Time series data mining techniquesTime series data mining techniques
Time series data mining techniquesShanmukha S. Potti
 
Time Series
Time SeriesTime Series
Time Seriesyush313
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataYanchang Zhao
 
Data mining with R- regression models
Data mining with R- regression modelsData mining with R- regression models
Data mining with R- regression modelsHamideh Iraj
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with RYanchang Zhao
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RYanchang Zhao
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data MiningYanchang Zhao
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with RYanchang Zhao
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with RYanchang Zhao
 
STATA - Time Series Analysis
STATA - Time Series AnalysisSTATA - Time Series Analysis
STATA - Time Series Analysisstata_org_uk
 
Data mining tools (R , WEKA, RAPID MINER, ORANGE)
Data mining tools (R , WEKA, RAPID MINER, ORANGE)Data mining tools (R , WEKA, RAPID MINER, ORANGE)
Data mining tools (R , WEKA, RAPID MINER, ORANGE)Krishna Petrochemicals
 
Automated seismic-to-well ties?
Automated seismic-to-well ties?Automated seismic-to-well ties?
Automated seismic-to-well ties?UT Technology
 
A little-book-of-r-for-time-series
A little-book-of-r-for-time-seriesA little-book-of-r-for-time-series
A little-book-of-r-for-time-seriesS DG
 
Logistic regression (generalized linear model)
Logistic regression (generalized linear model)Logistic regression (generalized linear model)
Logistic regression (generalized linear model)Indah Fitri Hapsari
 
Time Series Modelling in R-Forecasting.
Time Series Modelling in R-Forecasting.Time Series Modelling in R-Forecasting.
Time Series Modelling in R-Forecasting.Dr. Volkan OBAN
 

Viewers also liked (20)

Time series data mining techniques
Time series data mining techniquesTime series data mining techniques
Time series data mining techniques
 
Time Series
Time SeriesTime Series
Time Series
 
time series analysis
time series analysistime series analysis
time series analysis
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 
Timeseries Analysis with R
Timeseries Analysis with RTimeseries Analysis with R
Timeseries Analysis with R
 
Data mining with R- regression models
Data mining with R- regression modelsData mining with R- regression models
Data mining with R- regression models
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with R
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data Mining
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
Time series
Time seriesTime series
Time series
 
STATA - Time Series Analysis
STATA - Time Series AnalysisSTATA - Time Series Analysis
STATA - Time Series Analysis
 
Data mining tools (R , WEKA, RAPID MINER, ORANGE)
Data mining tools (R , WEKA, RAPID MINER, ORANGE)Data mining tools (R , WEKA, RAPID MINER, ORANGE)
Data mining tools (R , WEKA, RAPID MINER, ORANGE)
 
Automated seismic-to-well ties?
Automated seismic-to-well ties?Automated seismic-to-well ties?
Automated seismic-to-well ties?
 
A little-book-of-r-for-time-series
A little-book-of-r-for-time-seriesA little-book-of-r-for-time-series
A little-book-of-r-for-time-series
 
Logistic regression (generalized linear model)
Logistic regression (generalized linear model)Logistic regression (generalized linear model)
Logistic regression (generalized linear model)
 
Time Series Modelling in R-Forecasting.
Time Series Modelling in R-Forecasting.Time Series Modelling in R-Forecasting.
Time Series Modelling in R-Forecasting.
 
Boxplot
BoxplotBoxplot
Boxplot
 
Scatterplots[1]
Scatterplots[1]Scatterplots[1]
Scatterplots[1]
 

More from Yanchang Zhao

RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisYanchang Zhao
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rYanchang Zhao
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classificationYanchang Zhao
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programmingYanchang Zhao
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rYanchang Zhao
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationYanchang Zhao
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rYanchang Zhao
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rYanchang Zhao
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-cardYanchang Zhao
 

More from Yanchang Zhao (9)

RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysis
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programming
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisation
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-r
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-r
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-card
 

Recently uploaded

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Time series-mining-slides

  • 1. R and Time Series Data Time Series Decomposi- Time Series Analysis and Mining with R tion Time Series Forecasting Time Series Yanchang Zhao Clustering Time Series RDataMining.com Classification http://www.rdatamining.com/ R Functions & Packages for Time Series 18 July 2011 Conclusions 1/42
  • 2. Outline 1 R and Time Series Data R and Time Series Data Time Series 2 Time Series Decomposition Decomposi- tion Time Series 3 Time Series Forecasting Forecasting Time Series Clustering 4 Time Series Clustering Time Series Classification 5 Time Series Classification R Functions & Packages for Time Series 6 R Functions & Packages for Time Series Conclusions 7 Conclusions 2/42
  • 3. R R and Time Series Data a free software environment for statistical computing and Time Series Decomposi- graphics tion Time Series runs on Windows, Linux and MacOS Forecasting widely used in academia and research, as well as industrial Time Series Clustering applications Time Series Classification over 3,000 packages R Functions & CRAN Task View: Time Series Analysis Packages for Time Series http://cran.r-project.org/web/views/TimeSeries.html Conclusions 3/42
  • 4. Time Series Data in R R and Time Series Data Time Series Decomposi- class ts tion represents data which has been sampled at equispaced Time Series Forecasting points in time Time Series Clustering frequency=7: a weekly series Time Series frequency=12: a monthly series Classification R Functions & frequency=4: a quarterly series Packages for Time Series Conclusions 4/42
  • 5. Time Series Data in R > a <- ts(1:20, frequency=12, start=c(2011,3)) > print(a) R and Time Series Data Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Time Series 2011 1 2 3 4 5 6 7 8 9 Decomposi- tion 2012 11 12 13 14 15 16 17 18 19 20 Time Series Dec Forecasting Time Series 2011 10 Clustering 2012 Time Series Classification > str(a) R Functions & Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 Packages for Time Series > attributes(a) Conclusions $tsp [1] 2011.167 2012.750 12.000 $class 5/42
  • 6. Outline 1 R and Time Series Data R and Time Series Data Time Series 2 Time Series Decomposition Decomposi- tion Time Series 3 Time Series Forecasting Forecasting Time Series Clustering 4 Time Series Clustering Time Series Classification 5 Time Series Classification R Functions & Packages for Time Series 6 R Functions & Packages for Time Series Conclusions 7 Conclusions 6/42
  • 7. What is Time Series Decomposition R and Time Series Data Time Series To decompose a time series into components: Decomposi- tion Trend component: long term trend Time Series Forecasting Seasonal component: seasonal variation Time Series Clustering Cyclical component: repeated but non-periodic Time Series fluctuations Classification R Functions & Irregular component: the residuals Packages for Time Series Conclusions 7/42
  • 8. Data AirPassengers Data AirPassengers: Monthly totals of Box Jenkins international airline passengers, 1949 to 1960. It has R and Time 144(=12×12) values. Series Data Time Series > plot(AirPassengers) Decomposi- tion Time Series Forecasting 600 Time Series Clustering 500 Time Series Classification AirPassengers 400 R Functions & Packages for Time Series 300 Conclusions 200 100 1950 1952 1954 1956 1958 1960 8/42
  • 9. Decomposition > apts <- ts(AirPassengers, frequency = 12) > f <- decompose(apts) R and Time Series Data > # seasonal figures Time Series > plot(f$figure,type="b") Decomposi- tion Time Series Forecasting q q 60 Time Series Clustering 40 q Time Series 20 Classification q f$figure R Functions & 0 q q Packages for q Time Series −20 q q q Conclusions q −40 q 2 4 6 8 10 12 Index 9/42
  • 10. Decomposition > plot(f) Decomposition of additive time series R and Time Series Data 500 observed Time Series Decomposi- 300 tion 100 Time Series 450 Forecasting 350 trend Time Series 250 Clustering 150 Time Series Classification 40 seasonal R Functions & 0 Packages for 60 −40 Time Series Conclusions random 0 20 −40 2 4 6 8 10 12 Time 10/42
  • 11. Outline 1 R and Time Series Data R and Time Series Data Time Series 2 Time Series Decomposition Decomposi- tion Time Series 3 Time Series Forecasting Forecasting Time Series Clustering 4 Time Series Clustering Time Series Classification 5 Time Series Classification R Functions & Packages for Time Series 6 R Functions & Packages for Time Series Conclusions 7 Conclusions 11/42
  • 12. Time Series Forecasting R and Time Series Data Time Series Decomposi- To forecast future events based on known past data tion Time Series E.g., to predict the opening price of a stock based on its Forecasting past performance Time Series Clustering Popular models Time Series Autoregressive moving average (ARMA) Classification Autoregressive integrated moving average (ARIMA) R Functions & Packages for Time Series Conclusions 12/42
  • 13. Forecasting R and Time > # build an ARIMA model Series Data > fit <- arima(AirPassengers, order=c(1,0,0), Time Series Decomposi- + list(order=c(2,1,0), period=12)) tion > fore <- predict(fit, n.ahead=24) Time Series Forecasting > # error bounds at 95% confidence level Time Series > U <- fore$pred + 2*fore$se Clustering > L <- fore$pred - 2*fore$se Time Series Classification > ts.plot(AirPassengers, fore$pred, U, L, R Functions & + col=c(1,2,4,4), lty = c(1,1,2,2)) Packages for Time Series > legend("topleft", col=c(1,2,4), lty=c(1,1,2), Conclusions + c("Actual", "Forecast", + "Error Bounds (95% Confidence)")) 13/42
  • 14. Forecasting Actual Forecast R and Time 700 Error Bounds (95% Confidence) Series Data 600 Time Series Decomposi- tion 500 Time Series Forecasting 400 Time Series Clustering Time Series 300 Classification R Functions & Packages for 200 Time Series Conclusions 100 1950 1952 1954 1956 1958 1960 1962 Time 14/42
  • 15. Outline 1 R and Time Series Data R and Time Series Data Time Series 2 Time Series Decomposition Decomposi- tion Time Series 3 Time Series Forecasting Forecasting Time Series Clustering 4 Time Series Clustering Time Series Classification 5 Time Series Classification R Functions & Packages for Time Series 6 R Functions & Packages for Time Series Conclusions 7 Conclusions 15/42
  • 16. Time Series Clustering R and Time Series Data To partition time series data into groups based on Time Series similarity or distance, so that time series in the same Decomposi- tion cluster are similar Time Series Measure of distance/dissimilarity Forecasting Time Series Euclidean distance Clustering Manhattan distance Time Series Maximum norm Classification Hamming distance R Functions & Packages for The angle between two vectors (inner product) Time Series Dynamic Time Warping (DTW) distance Conclusions ... 16/42
  • 17. Dynamic Time Warping (DTW) DTW finds optimal alignment between two time series. > library(dtw) R and Time Series Data > idx <- seq(0, 2*pi, len=100) Time Series > a <- sin(idx) + runif(100)/10 Decomposi- tion > b <- cos(idx) Time Series > align <- dtw(a, b, step=asymmetricP1, keep=T) Forecasting > dtwPlotTwoWay(align) Time Series Clustering 1.0 Time Series Classification 0.5 R Functions & Query value Packages for Time Series 0.0 Conclusions −0.5 −1.0 0 20 40 60 80 100 Index 17/42
  • 18. Synthetic Control Chart Time Series The dataset contains 600 examples of control charts R and Time Series Data synthetically generated by the process in Alcock and Time Series Manolopoulos (1999). Decomposi- tion Each control chart is a time series with 60 values. Time Series Forecasting Six classes: Time Series 1-100 Normal Clustering 101-200 Cyclic Time Series Classification 201-300 Increasing trend 301-400 Decreasing trend R Functions & Packages for 401-500 Upward shift Time Series 501-600 Downward shift Conclusions http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_ control.html 18/42
  • 19. Synthetic Control Chart Time Series R and Time Series Data > # read data into R Time Series Decomposi- > # sep="": the separator is white space, i.e., one tion > # or more spaces, tabs, newlines or carriage returns Time Series Forecasting > sc <- read.table("synthetic_control.data", Time Series + header=F, sep="") Clustering > # show one sample from each class Time Series Classification > idx <- c(1,101,201,301,401,501) R Functions & > sample1 <- t(sc[idx,]) Packages for Time Series > plot.ts(sample1, main="") Conclusions 19/42
  • 20. Six Classes 36 30 34 32 20 301 1 30 R and Time 10 Series Data 28 26 Time Series 0 Decomposi- 45 24 45 tion 40 Time Series 35 Forecasting 101 401 35 Time Series 25 Clustering 30 15 Time Series 35 25 Classification 45 R Functions & 30 Packages for 40 25 201 501 Time Series 35 20 Conclusions 15 30 10 25 0 10 20 30 40 50 60 0 10 20 30 40 50 60 Time Time 20/42
  • 21. Hierarchical Clustering with Euclidean distance R and Time Series Data > # sample n cases from every class Time Series > n <- 10 Decomposi- tion > s <- sample(1:100, n) Time Series > idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s) Forecasting Time Series > sample2 <- sc[idx,] Clustering > observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), Time Series Classification + rep(4,n), rep(5,n), rep(6,n)) R Functions & > # hierarchical clustering with Euclidean distance Packages for > hc <- hclust(dist(sample2), method="ave") Time Series Conclusions > plot(hc, labels=observedLabels, main="") 21/42
  • 22. Hierarchical Clustering with Euclidean distance R and Time 140 120 Series Data Time Series Decomposi- 100 tion Time Series Forecasting Height 80 Time Series Clustering Time Series 60 Classification 22 R Functions & Packages for 5 6 Time Series 6 40 2 2 2 2 Conclusions 4 5 2 2 66 5 5 66 6 64 55 3 5 2 2 55 44 33 1 44 4 33 4 4 3 3 20 11 5 3 3 4 11 6 6 11 3 1 1 1 22/42
  • 23. Hierarchical Clustering with Euclidean distance R and Time > # cut tree to get 8 clusters Series Data > memb <- cutree(hc, k=8) Time Series Decomposi- > table(observedLabels, memb) tion Time Series memb Forecasting observedLabels 1 2 3 4 5 6 7 8 Time Series Clustering 1 10 0 0 0 0 0 0 0 Time Series 2 0 3 1 1 3 2 0 0 Classification R Functions & 3 0 0 0 0 0 0 10 0 Packages for Time Series 4 0 0 0 0 0 0 0 10 Conclusions 5 0 0 0 0 0 0 10 0 6 0 0 0 0 0 0 0 10 23/42
  • 24. Hierarchical Clustering with DTW Distance > myDist <- dist(sample2, method="DTW") R and Time > hc <- hclust(myDist, method="average") Series Data > plot(hc, labels=observedLabels, main="") Time Series Decomposi- > # cut tree to get 8 clusters tion > memb <- cutree(hc, k=8) Time Series Forecasting > table(observedLabels, memb) Time Series Clustering memb Time Series observedLabels 1 2 3 4 5 6 7 8 Classification 1 10 0 0 0 0 0 0 0 R Functions & Packages for 2 0 4 3 2 1 0 0 0 Time Series 3 0 0 0 0 0 6 4 0 Conclusions 4 0 0 0 0 0 0 0 10 5 0 0 0 0 0 0 10 0 6 0 0 0 0 0 0 0 10 24/42
  • 25. Hierarchical Clustering with DTW Distance R and Time 1000 800 Series Data Time Series Decomposi- tion 600 Time Series Forecasting Height Time Series Clustering 400 Time Series Classification R Functions & Packages for Time Series 200 Conclusions 2 2 2 22 2 22 2 2 4 6 66 33 35 35 46 33 6 55 3 4 6 6 35 1 1 4 6 6 6 55 5 33 4 4 4 4 1 5 5 1 1 1 1 4 4 1 1 1 0 25/42
  • 26. Outline 1 R and Time Series Data R and Time Series Data Time Series 2 Time Series Decomposition Decomposi- tion Time Series 3 Time Series Forecasting Forecasting Time Series Clustering 4 Time Series Clustering Time Series Classification 5 Time Series Classification R Functions & Packages for Time Series 6 R Functions & Packages for Time Series Conclusions 7 Conclusions 26/42
  • 27. Time Series Classification Time Series Classification To build a classification model based on labelled time R and Time Series Data series Time Series Decomposi- and then use the model to predict the lable of unlabelled tion time series Time Series Forecasting Feature Extraction Time Series Clustering Singular Value Decomposition (SVD) Time Series Discrete Fourier Transform (DFT) Classification R Functions & Discrete Wavelet Transform (DWT) Packages for Time Series Piecewise Aggregate Approximation (PAA) Conclusions Perpetually Important Points (PIP) Piecewise Linear Representation Symbolic Representation 27/42
  • 28. Decision Tree (ctree) R and Time Series Data ctree from package party Time Series Decomposi- > classId <- c(rep("1",100), rep("2",100), tion + rep("3",100), rep("4",100), Time Series Forecasting + rep("5",100), rep("6",100)) Time Series > newSc <- data.frame(cbind(classId, sc)) Clustering > library(party) Time Series Classification > ct <- ctree(classId ~ ., data=newSc, R Functions & + controls = ctree_control(minsplit=20, Packages for Time Series + minbucket=5, maxdepth=5)) Conclusions 28/42
  • 29. Decision Tree > pClassId <- predict(ct) > table(classId, pClassId) R and Time Series Data pClassId Time Series Decomposi- classId 1 2 3 4 5 6 tion 1 100 0 0 0 0 0 Time Series Forecasting 2 1 97 2 0 0 0 Time Series Clustering 3 0 0 99 0 1 0 Time Series 4 0 0 0 100 0 0 Classification 5 4 0 8 0 88 0 R Functions & Packages for 6 0 3 0 90 0 7 Time Series Conclusions > # accuracy > (sum(classId==pClassId)) / nrow(sc) [1] 0.8183333 29/42
  • 30. DWT (Discrete Wavelet Transform) Wavelet transform provides a multi-resolution representation using wavelets. R and Time Series Data Haar Wavelet Transform – the simplest DWT Time Series Decomposi- http://dmr.ath.cx/gfx/haar/ tion Time Series Forecasting Time Series Clustering Time Series Classification R Functions & Packages for Time Series Conclusions DFT (Discrete Fourier Transform): another popular feature extraction technique 30/42
  • 31. DWT (Discrete Wavelet Transform) R and Time > # extract DWT (with Haar filter) coefficients Series Data Time Series > library(wavelets) Decomposi- tion > wtData <- NULL Time Series > for (i in 1:nrow(sc)) { Forecasting + a <- t(sc[i,]) Time Series Clustering + wt <- dwt(a, filter="haar", boundary="periodic") Time Series + wtData <- rbind(wtData, Classification + unlist(c(wt@W, wt@V[[wt@level]]))) R Functions & Packages for + } Time Series > wtData <- as.data.frame(wtData) Conclusions > wtSc <- data.frame(cbind(classId, wtData)) 31/42
  • 32. Decision Tree with DWT > ct <- ctree(classId ~ ., data=wtSc, controls = + ctree_control(minsplit=20, R and Time Series Data + minbucket=5, maxdepth=5)) Time Series > pClassId <- predict(ct) Decomposi- tion > table(classId, pClassId) Time Series pClassId Forecasting Time Series classId 1 2 3 4 5 6 Clustering 1 98 2 0 0 0 0 Time Series Classification 2 1 99 0 0 0 0 R Functions & 3 0 0 81 0 19 0 Packages for 4 0 0 0 74 0 26 Time Series Conclusions 5 0 0 16 0 84 0 6 0 0 0 3 0 97 > (sum(classId==pClassId)) / nrow(wtSc) [1] 0.8883333 32/42
  • 33. > plot(ct, ip_args=list(pval=FALSE), ep_args=list(digits=0)) 1 V57 ≤ 117 > 117 2 9 W43 V57 ≤ −4 > −4 ≤ 140 > 140 3 6 11 W5 W31 V57 ≤ 178 > 178 12 17 W22 W31 ≤ −8 > −8 ≤ −6 > −6 ≤ −6 > −6 ≤ −15 > −15 14 19 W31 W43 ≤ −13 > −13 ≤3 >3 Node 4 (n = 68) Node 5 (n = 6) Node 7 (n = 9) Node 8 (n = 86) Node 10 (n = 31) Node 13 (n = 80) Node 15 (n = 9) Node 16 (n = 99) Node 18 (n = 12) Node 20 (n = 103) Node 21 (n = 97) 1 1 1 1 1 1 1 1 1 1 1 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0 0 0 0 0 0 0 0 0 0 0 123456 123456 123456 123456 123456 123456 123456 123456 123456 123456 123456
  • 34. k-NN Classification find the k nearest neighbours of a new instance R and Time label it by majority voting Series Data needs an efficient indexing structure for large datasets Time Series Decomposi- > k <- 20 tion Time Series > newTS <- sc[501,] + runif(100)*15 Forecasting > distances <- dist(newTS, sc, method="DTW") Time Series Clustering > s <- sort(as.vector(distances), index.return=TRUE) Time Series > # class IDs of k nearest neighbours Classification > table(classId[s$ix[1:k]]) R Functions & Packages for Time Series 4 6 Conclusions 3 17 Results of Majority Voting Label of newTS ← class 6 34/42
  • 35. Outline 1 R and Time Series Data R and Time Series Data Time Series 2 Time Series Decomposition Decomposi- tion Time Series 3 Time Series Forecasting Forecasting Time Series Clustering 4 Time Series Clustering Time Series Classification 5 Time Series Classification R Functions & Packages for Time Series 6 R Functions & Packages for Time Series Conclusions 7 Conclusions 35/42
  • 36. Functions - Construction, Plot & Smoothing R and Time Construction Series Data Time Series ts() create time-series objects (stats) Decomposi- tion Plot Time Series Forecasting plot.ts() plot time-series objects (stats) Time Series Clustering Smoothing & Filtering Time Series Classification smoothts() time series smoothing (ast) R Functions & Packages for Time Series sfilter() remove seasonal fluctuation using moving Conclusions average (ast) 36/42
  • 37. Functions - Decomposition & Forecasting Decomposition decomp() time series decomposition by square-root filter R and Time Series Data (timsac) Time Series decompose() classical seasonal decomposition by moving Decomposi- tion averages (stats) Time Series stl() seasonal decomposition of time series by loess Forecasting (stats) Time Series Clustering tsr() time series decomposition (ast) Time Series ardec() time series autoregressive decomposition Classification (ArDec) R Functions & Packages for Time Series Forecasting Conclusions arima() fit an ARIMA model to a univariate time series (stats) predict.Arima() forecast from models fitted by arima (stats) 37/42
  • 38. Packages Packages timsac time series analysis and control program R and Time Series Data ast time series analysis Time Series Decomposi- ArDec time series autoregressive-based decomposition tion ares a toolbox for time series analyses using generalized Time Series Forecasting additive models Time Series Clustering dse tools for multivariate, linear, time-invariant, time Time Series series models Classification forecast displaying and analysing univariate time series R Functions & Packages for forecasts Time Series dtw Dynamic Time Warping – find optimal alignment Conclusions between two time series wavelets wavelet filters, wavelet transforms and multiresolution analyses 38/42
  • 39. Online Resources An R Time Series Tutorial http://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.htm R and Time Time Series Analysis with R Series Data http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_ Time Series Decomposi- intro.pdf tion Using R (with applications in Time Series Analysis) Time Series Forecasting http://people.bath.ac.uk/masgs/time%20series/TimeSeriesR2004.pdf Time Series CRAN Task View: Time Series Analysis Clustering http://cran.r-project.org/web/views/TimeSeries.html Time Series Classification R Functions for Time Series Analysis R Functions & http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf Packages for Time Series R Reference Card for Data Mining; Conclusions R and Data Mining: Examples and Case Studies http://www.rdatamining.com/ Time Series Analysis for Business Forecasting http://home.ubalt.edu/ntsbarsh/stat-data/Forecast.htm 39/42
  • 40. Outline 1 R and Time Series Data R and Time Series Data Time Series 2 Time Series Decomposition Decomposi- tion Time Series 3 Time Series Forecasting Forecasting Time Series Clustering 4 Time Series Clustering Time Series Classification 5 Time Series Classification R Functions & Packages for Time Series 6 R Functions & Packages for Time Series Conclusions 7 Conclusions 40/42
  • 41. Conclusions Time series decomposition and forecasting: many R functions and packages available R and Time Series Data Time series classification and clustering: no R functions or Time Series Decomposi- packages specially for this purpose; have to work it out by tion yourself Time Series Forecasting Time series classification: extract and build features, and Time Series then apply existing classification techniques, such as SVM, Clustering Time Series k-NN, neural networks, regression and decision trees Classification Time series clustering: work out your own R Functions & Packages for distance/similarity metrics, and then use existing clustering Time Series techniques, such as k-means and hierarchical clustering Conclusions Techniques specially for classifying/clustering time series data: a lot of research publications, but no R implementations (as far as I know) 41/42
  • 42. The End Email: yanchangzhao@gmail.com RDataMining: http://www.rdatamining.com R and Time Series Data Twitter: http://twitter.com/rdatamining Time Series Decomposi- Group on Linkedin: http://group.rdatamining.com tion Group on Google: http://group2.rdatamining.com Time Series Forecasting Time Series Clustering Time Series Classification R Functions & Packages for Time Series Conclusions 42/42