Source Code, Datasets, and Comparative Experimental Results

Uncertain Time-Series Similarity:
Return to the Basics

Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, and Themis Palpanas.

In recent years there has been a considerable increase in the availability of continuous sensor measurements in a wide range of application domains, such as Location-Based Services (LBS), medical monitoring systems, manufacturing plants and engineering facilities (to ensure efficiency, product quality, and safety), hydrologic and geologic observing systems, pollution management, and others. Due to the inherent imprecision of sensor observations, many investigations have recently turned to querying, mining, and storing uncertain data. Uncertainty can also be due to data aggregation, privacy-preserving transforms, and error-prone mining algorithms.
In this study, we survey the techniques that have been proposed specifically for modeling and processing uncertain time series, an important model for temporal data. We provide an analytical evaluation of the alternatives that have been proposed in the literature, highlighting the advantages and disadvantages of each approach, and further compare these alternatives with two additional techniques that were carefully studied before. We conduct an extensive experimental evaluation with 17 real datasets, and discuss some surprising results, which suggest that a fruitful research direction is to take into account the temporal correlations in the time series. Based on our evaluations, we also provide guidelines useful for practitioners in the field.

Journal Publication

Source Code

You may freely use this code for research purposes, provided that you properly acknowledge the authors with the following reference:

Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, Themis Palpanas: Uncertain Time-Series Similarity: Return to the Basics. PVLDB 5(11): 1662-1673 (2012).

   author = {Michele Dallachiesa and
   Besmira Nushi and
   Katsiaryna Mirylenka and
   Themis Palpanas},
   title = {Uncertain Time-Series Similarity: Return to the Basics},
   journal = {PVLDB},
   volume = {5},
   number = {11},
   year = {2012},
   pages = {1662-1673},
   ee = {},
   bibsource = {DBLP,}

Real Datasets

The datasets were generated by perturbing real data with uniformly, normally, and exponentially distributed error. The exact parameters of these distributions used in each run are stated in the paper, in the discussion of each experiment.
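
The perturbation step described above can be illustrated with a minimal sketch. It is not the code distributed on this page: the function name `perturb`, the choice of zero-mean additive error, the parameterization by a single standard deviation `sigma`, and the synthetic sine-wave input are all assumptions made for illustration. The parameters actually used in each experiment are those given in the paper.

```python
import math
import random


def perturb(series, distribution, sigma, rng):
    """Return a copy of `series` with additive, zero-mean error.

    Hypothetical helper: `sigma` is the standard deviation of the error,
    and `rng` is a random.Random instance (for reproducibility).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if distribution == "uniform":
        # Uniform on [-a, a] has standard deviation a / sqrt(3).
        half_width = sigma * math.sqrt(3.0)
        draw = lambda: rng.uniform(-half_width, half_width)
    elif distribution == "normal":
        draw = lambda: rng.gauss(0.0, sigma)
    elif distribution == "exponential":
        # Exponential with rate 1/sigma has mean sigma and standard
        # deviation sigma; subtracting sigma centers it at zero.
        draw = lambda: rng.expovariate(1.0 / sigma) - sigma
    else:
        raise ValueError("unknown distribution: %r" % (distribution,))
    return [x + draw() for x in series]


# Usage: perturb a synthetic series (a stand-in for a real dataset)
# with each of the three error distributions.
rng = random.Random(42)
clean = [math.sin(0.1 * t) for t in range(100)]
uncertain = {
    name: perturb(clean, name, 0.5, rng)
    for name in ("uniform", "normal", "exponential")
}
```

Centering the exponential error at zero is a design choice of this sketch, made so that the three distributions differ only in shape and not in bias.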

These datasets come from the UCR Time Series Classification/Clustering collection, available at this link: