Timeseries Store Usage
Starting with a fresh database
You need a postgresql database. You can create one like this:
createdb mydb
Then, initialize the tshistory tables, like this:
tsh init-db postgresql://me:password@localhost/mydb
From here, you’re ready to go!
Creating a series
Here’s a simple example:
>>> import pandas as pd
>>> from tshistory.api import timeseries
>>>
>>> tsa = timeseries('postgresql://me:password@localhost/mydb')
>>>
>>> series = pd.Series([1, 2, 3],
... pd.date_range(start=pd.Timestamp(2017, 1, 1),
... freq='D', periods=3))
# db insertion
>>> tsa.update('my_series', series, 'babar@pythonian.fr')
...
2017-01-01 1.0
2017-01-02 2.0
2017-01-03 3.0
Freq: D, Name: my_series, dtype: float64
# note how our integers got turned into floats
# (there are no provisions to handle integer series as of today)
# retrieval
>>> tsa.get('my_series')
...
2017-01-01 1.0
2017-01-02 2.0
2017-01-03 3.0
Name: my_series, dtype: float64
Note that we generally adopt the convention of naming the time series API object tsa.
Updating a series
This is good. Now, let’s insert more:
>>> series = pd.Series([2, 7, 8, 9],
... pd.date_range(start=pd.Timestamp(2017, 1, 2),
... freq='D', periods=4))
# db insertion
>>> tsa.update('my_series', series, 'babar@pythonian.fr')
...
2017-01-03 7.0
2017-01-04 8.0
2017-01-05 9.0
Name: my_series, dtype: float64
# you get back the *new information* you put inside
# and this is why the `2` doesn't appear (it was already put
# there in the first step)
# db retrieval
>>> tsa.get('my_series')
...
2017-01-01 1.0
2017-01-02 2.0
2017-01-03 7.0
2017-01-04 8.0
2017-01-05 9.0
Name: my_series, dtype: float64
It is important to note that the third value was replaced, and the last two
values were simply appended. As noted, the point at 2017-01-02 carried no
new information, so it was ignored.
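The "only new information is kept" rule can be sketched in plain pandas. This is a minimal illustration, not tshistory's actual implementation: compute_diff is a hypothetical helper, and it assumes the incoming series contains no NaN values (deletions are not handled).

```python
import pandas as pd


def compute_diff(current: pd.Series, incoming: pd.Series) -> pd.Series:
    """Return the points of `incoming` that are new or differ from `current`.

    Hypothetical helper for illustration only; not part of tshistory.
    Assumes `incoming` holds no NaN values.
    """
    # Align the stored values on the incoming index; unknown dates become NaN.
    aligned = current.reindex(incoming.index)
    # A point is new information if it was absent or if its value changed.
    changed = aligned.isna() | (aligned != incoming)
    return incoming[changed]


current = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range(start=pd.Timestamp(2017, 1, 1), freq='D', periods=3),
)
incoming = pd.Series(
    [2.0, 7.0, 8.0, 9.0],
    index=pd.date_range(start=pd.Timestamp(2017, 1, 2), freq='D', periods=4),
)

diff = compute_diff(current, incoming)
print(diff)
```

With the two series from the examples above, the value at 2017-01-02 is dropped because it is unchanged, while 2017-01-03 (replaced) and the two later dates (appended) are kept.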
Retrieving history
We can access the whole history (or parts of it) in one call:
>>> history = tsa.history('my_series')
...
>>>
>>> for idate, series in history.items(): # it's a dict
... print('insertion date:', idate)
... print(series)
...
insertion date: 2018-09-26 17:10:36.988920+02:00
2017-01-01 1.0
2017-01-02 2.0
2017-01-03 3.0
Name: my_series, dtype: float64
insertion date: 2018-09-26 17:12:54.508252+02:00
2017-01-01 1.0
2017-01-02 2.0
2017-01-03 7.0
2017-01-04 8.0
2017-01-05 9.0
Name: my_series, dtype: float64
Note how this shows the full series state for each insertion date. Also, the insertion dates are timezone-aware.
Specific versions of a series can be retrieved individually using the get method as follows:
>>> tsa.get('my_series', revision_date=pd.Timestamp('2018-09-26 17:11+02:00'))
...
2017-01-01 1.0
2017-01-02 2.0
2017-01-03 3.0
Name: my_series, dtype: float64
>>>
>>> tsa.get('my_series', revision_date=pd.Timestamp('2018-09-26 17:14+02:00'))
...
2017-01-01 1.0
2017-01-02 2.0
2017-01-03 7.0
2017-01-04 8.0
2017-01-05 9.0
Name: my_series, dtype: float64
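One way to read the revision_date parameter is as an "as of" lookup over the history dict shown earlier. The sketch below illustrates that reading in plain pandas. get_as_of is a hypothetical helper, and both the "at or before" comparison and the None return for dates preceding the first insertion are assumptions, not documented tshistory behaviour.

```python
import pandas as pd


def get_as_of(history: dict, revision_date: pd.Timestamp):
    """Return the latest series state inserted at or before `revision_date`.

    Hypothetical helper for illustration only; `history` maps timezone-aware
    insertion dates to full series states.
    """
    eligible = [idate for idate in history if idate <= revision_date]
    if not eligible:
        # Nothing had been inserted yet at that date.
        return None
    return history[max(eligible)]


index = pd.date_range(start=pd.Timestamp(2017, 1, 1), freq='D', periods=5)
history = {
    pd.Timestamp('2018-09-26 17:10:36.988920+02:00'):
        pd.Series([1.0, 2.0, 3.0], index=index[:3], name='my_series'),
    pd.Timestamp('2018-09-26 17:12:54.508252+02:00'):
        pd.Series([1.0, 2.0, 7.0, 8.0, 9.0], index=index, name='my_series'),
}

first = get_as_of(history, pd.Timestamp('2018-09-26 17:11+02:00'))
second = get_as_of(history, pd.Timestamp('2018-09-26 17:14+02:00'))
before = get_as_of(history, pd.Timestamp('2018-09-26 17:00+02:00'))
```

Because the insertion dates are timezone-aware, the revision date passed in must carry a timezone too; pandas refuses to compare naive and aware timestamps.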
It is possible to retrieve only the differences between successive insertions:
>>> diffs = tsa.history('my_series', diffmode=True)
...
>>> for idate, series in diffs.items():
... print('insertion date:', idate)
... print(series)
...
insertion date: 2018-09-26 17:10:36.988920+02:00
2017-01-01 1.0
2017-01-02 2.0
2017-01-03 3.0
Name: my_series, dtype: float64
insertion date: 2018-09-26 17:12:54.508252+02:00
2017-01-03 7.0
2017-01-04 8.0
2017-01-05 9.0
Name: my_series, dtype: float64
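The two history views carry the same information: replaying the diffs in insertion order rebuilds the full states. A minimal sketch in plain pandas, assuming the diffs only add or replace points (no deletions); states_from_diffs is a hypothetical helper, not a tshistory function.

```python
import pandas as pd


def states_from_diffs(diffs: dict) -> dict:
    """Rebuild the full series state at each insertion date from the diffs.

    Hypothetical helper for illustration only; assumes diffs never delete points.
    """
    states = {}
    current = None
    for idate in sorted(diffs):
        diff = diffs[idate]
        # Values from the diff take precedence; untouched points carry over.
        current = diff if current is None else diff.combine_first(current)
        states[idate] = current
    return states


index = pd.date_range(start=pd.Timestamp(2017, 1, 1), freq='D', periods=5)
t1 = pd.Timestamp('2018-09-26 17:10:36.988920+02:00')
t2 = pd.Timestamp('2018-09-26 17:12:54.508252+02:00')
diffs = {
    t1: pd.Series([1.0, 2.0, 3.0], index=index[:3], name='my_series'),
    t2: pd.Series([7.0, 8.0, 9.0], index=index[2:], name='my_series'),
}

states = states_from_diffs(diffs)
for idate, series in states.items():
    print('insertion date:', idate)
    print(series)
```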
You can also attach metadata to a series and read it back:
>>> tsa.update_metadata('my_series', {'foo': 42})
>>> tsa.metadata('my_series')
{'foo': 42}