API documentation#

All API points are available through the timeseries object, as in:

from tshistory.api import timeseries
tsa = timeseries('http://refinery.datascience.com')
ts = tsa.get('banana-spot-price')

The available methods are the same, and behave the same, whether you use an http uri or a direct postgres uri.

The method descriptions below appear to belong to the mainsource object, which talks directly to postgres. This is an unimportant implementation detail.

Base Series Operations#

This constitutes the fundamental API to deal with series on an individual basis.

class mainsource(uri, namespace='tsh', tshclass=<class 'tshistory.tsio.timeseries'>, othersources=None)

API façade for the main source (talks directly to the storage)

The api documentation is carried by this object. The http client provides exactly the same methods.

Parameters:

uri (str)
namespace (str)
tshclass (type)

update(name, updatets, author, metadata=None, insertion_date=None, keepnans=False, **kw)

Update a series named by <name> with the input pandas series.

This creates a new version of the series. Only the _changes_ between the last version and the provided series are part of the new version.

A series made of the changed points is returned. If there was no change, an empty series is returned and no new version is created.

New points are added and changed points are changed. Points with a NaN value are dropped if keepnans is False (the default), or _erased_ from the series if it is True.

The author is mandatory. The metadata dictionary lets you associate arbitrary metadata with the new series revision.

It is possible to force an insertion_date, which must be later than the previous insertion_date.

>>> import pandas as pd
>>> from tshistory.api import timeseries
>>>
>>> tsa = timeseries('postgres://me:password@localhost/mydb')
>>>
>>> series = pd.Series([1, 2, 3],
...                    pd.date_range(start=pd.Timestamp(2017, 1, 1),
...                                  freq='D', periods=3))
>>> # db insertion
>>> tsa.update('my_series', series, 'babar@pythonian.fr')
...
2017-01-01    1.0
2017-01-02    2.0
2017-01-03    3.0
Freq: D, Name: my_series, dtype: float64
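
The diff semantics described above can be modelled without a database. The following is a minimal sketch, not tshistory's implementation: plain dicts stand in for pandas series, and apply_update is a hypothetical helper that returns both the new version and the series of changed points.

```python
import math

def apply_update(stored, incoming, keepnans=False):
    """Model of an update: return (new_version, diff).

    `stored` and `incoming` are plain dicts mapping a value date to a float;
    they stand in for pandas series in this sketch.
    """
    new_version = dict(stored)
    diff = {}
    for date, value in incoming.items():
        if math.isnan(value):
            if keepnans and date in new_version:
                del new_version[date]   # NaN erases an existing point
                diff[date] = value
            continue                    # otherwise the NaN point is dropped
        if stored.get(date) != value:
            new_version[date] = value   # new or changed point
            diff[date] = value
    return new_version, diff


stored = {'2017-01-01': 1.0, '2017-01-02': 2.0, '2017-01-03': 3.0}
incoming = {'2017-01-02': 2.0, '2017-01-03': 7.0, '2017-01-04': 8.0}
new_version, diff = apply_update(stored, incoming)
print(diff)   # only the changed and the new points
```

An empty diff corresponds to the case where no new version is created.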

Parameters:

name (str)
updatets (Series)
author (str)
metadata (dict | None)
insertion_date (datetime | None)
keepnans (bool | None)

Return type:

Series | None

replace(name, replacets, author, metadata=None, insertion_date=None, **kw)

Replace a series named by <name> with the input pandas series.

This creates a new version of the series. The series is completely replaced with the provided values.

The author is mandatory. The metadata dictionary lets you associate arbitrary metadata with the new series revision.

It is possible to force an insertion_date, which must be later than the previous insertion_date.

Parameters:

name (str)
replacets (Series)
author (str)
metadata (dict | None)
insertion_date (datetime | None)

Return type:

Series | None

exists(name)

Checks the existence of a series with a given name.

Parameters: name (str)
Return type: bool

source(name)

Provide the source name of a series.

When coming from the main source, it returns ‘local’.

Parameters: name (str)
Return type: str | None

get(name, revision_date=None, from_value_date=None, to_value_date=None, inferred_freq=False, _keep_nans=False, **kw)

Get a series by name.

By default one gets the latest version.

By specifying revision_date one can get the closest version matching the given date.

The from_value_date and to_value_date parameters restrict the result to a narrower date range (by default all points are provided).

If the series does not exist, None is returned.

>>> tsa.get('my_series')
...
2017-01-01    1.0
2017-01-02    2.0
2017-01-03    3.0
Name: my_series, dtype: float64
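
How a revision_date selects a version can be sketched as follows. This assumes that "closest version" means the most recent version inserted at or before revision_date; version_at is a hypothetical helper and the insertion dates are made up for illustration (in practice they would come from insertion_dates).

```python
from bisect import bisect_right
from datetime import datetime

def version_at(insertion_dates, revision_date):
    """Return the latest insertion date <= revision_date, or None if there is none.

    `insertion_dates` must be sorted in ascending order.
    """
    idx = bisect_right(insertion_dates, revision_date)
    return insertion_dates[idx - 1] if idx else None


# hypothetical insertion dates of a series
idates = [datetime(2018, 9, 26, 17, 10), datetime(2018, 9, 26, 17, 12)]
print(version_at(idates, datetime(2018, 9, 26, 17, 11)))  # first version
print(version_at(idates, datetime(2018, 9, 27)))          # latest version
print(version_at(idates, datetime(2018, 1, 1)))           # None: no version yet
```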

Parameters:

name (str)
revision_date (datetime | None)
from_value_date (datetime | None)
to_value_date (datetime | None)
inferred_freq (bool)
_keep_nans (bool)

Return type:

Series | None

insertion_dates(name, from_insertion_date=None, to_insertion_date=None, from_value_date=None, to_value_date=None, **kw)

Get the list of all insertion dates (as pandas timestamps).

Parameters:

name (str)
from_insertion_date (datetime | None)
to_insertion_date (datetime | None)
from_value_date (datetime | None)
to_value_date (datetime | None)

history(name, from_insertion_date=None, to_insertion_date=None, from_value_date=None, to_value_date=None, diffmode=False, _keep_nans=False, **kw)

Get all versions of a series in the form of a dict from insertion dates to series version.

It is possible to restrict the versions range by specifying from_insertion_date and to_insertion_date.

It is possible to restrict the values range by specifying from_value_date and to_value_date.

If diffmode is set to True, we don’t get the full series values for each insertion date but only the difference series between two consecutive insertion dates (with new points, updated points and deleted points). This is typically more costly to compute but can be much more compact, and it encodes the same information as with diffmode set to False.

>>> history = tsa.history('my_series')
...
>>>
>>> for idate, series in history.items(): # it's a dict
...     print('insertion date:', idate)
...     print(series)
...
insertion date: 2018-09-26 17:10:36.988920+02:00
2017-01-01    1.0
2017-01-02    2.0
2017-01-03    3.0
Name: my_series, dtype: float64
insertion date: 2018-09-26 17:12:54.508252+02:00
2017-01-01    1.0
2017-01-02    2.0
2017-01-03    7.0
2017-01-04    8.0
2017-01-05    9.0
Name: my_series, dtype: float64
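
To illustrate why both modes carry the same information, here is a minimal sketch that converts a full history into per-revision diffs and back. It uses plain dicts instead of pandas series, and marking deleted points with None is a convention of this sketch only; to_diffs and from_diffs are hypothetical helpers, not part of the API.

```python
def to_diffs(history):
    """Turn {insertion_date: full_version} into {insertion_date: diff}."""
    diffs, previous = {}, {}
    for idate in sorted(history):
        current = history[idate]
        # new and updated points
        diff = {d: v for d, v in current.items() if previous.get(d) != v}
        # deleted points, marked with None in this sketch
        diff.update({d: None for d in previous if d not in current})
        diffs[idate] = diff
        previous = current
    return diffs

def from_diffs(diffs):
    """Rebuild the full versions from the successive diffs."""
    history, current = {}, {}
    for idate in sorted(diffs):
        current = dict(current)
        for d, v in diffs[idate].items():
            if v is None:
                current.pop(d, None)
            else:
                current[d] = v
        history[idate] = current
    return history


# same values as the example above, with shortened insertion dates
full = {
    '2018-09-26 17:10': {'2017-01-01': 1.0, '2017-01-02': 2.0, '2017-01-03': 3.0},
    '2018-09-26 17:12': {'2017-01-01': 1.0, '2017-01-02': 2.0, '2017-01-03': 7.0,
                         '2017-01-04': 8.0, '2017-01-05': 9.0},
}
diffs = to_diffs(full)
print(diffs['2018-09-26 17:12'])
```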

Parameters:

name (str)
from_insertion_date (datetime | None)
to_insertion_date (datetime | None)
from_value_date (datetime | None)
to_value_date (datetime | None)
diffmode (bool)
_keep_nans (bool)

Return type:

Dict[datetime, Series] | None

staircase(name, delta, from_value_date=None, to_value_date=None)

Compute a series in which each point comes from the most recent version whose insertion date is at least delta earlier than the point's value date.

This kind of query typically makes sense for forecast series where the relationship between insertion date and value date is sound.
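
The selection rule can be sketched with plain Python. This is an illustration under assumptions, not the library's implementation: the history is a dict from insertion date to a dict of value dates, the forecast values are made up, and staircase_sketch is a hypothetical name.

```python
from datetime import datetime, timedelta

def staircase_sketch(history, delta):
    """history: {insertion_date: {value_date: value}} -> {value_date: value}.

    For each value date, keep the value from the most recent version
    inserted at least `delta` before that value date.
    """
    result = {}
    for idate in sorted(history):            # oldest first, newer versions win
        for vdate, value in history[idate].items():
            if vdate - idate >= delta:
                result[vdate] = value
    return dict(sorted(result.items()))


# two hypothetical forecast runs, each covering the next two days
history = {
    datetime(2020, 1, 1): {datetime(2020, 1, 2): 10.0, datetime(2020, 1, 3): 11.0},
    datetime(2020, 1, 2): {datetime(2020, 1, 3): 20.0, datetime(2020, 1, 4): 21.0},
}
day_ahead = staircase_sketch(history, timedelta(days=1))
two_days_ahead = staircase_sketch(history, timedelta(days=2))
print(day_ahead)
print(two_days_ahead)
```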

Parameters:

name (str)
delta (timedelta)
from_value_date (datetime | None)
to_value_date (datetime | None)

Return type:

Series | None

block_staircase(name, from_value_date=None, to_value_date=None, revision_freq=None, revision_time=None, revision_tz='UTC', maturity_offset=None, maturity_time=None)

Staircase a series by block.

This is a more sophisticated and controllable version of the staircase method.

Computes a series rebuilt from successive blocks of history, each linked to a distinct revision date. The revision dates are taken at regular time intervals determined by revision_freq, revision_time and revision_tz. The time lag between revision dates and value dates of each block is determined by maturity_offset and maturity_time.

name: str, unique identifier of the series

from_value_date: pandas.Timestamp from which values are retrieved

to_value_date: pandas.Timestamp to which values are retrieved

revision_freq: dict giving revision frequency, of which keys must be taken from [‘years’, ‘months’, ‘weeks’, ‘bdays’, ‘days’, ‘hours’, ‘minutes’, ‘seconds’] and values as integers. Default is daily frequency, i.e. {‘days’: 1}

revision_time: dict giving revision time, of which keys should be taken from [‘year’, ‘month’, ‘day’, ‘weekday’, ‘hour’, ‘minute’, ‘second’] and values must be integers. It is only used for revision date initialisation. The next revision dates are then obtained by successively adding revision_freq. Default is {‘hour’: 0}

revision_tz: str giving the time zone in which revision date and time are expressed. Default is ‘UTC’

maturity_offset: dict giving the time lag between each revision date and the start time of the related block values. Its keys must be taken from [‘years’, ‘months’, ‘weeks’, ‘bdays’, ‘days’, ‘hours’, ‘minutes’, ‘seconds’] and values as integers. Default is {}, i.e. the revision date is the block start date

maturity_time: dict fixing the start time of each block, of which keys should be taken from [‘year’, ‘month’, ‘day’, ‘weekday’, ‘hour’, ‘minute’, ‘second’] and values must be integers. The start date of each block is thus obtained by adding maturity_offset to the revision date and then applying maturity_time. Default is {}, i.e. the block start date is just the revision date shifted by maturity_offset
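
How the four dicts combine into revision dates and block start dates can be sketched with the standard library. This is a simplified illustration, not the library's code: block_starts is a hypothetical helper, time zones are ignored, and only the keys understood by datetime.timedelta (‘weeks’, ‘days’, ‘hours’, ‘minutes’, ‘seconds’) and datetime.replace (‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’) are supported.

```python
from datetime import datetime, timedelta

def block_starts(first_day, n_blocks, revision_freq, revision_time,
                 maturity_offset, maturity_time):
    """Return [(revision_date, block_start), ...] for n_blocks blocks."""
    # revision_time only initialises the first revision date
    revision = first_day.replace(**revision_time)
    blocks = []
    for _ in range(n_blocks):
        # shift by maturity_offset, then apply maturity_time
        start = (revision + timedelta(**maturity_offset)).replace(**maturity_time)
        blocks.append((revision, start))
        # next revision date: add revision_freq
        revision += timedelta(**revision_freq)
    return blocks


# daily revisions taken at 9:00, each block starting at midnight the next day
blocks = block_starts(
    datetime(2020, 1, 1), 2,
    revision_freq={'days': 1}, revision_time={'hour': 9},
    maturity_offset={'days': 1}, maturity_time={'hour': 0},
)
for revision, start in blocks:
    print(revision, '->', start)
```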

Parameters:

from_value_date (datetime | None)
to_value_date (datetime | None)
revision_freq (Dict[str, int] | None)
revision_time (Dict[str, int] | None)
revision_tz (str)
maturity_offset (Dict[str, int] | None)
maturity_time (Dict[str, int] | None)

interval(name)

Return a pandas interval object which provides the earliest and latest value dates of a series.

Parameters: name (str)
Return type: Interval

metadata(name, all=None)

Return a series metadata dictionary.

Parameters:

name (str)
all (bool)

Return type:

Dict[str, Any] | None

internal_metadata(name)

Return a series internal metadata dictionary.

Parameters: name (str)
Return type: Dict[str, Any]

replace_metadata(name, metadata)

Replace a series metadata with a dictionary from strings to anything json-serializable.

Parameters:

name (str)
metadata (dict)

Return type:

None

update_metadata(name, metadata)

Update a series metadata with a dictionary from strings to anything json-serializable.

Parameters:

name (str)
metadata (dict)

Return type:

None
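
The difference between the two metadata calls can be modelled with plain dicts. This sketch assumes that update_metadata merges the given keys into the existing metadata while replace_metadata discards the previous content; the two functions below are stand-ins written for illustration, not the API methods themselves, and the metadata values are made up.

```python
def replace_metadata_model(current, metadata):
    """Model of replace_metadata: the previous content is discarded."""
    return dict(metadata)

def update_metadata_model(current, metadata):
    """Model of update_metadata (assumed to merge): existing keys are kept."""
    merged = dict(current)
    merged.update(metadata)
    return merged


current = {'unit': 'mwh', 'country': 'fr'}
print(update_metadata_model(current, {'country': 'be'}))
print(replace_metadata_model(current, {'country': 'be'}))
```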

type(name)

Return the type of a series, for instance ‘primary’ or ‘formula’.

Parameters: name (str)
Return type: str

log(name, limit=None, fromdate=None, todate=None)

Return a list of revisions for a given series, in reverse chronological order, with filters.

Revisions are dicts with the following keys:

rev: revision id (int)
author: author name
date: timestamp of the revision
meta: the revision metadata

Parameters:

name (str)
limit (int | None)
fromdate (Timestamp | None)
todate (Timestamp | None)

Return type:

List[Dict[str, Any]]

rename(currname, newname, propagate=True)

Rename a series.

The target name must be available.

Parameters:

currname (str)
newname (str)
propagate (bool)

Return type:

None

delete(name)

Delete a series.

This is an irreversible operation.

Parameters: name (str)

strip(name, insertion_date)

Remove revisions after a specific insertion date.

This is an irreversible operation.

Parameters:

name (str)
insertion_date (datetime)

Return type:

None

Operations on series sets#

These methods make it possible to enumerate all known series and to find them using sophisticated search criteria (by name, metadata key/value, source).

catalog(allsources=True)

Produces a catalog of all series in the form of a mapping from source to a list of (name, kind) pairs.

By default it provides the series from all sources.

If allsources is False, only the main source is listed.

Parameters: allsources (bool)
Return type: Dict[Tuple[str, str], List[Tuple[str, str]]]

find(query, limit=None, meta=False, _source='local')

Return a list of series descriptors matching the query.

A series descriptor is a string-like object (exhibiting the series name) with additional attributes. If meta has been set to True, the .meta (for normal metadata) and .imeta (for internal metadata) fields will be populated (non None). Lastly, the .source and .kind attributes provide the series source and kind.

Here is an example:

tsa.find(
   '(by.and '
   '  (by.tzaware)'
   '  (by.name "power capacity") '
   '  (by.metakey "plant")'
   '  (by.not (by.or '
   '    (by.metaitem "plant_type" "oil")'
   '    (by.metaitem "plant_type" "coal")))'
   '  (by.metaitem "unit" "mwh")'
   '  (by.metaitem "country" "fr"))'
)

This builds a query for timezone aware series about French power plants (in mwh) which are not of the coal or oil fuel type.

The following filters can be used from the search module:

by.tzaware: no parameter, yields time zone aware series names
by.name <str>: takes a space separated string of words, yields series names containing the substrings (in order)
by.metakey <str>: takes a string, strictly matches all series having this metadata key
by.metaitem <str> <str-or-number>: takes a string (key) and a string (or numerical) value and yields all series strictly matching this metadata item
by.and: takes a variable number of filters as above to combine them
by.or: takes a variable number of filters as above to combine them
by.not: produce the negation of a filter

Also inequalities on metadata values can be used:

<, <=, >, >=, =: take a string key, a value (str or num)

As in (<= “max_capacity” 900)
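
Since queries are plain strings, they can also be assembled programmatically. The helper below is a hypothetical convenience written for this example, not part of tshistory: it quotes plain string arguments and leaves numbers and nested expressions untouched (a string value that itself starts with a parenthesis would not be quoted, which is a limitation of this sketch).

```python
def expr(operator, *args):
    """Build one parenthesised query expression."""
    parts = [operator]
    for arg in args:
        if isinstance(arg, str) and not arg.startswith('('):
            parts.append(f'"{arg}"')   # quote plain strings
        else:
            parts.append(str(arg))     # numbers and nested expressions
    return '(' + ' '.join(parts) + ')'


query = expr(
    'by.and',
    expr('by.tzaware'),
    expr('by.name', 'power capacity'),
    expr('by.not', expr('by.metaitem', 'plant_type', 'coal')),
    expr('<=', 'max_capacity', 900),
)
print(query)
```

The resulting string can then be passed as the query argument of find.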

Parameters:

query (str)
limit (int | None)
meta (int | None)
_source (str | None)

Return type:

List[ts]

register_basket(name, query)

The search query has the same specification as the .find(…, query) api call.

Parameters:

name (str)
query (str)

Return type:

None

basket(name)

Returns the list of series descriptors associated with a basket.

Parameters: name (str)
Return type: List[str]

basket_definition(name)

Returns the query string associated with a basket.

Parameters: name (str)
Return type: str

list_baskets()

Return the list of available basket names.

Return type: List[str]

delete_basket(name)

Delete a basket.

Supervision#

The supervision feature exposes two API points for stored series.

edited(name, revision_date=None, from_value_date=None, to_value_date=None, inferred_freq=False, _keep_nans=False)

Returns the base series and a second boolean series whose entries indicate if an override has been made or not.

Parameters:

name (str)
revision_date (Timestamp | None)
from_value_date (Timestamp | None)
to_value_date (Timestamp | None)
inferred_freq (bool | None)
_keep_nans (bool)

Return type:

Tuple[Series, Series]

supervision_status(name)

Returns the supervision status of a series. Possible values are unsupervised, handcrafted and supervised.

Parameters: name (str)
Return type: str

Formulas#

The formula system adds computed series to the system; most of the API points seen previously work with them. The exceptions are update and replace, since formulas are by construction a read-only feature. In the future, these methods may be implemented with override semantics.

register_formula(name, formula, reject_unknown=True)

Define a series as a named formula.

tsa.register_formula('sales.eu', '(add (series "sales.fr") (series "sales.be"))')

Parameters:

name (str)
formula (str)
reject_unknown (bool)

Return type:

None

eval_formula(formula, revision_date=None, from_value_date=None, to_value_date=None)

Execute a formula on the spot.

tsa.eval_formula('(add (series "sales.fr") (series "sales.be"))')

Parameters:

formula (str)
revision_date (Timestamp)
from_value_date (Timestamp)
to_value_date (Timestamp)

Return type:

Series

formula(name, display=True, expanded=False, remote=True, level=-1)

Get the formula associated with a name.

tsa.formula('sales.eu')
...
'(add (series "sales.fr") (series "sales.be"))'

Expanding means replacing all series expressions that are formulas with the formula contents.

Expansion can be all-or-nothing, through the expanded parameter, or limited to a given level, at which the expansion process stops.

The maximum level can be obtained through the formula_depth api call.
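
Expansion and depth can be illustrated on a toy formula registry. This is a sketch under assumptions, not the library's implementation: formulas are held in a plain dict, only (series "...") sub-expressions are recognised through a regular expression, cycles are not handled, and the names sales.world, sales.all, sales.us and sales.misc are made up for the example.

```python
import re

SERIES = re.compile(r'\(series "([^"]+)"\)')

def expand(formula, formulas, level=-1):
    """Replace (series "x") by the formula of x, `level` levels deep (-1: fully)."""
    if level == 0:
        return formula
    def substitute(match):
        name = match.group(1)
        if name not in formulas:
            return match.group(0)      # stored series: kept as is
        return expand(formulas[name], formulas, level - 1)
    return SERIES.sub(substitute, formula)

def depth(name, formulas):
    """Maximum number of formula sub-expressions traversed to reach the bottom."""
    names = SERIES.findall(formulas[name])
    return max((1 + depth(n, formulas) for n in names if n in formulas), default=0)


formulas = {
    'sales.eu': '(add (series "sales.fr") (series "sales.be"))',
    'sales.world': '(add (series "sales.eu") (series "sales.us"))',
    'sales.all': '(add (series "sales.world") (series "sales.misc"))',
}
print(expand(formulas['sales.all'], formulas))           # full expansion
print(expand(formulas['sales.all'], formulas, level=1))  # one level only
print(depth('sales.all', formulas))
```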

Parameters:

name (str)
display (bool)
expanded (bool)
remote (bool)
level (int)

Return type:

str | None

formula_depth(name)

Compute the depth of a formula.

The depth is the maximum number of formula series sub expressions that have to be traversed to get to the bottom.

Parameters: name (str)

formula_components(name, expanded=False)

Compute a mapping from series name (defined as formulas) to the names of the component series.

If expanded is true, it will expand the formula before computing the components. Hence only “ground” series (stored or autotrophic formulas) will show up in the leaves.

>>> tsa.formula_components('my-series')
{'my-series': ['component-a', 'component-b']}

>>> tsa.formula_components('my-series-2', expanded=True)
{'my-series-2': [{'sub-component-1': ['component-a', 'component-b']}, 'component-b']}

Parameters:

name (str)
expanded (bool)

Return type:

Dict[str, list] | None

Excel#

The API points listed here are mostly for use by the Excel client.

Formula cache#

The formula system makes it possible to grow very complicated computed series (by building them bottom-up), which are by default computed on the fly. The downside can be sluggish performance, as complex formulas read hundreds of base series and perform computations on them. Hence it can be useful to put them into a “cache”.

new_cache_policy(name, initial_revdate, look_before, look_after, revdate_rule, schedule_rule)

Create a cache policy.

Parameters:

name (str)
initial_revdate (str)
look_before (str)
look_after (str)
revdate_rule (str)
schedule_rule (str)

Return type:

None

edit_cache_policy(name, initial_revdate, look_before, look_after, revdate_rule, schedule_rule)

Modify an existing cache policy (by name).

Parameters:

name (str)
initial_revdate (str)
look_before (str)
look_after (str)
revdate_rule (str)
schedule_rule (str)

Return type:

None

delete_cache_policy(name)

Delete a cache policy (by name).

Parameters: name (str)
Return type: None

set_cache_policy(policyname, seriesnames)

Associate series with a cache policy.

Parameters:

policyname (str)
seriesnames (List[str])

Return type:

None

unset_cache_policy(seriesnames)

Dissociate series from their cache policy.

Parameters: seriesnames (List[str])
Return type: None

cache_free_series(allsources=True)

List the series that are available for association with a cache policy.

Parameters: allsources (bool)

cache_policies()

Return a list of cache policy names.

cache_policy_series(policyname)

Return the list of series associated with a cache policy.

Parameters: policyname (str)

has_cache(seriesname)

Predicate to check whether a formula series has a cache.

Parameters: seriesname (str)

delete_cache(seriesname)

Purge the cache of a formula.

Parameters: seriesname (str)