Formulas (computed series)
Purpose
This tshistory component provides a formula language (a domain-specific language for time series) to build computed time series.
Formulas are read-only series (you can't update or replace them).
They also have versions and a history, built timestamp-wise from the union of all constituent time stamps and value-wise by applying the formula.
Because of this, the staircase operator is available on formulas.
Some staircase operations can have a fast implementation if the formula obeys commutativity rules.
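To make the history rule concrete, here is a small pure-Python sketch, not the tshistory implementation. Each constituent history is modeled as a hypothetical dict mapping insertion timestamps to versions (dicts of value date to value), and the formula is a plain callable. The formula history gets one version per insertion timestamp found in any constituent, computed from the constituent versions known at that time.

```python
from datetime import datetime
from typing import Callable, Dict, List

# hypothetical model: a version maps value dates to floats,
# a history maps insertion timestamps to versions
Version = Dict[datetime, float]
History = Dict[datetime, Version]


def asof(history: History, idate: datetime) -> Version:
    """Return the latest version inserted at or before idate (empty if none)."""
    candidates = [d for d in history if d <= idate]
    return history[max(candidates)] if candidates else {}


def formula_history(histories: List[History],
                    formula: Callable[..., Version]) -> History:
    # timestamp-wise: the union of all constituent insertion timestamps
    idates = sorted(set().union(*(h.keys() for h in histories)))
    # value-wise: apply the formula to each constituent as of that timestamp
    return {
        idate: formula(*(asof(h, idate) for h in histories))
        for idate in idates
    }


def add(*versions: Version) -> Version:
    # keep only value dates present in every input (no fill policy)
    common = set.intersection(*(set(v) for v in versions))
    return {d: sum(v[d] for v in versions) for d in sorted(common)}
```

With two constituents inserted on different days, the formula history has a version for each of those days.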
Operators
General Syntax
Formulas are expressed in a lisp-like syntax using operators, positional (mandatory) parameters and keyword (optional) parameters.
The general form is:
(<operator> <param1> ... <paramN> #:<keyword1> <value1> ... #:<keywordN> <valueN>)
Here are a couple of examples:
(add (series "wallonie") (series "bruxelles") (series "flandres"))
Here we see the two fundamental add and series operators at work.
This would form a new synthetic series out of three base series (which can be either raw series or formulas themselves).
(round (series "foo") #:decimals 2)
This illustrates the use of keywords.
Some notes:
- operator names can contain dashes or arbitrary characters
- literal values can be: 3 (integer), 5.2 (float), "hello" (string), #t or #f (true or false).
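The following pure-Python sketch, which is not the tshistory parser, reads the general form above into a nested list. The operator comes first, followed by positional parameters and then #:keyword / value pairs. Operator names and keywords are kept as plain strings here, whereas the real system represents them as symbols. String escapes are not handled.

```python
import re
from typing import Any, List

# a double-quoted string, a parenthesis, or any other run of characters
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()]+')
INTEGER = re.compile(r'[-+]?\d+')
FLOAT = re.compile(r'[-+]?(\d+\.\d*|\.\d+)([eE][-+]?\d+)?')


def atom(token: str) -> Any:
    """Convert one token into a Python literal, or keep it as a name."""
    if token.startswith('"'):
        return token[1:-1]  # string literal (no escape handling)
    if token in ('#t', '#f'):
        return token == '#t'
    if INTEGER.fullmatch(token):
        return int(token)
    if FLOAT.fullmatch(token):
        return float(token)
    return token  # operator names and #:keywords stay strings


def parse(text: str) -> Any:
    """Parse one formula into nested lists: [operator, *params]."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read() -> Any:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != '(':
            return atom(token)
        expr: List[Any] = []
        while tokens[pos] != ')':
            expr.append(read())
        pos += 1  # skip the closing parenthesis
        return expr

    return read()
```

For instance, parse('(round (series "foo") #:decimals 2)') gives ['round', ['series', 'foo'], '#:decimals', 2].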
Registering new operators
Registering new operators is a fundamental need: operators are fixed Python functions exposed through a lispy syntax, and applications need a variety of fancy operators.
Declaring a new operator
One just needs to decorate a Python function with the func decorator:
import pandas as pd

from tshistory_formula.registry import func

@func('identity')
def identity(series: pd.Series) -> pd.Series:
    return series
The operator will be known to the outer world by the name given to @func, not by the Python function name (which can be arbitrary).
You must provide correct type annotations: the formula language is statically typed and the typechecker will refuse to work with an untyped operator.
This is enough to get a working transformation operator. However, operators built to construct series, rather than just transform pre-existing ones, are more complicated.
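To illustrate why annotations matter, here is a toy registry. It is hypothetical and far simpler than tshistory_formula.registry. The decorator records the function under its formula name and rejects any parameter or return value lacking an annotation, which stands in for the real typechecker refusing untyped operators. A plain list replaces pd.Series to keep the sketch dependency-free.

```python
import inspect
from typing import Callable, Dict

# hypothetical operator table: formula name -> python function
OPERATORS: Dict[str, Callable] = {}


def func(name: str) -> Callable[[Callable], Callable]:
    """Register a function under a formula operator name (toy version)."""
    def decorator(function: Callable) -> Callable:
        signature = inspect.signature(function)
        missing = [
            pname for pname, param in signature.parameters.items()
            if param.annotation is inspect.Parameter.empty
        ]
        if missing or signature.return_annotation is inspect.Signature.empty:
            raise TypeError(f'operator {name!r} is not fully annotated: {missing}')
        OPERATORS[name] = function
        return function
    return decorator


@func('identity')
def identity(series: list) -> list:
    return series
```

The formula name ('identity') is the registry key, independent of the Python function name.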
Autotrophic series operator
We start with an example: a proxy operator that fetches a series on the fly from an existing time series silo, to be served as if it came from your local installation.
We would use it like this: (proxy "a-name" #:parameter 42.3)
As we can see, it can look like the series operator, though its signature might be more complicated (this will depend entirely on the way series are enumerated in the silo).
Hence proxy must be understood as an alternative to series itself. Here is how the initial part would look:
from tshistory_formula.registry import func, finder, metadata, history, insertion_dates

@func('proxy', auto=True)
def proxy(__interpreter__,
          __from_value_date__,
          __to_value_date__,
          __revision_date__,
          name: str,
          parameter: float = 0):
    # we assume there is some python client available
    # for the third-party time series silo (silo_client is hypothetical)
    return silo_client.get(
        name,
        parameter=parameter,
        fromdate=__from_value_date__,
        todate=__to_value_date__,
        revdate=__revision_date__
    )
This is a possible implementation of the API get protocol.
The dunder parameters (__interpreter__, __from_value_date__, __to_value_date__ and __revision_date__) are a mandatory part of the signature. The other parameters (positional or keyword) are at your convenience and will be exposed to the formula users.
We must also provide a helper for the formula system to detect the presence of this particular kind of operator in a formula (because it is not like other mere transformation operators).
Let’s have it:
@finder('proxy')
def proxy_finder(cn, tsh, tree):
    return {
        tree[1]: tree
    }
Let us explain the parameters:
- cn is a reference to the current database connection
- tsh is a reference to the internal API implementation object (and you will need the cn object to use it)
- tree is a representation of the formula restricted to the proxy operator use
When implementing a proxy-like operator, one generally won’t need the first two items. But here is an example of what the tree would look like:
['proxy, 'a-name', '#:parameter, 77]
Yes, the half-quoted 'proxy and '#:parameter are not typos. These are, respectively, a:
- symbol (similar to a variable name in Python)
- keyword (similar to a Python keyword argument)
In the finder return dictionary, only the key of the dictionary is important: it should be globally unique and will be used to provide an (internal) alias for the provided series name. For instance, in our example, if parameter has an impact on the returned series identity, it should be part of the key. Like this:
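@finder('proxy')
def proxy_finder(cn, tsh, tree):
    # fold the name and any keyword/value pairs into the key,
    # so that distinct parameter values yield distinct aliases
    return {
        '-'.join(str(item) for item in tree[1:]): tree
    }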
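The key-building rule can be checked in isolation. The sketch below uses the tree shape shown earlier as a plain Python list. The helper name proxy_key is hypothetical. Everything after the operator name is folded into the key, so two uses of proxy that differ only by parameter get distinct aliases, and an omitted parameter does not raise an IndexError.

```python
from typing import Any, Dict, List


def proxy_key(tree: List[Any]) -> str:
    # tree looks like ['proxy', 'a-name', '#:parameter', 77];
    # keep everything after the operator name so the parameter
    # value (when present) becomes part of the alias
    return '-'.join(str(item) for item in tree[1:])


def proxy_finder(cn, tsh, tree: List[Any]) -> Dict[str, List[Any]]:
    # cn and tsh are not needed by a proxy-like finder
    return {proxy_key(tree): tree}
```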
We also have to map the metadata, insertion_dates and the history API methods.
@metadata('proxy')
def proxy_metadata(cn, tsh, tree):
    return {
        'proxy:' + '-'.join(str(item) for item in tree[1:]): {
            'tzaware': True,
            'source': 'silo-proxy',
            'index_type': 'datetime64[ns, UTC]',
            'value_type': 'float64',
            'index_dtype': '|M8[ns]',
            'value_dtype': '<f8'
        }
    }
@history('proxy')
def proxy_history(__interpreter__,
                  from_value_date=None,
                  to_value_date=None,
                  from_insertion_date=None,
                  to_insertion_date=None):
    # write the implementation here :)
    raise NotImplementedError

@insertion_dates('proxy')
def proxy_idates(__interpreter__,
                 from_value_date=None,
                 to_value_date=None,
                 from_insertion_date=None,
                 to_insertion_date=None):
    # write the implementation here :)
    raise NotImplementedError
Pre-defined operators
- abs(series)
Return the absolute value element-wise.
Example: (abs (series "series-with-negative-values"))
- Parameters:
series (Series)
- Return type:
Series
- asof(revision_date, series)
Fetch the series in the asof scope with the specified revision date.
Example: (asof (shifted (now) #:days -1) (series "i-have-many-versions"))
- Parameters:
revision_date (Timestamp)
series (Series)
- Return type:
Series
- block_staircase(__interpreter__, __from_value_date__, __to_value_date__, name, revision_freq_hours=None, revision_freq_days=None, revision_time_hours=None, revision_time_days=None, revision_tz='UTC', maturity_offset_hours=None, maturity_offset_days=None, maturity_time_hours=None, maturity_time_days=None)
Computes a series rebuilt from successive blocks of history, each linked to a distinct revision date. The revision dates are taken at regular time intervals determined by revision_freq, revision_time and revision_tz. The time lag between revision dates and value dates of each block is determined by maturity_offset and maturity_time.
Example:
(block-staircase "forecast-series" #:revision_freq_days 1 #:revision_time_hours 11 #:maturity_offset_days 1)
- Parameters:
name (seriesname)
revision_freq_hours (int | None)
revision_freq_days (int | None)
revision_time_hours (int | None)
revision_time_days (int | None)
revision_tz (str)
maturity_offset_hours (int | None)
maturity_offset_days (int | None)
maturity_time_hours (int | None)
maturity_time_days (int | None)
- Return type:
Series
- byand(*queries)
Yields a query filter applying a logical AND to its input query filters.
Example: (add (findseries (by.and (by.name "capacity") (by.metakey "plant"))))
- Parameters:
queries (query)
- Return type:
query
- bybasket(__interpreter__, basketname)
Yields a query filter operating on a basket name.
Example: (add (findseries (by.basket "fr.powerplants")))
This will yield the series matching the basket definition.
- Parameters:
basketname (str)
- Return type:
query
- bymetaitems(key, value)
Yields a query filter operating on metadata items.
Example: (add (findseries (by.metaitem "plant_status" "running")))
This will filter the series having “running” as a value for the “plant_status” key in their metadata.
- Parameters:
key (str)
value (str | Number)
- Return type:
query
- bymetakey(keyquery)
Yields a query filter operating on metadata key.
Example: (add (findseries (by.metakey "plant_status")))
This will filter the series having “plant_status” in their metadata.
- Parameters:
keyquery (str)
- Return type:
query
- byname(namequery)
Yields a query filter operating on series names.
Example: (add (findseries (by.name "fr capacity")))
This will filter the series whose names contain, in order, the “fr” and “capacity” fragments.
- Parameters:
namequery (str)
- Return type:
query
- bynot(query)
Yields a query filter negating its input query filter.
Example: (add (findseries (by.not (by.name "capacity"))))
This will filter the series NOT having “capacity” in their name.
- Parameters:
query (query)
- Return type:
query
- byor(*queries)
Yields a query filter applying a logical OR to its input query filters.
Example: (add (findseries (by.or (by.name "capacity") (by.metakey "plant"))))
- Parameters:
queries (query)
- Return type:
query
- byvalue(key, operator, value)
Yields a query filter comparing metadata values.
Example: (add (findseries (by.value "weight" "<=" 42)))
This will filter the series having a “weight” metadata entry and keep those whose value is <= 42.
The available operators are <, <=, >, >=.
- Parameters:
key (str)
operator (str)
value (str | Number)
- Return type:
query
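To show how the query filters compose, here is a pure-Python sketch that models each filter as a predicate over a (name, metadata) pair. The helper names (by_name, by_metakey, by_value, by_and, by_or, by_not) are hypothetical stand-ins for the by.* operators, not their implementation. Name matching requires the fragments to appear in order, as described for by.name.

```python
import operator
import re
from typing import Callable, Dict, Union

Meta = Dict[str, Union[str, float]]
Query = Callable[[str, Meta], bool]

COMPARE = {'<': operator.lt, '<=': operator.le, '>': operator.gt, '>=': operator.ge}


def by_name(namequery: str) -> Query:
    # every fragment must appear, in the given order
    pattern = '.*'.join(re.escape(frag) for frag in namequery.split())
    return lambda name, meta: re.search(pattern, name) is not None


def by_metakey(key: str) -> Query:
    return lambda name, meta: key in meta


def by_value(key: str, op: str, value: float) -> Query:
    compare = COMPARE[op]
    return lambda name, meta: key in meta and compare(meta[key], value)


def by_and(*queries: Query) -> Query:
    return lambda name, meta: all(q(name, meta) for q in queries)


def by_or(*queries: Query) -> Query:
    return lambda name, meta: any(q(name, meta) for q in queries)


def by_not(query: Query) -> Query:
    return lambda name, meta: not query(name, meta)
```

Applying a composed predicate to a catalog of (name, metadata) pairs gives the list of series that findseries would return.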
- constant(__interpreter__, __from_value_date__, __to_value_date__, __revision_date__, value, fromdate, todate, freq, revdate)
Produces a constant-valued time series over a pre-defined horizon, with a given granularity and for a given revision date.
Example: (constant 42.5 (date "1900-1-1") (date "2039-12-31") "D" (date "1900-1-1"))
This will yield a daily series of value 42.5 from 1900-01-01 to 2039-12-31, with a revision date of 1900-01-01.
- Parameters:
value (Number)
fromdate (Timestamp)
todate (Timestamp)
freq (str)
revdate (Timestamp)
- Return type:
Series
- cumsum(series)
Returns the cumulative sum of a series.
Example: (cumsum (series "sum-me"))
- Parameters:
series (Series)
- Return type:
Series
- different_to(series, num_or_series, false_value=0, true_value=1)
Returns a series with the length of the first argument if the second one is a scalar, or with the length of the index intersection in the case of two series. The condition is series != num_or_series.
Series values depend on the condition:
* the values are set to true_value (default 1) where the condition is verified.
* the values are set to false_value (default 0) where the condition is NOT verified.
- Parameters:
series (Series)
num_or_series (Number | Series)
false_value (Number | None)
true_value (Number | None)
- Return type:
Series
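The length and value rules shared by the comparison operators can be sketched in pure Python. Series are modeled as hypothetical dicts from timestamps to values, and compare_series is a hypothetical helper. Passing operator.ne reproduces the different-to condition, and the other comparison operators only swap the comparison function.

```python
import operator
from typing import Callable, Dict, Union

Series = Dict[str, float]


def compare_series(series: Series,
                   num_or_series: Union[float, Series],
                   condition: Callable[[float, float], bool],
                   false_value: float = 0,
                   true_value: float = 1) -> Series:
    if isinstance(num_or_series, dict):
        # two series: only the index intersection is kept
        return {
            k: true_value if condition(v, num_or_series[k]) else false_value
            for k, v in series.items() if k in num_or_series
        }
    # scalar: the output has the length of the first argument
    return {
        k: true_value if condition(v, num_or_series) else false_value
        for k, v in series.items()
    }
```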
- end_of_month(date)
Produces a timezone-aware timestamp equal to the last day of the month of the given date.
Example: (end-of-month (date "1973-05-20 09:00"))
- Parameters:
date (Timestamp)
- Return type:
Timestamp
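The month-boundary arithmetic behind end-of-month and start-of-month can be sketched with the standard library. This illustration rests on assumptions. The result is placed at midnight, and naive inputs are treated as UTC so that the output is timezone-aware. The real operators may handle the time of day differently.

```python
import calendar
from datetime import datetime, timezone


def end_of_month(date: datetime) -> datetime:
    # assumption: naive inputs are treated as UTC
    tz = date.tzinfo or timezone.utc
    last_day = calendar.monthrange(date.year, date.month)[1]
    # assumption: the result is at midnight of the last day
    return datetime(date.year, date.month, last_day, tzinfo=tz)


def start_of_month(date: datetime) -> datetime:
    tz = date.tzinfo or timezone.utc
    return datetime(date.year, date.month, 1, tzinfo=tz)
```

calendar.monthrange accounts for leap years, so February 2024 ends on the 29th.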
- equal_to(series, num_or_series, false_value=0, true_value=1)
Returns a series with the length of the first argument if the second one is a scalar, or with the length of the index intersection in the case of two series. The condition is series == num_or_series.
Series values depend on the condition:
* the values are set to true_value (default 1) where the condition is verified.
* the values are set to false_value (default 0) where the condition is NOT verified.
- Parameters:
series (Series)
num_or_series (Number | Series)
false_value (Number | None)
true_value (Number | None)
- Return type:
Series
- findseries(__interpreter__, __from_value_date__, __to_value_date__, __revision_date__, q, naive=False, fill=None)
Yields a series list out of a metadata/name/… filtering query.
Example: (add (findseries (by.value "weight" "<" 43)))
The findseries operator accepts two keywords:
naive (defaults to #f) to filter on tz-aware or naive series (we don't want to mix those there; this is an important difference with the find API endpoint).
fill to specify a filling policy to avoid NaNs when the series are added with others; accepted values are "ffill" (forward-fill), "bfill" (backward-fill) or any floating-point value.
- Parameters:
q (query)
naive (bool)
fill (str | Number | None)
- Return type:
List[Series]
- get_holidays(__interpreter__, __from_value_date__, __to_value_date__, country, naive=False)
Computes a series whose values are either 0 or 1 to signal holidays.
Takes a string for the two-letter country code and an optional naive keyword (values #t or #f, #f by default) to force a naive series output.
Example: (holidays "fr")
- Parameters:
country (str)
naive (bool | None)
- Return type:
Series
- inferior_or_equal_to(series, num_or_series, false_value=0, true_value=1)
Returns a series with the length of the first argument if the second one is a scalar, or with the length of the index intersection in the case of two series. The condition is series <= num_or_series.
Series values depend on the condition:
* the values are set to true_value (default 1) where the condition is verified.
* the values are set to false_value (default 0) where the condition is NOT verified.
- Parameters:
series (Series)
num_or_series (Number | Series)
false_value (Number | None)
true_value (Number | None)
- Return type:
Series
- inferior_to(series, num_or_series, false_value=0, true_value=1)
Returns a series with the length of the first argument if the second one is a scalar, or with the length of the index intersection in the case of two series. The condition is series < num_or_series.
Series values depend on the condition:
* the values are set to true_value (default 1) where the condition is verified.
* the values are set to false_value (default 0) where the condition is NOT verified.
- Parameters:
series (Series)
num_or_series (Number | Series)
false_value (Number | None)
true_value (Number | None)
- Return type:
Series
- integration(__interpreter__, __from_value_date__, __to_value_date__, __revision_date__, stock_name, flow_name, fill=False)
Integrate a given flow series to the last known value of a stock series.
Example: (integration "stock-series-name" "flow-series-name")
- Parameters:
stock_name (str)
flow_name (str)
fill (bool | None)
- Return type:
Series
- naive(series, tzone)
Allow demoting a series from a tz-aware index to a tz-naive index.
One must provide a target timezone.
Example: (naive (series "tz-aware-series-from-poland") "Europe/Warsaw")
- Parameters:
series (Series)
tzone (str)
- Return type:
Series
- now(__interpreter__, naive=False, tz=None)
Produces a timezone-aware timestamp as of now.
The naive keyword forces production of a naive timestamp. The tz keyword allows specifying an alternate time zone (if unspecified and not naive). The tz and naive keywords are mutually exclusive.
Example: (now)
- Parameters:
naive (bool | None)
tz (str | None)
- Return type:
Timestamp
- options(series, fill=None, limit=None, weight=None)
The options operator takes a series and three keywords to modify the behaviour of series.
fill to specify a filling policy to avoid NaNs when the series are added with others; accepted values are "ffill" (forward-fill), "bfill" (backward-fill) or any floating-point value.
limit: if fill is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If fill is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
weight to provide a weight (float) value to be used by other operators such as row-mean.
The fill, limit and weight options are put on the series object for later use.
- Parameters:
series (Series)
fill (str | Number | None)
limit (int | None)
weight (Number | None)
- Return type:
Series
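To make the fill and limit semantics concrete, here is a pure-Python sketch of forward filling with a limit over a list of values, where None stands for NaN. It mirrors the pandas-style behaviour described above for the case where fill is "ffill". The function name ffill is hypothetical.

```python
from typing import List, Optional


def ffill(values: List[Optional[float]],
          limit: Optional[int] = None) -> List[Optional[float]]:
    if limit is not None and limit <= 0:
        raise ValueError('limit must be greater than 0')
    result: List[Optional[float]] = []
    last: Optional[float] = None
    gap = 0  # consecutive missing values since the last real value
    for value in values:
        if value is not None:
            last, gap = value, 0
            result.append(value)
            continue
        gap += 1
        # fill only the first `limit` slots of each gap
        if last is not None and (limit is None or gap <= limit):
            result.append(last)
        else:
            result.append(None)
    return result
```

With limit=2, a gap of three missing points is only partially filled. A leading gap stays empty, since there is no earlier value to carry forward.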
- resample(__interpreter__, __revision_date__, __from_value_date__, __to_value_date__, series, freq, method='mean')
Resamples its input series to the frequency freq, aggregating with method (as described in the pandas documentation).
Example: (resample (series "hourly") "D")
- Parameters:
series (Series)
freq (str)
method (str)
- Return type:
Series
- rolling(series, window, method='mean')
Applies a calculation method (mean by default) over a rolling window (as described in the pandas documentation).
Example: (rolling (series "foo") 30 #:method "median")
- Parameters:
series (Series)
window (int)
method (str)
- Return type:
Series
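A rolling computation can be sketched with the standard library statistics module. This illustration is not the operator's code. It follows the pandas default in which the first window - 1 positions have no value (min_periods equal to the window size), and it only supports the mean and median methods.

```python
import statistics
from typing import List, Optional

METHODS = {'mean': statistics.mean, 'median': statistics.median}


def rolling(values: List[float], window: int,
            method: str = 'mean') -> List[Optional[float]]:
    aggregate = METHODS[method]
    return [
        # None until a full window of observations is available
        aggregate(values[i - window + 1:i + 1]) if i >= window - 1 else None
        for i in range(len(values))
    ]
```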
- round(series, decimals=0)
Round element-wise considering the number of decimals specified (0 by default).
Example: (round (series "series-with-decimals") #:decimals 2)
- Parameters:
series (Series)
decimals (Number | None)
- Return type:
Series
- row_max(*serieslist, skipna=True)
Computes the row-wise maximum of its input series.
Example: (row-max (series "station0") (series "station1") (series "station2"))
Example: (row-max (series "station0") (series "station1") #:skipna #f)
The skipna keyword (which is true by default) controls the behaviour with nan values.
- Parameters:
serieslist (Series)
skipna (bool | None)
- Return type:
Series
- row_mean(*serieslist, skipna=True)
This operator computes the row-wise mean of its input series using the series weight option if present. The missing points are handled as if the whole series were absent.
Example: (row-mean (series "station0") (series "station1" #:weight 2) (series "station2"))
Weights are provided as a keyword to series. No weight is interpreted as 1.
- Parameters:
serieslist (Series)
skipna (bool | None)
- Return type:
Series
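The weighting and missing-point rules of row-mean can be sketched in pure Python. Each input is modeled as a hypothetical (values, weight) pair, where values maps timestamps to floats. At each timestamp, only the series providing a value contribute, each with its weight (1 when unspecified). This corresponds to the default skipna behaviour.

```python
from typing import Dict, List, Optional, Tuple

Series = Dict[str, float]


def row_mean(inputs: List[Tuple[Series, Optional[float]]]) -> Series:
    stamps = sorted(set().union(*(values for values, _ in inputs)))
    result: Series = {}
    for stamp in stamps:
        # a series missing this timestamp is treated as absent
        present = [
            (values[stamp], 1.0 if weight is None else weight)
            for values, weight in inputs if stamp in values
        ]
        total_weight = sum(w for _, w in present)
        result[stamp] = sum(v * w for v, w in present) / total_weight
    return result
```

At t1 below, all three stations contribute: (1 + 4 * 2 + 1) / 4 = 2.5. At t2, station1 is absent and the mean is (1 + 3) / 2 = 2.0.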
- row_min(*serieslist, skipna=True)
Computes the row-wise minimum of its input series.
Example: (row-min (series "station0") (series "station1") (series "station2"))
Example: (row-min (series "station0") (series "station1") #:skipna #f)
The skipna keyword (which is true by default) controls the behaviour with nan values.
- Parameters:
serieslist (Series)
skipna (bool | None)
- Return type:
Series
- row_std(*serieslist, skipna=True)
Computes the row-wise standard deviation of its input series.
Example: (std (series "station0") (series "station1") (series "station2"))
Example: (std (series "station0") (series "station1") #:skipna #f)
The skipna keyword (which is true by default) controls the behaviour with nan values.
- Parameters:
serieslist (Series)
skipna (bool | None)
- Return type:
Series
- scalar_add(num, num_or_series)
Add a constant quantity to a series.
Example: (+ 42 (series "i-feel-undervalued"))
- Parameters:
num (Number)
num_or_series (Number | Series)
- Return type:
Number | Series
- scalar_div(num_or_series, num)
Perform a scalar division between numbers or a series and a scalar.
Example: (/ (series "div-me") (/ 3 2))
- Parameters:
num_or_series (Number | Series)
num (Number)
- Return type:
Number | Series
- scalar_pow(series, num)
Raises a series to the power num, element-wise.
Example: (** (series "positive-things") 2)
- Parameters:
series (Series)
num (Number)
- Return type:
Series
- scalar_prod(num, num_or_series)
Multiplies a series (or a number) by a scalar.
Example: (* -1 (series "positive-things"))
- Parameters:
num (Number)
num_or_series (Number | Series)
- Return type:
Number | Series
- series(__interpreter__, __from_value_date__, __to_value_date__, __revision_date__, name, fill=None, limit=None, weight=None)
Returns a time series by name.
The series operator accepts several keywords:
fill to specify a filling policy to avoid NaNs when the series are added with others; accepted values are "ffill" (forward-fill), "bfill" (backward-fill) or any floating-point value.
limit: if fill is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If fill is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
weight to provide a weight (float) value to be used by other operators such as row-mean.
Example: (add (series "a" #:fill 0) (series "b"))
In the example, we make sure that series a, if shorter than series b, will get zeroes instead of NaNs where b provides values.
- Parameters:
name (seriesname)
fill (str | Number | None)
limit (int | None)
weight (Number | None)
- Return type:
Series
- series_add(*serieslist)
Element-wise sum of two or more series. Takes a variable number of series as input.
Example: (add (series "wallonie") (series "bruxelles") (series "flandres"))
To specify the behaviour of the add operation in the face of missing data, the series can be built with the fill keyword. This option is only really applied when several series are combined. By default, if an input series has missing values for a given time stamp, the resulting series has no value for this timestamp (unless a fill rule is provided).
- Parameters:
serieslist (Series)
- Return type:
Series
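The default behaviour of add and the effect of a numeric fill can be sketched in pure Python. Series are modeled as hypothetical dicts, and the fill value travels alongside each series. Only numeric fills are handled here, not "ffill" or "bfill".

```python
from typing import Dict, List, Optional, Tuple

Series = Dict[str, float]


def series_add(inputs: List[Tuple[Series, Optional[float]]]) -> Series:
    stamps = sorted(set().union(*(values for values, _ in inputs)))
    result: Series = {}
    for stamp in stamps:
        total = 0.0
        for values, fill in inputs:
            if stamp in values:
                total += values[stamp]
            elif fill is not None:
                total += fill
            else:
                break  # a missing value without fill drops the timestamp
        else:
            result[stamp] = total
    return result
```

This reproduces the series example: without a fill, t2 disappears because a has no value there, while #:fill 0 on a keeps it.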
- series_clip(series, min=None, max=None, replacemin=False, replacemax=False)
Sets an upper/lower threshold for a series. Takes a series as positional parameter and accepts four optional keywords: min and max, which must be numbers, and replacemin and replacemax, which control whether out-of-bounds data are filled with min and max respectively.
Example: (clip (series "must-be-positive") #:min 0 #:replacemin #t)
- Parameters:
series (Series)
min (Number | None)
max (Number | None)
replacemin (bool | None)
replacemax (bool | None)
- Return type:
Series
- series_div(s1, s2)
Element-wise division of two series.
Example: (div (series "$-to-€") (series "€-to-£"))
- Parameters:
s1 (Series)
s2 (Series)
- Return type:
Series
- series_multiply(*serieslist)
Element-wise multiplication of series. Takes a variable number of series as input.
Example: (mul (series "banana-spot-price ($)") (series "$-to-€" #:fill "ffill"))
This might convert a series priced in dollars to a series priced in euros, using a currency exchange rate series with a forward-fill option.
- Parameters:
serieslist (Series)
- Return type:
Series
- series_priority(*serieslist)
The priority operator combines its input series as layers. For each timestamp in the union of all series time stamps, the value comes from the first series that provides a value.
Example: (priority (series "realized") (series "nominated") (series "forecasted"))
Here realized values show up first, and any missing values come from nominated first and then only from forecasted.
- Parameters:
serieslist (Series)
- Return type:
Series
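The layering rule of priority can be sketched in pure Python, with series as hypothetical dicts in which a missing value is an absent key (the real operator works on pandas series, where missing values are NaNs). The series are applied from lowest to highest priority, so values from earlier series overwrite later ones.

```python
from typing import Dict, List

Series = Dict[str, float]


def series_priority(serieslist: List[Series]) -> Series:
    result: Series = {}
    # walk from the lowest to the highest priority so earlier series win
    for series in reversed(serieslist):
        result.update(series)
    return dict(sorted(result.items()))
```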
- shifted(date, years=0, months=0, weeks=0, days=0, hours=0, minutes=0)
Takes a timestamp and a number of years, months, weeks, days, hours, minutes (int) and computes a new date according to the requested delta elements.
Example: (shifted (date "2020-1-1") #:weeks 1 #:hours 2)
- Parameters:
date (Timestamp)
years (int)
months (int)
weeks (int)
days (int)
hours (int)
minutes (int)
- Return type:
Timestamp
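Shifting by years and months cannot be expressed as a fixed timedelta, since months differ in length. The pure-Python sketch below assumes calendar-aware behaviour like that of dateutil's relativedelta: the month moves first, the day is clamped to the end of the target month, and then the fixed-length units are added. Whether shifted clamps the same way is an assumption.

```python
import calendar
from datetime import datetime, timedelta


def shifted(date: datetime, years: int = 0, months: int = 0, weeks: int = 0,
            days: int = 0, hours: int = 0, minutes: int = 0) -> datetime:
    # calendar-aware part: move the month, clamping the day if needed
    month_index = date.year * 12 + (date.month - 1) + years * 12 + months
    year, month = divmod(month_index, 12)
    month += 1
    day = min(date.day, calendar.monthrange(year, month)[1])
    moved = date.replace(year=year, month=month, day=day)
    # fixed-length part: plain timedelta arithmetic
    return moved + timedelta(weeks=weeks, days=days, hours=hours, minutes=minutes)
```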
- slice(series, fromdate=None, todate=None)
This allows cutting a series at date points. It takes one positional parameter (the series) and two optional keywords fromdate and todate.
Example: (slice (series "cut-me") #:fromdate (date "2018-01-01"))
- Parameters:
series (Series)
fromdate (Timestamp | None)
todate (Timestamp | None)
- Return type:
Series
- start_of_month(date)
Produces a timezone-aware timestamp equal to the first day of the month of the given date.
Example: (start-of-month (date "1973-05-20 09:00"))
- Parameters:
date (Timestamp)
- Return type:
Timestamp
- sub(series1, series2)
Returns the element-wise subtraction of two series.
Example: (sub (series "series1") (series "series2"))
- Parameters:
series1 (Series)
series2 (Series)
- Return type:
Series
- superior_or_equal_to(series, num_or_series, false_value=0, true_value=1)
Returns a series with the length of the first argument if the second one is a scalar, or with the length of the index intersection in the case of two series. The condition is series >= num_or_series.
Series values depend on the condition:
* the values are set to true_value (default 1) where the condition is verified.
* the values are set to false_value (default 0) where the condition is NOT verified.
- Parameters:
series (Series)
num_or_series (Number | Series)
false_value (Number | None)
true_value (Number | None)
- Return type:
Series
- superior_to(series, num_or_series, false_value=0, true_value=1)
Returns a series with the length of the first argument if the second one is a scalar, or with the length of the index intersection in the case of two series. The condition is series > num_or_series.
Series values depend on the condition:
* the values are set to true_value (default 1) where the condition is verified.
* the values are set to false_value (default 0) where the condition is NOT verified.
- Parameters:
series (Series)
num_or_series (Number | Series)
false_value (Number | None)
true_value (Number | None)
- Return type:
Series
- time_shifted(series, weeks=0, days=0, hours=0, minutes=0)
Shift the dates of a series.
Takes the following keywords: weeks, days, hours and minutes, with positive or negative values.
Example: (time-shifted (series "shifted") #:days 2 #:hours 7)
- Parameters:
series (Series)
weeks (int)
days (int)
hours (int)
minutes (int)
- Return type:
Series
- timestamp(strdate, tz='UTC')
Produces a UTC timestamp from its input string date in ISO format.
The tz keyword allows specifying an alternate time zone. The naive keyword forces production of a naive timestamp. The tz and naive keywords are mutually exclusive.
- Parameters:
strdate (str)
tz (str | None)
- Return type:
Timestamp
- trig_arccosinus(series)
Trigonometric inverse cosine on a series of values [-1, 1] with a degree output.
Example: (trig.arcos (series "coordinates"))
- Parameters:
series (Series)
- Return type:
Series
- trig_arcsinus(series)
Trigonometric inverse sine on a series of values [-1, 1] with a degree output.
Example: (trig.arcsin (series "coordinates"))
- Parameters:
series (Series)
- Return type:
Series
- trig_arctangent(series)
Trigonometric inverse tangent on a series of values, with a degree output.
Example: (trig.arctan (series "coordinates"))
- Parameters:
series (Series)
- Return type:
Series
- trig_arctangent2(series1, series2)
Arc tangent of series1/series2, choosing the quadrant correctly, with a degree output.
Example: (trig.row-arctan2 (series "coordinates1") (series "coordinates2"))
- Parameters:
series1 (Series)
series2 (Series)
- Return type:
Series
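The quadrant handling and degree output can be illustrated with math.atan2 and math.degrees. Series are hypothetical dicts, and only timestamps present in both inputs are kept, which is an assumption about how the operator aligns its inputs.

```python
import math
from typing import Dict

Series = Dict[str, float]


def arctan2_degrees(series1: Series, series2: Series) -> Series:
    # assumption: only timestamps present in both series are kept
    return {
        stamp: math.degrees(math.atan2(series1[stamp], series2[stamp]))
        for stamp in series1 if stamp in series2
    }
```

For the points (1, -1) and (-1, -1), a plain arctangent of the ratio would return -45 and 45 degrees and lose the quadrant, while atan2 yields 135 and -135 degrees.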
- trig_cosinus(series)
Cosine element-wise on a degree series.
Example: (trig.cos (series "degree-series"))
- Parameters:
series (Series)
- Return type:
Series
- trig_sinus(series)
Trigonometric sine element-wise on a degree series.
Example: (trig.sin (series "degree-series"))
- Parameters:
series (Series)
- Return type:
Series
- trig_tangent(series)
Compute tangent element-wise on a degree series.
Example: (trig.tan (series "degree-series"))
- Parameters:
series (Series)
- Return type:
Series
- tzaware(series, tzone)
Allow promoting a series from a tz-naive index to a tz-aware index.
One must provide a target timezone.
Example: (tzaware (series "tz-naive-series-from-poland") "Europe/Warsaw")
- Parameters:
series (Series)
tzone (str)
- Return type:
Series