Formulas (computed series)

Purpose

This tshistory component provides a formula language, specific to the time series domain, to build computed time series.

Formulas are read-only series (you can’t update or replace them).

They nonetheless have versions and a history. Time stamp wise, the history is built from the union of the insertion time stamps of all constituent series; value wise, each version is obtained by applying the formula to the constituent series.

Because of this, the staircase operator is available on formulas. Some staircase operations can have a fast implementation if the formula obeys commutativity rules.
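The history construction described above can be pictured with a minimal pure-Python sketch. It is not tshistory code: each constituent series is modelled as a sorted list of (insertion date, snapshot) pairs, the formula is evaluated on the snapshots visible as of each insertion date of the union (our reading of the description), and `asof`, `add` and `formula_history` are hypothetical helpers.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical model: a series history is a sorted list of
# (insertion_date, {value_date: value}) snapshots.
History = List[Tuple[datetime, Dict[str, float]]]


def asof(history: History, revision_date: datetime) -> Dict[str, float]:
    """Latest snapshot inserted at or before revision_date ({} if none)."""
    idates = [idate for idate, _ in history]
    pos = bisect_right(idates, revision_date)
    return history[pos - 1][1] if pos else {}


def add(*series: Dict[str, float]) -> Dict[str, float]:
    """Toy add: sum the values on the value dates present in every input."""
    common = set.intersection(*(set(s) for s in series))
    return {vdate: sum(s[vdate] for s in series) for vdate in sorted(common)}


def formula_history(formula, *histories: History) -> Dict[datetime, Dict[str, float]]:
    """One version per insertion date found in any constituent series,
    computed by applying the formula to the constituents as of that date."""
    idates = sorted({idate for history in histories for idate, _ in history})
    return {
        idate: formula(*(asof(history, idate) for history in histories))
        for idate in idates
    }


wallonie = [
    (datetime(2024, 1, 1), {'d1': 1.0, 'd2': 2.0}),
    (datetime(2024, 1, 3), {'d1': 1.5, 'd2': 2.0}),
]
bruxelles = [(datetime(2024, 1, 2), {'d1': 10.0, 'd2': 20.0})]

for idate, version in formula_history(add, wallonie, bruxelles).items():
    print(idate.date(), version)
# 2024-01-01 {}
# 2024-01-02 {'d1': 11.0, 'd2': 22.0}
# 2024-01-03 {'d1': 11.5, 'd2': 22.0}
```

The empty first version comes from the toy add dropping value dates that are not provided by every input, which mirrors the default (no fill rule) behaviour of add.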

Operators

General Syntax

Formulas are expressed in a lisp-like syntax using operators, positional (mandatory) parameters and keyword (optional) parameters.

The general form is:

(<operator> <param1> ... <paramN> #:<keyword1> <value1> ... #:<keywordN> <valueN>)

Here are a couple of examples:

  • (add (series "wallonie") (series "bruxelles") (series "flandres"))

Here we see the two fundamental operators, add and series, at work.

This forms a new synthetic series out of three base series (each of which can be either a raw series or a formula itself).

  • (round (series "foo") #:decimals 2)

This illustrates the use of keywords.

Some notes:

  • operator names can contain dashes or arbitrary characters

  • literal values can be: 3 (integer), 5.2 (float), "hello" (string), #t or #f (true or false).
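To see how this syntax maps onto the nested trees that operator helpers receive (such as the tree passed to the finder in the proxy example further down), here is a minimal parser sketch. It is an illustration only, not the parser tshistory_formula uses: `Symbol`, `Keyword`, `atom` and `parse` are hypothetical names, strings may not contain escaped quotes, and symbols and keywords are printed half-quoted to mimic the notation used later in this document.

```python
import re


class Symbol(str):
    """Operator name such as add or series; printed half-quoted like 'add."""
    def __repr__(self):
        return "'" + str(self)


class Keyword(str):
    """Keyword marker such as #:decimals; printed half-quoted like '#:decimals."""
    def __repr__(self):
        return "'" + str(self)


# a double-quoted string, a parenthesis, or any other run of non-blank characters
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def atom(token: str):
    """Turn one token into a literal, a keyword or a symbol."""
    if token.startswith('"'):
        return token[1:-1]
    if token in ('#t', '#f'):
        return token == '#t'
    if token.startswith('#:'):
        return Keyword(token)
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return Symbol(token)


def parse(text: str):
    """Parse one formula into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError('unexpected end of formula')
        token = tokens[pos]
        pos += 1
        if token == ')':
            raise SyntaxError('unexpected )')
        if token != '(':
            return atom(token)
        items = []
        while pos < len(tokens) and tokens[pos] != ')':
            items.append(read())
        if pos >= len(tokens):
            raise SyntaxError('missing )')
        pos += 1  # consume the closing parenthesis
        return items

    tree = read()
    if pos != len(tokens):
        raise SyntaxError('trailing tokens after formula')
    return tree


print(parse('(round (series "foo") #:decimals 2)'))
# ['round, ['series, 'foo'], '#:decimals, 2]
```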

Registering new operators

Applications need a variety of specialized operators, so being able to register new ones is a fundamental need. Operators are plain Python functions exposed through the lisp-like syntax.

Declaring a new operator

One just needs to decorate a Python function with the func decorator:

import pandas as pd

from tshistory_formula.registry import func

@func('identity')
def identity(series: pd.Series) -> pd.Series:
    return series

The operator will be known to the outer world by the name given to @func, not the python function name (which can be arbitrary).

You must provide correct type annotations: the formula language is statically typed and the type checker will refuse to work with an untyped operator.

This is enough to get a working transformation operator. However, operators built to construct series, rather than just transform pre-existing ones, are more complicated.
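The annotation requirement can be pictured with a toy registry. This sketch is not how tshistory_formula.registry.func is implemented: `OPERATORS` and this simplified `func` are hypothetical, and the sketch only checks that annotations are present, whereas a real type checker also verifies that operators are combined with compatible types. Dunder parameters are skipped on the assumption that, as with the autotrophic operators described next, they are supplied by the interpreter.

```python
import inspect
from typing import Callable, Dict, List, get_type_hints

# Hypothetical registry: operator name -> Python function.
OPERATORS: Dict[str, Callable] = {}


def func(name: str):
    """Toy decorator: register an operator, refusing missing annotations."""
    def decorator(function: Callable) -> Callable:
        hints = get_type_hints(function)
        missing = [
            param for param in inspect.signature(function).parameters
            # dunder parameters are filled in by the interpreter
            if param not in hints and not param.startswith('__')
        ]
        if 'return' not in hints:
            missing.append('return')
        if missing:
            raise TypeError(f'operator {name!r} lacks annotations: {missing}')
        OPERATORS[name] = function
        return function
    return decorator


@func('identity')
def identity(series: List[float]) -> List[float]:
    return series


try:
    @func('untyped')
    def untyped(series):
        return series
except TypeError as err:
    rejection = str(err)

print(sorted(OPERATORS))  # ['identity']
print(rejection)          # operator 'untyped' lacks annotations: ['series', 'return']
```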

Autotrophic series operator

We start with an example: a proxy operator that fetches a series on the fly from an existing time series silo and serves it as if it came from your local installation.

We would use it like this: (proxy "a-name" #:parameter 42.3)

As we can see, it looks like the series operator, though its signature might be more complicated (this depends entirely on how series are enumerated in the silo).

Hence proxy must be understood as an alternative to series itself. Here is what the initial part would look like:

import pandas as pd

from tshistory_formula.registry import func, finder, metadata, history, insertion_dates

@func('proxy', auto=True)
def proxy(__interpreter__,
          __from_value_date__,
          __to_value_date__,
          __revision_date__,
          name: str,
          parameter: float = 0) -> pd.Series:

    # we assume there is some python client available
    # for the third-party time series silo
    # (silo_client and its get signature are placeholders)
    return silo_client.get(
        name,
        parameter=parameter,
        fromdate=__from_value_date__,
        todate=__to_value_date__,
        revdate=__revision_date__
    )

This is a possible implementation of the API get protocol.

The dunder parameters are a mandatory part of the signature. The other parameters (positional or keyword) are up to you and will be exposed to formula users.

We must also provide a helper for the formula system to detect the presence of this particular kind of operator in a formula (because it does not behave like mere transformation operators).

Let’s have it:

@finder('proxy')
def proxy_finder(cn, tsh, tree):
    return {
        tree[1]: tree
    }

Let us explain the parameters:

  • cn is a reference to the current database connection

  • tsh is a reference to the internal API implementation object (and you will need the cn object to use it)

  • tree is a representation of the formula restricted to the proxy operator use

When implementing a proxy-like operator, one generally won’t need the first two items. Here is an example of what the tree looks like:

['proxy, 'a-name', '#:parameter, 77]

Yes, the half-quoted 'proxy and '#:parameter are not typos. They are respectively:

  • a symbol (similar to a variable name in Python)

  • a keyword (similar to a Python keyword argument)

In the dictionary returned by the finder, only the key matters: it must be globally unique and is used as an (internal) alias for the provided series name. For instance, in our example, if parameter has an impact on the identity of the returned series, it must be part of the key, like this:

@finder('proxy')
def proxy_finder(cn, tsh, tree):
    # tree[1:] holds the series name followed by any keyword/value
    # pairs (e.g. '#:parameter, 77), so the parameter ends up in the key
    key = '-'.join(str(item) for item in tree[1:])
    return {
        f'proxy:{key}': tree
    }

We also have to map the metadata, insertion_dates and history API methods:

@metadata('proxy')
def proxy_metadata(cn, tsh, tree):
    # fold the series name and any keyword/value pairs into the key
    key = '-'.join(str(item) for item in tree[1:])
    return {
        f'proxy:{key}': {
            'tzaware': True,
            'source': 'silo-proxy',
            'index_type': 'datetime64[ns, UTC]',
            'value_type': 'float64',
            'index_dtype': '|M8[ns]',
            'value_dtype': '<f8'
        }
    }


@history('proxy')
def proxy_history(__interpreter__,
                  from_value_date=None,
                  to_value_date=None,
                  from_insertion_date=None,
                  to_insertion_date=None):
    # write the implementation here :)
    raise NotImplementedError


@insertion_dates('proxy')
def proxy_idates(__interpreter__,
                 from_value_date=None,
                 to_value_date=None,
                 from_insertion_date=None,
                 to_insertion_date=None):
    # write the implementation here :)
    raise NotImplementedError

Pre-defined operators

abs(series)

Return the absolute value element-wise.

Example: (abs (series "series-with-negative-values"))

Parameters:

series (Series)

Return type:

Series

asof(revision_date, series)

Fetch the series in the asof scope with the specified revision date.

Example: (asof (shifted (now) #:days -1) (series "i-have-many-versions"))

Parameters:
  • revision_date (Timestamp)

  • series (Series)

Return type:

Series

block_staircase(__interpreter__, __from_value_date__, __to_value_date__, name, revision_freq_hours=None, revision_freq_days=None, revision_time_hours=None, revision_time_days=None, revision_tz='UTC', maturity_offset_hours=None, maturity_offset_days=None, maturity_time_hours=None, maturity_time_days=None)

Computes a series rebuilt from successive blocks of history, each linked to a distinct revision date. The revision dates are taken at regular time intervals determined by revision_freq, revision_time and revision_tz. The time lag between revision dates and value dates of each block is determined by maturity_offset and maturity_time.

Example:

(block-staircase "forecast-series" #:revision_freq_days 1 #:revision_time_hours 11 #:maturity_offset_days 1)

Parameters:
  • name (seriesname)

  • revision_freq_hours (int | None)

  • revision_freq_days (int | None)

  • revision_time_hours (int | None)

  • revision_time_days (int | None)

  • revision_tz (str)

  • maturity_offset_hours (int | None)

  • maturity_offset_days (int | None)

  • maturity_time_hours (int | None)

  • maturity_time_days (int | None)

Return type:

Series

byand(*queries)

Yields a query filter applying a logical AND to its input query filters.

Example: (add (findseries (by.and (by.name "capacity") (by.metakey "plant"))))

Parameters:

queries (query)

Return type:

query

bybasket(__interpreter__, basketname)

Yields a query filter operating on a basket definition.

Example: (add (findseries (by.basket "fr.powerplants")))

This will yield the series matching the basket definition.

Parameters:

basketname (str)

Return type:

query

bymetaitems(key, value)

Yields a query filter operating on metadata items.

Example: (add (findseries (by.metaitem "plant_status" "running")))

This will filter the series having “running” as a value for the “plant_status” key in their metadata.

Parameters:
  • key (str)

  • value (str | Number)

Return type:

query

bymetakey(keyquery)

Yields a query filter operating on metadata keys.

Example: (add (findseries (by.metakey "plant_status")))

This will filter the series having “plant_status” in their metadata.

Parameters:

keyquery (str)

Return type:

query

byname(namequery)

Yields a query filter operating on series names.

Example: (add (findseries (by.name "fr capacity")))

This will filter the series whose names contain, in order, the “fr” and “capacity” fragments.

Parameters:

namequery (str)

Return type:

query
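The matching rule for byname can be sketched as follows. This is an assumption-laden illustration: it treats the query as whitespace-separated fragments, matches case-sensitively, and uses a hypothetical helper `byname_matches`; the actual filter may well be evaluated differently (for instance by the database).

```python
def byname_matches(namequery: str, name: str) -> bool:
    """True when every whitespace-separated fragment of namequery occurs
    in name, in the same order."""
    position = 0
    for fragment in namequery.split():
        found = name.find(fragment, position)
        if found == -1:
            return False
        position = found + len(fragment)
    return True


names = ['fr-capacity-nuclear', 'capacity-fr', 'de-capacity', 'fr.power.capacity']
matching = [name for name in names if byname_matches('fr capacity', name)]
print(matching)  # ['fr-capacity-nuclear', 'fr.power.capacity']
```

Note that 'capacity-fr' is rejected: both fragments are present, but not in the requested order.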

bynot(query)

Yields a query filter negating its input query filter.

Example: (add (findseries (by.not (by.name "capacity"))))

This will filter the series NOT having “capacity” in their name.

Parameters:

query (query)

Return type:

query

byor(*queries)

Yields a query filter applying a logical OR to its input query filters.

Example: (add (findseries (by.or (by.name "capacity") (by.metakey "plant"))))

Parameters:

queries (query)

Return type:

query

byvalue(key, operator, value)

Yields a query filter operating on metadata items.

Example: (add (findseries (by.value "weight" "<=" 42)))

This will filter the series having a “weight” metadata entry and keep those whose value is <= 42.

The available operators are <, <=, >, >=.

Parameters:
  • key (str)

  • operator (str)

  • value (str | Number)

Return type:

query
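The by.* filters compose like predicates. A minimal sketch, assuming series metadata is available as plain dicts in an in-memory `catalog`; `by_metakey`, `by_metaitem`, `by_value`, `by_and`, `by_or`, `by_not` and `findseries` here are hypothetical Python stand-ins for the formula operators, not tshistory functions.

```python
import operator
from typing import Callable, Dict, List

Meta = Dict[str, object]
Query = Callable[[str, Meta], bool]

COMPARATORS = {'<': operator.lt, '<=': operator.le,
               '>': operator.gt, '>=': operator.ge}


def by_metakey(key: str) -> Query:
    return lambda name, meta: key in meta


def by_metaitem(key: str, value: object) -> Query:
    return lambda name, meta: meta.get(key) == value


def by_value(key: str, op: str, value: float) -> Query:
    compare = COMPARATORS[op]
    return lambda name, meta: key in meta and compare(meta[key], value)


def by_and(*queries: Query) -> Query:
    return lambda name, meta: all(q(name, meta) for q in queries)


def by_or(*queries: Query) -> Query:
    return lambda name, meta: any(q(name, meta) for q in queries)


def by_not(query: Query) -> Query:
    return lambda name, meta: not query(name, meta)


# Hypothetical catalog: series name -> metadata
catalog: Dict[str, Meta] = {
    'plant-a': {'plant_status': 'running', 'weight': 40},
    'plant-b': {'plant_status': 'stopped', 'weight': 50},
    'price-x': {'unit': 'EUR'},
}


def findseries(query: Query) -> List[str]:
    return sorted(name for name, meta in catalog.items() if query(name, meta))


print(findseries(by_and(by_metakey('plant_status'), by_value('weight', '<=', 42))))
# ['plant-a']
print(findseries(by_not(by_metaitem('plant_status', 'running'))))
# ['plant-b', 'price-x']
```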

constant(__interpreter__, __from_value_date__, __to_value_date__, __revision_date__, value, fromdate, todate, freq, revdate)

Produces a constant-valued time series over a given horizon, with a given granularity and for a given revision date.

Example: (constant 42.5 (date "1900-1-1") (date "2039-12-31") "D" (date "1900-1-1"))

This will yield a daily series of value 42.5 from 1900-01-01 to 2039-12-31, with revision date 1900-01-01.

Parameters:
  • value (Number)

  • fromdate (Timestamp)

  • todate (Timestamp)

  • freq (str)

  • revdate (Timestamp)

Return type:

Series

cumsum(series)

Returns the cumulative sum of a series.

Example: (cumsum (series "sum-me"))

Parameters:

series (Series)

Return type:

Series

different_to(series, num_or_series, false_value=0, true_value=1)

Returns a series with the length of the first argument if the second one is a scalar, or with the length of their index intersection when both arguments are series.

Series values depend on the condition (series != num_or_series):

  • the values are set to true_value (default 1) where the condition is verified.

  • the values are set to false_value (default 0) where the condition is NOT verified.

Parameters:
  • series (Series)

  • num_or_series (Number | Series)

  • false_value (Number | None)

  • true_value (Number | None)

Return type:

Series
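The comparison operators in this reference (different_to, equal_to, inferior_to, inferior_or_equal_to, superior_to, superior_or_equal_to) share the behaviour described above. A minimal sketch, with series modelled as plain dicts from value date to value and the condition passed in as a function from Python's operator module (for instance operator.ne for different_to); `compare` is a hypothetical helper, not part of tshistory_formula.

```python
import operator
from numbers import Number


def compare(series, num_or_series, condition, false_value=0, true_value=1):
    """Map each point to true_value where condition(left, right) holds,
    false_value elsewhere. A scalar is compared with every point; two
    series are only compared on the value dates they share."""
    if isinstance(num_or_series, Number):
        pairs = {vdate: (value, num_or_series) for vdate, value in series.items()}
    else:
        pairs = {
            vdate: (value, num_or_series[vdate])
            for vdate, value in series.items()
            if vdate in num_or_series
        }
    return {
        vdate: true_value if condition(left, right) else false_value
        for vdate, (left, right) in pairs.items()
    }


s1 = {'d1': 1.0, 'd2': 2.0, 'd3': 3.0}
s2 = {'d2': 2.0, 'd3': 0.0, 'd4': 5.0}

print(compare(s1, 2, operator.ne))           # {'d1': 1, 'd2': 0, 'd3': 1}
print(compare(s1, s2, operator.ne))          # {'d2': 0, 'd3': 1}
print(compare(s1, s2, operator.gt, -1, 10))  # {'d2': -1, 'd3': 10}
```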

end_of_month(date)

Produces a timezone-aware timestamp equal to the last day of the month of the given date.

Example: (end-of-month (date "1973-05-20 09:00"))

Parameters:

date (Timestamp)

Return type:

Timestamp
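The calendar arithmetic behind end-of-month can be sketched with the standard library. This only covers the date part: the sketch works on plain `datetime.date` objects and deliberately leaves out the time-of-day and timezone handling of the real operator, which the description above does not detail; `end_of_month` here is a hypothetical helper.

```python
import calendar
from datetime import date


def end_of_month(day: date) -> date:
    """Last calendar day of the month containing `day`."""
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return day.replace(day=days_in_month)


print(end_of_month(date(1973, 5, 20)))  # 1973-05-31
print(end_of_month(date(2024, 2, 10)))  # 2024-02-29
```

The start-of-month counterpart is simply `day.replace(day=1)`.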

equal_to(series, num_or_series, false_value=0, true_value=1)

Returns a series with the length of the first argument if the second one is a scalar, or with the length of their index intersection when both arguments are series.

Series values depend on the condition (series == num_or_series):

  • the values are set to true_value (default 1) where the condition is verified.

  • the values are set to false_value (default 0) where the condition is NOT verified.

Parameters:
  • series (Series)

  • num_or_series (Number | Series)

  • false_value (Number | None)

  • true_value (Number | None)

Return type:

Series

findseries(__interpreter__, __from_value_date__, __to_value_date__, __revision_date__, q, naive=False, fill=None)

Yields a series list out of a metadata/name/… filtering query.

Example: (add (findseries (by.value "weight" "<" 43)))

The findseries operator accepts two keywords:

  • naive (defaults to #f) to select either tz-aware or naive series, since the two must not be mixed here; this is an important difference with the find API point.

  • fill to specify a filling policy to avoid NaNs when the series are combined with others using add; accepted values are "ffill" (forward-fill), "bfill" (backward-fill) or any floating value.

Parameters:
  • q (query)

  • naive (bool)

  • fill (str | Number | None)

Return type:

List[Series]

get_holidays(__interpreter__, __from_value_date__, __to_value_date__, country, naive=False)

Computes a series whose values are either 0 or 1 to signal holidays.

Takes a two-letter country code string and an optional naive keyword (#t or #f, #f by default) to force a naive series output.

Example: (holidays "fr")

Parameters:
  • country (str)

  • naive (bool | None)

Return type:

Series

inferior_or_equal_to(series, num_or_series, false_value=0, true_value=1)

Returns a series with the length of the first argument if the second one is a scalar, or with the length of their index intersection when both arguments are series.

Series values depend on the condition (series <= num_or_series):

  • the values are set to true_value (default 1) where the condition is verified.

  • the values are set to false_value (default 0) where the condition is NOT verified.

Parameters:
  • series (Series)

  • num_or_series (Number | Series)

  • false_value (Number | None)

  • true_value (Number | None)

Return type:

Series

inferior_to(series, num_or_series, false_value=0, true_value=1)

Returns a series with the length of the first argument if the second one is a scalar, or with the length of their index intersection when both arguments are series.

Series values depend on the condition (series < num_or_series):

  • the values are set to true_value (default 1) where the condition is verified.

  • the values are set to false_value (default 0) where the condition is NOT verified.

Parameters:
  • series (Series)

  • num_or_series (Number | Series)

  • false_value (Number | None)

  • true_value (Number | None)

Return type:

Series

integration(__interpreter__, __from_value_date__, __to_value_date__, __revision_date__, stock_name, flow_name, fill=False)

Integrate a given flow series to the last known value of a stock series.

Example: (integration "stock-series-name" "flow-series-name")

Parameters:
  • stock_name (str)

  • flow_name (str)

  • fill (bool | None)

Return type:

Series

naive(series, tzone)

Demotes a series from a tz-aware index to a tz-naive index.

One must provide a target timezone.

Example: (naive (series "tz-aware-series-from-poland") "Europe/Warsaw")

Parameters:
  • series (Series)

  • tzone (str)

Return type:

Series

now(__interpreter__, naive=False, tz=None)

Produces a timezone-aware timestamp as of now.

The naive keyword forces production of a naive timestamp. The tz keyword allows specifying an alternate time zone when the timestamp is not naive. The tz and naive keywords are mutually exclusive.

Example: (now)

Parameters:
  • naive (bool | None)

  • tz (str | None)

Return type:

Timestamp

options(series, fill=None, limit=None, weight=None)

The options operator takes a series and three keywords that modify the behaviour of the series.

  • fill to specify a filling policy to avoid NaNs when the series is combined with others using add; accepted values are "ffill" (forward-fill), "bfill" (backward-fill) or any floating value.

  • limit: if fill is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If fill is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

  • weight to provide a weight (float) value to be used by other operators such as row-mean.

The fill, limit and weight options are put on the series object for later use.

Parameters:
  • series (Series)

  • fill (str | Number | None)

  • limit (int | None)

  • weight (Number | None)

Return type:

Series
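The interplay of fill "ffill" and limit can be sketched on a plain list where NaN marks a missing point. This is an illustration under the assumption that the semantics follow the pandas-style description above; it covers only forward filling with a limit (not "bfill", nor the case where limit is given without fill), and `ffill` is a hypothetical helper.

```python
import math
from typing import List, Optional

NAN = float('nan')


def ffill(values: List[float], limit: Optional[int] = None) -> List[float]:
    """Forward-fill NaNs, filling at most `limit` consecutive NaNs per gap."""
    if limit is not None and limit <= 0:
        raise ValueError('limit must be greater than 0')
    filled: List[float] = []
    last, gap = NAN, 0
    for value in values:
        if not math.isnan(value):
            last, gap = value, 0
            filled.append(value)
            continue
        gap += 1
        within_limit = limit is None or gap <= limit
        # leading NaNs stay NaN since there is no previous value yet
        filled.append(last if within_limit else NAN)
    return filled


print(ffill([1.0, NAN, NAN, NAN, 5.0, NAN], limit=2))
# [1.0, 1.0, 1.0, nan, 5.0, 5.0]
```

The three-point gap is only partially filled: its first two NaNs take the previous value and the third one stays missing.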

resample(__interpreter__, __revision_date__, __from_value_date__, __to_value_date__, series, freq, method='mean')

Resamples its input series using freq and the aggregation given by method (as described in the pandas documentation).

Example: (resample (series "hourly") "D")

Parameters:
  • series (Series)

  • freq (str)

  • method (str)

Return type:

Series

rolling(series, window, method='mean')

Applies a calculation method (mean by default) over a rolling window (as described in the pandas documentation).

Example: (rolling (series "foo") 30 #:method "median")

Parameters:
  • series (Series)

  • window (int)

  • method (str)

Return type:

Series
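A rolling window aggregates each point together with the points just before it. The sketch below assumes the operator follows the pandas default, where a window only yields a value once it holds `window` points (earlier positions are NaN); `rolling` and the `METHODS` table are hypothetical and cover only a few methods.

```python
import math
import statistics
from typing import List

# Hypothetical subset of aggregation methods.
METHODS = {
    'mean': statistics.fmean,
    'median': statistics.median,
    'min': min,
    'max': max,
}


def rolling(values: List[float], window: int, method: str = 'mean') -> List[float]:
    """Aggregate each trailing window of `window` points; positions
    without a full window yet get NaN."""
    aggregate = METHODS[method]
    return [
        aggregate(values[i - window + 1:i + 1]) if i >= window - 1 else math.nan
        for i in range(len(values))
    ]


print(rolling([1.0, 2.0, 3.0, 4.0, 10.0], 3, method='median'))
# [nan, nan, 2.0, 3.0, 4.0]
```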

round(series, decimals=0)

Round element-wise considering the number of decimals specified (0 by default).

Example: (round (series "series-with-decimals") #:decimals 2)

Parameters:
  • series (Series)

  • decimals (Number | None)

Return type:

Series

row_max(*serieslist, skipna=True)

Computes the row-wise maximum of its input series.

Example: (row-max (series "station0") (series "station1") (series "station2"))

Example: (row-max (series "station0") (series "station1") #:skipna #f)

The skipna keyword (true by default) controls the handling of NaN values: when true, they are ignored; when false, a NaN in any input makes the result NaN for that row.

Parameters:
  • serieslist (Series)

  • skipna (bool | None)

Return type:

Series

row_mean(*serieslist, skipna=True)

This operator computes the row-wise mean of its input series using the series weight option if present. The missing points are handled as if the whole series were absent.

Example: (row-mean (series "station0") (series "station1" #:weight 2) (series "station2"))

Weights are provided as a keyword to series. No weight is interpreted as 1.

Parameters:
  • serieslist (Series)

  • skipna (bool | None)

Return type:

Series
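One reading of "missing points are handled as if the whole series were absent" is that, for each timestamp, only the series holding a value take part, and their weights are renormalized. The sketch below implements that reading on plain dicts; it is an assumption, not the tshistory code, and `row_mean` here is a hypothetical helper.

```python
import math
from typing import Dict, List, Optional


def row_mean(serieslist: List[Dict[str, float]],
             weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Weighted row-wise mean over the union of value dates. A series with
    no value on a row does not contribute, and its weight drops out of the
    denominator."""
    weights = weights or [1.0] * len(serieslist)
    result = {}
    for vdate in sorted(set().union(*serieslist)):
        pairs = [
            (series[vdate], weight)
            for series, weight in zip(serieslist, weights)
            if vdate in series and not math.isnan(series[vdate])
        ]
        total = sum(weight for _, weight in pairs)
        result[vdate] = (
            sum(value * weight for value, weight in pairs) / total
            if total else math.nan
        )
    return result


station0 = {'h1': 10.0, 'h2': 12.0}
station1 = {'h1': 20.0}  # weight 2, like (series "station1" #:weight 2)
station2 = {'h1': 30.0, 'h2': 14.0}

print(row_mean([station0, station1, station2], weights=[1, 2, 1]))
# {'h1': 20.0, 'h2': 13.0}
```

On h2, station1 has no value, so the mean is taken over station0 and station2 only: (12 + 14) / 2.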

row_min(*serieslist, skipna=True)

Computes the row-wise minimum of its input series.

Example: (row-min (series "station0") (series "station1") (series "station2"))

Example: (row-min (series "station0") (series "station1") #:skipna #f)

The skipna keyword (true by default) controls the handling of NaN values: when true, they are ignored; when false, a NaN in any input makes the result NaN for that row.

Parameters:
  • serieslist (Series)

  • skipna (bool | None)

Return type:

Series

row_std(*serieslist, skipna=True)

Computes the row-wise standard deviation of its input series.

Example: (std (series "station0") (series "station1") (series "station2"))

Example: (std (series "station0") (series "station1") #:skipna #f)

The skipna keyword (true by default) controls the handling of NaN values: when true, they are ignored; when false, a NaN in any input makes the result NaN for that row.

Parameters:
  • serieslist (Series)

  • skipna (bool | None)

Return type:

Series

scalar_add(num, num_or_series)

Add a constant quantity to a series.

Example: (+ 42 (series "i-feel-undervalued"))

Parameters:
  • num (Number)

  • num_or_series (Number | Series)

Return type:

Number | Series

scalar_div(num_or_series, num)

Perform a scalar division between numbers or a series and a scalar.

Example: (/ (series "div-me") (/ 3 2))

Parameters:
  • num_or_series (Number | Series)

  • num (Number)

Return type:

Number | Series

scalar_pow(series, num)

Raises a series to the given power, element-wise.

Example: (** (series "positive-things") 2)

Parameters:
  • series (Series)

  • num (Number)

Return type:

Series

scalar_prod(num, num_or_series)

Multiplies a series (or a number) by a scalar.

Example: (* -1 (series "positive-things"))

Parameters:
  • num (Number)

  • num_or_series (Number | Series)

Return type:

Number | Series

series(__interpreter__, __from_value_date__, __to_value_date__, __revision_date__, name, fill=None, limit=None, weight=None)

Returns a time series by name.

The series operator accepts several keywords:

  • fill to specify a filling policy to avoid NaNs when the series is combined with others using add; accepted values are "ffill" (forward-fill), "bfill" (backward-fill) or any floating value.

  • limit: if fill is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If fill is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

  • weight to provide a weight (float) value to be used by other operators such as row-mean.

Example: (add (series "a" #:fill 0) (series "b"))

In the example, we make sure that series a, if shorter than series b, gets zeroes instead of NaNs where b provides values.

Parameters:
  • name (seriesname)

  • fill (str | Number | None)

  • limit (int | None)

  • weight (Number | None)

Return type:

Series

series_add(*serieslist)

Element-wise sum of two or more series. Takes a variable number of series as input.

Example: (add (series "wallonie") (series "bruxelles") (series "flandres"))

To specify the behaviour of the add operation in the face of missing data, the series can be built with the fill keyword. This option is only really applied when several series are combined. By default, if an input series has missing values for a given time stamp, the resulting series has no value for this timestamp (unless a fill rule is provided).

Parameters:

serieslist (Series)

Return type:

Series
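The interaction between add and the fill keyword can be sketched on plain dicts. This is a model of the behaviour described above, not the tshistory implementation: each input is passed as a hypothetical (series, fill) pair, the inputs are aligned on the union of their value dates, gaps are filled according to the policy, and any row still holding a NaN is dropped.

```python
import math
from typing import Dict, List, Optional, Tuple, Union

Series = Dict[str, float]
Fill = Optional[Union[str, float]]


def align(series: Series, dates: List[str], fill: Fill = None) -> List[float]:
    """Align a series on `dates`, then apply an optional fill policy to gaps."""
    values = [series.get(vdate, math.nan) for vdate in dates]
    if fill == 'ffill':
        for i in range(1, len(values)):
            if math.isnan(values[i]):
                values[i] = values[i - 1]
    elif fill == 'bfill':
        for i in range(len(values) - 2, -1, -1):
            if math.isnan(values[i]):
                values[i] = values[i + 1]
    elif fill is not None:
        values = [fill if math.isnan(v) else v for v in values]
    return values


def series_add(*inputs: Tuple[Series, Fill]) -> Series:
    """Sum (series, fill) inputs over the union of their value dates; any
    row still holding a NaN after filling is dropped."""
    dates = sorted(set().union(*(series for series, _ in inputs)))
    columns = [align(series, dates, fill) for series, fill in inputs]
    return {
        vdate: sum(row)
        for vdate, row in zip(dates, zip(*columns))
        if not any(math.isnan(v) for v in row)
    }


a = {'t1': 1.0, 't2': 2.0}
b = {'t1': 10.0, 't2': 20.0, 't3': 30.0}

print(series_add((a, None), (b, None)))     # {'t1': 11.0, 't2': 22.0}
print(series_add((a, 0.0), (b, None)))      # {'t1': 11.0, 't2': 22.0, 't3': 30.0}
print(series_add((a, 'ffill'), (b, None)))  # {'t1': 11.0, 't2': 22.0, 't3': 32.0}
```

The second call mirrors (add (series "a" #:fill 0) (series "b")): without the fill, t3 would simply be absent from the result.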

series_clip(series, min=None, max=None, replacemin=False, replacemax=False)

Sets a lower and/or upper threshold for a series. Takes a series as positional parameter and accepts four optional keywords: min and max, which must be numbers, and replacemin and replacemax, which control whether out-of-bounds data is replaced with min and max respectively.

Example: (clip (series "must-be-positive") #:min 0 #:replacemin #t)

Parameters:
  • series (Series)

  • min (Number | None)

  • max (Number | None)

  • replacemin (bool | None)

  • replacemax (bool | None)

Return type:

Series

series_div(s1, s2)

Element-wise division of two series.

Example: (div (series "$-to-€") (series "€-to-£"))

Parameters:
  • s1 (Series)

  • s2 (Series)

Return type:

Series

series_multiply(*serieslist)

Element-wise multiplication of series. Takes a variable number of series as input.

Example: (mul (series "banana-spot-price ($)") (series "$-to-€" #:fill "ffill"))

This might convert a series priced in dollars to a series priced in euros, using a currency exchange rate series with a forward-fill option.

Parameters:

serieslist (Series)

Return type:

Series

series_priority(*serieslist)

The priority operator combines its input series as layers. For each timestamp in the union of all series time stamps, the value comes from the first series that provides a value.

Example: (priority (series "realized") (series "nominated") (series "forecasted"))

Here realized values take precedence; missing values are taken from nominated first, and only then from forecasted.

Parameters:

serieslist (Series)

Return type:

Series
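The layering rule can be sketched on plain dicts. The sketch assumes that a NaN does not count as "providing a value" (as with pandas' combine_first); `series_priority` here is a hypothetical helper, not the tshistory implementation.

```python
import math
from typing import Dict

Series = Dict[str, float]


def series_priority(*serieslist: Series) -> Series:
    """Layer series: each value date takes its value from the first series
    (in argument order) holding a non-NaN value for it."""
    result: Series = {}
    for series in serieslist:
        for vdate, value in series.items():
            if vdate not in result and not math.isnan(value):
                result[vdate] = value
    return dict(sorted(result.items()))


realized = {'d1': 1.0, 'd2': math.nan}
nominated = {'d1': 9.0, 'd2': 2.0}
forecasted = {'d2': 8.0, 'd3': 3.0}

print(series_priority(realized, nominated, forecasted))
# {'d1': 1.0, 'd2': 2.0, 'd3': 3.0}
```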

shifted(date, years=0, months=0, weeks=0, days=0, hours=0, minutes=0)

Takes a timestamp and a number of years, months, weeks, days, hours and minutes (int), and computes a new date shifted by the requested amounts.

Example: (shifted (date "2020-1-1") #:weeks 1 #:hours 2)

Parameters:
  • date (Timestamp)

  • years (int)

  • months (int)

  • weeks (int)

  • days (int)

  • hours (int)

  • minutes (int)

Return type:

Timestamp

slice(series, fromdate=None, todate=None)

This allows cutting a series at date points. It takes one positional parameter (the series) and two optional keywords fromdate and todate.

Example: (slice (series "cut-me") #:fromdate (date "2018-01-01"))

Parameters:
  • series (Series)

  • fromdate (Timestamp | None)

  • todate (Timestamp | None)

Return type:

Series

start_of_month(date)

Produces a timezone-aware timestamp equal to the first day of the month of the given date.

Example: (start-of-month (date "1973-05-20 09:00"))

Parameters:

date (Timestamp)

Return type:

Timestamp

sub(series1, series2)

Returns the element-wise subtraction of two series (series1 minus series2).

Example: (sub (series "series1") (series "series2"))

Parameters:
  • series1 (Series)

  • series2 (Series)

Return type:

Series

superior_or_equal_to(series, num_or_series, false_value=0, true_value=1)

Returns a series with the length of the first argument if the second one is a scalar, or with the length of their index intersection when both arguments are series.

Series values depend on the condition (series >= num_or_series):

  • the values are set to true_value (default 1) where the condition is verified.

  • the values are set to false_value (default 0) where the condition is NOT verified.

Parameters:
  • series (Series)

  • num_or_series (Number | Series)

  • false_value (Number | None)

  • true_value (Number | None)

Return type:

Series

superior_to(series, num_or_series, false_value=0, true_value=1)

Returns a series with the length of the first argument if the second one is a scalar, or with the length of their index intersection when both arguments are series.

Series values depend on the condition (series > num_or_series):

  • the values are set to true_value (default 1) where the condition is verified.

  • the values are set to false_value (default 0) where the condition is NOT verified.

Parameters:
  • series (Series)

  • num_or_series (Number | Series)

  • false_value (Number | None)

  • true_value (Number | None)

Return type:

Series

time_shifted(series, weeks=0, days=0, hours=0, minutes=0)

Shift the dates of a series.

Takes the following keywords: weeks, days, hours and minutes, with positive or negative values.

Example: (time-shifted (series "shifted") #:days 2 #:hours 7)

Parameters:
  • series (Series)

  • weeks (int)

  • days (int)

  • hours (int)

  • minutes (int)

Return type:

Series

timestamp(strdate, tz='UTC')

Produces a UTC timestamp from its input string date in ISO format.

The tz keyword allows specifying an alternate time zone. The naive keyword forces production of a naive timestamp. The tz and naive keywords are mutually exclusive.

Parameters:
  • strdate (str)

  • tz (str | None)

Return type:

Timestamp

trig_arccosinus(series)

Trigonometric inverse cosine, element-wise, on a series of values in [-1, 1], with output in degrees.

Example: (trig.arcos (series "coordinates"))

Parameters:

series (Series)

Return type:

Series

trig_arcsinus(series)

Trigonometric inverse sine, element-wise, on a series of values in [-1, 1], with output in degrees.

Example: (trig.arcsin (series "coordinates"))

Parameters:

series (Series)

Return type:

Series

trig_arctangent(series)

Trigonometric inverse tangent, element-wise, with output in degrees.

Example: (trig.arctan (series "coordinates"))

Parameters:

series (Series)

Return type:

Series

trig_arctangent2(series1, series2)

Element-wise arc tangent of series1/series2, choosing the quadrant correctly, with output in degrees.

Example: (trig.row-arctan2 (series "coordinates1") (series "coordinates2"))

Parameters:
  • series1 (Series)

  • series2 (Series)

Return type:

Series
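Choosing the quadrant correctly is what distinguishes atan2 from a plain arc tangent of the ratio. A minimal sketch with Python's math module, assuming the two series are only combined on their shared value dates; `trig_arctangent2` here is a hypothetical dict-based stand-in for the operator.

```python
import math
from typing import Dict

Series = Dict[str, float]


def trig_arctangent2(series1: Series, series2: Series) -> Series:
    """Element-wise atan2(series1, series2) in degrees, on shared value dates."""
    return {
        vdate: math.degrees(math.atan2(y, series2[vdate]))
        for vdate, y in series1.items()
        if vdate in series2
    }


ys = {'q1': 1.0, 'q2': 1.0, 'q3': -1.0}
xs = {'q1': 1.0, 'q2': -1.0, 'q3': -1.0}

for vdate, angle in trig_arctangent2(ys, xs).items():
    # a plain atan(y / x) would give -45 for q2 and 45 for q3
    print(vdate, round(angle, 6))
# q1 45.0
# q2 135.0
# q3 -135.0
```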

trig_cosinus(series)

Cosine element-wise on a degree series.

Example: (trig.cos (series "degree-series"))

Parameters:

series (Series)

Return type:

Series

trig_sinus(series)

Trigonometric sine element-wise on a degree series.

Example: (trig.sin (series "degree-series"))

Parameters:

series (Series)

Return type:

Series

trig_tangent(series)

Computes the tangent element-wise on a degree series.

Example: (trig.tan (series "degree-series"))

Parameters:

series (Series)

Return type:

Series

tzaware(series, tzone)

Promotes a series from a tz-naive index to a tz-aware index.

One must provide a target timezone.

Example: (tzaware (series "tz-naive-series-from-poland") "Europe/Warsaw")

Parameters:
  • series (Series)

  • tzone (str)

Return type:

Series