Groups#

Introduction and Stored Groups#

Groups in tshistory are collections of related time series that share the same time index and are managed as a unit. They are particularly useful for handling scenarized time series or multivariate time series data where multiple series need to be kept in sync.

../_images/groups.png

They come in two flavors: primary groups (stored) and formula groups.

Here is what creating a primary group looks like:

import pandas as pd
from tshistory.api import timeseries

tsa = timeseries()

group_data = pd.DataFrame({
    'low': [17.0, 21.6, 18.2],
    'mid': [20.5, 22.1, 23.4],
    'high': [21.3, 22.6, 24.9]
}, index=pd.date_range('2025-01-01', periods=3, freq='M'))

tsa.group_update('subsidiary1.revenues.fcst', group_data, 'operator')

Note

Groups in tshistory have a fixed schema once created: you cannot change the columns, nor (as with time series) fundamental attributes such as tz-awareness.

All group_update and group_replace operations must match the exact column structure of the original group.

This constraint ensures:

  • Data Integrity: Prevents accidental schema changes that could break downstream consumers

  • Version Consistency: All historical versions of a group maintain the same structure
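The fixed-schema rule can be pictured as a column check performed before each write. The sketch below is only an illustration: `check_schema` is a hypothetical helper, not a tshistory function, and the library's actual validation (including whether column order matters) may differ.

```python
def check_schema(stored_columns, incoming_columns):
    """Raise if the incoming columns differ from those the group was created with.

    Hypothetical helper for illustration; not a tshistory function.
    """
    if list(stored_columns) != list(incoming_columns):
        raise ValueError(
            f'column mismatch: expected {list(stored_columns)}, '
            f'got {list(incoming_columns)}'
        )


stored = ['low', 'mid', 'high']
check_schema(stored, ['low', 'mid', 'high'])   # passes silently

try:
    check_schema(stored, ['low', 'high'])      # a column is missing
except ValueError as err:
    rejected = str(err)
```

Running a check of this kind on the dataframe before calling group_update or group_replace gives an early, readable error on the client side.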

Group Formulas#

Group formulas define computed groups using the formula language. Like series formulas, they are evaluated on demand and inherit versioning from their components.

Group Formula Operators#

The formula language provides specialized operators for working with groups:

group

Retrieves a group from storage or formula, similar to the series operator:

(group "subsidiary1.revenues.fcst")

group-add

Performs element-wise addition of multiple groups. All groups must have compatible indexes:

;; obtain low, mid and high revenue scenarios for a company with 3 subsidiaries
(group-add (group "subsidiary1.revenues.fcst")
           (group "subsidiary2.revenues.fcst")
           (group "subsidiary3.revenues.fcst"))

This operator aligns the groups by their time index and adds corresponding columns.
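To make the alignment step concrete, here is a minimal pure-Python sketch of the idea. It models a group as a dictionary of columns, which is a stand-in for a DataFrame. It assumes that columns are matched by name and that only timestamps present in every group are kept; the operator's actual behavior on mismatched indexes is not specified here.

```python
def group_add(*groups):
    """Add groups column by column on the timestamps they all share.

    A group is modeled as {column_name: {timestamp: value}}.
    Assumption: columns are matched by name and only common timestamps are kept.
    """
    first, *rest = groups
    result = {}
    for column, values in first.items():
        common = set(values)
        for other in rest:
            common &= set(other[column])
        result[column] = {
            stamp: sum(group[column][stamp] for group in groups)
            for stamp in sorted(common)
        }
    return result


sub1 = {
    'low': {'2025-01': 17.0, '2025-02': 21.5},
    'high': {'2025-01': 21.0, '2025-02': 22.5},
}
sub2 = {
    'low': {'2025-01': 3.0, '2025-02': 4.5},
    'high': {'2025-01': 5.0, '2025-02': 6.5},
}

total = group_add(sub1, sub2)
```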

group-add-series

Adds a series to every column of a group. Useful for adjustments or calibrations:

;; convert temperatures from kelvin to celsius
(group-add-series (group "temperatures_kelvin")
                  (series "kelvin_to_celsius_offset"))

The series is broadcast to all scenarios in the group.
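Broadcasting here means that the same series values are added to each column in turn. A minimal sketch with plain dictionaries standing in for the group and the series follows. The handling of timestamps missing from the series is an assumption, not documented behavior.

```python
def group_add_series(group, series):
    """Add the same series to every column of a group.

    group: {column_name: {timestamp: value}}; series: {timestamp: value}.
    Assumption: timestamps missing from the series are dropped from the result.
    """
    return {
        column: {
            stamp: value + series[stamp]
            for stamp, value in values.items()
            if stamp in series
        }
        for column, values in group.items()
    }


kelvin = {
    'cold': {'2025-01-01': 273.15, '2025-01-02': 275.15},
    'warm': {'2025-01-01': 293.15, '2025-01-02': 295.15},
}
offset = {'2025-01-01': -273.15, '2025-01-02': -273.15}

celsius = group_add_series(kelvin, offset)
```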

bind and group-from-series

Constructs a new group by binding multiple series together as named scenarios:

;; create scenarios from individual series
(group-from-series
    (bind "high" (series "forecast_high"))
    (bind "mid" (series "forecast_mid"))
    (bind "low" (series "forecast_low")))

Each bind creates a named column in the resulting group. This is particularly useful for creating scenario-based groups from individual forecast series.
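Conceptually, the result is one column per bind, keyed by the given name. The sketch below mimics that with plain dictionaries. `group_from_series` is an illustrative Python function, not the operator's implementation, and the duplicate-name check is an assumption.

```python
def group_from_series(*bindings):
    """Build a group from (name, series) pairs, one column per pair.

    Each pair plays the role of a `bind` expression.
    """
    group = {}
    for name, series in bindings:
        if name in group:
            raise ValueError(f'duplicate column name: {name}')
        group[name] = dict(series)
    return group


forecast_high = {'2025-01-01': 24.9, '2025-01-02': 25.3}
forecast_low = {'2025-01-01': 17.0, '2025-01-02': 18.2}

scenarios = group_from_series(
    ('high', forecast_high),
    ('low', forecast_low),
)
```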

Creating and Using Group Formulas#

Register a group formula using the API:

>>> tsa.register_group_formula(
...     'eu_production_mwh',
...     '(group-add (group "france_production_mwh") '
...     '           (group "germany_production_mwh") '
...     '           (group "spain_production_mwh"))'
... )

Once registered, use it like any other group:

>>> df = tsa.group_get('eu_production_mwh')
>>> print(df.head())
                      low    mid    high
2025-01-01 00:00:00  48.5   52.2   55.8
2025-01-02 00:00:00  49.1   52.7   56.4

Group Formula Metadata#

Group formulas can have metadata like primary groups:

>>> tsa.update_group_metadata('eu_production_mwh', {
...     'unit': 'mwh',
...     'frequency': 'daily',
...     'scenarios': 'energy production forecasts'
... })

Formula-Specific Methods#

Several methods are specific to group formulas:

>>> # get the formula definition
>>> formula = tsa.group_formula('eu_production_mwh')
>>> print(formula)
'(group-add (group "france_production_mwh") ...)'

>>> # get expanded formula (resolving nested formulas)
>>> expanded = tsa.group_formula('eu_production_mwh', expanded=True)

>>> # test a formula without registering
>>> result = tsa.group_eval_formula(
...     '(group-add (group "test1") (group "test2"))'
... )

Formula Bindings: Creating Groups from Series Formulas#

The bindings system is a powerful mechanism that transforms series formulas into group formulas by replacing selected series references with groups. This allows you to apply the same calculation logic across multiple scenarios simultaneously.

Core Concept

Given a series formula that combines multiple series, you can “bind” some of those series to groups. The formula then evaluates column-wise across the bound groups, producing a group as output.

The Family Concept

A “family” groups together series/groups that play equivalent roles in the formula. Key rules:

  • All groups within a family must have the same number of columns (scenarios)

  • The formula is evaluated column-by-column across families

  • Column 1 of each group in a family is used together, then column 2, etc.

Example: Weather Scenario Modeling

Consider a formula that combines temperature and wind data with adjustments:

# original series formula
tsa.register_formula(
    'weather_index',
    '(add (mul (series "temp_base") 0.7) '
    '     (mul (series "wind_base") 0.3) '
    '     (series "seasonal_adjustment"))'
)

Now create groups for different weather scenarios:

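# illustrative daily index shared by both groups
dates = pd.date_range('2025-01-01', periods=3, freq='D')

# temperature scenarios (3 scenarios: cold, normal, warm)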
temp_scenarios = pd.DataFrame({
    'cold': [5, 6, 7],
    'normal': [15, 16, 17],
    'warm': [25, 26, 27]
}, index=dates)
tsa.group_replace('temp_scenarios', temp_scenarios, 'operator')

# wind scenarios (must also have 3 scenarios to match)
wind_scenarios = pd.DataFrame({
    'calm': [5, 5, 5],
    'moderate': [15, 15, 15],
    'strong': [30, 30, 30]
}, index=dates)
tsa.group_replace('wind_scenarios', wind_scenarios, 'operator')

Bind the formula to create a group:

# define the binding
binding = pd.DataFrame([
    ['temp_base', 'temp_scenarios', 'weather'],
    ['wind_base', 'wind_scenarios', 'weather'],
    # seasonal_adjustment remains a regular series
], columns=['series', 'group', 'family'])

# register the bound group
tsa.register_formula_bindings(
    'weather_index_scenarios',  # new group name
    'weather_index',            # source formula
    binding
)

Result:

>>> result = tsa.group_get('weather_index_scenarios')
>>> print(result.columns)
['scenario_1', 'scenario_2', 'scenario_3']

# each column computed as:
# scenario_1: temp_scenarios['cold'] * 0.7 + wind_scenarios['calm'] * 0.3 + seasonal_adjustment
# scenario_2: temp_scenarios['normal'] * 0.7 + wind_scenarios['moderate'] * 0.3 + seasonal_adjustment
# scenario_3: temp_scenarios['warm'] * 0.7 + wind_scenarios['strong'] * 0.3 + seasonal_adjustment
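The positional pairing described in the comments above can be reproduced by hand for a single timestamp. This sketch uses the first value of each scenario from the example groups. The value of `seasonal_adjustment` is an assumption, since that series is not defined on this page.

```python
def weather_index(temp, wind, adjustment):
    """Same arithmetic as the `weather_index` formula, for one timestamp."""
    return temp * 0.7 + wind * 0.3 + adjustment


# first value of each scenario, taken from the example groups
temp_scenarios = {'cold': 5, 'normal': 15, 'warm': 25}
wind_scenarios = {'calm': 5, 'moderate': 15, 'strong': 30}
seasonal_adjustment = 1.0  # assumed value, not given in the documentation

# columns of the same family are paired by position, not by name
result = [
    weather_index(temp, wind, seasonal_adjustment)
    for temp, wind in zip(temp_scenarios.values(), wind_scenarios.values())
]
```

The unbound `seasonal_adjustment` value is reused unchanged for every scenario, which is what broadcasting an unbound series amounts to.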

Multiple Families Example

Families are useful when you want different binding strategies for different parts of the formula:

# formula with different types of inputs
tsa.register_formula(
    'complex_calc',
    '(add (series "regional_data") '
    '     (mul (series "global_factor") (series "local_factor")))'
)

# binding with two families
binding = pd.DataFrame([
    ['regional_data', 'regions_group', 'regions'],      # 5 regions
    ['local_factor', 'local_scenarios', 'scenarios'],   # 3 scenarios
    # global_factor remains unbound (same for all combinations)
], columns=['series', 'group', 'family'])

This would create a group with 15 columns (5 regions × 3 scenarios), exploring all combinations.
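The column count follows from taking every (region, scenario) pair. A short sketch of that enumeration is given here. The region and scenario names are hypothetical, and the naming and ordering of the columns in the actual result are not specified by this page.

```python
from itertools import product

regions = ['north', 'south', 'east', 'west', 'center']   # hypothetical names
scenarios = ['low', 'mid', 'high']                       # hypothetical names

# one output column per (region, scenario) pair
combinations = list(product(regions, scenarios))
```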

Key Points

  • Unbound series in the formula remain as series (broadcast to all columns)

  • All groups in the same family must have identical column counts

  • The binding creates a “bound” type group that dynamically evaluates the formula

  • Use bindings_for(name) to retrieve the binding configuration for a group

Common Use Cases#

Groups are ideal for:

  • Ensemble Forecasts: Stochastic weather scenarios

Depending on the situation, they may also be useful for:

  • Financial Data: OHLC (Open, High, Low, Close) price data

  • IoT Sensors: Multiple sensor readings from the same device

  • Economic Indicators: Related economic metrics that should be kept in sync

In many cases, it will be more convenient to handle data acquisition as individual time series, and then create a group from them (using the group-from-series formulaic operator).

Group API Reference#

Primary Group Operations#

class mainsource(uri, namespace='tsh', tshclass=<class 'tshistory.tsio.timeseries'>, othersources=None)

API façade for the main source (talks directly to the storage)

The api documentation is carried by this object. The http client provides exactly the same methods.

Parameters:
  • uri (str)

  • namespace (str)

  • tshclass (type)

group_exists(name)

Checks the existence of a group with a given name.

Parameters:

name (str)

Return type:

bool

group_type(name)

Return the type of a group, for instance ‘primary’, ‘formula’ or ‘bound’

Parameters:

name (str)

Return type:

str

group_get(name, revision_date=None, from_value_date=None, to_value_date=None)

Get a group by name.

By default one gets the latest version.

By specifying revision_date one can get the closest version matching the given date.

The from_value_date and to_value_date parameters allow you to specify a narrower date range (by default all points are provided).

If the group does not exist, None is returned.

Parameters:
  • name (str)

  • revision_date (Timestamp | None)

  • from_value_date (Timestamp | None)

  • to_value_date (Timestamp | None)

Return type:

DataFrame | None
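One plausible reading of "closest version" is the most recent version inserted at or before revision_date. The sketch below illustrates that selection rule over a sorted list of insertion dates. Both the rule and the `version_at` helper are assumptions made for illustration, not the library's implementation.

```python
from bisect import bisect_right
from datetime import datetime


def version_at(insertion_dates, revision_date):
    """Return the latest insertion date not after revision_date, or None.

    insertion_dates must be sorted in ascending order.
    Assumption: "closest version" means the most recent one at or before
    the requested date.
    """
    position = bisect_right(insertion_dates, revision_date)
    if position == 0:
        return None
    return insertion_dates[position - 1]


insertions = [datetime(2025, 1, 1), datetime(2025, 2, 1), datetime(2025, 3, 1)]

picked = version_at(insertions, datetime(2025, 2, 15))
too_early = version_at(insertions, datetime(2024, 12, 1))
```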

group_insertion_dates(name, from_insertion_date=None, to_insertion_date=None)

Get the list of all insertion dates for any given group

Parameters:
  • name (str)

  • from_insertion_date (Timestamp | None)

  • to_insertion_date (Timestamp | None)

Return type:

List[Timestamp]

group_history(name, from_value_date=None, to_value_date=None, from_insertion_date=None, to_insertion_date=None)

Get all versions of a group in the form of a dict from insertion dates to dataframe.

It is possible to restrict the versions range by specifying from_insertion_date and to_insertion_date.

It is possible to restrict the values range by specifying from_value_date and to_value_date.

Parameters:
  • name (str)

  • from_value_date (Timestamp | None)

  • to_value_date (Timestamp | None)

  • from_insertion_date (Timestamp | None)

  • to_insertion_date (Timestamp | None)

Return type:

Dict[Timestamp, DataFrame]
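The returned mapping can be thought of as a dictionary filtered on its keys. The sketch below shows the insertion-date restriction on a stand-in history, with strings in place of dataframes. It assumes that both bounds are inclusive, which this page does not state.

```python
from datetime import datetime


def restrict_history(history, from_insertion_date=None, to_insertion_date=None):
    """Keep the versions whose insertion date falls within the given bounds.

    history: {insertion_date: version}. Assumption: both bounds are inclusive.
    """
    return {
        idate: version
        for idate, version in sorted(history.items())
        if (from_insertion_date is None or idate >= from_insertion_date)
        and (to_insertion_date is None or idate <= to_insertion_date)
    }


history = {
    datetime(2025, 1, 1): 'version 1',
    datetime(2025, 2, 1): 'version 2',
    datetime(2025, 3, 1): 'version 3',
}

recent = restrict_history(history, from_insertion_date=datetime(2025, 2, 1))
```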

group_replace(name, df, author, insertion_date=None)

Replace a group named by <name> with the input dataframe.

This creates a new version of the group. The group is completely replaced with the provided values.

The author is mandatory. The signature takes no metadata argument; metadata is set separately with update_group_metadata.

It is possible to force an insertion_date, which must be later than the previous insertion_date.

Parameters:
  • name (str)

  • df (DataFrame)

  • author (str)

  • insertion_date (Timestamp | None)

Return type:

None

group_delete(name)

Delete a group.

This is an irreversible operation.

Parameters:

name (str)

Return type:

None

group_internal_metadata(name)

Return a group's internal metadata dictionary.

Parameters:

name (str)

Return type:

Dict[str, Any] | None

group_metadata(name, all=False)

Return a group's metadata dictionary.

Parameters:
  • name (str)

  • all (bool)

Return type:

Dict[str, Any] | None

update_group_metadata(name, meta)

Update a group's metadata with a dictionary from strings to anything json-serializable.

Parameters:
  • name (str)

  • meta (Dict[str, Any])

Return type:

None

group_catalog()

Produces a catalog of all groups in the form of a mapping from source to a list of (name, kind) pairs.

Return type:

Dict[Tuple[str, str], List[Tuple[str, str]]]
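A mapping of this shape is easy to walk through with a comprehension. The sketch below works on a hand-written dictionary shaped like the documented return type. Its content is hypothetical, and reading the key as a (uri, namespace) pair is an assumption.

```python
# hypothetical catalog content, shaped like the documented return type;
# the key is assumed to be a (uri, namespace) pair
catalog = {
    ('db://localhost:5432/mydb', 'tsh'): [
        ('subsidiary1.revenues.fcst', 'primary'),
        ('eu_production_mwh', 'formula'),
    ],
}

# collect the names of formula groups across all sources
formula_groups = [
    name
    for groups in catalog.values()
    for name, kind in groups
    if kind == 'formula'
]
```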

Formula Group Operations#

register_group_formula(name, formula)

Define a group as a named formula.

You can use any operator (including those working on series) provided the top-level expression is a group.

Parameters:
  • name (str)

  • formula (str)

Return type:

None

group_formula(name, expanded=False)

Get the group formula associated with a name.

Parameters:
  • name (str)

  • expanded (bool)

Return type:

str | None

register_formula_bindings(groupname, formulaname, bindings)

Define a group by association of an existing series formula and a bindings object.

The designated series formula will then be interpreted as a group formula.

The bindings object provides mappings that tell which components of the formula are to be interpreted as groups.

Given a formula named "form1":

(add (series "foo") (series "bar") (series "quux"))

... where one wants to treat "foo" and "bar" as groups. The binding is expressed as a dataframe:

binding = pd.DataFrame(
    [
        ['foo', 'foo-group', 'group'],
        ['bar', 'bar-group', 'group'],
    ],
    columns=('series', 'group', 'family')
)

The complete registration looks like:

register_formula_bindings(
    'groupname',
    'form1',
    pd.DataFrame(
        [
            ['foo', 'foo-group', 'group'],
            ['bar', 'bar-group', 'group'],
        ],
        columns=('series', 'group', 'family')
    )
)

Within a given family, all groups must have the same number of members (series) and the member roles are considered equivalent (e.g. meteorological scenarios).

Parameters:
  • groupname (str)

  • formulaname (str)

  • bindings (DataFrame)

Return type:

None