Timeseries Store Usage
======================

Table of Contents
-----------------

- `Creating a series`_
- `Updating a series`_
- `Working with versions`_
- `Point and version erasure`_
- `Retrieving history`_
- `Working with metadata`_
- `Replacing a series entirely`_
- `Checking series existence`_
- `Renaming a series`_
- `Deleting a series`_
- `Finding series`_
- `Getting series information`_
- `Working with logs`_
- `Staircase operations`_
- `Time Series Operations API`_

Creating a series
-----------------

Here's a simple example:

.. code:: python

   >>> import pandas as pd
   >>> from tshistory.api import timeseries
   >>>
   >>> tsa = timeseries('postgresql://me:password@localhost/mydb')
   >>>
   >>> series = pd.Series([1, 2, 3],
   ...                    pd.date_range(start=pd.Timestamp(2017, 1, 1),
   ...                                  freq='D', periods=3))

   # db insertion
   >>> tsa.update('my_series', series, 'babar@pythonian.fr')
   ...
   2017-01-01    1.0
   2017-01-02    2.0
   2017-01-03    3.0
   Freq: D, Name: my_series, dtype: float64

   # note how our integers got turned into floats
   # (there are no provisions to handle integer series as of today)

   # retrieval
   >>> tsa.get('my_series')
   ...
   2017-01-01    1.0
   2017-01-02    2.0
   2017-01-03    3.0
   Name: my_series, dtype: float64

Note that we generally adopt the convention of naming the time series
api object ``tsa``.

Updating a series
-----------------

The ``update`` method is the fundamental operation for time series
management, designed for incrementally updating series as new data
arrives over time.

This is good. Now, let's insert more:

.. code:: python

   >>> series = pd.Series([2, 7, 8, 9],
   ...                    pd.date_range(start=pd.Timestamp(2017, 1, 2),
   ...                                  freq='D', periods=4))

   # db insertion
   >>> tsa.update('my_series', series, 'babar@pythonian.fr')
   ...
   2017-01-03    7.0
   2017-01-04    8.0
   2017-01-05    9.0
   Name: my_series, dtype: float64

You get back the *new information* you put inside, and this is why the
``2`` doesn't appear (it was already put there in the first step).

.. code:: python

   >>> tsa.get('my_series')
   ...
   2017-01-01    1.0
   2017-01-02    2.0
   2017-01-03    7.0
   2017-01-04    8.0
   2017-01-05    9.0
   Name: my_series, dtype: float64

It is important to note that the third value was *replaced*, and the
last two values were just *appended*. As noted, the point at
``2017-1-2`` carried no new information, so it was simply ignored.

Working with versions
---------------------

The ``insertion_dates`` method is one of the three fundamental API
points (along with ``get`` and ``update``). Every update creates a new
version of the series, and this method returns when each version was
created:

.. code:: python

   >>> tsa.insertion_dates('my_series')
   [pd.Timestamp('2018-09-26 17:10:36.988920+02:00'),
    pd.Timestamp('2018-09-26 17:12:54.508252+02:00')]

   >>> # get insertions within a date range
   >>> tsa.insertion_dates('my_series',
   ...                     from_insertion_date=pd.Timestamp('2018-09-26 17:11:00+02:00'))
   [pd.Timestamp('2018-09-26 17:12:54.508252+02:00')]

These timestamps identify the versions of your series and are what you
use with ``get`` to retrieve any past state.

Point and version erasure
-------------------------

Point erasure with NaN
~~~~~~~~~~~~~~~~~~~~~~

You can erase specific points in a series by updating with NaN values:

.. code:: python

   >>> import numpy as np

   >>> # erase the point at 2017-01-02
   >>> erasure = pd.Series([np.nan], index=[pd.Timestamp('2017-01-02')])
   >>> tsa.update('my_series', erasure, 'cleanup@example.com', keepnans=True)

   >>> # by default, erased points are not shown
   >>> tsa.get('my_series')
   2017-01-01    1.0
   2017-01-03    7.0
   2017-01-04    8.0
   2017-01-05    9.0
   Name: my_series, dtype: float64

   >>> # use keepnans=True to see erased points
   >>> tsa.get('my_series', keepnans=True)
   2017-01-01    1.0
   2017-01-02    NaN
   2017-01-03    7.0
   2017-01-04    8.0
   2017-01-05    9.0
   Name: my_series, dtype: float64

Version erasure with strip
~~~~~~~~~~~~~~~~~~~~~~~~~~

**WARNING: this is a DESTRUCTIVE operation that should only be used as
a LAST RESORT.**

The ``strip`` method permanently removes all versions after a given
insertion date:
.. code:: python

   >>> # check existing versions
   >>> tsa.insertion_dates('my_series')
   [pd.Timestamp('2018-09-26 17:10:36.988920+02:00'),
    pd.Timestamp('2018-09-26 17:12:54.508252+02:00'),
    pd.Timestamp('2018-09-26 17:15:00.000000+02:00')]

   >>> # DANGER: permanently remove versions after 17:12
   >>> tsa.strip('my_series', pd.Timestamp('2018-09-26 17:12:00+02:00'))

   >>> # versions are gone forever
   >>> tsa.insertion_dates('my_series')
   [pd.Timestamp('2018-09-26 17:10:36.988920+02:00')]

This operation cannot be undone. Use it with extreme caution.

Retrieving history
------------------

We can access the whole history (or parts of it) in one call:

.. code:: python

   >>> history = tsa.history('my_series')
   ...
   >>>
   >>> for idate, series in history.items():  # it's a dict
   ...     print('insertion date:', idate)
   ...     print(series)
   ...
   insertion date: 2018-09-26 17:10:36.988920+02:00
   2017-01-01    1.0
   2017-01-02    2.0
   2017-01-03    3.0
   Name: my_series, dtype: float64
   insertion date: 2018-09-26 17:12:54.508252+02:00
   2017-01-01    1.0
   2017-01-02    2.0
   2017-01-03    7.0
   2017-01-04    8.0
   2017-01-05    9.0
   Name: my_series, dtype: float64

Note how this shows the full series state for each insertion date.
Also note that the insertion dates are timezone aware.

Specific versions of a series can be retrieved individually using the
``get`` method with the ``revision_date`` parameter (using timestamps
obtained from ``insertion_dates``):

.. code:: python

   >>> tsa.get('my_series', revision_date=pd.Timestamp('2018-09-26 17:11+02:00'))
   ...
   2017-01-01    1.0
   2017-01-02    2.0
   2017-01-03    3.0
   Name: my_series, dtype: float64
   >>>
   >>> tsa.get('my_series', revision_date=pd.Timestamp('2018-09-26 17:14+02:00'))
   ...
   2017-01-01    1.0
   2017-01-02    2.0
   2017-01-03    7.0
   2017-01-04    8.0
   2017-01-05    9.0
   Name: my_series, dtype: float64

It is possible to retrieve only the differences between successive
insertions:

.. code:: python

   >>> diffs = tsa.history('my_series', diffmode=True)
   ...
   >>> for idate, series in diffs.items():
   ...     print('insertion date:', idate)
   ...     print(series)
   ...
   insertion date: 2018-09-26 17:10:36.988920+02:00
   2017-01-01    1.0
   2017-01-02    2.0
   2017-01-03    3.0
   Name: my_series, dtype: float64
   insertion date: 2018-09-26 17:12:54.508252+02:00
   2017-01-03    7.0
   2017-01-04    8.0
   2017-01-05    9.0
   Name: my_series, dtype: float64

Working with metadata
---------------------

Series can have metadata attached to help document and organize them:

.. code:: python

   >>> tsa.update_metadata('temperature_sensor', {
   ...     'unit': 'celsius',
   ...     'location': 'building_a',
   ...     'sensor_type': 'PT100',
   ...     'frequency': 'hourly'
   ... })

   >>> tsa.metadata('temperature_sensor')
   {'unit': 'celsius', 'location': 'building_a',
    'sensor_type': 'PT100', 'frequency': 'hourly'}

   >>> # update metadata (merges with existing)
   >>> tsa.update_metadata('temperature_sensor', {'calibrated': '2023-01-15'})
   >>> tsa.metadata('temperature_sensor')
   {'unit': 'celsius', 'location': 'building_a', 'sensor_type': 'PT100',
    'frequency': 'hourly', 'calibrated': '2023-01-15'}

   >>> # replace all metadata
   >>> tsa.replace_metadata('temperature_sensor', {'unit': 'fahrenheit', 'status': 'active'})
   >>> tsa.metadata('temperature_sensor')
   {'unit': 'fahrenheit', 'status': 'active'}

   >>> # view metadata history
   >>> tsa.old_metadata('temperature_sensor')
   [(pd.Timestamp('2023-01-01 10:00:00+00:00'),
     {'unit': 'celsius', 'location': 'building_a',
      'sensor_type': 'PT100', 'frequency': 'hourly'}),
    (pd.Timestamp('2023-01-02 11:00:00+00:00'),
     {'unit': 'celsius', 'location': 'building_a', 'sensor_type': 'PT100',
      'frequency': 'hourly', 'calibrated': '2023-01-15'}),
    (pd.Timestamp('2023-01-03 09:00:00+00:00'),
     {'unit': 'fahrenheit', 'status': 'active'})]

Beyond managing metadata for individual series, you can also discover
which metadata keys are used across all series in your refinery
instance:
.. code:: python

   >>> # list all metadata keys in use
   >>> tsa.list_metadata_keys()
   ['calibrated', 'frequency', 'location', 'sensor_type', 'status', 'unit']

Replacing a series entirely
---------------------------

In specific circumstances you may need to completely replace a series:
for example, when working with forecast data where only the last
updated forecast matters. The ``replace`` method provides this
capability:

.. code:: python

   >>> # create initial series
   >>> series = pd.Series([10, 20, 30],
   ...                    pd.date_range(start=pd.Timestamp(2025, 1, 2),
   ...                                  freq='D', periods=3))
   >>> tsa.update('stock_levels_forecast', series, 'operator@example.com',
   ...            insertion_date=pd.Timestamp(2025, 1, 1))

   >>> # later, replace the entire series with a new forecast
   >>> new_series = pd.Series([70, 50, 60],
   ...                        pd.date_range(start=pd.Timestamp(2025, 1, 3),
   ...                                      freq='D', periods=3))
   >>> tsa.replace('stock_levels_forecast', new_series, 'admin@example.com',
   ...             insertion_date=pd.Timestamp(2025, 1, 2))

   >>> tsa.get('stock_levels_forecast')
   2025-01-03    70.0
   2025-01-04    50.0
   2025-01-05    60.0
   Freq: D, Name: stock_levels_forecast, dtype: float64

The ``replace`` method completely overwrites the series with new data,
removing any points not present in the new series.

.. note::

   ``replace`` preserves the complete version history. The replace
   operation appears as a new insertion date in the series history.

.. code:: python

   >>> tsa.insertion_dates('stock_levels_forecast')
   [pd.Timestamp('2025-01-01 00:00:00+0000', tz='UTC'),  # original update
    pd.Timestamp('2025-01-02 00:00:00+0000', tz='UTC')]  # replace operation

   >>> # history shows both the original and replaced versions
   >>> history = tsa.history('stock_levels_forecast')
   >>> for idate, series in history.items():
   ...     print(f'insertion date: {idate}')
   ...     print(series)
   ...
   insertion date: 2025-01-01 00:00:00+00:00
   2025-01-02    10.0
   2025-01-03    20.0
   2025-01-04    30.0
   Name: stock_levels_forecast, dtype: float64
   insertion date: 2025-01-02 00:00:00+00:00
   2025-01-03    70.0
   2025-01-04    50.0
   2025-01-05    60.0
   Name: stock_levels_forecast, dtype: float64

This means you can always retrieve the state of the series as it was
before the replace operation, using ``revision_date``.

Checking series existence
-------------------------

To check if a series exists:

.. code:: python

   >>> tsa.exists('my_series')
   True
   >>> tsa.exists('non_existent')
   False

Renaming a series
-----------------

To rename a series:

.. code:: python

   >>> tsa.rename('old_name', 'new_name')
   >>> tsa.exists('old_name')
   False
   >>> tsa.exists('new_name')
   True

Deleting a series
-----------------

To remove a series from the database:

.. code:: python

   >>> tsa.delete('my_series')
   >>> tsa.get('my_series')  # returns None

Finding series
--------------

To find series in the database:

.. code:: python

   >>> # find all series
   >>> tsa.find()
   ['my_series', 'temperature_fr', 'calculated_avg', 'temperature_paris']

   >>> # find with metadata
   >>> results = tsa.find('(by.name "temperature")', meta=True)
   >>> results
   ['temperature_fr', 'temperature_paris']

   >>> # directly access the metadata of found series
   >>> results[0].meta
   {'unit': 'celsius', 'location': 'fr'}

See the :ref:`search_language` documentation for comprehensive query
capabilities.

The older ``catalog()`` method is still available but returns
everything at once, from all sources, in a slightly cumbersome
structure:

.. code:: python

   >>> tsa.catalog()
   {'local': [('my_series', 'primary'),
              ('temperature_fr', 'primary'),
              ('temperature_paris', 'primary')],
    'remote': [('calculated_avg', 'formula')]}

The ``find()`` API is generally preferred for its flexibility.

Getting series information
--------------------------

To get detailed information about a series:
.. code:: python

   >>> tsa.type('my_series')
   'primary'

   >>> tsa.interval('my_series')
   (pd.Timestamp('2025-01-01'), pd.Timestamp('2025-01-05'))

   >>> tsa.source('my_series')
   'local'

   >>> # get the inferred frequency
   >>> tsa.inferred_freq('my_series')
   'D'  # daily frequency

   >>> # get various pieces of information with internal_metadata:
   >>> # tz awareness, value type (float, string) and supervision status
   >>> tsa.internal_metadata('my_series')
   {'left': '2025-01-01T00:00:00',
    'right': '2025-01-05T00:00:00',
    'tzaware': False,
    'tablename': 'my_series',
    'index_type': 'datetime64[ns]',
    'value_type': 'float64',
    'index_dtype': '<M8[ns]',
    ...}

Working with logs
-----------------

The ``log`` method returns the insertion history of a series, showing
who inserted what, and when:

.. code:: python

   >>> tsa.log('my_series', limit=5)
   [{'date': pd.Timestamp('2018-09-26 17:10:36.988920+02:00'),
     'author': 'babar@pythonian.fr',
     'meta': {},
     'rev': 1},
    {'date': pd.Timestamp('2018-09-26 17:12:54.508252+02:00'),
     'author': 'babar@pythonian.fr',
     'meta': {},
     'rev': 2}]

Staircase operations
--------------------

The staircase operations are specialized methods for forecast
backtesting and time-consistent analysis. They reconstruct series as
they were available at specific lead times, which is essential for
evaluating forecast accuracy without look-ahead bias.

The ``staircase`` method shows what data was available at
``(value_date - delta)``. For each value date, it looks back ``delta``
in time to find what was known at that historical moment:

.. code:: python

   >>> staircase = tsa.staircase('forecast_series', delta=pd.Timedelta(days=1))

The ``block_staircase`` method is more sophisticated. It rebuilds a
series from successive blocks of history taken at regular revision
intervals; each block corresponds to the data of a specific revision
date. This is useful for analyzing how forecasts evolve over time
under a consistent publication schedule.

For example, with daily revisions at 10am and a 24-hour maturity
offset, the method assembles blocks where each day's values come from
the revision published 24 hours before:

.. code:: python

   >>> bsc = tsa.block_staircase(
   ...     name='forecast_series',
   ...     from_value_date=pd.Timestamp('2020-01-03', tz='utc'),
   ...     to_value_date=pd.Timestamp('2020-01-05', tz='utc'),
   ...     revision_freq={'days': 1},
   ...     revision_time={'hour': 10},
   ...     revision_tz='UTC',
   ...     maturity_offset={'hours': 24},
   ...     maturity_time={'hour': 4}
   ... )

The result is a series where different time periods come from
different revisions, allowing you to see how the forecast performed
with a consistent lead time across the entire period.

Time Series Operations API
--------------------------

The time series API provides comprehensive methods for managing time
series data:

.. autoclass:: tshistory.api.mainsource
   :noindex:
   :member-order: bysource
   :members: get, update, replace, delete, exists, rename,
             insertion_dates, history, strip, log, metadata,
             internal_metadata, update_metadata, replace_metadata,
             old_metadata, list_metadata_keys, type, source, interval,
             inferred_freq, find, catalog, staircase, block_staircase
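
As a closing illustration, the ``staircase`` semantics described above
can be sketched in pure pandas. This is a simplified model, not the
library implementation: ``naive_staircase`` is a hypothetical helper
that takes a ``{insertion_date: series}`` dict (the shape returned by
``history``) and, for each value date, keeps the value from the latest
snapshot inserted no later than ``value_date - delta``:

.. code:: python

   import pandas as pd

   def naive_staircase(history, delta):
       # for each value date, use only what was known `delta` before it;
       # later snapshots overwrite earlier ones when both qualify
       out = {}
       for idate in sorted(history):
           for vdate, value in history[idate].items():
               if idate <= vdate - delta:
                   out[vdate] = value
       return pd.Series(out).sort_index()

   # two snapshots of a forecast, one day apart
   history = {
       pd.Timestamp('2020-01-01'): pd.Series(
           [1.0, 2.0], index=pd.date_range('2020-01-02', periods=2, freq='D')),
       pd.Timestamp('2020-01-02'): pd.Series(
           [10.0, 20.0], index=pd.date_range('2020-01-02', periods=2, freq='D')),
   }

   sc = naive_staircase(history, pd.Timedelta(days=1))
   # 2020-01-02 keeps 1.0 (the second snapshot came too late for it),
   # while 2020-01-03 gets 20.0 from the second snapshot

With this framing, ``block_staircase`` generalizes the idea by picking
snapshots on a fixed revision schedule instead of a single sliding
``delta``.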