Best Practices
==============

This guide provides best practices for working effectively with the
Timeseries Refinery, based on production experience and proven patterns.

Naming Conventions
------------------

**Series Naming Guidelines**

Follow a consistent hierarchical structure using dots as separators:

.. code:: text

   domain.category.subcategory.source.location.unit.frequency

Examples:

* ``energy.electricity.price.spot.france.eur_mwh.h`` - Hourly French electricity spot prices
* ``weather.temperature.air.meteo_france.paris.celsius.d`` - Daily temperature in Paris
* ``finance.fx.rate.ecb.eur_usd.rate.d`` - Daily EUR/USD exchange rate

**Guidelines:**

* Use lowercase letters and underscores for multi-word components
* Keep names descriptive but concise (aim for 6-8 components at most)
* Place the most general categories first, the most specific last
* Include units and frequency when relevant
* Avoid abbreviations unless they are standard in your domain

**Metadata Naming Standards**

Use consistent metadata keys across your organization.

**Standard Keys:**

* ``source`` - Data provider or system of origin
* ``unit`` - Measurement unit
* ``frequency`` - Native data frequency
* ``geography`` - Geographic scope or location
* ``category`` - Business domain classification
* ``quality`` - Data quality indicators
* ``contact`` - Responsible person or team

**Guidelines:**

* Use snake_case for metadata keys
* Prefer established vocabularies when possible
* Document your metadata schema
* Keep values consistent (use controlled vocabularies)
* **Metadata keys should remain stable** - avoid frequently changing metadata values
* Use ``tsa.insertion_dates()`` to get timing information instead of storing it in metadata
* Per-update metadata can be provided - ``tsa.update(name, series, author, metadata={...})`` - to document what happened in a specific revision

Data Governance
---------------

**Team Collaboration Guidelines**

**Establish Clear Ownership:**

* Assign data stewards for each domain or category
* Use the ``contact`` metadata field: ``tsa.update_metadata('series', {'contact': 'energy.team@company.com'})``
* Find series by owner: ``tsa.find('(by.metaitem "contact" "energy.team")')``
* Use supervision with ``tsa.update('series', data, 'author', manual=True)`` to track manual interventions

**Communication Protocols:**

* All formula changes, updates, and metadata modifications are automatically logged
* Use ``tsa.history('series', diffmode=True)`` to see what changed between versions (sparingly though: it is an expensive API call)
* Set up ``tswatch`` alerts for critical series that stop updating
* Use the web UI's series browser to explore dependencies before making changes

**Change Management Process**

**Before Making Changes:**

* Test formulas with ``tsa.eval_formula('(+ (series "a") (series "b"))')`` before registering them
* Check ``tsa.formula_depth('complex_formula')`` to understand computational complexity
* Use the formula editor in the web UI for validation and testing

**Implementation:**

* Formula registration is automatically versioned: ``tsa.register_formula('name', 'new_formula')``
* Use cache policies for performance - see :ref:`getting_started/tutorials/advanced:Formulas: when to use a cache/materialized view`
* Leverage the rework task system for scheduled updates - see :ref:`getting_started/tutorials/advanced:Tasks system: how to organize and schedule tasks`
* Use the mini scraping framework to link scrapers to tasks and series (see ``scrap.py`` and the ``refresh`` task)

**After Changes:**

* Use ``tsa.get('series', revision_date=timestamp)`` to compare before/after states
* Update dashboard configurations if the series structure changed
* Monitor cache performance and policies

**Data Quality Standards**

**Validation Using Refinery Features:**

* Use ``tsa.supervision_status('series')`` to check whether manual overrides exist
* Implement quality checks in rework tasks that run on a schedule
* Use ``tsa.edited('series')`` to identify series with manual interventions
* Store quality indicators in metadata: ``{'quality': 'validated', 'source': 'verified'}``

**Audit Trail Management:**

* Every ``tsa.update()``, ``tsa.update_metadata()``, and ``tsa.replace_metadata()`` is automatically logged
* Use ``tsa.log('series')`` to see the change history (and per-update metadata)
* Supervision via ``manual=True`` maintains an audit trail for corrections
* ``tsa.insertion_dates('series')`` shows when data was added to the system

**Data Lineage:**

* Use ``tsa.formula('computed_series')`` to see a formula definition
* ``tsa.source('series')`` identifies the database source
* Formula dependencies are tracked automatically
* The web UI provides visual dependency graphs for complex formulas

Formula Development
-------------------

**Data Medallion Architecture for Formulas**

**Bronze Layer - Raw Ingestion:**

* Direct from sources: ``energy.prices.nordpool.raw``, ``weather.meteo.paris.raw``
* No processing; preserve the original structure and timestamps

**Silver Layer - Cleaned and Standardized:**

* Handle data quality issues here: missing values, duplicates, basic validation
* Resampling to standard frequencies happens here, for example: ``(resample (series "energy.prices.raw") "H")``
* Standardized units, timezone-aware, validated
* Outlier removal, gap filling
* Business rule application: ``(slice ... #:from (date "2020-01-01"))`` for data quality cutoffs
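A bronze → silver step like the resampling example above can be scripted from the client side. This is a minimal sketch: the ``.raw`` → ``.clean`` naming convention follows the bronze-layer examples, the ``silver_formula`` helper is hypothetical, and the formula string simply reuses the ``resample`` operator shown above.

```python
# Sketch: derive a silver-layer series name and formula from a bronze
# series name. The ".raw" / ".clean" suffix convention and this helper
# are illustrative assumptions, not a built-in API.

def silver_formula(raw_name: str, freq: str = "D") -> tuple[str, str]:
    """Return (silver series name, formula) for a bronze series,
    resampled to the given frequency."""
    assert raw_name.endswith(".raw"), "bronze series names should end in .raw"
    clean_name = raw_name[: -len(".raw")] + ".clean"
    formula = f'(resample (series "{raw_name}") "{freq}")'
    return clean_name, formula

name, formula = silver_formula("energy.prices.nordpool.raw", "H")
```

The resulting pair could then be registered in one go with ``tsa.register_formula(name, formula)``.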
**Gold Layer - Business Logic:**

* ``energy.daily_average_price`` - business KPIs and aggregations, ML model inputs
* ``energy.price_forecast`` - ML model outputs
* ``trading.settlement_prices`` - complex business calculations
* Cross-domain joins and enrichment

**Platinum Layer - Presentation:**

* ``dashboard.energy.price_summary`` - optimized for specific dashboards
* ``api.energy.latest_prices`` - formatted for external APIs
* User-specific views and permissions

**Formula Composition Strategies by Layer**

**Bronze → Silver Transformations:**

* Focus on data quality
* Standardization
* Basic gap filling
* Resampling for stable time granularities

**Silver → Gold Business Logic:**

* Domain calculations: ``(/ (+ (series "clean.demand") (series "clean.losses")) (series "clean.capacity"))``
* Aggregations (by geography or other domains) at different levels
* Cross-referencing: ``(priority (series "validated") (series "estimated"))``

**Gold → Platinum Optimization:**

* Performance caching for heavy calculations
* User-specific filters and permissions
* Dashboard-optimized time ranges and granularity

**Production Architecture Patterns**

**The "Source of Truth" Pattern:**

* Each business concept has ONE gold-layer source of truth
* All downstream uses reference this canonical series
* Example: ``computed.energy.official_price`` used by all trading, reporting, and billing systems

**The "Temporal Consistency" Pattern:**

* Maintain consistent time horizons across related series
* ``computed.energy.rolling_30d_average`` and ``computed.energy.rolling_30d_volatility``
* Use shared time windows: ``(rolling ... #:window "30D" #:center False)``
**The "Lineage Preservation" Pattern:**

* Embed source attribution in formula names
* ``computed.energy.price.from_nordpool_entsoe`` vs ``computed.energy.price.from_local_market``
* Makes data provenance traceable through the medallion layers

**Anti-Patterns from Production Experience**

**The "Layer Bypass" Anti-Pattern:**

* Gold formulas directly reading raw data: ``(series "messy_data.raw")``
* Skips cleaning and validation, leading to hazardous results
* Always flow through the medallion layers: raw → clean → computed

**The "Mixed-Layer Formula" Anti-Pattern:**

* One formula mixing concerns: cleaning + business logic + presentation formatting
* Makes debugging and maintenance difficult
* Keep each formula focused on one medallion layer's responsibilities
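The "Layer Bypass" anti-pattern lends itself to an automated check before registering a gold-layer formula. A minimal sketch, assuming the ``.raw`` suffix convention for bronze series used in the medallion examples above; the helper functions themselves are hypothetical.

```python
# Sketch of a guard against the "Layer Bypass" anti-pattern: reject
# gold-layer formulas that reference bronze (".raw") series directly.
import re

def referenced_series(formula: str) -> list[str]:
    # Extract every (series "...") reference from a formula string.
    return re.findall(r'\(series "([^"]+)"\)', formula)

def layer_bypass_violations(formula: str) -> list[str]:
    # Return the offending bronze references; empty means the formula
    # only reads from cleaned (silver or higher) layers.
    return [s for s in referenced_series(formula) if s.endswith(".raw")]

bad = '(+ (series "energy.prices.nordpool.raw") (series "clean.demand"))'
layer_bypass_violations(bad)  # -> ['energy.prices.nordpool.raw']
```

Such a check could run in a rework task or in a pre-registration review step, so raw data always flows through the cleaning layer first.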