Best Practices

This guide provides best practices for working effectively with the Timeseries Refinery, based on production experience and proven patterns.

Naming Conventions

Series Naming Guidelines

Follow a consistent hierarchical structure using dots as separators:

domain.category.subcategory.source.location.unit.frequency

Examples:

  • energy.electricity.price.spot.france.eur_mwh.h - Hourly French electricity spot prices

  • weather.temperature.air.meteo_france.paris.celsius.d - Daily temperature in Paris

  • finance.fx.rate.ecb.eur_usd.rate.d - Daily EUR/USD exchange rate

Guidelines:

  • Use lowercase letters and underscores for multi-word components

  • Keep names descriptive but concise (6 to 8 components at most)

  • Place most general categories first, most specific last

  • Include units and frequency when relevant

  • Avoid abbreviations unless they are standard in your domain
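A lightweight check can catch names that drift from these conventions before a series is created. This is a hypothetical helper (naming_problems is not a Refinery function). The 8-component limit and the lowercase-and-underscore rule come from the guidelines above; the exact regular expression is an assumption.

```python
import re

# One name component: lowercase letters/digits, words joined by single underscores.
COMPONENT_RE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")
MAX_COMPONENTS = 8  # upper bound suggested by the naming guidelines


def naming_problems(name: str) -> list[str]:
    """Return human-readable problems with a series name (empty list if none)."""
    problems = []
    parts = name.split(".")
    if len(parts) > MAX_COMPONENTS:
        problems.append(f"{len(parts)} components (max {MAX_COMPONENTS})")
    for part in parts:
        if not COMPONENT_RE.match(part):
            problems.append(f"invalid component: {part!r}")
    return problems


if __name__ == "__main__":
    print(naming_problems("energy.electricity.price.spot.france.eur_mwh.h"))  # []
    print(naming_problems("Energy.Price.Spot-FR"))  # three invalid components
```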

Metadata Naming Standards

Use consistent metadata keys across your organization:

Standard Keys:

  • source - Data provider or system of origin

  • unit - Measurement unit

  • frequency - Native data frequency

  • geography - Geographic scope or location

  • category - Business domain classification

  • quality - Data quality indicators

  • contact - Responsible person or team

Guidelines:

  • Use snake_case for metadata keys

  • Prefer established vocabularies when possible

  • Document your metadata schema

  • Keep values consistent (use controlled vocabularies)

  • Keep metadata keys stable and avoid frequently changing metadata values

  • Use tsa.insertion_dates() to get timing information instead of storing it in metadata

  • Attach per-update metadata with tsa.update(name, series, author, metadata={...}) to document what changed in a specific revision
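A metadata schema is easier to enforce when it is checked in code. The sketch below validates snake_case keys, checks selected values against controlled vocabularies, and rejects timing keys that belong in tsa.insertion_dates() instead. The vocabularies, the forbidden-key list and the metadata_problems helper are illustrative assumptions, not Refinery defaults.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Illustrative controlled vocabularies (assumed, adapt to your organization).
CONTROLLED = {
    "frequency": {"h", "d", "w", "m", "y"},
    "quality": {"raw", "validated", "suspect"},
}

# Timing information should come from tsa.insertion_dates(), not metadata.
FORBIDDEN = {"last_update", "updated_at"}


def metadata_problems(meta: dict) -> list[str]:
    """Return a list of schema violations found in a metadata dict."""
    problems = []
    for key, value in meta.items():
        if not SNAKE_CASE.match(key):
            problems.append(f"key not snake_case: {key!r}")
        if key in FORBIDDEN:
            problems.append(f"timing info stored in metadata: {key!r}")
        allowed = CONTROLLED.get(key)
        if allowed is not None and value not in allowed:
            problems.append(f"{key}={value!r} not in controlled vocabulary")
    return problems
```

A check like this could run before tsa.update_metadata() or tsa.replace_metadata() calls so that invalid keys never reach the database.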

Data Governance

Team Collaboration Guidelines

Establish Clear Ownership:

  • Assign data stewards for each domain or category

  • Use the contact metadata field: tsa.update_metadata('series', {'contact': 'energy.team@company.com'})

  • Find series by owner: tsa.find('(by.metaitem "contact" "energy.team@company.com")')

  • Use supervision with tsa.update('series', data, 'author', manual=True) to track manual interventions

Communication Protocols:

  • All formula changes, updates, and metadata modifications are automatically logged

  • Use tsa.history('series', diffmode=True) to see what changed between versions; use it sparingly, as it is an expensive API call

  • Set up tswatch alerts for critical series that stop updating

  • Use the web UI’s series browser to explore dependencies before changes
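The rule behind a "stopped updating" alert can be sketched as a comparison between the latest insertion date and the expected update period. This does not reproduce tswatch configuration; the period table, the two-hour grace margin and the is_stale helper are assumptions. In practice the latest insertion date could be taken from tsa.insertion_dates('series').

```python
from datetime import datetime, timedelta, timezone

# Expected update period per frequency code (assumed mapping).
EXPECTED_PERIOD = {"h": timedelta(hours=1), "d": timedelta(days=1)}


def is_stale(last_insertion: datetime, frequency: str, now: datetime,
             grace: timedelta = timedelta(hours=2)) -> bool:
    """True when no insertion happened within one period plus the grace margin."""
    return now - last_insertion > EXPECTED_PERIOD[frequency] + grace


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_stale(now - timedelta(hours=5), "h", now))  # True
```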

Change Management Process

Before Making Changes:

  • Test formulas with tsa.eval_formula('(+ (series "a") (series "b"))') before registering

  • Check tsa.formula_depth('complex_formula') to understand computational complexity

  • Use the formula editor in the web UI for validation and testing
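To make "computational complexity" concrete, here is a minimal sketch of one way to measure formula depth, assuming depth means the number of formula levels between a series and the stored series it ultimately reads. We have not confirmed that tsa.formula_depth counts exactly this way. The registry dict and helper names are hypothetical, and the sketch assumes there are no dependency cycles.

```python
import re

# Matches (series "name") references inside a formula string.
SERIES_REF = re.compile(r'\(series\s+"([^"]+)"')


def direct_dependencies(formula: str) -> list[str]:
    """Names referenced with (series "...") in a formula, in order of appearance."""
    return SERIES_REF.findall(formula)


def depth(name: str, registry: dict[str, str]) -> int:
    """0 for a stored series, otherwise 1 + the deepest of its inputs.

    `registry` is a hypothetical {formula_name: formula_text} mapping;
    names missing from it are treated as stored series.
    """
    if name not in registry:
        return 0
    inputs = direct_dependencies(registry[name])
    return 1 + max((depth(dep, registry) for dep in inputs), default=0)


if __name__ == "__main__":
    registry = {
        "clean.demand": '(resample (series "demand.raw") "H")',
        "ratio": '(/ (series "clean.demand") (series "capacity.raw"))',
    }
    print(depth("ratio", registry))  # 2
```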

Implementation:

After Changes:

  • Use tsa.get('series', revision_date=timestamp) to compare before/after states

  • Update dashboard configurations if series structure changed

  • Monitor cache performance and policies
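A before/after comparison can be reduced to a point-by-point diff. The sketch assumes both revisions have already been fetched and converted to plain {timestamp: value} dicts (for instance with .to_dict() on the pandas Series returned by tsa.get). diff_revisions is a hypothetical helper and does not handle NaN values.

```python
def diff_revisions(before: dict, after: dict, tol: float = 1e-9) -> dict:
    """Classify points as added, removed or changed between two snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if abs(before[k] - after[k]) > tol  # ignore float noise below tol
    }
    return {"added": added, "removed": removed, "changed": changed}
```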

Data Quality Standards

Validation Using Refinery Features:

  • Use tsa.supervision_status('series') to check if manual overrides exist

  • Implement quality checks in rework tasks that run on schedule

  • Use tsa.edited('series') to identify series with manual interventions

  • Store quality indicators in metadata: {'quality': 'validated', 'source': 'verified'}
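A scheduled rework task could compute a quality label from each new batch of values and store it under the quality metadata key. The thresholds, the NaN-as-missing convention and the quality_label helper below are illustrative assumptions.

```python
import math


def quality_label(values: list[float], low: float, high: float,
                  max_missing_ratio: float = 0.05) -> str:
    """Return "validated" or "suspect" for a batch of values (NaN = missing)."""
    if not values:
        return "suspect"  # nothing to validate
    missing = sum(1 for v in values if math.isnan(v))
    out_of_range = sum(
        1 for v in values if not math.isnan(v) and not low <= v <= high
    )
    if out_of_range == 0 and missing <= max_missing_ratio * len(values):
        return "validated"
    return "suspect"
```

The label could then be stored with tsa.update_metadata('series', {'quality': label}); since it changes only when quality does, it stays compatible with the advice to keep metadata stable.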

Audit Trail Management:

  • Every tsa.update(), tsa.update_metadata(), and tsa.replace_metadata() is automatically logged

  • Use tsa.log('series') to see change history (and per-update metadata)

  • Supervision via manual=True maintains audit trail for corrections

  • tsa.insertion_dates('series') shows when data was added to the system

Data Lineage:

  • Use tsa.formula('computed_series') to see formula definition

  • tsa.source('series') identifies the database source

  • Formula dependencies are tracked automatically

  • Web UI provides visual dependency graphs for complex formulas
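Lineage questions often come down to "which stored series does this formula ultimately read?". The sketch below answers that with a hypothetical name-to-formula registry: it extracts (series "...") references with a regular expression and treats any name without a formula as stored. It illustrates the concept only and is not how the Refinery tracks dependencies internally.

```python
import re

# Matches (series "name") references inside a formula string.
SERIES_REF = re.compile(r'\(series\s+"([^"]+)"')


def primary_sources(name: str, registry: dict[str, str]) -> set[str]:
    """Stored series that `name` ultimately reads, following formulas recursively.

    `registry` is a hypothetical {formula_name: formula_text} mapping; a name
    without an entry is treated as a stored (primary) series. Assumes no cycles.
    """
    if name not in registry:
        return {name}
    sources: set[str] = set()
    for dep in SERIES_REF.findall(registry[name]):
        sources |= primary_sources(dep, registry)
    return sources
```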

Formula Development

Data Medallion Architecture for Formulas

Bronze Layer - Raw Ingestion:

  • Direct from sources: energy.prices.nordpool.raw, weather.meteo.paris.raw

  • No processing, preserve original structure and timestamps

Silver Layer - Cleaned and Standardized:

  • Handle data quality issues here: missing values, duplicates, basic validation

  • Resample to standard frequencies here, for example: (resample (series "energy.prices.raw") "H")

  • Standardized units, timezone-aware, validated

  • Outlier removal, gap filling

  • Business rule application: (slice ... #:from (date "2020-01-01")) for data quality cutoffs
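To illustrate what the silver-layer resample and slice steps do to the data, here is a pure-Python sketch that drops points before a cutoff and averages the rest per hour. Mean aggregation and hour-start labelling are assumptions made for the example and may differ from the resample operator's defaults; resample_hourly_mean is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def resample_hourly_mean(points, cutoff: datetime) -> dict:
    """Drop (timestamp, value) points before `cutoff`, then average per hour."""
    buckets = defaultdict(list)
    for ts, value in points:
        if ts < cutoff:
            continue  # data-quality cutoff, like (slice ... #:from ...)
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(vals) for hour, vals in sorted(buckets.items())}
```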

Gold Layer - Business Logic:

  • energy.daily_average_price - business KPIs and aggregations, ML model inputs

  • energy.price_forecast - ML model outputs

  • trading.settlement_prices - complex business calculations

  • Cross-domain joins and enrichment

Platinum Layer - Presentation:

  • dashboard.energy.price_summary - optimized for specific dashboards

  • api.energy.latest_prices - formatted for external APIs

  • User-specific views and permissions

Formula Composition Strategies by Layer

Bronze → Silver Transformations:

  • Focus on data quality

  • Standardization

  • Basic gap filling

  • Resampling for stable time granularities

Silver → Gold Business Logic:

  • Domain calculations: (/ (+ (series "clean.demand") (series "clean.losses")) (series "clean.capacity"))

  • Aggregations at different levels (by geography or other dimensions)

  • Cross-referencing with fallback: (priority (series "validated") (series "estimated")) takes values from "validated" and fills its gaps from "estimated"
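The fallback behaviour of priority can be illustrated with plain dicts. This sketch assumes the first argument wins wherever it has a value and later arguments only fill timestamps it lacks; how the Refinery treats NaN values inside a series is not modelled here, and this priority function is a stand-in, not the formula operator itself.

```python
def priority(*series: dict) -> dict:
    """Merge {timestamp: value} dicts; earlier arguments win on shared timestamps."""
    merged: dict = {}
    for s in reversed(series):  # apply lowest priority first...
        merged.update(s)        # ...so higher-priority values overwrite it
    return dict(sorted(merged.items()))
```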

Gold → Platinum Optimization:

  • Performance caching for heavy calculations

  • User-specific filters and permissions

  • Dashboard-optimized time ranges and granularity

Production Architecture Patterns

The “Source of Truth” Pattern:

  • Each business concept has ONE gold-layer source of truth

  • All downstream uses reference this canonical series

  • Example: computed.energy.official_price used by all trading, reporting, and billing systems

The “Temporal Consistency” Pattern:

  • Maintain consistent time horizons across related series

  • Example: computed.energy.rolling_30d_average and computed.energy.rolling_30d_volatility, both computed over the same 30-day window

  • Use shared time windows: (rolling ... #:window "30D" #:center False)
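Sharing one window definition is easiest when related series derive it from a single constant. The sketch below assumes a regular daily series with no gaps, so a 30-day window is simply 30 consecutive points; the trailing (non-centred) windows correspond to #:center False. The helper names are hypothetical.

```python
from statistics import mean, stdev

WINDOW = 30  # shared window length, in points of a regular daily series


def trailing(values: list[float], window: int = WINDOW):
    """Yield each trailing (non-centred) window once it is full."""
    for end in range(window, len(values) + 1):
        yield values[end - window:end]


def rolling_average(values: list[float], window: int = WINDOW) -> list[float]:
    return [mean(w) for w in trailing(values, window)]


def rolling_volatility(values: list[float], window: int = WINDOW) -> list[float]:
    # Sample standard deviation over the same windows as rolling_average.
    return [stdev(w) for w in trailing(values, window)]
```

Because both functions default to the same WINDOW, the average and volatility series stay aligned on identical time horizons.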

The “Lineage Preservation” Pattern:

  • Embed source attribution in formula names

  • computed.energy.price.from_nordpool_entsoe vs computed.energy.price.from_local_market

  • Complements automatic dependency tracking by keeping data provenance visible through the medallion layers

Anti-Patterns from Production Experience

The “Layer Bypass” Anti-Pattern:

  • Gold formulas directly reading raw data: (series "messy_data.raw")

  • Skips cleaning and validation, leading to unreliable results

  • Always flow through the medallion: raw → clean → computed

The “Mixed-Layer Formula” Anti-Pattern:

  • One formula mixing concerns: cleaning + business logic + presentation formatting

  • Makes debugging and maintenance difficult

  • Keep each formula focused on one medallion layer’s responsibilities
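The layer-bypass anti-pattern can be caught with a small lint over formula definitions. This sketch assumes layer membership can be inferred from names, following the conventions used in this guide (a .raw suffix for bronze, clean. for silver, computed. for gold, dashboard. or api. for platinum), and that formulas are available as a hypothetical name-to-formula dict. It flags gold or platinum formulas that read bronze series directly.

```python
import re

# Matches (series "name") references inside a formula string.
SERIES_REF = re.compile(r'\(series\s+"([^"]+)"')


def layer(name: str) -> str:
    """Infer the medallion layer from naming conventions (an assumption)."""
    if name.endswith(".raw"):
        return "bronze"
    if name.startswith("clean."):
        return "silver"
    if name.startswith("computed."):
        return "gold"
    if name.startswith(("dashboard.", "api.")):
        return "platinum"
    return "unknown"


def layer_bypasses(registry: dict[str, str]) -> list[tuple[str, str]]:
    """(formula, input) pairs where a gold/platinum formula reads bronze data."""
    hits = []
    for name, formula in sorted(registry.items()):
        if layer(name) in ("gold", "platinum"):
            for dep in SERIES_REF.findall(formula):
                if layer(dep) == "bronze":
                    hits.append((name, dep))
    return hits
```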