Best Practices#
This guide provides best practices for working effectively with the Timeseries Refinery, based on production experience and proven patterns.
Naming Conventions#
Series Naming Guidelines
Follow a consistent hierarchical structure using dots as separators:
`domain.category.subcategory.source.location.unit.frequency`
Examples:
- `energy.electricity.price.spot.france.eur_mwh.h` - Hourly French electricity spot prices
- `weather.temperature.air.meteo_france.paris.celsius.d` - Daily temperature in Paris
- `finance.fx.rate.ecb.eur_usd.rate.d` - Daily EUR/USD exchange rate
Guidelines:
- Use lowercase letters and underscores for multi-word components
- Keep names descriptive but concise (aim for 6-8 components maximum)
- Place the most general categories first, the most specific last
- Include units and frequency when relevant
- Avoid abbreviations unless they are standard in your domain
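The guidelines above can be enforced mechanically. A minimal sketch (the helper name and the checks are our own illustration, not part of the Refinery API):

```python
# Sketch: build and sanity-check hierarchical series names.
# Enforces lowercase/underscore components and the 8-component ceiling
# from the guidelines above.

def make_series_name(*components: str) -> str:
    """Join name components with dots, enforcing the naming guidelines."""
    if not 1 <= len(components) <= 8:
        raise ValueError("aim for at most 8 components")
    for part in components:
        if not part:
            raise ValueError("empty component")
        # lowercase letters, digits and underscores only
        if not all(c.islower() or c.isdigit() or c == "_" for c in part):
            raise ValueError(f"bad component: {part!r}")
    return ".".join(components)

name = make_series_name(
    "energy", "electricity", "price", "spot", "france", "eur_mwh", "h"
)
# -> "energy.electricity.price.spot.france.eur_mwh.h"
```

Centralizing name construction in one helper keeps the convention consistent across teams and makes violations fail loudly at write time.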
Metadata Naming Standards
Use consistent metadata keys across your organization:
Standard Keys:
- `source` - Data provider or system of origin
- `unit` - Measurement unit
- `frequency` - Native data frequency
- `geography` - Geographic scope or location
- `category` - Business domain classification
- `quality` - Data quality indicators
- `contact` - Responsible person or team
Guidelines:
- Use snake_case for metadata keys
- Prefer established vocabularies when possible
- Document your metadata schema
- Keep values consistent (use controlled vocabularies)
- Metadata keys should remain stable - avoid frequently changing metadata values
- Use `tsa.insertion_dates()` to get timing information instead of storing it in metadata
- Per-update metadata can be provided with `tsa.update(name, series, author, metadata={...})` to document what happened in a specific revision
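A thin wrapper can validate metadata against your documented schema before it is written. A sketch under stated assumptions: the vocabulary for `quality` is hypothetical, and `tsa` stands for any Refinery client handle exposing the `update_metadata` call shown above.

```python
# Sketch: validate metadata against a documented schema before writing.
# STANDARD_KEYS mirrors the list above; VOCABULARIES is a hypothetical
# controlled vocabulary your organization would define.

STANDARD_KEYS = {
    'source', 'unit', 'frequency', 'geography',
    'category', 'quality', 'contact',
}
VOCABULARIES = {
    'quality': {'raw', 'estimated', 'validated'},  # assumed values
}

def checked_update_metadata(tsa, name, metadata):
    """Reject non-standard keys and off-vocabulary values, then write."""
    unknown = set(metadata) - STANDARD_KEYS
    if unknown:
        raise ValueError(f"non-standard keys: {sorted(unknown)}")
    for key, allowed in VOCABULARIES.items():
        if key in metadata and metadata[key] not in allowed:
            raise ValueError(f"{key}={metadata[key]!r} not in {sorted(allowed)}")
    tsa.update_metadata(name, metadata)
```

The same guard is worth applying around the per-update `metadata={...}` argument of `tsa.update`.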
Data Governance#
Team Collaboration Guidelines
Establish Clear Ownership:
- Assign data stewards for each domain or category
- Use the `contact` metadata field: `tsa.update_metadata('series', {'contact': 'energy.team@company.com'})`
- Find series by owner: `tsa.find('(by.metaitem "contact" "energy.team")')`
- Use supervision with `tsa.update('series', data, 'author', manual=True)` to track manual interventions
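An ownership audit can be scripted on top of the metadata API. A minimal sketch, assuming `tsa.metadata(name)` returns a dict as used throughout this guide (the helper itself is illustrative, not a built-in):

```python
# Sketch: find series that have no assigned owner, so stewards can be
# appointed. `tsa` is any client handle exposing metadata(name) -> dict.

def series_without_contact(tsa, names):
    """Return the subset of `names` lacking a non-empty 'contact' key."""
    missing = []
    for name in names:
        meta = tsa.metadata(name) or {}
        if not meta.get('contact'):
            missing.append(name)
    return missing
```

Running such a check periodically (for instance from a rework task) keeps the ownership map from silently eroding as series are added.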
Communication Protocols:
- All formula changes, updates, and metadata modifications are automatically logged
- Use `tsa.history('series', diffmode=True)` to see what changed between versions (sparingly though, as it is an expensive API call)
- Set up `tswatch` alerts for critical series that stop updating
- Use the web UI’s series browser to explore dependencies before making changes
Change Management Process
Before Making Changes:
- Test formulas with `tsa.eval_formula('(+ (series "a") (series "b"))')` before registering
- Check `tsa.formula_depth('complex_formula')` to understand computational complexity
- Use the formula editor in the web UI for validation and testing
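The evaluate-before-register discipline is easy to wrap in one function. A sketch, assuming only the `tsa.eval_formula` and `tsa.register_formula` calls shown in this guide:

```python
# Sketch: refuse to register a formula that does not evaluate cleanly,
# so syntax errors and missing dependencies never reach the catalog.

def safe_register(tsa, name, formula):
    """Evaluate `formula` first; register it only if evaluation succeeds."""
    try:
        tsa.eval_formula(formula)  # raises on bad syntax / missing series
    except Exception as err:
        raise ValueError(f"refusing to register {name}: {err}") from err
    tsa.register_formula(name, formula)
```

This keeps broken formulas out of production while preserving the automatic versioning of successful registrations.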
Implementation:
- Formula registration is automatically versioned: `tsa.register_formula('name', 'new_formula')`
- Use cache policies for performance - see Formulas: when to use a cache/materialized view
- Leverage the rework task system for scheduled updates - see Tasks system: how to organize and schedule tasks
- Use the mini scraping framework to link scrapers to tasks and series (see `scrap.py` and the refresh task)
After Changes:
- Use `tsa.get('series', revision_date=timestamp)` to compare before/after states
- Update dashboard configurations if the series structure changed
- Monitor cache performance and policies
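Comparing before/after states can be automated with two revision reads. A sketch, assuming `tsa.get` returns a pandas Series and accepts the `revision_date` argument shown above:

```python
# Sketch: diff a series against its state at a past revision.
# Returns the points whose value changed (or appeared) since then.
import pandas as pd

def changed_points(tsa, name, before_ts):
    old = tsa.get(name, revision_date=before_ts)
    new = tsa.get(name)
    old, new = old.align(new)  # outer-join both indexes
    # differ, excluding points that are NaN on both sides
    mask = old.ne(new) & ~(old.isna() & new.isna())
    return new[mask]
```

This is cheaper than `tsa.history(..., diffmode=True)` when you only care about one reference point in time.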
Data Quality Standards
Validation Using Refinery Features:
- Use `tsa.supervision_status('series')` to check if manual overrides exist
- Implement quality checks in rework tasks that run on a schedule
- Use `tsa.edited('series')` to identify series with manual interventions
- Store quality indicators in metadata: `{'quality': 'validated', 'source': 'verified'}`
Audit Trail Management:
- Every call to `tsa.update()`, `tsa.update_metadata()`, and `tsa.replace_metadata()` is automatically logged
- Use `tsa.log('series')` to see the change history (and per-update metadata)
- Supervision via `manual=True` maintains an audit trail for corrections
- `tsa.insertion_dates('series')` shows when data was added to the system
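Log output lends itself to small audit helpers. A sketch, assuming `tsa.log(name)` yields a list of per-revision dicts carrying at least `'author'` and `'date'` (the exact shape may differ across Refinery versions):

```python
# Sketch: summarize an audit trail from log entries.
# Works on plain dicts, so it is independent of the client version.

def last_editors(log_entries, n=3):
    """Most recent n (date, author) pairs, newest first."""
    entries = sorted(log_entries, key=lambda e: e['date'], reverse=True)
    return [(e['date'], e['author']) for e in entries[:n]]
```

Such a helper answers "who touched this series last?" without scrolling the full history in the web UI.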
Data Lineage:
- Use `tsa.formula('computed_series')` to see a formula definition
- `tsa.source('series')` identifies the database source
- Formula dependencies are tracked automatically
- The web UI provides visual dependency graphs for complex formulas
Formula Development#
Data Medallion Architecture for Formulas
Bronze Layer - Raw Ingestion:
- Direct from sources: `energy.prices.nordpool.raw`, `weather.meteo.paris.raw`
- No processing; preserve the original structure and timestamps
Silver Layer - Cleaned and Standardized:
- Handle data quality issues here: missing values, duplicates, basic validation
- Resampling to standard frequencies happens here, for example: `(resample (series "energy.prices.raw") "H")`
- Standardized units, timezone-aware, validated
- Outlier removal, gap filling
- Business rule application: `(slice ... #:from (date "2020-01-01"))` for data quality cutoffs
Gold Layer - Business Logic:
- `energy.daily_average_price` - business KPIs and aggregations, ML model inputs
- `energy.price_forecast` - ML model outputs
- `trading.settlement_prices` - complex business calculations
- Cross-domain joins and enrichment
Platinum Layer - Presentation:
- `dashboard.energy.price_summary` - optimized for specific dashboards
- `api.energy.latest_prices` - formatted for external APIs
- User-specific views and permissions
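Wiring the layers together amounts to registering one formula per hop. A sketch following the raw → clean → computed naming used later in this section; the series names and formula strings are illustrative, and registration uses the `tsa.register_formula` call shown earlier:

```python
# Sketch: register the medallion chain as formulas, one per layer hop.
# Each layer reads only from the layer directly below it.

LAYERS = [
    # silver: clean + resample the raw bronze feed
    ('clean.energy.prices.h',
     '(resample (series "energy.prices.nordpool.raw") "H")'),
    # gold: business aggregate built on the silver series
    ('computed.energy.daily_average_price',
     '(resample (series "clean.energy.prices.h") "D")'),
]

def register_layers(tsa):
    for name, formula in LAYERS:
        tsa.register_formula(name, formula)
```

Keeping the chain in one declarative list makes the layer boundaries reviewable in code review, not just in the catalog.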
Formula Composition Strategies by Layer
Bronze → Silver Transformations:
- Focus on data quality
- Standardization
- Basic gap filling
- Resampling for stable time granularities
Silver → Gold Business Logic:
- Domain calculations: `(/ (+ (series "clean.demand") (series "clean.losses")) (series "clean.capacity"))`
- Aggregations at different levels (by geography or other domains)
- Cross-referencing: `(priority (series "validated") (series "estimated"))`
Gold → Platinum Optimization:
- Performance caching for heavy calculations
- User-specific filters and permissions
- Dashboard-optimized time ranges and granularity
Production Architecture Patterns
The “Source of Truth” Pattern:
- Each business concept has ONE gold-layer source of truth
- All downstream uses reference this canonical series
- Example: `computed.energy.official_price` used by all trading, reporting, and billing systems
The “Temporal Consistency” Pattern:
- Maintain consistent time horizons across related series
- `computed.energy.rolling_30d_average` and `computed.energy.rolling_30d_volatility`
- Use shared time windows: `(rolling ... #:window "30D" #:center False)`
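One way to keep the horizons from drifting apart is to define the window once and build the related formula texts from it. A sketch; the series names and the exact rolling keywords follow the example above and should be treated as illustrative:

```python
# Sketch: derive related rolling formulas from a single shared window,
# so "30d average" and "30d volatility" can never disagree on the window.

WINDOW = "30D"  # the one shared time horizon

ROLLING_FORMULAS = {
    'computed.energy.rolling_30d_average':
        f'(rolling (series "computed.energy.official_price") '
        f'#:window "{WINDOW}")',
    'computed.energy.rolling_30d_volatility':
        f'(rolling (series "computed.energy.official_price") '
        f'#:window "{WINDOW}" #:method "std")',
}
```

Registering both entries from this dict guarantees the temporal-consistency invariant by construction.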
The “Lineage Preservation” Pattern:
- Embed source attribution in formula names
- `computed.energy.price.from_nordpool_entsoe` vs `computed.energy.price.from_local_market`
- This completes data provenance tracking through the medallion layers
Anti-Patterns from Production Experience
The “Layer Bypass” Anti-Pattern:
- Gold formulas directly reading raw data: `(series "messy_data.raw")`
- Skips cleaning and validation, which leads to hazardous results
- Always flow through the medallion: raw → clean → computed
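This anti-pattern can be caught by a lint pass over formula text. A sketch, assuming the `computed.` prefix for gold-layer names and the `.raw` suffix for bronze series used in this section (the checker itself is our own illustration):

```python
# Sketch: flag computed-layer (gold) formulas that read raw series
# directly, bypassing the clean (silver) layer.
import re

def bypasses_medallion(name, formula):
    """True if a gold formula references a raw series directly."""
    if not name.startswith('computed.'):
        return False
    refs = re.findall(r'\(series "([^"]+)"\)', formula)
    return any(ref.endswith('.raw') for ref in refs)

bypasses_medallion('computed.kpi', '(series "messy_data.raw")')  # -> True
```

Run over the formula catalog (e.g. inside a scheduled rework task), this turns the layering rule from a convention into an enforced check.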
The “Mixed-Layer Formula” Anti-Pattern:
- One formula mixing concerns: cleaning + business logic + presentation formatting
- Makes debugging and maintenance difficult
- Keep each formula focused on one medallion layer’s responsibilities