Looking at the architecture of OpenSTEF helps to understand OpenSTEF concepts.

Software architecture

OpenSTEF is set up as a package that performs machine learning to forecast energy loads on the energy grid. It contains:

  • Prediction job: input configuration for a task and/or pipeline (e.g. train an XGB model for a certain location).

  • Tasks: can be called to perform training, forecasting, or evaluation. Each task uses the corresponding pipeline and also handles getting data from a database, raising task exceptions, and writing data to a database.

  • Pipelines: can be called to perform training, forecasting or evaluation by giving input data to the pipeline. Users can choose to use tasks (which fetch/write data for you), or use pipelines directly (which requires fetching/writing data yourself).

  • Data validation: is called by pipelines to validate data (e.g. checking for flatliners).

  • Feature engineering: is called by pipelines to create and select the features required for training/forecasting, based on the configuration from the prediction job (e.g. new features for the energy load of yesterday and of last week).

  • Machine learning: is called by pipelines to perform training, forecasting, or evaluation based on the configuration from the prediction job (e.g. train an XGB quantile model).

  • Model storage: is called by pipelines to store or fetch trained machine learning models with MLflow (e.g. store a model on local disk, in a database, or in an S3 bucket on AWS).

  • Post-processing: is called by pipelines to post-process forecasts (e.g. combine the forecast dataframe with extra configuration information from the prediction job, or split a load forecast into solar, wind, and energy usage forecasts).
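
To make the division of responsibilities more concrete, here is a minimal, dependency-free sketch of how a forecasting pipeline might chain the components listed above. Every name in it (the `PredictionJob` fields, `validate`, `engineer_features`, `predict`, `post_process`, `forecast_pipeline`) is hypothetical and is not the OpenSTEF API. The "model" is a plain average of lag features that stands in for a trained XGB model, and model storage with MLflow is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PredictionJob:
    """Hypothetical, heavily simplified stand-in for a prediction job."""

    id: int
    location: str
    flatliner_length: int = 4  # hypothetical: this many identical values = flatliner


def validate(load: dict[datetime, float], pj: PredictionJob) -> None:
    """Data validation: refuse a load series that ends in a flatliner."""
    tail = [load[t] for t in sorted(load)][-pj.flatliner_length:]
    if len(tail) == pj.flatliner_length and len(set(tail)) == 1:
        raise ValueError(f"Flatliner detected for prediction job {pj.id}")


def engineer_features(load: dict[datetime, float], t: datetime) -> dict[str, float | None]:
    """Feature engineering: load of yesterday and of last week at the same time."""
    return {
        "load_1d_ago": load.get(t - timedelta(days=1)),
        "load_7d_ago": load.get(t - timedelta(days=7)),
    }


def predict(features: dict[str, float | None]) -> float | None:
    """Machine learning placeholder: mean of the available lag features."""
    known = [v for v in features.values() if v is not None]
    return sum(known) / len(known) if known else None


def post_process(forecast: dict[datetime, float | None], pj: PredictionJob) -> list[dict]:
    """Post-processing: attach prediction-job information to each forecast row."""
    return [
        {"pid": pj.id, "location": pj.location, "datetime": t, "forecast": v}
        for t, v in sorted(forecast.items())
    ]


def forecast_pipeline(pj: PredictionJob, load: dict[datetime, float], targets: list[datetime]) -> list[dict]:
    """A pipeline receives its input data directly and does no database I/O."""
    validate(load, pj)
    forecast = {t: predict(engineer_features(load, t)) for t in targets}
    return post_process(forecast, pj)


# Example: eight days of hourly load, forecast for the following midnight.
start = datetime(2024, 1, 1)
load = {start + timedelta(hours=h): float(h) for h in range(24 * 8)}
rows = forecast_pipeline(PredictionJob(id=1, location="example"), load, [start + timedelta(days=8)])
# rows[0]["forecast"] == 96.0 (mean of the loads 1 day ago, 168.0, and 7 days ago, 24.0)
```

Because the pipeline only takes data in and hands results back, a caller can supply input from any source it likes.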

If tasks are used, the openstef-dbc package is required as the interface for reading from and writing to the database. The current interface in openstef-dbc supports a MySQL database for configuration data (e.g. information for prediction jobs) and an InfluxDB database for feature data (e.g. weather, load, and energy price data) and energy forecast data.
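
The difference between a task and a pipeline can be sketched like this. The task fetches the input, calls the pipeline, wraps any failure in a task exception, and writes the result back. The `Database` protocol below is a hypothetical stand-in for the role openstef-dbc plays and does not reproduce the openstef-dbc API. The persistence "pipeline" is a placeholder as well.

```python
from __future__ import annotations

from typing import Protocol


class Database(Protocol):
    """Hypothetical interface playing the role openstef-dbc plays for real tasks."""

    def get_load(self, pid: int) -> list[float]: ...

    def write_forecast(self, pid: int, forecast: list[float]) -> None: ...


class TaskException(Exception):
    """Hypothetical task-level error that wraps failures from fetching or the pipeline."""


def naive_pipeline(load: list[float], horizon: int) -> list[float]:
    """Placeholder pipeline: repeat the last observed value (persistence forecast)."""
    if not load:
        raise ValueError("no input data")
    return [load[-1]] * horizon


def forecast_task(db: Database, pid: int, horizon: int = 4) -> list[float]:
    """A task fetches input, calls the pipeline, and writes the result back."""
    try:
        load = db.get_load(pid)
        forecast = naive_pipeline(load, horizon)
    except Exception as exc:
        raise TaskException(f"Forecast task failed for prediction job {pid}") from exc
    db.write_forecast(pid, forecast)
    return forecast


class InMemoryDatabase:
    """Toy database used for illustration only."""

    def __init__(self, loads: dict[int, list[float]]):
        self.loads = loads
        self.forecasts: dict[int, list[float]] = {}

    def get_load(self, pid: int) -> list[float]:
        return self.loads.get(pid, [])

    def write_forecast(self, pid: int, forecast: list[float]) -> None:
        self.forecasts[pid] = forecast


db = InMemoryDatabase({1: [10.0, 12.0, 11.0]})
forecast_task(db, pid=1, horizon=2)  # writes [11.0, 11.0] for prediction job 1
```

Calling `naive_pipeline` directly instead of `forecast_task` corresponds to using a pipeline without a task: the caller is then responsible for fetching and writing the data.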

Application architecture

By itself, OpenSTEF is only a software package; it needs additional parts to run as an application.

It requires:

  • GitHub repositories:

    • (create yourself) Data fetcher: software package to fetch data and write it to a database (e.g. a scheduled cron job to fetch weather data in Kubernetes).

    • (create yourself) Data API: API to provide data from a database or other source to applications and users (e.g. a REST API).

    • (create yourself) Forecaster: software package to fetch config/data and run openstef tasks/pipelines (e.g. a scheduled cron job to train/forecast in Kubernetes).

    • (open source) OpenSTEF: software package that performs machine learning to forecast energy loads on the energy grid.

    • (open source) OpenSTEF-dbc: software package that provides the interface to read/write data from/to a database for OpenSTEF tasks.

  • CI/CD:

    • (create yourself) Energy forecasting application CI/CD: CI/CD pipeline to build, test, and deploy the forecasting application (e.g. to Kubernetes via Jenkins/Tekton).

    • (open source) OpenSTEF package CI/CD: CI/CD pipeline to build, test, and release the OpenSTEF package to PyPI (via GitHub Actions).

  • Compute: software applications can be run on Kubernetes on AWS.

  • Database: SQL, InfluxDB, or another database can be used to store fetched data and forecasts.

  • Dashboard: dashboard to visualize historic and forecasted energy loads.

Screenshot of the operational dashboard showing the key functionality of OpenSTEF.
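
A self-built Forecaster is typically a scheduled job (e.g. a Kubernetes cron job) that loops over the configured prediction jobs and runs an OpenSTEF task for each of them. The minimal sketch below assumes a hypothetical `PredictionJob` record and a placeholder `run_forecast_task`. It illustrates one design choice: a failure for one prediction job is logged, and the rest of the run continues.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

logger = logging.getLogger("forecaster")


@dataclass
class PredictionJob:
    """Hypothetical prediction-job record as read from the configuration database."""

    id: int
    model: str | None = "xgb"
    active: bool = True


def run_forecast_task(pj: PredictionJob) -> None:
    """Placeholder for calling an OpenSTEF forecast task for one prediction job."""
    if pj.model is None:
        raise ValueError(f"Prediction job {pj.id} has no model configured")


def main(jobs: list[PredictionJob]) -> dict[int, str]:
    """One scheduled run: forecast every active prediction job.

    A failure for one prediction job is logged and does not stop the others.
    """
    status: dict[int, str] = {}
    for pj in jobs:
        if not pj.active:
            continue
        try:
            run_forecast_task(pj)
            status[pj.id] = "ok"
        except Exception:
            logger.exception("Forecast failed for prediction job %s", pj.id)
            status[pj.id] = "failed"
    return status


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    jobs = [PredictionJob(1), PredictionJob(2, model=None), PredictionJob(3, active=False)]
    print(main(jobs))  # {1: 'ok', 2: 'failed'}
```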

Domain Adaptation for Zero Shot Learning in Sequence (DAZLS)

DAZLS is an energy-splitting function in OpenSTEF. It transfers knowledge from complete-information substations to incomplete-information substations to predict solar and wind power. It is used in openstef.pipeline.create_component_forecast to issue the prediction.

The function trains a splitting model on data from multiple substations with known components. It then uses this model to predict the components of target substations whose components are unknown. The training data from the known substations include weather, location, and total load information for each substation. From these, the model predicts the solar and wind power of the target substations.

The model is developed as a zero-shot learning method because it must predict the components of target substations that have no component data of their own. It relies only on training data from other substations with known components. To this end, the method is formulated as a two-step approach that combines two models deployed in sequence: the Domain model and the Adaptation model.

The schema below depicts the structure of the DAZLS model. The input of the model is data from the complete-information substations. For every known substation, there are input data, source metadata, and output data. First, the input data are used to train the Domain model, which gives a predicted output. This predicted output, linked with the source metadata of each substation, is then used as the input to train the Adaptation model. Finally, the Adaptation model provides the final prediction of solar and wind power for the target substations.


Domain Adaptation Model
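
The two-step training order can be illustrated with a toy example. This sketch assumes plain linear least-squares regressors for both the Domain and the Adaptation model, together with synthetic data. The regressors, features, and metadata that DAZLS actually uses in OpenSTEF may differ, so treat this as an illustration of the data flow, not of the real model.

```python
import numpy as np


class LinearModel:
    """Least-squares linear regressor with a bias term (stand-in for the real regressors)."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LinearModel":
        X_bias = np.hstack([X, np.ones((len(X), 1))])
        self.coef_, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.hstack([X, np.ones((len(X), 1))]) @ self.coef_


class TwoStepSplitter:
    """Domain model and Adaptation model trained and applied in sequence."""

    def fit(self, X: np.ndarray, metadata: np.ndarray, y: np.ndarray) -> "TwoStepSplitter":
        # Step 1: the Domain model maps the input data of the known substations
        # to their solar and wind output.
        self.domain = LinearModel().fit(X, y)
        domain_pred = self.domain.predict(X)
        # Step 2: the Adaptation model is trained on the Domain prediction
        # joined with the source metadata, again against the known output.
        self.adaptation = LinearModel().fit(np.hstack([domain_pred, metadata]), y)
        return self

    def predict(self, X: np.ndarray, metadata: np.ndarray) -> np.ndarray:
        domain_pred = self.domain.predict(X)
        return self.adaptation.predict(np.hstack([domain_pred, metadata]))


# Synthetic "known" substations: 3 input features, 1 metadata column,
# and 2 outputs (solar, wind). All values are made up for illustration.
rng = np.random.default_rng(0)
X_known = rng.normal(size=(200, 3))
meta_known = rng.normal(size=(200, 1))
true_weights = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])
y_known = X_known @ true_weights

model = TwoStepSplitter().fit(X_known, meta_known, y_known)

# "Target" substations: only input data and metadata are available.
X_target = rng.normal(size=(5, 3))
meta_target = rng.normal(size=(5, 1))
solar_wind = model.predict(X_target, meta_target)  # shape (5, 2)
```

The toy data are exactly linear, so here the Adaptation model simply learns to pass the Domain prediction through. With real, imperfect Domain predictions, the second step can use the metadata to adjust them.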

For more information about the DAZLS model, see:

Teng, S.Y., van Nooten, C.C., van Doorn, J.M., Ottenbros, A., Huijbregts, M., Jansen, J.J. Improving Near Real-Time Predictions of Renewable Electricity Production at Substation Level (Submitted)

HOW TO USE: The code that loads and stores the DAZLS model is in the notebook file "05. Split net load into Components.ipynb". Running this notebook produces a dazls_stored.sav file, which can be used in the prediction pipeline. It is important, whenever there are changes in the, to rerun the notebook and use the newly produced dazls_stored.sav file in the repository.
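
The helpers below sketch how such a model file can be stored and reloaded. They assume that dazls_stored.sav is a pickle-compatible dump; if the notebook writes it with another serializer (e.g. joblib), use the matching load function instead. The helper names are hypothetical. Only load .sav files from trusted sources, because unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def store_model(model: object, path: Path) -> None:
    """Hypothetical helper: serialize a trained model to a .sav file (assumes pickle format)."""
    with path.open("wb") as f:
        pickle.dump(model, f)


def load_model(path: Path) -> object:
    """Hypothetical helper: load a stored model (only use with trusted files)."""
    with path.open("rb") as f:
        return pickle.load(f)


# Round trip with a placeholder object in place of a trained DAZLS model.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "dazls_stored.sav"
    store_model({"domain": "model A", "adaptation": "model B"}, model_path)
    restored = load_model(model_path)
```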

Overview of relational database schema

OpenSTEF uses a relational database to store information about prediction jobs and measurements. An ER diagram of this database is shown below.

ER diagram
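
The ER diagram itself is not reproduced here, so the following sketch only illustrates the kind of relational structure involved, using Python's built-in sqlite3 module. The table and column names are hypothetical and are not the actual OpenSTEF schema; openstef-dbc targets MySQL rather than SQLite.

```python
import sqlite3

# Hypothetical minimal schema for illustration; NOT the actual OpenSTEF schema.
SCHEMA = """
CREATE TABLE prediction_jobs (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    model TEXT NOT NULL                -- e.g. xgb
);
CREATE TABLE measurement_systems (
    id TEXT PRIMARY KEY,
    description TEXT
);
-- Link table: a prediction job can use the measurements of several systems.
CREATE TABLE prediction_job_systems (
    prediction_job_id INTEGER NOT NULL REFERENCES prediction_jobs(id),
    system_id TEXT NOT NULL REFERENCES measurement_systems(id),
    PRIMARY KEY (prediction_job_id, system_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO prediction_jobs VALUES (1, 'substation A', 'xgb')")
conn.executemany(
    "INSERT INTO measurement_systems VALUES (?, ?)",
    [("sys_1", "feeder 1"), ("sys_2", "feeder 2")],
)
conn.executemany(
    "INSERT INTO prediction_job_systems VALUES (?, ?)",
    [(1, "sys_1"), (1, "sys_2")],
)

# Which measurement systems feed prediction job 1?
rows = conn.execute(
    """
    SELECT p.name, s.id, s.description
    FROM prediction_jobs AS p
    JOIN prediction_job_systems AS ps ON ps.prediction_job_id = p.id
    JOIN measurement_systems AS s ON s.id = ps.system_id
    WHERE p.id = 1
    ORDER BY s.id
    """
).fetchall()
```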

Overview of timeseries database schema

OpenSTEF uses a timeseries database to store all timeseries data. A diagram of its structure is shown below.

Diagram InfluxDB
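
InfluxDB accepts writes in its line protocol (`measurement,tag=value field=value timestamp`). The sketch below formats one forecast point in that format. The measurement name `forecast` and the tags `pid` and `type` are hypothetical and need not match the measurements and tags in the diagram. The function handles only float fields and expects a timezone-aware timestamp. It converts that timestamp to nanoseconds since the Unix epoch, which is InfluxDB's default write precision.

```python
from __future__ import annotations

from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _escape(text: str) -> str:
    """Escape commas, equals signs, and spaces (tag keys/values, field keys)."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(measurement: str, tags: dict[str, str], fields: dict[str, float], time: datetime) -> str:
    """Format one point as InfluxDB line protocol, with float fields only."""
    # Measurement names only need commas and spaces escaped.
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    # Tags are sorted by key, as InfluxDB recommends for write performance.
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={float(v)!r}" for k, v in fields.items())
    # Integer nanoseconds since the Unix epoch, computed without float rounding.
    delta = time - EPOCH  # raises TypeError for naive datetimes
    ns = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
    return f"{name}{tag_part} {field_part} {ns}"


# Hypothetical measurement and tag names, for illustration only.
line = to_line_protocol(
    "forecast",
    {"type": "demand", "pid": "1"},
    {"forecast": 12.5, "stdev": 1.25},
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
# line == "forecast,pid=1,type=demand forecast=12.5,stdev=1.25 1704067200000000000"
```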