Migrating from OpenSTEF 3 to 4#
This page guides users of the legacy openstef v3 package through the conceptual
and practical changes in OpenSTEF v4. If you are starting fresh, skip this page and
begin with Installation.
Note
V4 is a ground-up redesign. While the forecasting goals are the same, the API surface is intentionally different. Plan for a rewrite of integration code, not a find-and-replace.
Package Structure#
V4 splits functionality into focused, independently installable packages. You can use the models without any database dependency, and each package can be tested in isolation.
In v3 these were all bundled in a single openstef package:
| V3 | V4 | Responsibility |
|---|---|---|
| openstef | openstef-models | Model training, prediction, feature engineering |
| openstef | openstef-core | Types, datasets, validation, testing utilities |
| openstef | openstef-beam | Backtesting, evaluation, experiment tracking |
| (not available) | openstef-meta | Ensemble forecasting, metalearning |
| openstef-dbc | No equivalent | See Reference Implementation below |
Configuration#
V4’s ForecastingWorkflowConfig validates all fields
at construction time. Model types are constrained literals, durations use timedelta,
and hyperparameters are typed per-model objects. This replaces v3’s
PredictionJobDataClass, which used free-form strings and untyped dicts.
Before (V3):
from openstef.data_classes.prediction_job import PredictionJobDataClass

pj = PredictionJobDataClass(
    id=287,
    model="xgb",
    resolution_minutes=15,
    forecast_type="demand",
    quantiles=[10, 30, 50, 70, 90],
)
After (V4):
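from datetime import timedelta

from openstef_core.types import Q  # import path assumed; the original snippet omits it
from openstef_models.presets import ForecastingWorkflowConfig

config = ForecastingWorkflowConfig(
    model_id="loc_287",
    model="xgboost",
    sample_interval=timedelta(minutes=15),
    quantiles=[Q(0.1), Q(0.3), Q(0.5), Q(0.7), Q(0.9)],
)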
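The field changes between the two snippets above can be sketched as a small conversion helper. This is only an illustration under stated assumptions: the function name v3_job_to_v4_kwargs is hypothetical, the loc_ prefix simply mirrors the example, plain floats stand in for the Q(...) wrapper, and only the xgb to xgboost rename shown above is mapped.

```python
from datetime import timedelta


def v3_job_to_v4_kwargs(job: dict) -> dict:
    """Translate a few v3 prediction-job fields into v4-style keyword arguments.

    Hypothetical helper: only the fields shown in the Before/After snippets
    are handled, and plain floats stand in for the Q(...) quantile wrapper.
    """
    # Only the "xgb" -> "xgboost" rename appears in the example; extend as needed.
    model_names = {"xgb": "xgboost"}
    if job["model"] not in model_names:
        raise ValueError(f"no v4 mapping known for model {job['model']!r}")
    return {
        # The "loc_" prefix mirrors the example and is an assumption.
        "model_id": f"loc_{job['id']}",
        "model": model_names[job["model"]],
        # Integer minutes become a timedelta.
        "sample_interval": timedelta(minutes=job["resolution_minutes"]),
        # Percent-style quantiles (10, 50, 90) become fractions (0.1, 0.5, 0.9).
        "quantiles": [q / 100 for q in job["quantiles"]],
    }


v3_job = {"id": 287, "model": "xgb", "resolution_minutes": 15, "quantiles": [10, 50, 90]}
v4_kwargs = v3_job_to_v4_kwargs(v3_job)
```

The resulting dictionary would still need its quantiles wrapped in Q(...) before being passed to ForecastingWorkflowConfig.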
Key improvements:
ForecastingWorkflowConfig uses Pydantic v2 with strict validation. Configuration errors surface immediately at construction time.
Model types, quantiles, and durations use constrained types with IDE autocompletion.
Location, horizon, and hyperparameter settings are structured sub-objects with their own validation and defaults.
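To illustrate what "errors surface at construction time" means in practice, here is a stdlib-only stand-in. It is not the Pydantic-based ForecastingWorkflowConfig: the class SketchConfig, the helper try_build, and the checks themselves are assumptions made for the sketch, and the allowed-model set contains only the xgboost value seen in the example above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SketchConfig:
    """Stand-in for a validated config; not the real ForecastingWorkflowConfig."""

    model_id: str
    model: str
    sample_interval: timedelta
    quantiles: tuple[float, ...]

    def __post_init__(self) -> None:
        # Only "xgboost" appears in the example above; the real set is larger.
        if self.model not in {"xgboost"}:
            raise ValueError(f"unknown model type: {self.model!r}")
        if not isinstance(self.sample_interval, timedelta) or self.sample_interval <= timedelta(0):
            raise ValueError("sample_interval must be a positive timedelta")
        if any(not 0.0 < q < 1.0 for q in self.quantiles):
            raise ValueError("quantiles must lie strictly between 0 and 1")


def try_build(**kwargs) -> str:
    """Return "ok" or the validation message, to show where errors surface."""
    try:
        SketchConfig(**kwargs)
    except ValueError as exc:
        return str(exc)
    return "ok"


good = try_build(model_id="loc_287", model="xgboost",
                 sample_interval=timedelta(minutes=15), quantiles=(0.1, 0.5, 0.9))
# The v3 identifier "xgb" is rejected before any training code runs.
bad = try_build(model_id="loc_287", model="xgb",
                sample_interval=timedelta(minutes=15), quantiles=(0.1, 0.5, 0.9))
```

The point of the design is that a stale v3 value fails when the object is built, not later during training or prediction.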
| V3 | V4 | Notes |
|---|---|---|
|  |  | Now |
|  |  | Validated; see Model Types below |
|  |  | Per-model typed config objects |
|  |  | Supports multiple horizons |
|  |  | Structured LocationConfig sub-object |
|  |  | Always set; default |
|  |  | Same name and semantics |
|  |  | Unchanged |
|  |  | Slight rename (no underscore in “nonzero”) |
|  |  | Unchanged |
|  | Removed | Handled by model choice + transforms |
|  |  | Simplified |
|  |  | Structured config |
|  | Removed | User manages orchestration |
|  | Removed | Call workflow methods directly |
|  |  | Feature selection via enum |
See the ForecastingWorkflowConfig API reference for
the full list of fields and defaults.
Data Handling#
V4 introduces TimeSeriesDataset,
which carries the data and its metadata together. In v3, metadata like sample
interval and column roles lived separately in the prediction job.
Benefits of TimeSeriesDataset:
Sample interval – validated on construction, ensuring consistent resampling.
Availability windows – tracks when each observation became available (critical for correct backtesting without lookahead).
Versioning – supports horizon-aware or available_at-aware slicing.
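The availability-window idea can be illustrated without OpenSTEF at all. The sketch below is an assumption-laden toy, not the TimeSeriesDataset API: the Observation record, the visible_at helper, and the 30-minute arrival delay are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Observation(NamedTuple):
    timestamp: datetime      # moment the value refers to
    available_at: datetime   # moment the value became known
    value: float


def visible_at(observations: list[Observation], cutoff: datetime) -> list[Observation]:
    """Keep only observations already available at `cutoff` (no lookahead)."""
    return [obs for obs in observations if obs.available_at <= cutoff]


start = datetime(2024, 1, 1, 0, 0)
step = timedelta(minutes=15)
# Each measurement is assumed to arrive 30 minutes after the time it describes.
observations = [
    Observation(start + i * step, start + i * step + timedelta(minutes=30), float(i))
    for i in range(4)
]

# A backtest "running" at 00:45 may only see the first two measurements.
known = visible_at(observations, cutoff=start + timedelta(minutes=45))
```

Filtering on the timestamp instead of available_at would let the 00:30 and 00:45 measurements leak into a forecast made at 00:45, which is the lookahead the availability tracking is meant to prevent.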
Before (V3):
import pandas as pd
from openstef.pipeline.train_model import train_model_pipeline

input_data = pd.read_csv("data.csv", index_col="index", parse_dates=True)
train_model_pipeline(pj, input_data)
After (V4):
from datetime import timedelta

import pandas as pd
from openstef_core.datasets import TimeSeriesDataset

df = pd.read_csv("data.csv", index_col="index", parse_dates=True)
dataset = TimeSeriesDataset(
    data=df,
    sample_interval=timedelta(minutes=15),
)
The dataset is then passed to workflows, which can introspect its properties without external configuration.
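A toy container shows why carrying the sample interval with the data is useful. Everything here is hypothetical and stdlib-only: MiniDataset and steps_per_day are not OpenSTEF names, and the grid check is just one plausible form of the validation described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MiniDataset:
    """Toy container pairing timestamps with their sample interval."""

    timestamps: tuple[datetime, ...]
    sample_interval: timedelta

    def __post_init__(self) -> None:
        # Timestamps must be increasing, and every gap must be a whole
        # multiple of sample_interval (missing samples are allowed).
        for earlier, later in zip(self.timestamps, self.timestamps[1:]):
            gap = later - earlier
            if gap <= timedelta(0) or gap % self.sample_interval != timedelta(0):
                raise ValueError(f"{later} does not fit a {self.sample_interval} grid")


def steps_per_day(dataset: MiniDataset) -> int:
    """Example of a consumer reading metadata from the dataset itself."""
    return timedelta(days=1) // dataset.sample_interval


start = datetime(2024, 1, 1)
quarter_hour = timedelta(minutes=15)
dataset = MiniDataset(tuple(start + i * quarter_hour for i in range(4)), quarter_hour)
```

Because the interval is part of the object, a consumer such as steps_per_day needs no separate prediction-job argument, and data that is off the grid is rejected when the container is built.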
Workflows#
V4 unifies the v3 concepts of “tasks” (database-coupled orchestration) and “pipelines” (ML logic) into a single concept: Workflows. A workflow encapsulates the full train/predict cycle without assuming any particular storage backend.
| V3 | V4 |
|---|---|
| Task (fetch -> pipeline -> store) | User code + workflow (you own I/O) |
A Preset is a factory that builds a fully configured workflow from a config object:
from datetime import timedelta

from openstef_core.types import Q  # import path assumed; the original snippet omits it
from openstef_models.presets import create_forecasting_workflow, ForecastingWorkflowConfig

config = ForecastingWorkflowConfig(
    model_id="loc_287",
    model="xgboost",
    sample_interval=timedelta(minutes=15),
    quantiles=[Q(0.1), Q(0.5), Q(0.9)],
)
workflow = create_forecasting_workflow(config)

# Training (dataset is the TimeSeriesDataset built in Data Handling)
workflow.fit(dataset)

# Prediction
forecast = workflow.predict(dataset)
create_forecasting_workflow() assembles preprocessing,
the forecaster, postprocessing, and callbacks into a
CustomForecastingWorkflow.
For ensemble approaches, openstef-meta provides
create_ensemble_forecasting_workflow().
See Forecasting for a complete walkthrough.
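Since v4 workflows no longer assume a storage backend, the fetch and store steps of a v3 task move into user code. The sketch below shows one possible shape for that glue. It assumes only the fit and predict calls shown above; run_job, the Workflow protocol, and MeanWorkflow are hypothetical names, and MeanWorkflow is a toy stand-in so the example runs without OpenSTEF installed.

```python
from typing import Callable, Protocol


class Workflow(Protocol):
    """Minimal shape assumed here: the fit/predict calls shown above."""

    def fit(self, dataset): ...
    def predict(self, dataset): ...


def run_job(workflow: Workflow, load: Callable[[], object], store: Callable[[object], None]) -> None:
    """Replacement for a v3-style task: fetch, run the workflow, store."""
    dataset = load()                      # your data source
    workflow.fit(dataset)                 # training
    forecast = workflow.predict(dataset)  # prediction
    store(forecast)                       # your database or file store


class MeanWorkflow:
    """Toy stand-in that predicts the mean of the data it was fitted on."""

    def fit(self, dataset):
        self.mean = sum(dataset) / len(dataset)

    def predict(self, dataset):
        return [self.mean] * len(dataset)


stored = []
run_job(MeanWorkflow(), load=lambda: [1.0, 2.0, 3.0], store=stored.append)
```

Passing load and store as callables keeps the workflow itself free of I/O, so the same glue can be tested with in-memory stubs and deployed with real database adapters.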
Model Types#
Model identifiers changed for clarity. Quantile variants are no longer separate model
types; configure quantiles via the quantiles field instead.
| V3 | V4 | Notes |
|---|---|---|
|  |  | Quantiles now in config, not model name |
|  |  | Same |
|  | Removed | Use |
|  |  | Quantiles now in config |
|  |  | Unchanged |
|  |  | Unchanged |
| (new) |  | Fallback model for low-data situations |
| (new) |  | LightGBM with linear learner |
Reference Implementation#
In v3, openstef-dbc provided scheduling, database integration, and orchestration.
V4 focuses on the core ML libraries and leaves integration to the user.
The openstef-reference repository demonstrates how a complete v3 system was deployed (scheduling, data integration, and storage). For v4 deployment patterns, see Deployment instead.
If your v3 code relied on openstef-dbc:
Data ingestion – write an adapter that loads data into TimeSeriesDataset.
Result storage – extract forecasts from the workflow output and write them to your database of choice.
Configuration storage – serialize ForecastingWorkflowConfig to/from your config store (JSON, YAML, database row).
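For the configuration-storage step, a JSON round trip might look like the sketch below. It uses a plain dictionary as a stand-in for the config, and the helper names config_to_json and config_from_json are hypothetical. Because ForecastingWorkflowConfig is described above as a Pydantic v2 model, Pydantic's own model_dump_json() and model_validate_json() are likely the more direct route; the manual version only makes the timedelta handling explicit.

```python
import json
from datetime import timedelta


def config_to_json(config: dict) -> str:
    """Serialize config fields, storing the timedelta as whole seconds."""
    payload = dict(config)
    payload["sample_interval"] = int(config["sample_interval"].total_seconds())
    return json.dumps(payload, sort_keys=True)


def config_from_json(text: str) -> dict:
    """Inverse of config_to_json: rebuild the timedelta from seconds."""
    payload = json.loads(text)
    payload["sample_interval"] = timedelta(seconds=payload["sample_interval"])
    return payload


# Plain floats stand in for Q(...) quantiles in this stand-in config.
config = {
    "model_id": "loc_287",
    "model": "xgboost",
    "sample_interval": timedelta(minutes=15),
    "quantiles": [0.1, 0.5, 0.9],
}
restored = config_from_json(config_to_json(config))
```

Storing the interval as whole seconds is a choice made for this sketch; it would silently truncate sub-second intervals, so an ISO 8601 duration string may suit other setups better.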