Using Reinforcement Learning to try to control the heating of a building (district heating)

Tony Karlsson

[ Question ]

4th Oct 2023

In short, we are trying to use reinforcement learning (RL) to control the heating of a building (district heating), using the building's zone temperatures and the outdoor temperature as inputs. To avoid training the RL algorithm on the real building, we use a building simulation program as the environment.

The building simulation program has inputs:

Zone thermostat heating and cooling setpoints (°C)
Hot water pump flow rate

Outputs from the building simulation program are:

Zone temperatures (°C)
Outdoor temperature (°C)
Hot water rate (kW)

The aim is for the RL algorithm to control the building's district heating use more efficiently than the current district heating control function. The primary goal is for the RL algorithm to peak-shave the district heating use.

We are using ClippedPPO as the agent, within an RL framework. As a comparison, we have one year of district heating data from the building we want to control. The building is modelled in the building simulation program's format.

Action space of the RL-algorithm is:

Hot water pump flow rate
Zone heating and cooling temperature setpoints

Observation space of the RL-algorithm is:

Zone Air temperature
Outdoor temperature, current and forecast (36 hours into future)
Heating rate of hot water
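As a minimal sketch of how such an observation could be flattened into one vector, assuming an hourly forecast resolution (the post only states the 36-hour horizon) and using hypothetical names throughout:

```python
from typing import List, Sequence

FORECAST_HOURS = 36  # forecast horizon stated in the post; hourly steps are an assumption


def build_observation(
    zone_temps_c: Sequence[float],
    outdoor_temp_c: float,
    outdoor_forecast_c: Sequence[float],
    heating_rate_kw: float,
) -> List[float]:
    """Flatten the observation into a single list (hypothetical layout)."""
    if len(outdoor_forecast_c) != FORECAST_HOURS:
        raise ValueError(f"expected {FORECAST_HOURS} forecast values")
    # Order: zone temperatures, current outdoor temperature, forecast, heating rate.
    return [*zone_temps_c, outdoor_temp_c, *outdoor_forecast_c, heating_rate_kw]
```

In practice each component would probably also be scaled to a similar range before being fed to the agent, but the scaling constants depend on the building and are not given here.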

In each timestep, the RL environment takes the outputs of the building simulation program and calculates a penalty from the observation state, which is returned to the agent. The penalty is a sum of 4 parts, each with a coefficient that I have been tuning by trial and error. Some of the parts are, for example, -coeff1*heating_rate^2, -coeff2*heating_derivative and -coeff3*uncomfortable_temp (a large penalty when the indoor temperature is below 19 °C).
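A rough sketch of the three named terms, to make the structure concrete. Everything beyond the term names is an assumption: the coefficient values are placeholders, the derivative is taken as an absolute finite difference, and the discomfort term is taken as degrees below 19 °C. The fourth term is not described, so it is left out.

```python
def step_penalty(
    heating_rate_kw: float,
    prev_heating_rate_kw: float,
    zone_temp_c: float,
    dt_hours: float = 1.0,
    coeff1: float = 1e-4,   # placeholder value, not from the post
    coeff2: float = 1e-2,   # placeholder value, not from the post
    coeff3: float = 10.0,   # placeholder value, not from the post
    comfort_min_c: float = 19.0,
) -> float:
    """Return the (non-positive) reward for one timestep."""
    # Assumed form: magnitude of the change in heating rate per hour.
    heating_derivative = abs(heating_rate_kw - prev_heating_rate_kw) / dt_hours
    # Assumed form: how many degrees the zone is below the comfort limit.
    uncomfortable_temp = max(0.0, comfort_min_c - zone_temp_c)
    return -(
        coeff1 * heating_rate_kw ** 2
        + coeff2 * heating_derivative
        + coeff3 * uncomfortable_temp
    )
```

Because the squared term grows with the heating rate itself, its relative weight against the comfort term changes a lot between mild and cold days, which is one reason the coefficients can be hard to tune by hand.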

The problem is that the resulting heating has high peaks, which we want the RL algorithm to shave. If anyone has ideas on how to get this working, or insight on how to progress, I would appreciate it.

The orange part is the RL-resulting hot water heating rate and the blue part is the real-world measured values for 2022:


Dagon

Oct 04, 2023

You're seeing high peaks in the real observation data, or in the current simulations of the RL model?

My main worry would be imprecision in the controls (you set the flow rate to X gpm but actually get more or less, by an amount that isn't predictable) and delays in impact (the time from starting to heat to seeing the air temperature change), which your simulation may be modelling too precisely or differently from the real world.
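One hedged way to probe this in simulation is to put a noisy, delayed actuator between the agent's command and the simulator. The class below is a hypothetical sketch, not part of any RL framework; the delay length and noise level are made-up defaults.

```python
import random
from collections import deque
from typing import Optional


class NoisyDelayedActuator:
    """Delays a commanded value by a fixed number of steps and perturbs it."""

    def __init__(self, delay_steps: int = 2, noise_fraction: float = 0.05,
                 initial: float = 0.0, seed: Optional[int] = None) -> None:
        # Pre-fill the queue so the first `delay_steps` outputs are `initial`.
        self._queue = deque([initial] * delay_steps)
        self._noise_fraction = noise_fraction
        self._rng = random.Random(seed)

    def apply(self, commanded: float) -> float:
        """Return the value that actually reaches the plant this step."""
        self._queue.append(commanded)
        delayed = self._queue.popleft()
        # Multiplicative uniform noise, e.g. +/-5% of the delayed command.
        factor = 1.0 + self._rng.uniform(-self._noise_fraction, self._noise_fraction)
        return max(0.0, delayed * factor)
```

Training with the delay and noise level re-sampled per episode would show whether the learned policy depends on the simulator being exact.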

Tony Karlsson

There is a training phase (1-6 years of weather observations) where the RL-agent trains using the building simulation program. Then I evaluate on 2022 weather data using the same building simulation program with the agent I previously trained. The graph contains real measured values from 2022 (blue line) of the building. The agent is evaluated on the weather data from year 2022.
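To compare the two curves with a number rather than by eye, one option is the peak and the load factor (mean divided by peak) of each series. This is only a sketch with a hypothetical function name; it assumes both series are heating rates in kW sampled at the same interval.

```python
from typing import Sequence, Tuple


def peak_and_load_factor(heating_rate_kw: Sequence[float]) -> Tuple[float, float]:
    """Return (peak, load factor) for a heating-rate series."""
    if not heating_rate_kw:
        raise ValueError("series must not be empty")
    peak = max(heating_rate_kw)
    mean = sum(heating_rate_kw) / len(heating_rate_kw)
    # A load factor closer to 1 means a flatter profile, i.e. better peak-shaving.
    load_factor = mean / peak if peak > 0 else 0.0
    return peak, load_factor
```

A lower peak with a similar mean for the agent's series than for the measured one would indicate that peaks are actually being shaved rather than total heating simply being reduced.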

Yes. The building has a certain inertia, which I hope the agent will learn as well. The 36-hour outdoor temperature forecast is supplied in the observation state so that the agent knows it should preheat the building when the forecast temperature is dropping, in order to lower the heating peak penalty.

Dagon

I'd expect building inertia (and heater inertia, and water-flow and -temperature inertia) to be important both to pre-heating effectively and to smoothing out any spikes. The other factor in the spikes is probably the cost function: are you modeling constraints like minimum time to heat and the increased maintenance that comes with rapid cycling?
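As an illustration of what a rapid-cycling term could look like, the sketch below charges a fixed cost whenever a control value changes before it has been held for a minimum number of steps. The function name, the hold length and the cost are all hypothetical, and this is only one of several ways to express such a constraint.

```python
from typing import Sequence


def cycling_penalty(actions: Sequence[float], min_hold_steps: int = 4,
                    coeff: float = 1.0, tol: float = 1e-6) -> float:
    """Return a non-positive reward penalising changes made too soon."""
    reward = 0.0
    # Treat the initial value as already held long enough: the first change is free.
    steps_held = min_hold_steps
    for prev, cur in zip(actions, actions[1:]):
        if abs(cur - prev) > tol:
            if steps_held < min_hold_steps:
                reward -= coeff  # changed before the minimum hold time elapsed
            steps_held = 1
        else:
            steps_held += 1
    return reward
```

For example, `cycling_penalty([0, 1, 0, 0, 0, 0, 1])` charges only the second change, because the value 1 was held for a single step, while the final change comes after the value 0 was held for four steps.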
