
Power Flow Balancing with Decentralized Graph Neural Networks

Jonas Berg Hansen, Stian Normann Anfinsen, and Filippo Maria Bianchi

Abstract—We propose an end-to-end framework based on a Graph Neural Network (GNN) to balance the power flows in a generic grid. The optimization is framed as a supervised vertex regression task, where the GNN is trained to predict the current and power injections at each grid branch that yield a power flow balance. By representing the power grid as a line graph with branches as vertices, we can train a GNN that is more accurate and robust to changes in the underlying topology. In addition, by using specialized GNN layers, we are able to build a very deep architecture that accounts for large neighborhoods on the graph, while implementing only localized operations. We perform three different experiments to evaluate: i) the benefits of using localized rather than global operations and the tendency to oversmooth when using deep GNN models; ii) the resilience to perturbations in the graph topology; and iii) the capability to train the model simultaneously on multiple grid topologies and the consequential improvement in generalization to new, unseen grids. The proposed framework is efficient and, compared to other solvers based on deep learning, is robust to perturbations not only to the physical quantities on the grid components, but also to the topology.

I. INTRODUCTION

POWER flow analysis is a fundamental part of the operation, planning, maintenance, and control of power systems [1]. To ensure adequate system reliability for day-to-day operation, the power flow must be distributed such that the demand is met without violating the physical constraints of the electrical components of the energy grid. In addition, the power system should be reasonably resilient to outages and disconnections. To perform the computations necessary to find this balanced state of operation, a power solver needs information about how the electrical nodes in the power grid are connected, as well as the physical properties of the infrastructure (e.g., resistances of the transmission lines).

Traditionally, the power flow balance is computed using numerical optimizers, such as the Newton-Raphson method [2]. These methods are computationally expensive for large systems, convergence to the global optimum is in general not guaranteed, and a new optimization must be performed for each small modification of the grid configuration.

Correspondence: [email protected]
J. B. Hansen and F. M. Bianchi are with the Department of Mathematics and Statistics, UiT The Arctic University of Norway.
S. N. Anfinsen and F. M. Bianchi are with NORCE Norwegian Research Centre AS.
S. N. Anfinsen is with the Department of Physics and Technology, UiT The Arctic University of Norway.
This project was supported in part by Equinor under the Academia Agreement with UiT The Arctic University of Norway.

Presently, several approaches based on machine learning principles have been proposed to compute fast approximations of the power flow solution while being more accurate than traditional approximation methods such as the linear DC power flow (DCPF) approximation. Earlier attempts mostly include models based on multi-layer perceptrons (MLPs) [3], [4]. MLPs are easy to implement, but they can only generalize to different system values (e.g., loads on the buses) and not to different system topologies. In addition, MLPs have a higher tendency to overfit the training data compared to other neural networks, which exploit inductive biases to reduce the number of trainable parameters.

A few recent works have proposed to represent power systems as graphs and used graph neural networks (GNNs) to solve power flow related tasks [5]–[11]. GNNs are neural network models that directly exploit the topology of the graph to implement localized computations, which are independent of the global structure of the power system. Therefore, GNNs can, in principle, be applied to grids with different sizes and topologies. The latter is one of the most appealing features of GNN models compared to MLPs, as it makes it possible to quickly compute a solution after the grid topology is modified, e.g. when transmission lines are disconnected due to contingencies or planned rerouting operations.

However, most of the GNN-based approaches in the existing literature use architectural components that depend on the global structure, which essentially locks the models to a specific topology [7]–[10]. Like the MLP, these models are still effective when dealing with a single grid topology, but they cannot process topological variations in the configuration of power systems. A few works adopt GNNs able to process multiple topologies. However, these architectures are either unable to handle transmission line attributes, such as resistance or reactance, which are important to model the physics of the power flow problem [5], or they use the GNN to perform a single operation within a larger pipeline that includes global operations to update the state of the buses [6].

Contributions. We propose a fully decentralized GNN architecture and implement an end-to-end approach to optimize the power flows. The optimization is framed as a supervised vertex regression task, which aims to predict the electrical quantities of interest at each branch. By using only local operations, our GNN can not only handle perturbations in the graph topology, but can also be transferred to completely different grids. To effectively implement such a model, we have addressed two challenges:

• In power flow balancing, distant vertices on the graph should be able to communicate with each other. To accomplish this in the absence of centralized operations, a very deep GNN with a large receptive field is required. Since deep GNN architectures are often affected by oversmoothing (the output of adjacent vertices tends to become the same), we use convolutional ARMA layers designed to implement a large receptive field.

• In traditional power flow optimization, the phase angles at each bus are expressed relative to the slack bus [12]. However, such a global dependency is difficult to model with a GNN that implements only localized operations. In addition, buses with generators must be treated differently when computing the solution, which complicates the GNN training. As a solution, we introduce a line graph representation of the power grid, which makes it possible to: i) model both bus and branch features as vertex attributes; ii) treat all vertices in the line graph equally; and iii) no longer express the solution in terms of the slack bus, which reduces the dependency on such global information.

We perform three experiments to investigate the potential of a decentralized GNN for power flow balancing.

1) We compare the proposed GNN with an MLP to highlight the benefits of using localized rather than global operations. We also analyze the problem of oversmoothing that GNNs encounter in power flow balancing and show how it can be mitigated by using specialized GNN layers to implement very deep architectures.

2) We consider a setting where the topology changes as a consequence of line disconnections in the grid and we show the advantage of using a decentralized architecture.

3) We perform training on multiple power systems with completely different topologies to investigate if the generalization capability on new, unseen grids improves.

II. THEORETICAL BACKGROUND

A. Notation

A power grid consists of a set of buses and the branches that connect them. Each of these core components is characterized by a set of physical quantities, which influence the power flow solution. These are referred to as features and are described in the following, along with their measurement units (in brackets).

1) Bus features:
   • |V|: voltage magnitude [p.u. (per unit)]
   • θ: voltage angle [radians]
   • Pd: active power demand [MW]
   • Qd: reactive power demand [MVAr]
   • Gs: conductance of connected shunt element [MW demanded at |V| = 1.0 p.u.]
   • Bs: susceptance of connected shunt element [MVAr injected at |V| = 1.0 p.u.]
   • Pg: active power output of connected generator [MW]
   • Qg: reactive power output of connected generator [MVAr]

2) Branch features:
   • r/x: resistance/reactance [p.u.]
   • b: total line charging susceptance [p.u.]
   • τ: transformer off-nominal turns ratio
   • θshift: transformer phase shift angle [radians]

B. The Power Flow Problem

The steady-state solution to the power flow problem for a system of Nb buses is found by solving a set of nonlinear equations given by Kirchhoff's laws. In polar form, the power balance equations are [13]:

$$P_i = \sum_{k=1}^{N_b} |V_i||V_k|\left(G_{ik}\cos\theta_{ik} + B_{ik}\sin\theta_{ik}\right), \quad (1)$$

$$Q_i = \sum_{k=1}^{N_b} |V_i||V_k|\left(G_{ik}\sin\theta_{ik} - B_{ik}\cos\theta_{ik}\right), \quad (2)$$

where Pi and Qi are the net active and reactive power injections at bus i, θik is the voltage phase angle difference between buses i and k, and Gik and Bik are respectively the real and imaginary parts of the (i, k)-th element of the bus admittance matrix. The shunt conductance and susceptance, along with all of the aforementioned branch attributes, are used to construct this admittance matrix.

Typically, the aim of power flow balancing is to find the complex voltages on the buses that satisfy the balance equations in (1) and (2). To obtain the same number of equations and variables, buses are split into different categories where certain variables are fixed. First, a single bus with a generator is selected to act as the slack or reference bus, for which |Vi| and θi are fixed. The voltage phase angles of the other buses are expressed as differences from the slack. The other buses with generators are called PV buses, and for these |Vi| and Pi are fixed. Finally, the remaining buses are called PQ buses, where Pi and Qi are known.

C. Multi-Layer Perceptrons

An MLP consists of multiple layers of computational (or processing) units that perform an affine transformation followed by a nonlinearity [14]. Formally, an MLP with L hidden layers implements the following operations:

$$\mathbf{h}^{(0)} = \mathbf{x}, \quad (3)$$

$$\mathbf{h}^{(l)} = \sigma^{(l)}\left(\mathbf{W}^{(l)}\mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\right), \quad l = 1, 2, \ldots, L, \quad (4)$$

$$\mathbf{y} = \sigma^{(L+1)}\left(\mathbf{W}^{(L+1)}\mathbf{h}^{(L)} + \mathbf{b}^{(L+1)}\right), \quad (5)$$

where x ∈ R^(Fin×1) is the input, h^(l) ∈ R^(Fl×1) is the hidden state at the l-th layer, W^(l) ∈ R^(Fl×Fl−1) is a trainable weight matrix, b^(l) ∈ R^(Fl×1) is a trainable bias vector, σ^(l) is an activation function, and y ∈ R^(Fout×1) is the MLP output. Fl denotes the number of processing units in the l-th layer.

The dimensions of the weight matrices and bias vectors in the input and output layers depend on the number of variables in x and y, which prevents the MLP from being applied to data where the number of variables differs across samples. This makes the MLP unsuitable for balancing the power flow in grids with a different number of buses. In addition, as the number of trainable parameters increases with the number of variables, MLPs used to process large grids must have a very high capacity and can more easily memorize specific patterns in the training data at the expense of generalization to new data. This problem is referred to as overfitting [14].
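As a concrete illustration of (3)-(5), a minimal NumPy sketch of the MLP forward pass; the layer sizes here are hypothetical, while the actual configurations used in this work are given in Section V.

import numpy as np

def leaky_relu(z, alpha=0.2):
    """Leaky ReLU activation, as used by the models in Section V."""
    return np.where(z > 0, z, alpha * z)

def mlp_forward(x, weights, biases):
    """Forward pass of Eqs. (3)-(5): an affine map followed by a
    nonlinearity per hidden layer, and a linear output layer."""
    h = x                                  # h^(0) = x, Eq. (3)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = leaky_relu(W @ h + b)          # Eq. (4)
    W, b = weights[-1], biases[-1]
    return W @ h + b                       # Eq. (5), linear output

# Hypothetical dimensions: 21 input features, one hidden layer of 64 units,
# 8 outputs (matching the per-branch feature sizes introduced in Section III).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 21)), rng.normal(size=(8, 64))]
biases = [np.zeros(64), np.zeros(8)]
y = mlp_forward(rng.normal(size=21), weights, biases)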

D. Graph Neural Networks

Power systems can be naturally represented as graphs, due to their non-Euclidean structure. A graph with N vertices, each one associated with a feature vector of size Fin, can be represented by the tuple {A, X}, where A ∈ R^(N×N) is the adjacency matrix and X ∈ R^(N×Fin) contains the vertex features. GNNs are a class of neural networks specifically designed to process data represented as graphs [15]. GNNs include layers implementing localized operations, called message passing (MP), that are independent of the global structure of the graph. A GNN without layers that perform global operations is completely decentralized and can process graphs with different topologies.

The MP operations in a GNN can be performed in several different ways. A popular MP layer is the one from the Graph Convolutional Network (GCN) [16], which updates the vertex features as

$$\mathbf{X}^{(t)} = \sigma\left(\hat{\mathbf{A}}\mathbf{X}^{(t-1)}\mathbf{W}\right), \quad (6)$$

where X^(t) are the vertex features at layer t (note that X^(0) = X), W is a matrix of trainable parameters, and Â is the symmetrically normalized version of A + I, given by

$$\hat{\mathbf{A}} = \mathbf{D}^{-1/2}(\mathbf{A} + \mathbf{I})\mathbf{D}^{-1/2}, \quad (7)$$

where D is the degree matrix of A + I.

To optimize the power flow, it is necessary to reach higher-order neighborhoods, since the value of a generator can be influenced by the load of a distant bus in the grid. To combine vertices from distant regions of the graph, it is necessary to stack many MP operations. If these operations are implemented by layers that smooth graph signals [16]–[18], the repeated application of MP leads to oversmoothing, i.e., the vertex representations become uniform over the whole graph [19]. This prevents the GNN from producing a dissimilar output at vertices that are very close on the graph. The difference between smooth and sharp graph signals is illustrated in Fig. 1.
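For concreteness, a minimal NumPy sketch of (6) and (7), using dense matrices for readability (practical implementations use sparse operations):

import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization of A + I, Eq. (7)."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                    # degrees of A + I
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt   # D^{-1/2} (A+I) D^{-1/2}

def gcn_layer(A_hat, X, W, sigma=np.tanh):
    """One message-passing step, Eq. (6): X' = sigma(A_hat X W)."""
    return sigma(A_hat @ X @ W)

# Toy graph with 4 vertices and 3 input features (hypothetical values).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))
W = np.random.default_rng(1).normal(size=(3, 8))
X_out = gcn_layer(normalize_adjacency(A), X, W)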

It is important for the power flow problem that a GNN can generate sharp signals, since neighboring grid buses might assume very different physical quantities. As a solution, we implement a very deep GNN composed of convolutional ARMA layers [20]. The ARMA layer is more flexible and less prone to oversmoothing the vertex features than other MP layers. Each ARMA layer consists of K stacks, each one performing T propagation steps, which we denote as $\mathrm{ARMA}_K^T$. The output of each stack is computed as

$$\mathbf{X}_k^{(1)} = \sigma\left(\hat{\mathbf{A}}\mathbf{X}\mathbf{W}^{(0)} + \mathbf{X}\mathbf{V}^{(0)}\right), \quad (8)$$

$$\mathbf{X}_k^{(t)} = \sigma\left(\hat{\mathbf{A}}\mathbf{X}_k^{(t-1)}\mathbf{W} + \mathbf{X}\mathbf{V}\right), \quad t = 2, 3, \ldots, T, \quad (9)$$

Fig. 1: The graph signal becomes sharper going from left to right. Smooth graph signals assign similar values to neighboring vertices, while a sharp graph signal can assign different values to neighboring vertices. Most GNN layers perform a smoothing operation; thus, stacking many of them can only produce very smooth graph signals in output.

where $X_k^{(t)} \in \mathbb{R}^{N \times F_{out}}$ are the vertex features at step t, σ is a nonlinear activation function, and W^(0), V^(0), W and V are trainable weight matrices. Â is the symmetrically normalized version of the adjacency matrix A.

The final output of the $\mathrm{ARMA}_K^T$ layer is computed as the average of the outputs of all stacks:

$$\bar{\mathbf{X}} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{X}_k^{(T)}. \quad (10)$$
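A minimal NumPy sketch of (8)-(10); the weights here are random placeholders, whereas the actual layer [20] learns them during training:

import numpy as np

def arma_layer(A_hat, X, K=2, T=8, F_out=8, sigma=np.tanh, seed=0):
    """ARMA_K^T layer: K parallel stacks of T propagation steps each,
    Eqs. (8)-(10). A_hat is the symmetrically normalized adjacency matrix."""
    rng = np.random.default_rng(seed)
    F_in = X.shape[1]
    outputs = []
    for k in range(K):
        W0 = rng.normal(size=(F_in, F_out))     # W^(0) in Eq. (8)
        V0 = rng.normal(size=(F_in, F_out))     # V^(0) in Eq. (8)
        W = rng.normal(size=(F_out, F_out))     # W shared over steps, Eq. (9)
        V = rng.normal(size=(F_in, F_out))      # V shared over steps, Eq. (9)
        Xk = sigma(A_hat @ X @ W0 + X @ V0)     # Eq. (8), t = 1
        for _ in range(2, T + 1):
            # Eq. (9): propagation plus a skip connection to the input X.
            Xk = sigma(A_hat @ Xk @ W + X @ V)
        outputs.append(Xk)
    return sum(outputs) / K                     # Eq. (10): average over stacks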

III. METHODOLOGY

A. Power grid representation

It is problematic to express the power flow solution in terms of complex bus voltages for data-driven GNN architectures that rely solely on localized operations. In polar coordinates, all buses with generators must be treated differently, since they have fixed voltage magnitudes, and there is an inherent ambiguity in the phase angles, which are tied to the choice of slack bus. Fixing or masking out values before computing the loss makes training more complicated, but alleviates the issue of having to adapt the voltage magnitude only for certain buses. On the other hand, to predict the phase angles it is necessary that the receptive field around every vertex covers the slack bus, and the model must be powerful enough to adjust all angles if the location of the slack bus changes.

To address these issues, we propose to predict the complex current and power injections on the branches instead. These quantities can be obtained from the complex bus voltages and can be used to express an equivalent power flow solution. Injections are made at both ends of a branch, so if power and current are split into their real and imaginary parts, the power flow optimization is framed as a vertex regression task, where the GNN predicts 8 quantities for each branch.

There are important advantages in computing the solution for the branches rather than voltage solutions for the buses.

1) None of these 8 quantities are fixed, meaning that it is not necessary to mask or fix values in the GNN output.
2) The ambiguity related to the phase angles is avoided.
3) Since the phase angles are not expressed as a difference from the slack bus, the sensitivity to the choice of slack bus is somewhat reduced.


Fig. 2: Transition from the bus graph (G) to the branch graph (line graph of G).

Indeed, the choice of slack bus will still affect the power flow solution through the active power component on this bus, which is unknown beforehand and will have an impact on the injections on the branches. Ideally, the receptive field of a GNN should be large enough to cover the slack bus when centered on any branch. However, since the active power of the slack bus mostly affects the injections on the closest branches, a sufficiently deep GNN works well even if coverage by the receptive field is not ensured for branches very far from the slack bus.

The shift in perspective from buses to branches can be seen as a conversion to a line graph representation [21] of the bus graph. Given a graph G, its line graph L(G) is a graph where each vertex represents an edge of G. Two vertices of L(G) are adjacent if and only if their corresponding edges in G are incident, i.e., they share a common endpoint. This yields an undirected graph, as illustrated in Figure 2.

Since each branch has a form of directionality related to the assignment of designated from and to ends for transformers, a directed graph could be used to model the branches' direction. However, in a directed graph the peripheral vertices at the end of a chain of unidirectional edges cannot share (or receive, depending on the edges' direction) information with the rest of the graph. This affects the learning process of a GNN, as the graph diffusion is hindered. Therefore, we have opted to keep the undirected graph representation, and the direction is implicitly expressed through the inclusion of bus features from the two endpoints. Specifically, each vertex in the line graph is associated with the following features:

• From the corresponding branch: r, x, b, τ, θshift
• From the bus endpoints: Pd, Qd, Gs, Bs, Pg, |Vg|, islack

where Pg and |Vg| are the active power and voltage magnitude of a connected generator, respectively, and islack is a two-dimensional one-hot vector indicating whether the bus at the corresponding endpoint is the slack bus. To summarize, each vertex i in the line graph is associated with a vector xi ∈ R^21 of input features: [branch(5); from-bus(8); to-bus(8)]. By concatenating the features of the from-bus before the features of the to-bus, the branch direction is implicitly given. The output features, to be approximated by the GNN, consist of a vector yi ∈ R^8 containing the following quantities:

• $P_1^{re}, P_1^{im}, I_1^{re}, I_1^{im}, P_2^{re}, P_2^{im}, I_2^{re}, I_2^{im}$

which are the real and imaginary components of the power and current at the two endpoints.

Fig. 3: General structure of the GNN model.

The last advantage of using a line graph is that its vertex features incorporate both the properties of the branch and of the two endpoint buses. Therefore, the line graph is represented by an adjacency matrix A and a vertex features matrix X, which can be processed by any standard GNN architecture. Conversely, when operating on the bus graph there is an additional edge features matrix representing the branch properties. While there are MP layers that handle edge features [22], they are generally used only as the first layer and their contribution vanishes in very deep architectures, like ours.
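The conversion can be sketched with NetworkX's line_graph utility; a minimal example on a hypothetical 4-bus grid, with the feature assembly only indicated in the comments:

import networkx as nx

# Hypothetical 4-bus grid: vertices are buses, edges are branches.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (1, 3), (2, 3)])

# Line graph: one vertex per branch; two vertices are adjacent iff the
# corresponding branches share a bus.
L = nx.line_graph(G)

# Each line-graph vertex is identified by its branch endpoints (u, v), so
# the 21 input features can be assembled as [branch(5); from-bus(8); to-bus(8)].
for (u, v) in L.nodes:
    print(f"branch ({u},{v}) is adjacent to {list(L.neighbors((u, v)))}")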

B. GNN Design

The vertex aggregation of the GNN model proposed here is done with $\mathrm{ARMA}_K^T$ layers. As discussed in Section II-D, ARMA layers make it possible to build very deep GNN architectures that are robust to oversmoothing. In principle, one could use a single ARMA layer with a very high number of iterations T to achieve the desired depth of the receptive field. However, we found experimentally that a GNN with more ARMA layers, each one with fewer recursions, performs better.

Before and after the aggregation stage, the model uses dense processing layers to map the individual vertex states. The idea is to allow the model to map the states into a more meaningful representation before the aggregation, and to map the final aggregated states into the output space. This configuration is aligned with the study on the architectural design space of GNNs in [23], which showed that such pre- and post-processing layers can improve performance when combined with different aggregation techniques. Fig. 3 shows a schematic depiction of the adopted GNN architecture.

C. MLP Design

MLP models are used to benchmark the GNN architecture. We consider two different configurations of MLPs: one that processes the feature vector of each vertex individually, and another that accounts for all vertices at once.

The first configuration, which we call the Local MLP, makes predictions for each branch using only its associated input features. In other words, the 8 target values for each branch are predicted using the aforementioned 21 input vertex features. See Fig. 4 for an illustration. Since the Local MLP is completely agnostic of the underlying topology of the grid, its performance can be used to quantify how well the information contained in the input features associated with a single branch can, alone, predict the output values for that branch.


Fig. 4: Local MLP. The number of units shown here does notreflect the exact amount used for the experiments.

In the second configuration, referred to as the Global MLP, predictions are made for all branches at once using the input features from all buses and branches. To accomplish this, all bus and branch input features are concatenated into two one-dimensional arrays. Moreover, to reduce the number of parameters in the input layer, bus and branch features are first processed by separate layers, and the outputs of these layers are then concatenated and fed to the hidden layers of the model, as shown in Fig. 5. At the end, this MLP predicts all the branch currents and power injections, i.e., the number of output units is equal to eight times the number of branches.

Fig. 5: Global MLP.

Clearly, the number of parameters of the Global MLP depends on the number of vertices and edges in the graph, which makes the model unsuitable to handle perturbations of the grid topology.

IV. RELATED WORK ON POWER FLOW BALANCING

a) Newton-Raphson: The traditional Newton-Raphson (NR) method iteratively solves a set of non-linear equations through a linear approximation based on a first-order Taylor polynomial. For the power flow problem this results in a linear system of equations of the form:

$$\begin{bmatrix} \Delta\theta_i \\ \Delta|V_i| \end{bmatrix} = -\mathbf{J}^{-1}\begin{bmatrix} \Delta P_i \\ \Delta Q_i \end{bmatrix}, \quad (11)$$

where ΔPi and ΔQi are the discrepancies between the expressions on the left- and right-hand sides of equations (1) and (2), respectively, and J is the Jacobian given by

$$\mathbf{J} = \begin{bmatrix} \partial\Delta P_i/\partial\theta_i & \partial\Delta P_i/\partial|V_i| \\ \partial\Delta Q_i/\partial\theta_i & \partial\Delta Q_i/\partial|V_i| \end{bmatrix}. \quad (12)$$

The resulting values, Δθi and Δ|Vi|, are used to update the complex bus voltages, and the updates continue until an equilibrium is reached, i.e., Δθi = 0 and Δ|Vi| = 0.

In practice, a single expression of the form of (11) is typically assembled for the full set of unknown θ and |V|, yielding a (NPV + 2NPQ) × (NPV + 2NPQ) Jacobian matrix. To improve the computational efficiency, the problem can also be reframed in the form Ax = b to avoid computing the inverse of J. One can also replace Δ|Vi| by Δ|Vi|/|Vi| and make analogous changes in J [24]. However, Newton-Raphson is still computationally expensive for large systems and, like most non-convex optimization procedures, the iterative update procedure is not guaranteed to converge.
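A schematic NumPy sketch of the NR iteration in (11), solving J Δx = [ΔP; ΔQ] instead of inverting J; `mismatch` and `jacobian` are hypothetical placeholders for the functions that evaluate the power mismatches and the Jacobian at the current voltage estimate:

import numpy as np

def newton_raphson_step(v, mismatch, jacobian):
    """One iteration of Eq. (11): solve J dx = [dP; dQ] rather than
    inverting J explicitly, then update the voltage estimate."""
    dx = np.linalg.solve(jacobian(v), mismatch(v))  # dx = [dtheta; d|V|]
    return v - dx

def newton_raphson(v0, mismatch, jacobian, tol=1e-8, max_iter=20):
    """Iterate until the power mismatches vanish (or give up)."""
    v = v0
    for _ in range(max_iter):
        if np.max(np.abs(mismatch(v))) < tol:       # equilibrium reached
            break
        v = newton_raphson_step(v, mismatch, jacobian)
    return v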

b) DCPF approximation: In the DC power flow (DCPF) approximation the power system is linearized, yielding an approximate solution that can be found with a non-iterative and absolutely convergent procedure. To accomplish this, all reactive power flows are ignored, all voltage magnitudes are assumed to be 1 p.u., phase angle differences are assumed to be small, and branches are considered lossless. As a result, the two power balance equations, (1) and (2), are reduced to the single equation [25]:

$$P_i = \sum_{k=1}^{N_b} B_{ik}\theta_{ik}. \quad (13)$$

Thus, compared to the ACPF solution, a solution for θ can be found analytically, at the expense of accuracy. In practice, DCPF is only useful in tasks where fast computations are more important than obtaining accurate solutions [26]. Nevertheless, this approximation serves as a good baseline for any model that aims to be faster than NR, while also achieving high accuracy.

c) GNN-based methods: These methods are the most closely related to our work. In the following, we briefly point out their limitations, which are addressed by our framework.

Bolz et al. propose to compute the power flow balance by applying a GNN on a pre-computed correlation matrix that determines the neighborhood of the grid buses [8]. The graph associated with such a correlation matrix does not match the physical infrastructure of the power grid and does not make it possible to account for the physical quantities of the power lines.

In the study by Owerko et al. [7], the top-performing GNN used to solve a version of the optimal power flow problem utilizes global fully connected layers at the end, which makes the overall architecture dependent on the specific topology of the grid. This prevents the use of this GNN to account for perturbations, such as line disconnections, and to transfer the model to new grids. In general, this architecture does not fully exploit the localized computations that characterize the GNN model. In addition, the optimal power flow problem was simplified, since the only predicted values are the active generator power outputs.


Donon et al. [5] propose a supervised GNN model that operates on a graph represented by multiple adjacency matrices, where the vertices are the transmission lines of the grid, rather than the buses. Similarly to ours, their solution prevents certain ambiguities in the bus solution tied to the voltage phase angles and bus assignment. Drawbacks in their approach are the omission of branch attributes in the model, which are important to compute the power flows, and the use of a specialized architecture to implement a GNN with a large receptive field.

In a follow-up work, Donon et al. [6] design an unsupervised model that relies on physical equations to iteratively optimize the voltage and phase angle of each bus. Although the experimental results show that this model can somewhat generalize to unseen topologies, the framework is quite complex and relies on a custom formulation of the power flow problem. Most importantly, the GNN is used to implement only a specific operation within the whole pipeline, namely to compute Δθi and Δ|Vi| from ΔPi and ΔQi, which is the same type of operation performed by the NR method with a linear approximation in (11).

Since ΔPi and ΔQi, the active and reactive power updates at each vertex, are computed outside the GNN using global operations that account for the whole grid topology, it is not clear how useful the local exchanges implemented by the GNN are for computing the voltage and phase angle updates. Arguably, a simpler function approximator such as an MLP could have worked similarly well.

V. EXPERIMENTS & RESULTS

A. Datasets and Algorithms

We perform three experiments to evaluate the performance of the proposed GNN-based power solver, covering cases where: i) the underlying grid topology is fixed; ii) the topology is perturbed with branch disconnections; iii) multiple grids with different topologies are used to train the GNN model.

All the deep learning models are implemented in TensorFlow [27] and we used the Spektral library [28] to build the GNN architectures. Power grid data for training and testing are obtained by resampling values in the reference case grids provided by MATPOWER [29], an open-source Matlab library for solving steady-state power system simulation and optimization problems, such as power flow balancing. MATPOWER is also used to compute the solutions with DCPF and the NR optimizer. The outputs of the NR optimizer are used as targets for training the deep learning models.

a) Data Acquisition: Most of the grid systems considered here are based on IEEE systems; an overview is given in Table I. To ensure that a feasible solution exists, only grids that are successfully solved by the NR solver in MATPOWER are included in the dataset. The resampling procedure we adopted is detailed in the following; a code sketch of the sampling pattern is given after the list.

• Active and reactive loads on buses are uniformly sampled between 50% and 150% of the reference values.
• Shunt susceptances and conductances are uniformly sampled between 75% and 125% of the reference values.

TABLE I: Grid overview

Grid           Details
case9          9 buses, 3 generators; based on data from [30].
case14         Conversion of the IEEE 14-bus test case.
case30         Conversion of the IEEE 30-bus test case. Named case_ieee30 in MATPOWER.
case39         39-bus New England system. Modified version of data from [31].
case57         Conversion of the IEEE 57-bus test case.
case89pegase   Small part of a European system stemming from the Pan European Grid Advanced Simulation and State Estimation (PEGASE) project [32], [33].
case118        Conversion of the IEEE 118-bus test case.
case300        Conversion of the IEEE 300-bus test case.

• The active power of generators is uniformly sampled from the reference values, in a range going from 75% up to the lower of 125% and the maximum allowed power output Pg,max. There is a chance that the sampled production becomes so high that the slack bus ends up with a negative active power injection. Since this occurs more often for certain grids, the sampling procedure is reset whenever this happens, to avoid creating a bias in the dataset.
• The voltage magnitude of generators is uniformly sampled from 0.95 p.u. to 1.05 p.u.
• Branch resistances, reactances and charging susceptances are uniformly sampled between 90% and 110% of the reference values.
• The branch transformer off-nominal turns ratio is uniformly sampled from 0.8 to 1.2. The shift angle of the transformers is uniformly sampled from -0.2 rad to 0.2 rad.
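A minimal sketch of the sampling pattern, assuming a dictionary `ref` of reference values (the field names are illustrative and do not follow the actual MATPOWER column layout):

import numpy as np

rng = np.random.default_rng()

def resample_case(ref):
    """Draw one perturbed grid instance around the reference values,
    following the ranges listed above (loads 50-150%, shunts 75-125%, ...)."""
    case = {}
    case["Pd"] = ref["Pd"] * rng.uniform(0.50, 1.50, size=ref["Pd"].shape)
    case["Qd"] = ref["Qd"] * rng.uniform(0.50, 1.50, size=ref["Qd"].shape)
    case["Gs"] = ref["Gs"] * rng.uniform(0.75, 1.25, size=ref["Gs"].shape)
    case["Bs"] = ref["Bs"] * rng.uniform(0.75, 1.25, size=ref["Bs"].shape)
    case["Vg"] = rng.uniform(0.95, 1.05, size=ref["Vg"].shape)
    case["r"]  = ref["r"]  * rng.uniform(0.90, 1.10, size=ref["r"].shape)
    case["x"]  = ref["x"]  * rng.uniform(0.90, 1.10, size=ref["x"].shape)
    case["tau"]   = rng.uniform(0.8, 1.2, size=ref["tau"].shape)
    case["shift"] = rng.uniform(-0.2, 0.2, size=ref["shift"].shape)
    # Generator active power: 75% up to min(125%, Pg_max) of the reference.
    hi = np.minimum(1.25 * ref["Pg"], ref["Pg_max"])
    case["Pg"] = rng.uniform(0.75 * ref["Pg"], hi)
    return case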

High spreads in the input values encourage a model to learn large weights. Such a model may experience instability during learning, become sensitive to the input values, and achieve poor generalization. Similarly, a large spread of values in the output variables may result in large error gradients that dramatically change the weight values and make the learning process unstable [34]. Therefore, to construct the input and output values in each dataset, all load, generator power and shunt quantities are converted to per-unit values (p.u.), using the base apparent power of the grid as reference. All system quantities (except the transformer tap ratio and shift angle) are expressed in p.u. rather than in the original units of measurement described in Section II-A. With this conversion, different grids have more similar value distributions, which makes it easier to apply the same deep learning model to different grids.

b) Deep Learning Models: The proposed GNN model consists of two fully connected (FC) pre-processing layers, five $\mathrm{ARMA}_2^8$ layers, and two post-processing layers. All of these layers have 64 hidden units and use Leaky ReLU activations with a negative slope coefficient of α = 0.2. At the end there is an FC layer with 8 output units and a linear activation. Due to the use of 5 ARMA layers with 8 iterations each, the model implements a total of 40 propagation steps. This defines the receptive field of the GNN, i.e., the GNN allows each graph vertex to exchange information with the neighbors within 40 hops on the graph. This configuration was found through a hyperparameter tuning session using Bayesian optimization on the data for the second experiment. As a benchmark, we also consider a GNN model where the ARMA layers are replaced with the conventional GCN layers discussed in Section II-D. Since each GCN layer implements only a single propagation step, we built a GNN with 40 GCN layers of 64 hidden units to obtain the same receptive field as the GNN with ARMA layers.
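A sketch of this architecture using Spektral's ARMAConv layer; we assume here that its `order` argument corresponds to the number of stacks K and `iterations` to the number of propagation steps T:

import tensorflow as tf
from spektral.layers import ARMAConv

def leaky_relu(z):
    # Leaky ReLU with negative slope 0.2, as used throughout the models.
    return tf.nn.leaky_relu(z, alpha=0.2)

class ArmaGNN(tf.keras.Model):
    """Sketch of the proposed model: two FC pre-processing layers, five
    ARMA layers with K=2 stacks and T=8 iterations each (40 propagation
    steps in total), two FC post-processing layers, and a linear head."""

    def __init__(self, hidden=64, f_out=8):
        super().__init__()
        self.pre = [tf.keras.layers.Dense(hidden, activation=leaky_relu)
                    for _ in range(2)]
        self.mp = [ARMAConv(hidden, order=2, iterations=8,
                            activation=leaky_relu) for _ in range(5)]
        self.post = [tf.keras.layers.Dense(hidden, activation=leaky_relu)
                     for _ in range(2)]
        self.head = tf.keras.layers.Dense(f_out, activation="linear")

    def call(self, inputs):
        x, a = inputs  # vertex features (N x 21) and adjacency matrix (N x N)
        for layer in self.pre:
            x = layer(x)
        for layer in self.mp:
            x = layer([x, a])
        for layer in self.post:
            x = layer(x)
        return self.head(x)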

The Local MLP is designed to have roughly the same number of trainable parameters as the ARMA GNN. Specifically, the Local MLP consists of three hidden layers with 256 units, two hidden layers with 128 units, one layer with 64 units, and an output layer with 8 units. All hidden layers are equipped with a Leaky ReLU with α = 0.2.

For the Global MLP, the number of parameters depends on the size of the grid(s) it is trained on; thus, the size of this model is not scaled to match exactly the number of parameters of the GNN. Since the Global MLP contains a huge number of parameters in the input and output layers, we limited the depth of this model to prevent overfitting and to avoid having a number of parameters that is several orders of magnitude larger than in the other architectures. Specifically, the Global MLP consists of bus and branch pre-processing layers with 64 units each, two hidden layers with 128 units, and an output layer with a number of units equal to eight times the number of branches. The processing layers and the hidden layers use Leaky ReLU activations with α = 0.2.

An overview of the model architectures is given in Figure 6. All models are trained in a supervised manner with a stochastic gradient descent learning algorithm, using a mean squared error (MSE) loss between the predictions and the output of the NR solver in MATPOWER.

Fig. 6: Overview of the architecture of the GNN models and the two MLP models, as a combination of message passing (MP) and fully connected (FC) layers. The numbers in parentheses indicate the number of hidden/processing units. For the Global MLP model, Fout is equal to eight times the number of branches of the applied grid.

B. Constant Grid Topology

In this first experiment the models are trained only on instances of the same grid topology, obtained with the resampling procedure described in Section V-A.a. This is the simplest use case, where the models only need to adapt to perturbations of the values of a specific set of components, but not to changes in the underlying graph topology.

For this task, samples of three different MATPOWER case grids are considered: case30, case118 and case300. A total of 15 000 samples were generated for each of these case grids, and train/validation/test sets were obtained following a 56/14/30 split. Ten instances of the Local MLP, Global MLP, GCN and ARMA GNN models were trained for 250 epochs, using the ADAM optimizer with learning rate 0.001 and batch size 16. After training, the weights of the model that achieved the lowest validation loss were restored.

TABLE II: NRMSE for test set

Method       case30         case118        case300
DCPF         0.574          0.674          0.745
Local MLP    0.350±0.002    0.447±0.002    0.528±0.003
Global MLP   0.044±0.004    0.134±0.003    0.314±0.007
GCN          0.611±0.03     0.887±0.02     0.851±0.02
ARMA GNN     0.022±0.001    0.057±0.002    0.199±0.004

Table II displays the normalized root mean squared error (NRMSE) values for the branch current and power predictions. The NRMSE is computed as

$$\mathrm{NRMSE}(\hat{\mathbf{Y}}, \mathbf{Y}) = \frac{1}{F}\sum_{j=1}^{F}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\frac{(\hat{Y}_{ij} - Y_{ij})^2}{\mathrm{Var}(\mathbf{y}_j)}}, \quad (14)$$

where F is the number of output features, N is the number of samples, and Var(yj) is the sample variance of the j-th output feature. The ARMA GNN outperforms both MLP models, the GCN and the DCPF approximation. Compared to the Global MLP, the ARMA GNN achieves a performance improvement ranging from roughly 40 to 60 percent. It should be noted that the ratio of the number of parameters between the Global MLP and the ARMA GNN is roughly 1:2 for case30, 1.5:1 for case118 and 3:1 for case300, so the difference in performance is clearly not due to a higher capacity of the GNN model.

The GCN falls behind all other models in terms of NRMSE, which is likely due to the oversmoothing problem, i.e., the model is not able to produce a sharp signal in output. An illustration of the oversmoothing problem in the GCN is shown in Figure 7, which displays, for each vertex of case118, the mean cosine distance between the model predictions and the overall average branch label. The cosine distance is computed as

$$\text{cosine distance} = 1 - \frac{\mathbf{y}_{\text{pred}} \cdot \mathbf{y}_{\text{label}}}{\|\mathbf{y}_{\text{pred}}\|\,\|\mathbf{y}_{\text{label}}\|}, \quad (15)$$

where ylabel is the average output vector in the test set. We obtain the cosine distances for the true labels by replacing the predictions ypred with the true outputs y in (15). From the figure, it is clear that the ARMA GNN manages to produce sharp outputs that coincide very well with the true labels, whereas the GCN only produces low-frequency signals, i.e., the output associated with neighboring vertices on the graph is very homogeneous. This is a strong limitation of the GCN model, since in the power flow balance state the output of neighboring vertices is not necessarily similar. In the remaining experiments we only consider the GNN with ARMA layers.
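For reference, a NumPy sketch of the two evaluation metrics in (14) and (15):

import numpy as np

def nrmse(Y_pred, Y_true):
    """Eq. (14): per-feature RMSE normalized by the variance of that output
    feature across the samples, averaged over the F output features."""
    mse_per_feature = np.mean((Y_pred - Y_true) ** 2, axis=0)
    var_per_feature = np.var(Y_true, axis=0)
    return np.mean(np.sqrt(mse_per_feature / var_per_feature))

def cosine_distance(y_pred, y_label):
    """Eq. (15): one minus the cosine similarity between a prediction
    and the average branch label."""
    num = np.dot(y_pred, y_label)
    den = np.linalg.norm(y_pred) * np.linalg.norm(y_label)
    return 1.0 - num / den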


(a) GCN (b) ARMA (c) True values

Fig. 7: Cosine distances between the branch values predicted by the GNNs and the mean of all the true branch values. The distances are averaged over all test samples of case118.

Fig. 8: Scatter plot of the NRMSE scores achieved by the ARMA GNN model when trained and tested on different datasets.

Finally, we test how well the ARMA GNN performs when trained on samples extracted from a specific grid and then tested on samples derived from completely different power grids. For this experiment, we consider case30, case118, and case300. Figure 8 shows the performance of the ARMA GNN both on the grid used for training and on the other two grids that are only used for testing. As expected, the GNN achieves better performance when tested on samples derived from the same grid used for training. Interestingly, the GNN trained on case118 performs particularly badly when tested on samples from case30 and case300.

C. Perturbations of One Grid Topology

In this second experiment the models are trained and tested on perturbations of a reference grid where, in addition to the sampling procedure described in Section V-A.a, parts of the network are disconnected. This setup reflects scenarios where the grid is affected by outages or maintenance operations; it is crucial that the power flow solver gives accurate predictions in such cases if it is meant to be used in, e.g., contingency analysis.

To create a topology perturbation, 5-20 randomly chosen branches that are not directly connected to the slack bus are disconnected. If the graph becomes disconnected, all the components that do not include the slack bus are removed. Thus, the total number of disconnections can become larger than what was initially sampled. With this method, there is a chance that the component with the slack bus is actually the smallest one and that most of the graph would be discarded. To avoid this situation, any resulting grid with less than 10% of the original buses is discarded. Finally, the value sampling procedure described in Section V-A.a is applied. If the sampling fails, i.e. the NR solver does not converge, the process restarts from the disconnection step.

Fig. 9: Bar plot showing how many perturbation samples have a given number of branches remaining (logarithmic y-axis). For reference, the full case300 grid has 411 branches.

A total of 15 000 samples from case300 were generated and split into datasets following the same division as in the first experiment. Figure 9 shows the distribution of the remaining branches in the final dataset. Despite the initial disconnections being chosen uniformly, the resulting distribution is not uniform, since additional disconnections are triggered when a component of the graph becomes isolated. It is also possible that the NR solver does not converge on certain configurations, which would contribute to shaping the non-uniform distribution.

The same configuration as in the previous experiment was used to train ten instances of the Local MLP, Global MLP, and ARMA GNN models. For the Global MLP, since the parameters in the input and output layers are tied to the original topology of the case300 grid, we used a padding strategy to deal with a varying number of vertices/edges in the graph; in particular, the values of missing components are set to zero during inference, as in the sketch below.
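A minimal sketch of this padding strategy, assuming a fixed reference ordering of the components of the original grid:

import numpy as np

def pad_features(values, present_idx, full_size):
    """Scatter the features of the components that survive the perturbation
    into a zero vector with one slot per component of the original grid."""
    padded = np.zeros(full_size)
    padded[present_idx] = values
    return padded

# E.g., branches 0, 2 and 3 of a 5-branch grid remain after a disconnection.
print(pad_features(np.array([1.2, 0.7, 0.9]), [0, 2, 3], 5))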

The resulting NRMSE values for the test set are given in Table III.


TABLE III: NRMSE for test set

Method       case300 perturbations
DCPF         0.762
Local MLP    0.554±0.001
Global MLP   0.371±0.004
ARMA GNN     0.243±0.002

As expected, a comparison with the previous experiment shows that the performance of each model decreases in the presence of disconnections. The slight increase of the NRMSE for DCPF suggests that when the underlying topology is perturbed, the resulting samples might, on average, violate the DCPF assumptions to a greater extent. The ranking of the models remains the same as in the previous experiment. The ARMA GNN is still the best performing model, followed by the Global MLP, which maintains reasonable performance even under perturbations of the underlying topology. This result might appear surprising at first. However, it can be explained by the fact that even if the topology changes, each bus/branch in the grid is always assigned to the same input unit of the Global MLP. In addition, when a vertex/edge is dropped, it is simply replaced with a zero input. Therefore, there is still a one-to-one correspondence between the grid components and the input units, which allows the Global MLP to factor out (to a certain extent) the underlying graph topology.

D. Multiple Grids with Different Topology

In this final experiment the models are trained simultaneously on samples of six different grids: case9, case14, case30, case39, case89pegase and case118. In addition, the models are used to compute solutions for samples of case57 and case300, which are completely unseen during training.

To let the Global MLP process samples from grids with different topologies, the input and output layers are modified such that the model can accept samples from any of the grids seen during training. In addition, the Global MLP can only process samples from unseen grids that have a number of vertices/edges less than or equal to the maximum number of vertices/edges seen during training. Each processing unit in these layers is associated with a specific attribute, such as the active load or the branch real current injection, but, unlike in the configuration of the two previous experiments, these units are not associated with a specific bus or branch. Thus, for samples from the smaller grids not all the units are utilized, and the input and output arrays are zero-padded. On the other hand, since the Local MLP and the ARMA GNN use only localized operations that do not depend on the number of buses and edges in the grids, their architectures are unaltered.

A total of 5 000 samples are generated for each of the grids. The dataset split and training configuration remain the same as in the previous two experiments, except that the batch size was increased to 32 due to the larger total number of samples. Ten instances of the Local MLP, Global MLP and ARMA GNN were trained on the mixed-grid dataset. Table IV reports the NRMSE obtained when testing on new samples, both from the grids used for training and from the two grids (case57 and case300) that were excluded from the training set.

TABLE IV: NRMSE for test set

Dataset        DCPF     Local MLP      Global MLP     ARMA GNN
case9          0.512    0.501±0.019    0.121±0.003    0.207±0.010
case14         0.559    0.361±0.014    0.206±0.004    0.188±0.005
case30         0.573    0.411±0.011    0.313±0.006    0.212±0.007
case39         0.552    0.550±0.002    0.249±0.001    0.097±0.003
case89pegase   0.606    0.462±0.008    0.791±0.006    0.071±0.001
case118        0.674    0.456±0.002    0.672±0.015    0.140±0.005
case57         0.584    1.213±0.079    3.168±0.338    1.345±0.232
case300        0.746    5.361±1.499    N/A            2.478±1.636

When testing against samples coming from the six grids used for training, the ARMA GNN achieves the top performance on all datasets except the case9 data, where the Global MLP achieves a better score. However, for the larger grids the GNN surpasses the Global MLP by a large margin, particularly for case89pegase. Since there is no longer a one-to-one correspondence between the input units and the grid buses, the Global MLP cannot infer the graph topology from the training samples. Indeed, the Global MLP is unable to produce accurate solutions, since the same input unit must accommodate, at the same time, very different types of buses and branches from multiple grids. This experiment highlights the advantage of using an architecture that exploits localized operations that are independent of the underlying topology.

For the unseen grids, the NRMSE scores are substantially larger than for the grids used in training. The main issue is that the data distributions of the training and test sets are too different from each other, which violates one of the fundamental assumptions in statistics and machine learning. Therefore, it is not surprising that the simple algorithmic solution produced by DCPF, where no model is learned, is more suitable in these cases. It is, however, worth noticing that the ARMA GNN clearly outperforms the Global MLP also on the two unseen grids, case57 and case300. Moreover, when comparing with the first experiment, we see that a GNN trained on multiple topologies improves its performance on unseen grids. This improvement comes at the expense of a small drop in performance on samples from the grids seen during training. Finally, we notice that the Global MLP is not able to compute predictions for case300, since the number of buses/branches is greater than in the grids used for training and the model does not have sufficient input and output units available.

VI. DISCUSSION & CONCLUSIONS

We investigated the potential of a fully localized GNN trained end-to-end to balance power flows. To enable the exchange of information between distant vertices in the graph, we built a very deep GNN architecture endowed with ARMA layers, which prevent oversmoothing of the output features. We also adopted a line graph representation to avoid having different types of nodes and expressing the phase angles as a function of the slack bus, which can be difficult to handle for a fully decentralized GNN. Importantly, the proposed representation encompasses all the relevant physical quantities of the power system, and the data format can be seamlessly processed by all the most common GNN architectures.

The experimental results show that the proposed ARMA GNN model outperforms a Global MLP with a larger capacity and a global view of the graph, even when computing solutions for a fixed grid topology. This demonstrates the importance of using models that directly account for the graph topology of the grid, and the effectiveness of the proposed line graph representation.

Compared to an MLP, a single GNN model can be trained on multiple grid topologies and achieves good performance on samples from the same grids used in training. However, even if the GNN outperforms the MLP, no model is particularly effective in predicting power flows for grids unseen during training. The most likely explanation is that the unseen grids and the training grids are too different, and statistical models generalize well only when the data distributions are the same in the training and in the test set.

Finally, we observed that the Global MLP achieves lower performance than the ARMA GNN, yet still performs reasonably under perturbations of a specific grid topology. The reason is that there is still a one-to-one correspondence between the input units of the MLP and the buses in the graph, which makes it possible to infer the underlying topology. Moreover, the perturbations might act as a regularization factor, which forces the MLP not to rely on all components being present at the same time. On the other hand, the MLP is completely unable to handle, at the same time, samples coming from different grid topologies. Similarly, it would be impossible for the MLP to handle perturbations in the form of additions of buses/branches rather than disconnections. In this case, designing a Global MLP that matches every conceivable configuration is clearly unfeasible, whereas a GNN with fully localized operations has a clear advantage.

REFERENCES

[1] M. A. Salam, Fundamentals of Electrical Power Systems Analysis. Springer, 2020.
[2] C. Fuerte-Esquivel and E. Acha, "Unified power flow controller: a critical comparison of Newton-Raphson UPFC algorithms in power flow studies," IEE Proceedings - Generation, Transmission and Distribution, vol. 144, no. 5, pp. 437–444, 1997.
[3] V. L. Paucar and M. J. Rider, "Artificial neural networks for solving the power flow problem in electric power systems," Electric Power Systems Research, vol. 62, no. 2, pp. 139–144, 2002.
[4] B. Donnot, I. Guyon, M. Schoenauer, A. Marot, and P. Panciatici, "Fast power system security analysis with guided dropout," arXiv preprint arXiv:1801.09870, 2018.
[5] B. Donon, B. Donnot, I. Guyon, and A. Marot, "Graph neural solver for power systems," in 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.
[6] B. Donon, R. Clement, B. Donnot, A. Marot, I. Guyon, and M. Schoenauer, "Neural networks for power flow: Graph neural solver," Electric Power Systems Research, vol. 189, p. 106547, 2020.
[7] D. Owerko, F. Gama, and A. Ribeiro, "Optimal power flow using graph neural networks," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020, pp. 5930–5934.
[8] V. Bolz, J. Rueß, and A. Zell, "Power flow approximation based on graph convolutional networks," in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 1679–1686.
[9] C. Kim, K. Kim, P. Balaprakash, and M. Anitescu, "Graph convolutional neural networks for optimal load shedding under line contingency," in 2019 IEEE Power & Energy Society General Meeting (PESGM). IEEE, 2019, pp. 1–5.
[10] D. Wang, K. Zheng, Q. Chen, G. Luo, and X. Zhang, "Probabilistic power flow solution with graph convolutional network," in 2020 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe). IEEE, 2020, pp. 650–654.
[11] C. Nauck, M. Lindner, K. Schurholt, H. Zhang, P. Schultz, J. Kurths, I. Isenhardt, and F. Hellmann, "Predicting dynamic stability of power grids using graph neural networks," arXiv preprint arXiv:2108.08230, 2021.
[12] A. G. Exposito, J. L. M. Ramos, and J. R. Santos, "Slack bus selection to minimize the system power imbalance in load-flow studies," IEEE Transactions on Power Systems, vol. 19, no. 2, pp. 987–995, 2004.
[13] F. Milano, Power System Modelling and Scripting. Springer Science & Business Media, 2010.
[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[15] D. Bacciu, F. Errica, A. Micheli, and M. Podda, "A gentle introduction to deep learning for graphs," Neural Networks, vol. 129, pp. 203–221, 2020.
[16] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proceedings of the 5th International Conference on Learning Representations (ICLR), 2017.
[17] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," International Conference on Learning Representations, 2017.
[18] W. L. Hamilton, R. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 1025–1035.
[19] Q. Li, Z. Han, and X.-M. Wu, "Deeper insights into graph convolutional networks for semi-supervised learning," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[20] F. M. Bianchi, D. Grattarola, L. Livi, and C. Alippi, "Graph neural networks with convolutional ARMA filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[21] F. Harary and R. Z. Norman, "Some properties of line digraphs," Rendiconti del Circolo Matematico di Palermo, vol. 9, no. 2, pp. 161–168, 1960.
[22] M. Simonovsky and N. Komodakis, "Dynamic edge-conditioned filters in convolutional neural networks on graphs," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3693–3702.
[23] J. You, Z. Ying, and J. Leskovec, "Design space for graph neural networks," Advances in Neural Information Processing Systems, vol. 33, 2020.
[24] A. J. Conejo and L. Baringo, Power System Operations. Springer, 2018.
[25] H. Seifi and M. S. Sepasian, Electric Power System Planning: Issues, Algorithms and Solutions. Springer-Verlag Berlin Heidelberg, 2011.
[26] J. Zhu, Optimization of Power System Operation. John Wiley & Sons, 2015.
[27] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "TensorFlow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.
[28] D. Grattarola and C. Alippi, "Graph neural networks in TensorFlow and Keras with Spektral [application notes]," IEEE Computational Intelligence Magazine, vol. 16, no. 1, pp. 99–106, 2021.
[29] R. D. Zimmerman, C. E. Murillo-Sanchez, and R. J. Thomas, "MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education," IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 12–19, 2010.
[30] J. H. Chow, Time-Scale Modeling of Dynamic Networks with Applications to Power Systems. Springer, 1982, vol. 46.
[31] G. Bills, "On-line stability analysis study, RP 90-1," Edison Electric Institute, Tech. Rep., 1970.
[32] C. Josz, S. Fliscounakis, J. Maeght, and P. Panciatici, "AC power flow data in MATPOWER and QCQP format: iTesla, RTE snapshots, and PEGASE," arXiv preprint arXiv:1603.01533, 2016.
[33] S. Fliscounakis, P. Panciatici, F. Capitanescu, and L. Wehenkel, "Contingency ranking with respect to overloads in very large power systems taking into account uncertainty, preventive, and corrective actions," IEEE Transactions on Power Systems, vol. 28, no. 4, pp. 4909–4917, 2013.
[34] C. M. Bishop et al., Neural Networks for Pattern Recognition. Oxford University Press, 1995.