S3385 Porting Marine Ecosystem Model

download S3385 Porting Marine Ecosystem Model

of 18

Transcript of S3385 Porting Marine Ecosystem Model

  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    1/18

    Porting Marine Ecosystem Model Spin-Up UsingTransport Matrices to GPUs

    E. Siewertsen, J. Piwonski, T. Slawig

    CAU - Christian-Albrechts-Universitat zu Kiel

    19. Marz 2013

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 1 / 18

    http://find/http://goback/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    2/18

    Outline

    Motivation

    Algorithm

    Operations

    Software Porting to GPU

    Biogeochemical model (Fortran) Model driver (C)

    Results Remarks

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 2 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    3/18

    Motivation

    Global carbon cycle, CO2 uptake of the worlds oceans

    Parameter estimation in biogeochemical models

    -180 -135 -90 -45 0 45 90 135 180

    Longitude [degrees]

    -80

    -60

    -40

    -20

    0

    20

    40

    60

    80

    Latitude

    [deg

    rees]

    0.6

    0.8

    1.0

    1.2

    1.4

    1.6

    1.8

    2.0

    2.2

    Simulated concentration of nutrients (phosphate, P O34

    ) at surface layer in

    mmolP/m3. The longitudinal and latitudinal resolution is at 1.0.

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 3 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    4/18

    Motivation contd

    Ocean Biogeochemical Dynamics [Sarmiento and Gruber, 2006]

    System of transport equations for biogeochemical tracers:

    yit = (yi)

    diffusion

    (v yi) advection

    + qi(y, u) reaction

    , i = 1, . . .

    Climatological (annual periodic) forcing

    Solution is steady annual periodic state (equilibrium)

    Involves integration over thousands of model years

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 4 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    5/18

    Motivation contd

    Transport Matrix Method [Khatiwala et al., 2005]

    Discretized and approximated problem:

    yj+1 = A imp,j (A exp,j

    transport matrices

    yj + t qj(yj ,u)), j = 0, . . .

    Monthly averaged matrices provided

    Interpolation needed:

    A imp,j = j Ai[i,j ] + j Ai[i,j ]A exp,j = j Ae[i,j ] + j Ae[i,j ]

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 5 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    6/18

    Algorithm

    Assuming: 1 year 360 days and t 3 h

    1: y = y02: repeat

    3: for j = 1, . . . , 2880 do4: evaluate biogeochemical model: yq = qj(y,u)5: interpolate matrices to time step j6: perform explicit step: y = Aexp,j y7: perform implicit step: y = Aimp,j(y + tyq)

    8: end for9: until steady annual cycle is reached

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 6 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    7/18

    Operations

    Evaluate biogeochemical model: BGCStep(); Including copying between different data alignments

    Interpolate matrices:

    MatCopy(); MatScale(); MatAXPY();

    Apply explicit and implicit step: MatMult();

    99.4% of computational effort is spent by these operations.

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 7 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    8/18

    Porting to GPU

    Software: Metos3D [Piwonski and Slawig, 2013] github.com/metos3d PETSc based implementation in C [Balay et al., 1997] Using PETSc matrix and vector operations Coupling biogeochemical models implemented in Fortran

    Objectives: Do not change Fortran implementation at all

    Do not change C implementation, if possible

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 8 / 18

    http://localhost/var/www/apps/conversion/tmp/scratch_9/github.com/metos3dhttp://localhost/var/www/apps/conversion/tmp/scratch_9/github.com/metos3dhttp://find/http://goback/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    9/18

    Biogeochemical model

    Fortran implementation: Obtained PGI CUDA Fortran compiler license Using wrapper file model.CUF with Fortran kernels Including original through: #include "model.F" Using macro to change subroutine to

    attributes(device) subroutine

    Copying between data alignments: Thrust iterators

    Operator overloading

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 9 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    10/18

    Model Driver

    Objective: Do not change software implementation

    Translates to: Change library beneath

    PETSc-dev GPU enabled PETSc version MatMult() is already implemented [Minden et al., 2010] MatCopy(), MatScale(), MatAXPY() have to added ..

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 10 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    11/18

    PETSc contd

    PETSc classes

    Basic PETSc principles:

    Object oriented programming using C language: Data encapsulation Polymorphism Inheritance

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 11 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    12/18

    PETSc contd

    Class Mat

    Operations:

    struct MatOps {...

    PetscErrorCode (*mult)(Mat,Vec,Vec);

    PetscErrorCode (*copy)(Mat,Mat,...);

    PetscErrorCode (*scale)(Mat,PetscScalar);

    PetscErrorCode (*axpy)(Mat,PetscScalar,Mat,...);

    ...

    };

    Adding own implementation:

    M->ops->scale = MatScale SeqAIJCUSP;

    ...

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 12 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    13/18

    Results

    GPU: GeForce GTX 480 CPU: Intel Xeon E5520, running at 2.27 GHz

    Biogeochemical model: N-DOP [Dutkiewicz et al., 2005]

    Min Max Avg StdDevCPU 621.43 s 626.79 s 622.14 s 0.540GPU 28.17 s 28.20 s 28.18 s 0.003

    22.06

    Overall performance gain simulating one model year using the

    N-DOP model at a longitudinal and latitudinal resolution of2.8125.

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 13 / 18

    http://goforward/http://find/http://goback/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    14/18

    Results contd

    Performance gain per operation:

    Routine CPU GPU CPU : GPU

    BGCStep 469.76 s 13.05 s 36.00MatCopy 34.04 s 3.91 s 8.70MatScale

    23.33 s 1.99 s11.70

    MatAXPY 37.49 s 2.89 s 12.96MatMult 58.19 s 5.87 s 9.92

    Performance ofMatMult in detail: CPU: 1.2 GFlop/s (13 % of 9.08 GFlop/s), 12 GB/s (56.8 % of 21.2 GB/s) GPU: 11.9 GFlop/s (7 % of 168 GFlop/s), 119.4 GB/s (67.4% of 177 GB/s)

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 14 / 18

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    15/18

    Results contd

    B G C S t e p

    7 5 . 0 %

    M a t C o p y

    5 . 4 %

    M a t S c a l e

    3 . 7 %

    M a t A X P Y

    6 . 0 %

    M a t M u l t

    9 . 3 %

    O t h e r

    N - D O P m o d e l , C P U ( 6 2 6 . 0 7 s / a )

    B G C S t e p

    4 6 . 0 %

    M a t C o p y

    1 3 . 8 %

    M a t S c a l e

    7 . 0 %

    M a t A X P Y

    1 0 . 2 %

    M a t M u l t

    2 0 . 7 %

    O t h e r

    2 . 2 %

    N - D O P m o d e l , G P U ( 2 8 . 3 4 s / a )

    [ b l o c k s i z e 1 6 0 ]

    Distribution of computational time per operation during

    the simulation of one model year.

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 15 / 18

    l d

    http://find/http://goback/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    16/18

    Results contd

    1 01 7

    2 0 2 8 3 0 4 0 5 0 5 6 6 0

    p r o c e s s o r c o u n t

    2 8

    5 0

    1 0 0

    1 5 0

    t

    i

    m

    e

    p

    e

    r

    c

    y

    c

    l

    e

    [

    s

    ]

    S i m u l a t i o n o f o n e c y c l e ( N - D O P , 2 . 8 1 2 5 , 2 8 8 0 t i m e s t e p s )

    2 . 1 0 G H z A M D B a r c e l o n a ( r z c l u s t e r )

    2 . 6 7 G H z I n t e l W e s t m e r e ( r z c l u s t e r )

    2 . 9 3 G H z I n t e l G a i n e s t o w n ( H L R N )

    G e F o r c e G T X 4 8 0

    Performance of GPU simulation compared to different CPU clusters.

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 16 / 18

    R k

    http://find/http://goback/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    17/18

    Remarks

    PGI Fortran preprocessor didnt like: #define subroutine attributes(device) subroutine

    C preprocessor was used Only sequential PETSc operations were implemented

    Ongoing work on multi GPU implementation

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 17 / 18

    Th k

    http://find/
  • 7/27/2019 S3385 Porting Marine Ecosystem Model

    18/18

    Thanks

    Thanks for your attention!

    Special thanks to Dr. Ulrich Knechtel

    E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 18 / 18

    http://find/