Dokken Darmstadt March 2006_print

download Dokken Darmstadt March 2006_print

of 51

Transcript of Dokken Darmstadt March 2006_print

  • 8/3/2019 Dokken Darmstadt March 2006_print

    1/51

    1ICT Applied Mathematics

    The Graphics Processing Unit (GPU)as a high performance

    computational resource for simulationand geometry processing

    Tor Dokken

    Department of Applied Mathematics

    SINTEF Information and Communication Technology

    www.sintef.no/gpgpu/[email protected]

    March 10th 2006

  • 8/3/2019 Dokken Darmstadt March 2006_print

    2/51

    2ICT Applied Mathematics

    Graphics cards as a high-end

    computational resource

    Project funded by the Research Council of Norway

    Main partner: SINTEF ICT, Department of Applied Mathematics

    Period: 2004-2007

    Cooperation with a number of academic partners in Norway:

    Center of Mathematics for Application at the University of Oslo,Department of Informatics University of Oslo

    3 Ph.D. fellows (1. University of Oslo and 2 SINTEF)

    6 master students (January 2006-June 2007) (Will sit at SINTEF)

    Mathematics Department, University of Bergen

    1 Post. Doc

    Narvik University College, Narvik 3 master students in 2006, 7 in 2005, 1 in 2004

  • 8/3/2019 Dokken Darmstadt March 2006_print

    3/51

    3ICT Applied Mathematics

    Steps in the graphics pipeline

    Vertices aftertransformation andcoloring, per vertex

    operations

    (Vertex processor)

    Rasterization. Texturing and coloring,fragment processing.

    (Fragment processor)

  • 8/3/2019 Dokken Darmstadt March 2006_print

    4/51

  • 8/3/2019 Dokken Darmstadt March 2006_print

    5/51

    5ICT Applied Mathematics

    Vertex Processor

    Vertex processor capabilities:

    Vertex transformation. Normal transformation.

    Texture coordinate generation and transformation.

    Lighting calculations.

    Fully programmable.

    Processes 4-component vectors (xyzw)

    Can change the position of current vertex.

    Can not read info from other vertices.

  • 8/3/2019 Dokken Darmstadt March 2006_print

    6/51

    6ICT Applied Mathematics

    Fragment Processor

    Main capabilities is to calculate the final color and/or depth

    to a fragment. Fully programmable.

    Processes 4-component vectors (rgba).

    Can read info from other fragments.

    Can not write to more than one pixel in the same buffer.

    Typically more useful than vertex processor for GPGPUpurposes.

    Limited number of registers for temporary variables in afragment shader (currently NVIDIA 32 registers)

  • 8/3/2019 Dokken Darmstadt March 2006_print

    7/51

  • 8/3/2019 Dokken Darmstadt March 2006_print

    8/51

    8ICT Applied Mathematics

    Programming Fragment Processor

    Specificities

    The programs running on the fragment processor arecalled fragment shaders A fragment shader can write to up to 4 render targets (textures)

    A fragment shader can read from many textures,

    but cannot read from a render target to which it is writing (we do notknow the sequence in which the fragments are processed)

    A render target can be converted to a texture to be used as inputin later fragment shaders

    When programming the fragment processor for non-graphical purposes, a default primitive covering the entire

    viewport should be defined to facilitate the execution ofthe fragment shaders. High level program languagesthrough DirectX or OpenGL

  • 8/3/2019 Dokken Darmstadt March 2006_print

    9/51

    9ICT Applied Mathematics

    Shading languages

    High Level Shading Language (HLSL, Microsoft) Part of the DirectX API and only compiles into DirectX code.

    Hardware independent.

    Windows only.

    Game industry.

    C for graphics (Cg, NVIDIA) Platform independent but hardware dependent.

    Cg and HLSL are very similar languages. Cg/HLSL was co-developed by NVIDIA and Microsoft.

    OpenGL Shading Language (GLSL, ARB) Platform and hardware independent.

    CAD, scientific visualization, movie industry, academic world...

  • 8/3/2019 Dokken Darmstadt March 2006_print

    10/51

    10ICT Applied Mathematics

    Using shaders

    1. Provide shader source code to OpenGL.

    2. Compile shader.3. Link compiled shaders together.

    4. Use program.

  • 8/3/2019 Dokken Darmstadt March 2006_print

    11/51

    11ICT Applied Mathematics

    The GPU and a double loopBody of for loop

    Initiation of geometry (a quad)glBegin(GL_QUADS);

    glTexCoord2f(0.0f, 0.0f);glVertex3f(0.0f, 0.0f, 0.0f);glTexCoord2f(1.0f, 0.0f);

    glVertex3f(1.0f, 0.0f, 0.0f);glTexCoord2f(1.0f, 1.0f);glVertex3f(1.0f, 1.0f, 0.0f);glTexCoord2f(0.0f, 1.0f);glVertex3f(0.0f, 1.0f, 0.0f);

    glEnd();

    Initiation of a viewport

    glViewport( 0, 0, n, m);

    Definition of the clipping volume such that the quad is just inside

    the clipping volume.

    n x m fragments with texture coordinates in [0,1][0,1] will be executed

  • 8/3/2019 Dokken Darmstadt March 2006_print

    12/51

    12ICT Applied Mathematics

    Debugging shaders

    No normal debuggers for GPUs

    Debugging has to be based on reading values from textures andanalyzing these

    Visualizing the values of computational grid as an image can behelpful for understanding the behavior of the shader.

    Parallel CPU-implementations often very useful to verify that theshader works properly (and measure performance)

    The reason for an error can be:

    You may have an error in your algorithm.

    You may have misunderstood the functionality of the shaderlanguage.

    The driver for the GPU is not working properly.

  • 8/3/2019 Dokken Darmstadt March 2006_print

    13/51

  • 8/3/2019 Dokken Darmstadt March 2006_print

    14/51

    14ICT Applied Mathematics

    Example: Heat Equation

    yyxxt uuu +=

    Discretication by finite differences over a regular grid:

    ( ).4 ,1,1,,1,12,1

    ,

    n

    ji

    n

    ji

    n

    ji

    n

    ji

    n

    ji

    n

    ji

    n

    ji UUUUUh

    kUU ++++= ++

    +

    Each fragment updated as a

    weighted sum of its nearest fiveneighbours.

  • 8/3/2019 Dokken Darmstadt March 2006_print

    15/51

    15ICT Applied Mathematics

    Heat equation shaders

    [Heat Equation Fragment shader]

    varying vec4 texXcoord;

    varying vec4 texYcoord;

    uniform sampler2D heatTex;

    uniform float r;

    void main(void)

    {

    vec4 col;

    vec4 tex = texture2D(heatTex, texXcoord.yx);

    vec4 tex0 = texture2D(heatTex, texXcoord.zx);

    vec4 tex1 = texture2D(heatTex, texXcoord.wx);

    vec4 tex2 = texture2D(heatTex, texYcoord.xz);vec4 tex3 = texture2D(heatTex, texYcoord.xw);

    col = tex + r*(tex0+tex1-4.0*tex+tex2+tex3);

    gl_FragColor = vec4(col);

    }

    [Heat Equation Vertex shader]

    varying vec4 texXcoord;

    varying vec4 texYcoord;

    uniform vec2 dXY;

    void main(void)

    {

    texXcoord=gl_MultiTexCoord0.yxxx +

    vec4(0.0,0.0,-1.0,1.0)*dXY.x;texYcoord=gl_MultiTexCoord0.xyyy +

    vec4(0.0,0.0,-1.0,1.0)*dXY.y;

    gl_Position =

    gl_ModelViewProjectionMatrix*gl_Vertex

    ;

    }

    ( ).4 ,1,1,,1,12,1

    ,

    n

    ji

    n

    ji

    n

    ji

    n

    ji

    n

    ji

    n

    ji

    n

    ji UUUUUh

    kUU ++++= ++

    +

    Not using thestandard texture

    coordinates tomake more

    efficient code.

  • 8/3/2019 Dokken Darmstadt March 2006_print

    16/51

    16ICT Applied Mathematics

    Heat Equation contd.

  • 8/3/2019 Dokken Darmstadt March 2006_print

    17/51

  • 8/3/2019 Dokken Darmstadt March 2006_print

    18/51

    18ICT Applied Mathematics

    Wave Equation contd.

  • 8/3/2019 Dokken Darmstadt March 2006_print

    19/51

    19ICT Applied Mathematics

    Example: Linear Wave Equation

    Discretication by finite differences over a regular grid:

    yyxxtt uuu +=

    ( ).42 ,1,1,,1,122

    1

    ,,

    1

    ,

    n

    ji

    n

    ji

    n

    ji

    n

    ji

    n

    ji

    n

    ji

    n

    ji

    n

    ji UUUUUh

    kUUU ++++= ++

    +

    Almost the heat equation, but

    needs extra texture to store thevalue at n-1

  • 8/3/2019 Dokken Darmstadt March 2006_print

    20/51

    20ICT Applied Mathematics

    2. order high-resolution scheme

  • 8/3/2019 Dokken Darmstadt March 2006_print

    21/51

  • 8/3/2019 Dokken Darmstadt March 2006_print

    22/51

    22ICT Applied Mathematics

    Systems of Conservation Laws

    Fundamental laws of physics: conservation of quantities like mass,momentum and energy.

    In arbitrary space dimension this reads:

    Example: Shallow water equations

    )()0,(,0)( 0 xQxQQfQt ==+

    =

    +

    +

    ++

    0

    0

    0

    2

    212

    2

    212

    yxtghhu

    huv

    hv

    huv

    ghhu

    hu

    hv

    hu

    h

  • 8/3/2019 Dokken Darmstadt March 2006_print

    23/51

    23ICT Applied Mathematics

    Lax-Friedrich

  • 8/3/2019 Dokken Darmstadt March 2006_print

    24/51

    24ICT Applied Mathematics

    Speedup - Lax-Friedrichs Runtime per time step and speedup factor for the CPU versus GPU

    implementation of Lax-Friedrichs

    26.75.54148.001024x1024

    25.21.4737.10512x51219.80.469.09256x256

    9.530.232.22128x128

    SpeedupGPU**

    ms

    CPU*

    ms

    N

    * 2.8 GHz Intel Xeon (EM64T)** GeForce 7800 GTX (450 MHz)

  • 8/3/2019 Dokken Darmstadt March 2006_print

    25/51

    25ICT Applied Mathematics

    Semi-Discrete High-Resolution Schemes

    Evolution of cell averages described by ODEs

    Steps in the algorithms:

    Reconstruction of piecewise polynomials from cell averages

    Evaluation of reconstruction at integration points

    Numerical computation of edge fluxes Evolution by Runge-Kutta scheme

    y

    tGtG

    x

    tFtFtU

    dt

    d jijijijiij

    +

    = ++

    )()()()()(

    2/1,2/1,,2/1,2/1

  • 8/3/2019 Dokken Darmstadt March 2006_print

    26/51

    26ICT Applied Mathematics

    2nd order high-resolution - Bottom Topography

    Initial wave map Initial bottom map

  • 8/3/2019 Dokken Darmstadt March 2006_print

    27/51

    27ICT Applied Mathematics

    2nd order high-resolution - Bottom Topography

  • 8/3/2019 Dokken Darmstadt March 2006_print

    28/51

    28ICT Applied Mathematics

    Bilinear Interpolation - Dry States

    Initial wave map Initial bottom map

  • 8/3/2019 Dokken Darmstadt March 2006_print

    29/51

    29ICT Applied Mathematics

    Bilinear Interpolation - Dry States

  • 8/3/2019 Dokken Darmstadt March 2006_print

    30/51

  • 8/3/2019 Dokken Darmstadt March 2006_print

    31/51

    31ICT Applied Mathematics

    Euler equations The two dimensional Euler equations model the dynamics of

    compressible gasses:

    denotes density, uand vvelocity in x- and y- directions,p pressureand Ethe total energy.

    The three dimensional

    0

    )()(

    2

    2

    =

    +

    ++

    +

    ++

    yxtpEv

    pv

    uv

    v

    pEu

    uv

    pu

    u

    E

    v

    u

  • 8/3/2019 Dokken Darmstadt March 2006_print

    32/51

    32ICT Applied Mathematics

    Example: Interaction of a low-density bubble

    with a shock.

  • 8/3/2019 Dokken Darmstadt March 2006_print

    33/51

    33ICT Applied Mathematics

    Speedup of 2D shock-bubble on NxN cells

    16.57.62e-21.26e-019.91.48e-12.95e-01024

    17.11.72e-22.95e-120.83.32e-26.90e-1512

    24.74.37e-31.08e-120.08.69e-31.74e-1256

    13.61.38e-31.88e-211.83.70e-34.37e-2128

    speedup7800AMDspeedup6800IntelN

    Bilinear reconstruction

    14.42.99e-14.32e-09.37.14e-16.67e-01024

    15.06.86e-21.03e-09.41.78e-11.67e-0512

    19.81.74e-23.45e-18.44.99e-24.20e-1256

    17.24.60e-37.90e-28.61.22e-21.05e-1128

    speedup7800AMDspeedup6800IntelN

    CWENO reconstruction

  • 8/3/2019 Dokken Darmstadt March 2006_print

    34/51

    34ICT Applied Mathematics

    3D RayleighTaylor Instability.A layer of heavier fluid isplaced on top of a lighterfluid and the heavier fluid is

    accelerated downwards bygravity.

    11.51.72e-11.98e-081

    13.98.20e-21.14e-064

    12.64.16e-25.23e-149

    speedup7800AMDN

    N N N gridAverage time per time STEP

  • 8/3/2019 Dokken Darmstadt March 2006_print

    35/51

    35ICT Applied Mathematics

    Flow-chart for a GPU implementation of a semi-

    discrete, high resolution scheme.

  • 8/3/2019 Dokken Darmstadt March 2006_print

    36/51

    36ICT Applied Mathematics

    Saturation equation The saturation equation models transport of two fluid-phases in a

    porous medium.

    Example: water injection into a fluvial reservoir filled with oil.

    ( ) ,0))()(( 0 =++ gsVsfs t

  • 8/3/2019 Dokken Darmstadt March 2006_print

    37/51

    37ICT Applied Mathematics

    Water injection in a fluvial reservoir

  • 8/3/2019 Dokken Darmstadt March 2006_print

    38/51

    38ICT Applied Mathematics

    Linear Advection

    Models various transport phenomena of (passive) quantities. Simpleequation, but difficult to compute solutions correctly.

    Classical schemes: excessive smearing or spurious oscillations.

    Need for more sophisticated schemes.

    1st order scheme: Lax-Friedrich 2nd order scheme: Lax-Wendroff

  • 8/3/2019 Dokken Darmstadt March 2006_print

    39/51

    39ICT Applied Mathematics

    Linear Advection contd.

    Modern high-resolution schemes:

    Higher-order approximation of smooth parts and no spuriousoscillations at discontinuities.

    Dissipative Compressive

  • 8/3/2019 Dokken Darmstadt March 2006_print

    40/51

    40ICT Applied Mathematics

    Semi-Discrete High-Resolution Schemes

    Evolution of cell averages described by ODEs

    Steps in the algorithms:

    Reconstruction of piecewise polynomials from cell averages

    Evaluation of reconstruction at integration points

    Numerical computation of edge fluxes

    Evolution by Runge-Kutta scheme

    y

    tGtG

    x

    tFtFtQ

    dt

    d jijijijiji

    +

    = ++

    )()()()()(

    2/1,2/1,,2/1,2/1

    ,

  • 8/3/2019 Dokken Darmstadt March 2006_print

    41/51

    41ICT Applied Mathematics

    CAD Intersections on the GPU

    Use the GPU to find if difficult intersection configurationsare present and create conjectures on the intersectionconfiguration. (SINTEF has applied for patent)

    Surface intersection check for loops

    check for singular intersections

    Surface self-intersection

    Is a self-intersection possible? Global self-intersection? Two different parts of the surface intersect

    Local self-intersection loop?

    Ridges with vanishing or near vanish surface normal possiblycombined with self-intersections.

    The GPU can help to determine if the self-intersectionneeds advanced intersection algorithms or simplerapproaches can be used.

  • 8/3/2019 Dokken Darmstadt March 2006_print

    42/51

    42ICT Applied Mathematics

    Global Self-intersection

    Two different parts of the surface intersect. Proper subdivisionchange the problem to a transversal intersections between twosurfaces.

    Trace of intersection in parameter domain

  • 8/3/2019 Dokken Darmstadt March 2006_print

    43/51

    S

  • 8/3/2019 Dokken Darmstadt March 2006_print

    44/51

    44ICT Applied Mathematics

    Self-intersection with ridges (Vanishing

    normal)

    The ridges do not in general follow constant parameter lines

    Typical for offset surfaces, duct type surfaces and draft-angle surface

    cannot be converted to a near-transversal intersection.

    Vanishing

    surface

    normal

    Self-

    intersectioncurve

    Example

    made by

    think3

    Partial trace of intersection in parameter domain

  • 8/3/2019 Dokken Darmstadt March 2006_print

    45/51

    45ICT Applied Mathematics

    Ridges in self-intersection Pipe surface

    Computationally expensive to solveby recursive subdivision.

    Points

    on

    ridge

    curve

    Parts of

    self-

    intersectionExamplemade bythink3

    Partial trace of intersection in parameter domain

    A f ( bi ) & l f

  • 8/3/2019 Dokken Darmstadt March 2006_print

    46/51

    46ICT Applied Mathematics

    A surface (cubic) & normal surface

    (quintic)

    GPU l l t d d i li d i t

  • 8/3/2019 Dokken Darmstadt March 2006_print

    47/51

    47ICT Applied Mathematics

    GPU calculated and visualized points on

    normal surface close to the origin

    The regions ith near anishing normal

  • 8/3/2019 Dokken Darmstadt March 2006_print

    48/51

    48ICT Applied Mathematics

    The regions with near vanishing normal

    visualized as texture on original surface

  • 8/3/2019 Dokken Darmstadt March 2006_print

    49/51

    49ICT Applied Mathematics

    Observations

    Most surfaces do not have self-intersection

    Important to identify most surfaces without self-intersections asearly as possible

    Moderate simplistic subdivision makes the surface and normalcone boxes smaller O(h2) convergence, and will classify manypossible self-intersections as not self-intersecting

    For surface with no vanishing surface normal all self-intersection curves intersect a boundary

    Vanishing surface normal identifies regions with morecomplex self-intersection topology

    Moderate simplistic subdivision reduce the size of such regions,and allows to focus a more complex self-intersection algorithms onsurface sub-regions.

    Comparison of CPU and GPU

  • 8/3/2019 Dokken Darmstadt March 2006_print

    50/51

    50ICT Applied Mathematics

    Comparison of CPU and GPU

    implementation of simplistic subdivisionInitial process for surface self-intersection

    Subdivision of a bicubic Bezier patch and (quintic) patch representing surfacenormals into 2n x 2n subpatches

    Test for degenerate normals for subpatches Computation of the approximate normal cones for subpatches

    Computation of the approximate bounding boxes for subpatches

    Computation of bounding box pair intersections for subpatches

    CPU/GPU approach

    For n

  • 8/3/2019 Dokken Darmstadt March 2006_print

    51/51

    51ICT Applied Mathematics

    Where are we now, and where do we go?

    During the first year (2004) of the project we gotexperiences on the use of the GPU, and the sort of

    problems best suited for the GPU We try to make the potential of the GPU understandable to

    personnel outside of computer graphics All concepts related to the GPU has a strong computer graphics

    flavor We will continue investigation on

    Partial differential equations

    Geometry problems - intersection

    Image processing Linear algebra