1. REPORT DATE 2. REPORT TYPE Briefing Charts 4. TITLE AND … · 2012-04-12 · Conclusion and...
Transcript of 1. REPORT DATE 2. REPORT TYPE Briefing Charts 4. TITLE AND … · 2012-04-12 · Conclusion and...
REPORT DOCUMENTATION PAGE Form Approved
OMB No. 0704-0188 Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing this collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Department of Defense, Washington Headquarters Services, Directorate for Information Operations and Reports (0704-0188), 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.
1. REPORT DATE (DD-MM-YYYY) 15-12-2011
2. REPORT TYPEBriefing Charts
3. DATES COVERED (From - To)
4. TITLE AND SUBTITLE
5a. CONTRACT NUMBER
Development of a Flow Solver with Complex Kinetics on the Graphic Processing 5b. GRANT NUMBER
Units (GPU) 5c. PROGRAM ELEMENT NUMBER
6. AUTHOR(S) 5d. PROJECT NUMBER
Hai P. Le and Jean-Luc Cambier
5f. WORK UNIT NUMBER23041057
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
8. PERFORMING ORGANIZATION REPORT NUMBER
Air Force Research Laboratory (AFMC) AFRL/RZSS 1 Ara Drive Edwards AFB CA 93524-7013
9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S)
Air Force Research Laboratory (AFMC) AFRL/RZS 11. SPONSOR/MONITOR’S
5 Pollux Drive NUMBER(S) Edwards AFB CA 93524-7048 AFRL-RZ-ED-VG-2011-581
12. DISTRIBUTION / AVAILABILITY STATEMENT Distribution A: Approved for public release; distribution unlimited. PA# 111067.
13. SUPPLEMENTARY NOTES For presentation at the 50th AIAA Aerospace Sciences Meeting, Nashville, TN, 9-12 Jan 2012.
14. ABSTRACT In the current work, we have implemented a numerical solver on the Graphic Processing Units (GPU) to solve the reactive Euler equations with detailed chemical kinetics. The solver incorporates high-order finite volume methods for solving the fluid dynamical equations and an implicit point solver for the chemical kinetics. Generally, the computing time is dominated by the time spent on solving the kinetics which can be benefitted from the computing power of the GPSs. Preliminary investigation shows that the performance of the kinetics solver strongly depends on the mechanism used in the simulations. The speed-up factor obtained in the simulation of an ideal gas ranges from 30 to 55 compared to the CPU. For a 9-species gas mixture, we obtained a speed-up factor of 7.5 to 9.5 compared to the CPU. For such a small mechanism, the achieved speed-up factor is quite promising. This factor is expected to go much higher when the size of the mechanism is increased. The numerical formulation for solving the reactive Euler equations is briefly discussed in this paper along with the GPU implementation strategy. We also discussed some preliminary performance results obtained with the current solver.
15. SUBJECT TERMS
16. SECURITY CLASSIFICATION OF:
17. LIMITATION OF ABSTRACT
18. NUMBER OF PAGES
19a. NAME OF RESPONSIBLE PERSON Dr. Jean-Luc Cambier
a. REPORT Unclassified
b. ABSTRACT Unclassified
c. THIS PAGE Unclassified
SAR
30 19b. TELEPHONE NUMBER (include area code) N/A
Standard Form 298 (Rev. 8-98)Prescribed by ANSI Std. 239.18
Development of a Flow Solver with ComplexKinetics on the Graphic Processing Units (GPU)
Hai P. Le1, Jean-Luc Cambier2
1Mechanical and Aerospace Engineering DepartmentUniversity of California, Los Angeles
2Air Force Research LaboratoryEdwards AFB, CA
January 11, 2012
50th AIAA Aerospace Sciences Meeting, Nashville, Tennessee
Distribution A: Approved for Public Release; Distribution Unlimited
Table of Contents
1 Objectives & Motivation
2 Approach
3 GPU Implementation
4 Results
5 Conclusion and Future Works
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Objectives
Develop a fluid code on the GPU for modeling flows withcomplex chemical kinetics. The entire code is written usingCUDA C/C++ for maximum flexibility.
Explore different strategies for optimizing the performance ofthe code for a general chemistry mechanism.
Emphasis on the kinetics solver since it is morecomputationally expensive.
Benchmark with standard test cases.
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Motivation
Detail study of non-equilibrium processes associated withhigh-speed flow.
Detonation instabilityPartially ionized gasMHD
Development of a multi-physics code utilizing Object-Orientedand CUDA technology. Both of these features are available inCUDA C/C++.
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Governing Equations
Euler equations with source term for chemical kinetics
∂Q
∂t+
1
V
∮
SFndS = Ω (1)
Q =
ρsρuρvρwE
; Fn =
ρsUn
Pnx + ρuUn
Pny + ρvUn
Pnz + ρwUn
UnH
; Ω =
ωs
000
−∑s ωse0s
Solution method:
Finite Volume method for hyperbolic conservation laws
Source terms are solved by using operator splitting technique
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Numerical Schemes
Monoticity Preserving1 (MP) Schemes
3rd and 5th order spatial discretization was used inconjunction with 3rd order TVD-Runge-Kutta time integration
Arbitrary Derivative Riemann solver with Weighted EssentialNon-Oscillatory2 (ADERWENO) scheme
5th order spatial and 3rd order temporal without Runge-Kuttatime integrationUtilizes Cauchy-Kowalewski procedure and Taylor seriesexpansion of WENO fluxes for high order in time
1Suresh & Huynh (1997) J. Comp. Phys. 136, 83-992Titarev & Toro (2001) J. Comp. Phys. 204, 715-736
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Chemical Kinetics
Implicit formulation
dQ
dt= Ω→
(I −∆t
∂Ω
∂Q
)dQ
dt= Ω (2)
Elementary Reaction:∑
s
ν ′rs [Xs ] ∑
s
ν ′′rs [Xs ] (3)
Species production/destruction rate
ωs =∑
r
νrsKfr
∏
s
[Xs ]ν′rs −
∑
r
νrsKbr
∏
s
[Xs ]ν′′rs (4)
whereνrs = ν
′′rs − ν
′rs
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Graphic Processing Unit
What is GPU?
Graphic processing units containing a massive amount ofprocessing cores
Designed specifically for graphic rendering which is a highlyparallel process
Why GPU?
GPU is faster than CPU on SIMD execution model
GPU is now very easy to program
GPU is much cheaper than CPU
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
GPU versus CPU
0
200
400
600
800
1000
1200
1400
2003 2004 2005 2006 2007 2008 2009 2010
TheoreticalPeakGFLOP/s
NVIDIA GPU (Single)Intel CPU (Single)
NVIDIA GPU (Double)Intel CPU (Double)
Figure 1: Single and double precision floating point operation capability of GPU and CPU from2003-2010 (adapted from NVIDIA[8])
attempts had been made in writing scientific codes on the GPU, and promising results wereobtained both in terms of performance and flexibility.[1, 6]
In this work, we attempt to follow the same path on the design of numerical solvers onthe GPUs. However, our focus is different from the previous attempts such that we placemore attention to the kinetics solver than the fluid dynamics. This is due to the fact that forthe simulation of high-speed fluid flow, the computation is dominated by solving the kinetics.While the current implementation is only for chemical kinetics, we aim to extend it to a morecomplicated and computationally intensive kinetics model for plasma.
2 Governing Equations
The set of the Euler equations for a reactive gas mixture can be written as
∂Q
∂t+∇ · F = Ω (1)
where Q and F are the vectors of conservative variables and fluxes. We assumed that there isno species diffusion and the gas is thermally equilibrium (i.e., all species have the same velocityand all the internal energy modes are at equilibrium). The right hand side (RHS) of Eq. 1denotes the vector of source terms Ω. In this case, the source terms are composed of exchangeterms due to chemical reactions. By applying Gauss’s law to the divergence of the fluxes, onecould obtain
∂Q
∂t+
1
V
∮
S
FndS = Ω (2)
2
Figure: Single and double precision floating point operation capability ofGPU and CPU from 2003-2010
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
GPU Programming
Programming languages for GPU: CUDA, OpenCL,DirectCompute, BrookGPU, ...CUDA is the most mature programing environment for GPU.
similar to C/C++
support OO features
easy to debug
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
GPU Programming Model
Each device contains a set of streaming multi-processor (SM).Each SM contains a set of streaming processors (SP).Parallel based on grid and thread blocksExecution instruction called kernel
6
A Quick Refresher on CUDA
CUDA is the hardware and software architecture that enables NVIDIA GPUs to execute
programs written with C, C++, Fortran, OpenCL, DirectCompute, and other languages. A
CUDA program calls parallel kernels. A kernel executes in parallel across a set of parallel
threads. The programmer or compiler organizes these threads in thread blocks and grids of
thread blocks. The GPU instantiates a kernel program on a grid of parallel thread blocks.
Each thread within a thread block executes an instance of the kernel, and has a thread ID
within its thread block, program counter, registers, per-thread private memory, inputs, and
output results.
A thread block is a set of
concurrently executing threads
that can cooperate among
themselves through barrier
synchronization and shared
memory. A thread block has a
block ID within its grid.
A grid is an array of thread
blocks that execute the same
kernel, read inputs from global
memory, write results to global
memory, and synchronize
between dependent kernel calls.
In the CUDA parallel
programming model, each
thread has a per-thread private
memory space used for register
spills, function calls, and C
automatic array variables. Each
thread block has a per-Block
shared memory space used for
inter-thread communication,
data sharing, and result sharing
in parallel algorithms. Grids of
thread blocks share results in
Global Memory space after
kernel-wide global
synchronization.
CUDA Hierarchy of threads, blocks, and grids, with corresponding
per-thread private, per-block shared, and per-application global
memory spaces.
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
GPU Programming Model
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
CFD
CFD:
Cell-based parallelization: EOS, time marching, etc.
Face-based parallelization: Reconstruction, flux, etc.
Strategies:
Global memory
large but high latency; requires coalesced access
Shared memory
small but very fast; not useful in this case since NQ ∼ Ns
Reduce block occupancy to utilize more registers3.
3Volkov (2010) Better Performance at Lower Occupancy, GPU Tech. Conf.Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Chemical Kinetics
Main strategies
Coalesce memory access pattern for high global memorybandwidth
Utilize shared memory to reduce DRAM latency
Texture binding for read-only data
Issues:
How to overcome shared memory limitation?
How effective is global memory?
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Summary of Steps in Gaussian Elimination Algorithm
Forward substitution:
for np = 1:N-1
for ns = np+1:N
P := A(ns,np)/A(np,np)
RHS(ns) := RHS(ns)-RHS(np)*P
for ms = np+1:N
A(ns,ms) := A(ns,ms)-A(np,ms)*P
Backward substitution:
for np = N-1:1
P := 0
for ns=np+1:N
P := P+A(np,ns)*RHS(ns)
RHS(np) := (RHS(np)-P)/A(np,np)
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Shared Memory Limit
How many kinetics system can we put on shared memory (48KB/CUDA block)?
solved at every computational cell. Storing the whole Jacobian and the RHS vector on shared memory isnot an ideal situation here. Figure 4 shows the memory requirement for storing the Jacobian and RHS onthe typical shared memory (48 KB for a Tesla C2050/2070). If we associated an entire thread block to thechemical system in a cell, the number of species is limited to 75. Storing more than 1 system per blockmakes this limit go even lower, as can be seen for 1 thread per cell (32 threads block size).
32 cells/CUDA block
1 cell/CUDA blockMatrix
Size(K
B)
Numbers of Species
Shared Memory Limit
0 20 40 60 800
10
20
30
40
50
60
70
Figure 4: Chemistry size limit
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Reduced Storage Pattern
Store one row of matrix in shared memory for each row elimination
S
Shared Memory (Reduced)Shared Memory (Full)Global Memory
Maxim
um
NumbersofSpecies
Numbers of Elements
Shared Mem. Limit
Shared Mem. Limit
102 103 104 105 1060
500
1000
1500
2000
2500
3000
Figure 5: Chemistry size limit
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Algorithms
Algorithm 1: store matrix data on global memory andcoalesce memory access pattern
Inverse several matrices per CUDA block
Algorithm 2: store part of matrix data (one row at a time) onshared memory
Load and reload after row pivotingInverse one matrix per CUDA block
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
CFD Results: Forward Step
Mach 3 flow over a step with reflective boundary on top
No special treatment at the corner of the step
MP5 scheme with RK3 using 630,000 cells
Figure 4: Forward step problem
The second test is the shock diffraction problem which is modeled as the diffraction of ashock wave (M = 2.4) down a step[12]. The diffraction results in a strong rarefaction generatedat the corner of the step which can cause a problem of having negative density when performingthe reconstruction. The problem is modeled using 27,000 cells. The numerical simulation isshown in pair with the experimental images in figure 5. It has been shown that the solver wasable to reproduce the correct flow features in the region of the rarefaction fan.
Figure 5: Diffraction of a Mach 2.4 shock wave down a step. Comparison betweennumerical schlieren and experimental images
We also modeled the Rayleigh-Taylor instability problem. The problem is described as theacceleration of a heavy fluid into a light fluid driven by gravity. For a rectangular domain of
9
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
CFD Results: Backward Step
Mach 2.4 shock diffracted from a step
MP5 scheme with RK3 using 300,000 cells
Comparison with experiment shows excellent agreement
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
CFD Results: Rayleigh-Taylor Instability
Acceleration of a heavy fluid to a lighter fluidMP5 scheme with RK3 using 1.6M cellsContact discontinuity well resolved; evidence of fine scaleinstability structure
[0, 0.25]× [0, 1], the initial conditions are given as follows:
ρ = 2, u = 0, v = −0.025 cos(8πx), P = 2y + 1 for 0 ≤ y ≤ 1
2
ρ = 1, u = 0, v = −0.025c cos(8πx), P = y +3
2for
1
2≤ y ≤ 1
where c is the speed of sound.The top and bottom boundaries are set as reflecting and the left and right boundaries are
periodic. As the flow progress, the shear layer starts to develop and the Kelvin-Helmholtz insta-bilities become more evident. The gravity effect is taken into account by adding a source termvector which modifies the momentum vector and the energy of the flow based on gravitationalforce. The source term in this case is relatively simple and contributes very little to the overallcomputational time. The performance of the fluid dynamics calculation is discussed in the nextsection of this report.
10
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Cellular Detonation
Test setup:
Wall sparked ignition (P = 40 atm; T = 1500 K) withpremixed Stoichiometric Mixture of H2Air
Contact discontinuity initially disturbed in 2-D simulation
Maas and Warnatz4 H2-O2 reaction mechanism
Code Validation Detonations Detonation w/ MHD Simple Detonation
Detonation Test Setup
L
30 cm
P = 1 atmT = 300K
T = 1500 KP = 40 atm
• Premixed Stoichiometric Mixture of H2−Air 11 species and 38 reaction mechanisms
• Closed Ends
• Spark ignited (L = 0.25 cm)
Distribution A: For Public Release; Distribution Unlimited
4Maas, U. and J. Warnatz (1988). Combust. Flame 74, 5369.Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Cellular Detonation
Pressure and temperature evolution of flow field
Cellular structure developed due to to flame front instability
Figure 8: Rayleigh-Taylor instability computed with the MP5 scheme on a 400× 1600 grid
Figure 9: Evolution of the pressure and temperature in a 2D detonation simulation
11 of 18
American Institute of Aeronautics and Astronautics
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Performance Results: Algorithm 1
How effective is global memory access?
Theoretical PeakNon-coalescedCoalesced
Mem
ory
Bandwidth
(GB/s)
Numbers of Species0 50 100 150 200
0
50
100
150
200
Figure 11: Memory bandwidth of the kinetics solver measured on a Tesla C2050Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Performance Results: Algorithm 1 vs. 2
Measurement of the performance of the kinetics solver fordifferent species sizes.Grid size is varied due to limitation of global memory
Global MemoryShared Memory
Speed-up
Numbers of Species
32× 32256× 256
128× 128 64× 64
64× 64
128× 128
512× 512
0 50 100 150 200 250 300 350 400 4505
10
15
20
25
30
35
40
45
Figure 12: Comparison of the speed-up factor obtained from the kinetics solver using both global and shaDistribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Performance Results: CFD
ADERWENO shows substantial advantages over the MP5 dueto single step integration
Figure 8: Two dimensional simulation of a detonation wave
5.2 Performance Results
Figure 9 shows the performance of the solver for the simulation considering only the fluiddynamics aspect using the MP5 and the ADERWENO schemes. The speed-ups obtained inboth cases are very promising. Since the ADERWENO scheme only requires single-stage timeintegration, it is expected that it is faster than the MP5 scheme. It can be seen in Figure 9that the speed up scales almost linearly with the number of computational cells. For theADERWENO scheme, we can obtain almost 60 times speed-up for a large grid which is abouttwice faster than the MP5 scheme. All the comparisons are made between a Tesla C2070 andan Intel Xeon X5650 using double precision calculation.
ADERWENO
MP5
Speed-up
Numbers of Elements
103 104 1050
10
20
30
40
50
60
70
Figure 9: Performance of the fluid dynamics simulation only
Figure 10 shows the performance of only the kinetics solver (i.e., no convection). The testis performed considering only solving the kinetics system. Although constructing the kinetics
12
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Performance
Speed-up obtained for a larger mechanism (CH4 − O2) isnearly 40 times faster
36 species, 308 reactions9 species, 38 reactions
Speed-up
Numbers of Elements
103 1045
10
15
20
25
30
35
40
45
50
Figure 12: Performance of the reactive flow solver for two different chemistry mechanism
6 Conclusion and Future Works
In the current paper, we show the implementation of a numerical solver for simulating chemicallyreacting flow on the GPU. The fluid dynamics is modeled using high-order finite volume schemes,and the chemical kinetics is solved using an implicit solver. Results of both the fluid dynamicsand chemical kinetics are shown. Considering only the fluid dynamics, we obtained a speed-up of 30 and 55 times compared to the CPU version for the MP5 and ADERWENO scheme,respectively. For the chemical kinetics, we present two different approaches on implementing theGaussian elimination algorithm on the GPU. The best performance obtained by only solvingthe kinetics problem ranges from 30-40 depending on the size of the chemical mechanism. Whenthe fluid dynamics is coupled with the kinetics, we obtained a speed-up factor of 40 times for a9-species gas mixture with 38 reactions. The solver is also tested with a larger mechanism (36species, 308 reactions) and the performance obtained is faster than the small mechanism.
The current work can be extended in different ways. First, since the framework is performingwell in shared memory architecture, it is possible to also extend it to distributed memoryarchitecture utilizing Message Passing Interface (MPI). The extension permits using multi-GPUwhich is attracted for performing large-scale simulations. On the other hand, although thecurrent simulation is done for chemically reacting flow, it is desired to extend it to simulateionized gas (i.e. plasma) which requires additional modeling process (Collisional-Radiative) tomodel the changes in the different excited level of the charged species. In adidition, the governingequations also need to be extended to characterize the thermal non-equilibrium environment ofthe plasma. Given that the physics has been well established[2, 7, 4], the extension does notpresent a priori challenges.
15
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Conclusion and Future Works
Accomplishment:
Basic CFD framework for fluid simulation with detailedchemical kinetics.
Performance obtained in both cases are very promising: up to60 times for non-reacting flow and up to 40 for reacting flow
Future Works:
Extension to Multi-GPU using MPI
Collisional-Radiative kinetics for partial ionized gas
MHD simulation for electromagnetic field effects
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier
Objectives & MotivationApproach
GPU ImplementationResults
Conclusion and Future Works
Acknowledgements and Questions
AFRL
Mr. David Bilyeu
Dr. Justin Koo
UCLA
Mr. Lord Cole
Prof. Ann Karagozian
Questions?
Distribution A: Approved for Public Release; Distribution Unlimited H. P. Le and J.-L. Cambier