Computational Physics

Shuai Dong

High-performance computing

• PC, cluster, supercomputer
• Parallelism by OpenMP
• Parallelism by MPI
• GPU programming
• Numerical libraries

PC: Personal Computer

• Usually 1 CPU (Central processing unit) per computer

• Even so, a single CPU is already very powerful.

x86 (x86-64)-compatible microprocessors are the most widely used CPUs in PCs.

• Intel: Pentium, Core i3, i5, i7

• AMD: Athlon, Phenom, A4, A6, A10, FX

Workstation: powerful

• Tower workstation
• 4U workstation
• 2U workstation
• 1U workstation

U = rack unit

1U rack

• Usually more than 1 CPU per node.
• Intel: Xeon
• AMD: Opteron, EPYC

Non-x86/x86-64 compatible CPUs

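• Intel: Itanium (Intel Architecture 64, IA-64)

• IBM: Power (Performance Optimization With Enhanced RISC)

• 56th Research Institute of the General Staff (总参56所): Shenwei (RISC architecture)

• Institute of Computing Technology (ICT), CAS: Loongson (MIPS architecture)

• Nvidia: Tegra (ARM architecture)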

Cluster

• More than 1 computer (node)
• Connected by a network
• Working together

More powerful than a PC, cheaper than a supercomputer.

Can be small: blades

Transformers

• The philosophy of a PC cluster.

Constructicons (挖地虎) ----> Devastator (大力神)

Supercomputer

More CPUs
Faster connection
Very expensive

Parallelism

Big problem? Needs a long CPU time? The solutions:

• Simplify the problem!
• Use a faster CPU!
• Use more than 1 CPU! ---> Parallelism

Proverb: Many hands make light work!

Code example

    for(int i=0;i<1000000;i++)
    {
        a[i]=i;
    }

If you have only one CPU, the assignments run one after another:

1. a[0]=0;
2. a[1]=1;
3. a[2]=2;
........

If you have two CPUs, you can divide the task between them, one taking the even indices and the other the odd indices:

   (even)     (odd)
1. a[0]=0;    a[1]=1;
2. a[2]=2;    a[3]=3;
3. a[4]=4;    a[5]=5;
........

Ideally, this halves the run time!

OpenMP

• Loops are automatically parallelized by the compiler.
• One process, several threads.

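Example: 9.1.openmp.cpp

    #include <iostream>
    #include <unistd.h>   // sleep()
    using namespace std;

    int main()
    {
        const int n=8;

        // Split the loop iterations among the available threads.
        #pragma omp parallel for
        for(int i=0;i<n;i++)
        {
            cout<<i;      // the order of the digits depends on thread scheduling
            sleep(1);     // stand-in for one second of work
        }
        cout<<endl;
        return 0;
    }

Compile and time it without and with OpenMP:

    g++ 9.1.openmp.cpp
    time ./a.out
    g++ -fopenmp 9.1.openmp.cpp
    time ./a.out

Set the number of threads:

    export OMP_NUM_THREADS=2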

9.2.openmp2.cpp

MPI: Message Passing Interface

• MPI is a language-independent communications protocol used to program parallel computers.

Several software implementations:

• Open MPI
• MPICH/MPICH2
• LAM/MPI
• Intel MPI
• Microsoft MPI

Code example: 9.3.mpi.cpp

    #include <iostream>
    #include <mpi.h>
    using namespace std;

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc,&argv);                     // start MPI
        int mpi_procs,mpi_rank;
        MPI_Comm_size(MPI_COMM_WORLD,&mpi_procs);  // total number of processes
        MPI_Comm_rank(MPI_COMM_WORLD,&mpi_rank);   // index of this process, from 0

        cout<<"This is process "<<mpi_rank<<" of "<<mpi_procs<<" in total"<<endl;

        MPI_Finalize();                            // shut down MPI
        return 0;
    }

Compile and run an MPI program

• Install one MPI implementation, e.g. Open MPI

• mpic++ 9.3.mpi.cpp
• mpirun -np 10 ./a.out

MPI for molecular dynamics

• If the molecular dynamics simulation has N degrees of freedom, m processors can share the task.

• E.g. 100 atoms in a 3D box have 300 degrees of freedom. We can use 20 CPUs together, each of which deals with 5 atoms.

Darts method powered by MPI

Quarter circle: $\pi r^2/4 = \pi/4$
Square: $r^2 = 1$

$$\frac{\text{Darts in circle}}{\text{Total darts}} = \frac{\pi}{4}$$

MPI Darts

Perhaps the earliest "MPI darts" were invented by Mr. Liang Zhuge.

The modern MPI darts: Katyusha, Soviet Union

MPI Darts

• 9.4.mpidarts.cpp

• Another method: 9.5.mpidarts2.cpp

The working mechanism of MPI

• Make n copies of the program.
• Each copy runs as one process.
• All copies (processes) run in parallel.
• Processes can communicate with each other.

Using the GPU

• GPU: Graphics Processing Unit

Very powerful for molecular dynamics simulations

GPU Programming

• The use of Graphics Processing Units for rendering is well known, but their power for general parallel computation has only recently been explored.

• Parallel algorithms running on GPUs can often achieve up to 100x speedup over similar CPU algorithms, with many existing applications for physics simulations, signal processing, financial modeling, neural networks, and countless other fields.

Power of GPU

GPU computing language

• Open general-purpose GPU computing language: OpenCL (Open Computing Language)

• Proprietary framework: Nvidia's CUDA (Compute Unified Device Architecture), since 2006.

• Learn it by yourself.

Numerical libraries

• fftw: Fastest Fourier Transform in the West
• blas: Basic Linear Algebra Subprograms
• lapack: Linear Algebra Package
• mkl: Math Kernel Library
• acml: AMD Core Math Library
• ...

Some basic knowledge

• Source files: main.cc, func.cc
• Object files: main.o, func.o (func.obj)
• Dynamic library: libfunc.so (libfunc.dll)
• Static library: libfunc.a (libfunc.lib)
• Executable program: a.out (a.exe)
• Compile & link

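Code example: 9.6.lib.cpp

    // Return the absolute value of d.
    double dabs(double d)
    {
        if(d<0) d=-d;
        return d;
    }

Build the object file, then a shared and a static library from it (position-independent code must be requested when compiling the object that goes into the shared library):

    g++ -c -fPIC 9.6.lib.cpp
    g++ -shared 9.6.lib.o -o libabs.so
    ar crv libabs.a 9.6.lib.o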

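Code example: 9.6.link.cpp

    #include <iostream>
    using namespace std;
    double dabs(double d);   // defined in the library built from 9.6.lib.cpp

    int main()
    {
        double a=0;
        cout<<"Please input the number:\t";
        cin>>a;
        cout<<"The absolute value is:\t"<<dabs(a)<<endl;
        return 0;
    }

Link and inspect:

    g++ 9.6.link.cpp              # fails: undefined reference to dabs
    g++ 9.6.link.cpp -L. -labs    # link against libabs found in the current directory
    g++ 9.6.link.cpp libabs.a     # link the static library directly
    ldd a.out                     # list the shared libraries a.out needs

Environment variable: LD_LIBRARY_PATH (directories searched for shared libraries at run time)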

Using lapack

• 9.7.Diagonalization.cpp

Call a Fortran function in C/C++ code:

    extern "C"
    {
        void dsyev_(char *jobz,char *uplo,int *n,double *a,int *lda,double *w,double *work,int *lwork,int *info);
    }

    g++ 9.7.Diagonalization.cpp -llapack

Environment variable: LD_LIBRARY_PATH

Tips

• Find the usage of lapack functions on the website http://www.netlib.org/lapack/ or in the MKL manual.

• For a symmetric/Hermitian matrix, only the upper or lower triangular elements are used. But mind the difference between C/C++ (row-major storage) and Fortran (column-major storage), especially for Hermitian matrices.

Read the manual

• http://www.netlib.org/lapack/explore-html/dd/d4c/dsyev_8f.html

• http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/dsyev.htm

Another example

• Use the FFTW library to compute the FFT.

• http://www.fftw.org/

9.8.FFTW.cpp

Format of a paper

Title
Author
Affiliation
Date

• Abstract
• Keywords

• Main body (introduction, method/algorithm, results and discussion, summary, acknowledgment, including figures and tables)

• References

Supplementary: your_code