Go Native : Squeeze the juice out of your 64-bit processor using C++

Post on 26-Jun-2015

Go Native

Squeeze the juice out of your 64-bit processor

using…

C/C++

Who am I

-> Fernando Moreira ( @fpmore )

-> MSc student @ FEUP

-> Undergraduate Researcher @ Porto Interactive Center

-> Microsoft Student Partner Lead @ M$ PT

-> I’ve been doing C++ for over… 5 years

Who are you ?

-> Norte . Centro . Sul . Açores . Madeira . FMI

-> Who has experience with C? And with C++?

-> Who has experience with 64-bit native dev?

Talk’s Schedule

int main( int argc, char **argv ) {

try {

introducing_x64();

advantagesOver_x86();

nativeDev_x64( const Topic &t );

codeAnalysis_and_DebugTools();

costProspectionOn_x64Dev();

} catch( Timeout &e ) { return -1; }

return 0;

}

Promising not to change the topic.

introducing_x64()

-> The names : x64, x86-64, AMD64, Intel 64, IA-64, etc…

-> Notice : IA-64 (Itanium) ≠ AMD64

-> AMD64 is backwards compatible with x86 (IA-64 isn’t)

-> Some Hardware: Phenom, Athlon 64, Core-iX, Core 2, …

-> Some OSes : Windows (XP, Vista, 7), OS X, several Linux distros.

introducing_x64()

This talk will be focused on the AMD64 architecture.

advantagesOver_x86()

-> Address space : Theoretical limit of 16 ExaBytes (2^64)

-> More available registers. (there’s one called RIP)

-> Larger instruction set with emphasis on SIMD

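-> SSE and SSE2 are always there (SSE3 is missing on the earliest AMD64 CPUs)

-> Unified function calling convention (one per platform: Microsoft x64 on Windows, System V AMD64 elsewhere)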

advantagesOver_x86()

Can run x86 binaries under x64 environments

On Windows:

. 32-bit processes can’t load 64-bit DLLs for execution

. 64-bit processes can’t load 32-bit DLLs for execution
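
Since a process and the DLLs it loads must share the same bitness, it can help to make the target of a build explicit. The following is a minimal sketch, assuming MSVC, GCC or Clang, whose predefined macros `_M_X64` and `__x86_64__` it relies on. The names `kBuildIsX64` and `pointerBits` are made up for illustration.

```cpp
#include <cstddef>

// Compile-time check of the target architecture.
// _M_X64 is predefined by MSVC, __x86_64__ by GCC and Clang.
// __ILP32__ excludes the x32 ABI, which uses 32-bit pointers on x86-64.
#if defined(_M_X64) || (defined(__x86_64__) && !defined(__ILP32__))
constexpr bool kBuildIsX64 = true;
#else
constexpr bool kBuildIsX64 = false;
#endif

// Pointer width of this build in bits (64 on x64, 32 on x86).
constexpr std::size_t pointerBits() { return sizeof(void *) * 8; }

// Fails the build early if the assumption about the target is wrong.
static_assert(!kBuildIsX64 || pointerBits() == 64,
              "an x64 build is expected to have 64-bit pointers");
```

A library could use such a check to refuse to compile for a target it was never tested on.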

nativeDev_x64( how_it_looks_like )

-> A valid, yet useless, 64bit application.

int main( int argc, char **argv ) { return 0; }

nativeDev_x64( how_it_looks_like )

-> A valid, yet useless and dangerous, 64bit application.

#include <stddef.h>

#include <stdint.h>

int main( int argc, char **argv ) {

size_t external_debt = SIZE_MAX; int *ptr = (int *)&external_debt; *ptr = 0; // writes only 4 of the 8 bytes

return 0; }
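
What the write through `ptr` does to `external_debt` can be made visible. The sketch below reproduces the partial overwrite with `memcpy`, so that the behaviour is well defined. `zeroThroughInt` is a hypothetical helper, not something from the talk.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Overwrites only the first sizeof(int) bytes of a size_t, which is what a
// store through an `int *` would touch. memcpy is used instead of a pointer
// cast so the result does not depend on undefined behaviour.
inline std::size_t zeroThroughInt(std::size_t value) {
    const int zero = 0;
    std::memcpy(&value, &zero, sizeof zero);
    return value;
}
```

On x64 (8-byte `size_t`, 4-byte `int`, little-endian) `zeroThroughInt(SIZE_MAX)` leaves `0xFFFFFFFF00000000`: the upper half of the debt survives. On x86, where both types are 4 bytes, the same call returns 0.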

nativeDev_x64( data_model )

-> On Microsoft Win64 : LLP64 model

-> On Linux : LP64 model

-> LLP64: short( 2 ), int( 4 ), long( 4 ), ptr( 8 ), long long( 8 )

-> LP64: short( 2 ), int( 4 ), long( 8 ), ptr( 8 ), long long( 8 )

Can you see the data portability problem?

Suggestions: Use conditional compilation and type aliasing. Make conscious usage of the sizeof operator.
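
The suggestions above could look roughly like this in code. It is a sketch under assumptions: `SaveRecord`, `wide_int` and `dataModelName` are hypothetical names, and `__LP64__` is the macro GCC and Clang predefine on LP64 targets.

```cpp
#include <cstddef>
#include <cstdint>

// Type aliasing: the fixed-width aliases from <cstdint> have the same size
// under LLP64 and LP64, unlike a field declared as plain `long`.
struct SaveRecord {
    std::int64_t  timestamp;  // 8 bytes
    std::int32_t  score;      // 4 bytes
    std::uint32_t flags;      // 4 bytes
};
static_assert(sizeof(SaveRecord) == 16, "unexpected padding in SaveRecord");

// Conditional compilation: pick an 8-byte integer type per data model.
#if defined(__LP64__)
using wide_int = long;       // LP64: long is 8 bytes
#else
using wide_int = long long;  // LLP64 (Win64) and 32-bit targets: long is 4 bytes
#endif
static_assert(sizeof(wide_int) == 8, "wide_int must be 8 bytes");

// Conscious use of sizeof: name the data model of the current build.
inline const char *dataModelName() {
    if (sizeof(void *) == 8) return sizeof(long) == 8 ? "LP64" : "LLP64";
    return "32-bit";
}
```

The `static_assert` lines turn a silent size change into a compile error, which is usually cheaper than finding it in a corrupted file.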

nativeDev_x64( data_model )

-> On x86 : ptr( 4 ), size_t( 4 ), ptrdiff_t( 4 )

-> On x64 : ptr( 8 ), size_t( 8 ), ptrdiff_t( 8 )

These types will increase memory usage…

But they are the right choice performance-wise.
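
Because pointers, `size_t` and `ptrdiff_t` grow together, code that sticks to them for addresses, sizes and distances keeps working on both targets. A hedged sketch follows; the helper names are invented for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Round-trips a pointer through an integer. std::uintptr_t is wide enough on
// both x86 and x64; a plain `unsigned int` would drop the upper 32 bits on x64.
inline bool roundTripsThroughUintptr(const void *p) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<const void *>(bits) == p;
}

// Distance between two elements of the same array, kept in ptrdiff_t so it
// cannot be truncated the way an `int` result could be.
inline std::ptrdiff_t elementsBetween(const int *first, const int *last) {
    return last - first;
}

// Sums an array using a size_t index, which matches the pointer width and
// can therefore address every element on either target.
inline long long sumAll(const int *data, std::size_t count) {
    long long total = 0;
    for (std::size_t i = 0; i < count; ++i) total += data[i];
    return total;
}
```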

nativeDev_x64( common_pitfalls )

-> Usage of magic numbers & bit-wise ops: 0x7fffffff

-> Functions with variable number of arguments : printf

-> Virtual functions

-> Data exchange between x86 and x64 apps

-> Data misalignment : aligned SSE loads and stores require 16-byte alignment
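
The first two pitfalls can be illustrated with a small sketch. `clearTopBit` and `formatSize` are hypothetical helpers; the `%zu` conversion is standard since C99 and C++11, but older MSVC runtimes did not support it.

```cpp
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>

// Magic number pitfall: 0x7fffffff clears the top bit of a 32-bit value only.
// Deriving the mask from the type keeps it correct when size_t becomes 64-bit.
inline std::size_t clearTopBit(std::size_t v) {
    const std::size_t mask = std::numeric_limits<std::size_t>::max() >> 1;
    return v & mask;
}

// Variadic function pitfall: printf cannot check argument sizes, so passing
// a size_t to "%u" or "%d" is wrong on x64. "%zu" matches size_t everywhere.
inline std::string formatSize(std::size_t n) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%zu", n);
    return std::string(buf);
}
```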

nativeDev_x64( optimization_tips )

-> Use native types for loops or tight data usage

-> Use 16-byte alignment for SSE loads and stores

-> Heap-allocs in Win64 and XBOX360 are 16-byte aligned

-> *Use* intrinsics : #include <immintrin.h>

-> Unroll loops and sort each object’s member data by size
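
Two of the tips above, aligned SSE access through intrinsics and ordering members by size, can be sketched like this. `Vec4`, `addVec4`, `Unsorted` and `Sorted` are illustrative names only; the intrinsics come from `<immintrin.h>` and need an x86 or x64 compiler target.

```cpp
#include <immintrin.h>

// 16-byte aligned storage, so the aligned load/store intrinsics are legal.
struct alignas(16) Vec4 {
    float v[4];
};

// Adds four floats at once with SSE.
inline Vec4 addVec4(const Vec4 &a, const Vec4 &b) {
    Vec4 out;
    const __m128 va = _mm_load_ps(a.v);        // aligned 16-byte load
    const __m128 vb = _mm_load_ps(b.v);
    _mm_store_ps(out.v, _mm_add_ps(va, vb));   // aligned 16-byte store
    return out;
}

// Member ordering: mixing small and large members forces padding between them.
struct Unsorted {
    char   tag;
    double value;
    char   flag;
    int    count;
};

// Same members, largest first: the padding collapses to the tail.
struct Sorted {
    double value;
    int    count;
    char   tag;
    char   flag;
};

static_assert(sizeof(Sorted) < sizeof(Unsorted),
              "sorting members by size should shrink the struct");
```

With the usual x64 alignment rules `Unsorted` takes 24 bytes and `Sorted` 16, although the exact numbers depend on the ABI.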

nativeDev_x64( real-world_tips )

-> Don’t sacrifice your software architecture.

-> Don’t use it if you don’t know how to.

-> Don’t go into premature optimization.

-> Do it at lower levels and then hide it.

-> Trust your compiler to help you do the job.

codeAnalysis_and_DebugTools()

-> Your IDE : LEARN to fu**** use it!

-> Conditional breakpoints, call stack

-> Free tool : Cppcheck (command line, Eclipse, Code::Blocks, …)

-> State-of-the-art tool: PVS-Studio (VS 2005, 2008, 2010)

-> Do pair programming and peer-review if possible

costProspectionOn_x64Dev()

-> Hardware & Software (IDE + Plugins + Tools + Libs)

-> You’ll need to teach the developers (theory & practice)

-> A port takes time, adds bugs, and it’s not creative

-> … plus you’ll probably have to maintain two code paths

-> Full implementation adds creativity, but takes much more time and will add many more bugs.

Let’s go state-of-the-art!

Questions?