Transcript of MKL for MIC training - 03192012 - The Linux Foundation
Intel® Math Kernel Library Perspectives and Latest Advances
Noah Clemons Lead Technical Consulting Engineer Developer Products Division, Intel
Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
After Compiler and Threading Libraries, what's next? Intel® Performance Libraries

Where your workload, Intel® Integrated Performance Primitives (Intel® IPP), and Intel® Math Kernel Library (Intel® MKL) intersect:
• The libraries generate hand-tuned Intel AVX code
• The Intel Performance Libraries work with any compiler
• Functions for common algorithms
• Thread-safe functions with no blocking calls
• Multiple OS support
• Minimal OS overhead
• User-replaceable memory allocation mechanism

Harness CPU architecture features (compiler, SIMD, MKL & IPP) with performance libraries tailored for a wide variety of computational domains.
Agenda
• What is Intel® Math Kernel Library (Intel® MKL)?
• Intel® MKL supports Intel® Xeon Phi™ Coprocessor
• Intel® MKL usage models on Intel® Xeon Phi™
– Automatic Offload
– Compiler Assisted Offload
– Native Execution
• Performance roadmap
• Conditional Numerical Reproducibility
• Where to find more information?
Linear Algebra
•BLAS
•LAPACK
•Sparse solvers
•ScaLAPACK
Fast Fourier Transforms
•Multidimensional (up to 7D)
•FFTW interfaces
•Cluster FFT
Vector Math
•Trigonometric
•Hyperbolic
•Exponential, Logarithmic
•Power / Root
•Rounding
Vector Random Number
Generators
•Congruential
•Recursive
•Wichmann-Hill
•Mersenne Twister
•Sobol
•Niederreiter
•Non-deterministic
Summary Statistics
•Kurtosis
•Variation coefficient
•Quantiles, order statistics
•Min/max
•Variance-covariance
•…
Data Fitting
•Splines
•Interpolation
•Cell search
Intel® MKL is industry’s leading math library*
[Diagram: a single source base built with Intel® MKL targets multicore CPUs, the Intel® MIC co-processor, multicore clusters, and clusters that mix multicore and many-core nodes.]
*2012 Evans Data N. American developer survey
Intel® MKL Support for Intel® Xeon Phi™ Coprocessor
• Intel® MKL 11.0 supports the Intel® Xeon Phi™ coprocessor.
• Heterogeneous computing
– Takes advantage of both multicore host and many-core coprocessors
• Optimized for wider (512-bit) SIMD instructions
• Flexible usage models:
– Automatic Offload: Offers transparent heterogeneous computing
– Compiler Assisted Offload: Allows fine offloading control
– Native execution: Use the coprocessors as independent nodes
Using Intel® MKL on Intel® Xeon Phi™
• Performance scales from multicore to many-core
• Familiarity of architecture and programming models
• Code re-use, faster time-to-market
Usage Models on Intel® Xeon Phi™ Coprocessor
• Automatic Offload
– No code changes required
– Automatically uses both host and target
– Transparent data transfer and execution management
• Compiler Assisted Offload
– Explicit control of data transfer and remote execution using compiler offload pragmas/directives
– Can be used together with Automatic Offload
• Native Execution
– Uses the coprocessors as independent nodes
– Input data is copied to targets in advance
Execution Models
A spectrum from multicore-centric to many-core-centric:
• Multicore Hosted (on the Intel® Xeon® multicore host): general-purpose serial and parallel computing
• Offload: codes with highly parallel phases
• Symmetric: codes with balanced needs
• Many Core Hosted (on the Intel® Xeon Phi™ many-core coprocessor): highly parallel codes
Automatic Offload (AO)
• Offloading is automatic and transparent
• By default, Intel MKL decides:
– When to offload
– Work division between host and targets
• Users enjoy host and target parallelism automatically
• Users can still control work division to fine-tune performance
How to Use Automatic Offload
• Using Automatic Offload is easy
• What if there is no coprocessor in the system?
– The code runs on the host as usual, without any penalty
Call a function:
mkl_mic_enable()
or
Set an env variable:
MKL_MIC_ENABLE=1
Automatic Offload Enabled Functions
• A select set of MKL functions is AO enabled.
– Only functions with sufficient computation to offset the data transfer overhead are subject to AO.
• In 11.0, AO-enabled functions include:
– Level-3 BLAS: ?GEMM, ?TRSM, ?TRMM
– The LAPACK "3 amigos": LU, QR, and Cholesky factorization
Work Division Control in Automatic Offload
Function call example:
mkl_mic_set_workdivision(MKL_TARGET_MIC, 0, 0.5)
→ Offload 50% of the computation to the first card only.
Environment variable example:
MKL_MIC_0_WORKDIVISION=0.5
→ Offload 50% of the computation to the first card only.
Tips for Using Automatic Offload
• AO works only when matrix sizes are large:
– ?GEMM: offloading only when M, N > 2048
– ?TRSM/?TRMM: offloading only when M, N > 3072
– Square matrices may give better performance
• Work division settings are just hints to the MKL runtime
• How to disable AO after it is enabled?
– mkl_mic_disable(), or
– mkl_mic_set_workdivision(MKL_TARGET_HOST, 0, 1.0), or
– MKL_HOST_WORKDIVISION=100
Tips for Using Automatic Offload
• Threading control tips:
• Avoid using the OS core of the target, which handles data transfer and housekeeping tasks. Example (on a 60-core card):
MIC_KMP_AFFINITY=explicit,granularity=fine,proclist=[1-236:1]
• Also, prevent thread migration on the host:
KMP_AFFINITY=granularity=fine,compact,1,0
• Note: different environment variables control the host and the coprocessor:
Host: OMP_NUM_THREADS, KMP_AFFINITY
Coprocessor: MIC_OMP_NUM_THREADS, MIC_KMP_AFFINITY
Compiler Assisted Offload (CAO)
• Offloading is explicitly controlled by compiler pragmas or directives.
• All MKL functions can be offloaded in CAO.
– In comparison, only a subset of MKL is subject to AO.
• Can leverage the full potential of the compiler's offloading facility.
• More flexibility in data transfer and remote execution management.
– A big advantage is data persistence: reusing transferred data for multiple operations.
How to Use Compiler Assisted Offload
• The same way you would offload any function call to the coprocessor.
• An example in C:
#pragma offload target(mic) \
in(transa, transb, N, alpha, beta) \
in(A:length(matrix_elements)) \
in(B:length(matrix_elements)) \
in(C:length(matrix_elements)) \
out(C:length(matrix_elements) alloc_if(0))
{
sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N,
&beta, C, &N);
}
How to Use Compiler Assisted Offload
• An example in Fortran:
!DEC$ ATTRIBUTES OFFLOAD : TARGET( MIC ) :: SGEMM
!DEC$ OMP OFFLOAD TARGET( MIC ) &
!DEC$ IN( TRANSA, TRANSB, M, N, K, ALPHA, BETA, LDA, LDB, LDC ), &
!DEC$ IN( A: LENGTH( NCOLA * LDA )), &
!DEC$ IN( B: LENGTH( NCOLB * LDB )), &
!DEC$ INOUT( C: LENGTH( N * LDC ))
!$OMP PARALLEL SECTIONS
!$OMP SECTION
CALL SGEMM( TRANSA, TRANSB, M, N, K, ALPHA, &
A, LDA, B, LDB, BETA, C, LDC )
!$OMP END PARALLEL SECTIONS
Data Persistence with CAO
__declspec(target(mic)) static float *A, *B, *C, *C1;
// Transfer matrices A, B, and C to coprocessor and do not de-allocate matrices A and B
#pragma offload target(mic) \
in(transa, transb, M, N, K, alpha, beta, LDA, LDB, LDC) \
in(A:length(NCOLA * LDA) free_if(0)) \
in(B:length(NCOLB * LDB) free_if(0)) \
inout(C:length(N * LDC))
{
sgemm(&transa, &transb, &M, &N, &K, &alpha, A, &LDA, B, &LDB, &beta, C, &LDC);
}
// Transfer matrix C1 to coprocessor and reuse matrices A and B
#pragma offload target(mic) \
in(transa1, transb1, M, N, K, alpha1, beta1, LDA, LDB, LDC1) \
nocopy(A:length(NCOLA * LDA) alloc_if(0) free_if(0)) \
nocopy(B:length(NCOLB * LDB) alloc_if(0) free_if(0)) \
inout(C1:length(N * LDC1))
{
sgemm(&transa1, &transb1, &M, &N, &K, &alpha1, A, &LDA, B, &LDB, &beta1, C1, &LDC1);
}
// Deallocate A and B on the coprocessor
#pragma offload target(mic) \
nocopy(A:length(NCOLA * LDA) free_if(1)) \
nocopy(B:length(NCOLB * LDB) free_if(1))
{ }
Tips for Using Compiler Assisted Offload
• Use data persistence to avoid unnecessary data copying and memory allocation/deallocation
• Thread affinity: avoid using the OS core. Example for a 60-core coprocessor:
MIC_KMP_AFFINITY=explicit,granularity=fine,proclist=[1-236:1]
• Use huge (2MB) pages for memory allocation in user code:
MIC_USE_2MB_BUFFERS=64K
• The value of MIC_USE_2MB_BUFFERS is a threshold, e.g., allocations of 64K bytes or larger will use huge pages.
Using AO and CAO in the Same Program
• Users can use AO for some MKL calls and CAO for others in the same program
– Only supported by Intel compilers
– Work division must be set explicitly for AO; otherwise, all MKL AO calls are executed on the host
Native Execution
• Programs can be built to run only on the coprocessor by using the -mmic build option.
• Tips for using MKL in native execution:
• Use all threads to get the best performance (for a 60-core coprocessor):
MIC_OMP_NUM_THREADS=240
• Thread affinity setting:
KMP_AFFINITY=explicit,proclist=[1-240:1,0,241,242,243],granularity=fine
• Use huge pages for memory allocation (more details on the "Intel Developer Zone"):
– The mmap() system call
– The open-source libhugetlbfs.so library: http://libhugetlbfs.sourceforge.net/
Suggestions on Choosing Usage Models
• Choose native execution if
– The code is highly parallel.
– You want to use the coprocessors as independent compute nodes.
• Choose AO when
– A sufficiently high FLOP/byte ratio makes offload beneficial.
– Using Level-3 BLAS functions: ?GEMM, ?TRMM, ?TRSM.
– Using LU and QR factorization (in upcoming release updates).
• Choose CAO when either
– There is enough computation to offset the data transfer overhead, or
– Transferred data can be reused by multiple operations.
• You can always run on the host if offloading does not achieve better performance.
Linking Examples
• AO: built the same way as code for Xeon!
icc -O3 -mkl sgemm.c -o sgemm.exe
• Native: using -mmic
icc -O3 -mmic -mkl sgemm.c -o sgemm.exe
• CAO: using -offload-option
icc -O3 -openmp -mkl \
  -offload-option,mic,ld,"-L$MKLROOT/lib/mic -Wl,\
  --start-group -lmkl_intel_lp64 -lmkl_intel_thread \
  -lmkl_core -Wl,--end-group" sgemm.c -o sgemm.exe
Intel® MKL Link Line Advisor
• A web tool to help users choose the correct link line options.
• http://software.intel.com/sites/products/mkl/
• Also available offline in the MKL product package.
Performance Roadmap
• The following components are well optimized for Intel Xeon Phi coprocessors:
• BLAS Level 3, and much of Level 1 & 2
• Sparse BLAS: ?CSRMV, ?CSRMM
• LU, Cholesky, and QR factorization
• FFTs: 1D/2D/3D, SP and DP, r2c, c2c
• VML (real floating point functions)
• Random number generators:
– MT19937, MT2203, MRG32k3a
– Discrete Uniform and Geometric
Where to Find More Code Examples
$MKLROOT/examples/mic_samples
– ao_sgemm AO example
– dexp VML example (vdExp)
– dgaussian double precision Gaussian RNG
– fft complex-to-complex 1D FFT
– sexp VML example (vsExp)
– sgaussian single precision Gaussian RNG
– sgemm SGEMM example
– sgemm_f SGEMM example (Fortran 90)
– sgemm_reuse SGEMM with data persistence
– sgeqrf QR factorization
– sgetrf LU factorization
– spotrf Cholesky
– solverc PARDISO examples
Where to Find MKL Documentation?
• How to use MKL on Intel® Xeon Phi™ coprocessors:
• “Using Intel Math Kernel Library on Intel® Xeon Phi™
Coprocessors” section in the User’s Guide.
• “Support Functions for Intel Many Integrated Core
Architecture” section in the Reference Manual.
• Tips, app notes, etc. can also be found on the “Intel
Developer Zone”:
• http://www.intel.com/software/mic
• http://www.intel.com/software/mic-developer
Run-to-run Numerical Reproducibility with the Intel® Math Kernel Library and Intel® Composer XE 2013
Agenda
• Why do floating point results vary?
• Reproducibility in the Intel compilers
• New reproducibility features in Intel® MKL
Ever seen something like this?
C:\Users\me>test.exe
4.012345678901111
C:\Users\me>test.exe
4.012345678902222
C:\Users\me>test.exe
4.012345678902222
C:\Users\me>test.exe
4.012345678901111
C:\Users\me>test.exe
4.012345678902222
…or this on different processors?
Intel® Xeon® Processor E5540:
C:\Users\me>test.exe
4.012345678901111
C:\Users\me>test.exe
4.012345678901111
C:\Users\me>test.exe
4.012345678901111
C:\Users\me>test.exe
4.012345678901111

Intel® Xeon® Processor E3-1275:
C:\Users\me>test.exe
4.012345678902222
C:\Users\me>test.exe
4.012345678902222
C:\Users\me>test.exe
4.012345678902222
C:\Users\me>test.exe
4.012345678902222
Why do results vary?
Root cause for variations in results in Intel MKL:
• Floating-point numbers and rounding
• A double precision example where (a+b)+c ≠ a+(b+c):
  2^-63 + 1 + (-1) = 2^-63   (exact mathematical result)
  (2^-63 + 1) + (-1) = 0     (correctly rounded IEEE result)
  2^-63 + (1 + (-1)) = 2^-63 (correctly rounded IEEE result)
Why might the order of operations change in a computer program?
Many optimizations require a change in the order of operations:
• Instruction sets: memory alignment affects the grouping of data in registers
• Multiple cores / multiple processors: most functions are threaded to use as many cores as will give good scalability
• Non-deterministic task scheduling: some algorithms use asynchronous task scheduling for optimal performance
• Code path: optimized to use all the processor features available on the system where the program is run
Why are reproducible results important for Intel MKL users?
• Technical / legacy: software correctness is determined by comparison to previous 'gold' results.
• Debugging / porting: when developing and debugging, a higher degree of run-to-run stability is required to find potential problems.
• Legal: accreditation or approval of software might require exact reproduction of previously defined results.
• Customer perception: developers may understand the technical issues with reproducibility but still require reproducible results, since end users or customers will be disconcerted by the inconsistencies.
Source: Email correspondence with Kai Diethelm of GNS. see his whitepaper: http://www.computer.org/cms/Computer.org/ComputingNow/homepage/2012/0312/W_CS_TheLimitsofReproducibilityinNumericalSimulation.pdf
What are the ingredients for reproducibility?
Source code
Tools
• Compilers
• Libraries
Floating Point Semantics
The -fp-model (/fp:) compiler switch lets you choose the floating point semantics at a coarse granularity. It lets you specify the compiler rules for:
– Value safety (our main focus)
– FP expression evaluation
– FPU environment access
– Precise FP exceptions
– FP contractions (fused multiply-add)
The -fp-model & /fp: switches
• Recommendation for reproducibility: -fp-model precise
– for reproducible results and for ANSI/IEEE standards compliance, C++ & Fortran

Value      Description
fast[=1]   (default) allows value-unsafe optimizations
fast=2     allows additional optimizations
precise    value-safe optimizations only
except     enables floating-point exception semantics
strict     precise + except + disable fma + don't assume the default floating-point environment
Reassociation and -fp-model precise
-fp-model precise:
• disables reassociation
• enforces C standard conformance (left-to-right evaluation)
• may carry a significant performance penalty
Note: parentheses are respected only in value-safe mode!
#include <iostream>
#define N 100
int main() {
  float a[N], b[N];
  float c = -1., tiny = 1.e-20F;
  for (int i=0; i<N; i++) a[i] = 1.0;
  for (int i=0; i<N; i++) {
    // value-safe (left-to-right): (1 + -1) + 1e-20 = 1e-20, so b[i] = 1e20
    // reassociated: 1 + (-1 + 1e-20) = 1 + -1 = 0, so b[i] = inf
    a[i] = a[i] + c + tiny;
    b[i] = 1/a[i];
  }
  std::cout << "a = " << a[0] << " b = " << b[0] << "\n";
}
Run-to-Run Variations (single-threaded)
Data alignment may vary from run to run due to changes in the external environment:
• E.g., a malloc of a string to contain the date, time, user name, or directory: the size of that allocation affects the alignment of subsequent mallocs
• The compiler may "peel" scalar iterations off the start of the loop until subsequent memory accesses are aligned, so that the main loop kernel can be vectorized efficiently
• For reduction loops, this changes the composition of the partial sums, hence changes rounding and the final result
• Occurs for both gcc and icc when compiling for Intel® AVX
To avoid, align data: _mm_malloc(size, 32) (icc only) or mkl_malloc(size, 32) (Intel MKL)
• Or compile with -fp-model precise (icc) or without -ffast-math (larger performance impact)
Typical Performance Impact
• SPECCPU2006fp benchmark suite compiled with -O2 or -O3
• Geomean performance reduction due to -fp-model precise and -fp-model source: 12% - 15%
– Intel Compiler XE 2011 (12.0)
– Measured on an Intel Xeon® 5650 system with dual 6-core processors at 2.67 GHz, 24GB memory, 12MB cache, SLES* 10 x64 SP2
• Performance impact can vary between applications
Use -fp-model precise to improve floating point reproducibility while limiting the performance impact
How did Intel MKL handle reproducibility historically?
Through MKL 10.3 (Nov. 2011), the recommendation was to:
• Align your input/output arrays using the Intel MKL memory manager
• Call sequential Intel MKL
– This meant the user needed to handle threading themselves
New! Balancing Reproducibility and Performance: Conditional Numerical Reproducibility (CNR)

Goal: achieve the best performance possible for cases that require reproducibility
• Memory alignment: align memory (try the Intel MKL memory allocation functions); 64-byte alignment covers processors for the next few years
• Number of threads: set the number of threads to a constant value, or use the sequential libraries
• Deterministic task scheduling: ensures that FP operations occur in an order that yields reproducible results
• Code path control: maintains consistent code paths across processors; this will often mean lower performance on the latest processors
Controls for CNR features
Why "Conditional"?
In Intel MKL 11.0, reproducibility is currently available under certain conditions:
– Within a single operating system / architecture: Linux* (IA-32, Intel® 64), Windows* (IA-32, Intel® 64), or Mac OS X (IA-32, Intel® 64). Reproducibility applies only within each of these combinations, not between them.
– Within a particular version of Intel MKL: results in version 11.0 update 1 may differ slightly from results in version 11.0.
– Reproducibility controls in Intel MKL affect only Intel MKL functions; other tools and other parts of your code can still cause variation.
Send us your feedback!
CNR Impact on Performance of Intel® Optimized LINPACK Benchmark
What’s next?
https://softwareproductsurvey.intel.com/survey/150072/1afd/
Further resources on reproducibility
Reference manuals, User Guides, Getting Started Guides…
• Intel® MKL Documentation
• Intel® Fortran Composer XE 2013 Documentation
• Intel® C++ Composer XE 2013 Documentation
Knowledgebase:
• CNR in Intel MKL 11.0, Consistency of Floating-Point Results
Support
• Intel MKL user forum
• Intel compiler forums [IVF, Fortran, and C++]
• Intel Premier support
Feedback
• Survey: https://softwareproductsurvey.intel.com/survey/150072/1afd/
Summary
When writing programs for reproducible results:
• Write your source code with reproducibility in mind
• Use "-fp-model precise" with the Intel compilers
• Use the new Conditional Numerical Reproducibility (CNR) features in Intel MKL

Evaluate CNR in the following: Intel® Math Kernel Library 11.0, Intel® Composer XE 2013, Intel® Parallel Studio XE 2013, Intel® Cluster Studio XE 2013
Provide feedback:
https://softwareproductsurvey.intel.com/survey/150072/1afd/
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Legal Disclaimer & Optimization Notice