
Cloud service providers (CSPs) are a diverse group―ranging from those who deliver infrastructure-as-a-service and platform-as-a-service to social media companies to software-as-a-service companies. One thing CSPs have in common is the need to innovate to become more efficient, drive new revenue streams, and retain and grow market share.

Intel is committed to helping CSPs meet their goals, with Intel® Software Development Tools and third-party solutions based on Intel® architecture providing a solid foundation to power profitable next-generation cloud services. One of these tools is Intel® VTune™ Amplifier, a performance profiler that makes it easier to:

• Create faster code with accurate data and low overhead

• Get more data with CPU, GPU, FPU, threading, memory, and more

• Get fast answers with easy analysis that turns data into insights

CSP Workloads

Optimized performance helps CSPs generate higher revenue. For example, if a major social media company can serve more targeted ads faster, the ads will get more views and clicks. CSPs have teams of engineers dedicated to improving performance through software changes and hardware upgrades. Intel works with these teams to help them target their software improvements for maximum effect and ensure that hardware upgrades provide the expected gains.

Intel has seen CSPs develop their services in a variety of programming languages, adapting them to their cloud environments by integrating new technologies such as containers.

There are a few common trends:

• Running Java* and native applications in containers

• Running low-privilege Java microservices

• Backend processes in C++ with a Python* frontend

• Increasing usage of the Go* language

• Usage of just-in-time (JIT) engines (e.g., HHVM* and Node.js*) for Web applications

High-Performance Computing

Cloud Services

How Leading Cloud Service Providers Are Boosting Performance and Growing Market Share with the Advanced Profiling Capabilities of Intel® VTune™ Amplifier

Making Cloud Services Soar

Case study

Software


Performance Profiling

CSPs use performance tools to accomplish three main tasks:

1. Identify different types of performance characteristics of current applications (such as in the CPU, memory, I/O, and communications bandwidth)

2. Optimize the applications to get the most out of their hardware

3. Pair the performance issues with suggestions for making the right hardware upgrade(s)

Intel VTune Amplifier

One key performance tool for CSPs is Intel VTune Amplifier, a low-overhead performance analysis tool that helps developers create faster code by identifying performance bottlenecks in applications written in languages such as C, C++, Java, .NET, Python, or Go.

The tool provides different types of analyses to highlight different issues, including:

• Pinpointing where in the source code the application is spending a lot of time, including the call tree which led to that point

• Identifying threading inefficiency (thread imbalance and lock contention)

• Improving low floating-point operations per second (FLOPS) and floating-point unit (FPU) utilization

• Finding issues causing bottlenecks at the architecture level—such as memory access, cache misses, bandwidth, non-uniform memory access (NUMA) latency, and I/O waits

Containers offer a consistent runtime environment with low overhead. Intel VTune Amplifier can profile Java and native applications running in Docker* and LXC*―the most popular container technologies―from inside and outside the container, with no additional setup. Intel has also recently added new profiling capabilities that characterize application utilization of storage and network resources.

Let’s look at a few examples of how CSPs gain substantial benefits from the advanced profiling capabilities of Intel VTune Amplifier.

Case Study 1: Optimizing Performance for a Web Services Company

The Intel team worked with a major Web services and content company to tune three workloads:

1. C++-based ad exchange services

2. Java-based ad exchange services

3. Web search frontend

C++-Based Ad Exchange Services

The ad exchange manages the buying and selling of ads between the publisher and advertisers. Better performance of the ad exchange results in faster and better-targeted ads served.

After the company upgraded the servers running their platform to a new architecture, they found that the C++ services saw higher latency and lower queries per second (QPS) than expected. These services scaled poorly, with only about a third of the available threads executing simultaneously, and the application itself consumed little CPU time while the kernel consumed a great deal. Using Intel VTune Amplifier, they found that most of the CPU time was spent on spin locks and on flushing the translation lookaside buffer (TLB). Solving these issues improved concurrency and scaling on their new hardware, roughly doubling the performance of their statistical analysis algorithm.

Java-Based Ad Exchange Services

The company’s primary ad exchange services are Java-based. Intel VTune Amplifier identified NUMA issues in these services, which led to disabling transparent huge pages in the Linux* kernel. This resulted in a 25% performance increase.

Web Search Frontend

The company suffered a drop of over 50% in performance after switching from a PHP- to a Node.js-based frontend for their Web search. They believed the root cause was related to garbage collection, and they proved it using the hotspot collection capability of Intel VTune Amplifier. The company got their Node.js performance back on track and made Intel VTune Amplifier their preferred tool to identify these types of performance issues.

Case Study 2: Delivering Faster Results for a Web and Mobile Application

Another company, a major Web and mobile application provider, came to Intel with a goal of comparing public cloud performance to an optimized private data center. They were interested in three workloads:

1. Go* language-based ad exchange

2. Image processing

3. Search

Go* Language-Based Ad Exchange

The company uses an ad exchange, but Go-based services process their ads. Intel VTune Amplifier reported problems with a full translation lookaside buffer (TLB) and spikes in CPU usage caused by the Go garbage collector.

Based on these reports, Intel made three recommendations to improve performance:

1. Use runtime.LockOSThread to avoid thread migrations.

2. Increase page sizes or set hugepages to improve TLB performance.

3. Increase the garbage collection target percentage (GOGC) from the default of 100 to 300, to reduce the garbage collection frequency.


Image Processing

The company helps users share visual assets and browse what other users have posted. It makes recommendations using complex routines written in C++ running on a NUMA architecture. Intel VTune Amplifier uncovered issues with thread migration and also found that the compiler (g++ 5.4) created a hashtable index that was kept on the stack instead of a register, causing a slowdown in the frequent access of that index.

Search

Users search for text and images via routines written in C++. These routines had the same NUMA problems as the ad exchange and image processing services, along with a high rate of branch misprediction. Socket pinning addressed the NUMA issues, and profile-guided optimization solved the branch mispredictions.

Maximizing Performance on Modern Processors

Performance on modern processors requires much more than optimizing single-thread performance. High-performing code must be:

• Threaded and scalable to utilize multiple CPUs

• Vectorized for efficient use of multiple FPUs

• Tuned to take advantage of non-uniform memory architectures and caches

Intel VTune Amplifier gives some of today’s largest CSPs all these advanced profiling capabilities with a single, friendly analysis interface. The result? New performance that helps them innovate to become more efficient, drive new revenue streams, and retain and grow their market share.

Intel VTune Amplifier in Action in Other Industries

“Intel VTune Amplifier helped us identify system performance issues by analyzing and displaying CPU utilization, cache misses, and more across multiple processors and server products. We optimized our product based on this analysis, improving overall performance by 12% on key workloads. Intel VTune Amplifier was invaluable in quickly identifying the optimization opportunities in our product.”

Charles Liu

Solution Integration, Server Product Business Unit

Acer Inc.

“We achieved a significant improvement (almost 2x) even on one core by optimizing the code based on the information provided by Intel VTune Amplifier. Good scalability is a result of usage of combination of Intel® Threading Building Blocks and OpenMP parallelization techniques. We achieved over 8x the performance of the previous version on 8 cores and almost 11x the performance on 16 cores.”

Alexey Andrianov

R&D Director Deputy, Mechanical Analysis Division,

Mentor Graphics Corporation

“Intel VTune Amplifier analyzes complex code and helps us identify bottlenecks rapidly. By using it and other Intel® Software Development Tools, we were able to improve PIPESIM performance up to 10x compared with the previous software version.”

Rodney Lessard

Senior Scientist

Schlumberger


Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as “Spectre” and “Meltdown”. Implementation of these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

For more information regarding performance and optimization choices in Intel® Software Development Products, see our Optimization Notice: https://software.intel.com/articles/optimization-notice#opt

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright © 2018 Intel Corporation Printed in USA 219/SS Please Recycle

Learn More

• Intel VTune Amplifier

• More Success Stories

Performance Analysis Cookbooks

• Profiling a Java* Application in a Singularity* Container

• Profiling a Java* Application in a Docker* Container

• Profiling PHP Code Running with HHVM*

• Profiling JavaScript* Code in Node.js*

• Profiling a .NET* Core Application
