libperf


libperf provides a tracing interface to the Linux Kernel Performance Counters (LKPC) subsystem, recently introduced into the Linux kernel mainline. This interface provides a unified API that abstracts hardware-based performance counters, kernel trace points, and software-defined trace points. The kernel maintains the counters and keeps statistics per thread and per core. All counters are “virtual” 64-bit integers, accessed through special file descriptors that libperf obtains from the kernel.
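To make the file-descriptor model concrete, here is a minimal sketch that talks to the kernel directly through the perf_event_open system call rather than through libperf. The helper names (open_task_clock_counter, start_counter, stop_and_read_counter) are hypothetical and are not part of libperf. The choice of the software task-clock event is also an assumption, made only because it needs no particular hardware.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a software task-clock counter for the calling
 * thread on any CPU. Returns a file descriptor, or -1 on failure (for
 * example when perf events are unavailable or not permitted). */
static int open_task_clock_counter(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_SOFTWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_SW_TASK_CLOCK;
    attr.disabled = 1;        /* start stopped; enabled explicitly later */
    attr.exclude_kernel = 1;  /* user-space only, needs fewer privileges */
    attr.exclude_hv = 1;

    /* pid = 0: calling thread; cpu = -1: any CPU; no group leader; no flags */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Reset the counter to zero and start counting. Returns 0 on success. */
static int start_counter(int fd)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == -1)
        return -1;
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == -1 ? -1 : 0;
}

/* Stop counting and read the 64-bit value. With the default read_format,
 * read() returns exactly one 64-bit integer. Returns 0 on success. */
static int stop_and_read_counter(int fd, uint64_t *value)
{
    ssize_t n;

    assert(value != NULL);
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1)
        return -1;
    n = read(fd, value, sizeof(*value));
    return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

A caller would open the counter once, wrap the region of interest between start_counter and stop_and_read_counter, and close the descriptor afterwards. Whether the open succeeds depends on kernel configuration and permissions, so the -1 case should always be handled.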

Features and Highlights

• System Call Wrapper Library
• First API for LKPC
• First User Space Library Interfacing with LKPC
• Simple C API – 2 Calls Required by Default
• Efficient Kernel Implementation
• Low Overhead
• Feasible for Dynamic Feedback
• Preparing for Open Source GPLv2 Release

Code Example

… /* start of tracing */
struct perf_data* pd = libperf_initialize(-1, -1);
… /* do work */
libperf_finalize(pd, UUID);
… /* end of tracing */

Performance Overhead

• Evaluated Using sysbench
• 10 Runs Averaged on an Intel Centrino 2
• Overhead Significant for Threading (Context Switching)
• Worst Case: 3.63 %
• Average Case: 3.25 %
• Best Case: 2.87 %
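The poster does not say how the percentages were derived. One plausible reading, assumed here, is that each figure is the relative slowdown of a traced run against an untraced baseline. The sketch below shows that arithmetic. The names overhead_percent, summarize_overhead, and struct overhead_summary are hypothetical, and the pairing of baseline and traced measurements is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Best, average, and worst relative overhead over a set of measurements. */
struct overhead_summary {
    double best;     /* smallest overhead, in percent */
    double average;  /* arithmetic mean, in percent */
    double worst;    /* largest overhead, in percent */
};

/* Relative overhead of a traced run versus an untraced baseline, in percent. */
static double overhead_percent(double baseline, double traced)
{
    assert(baseline > 0.0);
    return (traced - baseline) / baseline * 100.0;
}

/* Summarize n paired measurements (baseline[i] versus traced[i]). */
static struct overhead_summary summarize_overhead(const double *baseline,
                                                  const double *traced,
                                                  size_t n)
{
    struct overhead_summary s;
    double sum;
    size_t i;

    assert(n > 0);
    s.best = s.worst = sum = overhead_percent(baseline[0], traced[0]);
    for (i = 1; i < n; i++) {
        double pct = overhead_percent(baseline[i], traced[i]);
        if (pct < s.best)
            s.best = pct;
        if (pct > s.worst)
            s.worst = pct;
        sum += pct;
    }
    s.average = sum / (double)n;
    return s;
}
```

Under this reading, a benchmark that takes 100 time units untraced and 103 traced has an overhead of 3 %.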

LightSpeed: Thread Scheduling for Multiple Cores

Karl Naden ([email protected]), Wolfgang Richter ([email protected]), Ekaterina Taralova ([email protected])

Introduction

Parallel applications have difficulty exploiting the specifics of the hardware they run on. Operating systems know more about the hardware, but lack application-specific knowledge. Solutions that cut across the stack from software to hardware may offer compelling paths in the future.

Approach

Provide the application layer more control over scheduling tasks and provide detailed information about hardware performance to make informed decisions based on application knowledge.

Questions:

1. How could statistics about the underlying architecture’s performance be delivered efficiently to applications?

2. How could applications take advantage of this additional information?

Target Workload

Overview

• Machine Learning parallel algorithm framework
• Tailored to iterative algorithms on graph data structures
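As a rough illustration of this kind of workload, the toy sketch below runs an iterative algorithm on a small data graph. An update function recomputes one vertex from its neighbors, and a FIFO scheduler re-queues the neighbors whenever a vertex changes noticeably. This is not GraphLab's API. Every name here (struct vertex, struct task_queue, update_vertex, run_until_converged) is hypothetical, the averaging update is an arbitrary choice, and the loop is sequential, so it shows no consistency model or parallelism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VERTICES 16
#define MAX_DEGREE 4

/* Data graph: each vertex holds a value and a list of neighbor indices. */
struct vertex {
    double value;
    size_t degree;
    size_t neighbors[MAX_DEGREE];
};

/* FIFO scheduler; each vertex is queued at most once at a time. */
struct task_queue {
    size_t items[MAX_VERTICES];
    size_t head, count;
    bool queued[MAX_VERTICES];
};

static void queue_push(struct task_queue *q, size_t v)
{
    if (q->queued[v])
        return;
    q->items[(q->head + q->count) % MAX_VERTICES] = v;
    q->count++;
    q->queued[v] = true;
}

static size_t queue_pop(struct task_queue *q)
{
    size_t v;

    assert(q->count > 0);
    v = q->items[q->head];
    q->head = (q->head + 1) % MAX_VERTICES;
    q->count--;
    q->queued[v] = false;
    return v;
}

/* Update function: replace a vertex value with the mean of itself and its
 * neighbors. Returns true if the value moved by more than tol. */
static bool update_vertex(struct vertex *g, size_t v, double tol)
{
    double sum = g[v].value;
    double next, delta;
    size_t i;

    for (i = 0; i < g[v].degree; i++)
        sum += g[g[v].neighbors[i]].value;
    next = sum / (double)(g[v].degree + 1);
    delta = next - g[v].value;
    g[v].value = next;
    return delta > tol || delta < -tol;
}

/* Run scheduled updates until the queue drains or max_updates is reached.
 * Returns the number of updates executed. */
static size_t run_until_converged(struct vertex *g, size_t n, double tol,
                                  size_t max_updates)
{
    struct task_queue q = {0};
    size_t updates = 0;
    size_t v, i;

    assert(n <= MAX_VERTICES);
    for (v = 0; v < n; v++)
        queue_push(&q, v);
    while (q.count > 0 && updates < max_updates) {
        v = queue_pop(&q);
        updates++;
        if (update_vertex(g, v, tol))
            for (i = 0; i < g[v].degree; i++)
                queue_push(&q, g[v].neighbors[i]);
    }
    return updates;
}
```

The order in which the scheduler hands out vertices is the kind of decision that hardware performance feedback could inform. Here it is a plain FIFO.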

GraphLab Key Components and Inputs

Why GraphLab?

• Existing parallel scheduling problem
• Specific problem formulation (graphs)
• Significant variation in algorithms gives potential for generality

References

• Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin. GraphLab: A New Framework for Parallel Machine Learning. Conference on Uncertainty in Artificial Intelligence (UAI), 2010.

[Figure: GraphLab Key Components and Inputs. Labels: Data Graph, Update Functions, Scheduling, Consistency Model, Shared Data Table]