D3.1. TANGO Toolbox - Alpha version Version: v1.3 – Final, Date: 23/12/2016
TANGO Consortium 2016
Page 1 of 39
Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation
D3.1 TANGO Toolbox
Alpha version (year-1)
Lead Editor Jean-Christophe DEPREZ (CETIC)
Authors Jean-Christophe DEPREZ (CETIC), Lotfi Guedria (CETIC), Renaud De Landtsheer (CETIC), David Garcia Perez (ATOS), Roi Sucasas Font (ATOS), Richard Kavanagh (ULE), Jorge Ejarque (BSC), Yiannis Georgiou (BULL)
Version 1.3
Reviewers Bruno Wery (DELTATEC), Yiannis Georgiou (BULL)
Work package WP 3
Due date 31/12/2016
Submission date 23/12/2016
Distribution level (CO, PU): PU – Software (and associated report)
Abstract This document provides the installation manuals for the various software packages found in the TANGO architecture. The results achieved at the end of Year-1 make it possible to use the development tools to shape application code and then to run applications on heterogeneous hardware, so that time and energy consumption profiles can be collected. These profiles allow developers to establish the desired benchmarks and assist them in making requirements, design and coding decisions.
Keywords TANGO, software, framework, installation, manual
Licensing information: Each component is delivered under its own open source license, specified in each code file's headers.
This report is licensed under Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
http://creativecommons.org/licenses/by-sa/3.0/
Document Description
Document Revision History
Version | Date | Description of change | Modified by
v0.1 | 2016/11/25 | First draft version | Jean-Christophe DEPREZ (CETIC)
v0.2 | 2016/12/12 | Initial integration of Programming Model, Code Optimiser, Energy Modeller, Energy probes, Application Life-cycle Deployment Engine | ULE, ATOS, BSC
v0.3 | 2016/12/13 | Introduction, Conclusion, Executive Summary | Jean-Christophe DEPREZ (CETIC)
v0.4 | 2016/12/15 | Integrate Design-Time Characteriser (Poroto) | Jean-Christophe DEPREZ and Lotfi GUEDRIA (CETIC)
v1.0 | 2016/12/16 | Integrate SLURM input from Yiannis and finalise layout and ToC | Jean-Christophe DEPREZ (CETIC) and Yiannis Georgiou (BULL)
v1.1 | 2016/12/18 | Initial set of basic review comments from Bruno handled | Jean-Christophe DEPREZ (CETIC)
v1.2 | 2016/12/20 | Updates by partners to address Bruno's review comments | CETIC, BSC, ULE, ATOS, BULL
v1.3 | 2016/12/20 | Updates to address Yiannis's review comments | Jean-Christophe DEPREZ (CETIC)
Table of Contents
Table of Contents .......................................................................................................................... 3
Table of Figures ............................................................................................................................. 5
Terms and abbreviations ............................................................................................................... 6
Executive Summary ....................................................................................................................... 7
1 Introduction .......................................................................................................................... 8
1.1 About this deliverable ................................................................................................... 8
1.2 Document structure ...................................................................................................... 8
2 Installation and Configuration Guide for development tools ............................................... 9
2.1 Installation and Configuration of Requirements and Modelling Tooling .................... 10
2.1.1 Installation and Configuration of the Design Time Characteriser ....................... 10
2.1.1.1 Overview and context ..................................................................................... 10
2.1.1.2 Installation ....................................................................................................... 11
2.1.1.3 Example of tool usage ..................................................................................... 12
2.1.2 Installation and Configuration of the Design and Development Time Optimiser 14
2.1.2.1 System requirements and Software Dependencies ........................................ 14
2.1.2.2 Installing Placer (Year-1 Implementation of the Design-time Optimiser) ....... 14
2.1.2.3 Running Placer ................................................................................................. 14
2.2 Installation and Configuration of Programming Model Tooling ................................. 15
2.2.1 System Requirements and Software Dependencies: .......................................... 15
Common: ......................................................................................................................... 15
--with-monitor option: .................................................................................................... 16
--with-tracing option ....................................................................................................... 16
2.2.2 Installation Instructions ....................................................................................... 16
2.2.3 Application Development Overview ................................................................... 16
2.2.4 Application Compilation ...................................................................................... 17
2.2.5 Application Execution .......................................................................................... 17
2.2.6 Known Limitations ............................................................................................... 18
2.3 Installation and Configuration of Code Optimiser Tooling ......................................... 18
2.3.1 Platforms Supported ........................................................................................... 18
2.3.2 Software Pre-requisites and Dependencies ........................................................ 19
2.3.3 Installation Instructions ....................................................................................... 19
2.3.4 Configuration ....................................................................................................... 19
3 Installation and Configuration Guide for runtime software packages ................................ 20
3.1 Installation and Configuration of Extra Energy Probes ............................................... 21
3.1.1 Nvidia GPUs ......................................................................................................... 21
3.1.1.1 Supported OS platforms and products............................................................ 21
3.1.1.2 Example: .......................................................................................................... 22
3.1.2 NVIDIA Plugins ..................................................................................................... 22
3.1.2.1 Collectd ............................................................................................................ 22
3.1.2.2 Slurm ............................................................................................................... 22
3.1.2.3 Installation and configuration ......................................................................... 24
3.2 Installation and Configuration of SLURM .................................................................... 24
3.2.1 Platforms Supported ........................................................................................... 24
3.2.2 Software Pre-requisites and Dependencies ........................................................ 24
3.2.3 Installation instructions ....................................................................................... 24
3.2.4 Slurm accounting and profiling framework......................................................... 25
3.2.5 SLURM Key Functions .......................................................................................... 27
3.2.6 SLURM Components ............................................................................................ 27
3.2.6.1 SLURMCTLD ..................................................................................................... 28
3.2.6.2 SLURMD ........................................................................................................... 29
3.2.6.3 SlurmDBD (SLURM Database Daemon) ........................................................... 29
3.3 Installation and Configuration of Energy Modeller ..................................................... 30
3.3.1 Minimal System Requirements ........................................................................... 30
3.3.2 Platforms Supported ........................................................................................... 30
3.3.3 Software Pre-requisites and Dependencies ........................................................ 30
3.3.4 Installation Instructions ....................................................................................... 31
3.3.5 Using the standalone calibrator .......................................................................... 34
3.3.5.1 Apps.csv ........................................................................................................... 35
3.3.5.2 Using the Watt Meter Emulator ...................................................................... 35
3.3.5.3 watt-meter-emulator.properties .................................................................... 36
3.4 Installation and Configuration of Application Life-cycle Deployment Engine ............ 36
3.4.1 System Requirements ......................................................................................... 36
3.4.2 Installation and configuration ............................................................................. 36
3.4.3 API Documentation ............................................................................................. 38
4 Conclusions ......................................................................................................................... 39
Table of Figures
FIGURE 1: GENERAL TANGO ARCHITECTURE WITH DEVELOPMENT TOOLS IN RED BOXES. .............................. 9
FIGURE 2: SCREENSHOT OF EXAMPLE MODEL USING PLACER. .................................................. 15
FIGURE 3: GENERAL TANGO ARCHITECTURE WITH OPERATION SOFTWARE COMPONENTS IN RED BOXES. ...... 20
FIGURE 4: SLURM SIMPLIFIED ARCHITECTURE. .................................................... 28
Terms and abbreviations
ALDE Application Lifecycle Deployment Engine
API Application Programming Interface
BMC Baseboard Management Controller
BSD Berkeley Software Distribution
COP Code Optimiser
(C)OMPSs (Cloud) Open MP Superscalar (from BSC)
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
FPGA Field Programmable Gate Array
GID Group Identification (in *nix OS)
GPU Graphics Processing Unit
DTC Design-Time Characteriser
DTO Design-Time Optimiser
EC European Commission
IDE Integrated Development Environment
IPMI Intelligent Platform Management Interface
PCIe Peripheral Component Interconnect express
RAPL Running Average Power Limit
REST Representational State Transfer
RIFFA Reusable Integration Framework for FPGA Accelerators
ROCCC Riverside Optimizing Compiler for Configurable Circuits
SLURM Simple Linux Utility for Resource Management
UID User Identification (*nix OS)
VHDL VHSIC Hardware Description Language
Executive Summary
This document (D3.1) accompanies the software release of the TANGO project at the end of Year 1 under work package 3. It presents the installation manuals of the different software components found in the overall TANGO architecture. First, it presents the installation of development tools and then of the software to install on the operational/runtime infrastructure of heterogeneous hardware.
At this stage, implementation has not started for all components. However, the available component implementations provide the necessary and sufficient basis to profile the time and energy performance of an application, so as to define the desired benchmarks and help developers make decisions on requirements, design, and coding.
This document remains focused on the technical details of installing the TANGO components. The scientific contributions achieved at the end of Year 1 relying on the various TANGO software components are presented in Deliverable D3.2.
In the following years, the scientific and technical effort will continue under work package 4 in Year 2 and work package 5 in Year 3. While Year-1 focused on providing the necessary software to profile time and energy performance and obtain static benchmarking information on applications run on a heterogeneous hardware architecture, the effort in Year-2 will further integrate the TANGO components to make it possible to explore trade-offs on additional non-functional behaviours such as security, robustness and maintainability. Finally, during Year 3, the focus will be on enhancing programmer productivity.
1 Introduction
TANGO develops software to facilitate exploiting heterogeneous hardware architectures in order to develop applications that meet time and energy performance goals while also satisfying security and dependability requirements. To achieve this goal, WP3, which runs during the first year of the TANGO project, proposes to develop different tools. Although these tools have not reached their final implementation state, this deliverable presents their initial installation guides.
1.1 About this deliverable
This document presents information related to the software packages delivered as part of deliverable D3.1 of the TANGO project. It is an accompanying document that describes the installation of the different software tools used at development time by development teams and of the different software packages to install on the operational infrastructure to run and operate applications.
Next to this technical installation guide, deliverable D3.2 presents the scientific report on these different software tools and packages as well as an initial set of benchmarks obtained on the actual two TANGO testbeds.
1.2 Document structure
The first part of this document presents the installation manuals of the software tools for the development team, while the second part describes how to install the various software packages to obtain an operational TANGO infrastructure.
2 Installation and Configuration Guide for development tools
From its general architecture defined in D2.1 and recalled in Figure 1, the TANGO framework contains the development tools highlighted in red boxes:
- Requirements and Modelling tools, to model, characterise and optimise the granularity and the placement of software components on a heterogeneous infrastructure at design (and deployment) time
- Programming Model plugins, to facilitate specifying OmpSs and COMPSs tasks in the application source code
- Code Optimiser (for energy consumption), to assist developers with profiling energy consumption at the source code level
- Device Emulator, which can potentially be used at development or at deployment time to obtain time and energy performance data without actually running application code on a real infrastructure
Figure 1: General TANGO Architecture with development tools in Red boxes.
In Year-1, an initial implementation is provided for the Requirements and Modelling tools, the Programming Model plugins and the Code Optimiser. Each development tool has its installation manual presented in the subsections below. The Device Emulator will only start being implemented in Year-2.
2.1 Installation and Configuration of Requirements and Modelling Tooling
The Requirements and Design Modelling toolbox is composed of the following sub-components:
- a Design-Time Characteriser (DTC) for application time and energy performance on FPGA
- a Design and deployment Time Optimiser (DTO) to optimise the placement and execution of tasks on a heterogeneous infrastructure
- a graphical modelling plugin to facilitate specifying characterisation data on an application for the design and deployment time optimiser. As of Year-1, this graphical modelling component has not been implemented yet.
The Requirements and Modelling sub-components, in particular the Design-Time Characteriser and the Design and deployment Time Optimiser, work as independent tools. Once their implementations converge and their input/output relation is better understood, the graphical modelling plugin will provide a high-level integration. As of Year-1, the installation manuals of the DTC and the DTO are presented in their own independent sub-sections below.
2.1.1 Installation and Configuration of the Design Time Characteriser
2.1.1.1 Overview and context
Currently, the proposed design-time characterisation process leverages the Poroto tool developed by CETIC (https://github.com/cetic/poroto). The core of this tool is licensed under the 3-clause BSD license.
The target hardware for Poroto is typically a PCIe-enabled workstation hosting an FPGA-based accelerator board.
The tool enables the generation of an FPGA design implementing a computation that the user defines in his code. It assumes the input code is provided as separate C files for execution on the CPU and the FPGA respectively. The C files targeting the FPGA implement functions that will be translated to VHDL, compiled, synthesised and programmed onto the target FPGA accelerator board. The code on the CPU side can then very easily be adapted to substitute the call to the original function by a call to a wrapper module generated by the tool. The wrapper encapsulates calls to the FPGA board driver API, which essentially implements data transfer through the PCIe interface.
The Poroto tool was initially designed around a proprietary driver from AlphaData. It was recently extended to support a more generic interface through the integration of the RIFFA framework. RIFFA (Reusable Integration Framework for FPGA Accelerators) is a simple framework developed at UC San Diego (http://riffa.ucsd.edu/) to support communicating data from a host CPU to an FPGA via a PCI Express bus. The integration of the RIFFA framework makes it possible to address FPGA accelerators regardless of the FPGA family (Xilinx or Altera), provided that the PCIe IP is available for the target architecture. Poroto also includes native support for GHDL (a VHDL simulator), which greatly facilitates testing and accelerates the validation of the selected FPGA computation by targeting a virtual FPGA before integrating it as a real hardware acceleration for a software application running on the CPU. GHDL support within Poroto allows a fast evaluation and characterisation process where compiling, testing and validating the selected computation is highly optimised and automated.
Poroto is implemented in the Python programming language and realises the following compilation steps for a rapid evaluation of offloading computation kernels onto an FPGA board:
- The source C code of the computation to be offloaded is parsed and adapted to be fed to the underlying ROCCC tool (a compiler that generates VHDL from C). This step takes into account various configurations and constraints of ROCCC. In case the tool cannot infer some characteristics of the input code, low-level pragmas are used to guide the code generation.
- The ROCCC compilation process is automated.
- VHDL code for interfaces, memories and FPGA glue logic is generated: interfacing leverages either the generic RIFFA framework IP or the vendor-specific AlphaData logic.
- Test set-ups are generated for the FPGA design and also for the CPU design that will exploit the offloaded computation.
- The compilation process for the FPGA is automated (target dependent, in our case: Xilinx synthesis tools).
- The CPU code that interfaces with the FPGA implementation is generated (sending the bit stream to program the FPGA, transferring the data, triggering execution of the offloaded design and retrieving the result data).
2.1.1.2 Installation
The Poroto tool is intended to be installed and run on the following platforms:
- recent Linux distributions such as Debian Jessie (or later) or Ubuntu 14.04 (or higher)
- recent OS X based machines: Mac OS X Yosemite
2.1.1.2.1 Dependencies
The Poroto tool has the following open-source dependencies:
- the Python interpreter (version 2.7 or above)
- the PyCParser and PLY libraries (included in Poroto)
- the ROCCC compiler (version 0.7.6)
- the GCC compiler
- the RIFFA framework (version 2.0 or above, optional)
- GHDL (version 0.31 or above, optional)
Besides leveraging the ROCCC compiler for the generation of VHDL code, the tool makes use of proprietary components associated with the target platforms, like the AlphaData board (ADM-XRC-6T1):
- AlphaData VHDL library, C SDK and driver
- Xilinx PlanAhead compilation suite
- Xilinx IP cores for the generation of memory blocks, FIFOs and computational IP (integer multiplication and division, floating point support, ...)
The above tools and APIs are not packaged with the tool and should be acquired and installed separately by the user. The templates related to these proprietary tools are not provided by default within Poroto but can be disclosed to interested parties who have a similar platform and valid licenses for the associated proprietary tools.
2.1.1.2.2 Instructions
The tool does not require any installation steps. To launch the tool, the following two environment variables must be set:
- ROCCC_ROOT: points to the top directory of the ROCCC tool
- POROTO_ROOT: points to the top directory of the Poroto tool
In order to simplify usage, a Makefile support file is included in the Poroto distribution to set up the correct environment and select the right configuration parameters.
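For instance, the two variables could be set as follows before invoking the tool; the paths below are purely illustrative and must be adapted to the local installation:

```shell
# Illustrative paths only -- substitute your actual install locations.
export ROCCC_ROOT=/opt/roccc-0.7.6     # top directory of the ROCCC tool
export POROTO_ROOT="$HOME/tools/poroto" # top directory of the Poroto tool
```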
2.1.1.3 Example of tool usage
After installing the Poroto software and all its dependencies, one can use the simple demo provided to check the toolchain. The demo is a simple code that calculates the element-by-element sum of two vectors. The demo source code, available in the tool's demo directory, is:
#pragma poroto memory test_A int 100
#pragma poroto memory test_B int 100
#pragma poroto memory test_Out int 100
#pragma poroto stream::roccc_bram_in VectorAdd::A(test_A, N)
#pragma poroto stream::roccc_bram_in VectorAdd::B(test_B, N)
#pragma poroto stream::roccc_bram_out VectorAdd::Out(test_Out, N)
void VectorAdd(int N, int* A, int* B, int* Out)
{
    int i;
    for(i = 0; i < N; ++i) {
        Out[i] = A[i] + B[i];
    }
}
In order to test the correctness of the generated code, we can specify test vectors (in a python file):
from poroto.test import TestVector
test_vectors = {
    'VectorAdd': TestVector(1, {
        'N': [12],
        'A': [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
        'B': [[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]],
        'Out': [[10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]],
    }),
}
The demo uses a simple Makefile to invoke Poroto :
POROTO_ROOT=../..
FILES=vector_add.c
include $(POROTO_ROOT)/poroto.mak
The tool can be invoked using the Makefile provided in the demo:
make clean
make gen
make compile
make run
The gen target reads the source code, applies specific code transformations and optimisations, and then invokes the ROCCC tool for each module to be converted into VHDL. Next, it generates all the dependencies needed by the modules, such as memory blocks, IP blocks, data streams and test benches. If the target is able to communicate with a host environment, the tool also generates the required wrappers to invoke the FPGA from the host environment.
For the GHDL target, it is possible to compile the generated VHDL and run the test bench in a simulated environment; this is done using the compile and run targets.
To specify a hardware target, one must add the TARGET parameter either in the Makefile or on the command line, e.g. to use the AlphaData FPGA-based accelerator board:
make TARGET=alphadata clean
make TARGET=alphadata gen
The generated project is found under the project/ directory and can be imported as-is into the FPGA back-end tool for synthesis and implementation, for instance the Xilinx PlanAhead suite. The wrapper and the C testbench for the host CPU are also generated. The wrapper code is shown below:
#include <inttypes.h>
#include "fpga.h"

void VectorAdd(int N, int *A, int *B, int *Out)
{
    fpga_write_vector(0, (N)*4, A);
    fpga_write_vector(1, (N)*4, B);
    pFpgaSpace[0x1] = (uint32_t)N;
    while (pFpgaSpace[0x2000] == 0)
        ; // Wait for resultReady
    fpga_read_vector(2, (N)*4, Out);
}
The existing host code can simply invoke the transformed function without changing the rest of the code, since the C wrapper keeps the same function signature (i.e. it implements a function with the same name and parameters, but this time it triggers the offloaded part rather than executing on the CPU).
The Poroto software comes with several other examples. Below is a list of the ones available in the distribution:
- simple_add: a simple adder block with no data streaming
- vector_add: a simple vector addition
- vector_add_ip: a simple vector addition using an external IP block to perform the operation
- vector_add_float: a simple vector addition based on float elements
- matrix_multiplication: a generic multiplication of integer matrices
- buffer_sliding: a 3x3 moving window over a matrix
- vector_avg: an n-element wide moving window over a vector
- vector_add_reduce: a reduce operation performed on a vector using an add operator
The code generated by Poroto can be further optimised using dedicated pragmas. With these pragmas, the user can control the transformation passes performed by ROCCC, such as partial loop unrolling, arithmetic balancing and pipelining optimisation, as well as the performance of the data streams generated by Poroto.
Furthermore, other advanced code transformations can be applied, such as code or variable inlining, data bit-size customisation and loop fusion.
2.1.2 Installation and Configuration of the Design and Development Time Optimiser
Placer is still a research prototype; as such, it has no front end, and the examples are hard-coded into Scala structures that must be compiled prior to execution.
2.1.2.1 System requirements and Software Dependencies
The computer must have the following software packages installed:
- Java 1.8
- Scala 2.11
- IntelliJ 14 IDE with the Scala plugin
2.1.2.2 Installing Placer (Year-1 Implementation of the Design-time Optimiser)
To install the delivered prototype of Placer, a single zip archive has been supplied in the TANGO GitHub repository: https://github.com/TANGO-Project/placer
The zip file contains an IntelliJ project, and the necessary jar files are included as libraries.
The project includes several example scripts that contain software and hardware models declared using Placer's structures. These scripts end with a call to Placer's solver.
2.1.2.3 Running Placer
Placer is still in a prototype state. It comes as an IntelliJ project. Once opened in IntelliJ, the user is presented with IntelliJ's project view. The project contains two examples, named Example1 and Example2, as illustrated in Figure 2. Both are applications and can be executed as-is. The matching optimal placement result is displayed in the console that appears below in IntelliJ.
Figure 2: Screenshot of example model using Placer.
2.2 Installation and Configuration of Programming Model Tooling
The TANGO Programming Model and Runtime Abstraction Layer is a combination of BSC's COMPSs and OmpSs task-based programming models, where COMPSs deals with coarse-grain tasks and platform-level heterogeneity while OmpSs deals with fine-grain tasks and node-level heterogeneity. The code can be found at https://github.com/TANGO-Project/compss-tango.
2.2.1 System Requirements and Software Dependencies:
Common:
- a supported platform running Linux (i386, x86-64, ARM, PowerPC or IA64)
- Git client
- bash and tcsh
- Apache Maven 3.0 or better
- Java SDK 8.0 or better
- GNU C/C++ compiler version 4.4 or better
- GNU GCC Fortran
- autotools (libtool, automake, autoreconf, make)
- boost-devel
- python-devel 2.7 or better
- GNU bison 2.4.1 or better
- GNU flex 2.5.4 or 2.5.33 or better (avoid versions 2.5.31 and 2.5.34 of flex as they are known to fail; use at least 2.5.33)
- GNU gperf 3.0.0 or better
- SQLite 3.6.16 or better
--with-monitor option:
This option enables the runtime to generate and to install tools to visualise the execution monitoring information and the execution graph.
- xdg-utils package
- graphviz package
--with-tracing option:
This option enables the runtime to generate execution trace files, which can be opened with the Paraver tool (https://tools.bsc.es/paraver).
- libxml2-devel 2.5.0 or better
- gcc-fortran
- papi-devel (suggested)
2.2.2 Installation Instructions
To install the whole framework, you just need to clone the general repository and run the following commands:
$ git clone https://github.com/TANGO-Project/general.git
$ cd general/IntegratedDevelopmentEnvironment/ProgrammingModelRuntime/
$ ./install.sh <Installation_Prefix> [options]

# Examples
# User local installation
$ ./install.sh $HOME/TANGO --no-monitor --no-tracing
# System installation
$ sudo -E ./install.sh /opt/TANGO
2.2.3 Application Development Overview
To develop an application with the TANGO programming model, developers have to implement at least three files: the application's main workflow in appName.c/cc, the application functions that will become coarse-grain tasks in appName.idl, and the implementation of those functions in appName-functions.cc. Other application files can be included in a src folder, providing the building configuration in a Makefile.
- appName.c/cc: contains the main coarse-grain task workflow
- appName.idl: contains the coarse-grain task definitions
- appName-functions.c/cc: implementation of the coarse-grain tasks
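As a purely illustrative sketch of this three-file layout, an appName.idl declaring one coarse-grain task might look as follows; the interface name and task signature here are hypothetical, and the exact IDL syntax is defined by the COMPSs C binding documentation:

```
// appName.idl -- declares which functions are coarse-grain tasks
// (hypothetical example; see the COMPSs user manual for the real syntax)
interface appName {
    void compute_block(in int size, inout int result);
};
```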
To define a coarse-grain task which itself contains fine-grain tasks, developers have to annotate the coarse-grain functions with the OmpSs compiler directives (pragmas).
More information about how to define coarse-grain tasks and other concerns when implementing a coarse-grain task workflow can be found in http://compss.bsc.es/releases/compss/latest/docs/COMPSs_User_Manual_App_Development.pdf
More information about how to define fine-grain tasks and other considerations when implementing a fine-grain task workflow can be found in https://pm.bsc.es/ompss-docs/specs/
2.2.4 Application Compilation
Once the application has been implemented, developers have to use the buildapp command. Before running the command, the user has to define a set of environment variables to indicate whether the coarse-grain tasks contain OmpSs tasks, or OmpSs tasks with CUDA or OpenCL code. The following example shows how to run this command.
$ export WITH_OMPSS=1 # If there are coarse-grain tasks defined as a workflow of fine-grain tasks
$ export WITH_CUDA=1  # If there are fine-grain tasks defined for a CUDA device
$ export WITH_OCL=1   # If there are fine-grain tasks defined for an OpenCL device
$ buildapp appName
2.2.5 Application Execution
An application implemented with the TANGO programming model can be easily executed by using the COMPSs execution scripts. They automatically start the Runtime Abstraction Layer and transparently execute both coarse-grain and fine-grain tasks on the selected resources.
Users can use the runcompss command to run the application in interactive nodes.
Usage: runcompss [options] application_name application_arguments
An example of running the application on the localhost, which is useful for initial debugging:
$ runcompss --lang=c appName appArgs...
To run an application on a preconfigured grid of computers, the TANGO Programming Model and Runtime environment must be installed on all the nodes. The application must also be deployed on all the nodes. Then, users have to provide the resource description in a resources.xml file and the application configuration for these resources in a project.xml file. Information about how to define these files can be found in
http://compss.bsc.es/releases/compss/latest/docs/COMPSs_User_Manual_App_Exec.pdf
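As a rough sketch of what a resources.xml describes — the element and attribute names below are illustrative assumptions, not the exact COMPSs schema; the manual above is the authoritative reference — each node is declared with its hostname and capacity, roughly as:

```
<!-- resources.xml (illustrative sketch only; see the COMPSs manual for the real schema) -->
<ResourcesList>
  <ComputeNode Name="node1.example.org">
    <Processor Name="MainProcessor">
      <ComputingUnits>12</ComputingUnits>
    </Processor>
    <Memory>
      <Size>32.0</Size>
    </Memory>
  </ComputeNode>
</ResourcesList>
```

The project.xml then selects which of these declared resources the application actually uses.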
$ runcompss --lang=c --project=/path/to/project.xml \
    --resources=/path/to/resources.xml appName app_args
More information about other possible arguments can be found by executing
$ runcompss --help
To queue an application on a cluster managed by the SLURM resource manager, users have to use the enqueue_compss command.
Usage: enqueue_compss [queue_system_options] [runcompss_options] application_name application_arguments
The following command shows how to queue the application, requesting 3 nodes with at least 12 cores, 2 GPUs and approximately 32 GB of memory per node:
$ enqueue_compss --num_nodes=3 --tasks-per-node=12 --gpus-per-node=2 --node-memory=32000 --lang=c appName appArgs
Other options available for enqueue_compss can be found by executing the following command.
$ enqueue_compss --help
In the next release, we will also describe how to build and deploy the application using the Application Lifecycle Deployment Engine.
2.2.6 Known Limitations
In this preliminary version of the TANGO programming model, only normal static functions in appName-functions.c/cc are supported as tasks. Other types of functions are unsupported and can make applications fail. The known issues are:
Objects as return type or defined as parameters with OUT direction
Methods called on an object
Methods with the static definition in the idl file
There is also an issue when deserializing big objects with the Boost library.
We are working to solve these issues in further versions.
2.3 Installation and Configuration of Code Optimiser Tooling
The code optimizer is a standalone Eclipse plugin that analyses Java programs for their active power consumption, allowing users to understand how much power their code uses. The code can be found at https://github.com/TANGO-Project/code-optimiser-plugin
2.3.1 Platforms Supported
All Linux and Windows variants supported by the Eclipse IDE.
2.3.2 Software Pre-requisites and Dependencies
To use the Code Optimizer Plug-in (COP), the following dependencies must be resolved:
Dependency Version Comment
JDK 1.7+ Provides the Java runtime environment for executing the plug-in.
Eclipse Platform 3.6+ Provides the environment for executing the Code Optimizer.
JVM Monitor 3.8+ Library dependencies managed by Maven.
Maven 3.0+ The build environment.
2.3.3 Installation Instructions
After installation of a suitable Java JDK and Eclipse Platform as described above, the Code Optimizer plug-in and Eclipse site must be built using Maven:
Firstly, checkout the COP tool source code from the URL obtained from the Tango git repository:
user@host:~$ git clone https://github.com/TANGO-Project/code-optimiser-plugin
Then build the plug-in and the Eclipse update site using Maven:
user@host:~$ cd <cop_path>/code-optimiser-plugin
user@host:<cop_path>/cop-plugin$ mvn clean install
user@host:~$ cd <cop_path>/tango-eclipse-site
user@host:<cop_path>/tango-eclipse-site $ mvn clean install
Install the plug-in from a local update site using the following local URL:
jar:file:<cop_path>/tango-eclipse-site/target/site-<version>-SNAPSHOT.zip!/
2.3.4 Configuration
The COP component needs valid calibration data for the energy model to provide accurate values. An initial set of values is provided by default. The energy modeller calibration tool may be used to generate a new set of values in cases where an attached watt meter is available.
3 Installation and Configuration Guide for runtime software packages
Once application development has taken place, it is possible to deploy and run the application on a heterogeneous infrastructure. While in year-1 a significant amount of manual effort remains to upload, compile, execute, collect and aggregate monitoring data, most tools from the TANGO general architecture shown in the red box provide an initial implementation.
Figure 3: General TANGO Architecture with operation software components in Red boxes.
First, the SLURM tool provides an implementation for the Device Supervisor and the Infrastructure Monitor. Second, energy probes installed on hardware hosts can retrieve the energy consumed by the various heterogeneous components and communicate it to the Infrastructure Monitor (SLURM or other). Third, the Self-Adaptation Manager proposes an initial implementation of its Energy Modeller. Fourth, the Application Life-cycle engine also presents an initial implementation. Only the implementation of the Device Emulator will start during Year-2. The standard code for SLURM is available at https://slurm.schedmd.com/
The subsections below present the installation manual for:
Extra Energy Probes to install on a hardware host to measure the energy of the GPU and Xeon Phi elements of that host. (NOTE: energy consumption for the host itself can already be collected through IPMI or RAPL by SLURM.)
SLURM (Device Supervisor and Infrastructure Monitor)
Energy Modeller (Part of the Self-Adaptation Engine)
Application Life-cycle Deployment Engine
It is worth emphasizing that for the benchmarking exercises of year-1 on Nova 2, only the SLURM component was used. The others have not yet been fully integrated. Furthermore, most of these other components will mainly be useful for achieving self-adaptation actions at deployment and operation time. Thus, for this first year, where benchmarking exercises were done statically and applications could be installed and executed manually, the TANGO operational components other than SLURM were not strictly required.
3.1 Installation and Configuration of Extra Energy Probes
At the beginning of the project, SLURM already provided probes to measure the energy of a whole node using IPMI and the energy consumption of an Intel CPU using Intel RAPL. To complement this, during the first year it was decided to start building energy probes for other hardware components such as Nvidia GPUs and Intel Xeon Phi Many-Core Processors (MCP). The tests of the probes for this last component are still in progress. Finally, it is expected that by the end of the project this component reaches at least TRL 6 (Technology Readiness Level 6: system/subsystem model or prototype demonstration in a relevant environment).
These probes are designed to work either with the SLURM monitoring infrastructure or with a CollectD monitoring server. The next sections describe them.
3.1.1 Nvidia GPUs
In order to monitor these GPUs, this component relies on the NVIDIA Management Library (NVML2) provided by NVIDIA. This is a C-based API that offers a set of functions for monitoring various states within these GPUs, like temperature, power consumption, fan speeds etc.
3.1.1.1 Supported OS platforms and products3
The NVML library currently supports the following operating systems:
- Windows Server 2008 R2 64-bit, Windows Server 2012 R2 64bit, Windows 7-8 64-bit
- Linux 32-bit and 64-bit
The list of fully supported NVIDIA products is the following:
- NVIDIA Tesla Line: S2050, C2050, C2070, C2075, M2050, M2070, M2075, M2090, X2070, X2090, K8, K10, K20, K20X, K20Xm, K20c, K20m, K20s, K40c, K40m, K40t, K40s, K40st, K40d, K80
- NVIDIA Quadro Line: 410, 600, 2000, 4000, 5000, 6000, 7000, M2070-Q, K2000, K2000D, K4000, K5000, K6000
- NVIDIA GRID Line: K1, K2, K340, K520
2 https://developer.nvidia.com/nvidia-management-library-nvml
3 NVML API Reference documentation: https://docs.nvidia.com/deploy/nvml-api/nvml-api-reference.html
And finally, it also offers limited support for the following products:
- NVIDIA Tesla Line: S1070, C1060, M1060 and all other previous generation Tesla-branded parts
- NVIDIA Quadro Line: all other current and previous generation Quadro-branded parts
- NVIDIA GeForce Line: all current and previous generation GeForce-branded parts
3.1.1.2 Example
The example that can be found in the Monitor Infrastructure repository4 looks for NVIDIA GPUs and retrieves the power usage of these devices (in Watts). To install and run this example, the requirements are:
- Download and install the GPU Deployment Kit5
- Modify the Makefile if needed, then run the make command
- Run the program
3.1.2 NVIDIA Plugins
Taking the previous example as a basis, we created two plugins to integrate the NVIDIA monitoring capability into the Monitoring Infrastructure. These plugins are used by Collectd and Slurm respectively. The integration of these two plugins into the TANGO framework is currently taking place and will be finalized during the second year of the project.
3.1.2.1 Collectd
The “NVIDIA” Collectd plugin follows the instructions described in the Collectd Wiki – Plugin Architecture6.
These are the steps to compile and run the NVIDIA plugin for Collectd:
- Get collectd source code from https://github.com/collectd/collectd
- Run build.sh, then ./configure && make
- Create / edit the plugin following the instructions from the Plugin architecture page
- Compile the plugin C program, for example:
o gcc -DHAVE_CONFIG_H -Wall -Werror -g -O2 -shared -fPIC -Isrc/ -Isrc/daemon/ -lnvidia-ml -L nvidia/lib/ -ldl -o nvidia_plugin.so nvidia/nvidia_plugin.c
This plugin also adds the following metric to Collectd:
static data_source_t dsrc[1] = { { "watts", DS_TYPE_GAUGE, 0, NAN } };
Collectd can expose these results by enabling write plugins, such as the RRDtool or CSV ones; the new metric should be available after configuring and launching Collectd.
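A collectd.conf fragment loading the plugin could look like the following sketch — the plugin name matches the shared object built above, while the directory paths are illustrative assumptions:

```
# collectd.conf fragment (paths are illustrative)
PluginDir "/opt/collectd/lib/collectd"
LoadPlugin nvidia_plugin   # the NVML-based plugin built above
LoadPlugin csv             # write the new "watts" gauge to CSV files
<Plugin csv>
  DataDir "/var/lib/collectd/csv"
</Plugin>
```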
3.1.2.2 Slurm
The “NVIDIA” Slurm plugin follows the following architecture:
4 https://github.com/TANGO-Project/general/tree/master/Middleware/Monitor_Infrastructure/Tests%20Examples/NVML
5 https://developer.nvidia.com/gpu-deployment-kit
6 https://collectd.org/wiki/index.php/Plugin_architecture
/*
 * init() is called when the plugin is loaded, before any other functions
 * are called. Put global initialization here.
 */
extern int init(void)
{
    nvmlReturn_t result;
    …
    return SLURM_SUCCESS;
}

/*
 * This function is called before shutting down.
 */
extern int fini(void)
{
    nvmlReturn_t result;
    …
    return SLURM_SUCCESS;
}

/*
 * Setter and getter methods for the gathered values:
 */
extern int acct_gather_energy_p_update_node_energy(void)
{
    int rc = SLURM_SUCCESS;
    …
    return rc;
}

extern int acct_gather_energy_p_get_data(enum acct_energy_type data_type, void *data)
{
    int rc = SLURM_SUCCESS;
    …
    return rc;
}

extern int acct_gather_energy_p_set_data(enum acct_energy_type data_type, void *data)
{
    int rc = SLURM_SUCCESS;
    …
    return rc;
}

/*
 * These functions are called for configuration:
 */
extern void acct_gather_energy_p_conf_options(s_p_options_t **full_options, int *full_options_cnt)
{
    …
    return;
}

extern void acct_gather_energy_p_conf_set(s_p_hashtbl_t *tbl)
{
    …
    return;
}

extern void acct_gather_energy_p_conf_values(List *data)
{
    …
    return;
}
3.1.2.3 Installation and configuration
Currently, the Collectd and Slurm plugins for NVIDIA have been tested in isolation. During the second year of the project, they will be fully integrated as part of the TANGO framework and installed on the Nova 2 testbed.
3.2 Installation and Configuration of SLURM
In Year-1, SLURM plays the role of the Device Supervisor and of the component that collects the measurement data, which is part of the role of the Infrastructure Monitor.
Slurm is a basic part of the TANGO framework and there are planned contributions in upcoming work packages during years 2 and 3. However, since some basic features of Slurm are needed for the TANGO Device Supervisor and Infrastructure Monitor of year 1, this section provides the general installation and configuration of Slurm along with a short user-level guide.
3.2.1 Platforms Supported
All Linux variants are supported, on the following hardware architectures: i386, x86-64, ARM, PowerPC and IA64.
3.2.2 Software Pre-requisites and Dependencies
GNU C/C++ compiler versions 4.4 or later
autotools (libtool, automake, autoreconf, make)
freeipmi version 1.2.1 or later
hwloc and hwloc-devel
munge and munge-devel
mysql or mariadb
hdf5
3.2.3 Installation instructions
1. Slurm can be downloaded from https://schedmd.com/downloads.php . It is better to select the latest stable version; at the time of writing this report, this is version 16.05.
2. Make sure the clocks, users and groups (UIDs and GIDs) are synchronized across the cluster.
3. Download and install MUNGE for authentication from https://dun.github.io/munge/ . Make sure that all nodes in your cluster have the same munge.key, and that the MUNGE daemon, munged, is started before you start the Slurm daemons.
4. bunzip2 the distributed tar-ball and untar the files: tar --bzip -x -f slurm*tar.bz2
5. cd to the directory containing the Slurm source and type ./configure with appropriate options, typically --prefix= and --sysconfdir=
6. Type make to compile Slurm.
7. Type make install to install the programs, documentation, libraries, header files, etc.
8. Build a configuration file using a web browser and doc/html/configurator.html.
9. Create the Slurm user on all compute nodes of the cluster.
10. The parent directories for Slurm's log files, process ID files, state save directories, etc. must be created and made writable by the Slurm user as needed prior to starting the Slurm daemons.
11. Install the configuration file in <sysconfdir>/slurm.conf and copy it to all nodes of the cluster.
12. Start the slurmctld and slurmd daemons.
For the configuration details you can follow the instructions at the official Slurm site: https://slurm.schedmd.com/documentation.html
3.2.4 Slurm accounting and profiling framework
Slurm provides mechanisms that enable detailed monitoring and reporting of resource consumption during job execution. This section gives some configuration details of this framework.
The following parameters are needed to configure the job accounting gather characteristics that are collected during the job execution:
JobAcctGatherType
The job accounting mechanism type. Acceptable values at present include:
jobacct_gather/aix
jobacct_gather/linux
jobacct_gather/cgroup (jobacct_gather/cgroup uses cgroups to collect accounting statistics)
jobacct_gather/none (no accounting data collected). The default value is jobacct_gather/none.
JobAcctGatherFrequency
The job accounting sampling interval. For jobacct_gather/none this parameter is ignored. For jobacct_gather/aix and jobacct_gather/linux the parameter is a number of seconds between samplings of job state. The default value is 30 seconds. The minimum is 1 sec. A value of zero disables the periodic job sampling and provides accounting information only on job termination (reducing SLURM interference with the job).
AcctGatherNodeFreq
The AcctGather plugins' sampling interval for node accounting. For an AcctGather plugin value of none, this parameter is ignored. For all other values, this parameter is the number of seconds between node accounting samples. The minimum is 1 sec. The default value is zero, which disables accounting sampling for nodes. Note: the accounting sampling interval for jobs is determined by the value of JobAcctGatherFrequency.
AcctGatherEnergyType
Identifies the plugin to be used for energy consumption accounting. The jobacct_gather plugin and slurmd daemon call this plugin to collect energy consumption data for jobs and nodes. Configurable values at present are:
acct_gather_energy/none No energy consumption data is collected.
acct_gather_energy/ipmi Energy consumption data is collected from the Baseboard Management Controller (BMC) using the Intelligent Platform Management Interface (IPMI).
acct_gather_energy/ipmi_raw Energy consumption data is collected from the Baseboard Management Controller (BMC) using the Intelligent Platform Management Interface (IPMI), based on BMC internal consolidation.
acct_gather_energy/rapl Energy consumption data is collected from hardware sensors using the Running Average Power Limit (RAPL) mechanism. The recommended option is acct_gather_energy/rapl.
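Put together, a slurm.conf fragment enabling Linux job accounting with RAPL energy collection might read as follows — the sampling intervals shown are illustrative choices, not required values:

```
# slurm.conf fragment (sampling intervals are illustrative)
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
AcctGatherEnergyType=acct_gather_energy/rapl
AcctGatherNodeFreq=30
```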
Besides accounting, which gives aggregated results on resource usage, it is always desirable to have detailed data about an application's performance. This data has traditionally been used to improve an application's use of resources, particularly CPUs. There is an increasing need to improve the scheduling and placement of an application with regard to its use of cluster resources. It is important to schedule applications to use energy efficiently. It is also important to allocate resources that are physically close together to minimize network latency for both message passing and use of parallel file systems.
In this context, a profiling plugin exists that allows detailed data from different sources to be collected simultaneously and stored in a single file. The file is an HDF5 file, a format well known in High Performance Computing that allows heterogeneous data to reside in one structured dataset. In this case, there are sections for Energy statistics, Lustre I/O, Network I/O, and Task data.
There are community programs, notably HDFView, for viewing and manipulating these files.
AcctGatherInfinibandType
Identifies the plugin to be used for InfiniBand network traffic accounting. The plug-in is activated only when profiling to HDF5 files is activated and the user asks for network data collection for jobs through --profile=Network (or =All). The collection of network traffic data takes place at node level; hence, only in the case of exclusive job allocation will the collected values reflect the job's real traffic. All network traffic data is logged in HDF5 files per job on each node. No storage in the SLURM database takes place. Configurable values at present are:
acct_gather_infiniband/none No InfiniBand network data is collected.
acct_gather_infiniband/ofed InfiniBand network traffic data is collected from the hardware monitoring counters of InfiniBand devices through the OFED library.
AcctGatherFilesystemType
Identifies the plugin to be used for file system traffic accounting. The plug-in is activated only when profiling to HDF5 files is activated and the user asks for file system data collection for jobs through --profile=Lustre (or =All). The collection of file system traffic data takes place at node level; hence, only in the case of exclusive job allocation will the collected values reflect the job's real traffic. All file system traffic data is logged in HDF5 files per job on each node. No storage in the SLURM database takes place. Configurable values at present are:
acct_gather_filesystem/none No file system data are collected.
acct_gather_filesystem/lustre Lustre file system traffic data are collected from the counters found in /proc/fs/lustre/.
For additional information on the accounting plugins, see Accounting and Resource Limits in the SLURM documentation.
Follow the link to Profiling Using HDF5 User Guide in the SLURM HTML documentation for more details.
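For completeness, the HDF5 profiling plugin itself is enabled through the AcctGatherProfileType parameter; a sketch of the relevant slurm.conf lines, together with a profiling request at submission time, is shown below (the profile directory path is an illustrative assumption):

```
# slurm.conf fragment (profile directory is illustrative)
AcctGatherProfileType=acct_gather_profile/hdf5
ProfileHDF5Dir=/var/log/slurm/profile
AcctGatherInfinibandType=acct_gather_infiniband/ofed
AcctGatherFilesystemType=acct_gather_filesystem/lustre

# profiling is then requested per job, e.g.:
#   srun --profile=All ./my_app
```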
3.2.5 SLURM Key Functions
As a cluster resource manager, SLURM has three key functions. Firstly, it allocates exclusive and/or non-exclusive access to resources (Compute Nodes) to users for some duration of time so they can perform work. Secondly, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.
Optional plug-ins can be used for accounting, advanced reservation, backfill scheduling, resource limits by user or bank account, and sophisticated multifactor job prioritization algorithms.
Users interact with SLURM using various command line utilities7:
SRUN to submit a job for execution
SBCAST to transmit a file to all nodes running a job
SCANCEL to terminate a pending or running job
SQUEUE to monitor job queues
SINFO to monitor partition and the overall system state
SACCTMGR to view and modify SLURM account information. Used with the slurmdbd daemon
SACCT to display data for all jobs and job steps in the SLURM accounting log
SBATCH for submitting a batch script to SLURM
SALLOC for allocating resources for a SLURM job
SATTACH to attach to a running SLURM job step.
STRIGGER used to set, get or clear SLURM event triggers
SVIEW used to display SLURM state information graphically. Requires an XWindows capable display
SREPORT used to generate reports from the SLURM accounting data when using an accounting database
SSTAT used to display various status information of a running job or step
System administrators perform privileged operations through an additional command line utility, SCONTROL.
The central controller daemon, SLURMCTLD, maintains the global state and directs operations. Compute nodes simply run a SLURMD daemon (similar to a remote shell daemon) to export control to SLURM.
3.2.6 SLURM Components
SLURM consists of three types of daemons and various command-line user utilities. The relationships between these components are illustrated in the following diagram:
7 For more information, a detailed user guide can be found in the section "Slurm Users" at https://slurm.schedmd.com/documentation.html
Figure 4: SLURM Simplified Architecture.
3.2.6.1 SLURMCTLD
The central control daemon for SLURM is called SLURMCTLD. SLURMCTLD is multi-threaded; thus, some threads can handle problems without delaying services to normal jobs that are also running and need attention. SLURMCTLD runs on a single management node (with a fail-over spare copy elsewhere for safety), reads the SLURM configuration file, and maintains state information on:
Nodes (the basic compute resource)
Partitions (sets of nodes)
Jobs (or resource allocations to run jobs for a time period)
Job steps (parallel tasks within a job).
Software Subsystem Role Description
Node Manager Monitors the state and configuration of each node in the cluster. It receives state-change messages from each Compute Node's SLURMD daemon asynchronously, and it also actively polls these daemons periodically for status reports.
Partition Manager Groups nodes into disjoint sets (partitions) and assigns job limits and access controls to each partition. The partition manager also allocates nodes to jobs (at the request of the Job Manager) based on job and partition properties. SCONTROL is the (privileged) user utility that can alter partition properties.
Job Manager Accepts job requests (from SRUN or a metabatch system), places them in a priority-ordered queue, and reviews this queue periodically or when any state change might allow a new job to start. Resources are allocated to qualifying jobs and that information transfers to (SLURMD on) the relevant nodes so the job can execute. When all nodes assigned to a job report that their work is done, the Job Manager revises its records and reviews the pending-job queue again.
3.2.6.2 SLURMD
The SLURMD daemon runs on all the Compute Nodes of each cluster that SLURM manages and performs the lowest level work of resource management. Like SLURMCTLD (previous subsection), SLURMD is multi-threaded for efficiency; but, unlike SLURMCTLD, it runs with root privileges (so it can initiate jobs on behalf of other users).
SLURMD carries out five key tasks and has five corresponding subsystems. These subsystems are described in the following table.
SLURMD Subsystem Description of Key Tasks
Machine Status Responds to SLURMCTLD requests for machine state information and sends asynchronous reports of state changes to help with queue control.
Job Status Responds to SLURMCTLD requests for job state information and sends asynchronous reports of state changes to help with queue control.
Remote Execution Starts, monitors, and cleans up after a set of processes (usually shared by a parallel job), as decided by SLURMCTLD (or by direct user intervention). This can often involve many changes to process-limit, environment-variable, working-directory, and user-id settings.
Stream Copy Service Handles all STDERR, STDIN, and STDOUT for remote tasks. This may involve redirection, and it always involves locally buffering job output to avoid blocking local tasks.
Job Control Propagates signals and job-termination requests to any SLURM-managed processes (often interacting with the Remote Execution subsystem).
3.2.6.3 SlurmDBD (SLURM Database Daemon)
The SlurmDBD daemon stores accounting data in a database. Storing the data directly in a database from SLURM may seem attractive, but it requires the availability of user name and password data, not only for the SLURM control daemon (slurmctld) but also for the user commands which need to access the data (sacct, sreport, and sacctmgr). Making possibly sensitive information available to all users makes database security more difficult to provide. Sending the data through an intermediate daemon can provide better security and performance
(through caching data), and SlurmDBD provides such a service. SlurmDBD is written in C and is multi-threaded, secure and fast.
More information can be found in the official Slurm documentation for accounting8.
3.3 Installation and Configuration of Energy Modeller
The code can be found at https://github.com/TANGO-Project/energy-modeller
3.3.1 Minimal System Requirements
The energy modeller has two modes of operation: the first gathers data and populates the models used for energy and power calculations, while the second acts as a sub-component used for querying the generated models. In both cases it requires access to a MySQL database for storing and querying the data used within the energy models.
The energy modeller is expected to work over a network and utilise a monitoring infrastructure, such as Zabbix or integration into SLURM, to provide the raw power information for the physical hosts. In the event that not all host machines have Watt meters attached, an estimated power value may be utilised instead. Such an estimated value can be generated by the Watt meter emulator component.
If Zabbix or SLURM are not to be used, the modeller may be directly attached to a WattsUp? Meter in order to provide a standalone mode of operation.
3.3.2 Platforms Supported
The Energy modeller has been tested on both Windows and Linux and works within any Java compliant environment.
3.3.3 Software Pre-requisites and Dependencies
To use the Energy Modeller, the following dependencies must be resolved:
Dependency Version Comment
Java 7
Maven 2.2.1
MySQL 5.6.17
MySQL-connector-java 5.1.30
Apache commons-math3 3.3
log4j 1.2.17
Sigar 1.6.4 Used in standalone mode only, to obtain CPU load metric data
WattsUp SDK 1.0 Used in standalone mode to contact the WattsUp? Meter
NRJavaSerial 3.11.0 Used in standalone mode to contact the WattsUp? Meter
8 https://slurm.schedmd.com/accounting.html
3.3.4 Installation Instructions
To install the energy modeller, the following steps must be performed:
1. Generate the Energy Modeller jar using the command: mvn clean package (executed in the Energy Modeller directory)
2. Install the database. SQL statements to set up the database are held in the file "energy modeller db.sql", which is located in {energy-modeller root directory}\src\main\resources.
3. Install the Standalone Calibration Tool:
a) Generate the energy modeller standalone calibration tool jar using the command: mvn clean package (executed in the standalone calibration tool directory)
b) Install the energy-modeller-standalone-calibration-tool on each host that is to be calibrated.
A configuration file called "Apps.csv" can now be specified. This file provides details about the application(s) used to induce the training load for the host.
An example is provided within the source code, and the headers of a default file are written to disk if the Apps.csv file is not found. A test application has also been provided under utils\ascetic-load-generator-app. The file specifies the following: the start time the application should run, the standard out and error files to redirect output to, the application's working directory, and whether output should also be redirected to the screen.
Configuring the Energy Modeller
4. The last stage is to configure the energy modeller. The energy modeller is highly configurable and has several settings files that may be used to change its behaviour:
Settings File Purpose
energy-modeller-db.properties Holds database information for the energy modeller
energy-modeller-predictor.properties Holds settings relating to the prediction of energy usage.
energy-modeller-db-zabbix.properties Holds information on how to connect to the Zabbix database directly.
ascetic-zabbix-api.properties Settings for the Zabbix client, used to connect to a Zabbix based information source.
filter.properties This holds settings to distinguish between a host and a VM.
These settings must be tailored to the specific infrastructure. The settings are described below and an example of the settings is provided for reference.
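All of these files use the standard Java key = value properties format. As a minimal, hypothetical illustration (this helper is not part of the TANGO code base), such a file can be read as follows:

```python
def load_properties(text):
    """Parse a minimal Java-style .properties file into a dict.

    Lines starting with '#' are comments; keys and values are split on
    the first '=' and surrounding whitespace is stripped.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# Fragment of the energy-modeller-db.properties example shown below.
example = """
# database settings (values are placeholders)
energy.modeller.db.url = jdbc:mysql://iaas-vm-dev:3306/ascetic-em
energy.modeller.db.driver = org.mariadb.jdbc.Driver
"""
settings = load_properties(example)
print(settings["energy.modeller.db.driver"])  # org.mariadb.jdbc.Driver
```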
energy-modeller-db.properties
This file specifies various database related settings for the energy modeller. An example is provided below:
energy.modeller.db.url = jdbc:mysql://iaas-vm-dev:3306/ascetic-em
energy.modeller.db.driver = org.mariadb.jdbc.Driver
energy.modeller.db.password = XXXXX
energy.modeller.db.user = user-em
This specifies how the energy modeller connects to its background database: the connection URL, the JDBC driver, and the username and password to use.
The SQL script to set up the database structure is held in the file IaaS energy modeller db.sql, under the directory {energy-modeller root directory}\src\main\resources.
energy-modeller-predictor.properties
This file specifies settings for the energy predictor mechanism; an example of such a file is provided below:
energy.modeller.cpu.energy.predictor.datasource = ZabbixDirectDbDataSourceAdaptor
energy.modeller.cpu.energy.predictor.workload = CpuRecentHistoryWorkloadPredictor
energy.modeller.cpu.energy.predictor.default_load = -1.0
energy.modeller.cpu.energy.predictor.utilisation.observe_time.min = 0
energy.modeller.cpu.energy.predictor.utilisation.observe_time.sec = 15
The data source parameter indicates how the energy modeller's predictor function will gain the environment data that it needs. It can be one of the following options:
ZabbixDirectDbDataSourceAdaptor: The default connector that directly accesses the Zabbix database for the information that it requires. This adaptor utilises the configuration file energy-modeller-db-zabbix.properties.
SlurmDataSourceAdaptor: This adaptor connects the energy modeller into a SLURM job management based environment, allowing access to information about the physical hosts.
ZabbixDataSourceAdaptor: This is an alternative adaptor that utilises the JSON API of Zabbix in order to obtain the required host and VM data.
WattsUpMeterDataSourceAdaptor: For local usage of the energy modeller.
It should be noted that the observation window should not be too small, especially when using the Zabbix data source adaptors, which may provide fewer data points than the WattsUpMeterDataSourceAdaptor; the latter is able to report at intervals as low as one second.
The energy predictor can utilise several different workload estimator functions. The default is to use the CpuRecentHistoryWorkloadPredictor. This has the following configuration settings.
The default_load parameter indicates what load the predictor should use as an estimate. It should be specified in the range 0..1. An alternative is to provide the value -1, in which case the predictor defaults to using the observed current load.
In the case where the observed current load is being used, the observe_time.min and observe_time.sec parameters indicate the size of the observation window for CPU utilisation. The two values are simply added together to make the total observation window time. The default observation window size is 15 minutes.
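The interplay of these parameters can be sketched as follows (hypothetical helper functions, not TANGO code; the semantics follow the description above):

```python
def observation_window_seconds(observe_min, observe_sec):
    """Total CPU-utilisation observation window: the minute and second
    settings are simply added together."""
    return observe_min * 60 + observe_sec

def effective_load(default_load, observed_load):
    """A default_load in the range 0..1 is used as-is; the special
    value -1 falls back to the observed current load."""
    return observed_load if default_load == -1 else default_load

# Example settings file above: 0 min + 15 s gives a 15 s window.
print(observation_window_seconds(0, 15))   # 15
# The documented default window is 15 minutes:
print(observation_window_seconds(15, 0))   # 900
# default_load = -1 defers to whatever load is currently observed:
print(effective_load(-1.0, 0.42))          # 0.42
```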
The other options for workload prediction are:
BasicAverageCpuWorkloadPredictor
BasicAverageCpuWorkloadPredictorDisk
BootAverageCpuWorkloadPredictor
BootAverageCpuWorkloadPredictorDisk
DoWAverageCpuWorkloadPredictor
DoWAverageCpuWorkloadPredictorDisk
These predictors work on historical load information and are designed for virtualised infrastructures in which each VM can be tagged with basic information about the application the VM is for and the disk image it is based upon.
Average CPU Workload predictors: give an estimate of the workload based upon the average CPU utilisation for a given application tag or base disk image.
Average Boot Workload predictors: give an estimate of the workload based upon the time from boot of a VM for a given application tag or base disk image.
Day of Week (DoW) Workload predictors: give an estimate of the workload based upon the time and day of the week that a VM is active for a given application tag or base disk image.
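For example, switching the estimator from the default to the day-of-week predictor only requires changing the workload property shown earlier (assuming the class name is given verbatim, as for CpuRecentHistoryWorkloadPredictor):

```properties
energy.modeller.cpu.energy.predictor.workload = DoWAverageCpuWorkloadPredictor
```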
energy-modeller-db-zabbix.properties
This is the configuration file used to configure the energy modeller when using the ZabbixDirectDBDataSourceAdaptor. It holds the database connection settings used to connect directly to the Zabbix database.
energy.modeller.zabbix.db.driver = org.mariadb.jdbc.Driver
energy.modeller.zabbix.db.url = jdbc:mysql://192.168.3.199:3306/zabbix
energy.modeller.zabbix.db.user = zabbix
energy.modeller.zabbix.db.password = XXXXX
energy.modeller.filter.begins = wally
energy.modeller.filter.isHost = true
This specifies how the energy modeller connects directly to the Zabbix database: the connection URL, the driver to use, and the username and password.
filter.properties
This settings file is used in conjunction with the ZabbixDataSourceAdaptor and the ascetic-zabbix-api.properties configuration file. It has two properties: the first indicates a string to be searched for at the start of a host/VM name; the second indicates whether a name matching that string denotes a host or a virtual machine (VM), true for a host and false for a VM. The following is an example of the defaults that are written to disk in the event the file is not found.
energy.modeller.filter.begins = wally
energy.modeller.filter.isHost = true
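The intended semantics can be sketched in a few lines (a hypothetical helper, not the actual Java implementation; it assumes names that do not match the prefix fall into the opposite category):

```python
def classify(name, filter_begins="wally", filter_is_host=True):
    """Classify a Zabbix item name as 'host' or 'vm' following the
    filter.properties semantics: names starting with filter_begins are
    hosts when isHost is true (VMs when false); other names are assumed
    to be the opposite kind."""
    matched = name.startswith(filter_begins)
    if filter_is_host:
        return "host" if matched else "vm"
    return "vm" if matched else "host"

# With the default settings above, 'wally...' names are physical hosts.
print(classify("wally152"))   # host
print(classify("appvm-07"))   # vm
```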
In addition to this settings file, the energy modeller has a method called setHostFilter that allows alternative patterns, such as looking at the ending of a hostname, to determine whether a name refers to a host or a VM.
3.3.5 Using the standalone calibrator
The standalone calibration tool is designed to calibrate the model that is used for physical hosts and can be found at https://github.com/TANGO-Project/energy-modeller-calibration-tool. Its usage is as follows:
java -jar energy-modeller-standalone-calibration-tool-0.0.1-SNAPSHOT.jar <hostname> [halt-on-calibrated] [benchmark-only] [no-benchmark] [use-watts-up-meter]
<hostname>: This is a mandatory argument that states which host to calibrate.
[halt-on-calibrated]: The halt-on-calibrated flag will prevent calibration in cases where the data has already been gathered.
[benchmark-only]: The benchmark-only flag skips the calibration run and performs a benchmark run only. Benchmarking allows physical hosts to be ranked in order, for example by performance per Watt.
[no-benchmark]: The no-benchmark flag skips the benchmarking.
[use-watts-up-meter]: The use-watts-up-meter flag can be used so that Zabbix is not used for calibration and local measurements are performed instead. This requires a WattsUp? Meter.
The standalone calibrator uses the same configuration files as the energy modeller, namely the energy-modeller-db, energy-modeller-predictor and energy-modeller-db-zabbix properties files. In addition it has the calibration_settings and energy-modeller-watts-up-meter.properties files, as well as an Apps.csv file that is used to specify the training load to be induced.
calibration_settings.properties
#Settings
#Fri Feb 13 11:55:08 GMT 2015
poll_interval=2
delay_before_taking_measurements=4
working_directory=
log_executions=true
simulate_calibration_run=false
The poll_interval indicates how often, in seconds, measurements should be taken during the run. Note that Zabbix must be configured to report values back fast enough for a change to be observed. The delay_before_taking_measurements setting indicates the delay in seconds to wait immediately after an induced load starts or ends before measurements are taken.
The working_directory indicates where the apps.csv settings file is located. log_executions indicates whether a log should be created recording when each application used to generate the training load was started and stopped. simulate_calibration_run indicates whether the gathered data should be written to the energy modeller's database; simulated runs do not record the data gathered.
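Under these settings, the sampling instants for one induced load can be sketched as follows (a hypothetical helper; times are seconds from the start of the calibration run):

```python
def sample_times(load_start, load_end, poll_interval=2, delay=4):
    """Power-measurement instants for one induced load: sampling begins
    delay_before_taking_measurements seconds after the load starts and
    repeats every poll_interval seconds until the load stops."""
    return list(range(load_start + delay, load_end + 1, poll_interval))

# A load running from t=0 to t=50 with the defaults above
# (poll_interval=2, delay=4) is sampled at t = 4, 6, ..., 50.
print(sample_times(0, 50))
```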
energy-modeller-watts-up-meter.properties
#Settings
#Fri May 01 14:29:48 BST 2015
energy.modeller.wattsup.scrape.file=//opt//wattsup-zabbix-probe//testnode5-wattsup.log
energy.modeller.wattsup.hostId=10134
energy.modeller.wattsup.hostname=testnode5
energy.modeller.wattsup.port=FILE
3.3.5.1 Apps.csv
Time From Start, Command, stdOut, stdError, Working Directory, Output To Screen, Stop Time
0,sleep 50,test.out,error.out,,TRUE,50
60,run-stress-point.sh 10 4 60,test.out,error.out,,TRUE,120
160,run-stress-point.sh 20 4 60,test1.out,error1.out,,TRUE,220
260,run-stress-point.sh 40 4 60,test2.out,error2.out,,TRUE,320
360,run-stress-point.sh 60 4 60,test3.out,error3.out,,TRUE,420
460,run-stress-point.sh 80 4 60,test4.out,error4.out,,TRUE,520
560,run-stress-point.sh 100 4 60,test5.out,error5.out,,TRUE,620
The apps.csv file has several columns. Namely: Time From Start, Command, stdOut, stdError, Working Directory, Output To Screen, Stop Time.
These columns indicate: the time in seconds from the start of the calibration run at which to execute the program; the command used to run the program; where to redirect standard out and standard error; the working directory of the application; whether output should also be sent to the screen; and finally the time at which the application is expected to stop, again in seconds from the start of the calibration run.
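The format can be parsed with a short sketch like the following (illustrative Python only; the calibration tool itself is written in Java):

```python
import csv
import io

# First two rows of the example Apps.csv listing above.
APPS_CSV = """\
Time From Start, Command, stdOut, stdError, Working Directory ,Output To Screen, Stop Time
0,sleep 50,test.out,error.out,,TRUE,50
60,run-stress-point.sh 10 4 60,test.out,error.out,,TRUE,120
"""

def parse_apps(text):
    """Parse an Apps.csv training-load schedule into a list of dicts,
    converting the two times to int and the screen flag to bool."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        row = {k.strip(): v.strip() for k, v in raw.items()}
        rows.append({
            "start": int(row["Time From Start"]),
            "command": row["Command"],
            "stdout": row["stdOut"],
            "stderr": row["stdError"],
            "workdir": row["Working Directory"],
            "to_screen": row["Output To Screen"].upper() == "TRUE",
            "stop": int(row["Stop Time"]),
        })
    return rows

apps = parse_apps(APPS_CSV)
print(apps[1]["command"], apps[1]["stop"])  # run-stress-point.sh 10 4 60 120
```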
3.3.5.2 Using the Watt Meter Emulator
The Watt meter emulator is a tool designed to emulate the presence of a Watt meter in cases where one is not attached to a physical host that reports to the energy modeller via Zabbix. The code can be found at https://github.com/TANGO-Project/watt-meter-emulator. Its usage is as follows:
java -jar host-power-emulator-0.0.1-SNAPSHOT.jar [hostname] [host-name-to-clone] [stop-on-clone]
[hostname]: This is an optional argument that states which host to emulate the Watt meter for. If no hostname is specified the tool will work for all calibrated hosts.
[host-name-to-clone]: This is an optional argument that allows the named host to have its data cloned for the purpose of emulating the named host.
[stop-on-clone]: This parameter stops the emulated Watt meter as soon as the cloning of the host calibration data has been completed. Thus it may be used to simply copy calibration data from one host to another.
The watt meter emulator uses the same configuration files as the energy modeller, namely the energy-modeller-db, energy-modeller-predictor and energy-modeller-db-zabbix properties files. In addition it has the watt-meter-emulator.properties file.
3.3.5.3 watt-meter-emulator.properties
#Settings
#Tue May 05 16:39:29 BST 2015
output_name=power-estimated
poll_interval=1
This file has two settings: the metric name that should be output to Zabbix, and the rate, in seconds, at which this value should be pushed to Zabbix.
3.4 Installation and Configuration of Application Life-cycle Deployment Engine
The Application Lifecycle Deployment Engine (ALDE) is the component responsible for taking an application, building it, packaging it, and, if the targeted testbed supports it, deploying it remotely from the TANGO development environment to the TANGO operational environment.
A complete implementation of ALDE is not expected for this first year; hence no integration with other components of the TANGO toolbox has been performed at the moment. However, the initial implementation provides the basic functionality reported hereafter. The code is accessible at https://github.com/TANGO-Project/alde
3.4.1 System Requirements
ALDE has the following basic requirements to run:
Python 3.4 or higher
With that minimum requirement, ALDE can perform a basic run on any Windows, Mac OS or Linux based system. However, to build an application, other tools may be necessary depending on the build scripts, such as the gcc compiler and third-party libraries. These depend on the application to be built and on the selected packaging system. More specific requirements of this type will be reported later in the project.
3.4.2 Installation and configuration
ALDE is packaged in tar.gz file format. To install it, open a console on your system (Linux, Windows or Mac OS; again, compiling an application may require specific software to be installed on it) and perform the following steps:
1. Check that the right python version is already installed:
$ python --version
Python 3.5.1
2. Unpackage ALDE
$ tar xvfz alde-1.0.dev0.tar.gz
alde-1.0.dev0/
alde-1.0.dev0/alde.egg-info/
alde-1.0.dev0/alde.egg-info/dependency_links.txt
alde-1.0.dev0/alde.egg-info/PKG-INFO
alde-1.0.dev0/alde.egg-info/requires.txt
alde-1.0.dev0/alde.egg-info/SOURCES.txt
alde-1.0.dev0/alde.egg-info/top_level.txt
alde-1.0.dev0/alde.egg-info/zip-safe
alde-1.0.dev0/alde.py
alde-1.0.dev0/app.py
alde-1.0.dev0/model/
alde-1.0.dev0/model/application.py
alde-1.0.dev0/model/base.py
alde-1.0.dev0/model/models.py
alde-1.0.dev0/model/__init__.py
alde-1.0.dev0/PKG-INFO
alde-1.0.dev0/setup.cfg
alde-1.0.dev0/setup.py
alde-1.0.dev0/__init__.py
3. Install it
$ pip3.4 install --editable .
Obtaining file:///home/a510804/moment/alde-1.0.dev0/dist/alde-1.0.dev0
Collecting Flask (from alde==1.0.dev0)
  Using cached Flask-0.11.1-py2.py3-none-any.whl
Collecting Flask-Restless (from alde==1.0.dev0)
Collecting Flask-SQLAlchemy (from alde==1.0.dev0)
Collecting Flask-Testing (from alde==1.0.dev0)
Requirement already satisfied: python-dateutil in /home/a510804/.local/lib/python3.4/site-packages (from alde==1.0.dev0)
Collecting sqlalchemy (from alde==1.0.dev0)
Collecting click>=2.0 (from Flask->alde==1.0.dev0)
  Using cached click-6.6-py2.py3-none-any.whl
Collecting itsdangerous>=0.21 (from Flask->alde==1.0.dev0)
Collecting Jinja2>=2.4 (from Flask->alde==1.0.dev0)
  Using cached Jinja2-2.8-py2.py3-none-any.whl
Collecting Werkzeug>=0.7 (from Flask->alde==1.0.dev0)
  Using cached Werkzeug-0.11.11-py2.py3-none-any.whl
Collecting mimerender>=0.5.2 (from Flask-Restless->alde==1.0.dev0)
Requirement already satisfied: six>=1.5 in /home/a510804/.local/lib/python3.4/site-packages (from python-dateutil->alde==1.0.dev0)
Collecting MarkupSafe (from Jinja2>=2.4->Flask->alde==1.0.dev0)
Collecting python-mimeparse>=0.1.4 (from mimerender>=0.5.2->Flask-Restless->alde==1.0.dev0)
  Using cached python_mimeparse-1.6.0-py2.py3-none-any.whl
Installing collected packages: click, itsdangerous, MarkupSafe, Jinja2, Werkzeug, Flask, sqlalchemy, python-mimeparse, mimerender, Flask-Restless, Flask-SQLAlchemy, Flask-Testing, alde
  Running setup.py develop for alde
Successfully installed Flask-0.11.1 Flask-Restless-0.17.0 Flask-SQLAlchemy-2.1 Flask-Testing-0.6.1 Jinja2-2.8 MarkupSafe-0.23 Werkzeug-0.11.11 alde click-6.6 itsdangerous-0.24 mimerender-0.6.0 python-mimeparse-1.6.0 sqlalchemy-1.1.4
4. Configure the database for the application: edit the file app.py in the root folder of the application and set the following variable:
SQL_LITE_URL='sqlite:////tmp/test.db'
5. Set the port of the REST service. Again, edit the file app.py and set the following variable:
PORT=5000
Once ALDE is integrated with the other operational components of the TANGO toolbox, it will be possible to run it as follows.
$ python app.py
For the TANGO development tools, such as the Requirement and Design Modelling toolbox, the Programming Model plugins, and eventually the Code Optimiser, ALDE provides a REST9 service as the front-end for accessing the operational part of a TANGO system, in other words the infrastructure where an application is executed. This service runs at the URL http://localhost:5000/ of the operational TANGO system.
3.4.3 API Documentation
The documentation for the REST API of this component is available here: http://docs.applicationlifecycledeploymentengine.apiary.io/
At the moment, the REST API allows the creation of the following entities to define a testbed:
Testbed – It defines a testbed, its characteristics and endpoints to interact with it, if possible.
o Node – It defines a computation node in a testbed. If the testbed is of the on-line type, its nodes will be automatically added by ALDE to the database.
o CPU – CPU information of a Node (it can contain several CPUs)
o GPU – GPU information of a Node (it can contain several GPUs)
o MCP – MCP information of a Node (it can contain several MCPs)
o Memory – Information of the different memory configurations of the node. Typically there will be one entry per memory module.
o FPGA – FPGA information of a node (it can contain several FPGAs).
Also, the user can define applications to be built, compiled, packaged and, when possible, deployed on a target testbed.
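The resulting entity hierarchy can be sketched with Python dataclasses (an illustration of the data model only; the field names are hypothetical and ALDE's actual models are SQLAlchemy based):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CPU:
    model: str
    cores: int

@dataclass
class GPU:
    model: str

@dataclass
class MCP:
    model: str

@dataclass
class Memory:
    size_mb: int        # typically one entry per memory module

@dataclass
class FPGA:
    model: str

@dataclass
class Node:
    # A node can contain several CPUs, GPUs, MCPs and FPGAs.
    name: str
    cpus: List[CPU] = field(default_factory=list)
    gpus: List[GPU] = field(default_factory=list)
    mcps: List[MCP] = field(default_factory=list)
    memory: List[Memory] = field(default_factory=list)
    fpgas: List[FPGA] = field(default_factory=list)

@dataclass
class Testbed:
    name: str
    on_line: bool       # on-line testbeds have their nodes discovered by ALDE
    nodes: List[Node] = field(default_factory=list)

tb = Testbed("example-testbed", on_line=False,
             nodes=[Node("node-1", cpus=[CPU("x86_64", 8)])])
print(tb.nodes[0].cpus[0].cores)  # 8
```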
9 https://en.wikipedia.org/wiki/Representational_state_transfer
4 Conclusions
At the end of the first year of the project, the installation procedures presented in Section 2 help a development team set up a development environment that facilitates the implementation of an application capable of benefiting from heterogeneous hardware capabilities, notably in terms of parallelisation.
In addition, the installation process described in Section 3 guides the setup of an operational infrastructure composed of heterogeneous hardware, so that it becomes possible to measure the energy consumed by the various types of heterogeneous hardware elements found in a physical host.
Using the development tools and an operational testbed, it is then possible to profile an application's time and energy performance, and to benchmark various implementation alternatives of an application in order to determine how best to scope the granularity of the computing tasks to deploy and run on the available heterogeneous hardware.
At the end of Year-1, the implementation of the various TANGO components is far from final. Extensive effort will continue, not only to improve the scientific innovation but also to better integrate these components and automate many tasks that currently remain manual.
At the end of Year-1, neither the software nor the hardware for the Smart Device elements of the TANGO architecture has progressed far enough to be worth presenting. These IoT Smart Device scenarios will be explored further during Years 2 and 3, notably through the development of the industrial case study from Deltatec.