1 USING PYCHARM, PYLINT, PYTEST, AND VCS TO DESIGN A … Introduction... · 2020. 10. 30. ·...

USING PYCHARM, PYLINT,PYTEST, AND VCS TO DESIGNA PROJECT PIPELINEAntonio Luca Alfeo

1

OVERVIEW1. Installation

2. Work with Pycharm projects

I. Configure the interpreter

II. Requirements management

III. Project navigation and run

3. Code quality checking

I. Pylint

II. Plugin installation and usage

4. Code correctness checking

I. Enabling and configuring Pytest in Pycharm

II. Testing methods and usage

5. Version Control on Pycharm

I. Local history

II. Connect to GitHub

III. Github: share a project

IV. Github: commit and push

V. Github: clone a project

6. Recap and more resources

7. Exercise

8. Python basics to design a structured project pipeline

I. Tabular data with pandas.dataframes

II. Dataframe manipulation

III. Data segregation with Sklearn

IV. Model hyperarametrization

V. From JSON to Object

VI. Performance evaluation

VII. Model deployment

VIII. Project profiling

9. Recap and more resources

10. Exercise

2

3

INSTALLATION

INSTALLATION 1/3

1. Install Git using only the default settings in the installation process

• https://git-scm.com/downloads

2. Install PyCharm

• Linux -> https://download.jetbrains.com/python/pycharm-community-anaconda-2020.2.3.tar.gz

• Windows -> https://download.jetbrains.com/python/pycharm-community-anaconda-2020.2.3.exe

• MacOs -> https://download.jetbrains.com/python/pycharm-community-anaconda-2020.2.3.dmg

4

https://git-scm.com/downloads

https://download.jetbrains.com/python/pycharm-community-anaconda-2020.2.3.tar.gz

https://download.jetbrains.com/python/pycharm-community-anaconda-2020.2.3.exe

https://download.jetbrains.com/python/pycharm-community-anaconda-2020.2.3.dmg

INSTALLATION 2/3

During PyCharm installation enable “open folder as project”

5

INSTALLATION 3/3

• Accept the JetBrainsPrivacy Policy

• Choose the UI theme youprefer

• Do not install anyfeatured plugin

• Install Miniconda: includes

the conda environmentmanager, Python, thepackages they dependon, and a small numberof other useful packages(e.g. pip).

• Start using PyCharm

6

https://docs.conda.io/en/latest/miniconda.html

https://en.wikipedia.org/wiki/Python_(programming_language)

7

WORK WITH PYCHARM PROJECTS

1. Unzip “simpleClassifier.rar”

2. Open the resulting folder “as a PyCharm project”

OPEN A PROJECT

8

Project view

Menu Toolbar

Editor

Run/Debug buttons

Console and Tool windows

GUI

9

https://www.jetbrains.com/help/pycharm/using-consoles.html#context

https://www.jetbrains.com/help/pycharm/guided-tour-around-the-user-interface.html

INTERPRETER CONFIGURATION• File > Settings > Project > Python interpreter > Show all

• > Conda Environment > New environment

• Location and conda executable may have slightly different paths (the first part) according to your miniconda3 location.

• Apply > Ok

10

1

2

3

4

56

7

8

https://www.jetbrains.com/help/pycharm/configuring-python-interpreter.html#add_new_project_interpreter

WARNING

It can take some minutes to complete the project setting. A restart may be required before moving to

the next step!

11

PROJECT REQUIREMENTS 1/2• The requirements are all the packages that our software needs to

run properly. Those can be installed with a PyCharm plugin.

• Double click on “requirements.txt” in the Project View > Install Plugin

• Once the plugin is installed click on “Install requirements”, select alland click install.

12

• if everything goes smoothly you might see the package installationstatus progress at the bottom of the GUI.

• Once the package installation is done a notification appears, but itis still NOT possible to go to next step, at least until PyCharm hasfinished "Updating skeletons".

• Each package can also be installed via GUI or with the Terminal byusing

• “pip install <name_lib>” to install a single package

• “pip install –r requirements.txt” to install the packages on therequirements.txt

PROJECT REQUIREMENTS 2/2

13

https://www.jetbrains.com/help/pycharm/installing-uninstalling-and-upgrading-packages.html

https://en.wikipedia.org/wiki/Pip_(package_manager)

CHECK THE PACKAGES AVAILABLE

You can see (add, eliminate and upgrade*) the packages and libraries available in your virtual environment by checking :

File > Settings > Project > Python Interpreter

14

*

CONFIGURE THE FIRST RUN

1. Click “Add configuration”

2. Select “Python”

15

3. Select “main.py” in your project folder.

4. Click OK

RUN IT!

The confusion matrix!Txt with evaluation score!

Click RUN

16

https://en.wikipedia.org/wiki/Confusion_matrix

17

CODE QUALITY CHECKING

CODE QUALITY CHECKING

Pylint is a source-code, bug and quality checker for the Python programming language. It checks:

• if the code is compliant with Python's PEP8 standard

• if each module is properly imported and used

• if declared interfaces are truly implemented

• if there is duplicated code

18

https://www.pylint.org/

https://www.python.org/dev/peps/pep-0008/

PYLINT PLUGIN INSTALLATION

1. File > Settings

> Plugin

2. Search for

“pylint”

3. Install Pylint

plugin

4. Restart

Pycharm

1

2

3

19

4

https://github.com/leinardi/pylint-pycharm/blob/master/README.md

PYLINT PLUGIN SETUP

1. File > Settings > Pylint

2. Insert the path to the pylint

executable

3. Click “Test”

4. Check the result

5. Click “Apply” and “Ok”. You

can now run “pylint” by

clicking the green arrow in

the Pylint Tool Windows

1

23

45

20

https://github.com/leinardi/pylint-pycharm/blob/master/README.md

PYLINT TOOL WINDOW USAGE

21

1. Double click on a Python

file in the Project View

2. Open the Pylint Tool

Windows

3. Click the green arrow

1

2

3

22

USE PYLINT VIA COMMAND LINE

https://docs.pylint.org/en/1.6.0/run.html

23

CODE CORRECTNESS CHECKING

CODE CORRECTNESS CHECKING

PyCharm supports pytest, providing many testing features such as:

• Dedicated test runner.

• Code completion for test subject.

• Code navigation.

• Detailed failing assert reports.

• Multiprocessing test execution.

24

https://docs.pytest.org/en/stable/

https://www.jetbrains.com/pycharm/guide/technologies/pytest/

ENABLE PYTEST

1

2

1. File > Settings > Python Integrated Tools and

2. Set “pytest” as the “Default test runner”

25

CREATE A NEW TEST1. Put the cursor on the bracket of a method declaration, right-click and

select Go To > Test

2. Create a new test script

3. Give it a directory, a name, and a method to test

1

2

26

3

BUILD AND RUN A TEST 1/3

1. Fill the test function and the assert statement. If this statement is true

the test is PASSED!

2. Click “Edit pytest for …”

27


28

3. Reduce the verbosity of the pytest

outcome to improve readability. Put the

following Additional Arguments:

“ -q --disable-warnings --tb=no “

-q, quiet output

--disable-warnings, don't show most of the warnings

--tb=no, don't show any traceback

4. Click OK and Run the test

3

4


29

TEST PARAMETRIZATION

The Pytest framework makes it easy to write tests to support multiple and complex functional testing for applications and libraries. Pytestenables test parametrization at several levels, such as:

• @pytest.fixture provide a fixed baseline (e.g. an instance of a class) upon which tests can reliably and repeatedly execute. Moreover, via scope controls it is possible to check how often a fixture gets called.

• @pytest.mark.parametrize allows the definition of multiple sets of arguments and fixtures for testing purpose.

30


https://docs.pytest.org/en/stable/example/parametrize.html#paramexamples

https://www.jetbrains.com/help/pycharm/pytest.html#pytest-fixtures

https://www.jetbrains.com/help/pycharm/pytest.html#pytest-parametrize

BUILD AND RUN MANY TESTS

1. Prepare a set of values to

test

2. Test those by passing

them to the test function

3. Now you have one

evaluation for each test

value!

31

1

2

3

USE PYTEST VIA COMMAND LINE

32

https://docs.pytest.org/en/stable/usage.html

33

VERSION CONTROL ON PYCHARM

VERSION CONTROL

In Pycharm the Version Control

can be accessed locally:

1. via VCS > Local History

2. Stores only local and recent

change in the project

highlighting them

34

1

2

ENABLE VCI TO USE GITHUB

Github is way more convenient especially when a project is developed in group.

After VC integration is enabled (1 and 2), PyCharm will ask you whether you want to share project settings files via VCS.

In the dialog you can choose ”Always Add” to synchronize project settings with other repository users who work with PyCharm.

1

2

35

https://www.jetbrains.com/help/pycharm/manage-projects-hosted-on-github.html

36

ENABLE VCI TO USE GITHUB

Login into GitHub, requesting a new access token with your login and password (1). Then, authorize JetBrain IDE integration (2 and 3).

1 2 3


SHARE A PROJECT 1/21. When connection to GitHub has been established, the Share Project on

GitHub dialog appears (VCS > Import Version Control > Share …).

2. Name the repository and click share

37

1

2

3

SHARE A PROJECT 2/2

38

1. Being the initial commit, of the

project every file is highlighted

in red (they are not in the

VCS).

2. Click Add

3. PyCharm will notify once the

process is finalized 1

2

3

… ON GITHUB

Let’s say you have a central repository for your project, the

contribution of each coder will be highlighted by the commits she/he

made. In the next slide you will see how to “sign” your commit º

39

COMMIT AND PUSHUpdate (pull) Push

Commit

Color code:

- Blue: modified

- Green: commit done

- Red: not in the VCS

- White: push done

After the first commit here

there will be the “before

and after” comparison

Explain clearly the

change provided

º

40

https://www.jetbrains.com/help/pycharm/sync-with-a-remote-repository.html#update

CLONE FROM GIT 1/2

1. Go to GitHub and annotate the URL of the repository you are

interested in.

2. From the main menu, choose VCS > Get from Version Control, and

choose GitHub on the left.

1

2

41

CLONE FROM GIT 2/2

1. Specify the URL of the repository that you want to clone (the one you

annotated from GitHub).

2. In the Directory field, enter the path to the folder where your local Git

repository will be created.

3. Click Clone. If you want to create a project based on these sources,

click Yes in the confirmation dialog. PyCharm will automatically set Git

root mapping to the project root directory.

1

2

3

42

RECAPPyCharm is an integrated development environment (IDE) widelyused in the Python community. PyCharm provides:

• Project and code navigation: create a Python project by opening afolder as a project or even from scratch, by creating or importingthe python files. Navigate the project with specialized project views,jump between files, classes, methods and usages.

• Python refactoring: rename, extract methods, introduce variablesand constants

• Configurable python interpreter and virtual environment support.

• Coding assistance and analysis: code completion, syntax and errorhighlighting, quick fixes, guided import, integrated Pythondebugger, linting and unit testing with code coverage.

• PyCharm provides version control integration, a unified userinterface for many VCS such as GithHub, supporting themanagement of commit, push, pull, change lists, cloning… andeven branch, merge, and conflict via Git.

44

https://www.jetbrains.com/pycharm/download/

https://en.wikipedia.org/wiki/Python_(programming_language)

https://www.jetbrains.com/help/pycharm/working-with-source-code.html

https://www.jetbrains.com/help/pycharm/creating-files-from-templates.html#creating-new-file-from-template

https://www.jetbrains.com/help/pycharm/mastering-keyboard-shortcuts.html

https://www.jetbrains.com/help/pycharm/reformat-and-rearrange-code.html

https://www.jetbrains.com/help/pycharm/configuring-local-python-interpreters.html

https://www.jetbrains.com/help/pycharm/creating-virtual-environment.html

https://www.jetbrains.com/help/pycharm/working-with-source-code.html

https://www.jetbrains.com/help/pycharm/auto-completing-code.html#basic_completion

https://www.jetbrains.com/help/pycharm/tutorial-code-quality-assistance-tips-and-tricks.html#44f9d

https://www.jetbrains.com/help/pycharm/resolving-problems.html

https://www.jetbrains.com/help/pycharm/creating-and-optimizing-imports.html#import-packages

https://www.jetbrains.com/help/pycharm/debugging-code.html

https://plugins.jetbrains.com/plugin/11084-pylint

https://www.jetbrains.com/help/pycharm/testing.html

https://www.jetbrains.com/help/pycharm/version-control-integration.html

https://www.jetbrains.com/help/pycharm/github.html


http://www.jetbrains.com/help/pycharm/set-up-a-git-repository.html

45

MORE RESOURCES…

ON GETTING STARTED WITH PYCHARM

PYCHARM DOCS

• Miniconda can be installed at any time from Tools > Install Miniconda3

• https://www.jetbrains.com/help/pycharm/quick-start-guide.html

• https://www.jetbrains.com/help/pycharm/opening-your-project-for-the-first-time.html

• https://www.jetbrains.com/help/pycharm/configuring-python-interpreter.html

• https://www.youtube.com/watch?v=hyoS5W_1kBQ&list=PLQ176FUIyIUbDwdvWZNuB7FrCc6hHregy

CREATE AND MANAGE REQUIREMENTS

• https://www.jetbrains.com/help/pycharm/managing-dependencies.html#create-requirements

A PYCHARM BOOK

• https://www.academia.edu/42195195/Hands_On_Application_Development_with_PyCharm_Accelerate_your_Python_applications_using_practical_coding_techniques_in_PyCharm?auto=download

46



https://www.jetbrains.com/help/pycharm/configuring-python-interpreter.html


https://www.jetbrains.com/help/pycharm/managing-dependencies.html#create-requirements


ON LINTING

• https://www.python.org/dev/peps/pep-0008/

• https://plugins.jetbrains.com/plugin/11084-pylint

• Pylint is fully customizable: modify your pylintrc to customize which errors or conventions are important to you.

47

https://www.python.org/dev/peps/pep-0008/

https://plugins.jetbrains.com/plugin/11084-pylint

https://gist.github.com/uuklanger/fe65dc6da9169c1585abad4ac2b0d268

ON TESTING

• https://www.jetbrains.com/help/pycharm/pytest.html

• https://www.jetbrains.com/pycharm/guide/technologies/pytest/

• pytest.mark allows you setting the metadata on your test functions. Here you have some builtin markers different from pytest.mark.parametrize:

• @pytest.mark.skip: always skip a test function

• @pytest.mark.skipif: skip a test function if a certain condition is met

• @pytest.mark.xfail: produce an expected failure outcome if a certain condition is met

48

https://www.jetbrains.com/help/pycharm/pytest.html

https://www.jetbrains.com/pycharm/guide/technologies/pytest/

https://docs.pytest.org/en/latest/reference.html#marks

https://docs.pytest.org/en/latest/skipping.html#skipping-test-functions

https://docs.pytest.org/en/latest/reference.html#pytest-mark-skipif

https://docs.pytest.org/en/latest/skipping.html#xfail-mark-test-functions-as-expected-to-fail

ON VERSION CONTROL

PYCHARM DOCS

• https://www.jetbrains.com/help/pycharm/enabling-version-control.html

• https://www.jetbrains.com/help/pycharm/manage-projects-hosted-on-github.html

• https://www.jetbrains.com/help/pycharm/contribute-to-projects.html

VIDEO TUTORIAL

• https://www.youtube.com/watch?v=jFnYQbUZQlA

• https://www.youtube.com/watch?v=_w9XWHDSSa4

• https://www.youtube.com/watch?v=AHkiCKG-JhM

49

https://www.jetbrains.com/help/pycharm/enabling-version-control.html


https://www.jetbrains.com/help/pycharm/contribute-to-projects.html

https://www.youtube.com/watch?v=jFnYQbUZQlA

https://www.youtube.com/watch?v=_w9XWHDSSa4

https://www.youtube.com/watch?v=AHkiCKG-JhM

EXERCISE

1. Use the IrisClassifier.py by changing the number of training epochs,i.e. "max_iter" in the MLPClassifier() documentation page. Howdoes the performance change by using 50 epochs?

2. Perform code linting.

3. Test if the performance is greater than 0.75 with number of trainingepochs equal to 5, 10, 25, 50, 100, 200. How many of them fail?

4. Make commit and push on your personal GitHub repository.

51

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

52

A POSSIBLE SOLUTION

from typing import Setfrom irisclassifier import IrisClassifierimport pytest

epochs: Set[int] = {5, 10, 25, 50, 100, 200}

@pytest.mark.parametrize('epoch', epochs)def test_evaluations(epoch):

i = IrisClassifier(epoch)i.ingestion()i.segregation()i.train()res = i.evaluation()assert res > 0.75

54

PYTHON BASICS TO DESIGN A PROJECT PIPELINE

SO MANY PACKAGES AVAILABLE• Pytest and Pylint: to write better code!

• Pandas: read data from a broad range of sources like CSV, SQLdatabases, and Excel files.

• Json and Joblib: to save/read models and configurations to/fromfiles.

• Sci-kit_learn: machine learning library with a broad range ofclustering, regression and classification algorithms. It provides alsodata preprocessing, pipelining and performance analysis modules.

• Matplotlib: graphic visualization libraries.

• NumPy: to apply mathematical functions and scientificcomputation to multi-dimensional data

…and much more.

55


https://www.pylint.org/

https://pandas.pydata.org/

https://docs.python.org/3/library/json.html

https://joblib.readthedocs.io/en/latest/

https://scikit-learn.org/stable/user_guide.html

https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

https://matplotlib.org/

https://numpy.org/

PROJECT ORGANIZATION

• Consider that each class in your project should provideone or more functionalities that work with inputs andconfigurations to generate outputs. Inputs,configurations and outputs have to be stored indedicated folders. Each class should be able tosave/load data and configurations from those folders.

• Let’s start with a very simple example. We aim atclassify a yearly set of time series. We need at least 3folders:

• data -> tabular data, from raw to segregated form

• configs -> configurations and models

• results -> outcomes, figures and scores obtained

56

57

1. UPLOAD/SAVE TABULAR DATA WITH PANDAS.DATAFRAME

from pandas import read_csv

# Read csvdata1 = read_csv('data/hotspotUrbanMobility-1.csv')# Print dataframe shapeprint(data1.shape)# Read csvdata2 = read_csv('data/hotspotUrbanMobility-2.csv')# Print dataframe shapeprint(data2.shape)# Append dataframesdata1 = data1.append(data2, ignore_index=True)print(data1.shape)# Save the new datafame as csvdata1.to_csv('data/completeDataset.csv', index=False)

Path of the csv to read

Dataframe shape

consist of its numer of

rows and column

Append 2 dataframes

and ignore the indexes

that pandas provides

Save the dataframe

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html?highlight=shape#pandas.DataFrame.shape

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html?highlight=append#pandas.DataFrame.append

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html?highlight=to_csv#pandas.DataFrame.to_csv

2. DATAFRAME MANIPULATION WITH PANDAS.DATAFRAME

58

from pandas import read_csv

# Read csvdata = read_csv('data/completeDataset.csv')# Check the content of the dataframeprint(data.describe())# Remove column with constant valuesdata = data.drop('h24', axis=1)# Remove anomalous instancesdata = data.loc[data['Anomalous'] < 1]print(data.describe())# Save as csvdata.to_csv('data/preprocessedDataset.csv', index=False)

Provides statistics for

each column in the

dataframe

Drop the column (axis=1)

whose label is ‘h24’

Loc select instances by

using a boolean array.


https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html?highlight=drop#pandas.DataFrame.drop

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

59

3. DATA SEGREGATION WITH PANDAS AND SCIKIT-LEARN

from pandas import read_csvfrom sklearn.model_selection import train_test_split

# Read csvdata = read_csv('data/preprocessedDataset.csv')print(data.describe())# Random split 75% as training and 25% as testingtraining_data, testing_data, training_labels, testing_labels =

train_test_split(data.iloc[:, 4:len(data.columns)], data.iloc[:, 1])# Save as csvtraining_data.to_csv('data/trainingData.csv', index=False)testing_data.to_csv('data/testingData.csv', index=False)training_labels.to_csv('data/trainingLabels.csv', index=False)testing_labels.to_csv('data/testingLabels.csv', index=False)

Split randomly 75% and

25% of the dataset

(default), once it knows

which are the target

and the data.

iloc provides

integer-location

based indexing

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html

60

4. ML MODEL PARAMETRIZATION VIA GRID SEARCH

from sklearn.ensemble import RandomForestClassifierfrom sklearn.model_selection import GridSearchCVfrom pandas import read_csvfrom numpy import ravelimport json

# Read the datatraining_data = read_csv('data/trainingData.csv')training_labels = read_csv('data/trainingLabels.csv')# setup Grid Search for a randomForestClassifierrf = RandomForestClassifier(random_state=0)# values to testparameters = {'n_estimators': (1, 2, 3)}# apply grid searchgs = GridSearchCV(rf, parameters)gs.fit(training_data, ravel(training_labels))# save best configurationconfig_path = 'config/modelConfiguration.json'with open(config_path, 'w') as f:

json.dump(gs.best_params_, f)

The ML model

(random_state is used for

reproducibility)

N_estimator is the name

of the hyperparameter of

RandomForestClassifier

Use ravel to transform

dataframe to narray

Save the best

hyperparameter as JSON

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ravel.html

https://docs.python.org/3/library/json.html#basic-usage

61

5. FROM JSON TO OBJECT

import json

class Params():def __init__(self, n_estimators):

self.n_estimators = n_estimators

# load best configurationconfig_path = 'config/modelConfiguration.json'with open(config_path, 'r') as f:

params = json.load(f)

params_object = Params(**params)

Create a new Object,

and pass the JSON

dictionary as a map to

convert JSON data into a

custom Python Object

Load the JSON .

Remember, it should be

compliant to the

expected json schema

i.e. pass the validation

Use a simple class to get

the model’s parameters

https://docs.python.org/3/library/json.html#basic-usage

https://python-jsonschema.readthedocs.io/en/stable/

https://python-jsonschema.readthedocs.io/en/stable/validate/

62

from numpy import ravelfrom pandas import read_csvfrom sklearn.ensemble import RandomForestClassifierfrom sklearn.metrics import accuracy_scoreimport json

# Read the datatraining_data = read_csv('data/trainingData.csv')training_labels = read_csv('data/trainingLabels.csv')testing_data = read_csv('data/testingData.csv')testing_labels = read_csv('data/testingLabels.csv')# train the modelmodel = RandomForestClassifier(n_estimators=3, random_state=0).fit(training_data, ravel(training_labels))# compute labels on testing datalabels = model.predict(testing_data)# evaluate accuracy scorescore = accuracy_score(ravel(testing_labels), labels)# report the result on a jsonresults = {}results['accuracy'] = float(score)json_str = json.dumps(results, indent=4)with open('results/testReport.json', 'w') as file:

file.write(json_str)

You can use many different

performances measures

Define a set for your results

and save them as JSON

6. ML MODEL PERFORMANCE ASSESSMENT

https://scikit-learn.org/stable/modules/model_evaluation.html

7. MODEL DEPLOYMENT WITH JOBLIB

63

from pandas import read_csvfrom sklearn.ensemble import RandomForestClassifierfrom numpy import ravelimport joblib

# Read the training datatraining_data = read_csv('data/trainingData.csv')training_labels = read_csv('data/trainingLabels.csv')# train the modelinitial_model = RandomForestClassifier(n_estimators=3, random_state=0).fit(training_data, ravel(training_labels))# save the modeljoblib.dump(initial_model, 'config/fitted_model.sav')#...# Read new datatesting_data = read_csv('data/testingData.csv')testing_labels = read_csv('data/testingLabels.csv')# load the modelmodel = joblib.load('config/fitted_model.sav')# Evaluate the initial model on new dataprint('Evaluation score:')print(model.score(testing_data, ravel(testing_labels)))

Save a fitted model to file

Load a fitted model from file

Use the model with new data

https://joblib.readthedocs.io/en/latest/

1. Install the packages via Terminal by using:

• Pip install cProfile

• Pip install snakeviz

2. Than execute:

• python -m cProfile -o test.profile main.py

• snakeviz test.profile

3. Open the link in the Terminal windows:

64

8. PROFILING YOUR PROJECT WITH CPROFILE AND SNAKEVIZ

The main of your

project

The output of your

project profiling

https://docs.python.org/3/library/profile.html#module-cProfile

https://jiffyclub.github.io/snakeviz/

65

8. PROFILING YOUR PROJECT WITH CPROFILE AND SNAKEVIZ

Use the interactive interface to navigate the profiling output or search for

a specific method to understand its usage in your project. More info in

the official documentation

https://docs.python.org/3/library/profile.html#module-cProfile



66

RECAP AND MORE RESOURCES…

MANIPULATE TABULAR DATA WITH PANDAS.DATAFRAME

• Load csv as a dataframe, save a dataframe as csv

• Shape, head and quick description of a dataframe

• Dataframe concatenation and append

• Data sort by index or by value

• Data resampling

• Dataframe selection by value or index

• Dataframe columns/row addition and elimination

• Replace by value, drop duplicate rows, group by value

• Drop NAN values , fill NAN value

• Data interpolation and signal smoothing via moving average

…and much more

67


https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_index.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html?highlight=interpolate#pandas.DataFrame.interpolate

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.rolling.Rolling.mean.html

GRAPHS AND VISUALIZATION WITH MATPLOTLIB.PYPLOT

• Plots and save a figure

• Histogram e barplot

• Boxplots

• Scatterplot

• Single e multiple graphs

…and much more

68

https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.html

https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot

https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.savefig.html#matplotlib.pyplot.savefig

https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.hist.html#matplotlib.pyplot.hist

https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.boxplot.html

https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.scatter.html#matplotlib.pyplot.scatter

https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots

DATA PRE-PROCESSING WITH SCIKIT-LEARN

• Data discretization

• Data normalization with minmax or standard scaler

• Features computation and extraction with NumPy and Scipy

• Split data in train and testing set

…and much more

69

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler

https://numpy.org/

https://www.scipy.org/scipylib/index.html

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

ML MODEL IMPLEMENTATION WITH SCIKIT-LEARN

• MLPclassifier (classification)

• RandomForestClassifier (classification)

• DecisionTreeRegressor (regression)

• MLPregressor (regression)

• k-means (partition based clustering)

• DBSCAN (density based clustering)

• IsolationForest (anomaly detection)

• OneClassSVM (anomaly detection)

…and much more

70

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html

https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html

ARCHITECTURE EVALUATION

• Choose the right performance metric

• Performance visualization

• K-Folds and Stratified-K-Fold cross-validation

• Hyperparameters search via grid search

• Import/export objects parametrization with json

• Validate the json schema

• Dump/load fitted model with joblib

• How much time does an instruction take to execute?

• Official Pycharm profiling tool (just for Pro version)

…and much more

71

https://scikit-learn.org/stable/modules/model_evaluation.html

https://scikit-learn.org/stable/visualizations.html

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

https://docs.python.org/3/library/json.html

https://python-jsonschema.readthedocs.io/en/stable/validate/

https://python-jsonschema.readthedocs.io/en/stable/

https://joblib.readthedocs.io/en/latest/persistence.html

https://docs.python.org/3/library/time.html#time.time

https://www.jetbrains.com/help/pycharm/profiler.html

EXERCIZE• Given the csvs used in this lab, build a few classes that:

• read the csvs “1” and “2” and append them

• remove the columns with constant values

• select only data from Cluster 0 (‘Cluster’ = 0)

• generate training and testing sets (respectively 75% and 25% of the

whole dataset) using only columns ‘h1’-‘h23’ as data and the column

‘Anomalous’ as target

• build a ML model using isolationForest

• train the model with the training data

• predict the anomalous label with the testing data

• compute the accuracy of your model. Consider that with your labels

you have 1 for anomalous instances whereas isolationForest return -1 for

anomalous instances.

• save the ML trained model

• load the trained model and use it with the “New” data

• print the resulting accuracy on screen.

73

1 USING PYCHARM, PYLINT, PYTEST, AND VCS TO DESIGN A … Introduction... · 2020. 10. 30. ·...

Documents

Transcript of 1 USING PYCHARM, PYLINT, PYTEST, AND VCS TO DESIGN A … Introduction... · 2020. 10. 30. ·...