Machine learning in PHP
![Page 1: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/1.jpg)
MACHINE LEARNING IN PHP
The roots of education are bitter, but the fruit is sweet
Verona, Italia, 2016
![Page 2: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/2.jpg)
AGENDA
How to teach tricks to your PHP
Application: searching for code in comments
Complex learning
![Page 3: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/3.jpg)
SPEAKER
Damien Seguy
Exakat CTO
Static analysis of PHP code
![Page 4: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/4.jpg)
MACHINE LEARNING
Teaching the machine
Supervised learning: learning, then applying
The application builds its own model: training phase
It applies its model to real cases: applying phase
![Page 5: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/5.jpg)
APPLICATIONS
Play go, chess, tic-tac-toe and beat everyone else
Fraud detection and risk analysis
Automated translation or automated transcription
OCR and face recognition
Medical diagnostics
Walk, welcome guests at hotels, play football
Finding good PHP code
![Page 6: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/6.jpg)
PHP APPLICATIONS
Recommendation systems
Predicting user behavior
Spam detection
Converting users to customers
ETA (estimated time of arrival)
Detect code in comments
![Page 7: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/7.jpg)
REAL USE CASE
Identify code in comments
Classic problem
Good problem for machine learning
Complex, no simple solution
A lot of data and expertise are available
![Page 8: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/8.jpg)
SUPERVISED TRAINING
History data → Training → Model
Real data → Model → Results
![Page 9: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/9.jpg)
THE FANN EXTENSION
ext/fann (https://pecl.php.net/package/fann)
Fast Artificial Neural Network
http://leenissen.dk/fann/wp/
Neural networks in PHP
Works on PHP 7, thanks to the hard work of Jakub Zelenka
https://github.com/bukka/php-fann
![Page 10: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/10.jpg)
NEURAL NETWORKS
Imitation of nature
Input layer
Output layer
Intermediate layers
![Page 11: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/11.jpg)
![Page 12: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/12.jpg)
INITIALIZATION
<?php
// Total number of layers, counting the input and output layers
$num_layers = 3;
$num_input = 5;
$num_neurons_hidden = 3;
$num_output = 1;
$ann = fann_create_standard($num_layers, $num_input, $num_neurons_hidden, $num_output);

// Activation function
fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);
![Page 13: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/13.jpg)
PREPARING DATA
Raw data → Extract → Filter → Human review → FANN-ready data
![Page 14: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/14.jpg)
EXPERT AT WORK
// Test if the if is in a compressed format
// none need yet
// icon
// There is a parser specified in `Parser::$KEYWORD_PARSERS`
// $result should exist, regardless of $_message
// $a && $b and multidimensional
// numGlyphs + 1
// TODO : fix this; var_dump($var);
// if(ob_get_clean()){
//$annots .= ' /StructParent ';
// $cfg['Servers'][$i]['controlpass'] = 'pmapass';
![Page 15: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/15.jpg)
INPUT VECTOR
'length' : size of the comment
'countDollar' : number of $
'countEqual' : number of =
'countObjectOperator' : number of -> operators ($o->p)
'countSemicolon' : number of semi-colon ;
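The application slide later calls a `makeVector()` helper that the deck never shows. A minimal sketch of what it might look like, assuming the five characteristics are plain byte and substring counts, taken in the order listed above (the function body is an assumption, not the speaker's code):

```php
<?php
// Hypothetical implementation of makeVector(): the slides only name the
// five characteristics, so the counting rules used here are assumptions.
function makeVector(string $comment): array
{
    return [
        strlen($comment),              // 'length': size of the comment in bytes
        substr_count($comment, '$'),   // 'countDollar'
        substr_count($comment, '='),   // 'countEqual'
        substr_count($comment, '->'),  // 'countObjectOperator'
        substr_count($comment, ';'),   // 'countSemicolon'
    ];
}

$vector = makeVector("//\$this->errors[] = \$this->language->get('error_permission');");
print implode(' ', $vector) . "\n"; // prints "61 2 1 3 1"
```

Raw counts like these are cheap to compute, which is probably why they suit a first experiment; note that `=` is also counted inside `==` and `=>`.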
![Page 16: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/16.jpg)
INPUT DATA
46 5 1
825 0 0 0 1
0
37 2 0 0 0
0
55 2 2 0 1
1
61 2 1 3 1
1
...
First line: number of input cases, number of incoming data per case, number of outgoing data per case
Comments behind the first four cases:
 * This file is part of Exakat.
 *
 * Exakat is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Exakat is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License
 * along with Exakat. If not, see <http://www.gnu.org/licenses/>.
 *
 * The latest code can be found at <http://exakat.io/>.
 *
*/
// $x[3] or $x[] and multidimensional
//if ($round == 3) { die('Round '.$round);}
//$this->errors[] = $this->language->get('error_permission');
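The training file is plain text: a header line with the three counts, then one line of inputs followed by one line of outputs for each case. A small sketch of how such a file could be generated from labeled cases; `buildTrainingData()` is a hypothetical helper, and reading the output value 1 as "code" and 0 as "not code" is an assumption:

```php
<?php
// Hypothetical helper: serializes labeled cases into the plain-text layout
// read by fann_train_on_file(). Each case is [inputs, outputs].
function buildTrainingData(array $cases): string
{
    if ($cases === []) {
        throw new InvalidArgumentException('At least one case is required');
    }
    [$firstInputs, $firstOutputs] = $cases[0];

    // Header: number of cases, inputs per case, outputs per case
    $lines = [count($cases) . ' ' . count($firstInputs) . ' ' . count($firstOutputs)];
    foreach ($cases as [$inputs, $outputs]) {
        $lines[] = implode(' ', $inputs);
        $lines[] = implode(' ', $outputs);
    }
    return implode("\n", $lines) . "\n";
}

// The first four cases from the slide (assumed: 1 = code, 0 = not code)
$cases = [
    [[825, 0, 0, 0, 1], [0]],
    [[37, 2, 0, 0, 0], [0]],
    [[55, 2, 2, 0, 1], [1]],
    [[61, 2, 1, 3, 1], [1]],
];
$data = buildTrainingData($cases);
print $data;
```

Writing the result with `file_put_contents('incoming.data', $data)` would produce a file in the shape the training step expects.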
![Page 17: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/17.jpg)
TRAINING
$max_epochs = 500000;
$desired_error = 0.001;
// Not defined on the original slide; 1000 is an assumed reporting interval
$epochs_between_reports = 1000;

// The actual training
if (fann_train_on_file($ann, 'incoming.data', $max_epochs, $epochs_between_reports, $desired_error)) {
    fann_save($ann, 'model.out');
}
fann_destroy($ann);
?>
![Page 18: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/18.jpg)
![Page 19: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/19.jpg)
![Page 20: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/20.jpg)
![Page 21: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/21.jpg)
TRAINING
47 cases
5 characteristics
3 hidden neurons + 5 input + 1 output
Duration : 5.711 s
![Page 22: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/22.jpg)
APPLICATION
History data → Training → Model
Real data → Model → Results
![Page 23: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/23.jpg)
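APPLICATION
<?php
// Load the model saved at the end of the training phase
$ann = fann_create_from_file('model.out');

$comment = '//$gvars = $this->getGraphicVars();';
// makeVector() turns the comment into the 5-value input vector
$input = makeVector($comment);
$results = fann_run($ann, $input);

// Report only the comments the network is fairly sure about
if ($results[0] > 0.8) {
    print "\"$comment\" -> $results[0] \n";
}
?>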
![Page 24: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/24.jpg)
RESULTS > 0.8
Answer between 0 and 1
Values range from -14 to 0.999
The closer to 1, the safer. The closer to 0, the safer.
Is this a percentage? Is this a carrot count?
It's a mix of counts…
![Page 25: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/25.jpg)
[Chart: axis ticks from -16 to 0 and from 60 to 100]
![Page 26: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/26.jpg)
REAL CASES
Tested on 14093 comments
Duration 367.01ms
Found 1960 issues (14%)
![Page 27: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/27.jpg)
0.99999893 // $cfg['Servers'][$i]['controlhost'] = '';
0.99999928 //$_SESSION['Import_message'] = $message->getDisplay();
0.99999928 /*
if (defined('SESSIONUPLOAD')) {
    // write sessionupload back into the loaded PMA session
    $sessionupload = unserialize(SESSIONUPLOAD);
    foreach ($sessionupload as $key => $value) {
        $_SESSION[$key] = $value;
    }
    // remove session upload data that are not set anymore
    foreach ($_SESSION as $key => $value) {
        if (mb_substr($key, 0, mb_strlen(UPLOAD_PREFIX)) == UPLOAD_PREFIX
            && ! isset($sessionupload[$key])
        ) {
            unset($_SESSION[$key]);
        }
    }
}
![Page 28: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/28.jpg)
0.98780382 //LEAD_OFFSET = (0xD800 - (0x10000 >> 10)) = 55232
0.99361396 // We have server(s) => apply default configuration
0.98383027 // Duration = as configured
0.99999928 // original -> translation mapping
0.97590065 // = ( 59 x 84 ) mm = ( 2.32 x 3.31 ) in
![Page 29: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/29.jpg)
True positive False positive
True negative False negative
Found by FANN
Target
![Page 30: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/30.jpg)
True positive
False positive
True negative
False negative
Found by FANN
Target
// $cfg['Servers'][$i]['table_coords'] = 'pma__table_coords';
//(isset($attribs['height'])?$attribs['height']: 1);
// if ($key != null) did not work for index "0"
// the PASSWORD() function
0.99999923
0.73295981
0.99999851
0.2104115
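The four cells can be computed from a human label and the network score. The sketch below uses the 0.8 threshold from the application slide; pairing the four scores with the four comments in slide order is an assumption, and the code / not-code labels are a reading of each comment rather than data from the talk:

```php
<?php
// Classifies one scored comment into a confusion-matrix cell.
// $isCode is the human label (the target), $score the network output.
function confusionCell(bool $isCode, float $score, float $threshold = 0.8): string
{
    $found = $score > $threshold;
    if ($found) {
        return $isCode ? 'true positive' : 'false positive';
    }
    return $isCode ? 'false negative' : 'true negative';
}

// Assumed pairing: scores matched to comments in slide order.
// Assumed labels: true = the comment holds commented-out code.
$samples = [
    ["// \$cfg['Servers'][\$i]['table_coords'] = 'pma__table_coords';", true, 0.99999923],
    ["//(isset(\$attribs['height'])?\$attribs['height']: 1);", true, 0.73295981],
    ['// if ($key != null) did not work for index "0"', false, 0.99999851],
    ['// the PASSWORD() function', false, 0.2104115],
];

$counts = [];
foreach ($samples as [$comment, $isCode, $score]) {
    $cell = confusionCell($isCode, $score);
    $counts[$cell] = ($counts[$cell] ?? 0) + 1;
    print str_pad($cell, 16) . $comment . "\n";
}
```

Under those assumptions each sample lands in a different cell, which matches the way the slide lays out one example per quadrant.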
![Page 31: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/31.jpg)
RESULTS
1960 issues
More than 50% false positives
With an easy clean, 822 issues reported
14k comments, analyzed in 367 ms
Total coding time: 27 min
// = ( 59 x 84 ) mm = ( 2.32 x 3.31 ) in
/* vim: set expandtab sw=4 ts=4 sts=4: */
![Page 32: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/32.jpg)
LEARN BETTER, NOT HARDER
Better training data
Improve characteristics
Configure the neural network
Change algorithm
Automate learning
Update constantly
History data → Training → Model
Real data → Model → Results
Feedback (results loop back into training)
![Page 33: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/33.jpg)
BETTER TRAINING DATA
More data, more data, more data
Varied situations, real case situations
Include specific cases
Experience is crucial
https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
![Page 34: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/34.jpg)
IMPROVE CHARACTERISTICS
Add new characteristics
Remove the ones that are less useful
Find the right set of characteristics
![Page 35: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/35.jpg)
NETWORK CONFIGURATION
Input vector
Intermediate neurons
Activation function
Output vector
[Chart: time of training (ms), y-axis from 0 to 20000, x-axis ticks 1 to 10, one series each for 1, 2, 3 and 4 layers]
![Page 36: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/36.jpg)
CHANGE ALGORITHM
Add more data first, before changing the algorithm
Try cascade2 algorithm from FANN
0.6 => 0 found
0.5 => 2 found
Not found by the first algorithm
![Page 37: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/37.jpg)
FINDING THE BEST
Test with 2-4 layers, 10 neurons
Measure results
[Chart: y-axis from 0 to 9000, x-axis ticks 1 to 13, one series each for 1, 2, 3 and 4 layers]
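Testing several network shapes and measuring each one can be automated. A rough sketch, assuming "layers" means hidden layers of equal size; both function names are hypothetical, and the training callback is a placeholder so that the example runs without ext/fann:

```php
<?php
// Enumerates network shapes: each shape is a list of layer sizes, with
// $hiddenLayers hidden layers of $neurons neurons between input and output.
function layerConfigurations(int $inputs, int $outputs, array $hiddenLayerCounts, array $neuronCounts): array
{
    $configs = [];
    foreach ($hiddenLayerCounts as $hiddenLayers) {
        foreach ($neuronCounts as $neurons) {
            $configs[] = array_merge([$inputs], array_fill(0, $hiddenLayers, $neurons), [$outputs]);
        }
    }
    return $configs;
}

// Times a caller-supplied training callback for each shape, in milliseconds,
// and returns the timings sorted from fastest to slowest.
function measure(array $configs, callable $train): array
{
    $timings = [];
    foreach ($configs as $layers) {
        $start = microtime(true);
        $train($layers);
        $timings[implode('-', $layers)] = (microtime(true) - $start) * 1000;
    }
    asort($timings);
    return $timings;
}

$configs = layerConfigurations(5, 1, [1, 2, 3, 4], range(1, 10));
// Placeholder callback: replace the body with real training code.
$timings = measure($configs, function (array $layers): void {
    usleep(array_sum($layers));
});
print count($configs) . " configurations, fastest: " . array_key_first($timings) . "\n";
```

With ext/fann, the callback could build each network with `fann_create_standard_array(count($layers), $layers)` and train it on the same data file. Training time is only one measure; the quality of the results on real comments should be recorded alongside it.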
![Page 38: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/38.jpg)
DEEP LEARNING
Chaining the neural networks
Auto-encoders
Unsupervised Learning
Genetic algorithms, ant colony algorithms
![Page 39: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/39.jpg)
OTHER TOOLS
PHP ext/fann
R language
https://github.com/kachkaev/php-r
Scikit-learn
https://github.com/scikit-learn/scikit-learn
Mahout
https://mahout.apache.org/
![Page 41: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/41.jpg)
![Page 42: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/42.jpg)
OTHER CONFIGURATIONS
Activation function
FANN_SIGMOID_SYMMETRIC
FANN_LINEAR
FANN_THRESHOLD
FANN_SIN_SYMMETRIC
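To get a feel for how these four options differ, they can be written as plain PHP functions. This is a sketch based on the formulas given in the FANN documentation, not the extension's own code; the function names are hypothetical and the steepness default of 0.5 is assumed to match FANN's default:

```php
<?php
// Plain-PHP sketches of four FANN activation functions.
// $s is the steepness; 0.5 is assumed to be FANN's default.

// FANN_SIGMOID_SYMMETRIC: tanh, output strictly between -1 and 1
function sigmoidSymmetric(float $x, float $s = 0.5): float
{
    return tanh($s * $x);
}

// FANN_LINEAR: unbounded output
function linear(float $x, float $s = 0.5): float
{
    return $s * $x;
}

// FANN_THRESHOLD: hard step, 0 below zero and 1 otherwise
function threshold(float $x): float
{
    return $x < 0 ? 0.0 : 1.0;
}

// FANN_SIN_SYMMETRIC: periodic, output between -1 and 1
function sinSymmetric(float $x, float $s = 0.5): float
{
    return sin($s * $x);
}

foreach ([-2.0, 0.0, 2.0] as $x) {
    printf(
        "x=%5.1f  sigmoid=%7.4f  linear=%7.4f  threshold=%3.1f  sin=%7.4f\n",
        $x, sigmoidSymmetric($x), linear($x), threshold($x), sinSymmetric($x)
    );
}
```

The bounded functions squash any input into a fixed range, while the linear one does not, which matters when deciding how to read an output value against a cut-off.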
![Page 43: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/43.jpg)
Linear
Threshold
Tangent
Gaussian
Quadratic
Sigmoid
![Page 44: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/44.jpg)
WHICH APPLICATIONS?
Non-deterministic
Rule out everything that can be found systematically
Access to expertise and to vectors of characteristics
Final layer after the results
Classification, prioritization, fast approximation
![Page 45: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/45.jpg)
REINFORCEMENT LEARNING
Software
Real world
Reward
Action
Reaction
![Page 46: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/46.jpg)
BAYESIAN FILTERS
![Page 47: Machine learning in php](https://reader034.fdocuments.in/reader034/viewer/2022051122/588a1bc31a28abb21f8b4679/html5/thumbnails/47.jpg)
GENETIC ALGORITHMS
Population → Selection → Reproduction → Variations → Population
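The loop of population, selection, reproduction and variations can be illustrated with a toy problem. The sketch below evolves bit strings toward all ones; the fitness function, population size, crossover scheme and mutation rate are all illustrative assumptions, not taken from the talk:

```php
<?php
// Toy genetic algorithm: evolve a bit string toward all ones.
function fitness(array $genes): int
{
    return array_sum($genes);
}

function evolve(array $population, int $generations, float $mutationRate = 0.05): array
{
    $size = count($population);
    for ($g = 0; $g < $generations; $g++) {
        // Selection: keep the fitter half
        usort($population, fn($a, $b) => fitness($b) <=> fitness($a));
        $parents = array_slice($population, 0, intdiv($size, 2));
        $next = $parents; // parents survive, so the best never gets worse

        // Reproduction: single-point crossover between two random parents
        while (count($next) < $size) {
            $a = $parents[array_rand($parents)];
            $b = $parents[array_rand($parents)];
            $cut = mt_rand(1, count($a) - 1);
            $child = array_merge(array_slice($a, 0, $cut), array_slice($b, $cut));

            // Variations: flip each bit with a small probability
            foreach ($child as $i => $bit) {
                if (mt_rand() / mt_getrandmax() < $mutationRate) {
                    $child[$i] = 1 - $bit;
                }
            }
            $next[] = $child;
        }
        $population = $next;
    }
    // Return the final population, fittest first
    usort($population, fn($a, $b) => fitness($b) <=> fitness($a));
    return $population;
}

// Initial population: 20 random strings of 16 bits
$population = [];
for ($i = 0; $i < 20; $i++) {
    $population[] = array_map(fn() => mt_rand(0, 1), array_fill(0, 16, 0));
}
$initialBest = max(array_map('fitness', $population));
$final = evolve($population, 30);
print "best fitness: $initialBest -> " . fitness($final[0]) . " (max 16)\n";
```

Because the selected parents are carried over unchanged, the best fitness can only stay the same or improve from one generation to the next.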