h2o - Amazon Web Services, h2o-release.s3.amazonaws.com/h2o/rel-tukey/6/docs-website/h2o-r/h2...

"h2o"
February 22, 2016

R topics documented:

h2o-package
aaa
apply
as.character.H2OFrame
as.data.frame.H2OFrame
as.factor
as.h2o
as.matrix.H2OFrame
as.numeric
as.vector.H2OFrame
australia
colnames
dim.H2OFrame
dimnames.H2OFrame
h2o.aic
h2o.anomaly
h2o.anyFactor
h2o.assign
h2o.auc
h2o.betweenss
h2o.biases
h2o.cbind
h2o.centers
h2o.centersSTD
h2o.centroid_stats
h2o.clearLog
h2o.clusterInfo
h2o.clusterIsUp
h2o.clusterStatus
h2o.cluster_sizes
h2o.coef
h2o.coef_norm
h2o.confusionMatrix
h2o.createFrame
h2o.cut
h2o.day
h2o.dayOfWeek
h2o.dct
h2o.ddply
h2o.deepfeatures
h2o.deeplearning
h2o.describe
h2o.downloadAllLogs
h2o.downloadCSV
h2o.download_pojo
h2o.exportFile
h2o.exportHDFS
h2o.filterNACols
h2o.find_row_by_threshold
h2o.find_threshold_by_max_metric
h2o.gainsLift
h2o.gbm
h2o.getConnection
h2o.getFrame
h2o.getFutureModel
h2o.getGrid
h2o.getId
h2o.getModel
h2o.getTimezone
h2o.getTypes
h2o.getVersion
h2o.giniCoef
h2o.glm
h2o.glrm
h2o.grid
h2o.group_by
h2o.gsub
h2o.head
h2o.hist
h2o.hit_ratio_table
h2o.hour
h2o.ifelse
h2o.importFile
h2o.impute
h2o.init
h2o.insertMissingValues
h2o.interaction
h2o.is_client
h2o.killMinus3
h2o.kmeans
h2o.levels
h2o.listTimezones
h2o.loadModel
h2o.logAndEcho
h2o.logloss
h2o.ls
h2o.makeGLMModel
h2o.match
h2o.mean
h2o.mean_residual_deviance
h2o.median
h2o.merge
h2o.metric
h2o.mktime
h2o.month
h2o.mse
h2o.nacnt
h2o.naiveBayes
h2o.nchar
h2o.networkTest
h2o.nlevels
h2o.no_progress
h2o.null_deviance
h2o.null_dof
h2o.num_iterations
h2o.openLog
h2o.parseRaw
h2o.parseSetup
h2o.performance
h2o.prcomp
h2o.proj_archetypes
h2o.quantile
h2o.r2
h2o.randomForest
h2o.rbind
h2o.reconstruct
h2o.removeAll
h2o.removeVecs
h2o.rep_len
h2o.residual_deviance
h2o.residual_dof
h2o.rm
h2o.round
h2o.runif
h2o.saveModel
h2o.scale
h2o.scoreHistory
h2o.sd
h2o.sdev
h2o.setLevels
h2o.setTimezone
h2o.show_progress
h2o.shutdown
h2o.signif
h2o.splitFrame
h2o.startLogging
h2o.stopLogging
h2o.strsplit
h2o.sub
h2o.substring
h2o.summary
h2o.svd
h2o.table
h2o.tabulate
h2o.tolower
h2o.totss
h2o.tot_withinss
h2o.toupper
h2o.trim
h2o.unique
h2o.var
h2o.varimp
h2o.week
h2o.weights
h2o.which
h2o.withinss
h2o.year
H2OClusteringModel-class
H2OConnection-class
H2OFrame-Extract
H2OGrid-class
H2OModel-class
H2OModelFuture-class
H2OModelMetrics-class
housevotes
iris
is.character
is.factor
is.numeric
ModelAccessors
na.omit.H2OFrame
names.H2OFrame
Ops.H2OFrame
plot.H2OModel
plot.H2OTabulate
predict.H2OModel
print.H2OFrame
print.H2OTable
prostate
range.H2OFrame
str.H2OFrame
summary,H2OGrid-method
summary,H2OModel-method
walking
zzz

Index

h2o-package H2O R Interface

Description

This is a package for running H2O via its REST API from within R. To communicate with an H2O instance, the version of the R package must match the version of H2O. When connecting to a new H2O cluster, it is necessary to re-run the initializer.

Details

Package: h2o
Type: Package
Version: 3.8.0.6
Branch: rel-tukey
Date: Mon Feb 22 18:41:52 PST 2016
License: Apache License (== 2.0)
Depends: R (>= 2.13.0), RCurl, jsonlite, statmod, tools, methods, utils

This package allows the user to run basic H2O commands using R commands. In order to use it, you must first have H2O running. To run H2O on your local machine, call h2o.init() without any arguments, and H2O will be automatically launched at localhost:54321, where the IP is "127.0.0.1" and the port is 54321. If H2O is running on a cluster, you must provide the IP and port of the remote machine as arguments to the h2o.init() call.

H2O supports a number of standard statistical models, such as GLM, K-means, and Random Forest. For example, to run GLM, call h2o.glm with the parsed H2O data and parameters (response variable, error distribution, etc.) as arguments. The operation is carried out on the server where H2O is running and where the data object resides, not within the R environment.

Note that no actual data is stored in the R workspace, and no actual work is carried out by R. R only saves the named objects, which uniquely identify the data set, model, etc. on the server. When the user makes a request, R queries the server via the REST API, which returns a JSON file with the relevant information that R then displays in the console.

If you are using an older version of H2O, use the following porting guide to update your scripts: Porting Scripts


Author(s)

Anqi Fu, Tom Kraljevic and Petr Maj, with contributions from the H2O.ai team

Maintainer: Tom Kraljevic <[email protected]>

References

• H2O.ai Homepage

• H2O Documentation

• H2O on GitHub

aaa Starting H2O For examples

Description

Starts H2O for the examples.

Examples

h2o.init()

apply Apply on H2O Datasets

Description

Method for apply on H2OFrame objects.

Usage

apply(X, MARGIN, FUN, ...)

Arguments

X an H2OFrame object on which apply will operate.

MARGIN the dimension over which the function is applied: either 1 for rows or 2 for columns.

FUN the function to be applied.

... optional arguments to FUN.

Value

Produces a new H2OFrame of the output of the applied function. The output is stored in H2O so that it can be used in subsequent H2O processes.


See Also

apply for the base generic

Examples

h2o.init()
irisPath = system.file("extdata", "iris.csv", package="h2o")
iris.hex = h2o.importFile(path = irisPath, destination_frame = "iris.hex")
summary(apply(iris.hex, 2, sum))
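The MARGIN convention follows the base R apply generic referenced under See Also. The sketch below runs on a small, purely illustrative local matrix (no H2O connection is involved) to show which dimension 1 and 2 select. It is assumed, not verified here, that the H2OFrame method orients its result the same way.

```r
# Illustrative local matrix: 3 rows, 2 columns (not an H2OFrame)
m <- matrix(1:6, nrow = 3)

# MARGIN = 1 applies FUN to each row, giving one value per row
row_sums <- apply(m, 1, sum)

# MARGIN = 2 applies FUN to each column, giving one value per column
col_sums <- apply(m, 2, sum)

print(row_sums)  # 5 7 9
print(col_sums)  # 6 15
```

The example above, apply(iris.hex, 2, sum), therefore sums each column of the frame.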

as.character.H2OFrame Convert an H2OFrame to a String

Description

Convert an H2OFrame to a String

Usage

## S3 method for class 'H2OFrame'
as.character(x, ...)

Arguments

x An H2OFrame object

... Further arguments to be passed from or to other methods.

as.data.frame.H2OFrame Converts Parsed H2O Data into an R Data Frame

Description

Downloads the H2O data and then reads it into an R data frame.

Usage

## S3 method for class 'H2OFrame'
as.data.frame(x, ...)

Arguments

x An H2OFrame object.

... Further arguments to be passed down from other methods.


Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
as.data.frame(prostate.hex)

as.factor Convert H2O Data to Factors

Description

Convert a column into a factor column.

Usage

as.factor(x)

Arguments

x a column from an H2OFrame data set.

See Also

is.factor.

Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.hex[,2] <- as.factor(prostate.hex[,2])
summary(prostate.hex)


as.h2o R data.frame -> H2OFrame

Description

Import a local R data frame to the H2O cloud.

Usage

as.h2o(x, destination_frame = "")

Arguments

x An R data frame.

destination_frame A string with the desired name for the H2OFrame.

as.matrix.H2OFrame Convert an H2OFrame to a matrix

Description

Convert an H2OFrame to a matrix

Usage

## S3 method for class 'H2OFrame'
as.matrix(x, ...)

Arguments

x An H2OFrame object

... Further arguments to be passed down from other methods.


as.numeric Convert H2O Data to Numeric

Description

Converts an H2O column into a numeric value column.

Usage

as.numeric(x)

Arguments

x a column from an H2OFrame data set.

... Further arguments to be passed from or to other methods.

Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.hex[,2] <- as.factor(prostate.hex[,2])
prostate.hex[,2] <- as.numeric(prostate.hex[,2])

as.vector.H2OFrame Convert an H2OFrame to a vector

Description

Convert an H2OFrame to a vector

Usage

## S3 method for class 'H2OFrame'
as.vector(x, mode)

Arguments

x An H2OFrame object

mode Unused


australia Australia Coastal Data

Description

Temperature, soil moisture, runoff, and other environmental measurements from the Australia coast. The data is available from http://cs.colby.edu/courses/S11/cs251/labs/lab07/AustraliaSubset.csv.

Format

A data frame with 251 rows and 8 columns

colnames Returns the column names of an H2OFrame

Description

Returns the column names of an H2OFrame

Usage

colnames(x, do.NULL = TRUE, prefix = "col")

Arguments

x An H2OFrame object.

do.NULL logical. If FALSE and names are NULL, names are created.

prefix A prefix for created names.
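do.NULL and prefix mirror the arguments of base R's colnames(). They only take effect when column names are missing. The sketch below uses a hypothetical unnamed local matrix, where that can actually happen, and involves no H2O call.

```r
# A local matrix with no column names (illustrative, not an H2OFrame)
m <- matrix(1:4, nrow = 2)

# do.NULL = TRUE (default): missing names come back as NULL
default_names <- colnames(m)

# do.NULL = FALSE: names are generated from `prefix` plus the column index
made_names   <- colnames(m, do.NULL = FALSE, prefix = "col")
custom_names <- colnames(m, do.NULL = FALSE, prefix = "V")

print(default_names)  # NULL
print(made_names)     # "col1" "col2"
print(custom_names)   # "V1" "V2"
```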

dim.H2OFrame Returns the Dimensions of an H2OFrame

Description

Returns the number of rows and columns for an H2OFrame object.

Usage

## S3 method for class 'H2OFrame'
dim(x)


Arguments

x An H2OFrame object.

See Also

dim for the base R method.

Examples

h2o.init()
iris.hex <- as.h2o(iris)
dim(iris.hex)

dimnames.H2OFrame Column names of an H2OFrame

Description

Column names of an H2OFrame

Usage

## S3 method for class 'H2OFrame'
dimnames(x)

Arguments

x An H2OFrame

h2o.aic Retrieve the AIC

Description

Retrieve the AIC. If the "train", "valid", and "xval" parameters are all FALSE (the default), the training AIC value is returned. If more than one parameter is set to TRUE, a named vector of AICs is returned, with names "train", "valid", or "xval".

Usage

h2o.aic(object, train = FALSE, valid = FALSE, xval = FALSE)


Arguments

object An H2OModel or H2OModelMetrics.

train Retrieve the training AIC

valid Retrieve the validation AIC

xval Retrieve the cross-validation AIC
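For reference, AIC is conventionally defined as AIC = 2k - 2 log(L), where k is the number of estimated parameters and L is the maximized likelihood. The base R sketch below checks that definition against stats::AIC() for a logistic GLM on the built-in mtcars data. It is not H2O code, and whether h2o.aic() counts parameters the same way for every model family is not stated here.

```r
# Logistic regression on a built-in dataset (base R, no H2O involved)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

ll <- logLik(fit)
k  <- attr(ll, "df")   # number of estimated parameters (intercept and slope)

# Textbook definition: AIC = 2k - 2 * log-likelihood
aic_manual <- 2 * k - 2 * as.numeric(ll)

print(c(manual = aic_manual, stats = AIC(fit)))
```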

h2o.anomaly Anomaly Detection via H2O Deep Learning Model

Description

Detect anomalies in an H2O dataset using an H2O deep learning model with auto-encoding.

Usage

h2o.anomaly(object, data, per_feature = FALSE)

Arguments

object An H2OAutoEncoderModel object that represents the model to be used for anomaly detection.

data An H2OFrame object.

per_feature Whether to return the per-feature squared reconstruction error

Value

Returns an H2OFrame object containing the reconstruction MSE or the per-feature squared error.

See Also

h2o.deeplearning for making an H2OAutoEncoderModel.

Examples

library(h2o)
h2o.init()
prosPath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prosPath)
prostate.dl = h2o.deeplearning(x = 3:9, training_frame = prostate.hex, autoencoder = TRUE,
                               hidden = c(10, 10), epochs = 5)
prostate.anon = h2o.anomaly(prostate.dl, prostate.hex)
head(prostate.anon)
prostate.anon.per.feature = h2o.anomaly(prostate.dl, prostate.hex, per_feature=TRUE)
head(prostate.anon.per.feature)
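The Value section describes two outputs: a per-row reconstruction MSE and, with per_feature = TRUE, per-feature squared errors. The base R sketch below shows how the two relate. It swaps in a rank-2 principal-component reconstruction for the autoencoder purely for illustration. This is an assumption: H2O trains a deep learning autoencoder, not PCA. The per-row MSE is the row mean of the per-feature squared errors, and the rows with the largest values are the anomaly candidates. The sketch does not model any input normalization H2O may apply before measuring the error.

```r
# Numeric iris columns stand in for the training frame (base R only)
x <- as.matrix(iris[, 1:4])

# Stand-in "autoencoder": rank-2 PCA reconstruction (illustrative only)
pca <- prcomp(x, center = TRUE, scale. = FALSE)
k <- 2
x_hat <- pca$x[, 1:k] %*% t(pca$rotation[, 1:k])
x_hat <- sweep(x_hat, 2, pca$center, "+")   # undo the centering

# Analogue of per_feature = TRUE: squared error for every cell
per_feature_err <- (x - x_hat)^2

# Analogue of the default output: mean squared error per row
row_mse <- rowMeans(per_feature_err)

# Indices of the rows that reconstruct worst (anomaly candidates)
head(order(row_mse, decreasing = TRUE))
```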


h2o.anyFactor Check H2OFrame columns for factors

Description

Determines if any column of an H2OFrame object contains categorical data.

Usage

h2o.anyFactor(x)

Arguments

x An H2OFrame object.

Value

Returns a logical value indicating whether any of the columns in x are factors.

Examples

library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package="h2o")
iris.hex <- h2o.importFile(path = irisPath)
h2o.anyFactor(iris.hex)

h2o.assign Rename an H2O object.

Description

Makes a copy of the data frame and gives it the desired key.

Usage

h2o.assign(data, key)

Arguments

data An H2OFrame object

key The hex key to be associated with the H2O parsed data object


h2o.auc Retrieve the AUC

Description

Retrieves the AUC value from an H2OBinomialMetrics object. If the "train", "valid", and "xval" parameters are all FALSE (the default), the training AUC value is returned. If more than one parameter is set to TRUE, a named vector of AUCs is returned, with names "train", "valid", or "xval".

Usage

h2o.auc(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OBinomialMetrics object.

train Retrieve the training AUC

valid Retrieve the validation AUC

xval Retrieve the cross-validation AUC

See Also

h2o.giniCoef for the Gini coefficient, h2o.mse for MSE, and h2o.metric for the various threshold metrics. See h2o.performance for creating H2OModelMetrics objects.

Examples

library(h2o)
h2o.init()

prosPath <- system.file("extdata", "prostate.csv", package="h2o")
hex <- h2o.uploadFile(prosPath)

hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
perf <- h2o.performance(model, hex)
h2o.auc(perf)
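AUC is the probability that a randomly chosen positive example scores above a randomly chosen negative one. It can be computed from ranks as the Mann-Whitney U statistic. The base R sketch below uses made-up labels and scores to show that computation. It uses the common convention Gini = 2 * AUC - 1, which is assumed, not confirmed here, to match h2o.giniCoef().

```r
# Hypothetical labels (1 = positive) and model scores
labels <- c(0, 0, 1, 1)
scores <- c(0.10, 0.40, 0.35, 0.80)

# Rank-based AUC: U / (n_pos * n_neg); rank() gives ties their average rank
auc_rank <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc  <- auc_rank(scores, labels)
gini <- 2 * auc - 1

print(c(auc = auc, gini = gini))  # auc 0.75, gini 0.5
```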


h2o.betweenss Get the between-cluster sum of squares

Description

Get the between-cluster sum of squares. If the "train", "valid", and "xval" parameters are all FALSE (the default), the training betweenss value is returned. If more than one parameter is set to TRUE, a named vector of betweenss values is returned, with names "train", "valid", or "xval".

Usage

h2o.betweenss(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OClusteringModel object.

train Retrieve the training between cluster sum of squares

valid Retrieve the validation between cluster sum of squares

xval Retrieve the cross-validation between cluster sum of squares
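The between-cluster sum of squares is the size-weighted squared distance from each cluster mean to the overall mean. Together with the within-cluster sum it decomposes the total: totss = tot_withinss + betweenss. The base R sketch below checks this decomposition on the iris measurements using stats::kmeans(). It illustrates the definition only; it is not H2O's implementation, and H2O is assumed, not confirmed here, to use the same definitions.

```r
# k-means on the numeric iris columns (base R, not H2O)
set.seed(42)
x  <- as.matrix(iris[, 1:4])
km <- kmeans(x, centers = 3, nstart = 5)

grand_mean <- colMeans(x)
sizes <- as.vector(table(km$cluster))
# Cluster means recomputed from the final assignments (rows ordered 1, 2, 3)
cl_means <- rowsum(x, km$cluster) / sizes

# Total SS: squared distance of every point to the grand mean
totss <- sum(sweep(x, 2, grand_mean)^2)
# Within SS: squared distance of every point to its own cluster mean
tot_withinss <- sum((x - cl_means[km$cluster, ])^2)
# Between SS: size-weighted squared distance of cluster means to the grand mean
betweenss <- sum(sizes * rowSums(sweep(cl_means, 2, grand_mean)^2))

print(c(total = totss, within = tot_withinss, between = betweenss))
```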

h2o.biases Return the respective bias vector

Description

Return the respective bias vector

Usage

h2o.biases(object, vector_id = 1)

Arguments

object An H2OModel or H2OModelMetrics

vector_id An integer, ranging from 1 to the number of layers + 1, that specifies the bias vector to return.


h2o.cbind Combine H2O Datasets by Columns

Description

Takes a sequence of H2O data sets and combines them by column.

Usage

h2o.cbind(...)

Arguments

... A sequence of H2OFrame arguments. All datasets must exist on the same H2O instance (IP and port) and contain the same number of rows.

Value

An H2OFrame object containing the ... arguments combined column-wise.

See Also

cbind for the base R method.

Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.cbind <- h2o.cbind(prostate.hex, prostate.hex)
head(prostate.cbind)

h2o.centers Retrieve the Model Centers

Description

Retrieve the Model Centers

Usage

h2o.centers(object)

Arguments

object An H2OClusteringModel object.


h2o.centersSTD Retrieve the Model Centers STD

Description

Retrieve the Model Centers STD

Usage

h2o.centersSTD(object)

Arguments

object An H2OClusteringModel object.

h2o.centroid_stats Retrieve the centroid statistics

Description

Retrieve the centroid statistics. If the "train", "valid", and "xval" parameters are all FALSE (default), the training centroid stats value is returned. If more than one parameter is set to TRUE, a named list of centroid stats data frames is returned, with names "train", "valid", or "xval".

Usage

h2o.centroid_stats(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OClusteringModel object.

train Retrieve the training centroid statistics

valid Retrieve the validation centroid statistics

xval Retrieve the cross-validation centroid statistics
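As we understand it, the centroid statistics are a per-centroid table of cluster size and within-cluster sum of squares (the sizes are also what `h2o.cluster_sizes` returns). A base-R analogue built from `stats::kmeans()` output is sketched below; the column names are our own choice and not necessarily H2O's.

```r
# Base-R analogue of a per-centroid statistics table
set.seed(7)
x  <- as.matrix(iris[, 1:4])
km <- kmeans(x, centers = 3, nstart = 10)

# Recompute each cluster's within-cluster sum of squares from its members
wss <- sapply(seq_len(nrow(km$centers)), function(k) {
  members <- x[km$cluster == k, , drop = FALSE]
  sum(sweep(members, 2, km$centers[k, ])^2)
})

centroid_stats <- data.frame(
  centroid = seq_len(nrow(km$centers)),
  size     = km$size,
  within_cluster_sum_of_squares = wss
)
print(centroid_stats)
```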


h2o.clearLog Delete All H2O R Logs

Description

Clear all H2O R command and error response logs from the local disk. Used primarily for debugging purposes.

Usage

h2o.clearLog()

See Also

h2o.startLogging, h2o.stopLogging, h2o.openLog

Examples

library(h2o)
h2o.init()
h2o.startLogging()
ausPath = system.file("extdata", "australia.csv", package="h2o")
australia.hex = h2o.importFile(path = ausPath)
h2o.stopLogging()
h2o.clearLog()

h2o.clusterInfo Print H2O cluster info

Description

Print H2O cluster info

Usage

h2o.clusterInfo()


h2o.clusterIsUp Determine if an H2O cluster is up or not

Description

Determine if an H2O cluster is up or not

Usage

h2o.clusterIsUp(conn = h2o.getConnection())

Arguments

conn H2OConnection object

Value

TRUE if the cluster is up; FALSE otherwise

h2o.clusterStatus Return the status of the cluster

Description

Retrieve information on the status of the cluster running H2O.

Usage

h2o.clusterStatus()

See Also

H2OConnection, h2o.init

Examples

h2o.init()
h2o.clusterStatus()


h2o.cluster_sizes Retrieve the cluster sizes

Description

Retrieve the cluster sizes. If the "train", "valid", and "xval" parameters are all FALSE (default), the training cluster sizes value is returned. If more than one parameter is set to TRUE, a named list of cluster size vectors is returned, with names "train", "valid", or "xval".

Usage

h2o.cluster_sizes(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OClusteringModel object.

train Retrieve the training cluster sizes

valid Retrieve the validation cluster sizes

xval Retrieve the cross-validation cluster sizes

h2o.coef Retrieve the model coefficients

Description

Retrieve the model coefficients

Usage

h2o.coef(object)

Arguments

object an H2OModel object.


h2o.coef_norm Retrieve the normalized coefficients

Description

Retrieve the normalized coefficients

Usage

h2o.coef_norm(object)

Arguments

object an H2OModel object.

h2o.confusionMatrix Access H2O Confusion Matrices

Description

Retrieve either a single or many confusion matrices from H2O objects.

Usage

h2o.confusionMatrix(object, ...)

## S4 method for signature H2OModel
h2o.confusionMatrix(object, newdata, valid = FALSE, ...)

## S4 method for signature H2OModelMetrics
h2o.confusionMatrix(object, thresholds = NULL,
  metrics = NULL)

Arguments

object Either an H2OModel object or an H2OModelMetrics object.

... Extra arguments for extracting train or valid confusion matrices.

newdata An H2OFrame object that can be scored on. Requires a valid response column.

valid Retrieve the validation metric.

thresholds (Optional) A value or a list of valid values between 0.0 and 1.0. This value is only used in the case of H2OBinomialMetrics objects.

metrics (Optional) A metric or a list of valid metrics ("min_per_class_accuracy", "absolute_MCC", "tnr", "fnr", "fpr", "tpr", "precision", "accuracy", "f0point5", "f2", "f1"). This value is only used in the case of H2OBinomialMetrics objects.


Details

The H2OModelMetrics version of this function will only take H2OBinomialMetrics or H2OMultinomialMetrics objects. If no threshold is specified, all possible thresholds are selected.

Value

Calling this function on H2OModel objects returns a confusion matrix corresponding to the predict function. If used on an H2OBinomialMetrics object, returns a list of matrices corresponding to the number of thresholds specified.

See Also

predict for generating prediction frames, h2o.performance for creating H2OModelMetrics.

Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
hex <- h2o.uploadFile(prosPath)
hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
h2o.confusionMatrix(model, hex)
# Generating a ModelMetrics object
perf <- h2o.performance(model, hex)
h2o.confusionMatrix(perf)
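For binomial models, each threshold turns predicted probabilities into class labels, and the confusion matrix counts the resulting hits and misses; several thresholds yield one matrix each. The base-R sketch below illustrates that step on made-up labels and probabilities. We assume scores at or above the threshold are labeled positive, and the row/column layout is our own, not necessarily H2O's.

```r
# Base-R sketch: confusion matrix of a binary classifier at a given threshold
confusion_at <- function(actual, prob, threshold) {
  predicted <- as.integer(prob >= threshold)   # assumption: ">=" assigns the positive class
  table(actual    = factor(actual,    levels = c(0, 1)),
        predicted = factor(predicted, levels = c(0, 1)))
}

actual <- c(0, 0, 1, 1, 1, 0, 1, 0)                          # made-up labels
prob   <- c(0.10, 0.40, 0.35, 0.80, 0.90, 0.60, 0.70, 0.20)  # made-up probabilities

cm <- confusion_at(actual, prob, threshold = 0.5)
print(cm)
error_rate <- 1 - sum(diag(cm)) / sum(cm)

# One matrix per requested threshold, mirroring the "list of matrices" return value
thresholds <- c(0.3, 0.5, 0.7)
cms <- lapply(thresholds, function(t) confusion_at(actual, prob, t))
```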

h2o.createFrame Data H2OFrame Creation in H2O

Description

Creates a data frame in H2O with real-valued, categorical, integer, and binary columns specified by the user.

Usage

h2o.createFrame(rows = 10000, cols = 10, randomize = TRUE, value = 0,
  real_range = 100, categorical_fraction = 0.2, factors = 100,
  integer_fraction = 0.2, integer_range = 100, binary_fraction = 0.1,
  binary_ones_fraction = 0.02, missing_fraction = 0.01,
  response_factors = 2, has_response = FALSE, seed)

Arguments

rows The number of rows of data to generate.

cols The number of columns of data to generate. Excludes the response column if has_response = TRUE.

randomize A logical value indicating whether data values should be randomly generated. This must be TRUE if either categorical_fraction or integer_fraction is non-zero.

value If randomize = FALSE, then all real-valued entries will be set to this value.

real_range The range of randomly generated real values.

categorical_fraction

The fraction of total columns that are categorical.

factors The number of (unique) factor levels in each categorical column.

integer_fraction

The fraction of total columns that are integer-valued.

integer_range The range of randomly generated integer values.

binary_fraction

The fraction of total columns that are binary-valued.

binary_ones_fraction

The fraction of values in a binary column that are set to 1.

missing_fraction

The fraction of total entries in the data frame that are set to NA.

response_factors

If has_response = TRUE, then this is the number of factor levels in the response column.

has_response A logical value indicating whether an additional response column should be prepended to the final H2O data frame. If set to TRUE, the total number of columns will be cols + 1.

seed A seed used to generate random values when randomize = TRUE.

Value

Returns an H2OFrame object.

Examples

library(h2o)
h2o.init()
hex <- h2o.createFrame(rows = 1000, cols = 100, categorical_fraction = 0.1,
  factors = 5, integer_fraction = 0.5, integer_range = 1,
  has_response = TRUE)
head(hex)
summary(hex)

hex2 <- h2o.createFrame(rows = 100, cols = 10, randomize = FALSE, value = 5,
  categorical_fraction = 0, integer_fraction = 0)
summary(hex2)

h2o.cut Cut H2O Numeric Data to Factor

Description

Divides the range of the H2O data into intervals and codes the values according to which interval they fall in. The leftmost interval corresponds to level one, the next is level two, etc.

Usage

h2o.cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE,
  dig.lab = 3, ...)

## S3 method for class H2OFrame
cut(x, breaks, labels = NULL, include.lowest = FALSE,
  right = TRUE, dig.lab = 3, ...)

Arguments

x An H2OFrame object with a single numeric column.

breaks A numeric vector of two or more unique cut points.

labels Labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation.

include.lowest Logical, indicating if an 'x[i]' equal to the lowest (or highest, for right = FALSE) 'breaks' value should be included.

right Logical, indicating if the intervals should be closed on the right (opened on the left) or vice versa.

dig.lab Integer which is used when labels are not given; determines the number of digits used in formatting the break numbers.

... Further arguments passed to or from other methods.

Value

Returns an H2OFrame object containing the factored data with intervals as levels.

Examples

library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package="h2o")
iris.hex <- h2o.uploadFile(path = irisPath, destination_frame = "iris.hex")
summary(iris.hex)

# Cut sepal length column into intervals determined by min/max/quantiles
sepal_len.cut = cut(iris.hex$sepal_len, c(4.2, 4.8, 5.8, 6, 8))
head(sepal_len.cut)
summary(sepal_len.cut)
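Since `h2o.cut` mirrors base R's `cut()`, the interval coding can be previewed locally on the built-in `iris` data with the same break points as the example above. This is a base-R sketch, not a call into H2O; note that the built-in column is `Sepal.Length` rather than `sepal_len`.

```r
# Base-R preview of the same interval coding on the built-in iris data
breaks <- c(4.2, 4.8, 5.8, 6, 8)
sepal_len_cut <- cut(iris$Sepal.Length, breaks = breaks)   # right-closed intervals (a,b]

levels(sepal_len_cut)   # "(4.2,4.8]" "(4.8,5.8]" "(5.8,6]" "(6,8]"
table(sepal_len_cut)

# right = FALSE gives left-closed intervals [a,b) instead
left_closed <- cut(iris$Sepal.Length, breaks = breaks, right = FALSE)
levels(left_closed)     # "[4.2,4.8)" "[4.8,5.8)" "[5.8,6)" "[6,8)"
```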

h2o.day Convert Milliseconds to Day of Month in H2O Datasets

Description

Converts the entries of an H2OFrame object from milliseconds to days of the month (on a 1 to 31 scale).

Usage

h2o.day(x)

day(x)

## S3 method for class H2OFrame
day(x)

Arguments

x An H2OFrame object.

Value

An H2OFrame object containing the entries of x converted to days of the month.

See Also

h2o.month

h2o.dayOfWeek Convert Milliseconds to Day of Week in H2O Datasets

Description

Converts the entries of an H2OFrame object from milliseconds to days of the week (on a 0 to 6 scale).

Usage

h2o.dayOfWeek(x)

dayOfWeek(x)

## S3 method for class H2OFrame
dayOfWeek(x)

Arguments

x An H2OFrame object.

Value

An H2OFrame object containing the entries of x converted to days of the week.

See Also

h2o.day, h2o.month
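Both `h2o.day` and `h2o.dayOfWeek` interpret each entry as milliseconds since the Unix epoch. The base-R sketch below performs the equivalent conversion for one timestamp. It assumes UTC, whereas H2O may apply its own time-zone setting, and base R's `wday` numbers Sunday as 0, which may not match H2O's 0 to 6 coding.

```r
# Base-R sketch: epoch milliseconds -> day of month and day of week (UTC assumed)
ms <- 1456099200000                                   # 2016-02-22 00:00:00 UTC, in milliseconds
lt <- as.POSIXlt(ms / 1000, origin = "1970-01-01", tz = "UTC")

day_of_month <- lt$mday    # 1..31
day_of_week  <- lt$wday    # 0..6, with 0 = Sunday in base R
format(lt, "%Y-%m-%d")
```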

h2o.dct Compute DCT of an H2OFrame

Description

Compute the Discrete Cosine Transform of every row in the H2OFrame

Usage

h2o.dct(data, destination_frame, dimensions, inverse = FALSE)

Arguments

data An H2OFrame object representing the dataset to transform

destination_frame

A frame ID for the result

dimensions An array containing the 3 integer values for height, width, depth of each sample. The product H × W × D must not exceed the number of columns. For 1D, use c(L,1,1); for 2D, use c(N,M,1).

inverse Whether to perform the inverse transform.

Examples

library(h2o)
h2o.init()
df <- h2o.createFrame(rows = 1000, cols = 8*16*24,
  categorical_fraction = 0, integer_fraction = 0, missing_fraction = 0)
# Forward transform, then inverse; df2 should reproduce df up to round-off
df1 <- h2o.dct(data=df, dimensions=c(8*16*24,1,1))
df2 <- h2o.dct(data=df1, dimensions=c(8*16*24,1,1), inverse=TRUE)
max(abs(df-df2))

df1 <- h2o.dct(data=df, dimensions=c(8*16,24,1))
df2 <- h2o.dct(data=df1, dimensions=c(8*16,24,1), inverse=TRUE)
max(abs(df-df2))

df1 <- h2o.dct(data=df, dimensions=c(8,16,24))
df2 <- h2o.dct(data=df1, dimensions=c(8,16,24), inverse=TRUE)
max(abs(df-df2))
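The example applies the forward transform and then the inverse. For the 1D case (dimensions = c(L,1,1)), an orthonormal DCT-II can be written as a matrix product whose inverse (a DCT-III) is simply the matrix transpose. The base-R sketch below illustrates that round trip on one row; that H2O uses exactly this orthonormal scaling is an assumption on our part.

```r
# Base-R sketch of an orthonormal 1D DCT-II and its inverse (DCT-III)
dct_matrix <- function(n) {
  k <- 0:(n - 1)                                  # output frequency index
  j <- 0:(n - 1)                                  # input sample index
  C <- outer(k, j, function(k, j) cos(pi * (j + 0.5) * k / n))
  scale <- ifelse(k == 0, sqrt(1 / n), sqrt(2 / n))
  C * scale                                       # scales row k by its normalization factor
}

set.seed(3)
x <- rnorm(16)                                    # one "row" of length L = 16
C <- dct_matrix(length(x))

X     <- as.vector(C %*% x)                       # forward transform
x_rec <- as.vector(t(C) %*% X)                    # inverse: transpose of an orthonormal matrix
max(abs(x - x_rec))                               # close to 0 (floating-point round-off only)
```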

h2o.ddply Split H2O Dataset, Apply Function, and Return Results

Description

For each subset of an H2O data set, apply a user-specified function, then combine the results. This is an experimental feature.

Usage

h2o.ddply(X, .variables, FUN, ..., .progress = "none")

Arguments

X An H2OFrame object to be processed.

.variables Variables to split X by, either the indices or names of a set of columns.

FUN Function to apply to each subset grouping.

... Additional arguments passed on to FUN.

.progress Name of the progress bar to use (currently unimplemented).

Value

Returns an H2OFrame object containing the results from the split/apply operation, arranged

See Also

ddply for the plyr library implementation.


Examples

library(h2o)
h2o.init()

# Import iris dataset to H2O
irisPath <- system.file("extdata", "iris_wheader.csv", package = "h2o")
iris.hex <- h2o.uploadFile(path = irisPath, destination_frame = "iris.hex")
# Add function taking mean of sepal_len column
fun = function(df) { sum(df[,1], na.rm = TRUE)/nrow(df) }
# Apply function to groups by class of flower
# uses h2o's ddply, since iris.hex is an H2OFrame object
res = h2o.ddply(iris.hex, "class", fun)
head(res)
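The example computes the mean of the first column within each flower class. On small data, the same split/apply/combine can be checked locally with base R on the built-in `iris` data. This is a base-R analogue for sanity checking, not how `h2o.ddply` runs on the cluster.

```r
# Base-R analogue of the h2o.ddply example on the built-in iris data
fun <- function(df) sum(df[, 1], na.rm = TRUE) / nrow(df)   # same function as above

parts <- split(iris, iris$Species)           # split: one data.frame per class
res   <- vapply(parts, fun, numeric(1))      # apply: one number per class
res                                          # combine: named numeric vector
```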

h2o.deepfeatures Feature Generation via H2O Deep Learning Model

Description

Extract the non-linear features from an H2O data set using an H2O deep learning model.

Usage

h2o.deepfeatures(object, data, layer = 1)

Arguments

object An H2OModel object that represents the deep learning model to be used for feature extraction.

data An H2OFrame object.

layer Index of the hidden layer to extract.

Value

Returns an H2OFrame object with as many features as the number of units in the hidden layer of the specified index.

See Also

h2o.deeplearning for making deep learning models.

Examples

library(h2o)
h2o.init()
prosPath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prosPath)
prostate.dl = h2o.deeplearning(x = 3:9, y = 2, training_frame = prostate.hex,
  hidden = c(100, 200), epochs = 5)
prostate.deepfeatures_layer1 = h2o.deepfeatures(prostate.dl, prostate.hex, layer = 1)
prostate.deepfeatures_layer2 = h2o.deepfeatures(prostate.dl, prostate.hex, layer = 2)
head(prostate.deepfeatures_layer1)
head(prostate.deepfeatures_layer2)
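Conceptually, the features returned for a given `layer` are the activations of that hidden layer's units when each row is fed forward through the trained network, which is why the result has one column per unit. The base-R sketch below computes such activations for a tiny, hypothetical two-hidden-layer Rectifier network with made-up weights. It does not read weights from H2O and ignores H2O's input standardization and categorical expansion.

```r
# Hypothetical 2-hidden-layer Rectifier network with made-up weights
relu <- function(z) pmax(z, 0)

set.seed(11)
n_in   <- 4
hidden <- c(3, 2)
W1 <- matrix(rnorm(hidden[1] * n_in), hidden[1], n_in)
b1 <- rnorm(hidden[1])
W2 <- matrix(rnorm(hidden[2] * hidden[1]), hidden[2], hidden[1])
b2 <- rnorm(hidden[2])

X <- as.matrix(iris[1:5, 1:4])                        # 5 rows, 4 inputs

# Row-wise forward pass: each result has one row per input row, one column per unit
layer1 <- relu(sweep(X %*% t(W1), 2, b1, `+`))        # 5 x 3 "deep features" for layer 1
layer2 <- relu(sweep(layer1 %*% t(W2), 2, b2, `+`))   # 5 x 2 "deep features" for layer 2
dim(layer1)
dim(layer2)
```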

h2o.deeplearning Build a Deep Neural Network

Description

Builds a feed-forward multilayer artificial neural network on an H2OFrame

Usage

h2o.deeplearning(x, y, training_frame, model_id = "",
  overwrite_with_best_model, validation_frame = NULL, checkpoint,
  autoencoder = FALSE, use_all_factor_levels = TRUE, standardize = TRUE,
  activation = c("Rectifier", "Tanh", "TanhWithDropout",
  "RectifierWithDropout", "Maxout", "MaxoutWithDropout"),
  hidden = c(200, 200), epochs = 10, train_samples_per_iteration = -2,
  target_ratio_comm_to_comp = 0.05, seed, adaptive_rate = TRUE,
  rho = 0.99, epsilon = 1e-08, rate = 0.005, rate_annealing = 1e-06,
  rate_decay = 1, momentum_start = 0, momentum_ramp = 1e+06,
  momentum_stable = 0, nesterov_accelerated_gradient = TRUE,
  input_dropout_ratio = 0, hidden_dropout_ratios, l1 = 0, l2 = 0,
  max_w2 = Inf, initial_weight_distribution = c("UniformAdaptive",
  "Uniform", "Normal"), initial_weight_scale = 1, loss = c("Automatic",
  "CrossEntropy", "Quadratic", "Absolute", "Huber"), distribution = c("AUTO",
  "gaussian", "bernoulli", "multinomial", "poisson", "gamma", "tweedie",
  "laplace", "huber", "quantile"), quantile_alpha = 0.5,
  tweedie_power = 1.5, score_interval = 5, score_training_samples,
  score_validation_samples, score_duty_cycle, classification_stop,
  regression_stop, stopping_rounds = 5, stopping_metric = c("AUTO",
  "deviance", "logloss", "MSE", "AUC", "r2", "misclassification"),
  stopping_tolerance = 0, max_runtime_secs = 0, quiet_mode,
  max_confusion_matrix_size, max_hit_ratio_k, balance_classes = FALSE,
  class_sampling_factors, max_after_balance_size, score_validation_sampling,
  missing_values_handling = c("MeanImputation", "Skip"), diagnostics,
  variable_importances, fast_mode, ignore_const_cols, force_load_balance,
  replicate_training_data, single_node_mode, shuffle_training_data, sparse,
  col_major, average_activation, sparsity_beta, max_categorical_features,
  reproducible = FALSE, export_weights_and_biases = FALSE,
  offset_column = NULL, weights_column = NULL, nfolds = 0,
  fold_column = NULL, fold_assignment = c("AUTO", "Random", "Modulo"),
  keep_cross_validation_predictions = FALSE)

Arguments

x A vector containing the character names of the predictors in the model.

y The name of the response variable in the model.

training_frame An H2OFrame object containing the variables in the model.

model_id (Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

overwrite_with_best_model

Logical. If TRUE, overwrite the final model with the best model found during training. Defaults to TRUE.

validation_frame

An H2OFrame object indicating the validation dataset used to construct the confusion matrix. Defaults to NULL. If left as NULL, this defaults to the training data when nfolds = 0.

checkpoint Model checkpoint (either key or H2ODeepLearningModel) to resume training with.

autoencoder Enable auto-encoder for model building.

use_all_factor_levels

Logical. Use all factor levels of categorical variables. Otherwise the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder.

standardize Logical. If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.

activation A string indicating the activation function to use. Must be one of "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", or "MaxoutWithDropout".

hidden Hidden layer sizes (e.g. c(100,100)).

epochs How many times the dataset should be iterated (streamed); can be fractional.

train_samples_per_iteration

Number of training samples (globally) per MapReduce iteration. Special values are: 0, one epoch; -1, all available data (e.g., replicated training data); or -2, auto-tuning (default).

target_ratio_comm_to_comp

Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning). Higher values can lead to faster convergence.

seed Seed for random numbers (affects sampling). Note: only reproducible when running single threaded.

adaptive_rate Logical. Adaptive learning rate (ADADELTA).

rho Adaptive learning rate time decay factor (similarity to prior updates).

epsilon Adaptive learning rate parameter, similar to learn rate annealing during initial training phase. Typical values are between 1.0e-10 and 1.0e-4.

rate Learning rate (higher => less stable, lower => slower convergence).

rate_annealing Learning rate annealing: $\text{rate} / (1 + \text{rate\_annealing} \cdot \text{samples})$.

rate_decay Learning rate decay factor between layers (N-th layer: $\text{rate} \cdot \alpha^{N-1}$, where $\alpha$ is rate_decay).

momentum_start Initial momentum at the beginning of training (try 0.5).

momentum_ramp Number of training samples for which momentum increases.

momentum_stable

Final momentum after the ramp is over (try 0.99).

nesterov_accelerated_gradient

Logical. Use Nesterov accelerated gradient (recommended).

input_dropout_ratio

A fraction of the features for each training row to be omitted from training in order to improve generalization (dimension sampling).

hidden_dropout_ratios

Hidden layer dropout ratios (can improve generalization); specify one value per hidden layer. Defaults to 0.5.

l1 L1 regularization (can add stability and improve generalization, causes many weights to become 0).

l2 L2 regularization (can add stability and improve generalization, causes many weights to be small).

max_w2 Constraint for squared sum of incoming weights per unit (e.g. Rectifier).

initial_weight_distribution

Can be "Uniform", "UniformAdaptive", or "Normal".

initial_weight_scale

Scale of the initial weight distribution: for "Uniform", weights are drawn from [-value, value]; for "Normal", value is the standard deviation.

loss Loss function: "Automatic", "CrossEntropy" (for classification only), "Quadratic", "Absolute" (experimental), or "Huber" (experimental).

distribution A character string. The distribution function of the response. Must be "AUTO", "bernoulli", "multinomial", "poisson", "gamma", "tweedie", "laplace", "huber", "quantile" or "gaussian".

quantile_alpha Quantile (only for Quantile regression, must be between 0 and 1).

tweedie_power Tweedie power (only for Tweedie distribution, must be between 1 and 2).

score_interval Shortest time interval (in secs) between model scoring.

score_training_samples

Number of training set samples for scoring (0 for all).

score_validation_samples

Number of validation set samples for scoring (0 for all).


score_duty_cycle

Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).

classification_stop

Stopping criterion for classification error fraction on training data (-1 to disable).

regression_stop

Stopping criterion for regression error (MSE) on training data (-1 to disable).

stopping_rounds

Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve (by stopping_tolerance) for k = stopping_rounds scoring events. Can only trigger after at least 2k scoring events. Use 0 to disable.

stopping_metric

Metric to use for convergence checking, only for stopping_rounds > 0. Can be one of "AUTO", "deviance", "logloss", "MSE", "AUC", "r2", "misclassification".

stopping_tolerance

Relative tolerance for metric-based stopping criterion (if relative improvement is not at least this much, stop).

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

quiet_mode Enable quiet mode for less output to standard output.

max_confusion_matrix_size

Maximum size (number of classes) for confusion matrices to be shown.

max_hit_ratio_k

Maximum number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable).

balance_classes

Balance training data class counts via over/under-sampling (for imbalanced data).

class_sampling_factors

Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.

max_after_balance_size

Maximum relative size of the training data after balancing class counts (can be less than 1.0).

score_validation_sampling

Method used to sample validation dataset for scoring.

missing_values_handling

Handling of missing values. Either MeanImputation (default) or Skip.

diagnostics Enable diagnostics for hidden layers.

variable_importances

Compute variable importances for input features (Gedeon method); can be slow for large networks.

fast_mode Enable fast mode (minor approximations in back-propagation).


ignore_const_cols

Ignore constant columns (no information can be gained anyway).

force_load_balance

Force extra load balancing to increase training speed for small datasets (to keep all cores busy).

replicate_training_data

Replicate the entire training dataset onto every node for faster training.

single_node_mode

Run on a single node for fine-tuning of model parameters.

shuffle_training_data

Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to numRows × numNodes).

sparse Sparse data handling (more efficient for data with lots of 0 values).

col_major Use a column major weight matrix for input layer. Can speed up forward propagation, but might slow down backpropagation (Experimental).

average_activation

Average activation for sparse auto-encoder (Experimental).

sparsity_beta Sparsity regularization (Experimental).

max_categorical_features

Maximum number of categorical features, enforced via hashing (Experimental).

reproducible Force reproducibility on small data (requires setting the seed argument; this will be slow, since only 1 thread is used).

export_weights_and_biases

Whether to export Neural Network weights and biases to H2O Frames.

offset_column Specify the offset column.

weights_column Specify the weights column.

nfolds (Optional) Number of folds for cross-validation. If nfolds >= 2, then validation must remain empty.

fold_column (Optional) Column with cross-validation fold index assignment per observation.

fold_assignment

Cross-validation fold assignment scheme, if fold_column is not specified. Must be "AUTO", "Random" or "Modulo".

keep_cross_validation_predictions

Whether to keep the predictions of the cross-validation models.

... extra parameters to pass onto functions (not implemented)
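When adaptive_rate = FALSE, the rate, rate_annealing, rate_decay, and momentum arguments together define a training schedule. The base-R sketch below evaluates the annealing and per-layer decay formulas from the argument descriptions; combining the two factors by multiplication and ramping momentum linearly from momentum_start to momentum_stable are our assumptions, not something this page states. The momentum values 0.5 and 0.99 are the "try" values suggested above, not the defaults.

```r
# Learning rate for layer N after `samples` training samples (annealing x per-layer decay, assumed)
effective_rate <- function(samples, layer, rate = 0.005,
                           rate_annealing = 1e-06, rate_decay = 1) {
  (rate / (1 + rate_annealing * samples)) * rate_decay^(layer - 1)
}

# Assumed linear ramp from momentum_start to momentum_stable over momentum_ramp samples
momentum <- function(samples, momentum_start = 0.5, momentum_ramp = 1e6,
                     momentum_stable = 0.99) {
  frac <- min(samples / momentum_ramp, 1)
  momentum_start + frac * (momentum_stable - momentum_start)
}

effective_rate(samples = 0,   layer = 1)                     # 0.005
effective_rate(samples = 1e6, layer = 1)                     # 0.0025
effective_rate(samples = 1e6, layer = 3, rate_decay = 0.5)   # 0.000625
momentum(5e5)                                                # 0.745
```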

See Also

predict.H2OModel for prediction.


Examples

library(h2o)
h2o.init()
iris.hex <- as.h2o(iris)
iris.dl <- h2o.deeplearning(x = 1:4, y = 5, training_frame = iris.hex)

# now make a prediction
predictions <- h2o.predict(iris.dl, iris.hex)

h2o.describe H2O Description of A Dataset

Description

Reports the "Flow" style summary rollups on an instance of H2OFrame. Includes information about column types, mins/maxs/missing/zero counts/stds/number of levels.

Usage

h2o.describe(frame)

Arguments

frame An H2OFrame object.

Value

A table with the Frame stats.

Examples

library(h2o)
h2o.init()
prosPath = system.file("extdata", "prostate.csv", package="h2o")
prostate.hex = h2o.importFile(path = prosPath)
h2o.describe(prostate.hex)
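For an in-memory data.frame, a rough base-R counterpart of the per-column rollups listed above (type, min, max, missing count, zero count, standard deviation, number of levels) can be assembled as below. The column names and the exact set of statistics are our own choices; we label factors "enum", the name H2O uses for categorical columns. h2o.describe itself computes its rollups on the cluster.

```r
# Rough base-R counterpart of Flow-style column rollups
describe_df <- function(df) {
  rows <- lapply(names(df), function(col) {
    v   <- df[[col]]
    num <- is.numeric(v)
    data.frame(
      label       = col,
      type        = if (is.factor(v)) "enum" else class(v)[1],
      missing     = sum(is.na(v)),
      zeros       = if (num) sum(v == 0, na.rm = TRUE) else NA_integer_,
      min         = if (num) min(v, na.rm = TRUE) else NA_real_,
      max         = if (num) max(v, na.rm = TRUE) else NA_real_,
      sd          = if (num) sd(v, na.rm = TRUE) else NA_real_,
      cardinality = if (is.factor(v)) nlevels(v) else NA_integer_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)   # one row per column
}

d <- describe_df(iris)
d
```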


h2o.downloadAllLogs Download H2O Log Files to Disk

Description

h2o.downloadAllLogs downloads all H2O log files to local disk. Generally used for debugging purposes.

Usage

h2o.downloadAllLogs(dirname = ".", filename = NULL)

Arguments

dirname (Optional) A character string indicating the directory that the log file should be saved in.

filename (Optional) A character string indicating the name that the log file should be saved to.

h2o.downloadCSV Download H2O Data to Disk

Description

Download an H2O data set to a CSV file on the local disk

Usage

h2o.downloadCSV(data, filename)

Arguments

data an H2OFrame object to be downloaded.

filename A string indicating the name that the CSV file should be saved to.

Warning

Files located on the H2O server may be very large! Make sure you have enough hard drive space to accommodate the entire file.

Examples

library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package = "h2o")
iris.hex <- h2o.uploadFile(path = irisPath)

myFile <- paste(getwd(), "my_iris_file.csv", sep = .Platform$file.sep)
h2o.downloadCSV(iris.hex, myFile)
file.info(myFile)
file.remove(myFile)

h2o.download_pojo Download the Scoring POJO (Plain Old Java Object) of an H2OModel

Description

Download the Scoring POJO (Plain Old Java Object) of an H2O Model

Usage

h2o.download_pojo(model, path = "", getjar = TRUE)

Arguments

model An H2OModel

path The path to the directory to store the POJO (no trailing slash). If "", then print to the console. The file name will be a compilable Java file name.

getjar Whether to also download the h2o-genmodel.jar file needed to compile the POJO.

Value

If path is "", then pretty print the POJO to the console. Otherwise save it to the specified directory.

Examples

library(h2o)
h <- h2o.init(nthreads=-1)
fr <- as.h2o(iris)
my_model <- h2o.gbm(x=1:4, y=5, training_frame=fr)

h2o.download_pojo(my_model)  # print the model to screen
# h2o.download_pojo(my_model, getwd())  # save the POJO and jar file to the current working
# directory, NOT RUN
# h2o.download_pojo(my_model, getwd(), getjar = FALSE)  # save only the POJO to the current
# working directory, NOT RUN
h2o.download_pojo(my_model, getwd())  # save to the current working directory

h2o.exportFile Export an H2O Data Frame (H2OFrame) to a File

Description

Exports an H2OFrame (which can be either VA or FV) to a file. This file may be on the H2O instance's local filesystem, on HDFS (preface the path with hdfs://), or on S3N (preface the path with s3n://).

Usage

h2o.exportFile(data, path, force = FALSE)

Arguments

data An H2OFrame object.

path The path to write the file to. Must include the directory and filename. May be prefaced with hdfs:// or s3n://. Each row of data appears as a line of the file.

force logical, indicates how to deal with files that already exist.

Details

In the case of existing files, force = TRUE will overwrite the file. Otherwise, the operation will fail.

Examples

## Not run:
library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris.csv", package = "h2o")
iris.hex <- h2o.uploadFile(path = irisPath)

# These aren't real paths
# h2o.exportFile(iris.hex, path = "/path/on/h2o/server/filesystem/iris.csv")
# h2o.exportFile(iris.hex, path = "hdfs://path/in/hdfs/iris.csv")
# h2o.exportFile(iris.hex, path = "s3n://path/in/s3/iris.csv")

## End(Not run)

h2o.exportHDFS Export a Model to HDFS

Description

Exports an H2OModel to HDFS.

Usage

h2o.exportHDFS(object, path, force = FALSE)

Arguments

object an H2OModel class object.

path The path to write the model to. Must include the directory and filename.

force logical, indicates how to deal with files that already exist.

h2o.filterNACols Filter NA Columns

Description

Filter NA Columns

Usage

h2o.filterNACols(data, frac = 0.2)

Arguments

data A dataset to filter on.

frac The threshold of NAs to allow per column (columns >= this threshold are filtered).
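The filtering rule can be stated in one line: compute each column's fraction of missing values and drop the columns whose fraction is at or above frac. The base-R sketch below applies that rule to a made-up data frame; returning the indices of the kept columns mirrors what we assume h2o.filterNACols returns, which this page does not state.

```r
# Base-R sketch of the NA-fraction rule (toy data, made up for illustration)
df <- data.frame(
  a = c(1, 2, 3, 4, 5),          # 0% NA
  b = c(NA, 2, 3, 4, 5),         # 20% NA
  c = c(NA, NA, NA, 4, 5)        # 60% NA
)

filter_na_cols <- function(df, frac = 0.2) {
  na_frac <- colMeans(is.na(df))   # fraction of NAs per column
  which(na_frac < frac)            # keep columns strictly below the threshold
}

keep <- filter_na_cols(df, frac = 0.2)
keep                               # only column "a" (index 1) survives at frac = 0.2
```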


h2o.find_row_by_threshold

Find the row of metric scores corresponding to a given threshold. No duplicate thresholds allowed

Description

Find the row of metric scores corresponding to a given threshold. No duplicate thresholds allowed.

Usage

h2o.find_row_by_threshold(object, threshold)

Arguments

object H2OBinomialMetrics

threshold number between 0 and 1

h2o.find_threshold_by_max_metric

Find the threshold that maximizes a given metric

Description

Find the threshold that maximizes a given metric.

Usage

h2o.find_threshold_by_max_metric(object, metric)

Arguments

object H2OBinomialMetrics

metric A metric name, for example "F1".
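Both helpers are lookups into a table of metric values computed at many candidate thresholds. The base-R sketch below builds a small such table (F1 only) from made-up scores, then finds the threshold that maximizes F1 and fetches the row for a requested threshold, matched with a small numerical tolerance. It illustrates the idea only; H2O's table has many more metric columns, and the layout here is our own.

```r
# Made-up labels and scores for illustration
actual <- c(0, 0, 1, 1, 1, 0, 1, 0)
prob   <- c(0.10, 0.40, 0.35, 0.80, 0.90, 0.60, 0.70, 0.20)

f1_at <- function(t) {
  pred <- prob >= t
  tp <- sum(pred & (actual == 1))
  fp <- sum(pred & (actual == 0))
  fn <- sum((!pred) & (actual == 1))
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
}

# One row per distinct candidate threshold (no duplicates)
scores <- data.frame(threshold = sort(unique(prob), decreasing = TRUE))
scores$f1 <- vapply(scores$threshold, f1_at, numeric(1))

# Threshold that maximizes the metric
best_threshold <- scores$threshold[which.max(scores$f1)]

# Row for a requested threshold
row_for <- function(t) scores[abs(scores$threshold - t) < 1e-8, , drop = FALSE]
row_for(0.6)
```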


h2o.gainsLift Access H2O Gains/Lift Tables

Description

Retrieve either a single or many Gains/Lift tables from H2O objects.

Usage

h2o.gainsLift(object, ...)

## S4 method for signature H2OModel
h2o.gainsLift(object, newdata, valid = FALSE,
  xval = FALSE, ...)

## S4 method for signature H2OModelMetrics
h2o.gainsLift(object)

Arguments

object Either an H2OModel object or an H2OModelMetrics object.

newdata An H2OFrame object that can be scored on. Requires a valid response column.

valid Retrieve the validation metric.

xval Retrieve the cross-validation metric.

... further arguments to be passed to/from this method.

Details

The H2OModelMetrics version of this function will only take H2OBinomialMetrics objects.

Value

Calling this function on H2OModel objects returns a Gains/Lift table corresponding to the predict function.

See Also

predict for generating prediction frames, h2o.performance for creating H2OModelMetrics.

Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
hex <- h2o.uploadFile(prosPath)
hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, distribution = "bernoulli",
  training_frame = hex, validation_frame = hex, nfolds=3)
h2o.gainsLift(model)               ## extract training metrics
h2o.gainsLift(model, valid=TRUE)   ## extract validation metrics (here: the same)
h2o.gainsLift(model, xval=TRUE)    ## extract cross-validation metrics
h2o.gainsLift(model, newdata=hex)  ## score on new data (here: the same)
# Generating a ModelMetrics object
perf <- h2o.performance(model, hex)
h2o.gainsLift(perf)                ## extract from existing metrics object
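A gains/lift table sorts rows by predicted probability, splits them into groups, and compares each group's response rate with the overall rate. The base-R sketch below computes a simplified version on made-up data with ten equal-size groups. H2O's table reports more columns and may group rows differently, so treat this only as an illustration of the idea.

```r
# Simplified gains/lift computation on made-up data
set.seed(5)
n      <- 1000
prob   <- runif(n)                              # made-up model scores
actual <- rbinom(n, 1, prob)                    # responses more likely when the score is high

groups <- 10
ord    <- order(prob, decreasing = TRUE)        # highest scores first
group  <- ceiling(seq_len(n) / (n / groups))    # equal-size groups 1..10 in sorted order

resp_sorted   <- actual[ord]
overall_rate  <- mean(actual)
response_rate <- tapply(resp_sorted, group, mean)
lift          <- response_rate / overall_rate
cumulative_capture <- cumsum(tapply(resp_sorted, group, sum)) / sum(actual)

gains <- data.frame(group = 1:groups,
                    response_rate = as.vector(response_rate),
                    lift = as.vector(lift),
                    cumulative_capture = as.vector(cumulative_capture))
gains
```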

h2o.gbm Gradient Boosted Machines

Description

Builds gradient boosted classification trees, and gradient boosted regression trees on a parsed dataset.

Usage

h2o.gbm(x, y, training_frame, model_id, checkpoint, ignore_const_cols = TRUE,
  distribution = c("AUTO", "gaussian", "bernoulli", "multinomial", "poisson",
  "gamma", "tweedie", "laplace", "quantile"), quantile_alpha = 0.5,
  tweedie_power = 1.5, ntrees = 50, max_depth = 5, min_rows = 10,
  learn_rate = 0.1, sample_rate = 1, col_sample_rate = 1,
  col_sample_rate_per_tree = 1, nbins = 20, nbins_top_level,
  nbins_cats = 1024, validation_frame = NULL, balance_classes = FALSE,
  max_after_balance_size = 1, seed, build_tree_one_node = FALSE,
  nfolds = 0, fold_column = NULL, fold_assignment = c("AUTO", "Random",
  "Modulo"), keep_cross_validation_predictions = FALSE,
  score_each_iteration = FALSE, score_tree_interval = 0,
  stopping_rounds = 0, stopping_metric = c("AUTO", "deviance", "logloss",
  "MSE", "AUC", "r2", "misclassification"), stopping_tolerance = 0.001,
  max_runtime_secs = 0, offset_column = NULL, weights_column = NULL)

Arguments

x A vector containing the names or indices of the predictor variables to use inbuilding the GBM model.

y The name or index of the response variable. If the data does not contain a header,this is the column index number starting at 0, and increasing from left to right.(The response must be either an integer or a categorical variable).

training_frame An H2OFrame object containing the variables in the model.

model_id (Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.


checkpoint Model checkpoint (either a model key or an H2OModel object) to resume training with.

ignore_const_cols

A logical value indicating whether or not to ignore all the constant columns in the training frame.

distribution A character string. The distribution function of the response. Must be "AUTO", "bernoulli", "multinomial", "poisson", "gamma", "tweedie", "laplace", "quantile" or "gaussian".

quantile_alpha The quantile to estimate (only used for quantile regression, distribution = "quantile"; must be between 0 and 1).

tweedie_power Tweedie power (only for Tweedie distribution, must be between 1 and 2)

ntrees A nonnegative integer that determines the number of trees to grow.

max_depth Maximum depth to grow the tree.

min_rows Minimum number of rows to assign to terminal nodes.

learn_rate Learning rate (from 0.0 to 1.0)

sample_rate Row sample rate (from 0.0 to 1.0).

col_sample_rate

Column sample rate (from 0.0 to 1.0).

col_sample_rate_per_tree

Column sample rate per tree (from 0.0 to 1.0).

nbins For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point.

nbins_top_level

For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by a factor of two per level.

nbins_cats For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.

validation_frame

An H2OFrame object indicating the validation dataset used to construct the confusion matrix. Defaults to NULL. If left as NULL, this defaults to the training data when nfolds = 0.

balance_classes

logical, indicates whether or not to balance training data class counts via over/under-sampling (for imbalanced data).

max_after_balance_size

Maximum relative size of the training data after balancing class counts (can be less than 1.0). Ignored if balance_classes is FALSE, which is the default behavior.

seed Seed for random numbers (affects sampling).

build_tree_one_node

Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets.

nfolds (Optional) Number of folds for cross-validation. If nfolds >= 2, then validation_frame must remain empty.


fold_column (Optional) Column with cross-validation fold index assignment per observation.

fold_assignment

Cross-validation fold assignment scheme, if fold_column is not specified. Must be "AUTO", "Random" or "Modulo".

keep_cross_validation_predictions

Whether to keep the predictions of the cross-validation models.

score_each_iteration

Attempts to score each tree.

score_tree_interval

Score the model after every so many trees. Disabled if set to 0.

stopping_rounds

Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve (by stopping_tolerance) for k = stopping_rounds scoring events. Can only trigger after at least 2k scoring events. Use 0 to disable.

stopping_metric

Metric to use for convergence checking, only for stopping_rounds > 0. Can be one of "AUTO", "deviance", "logloss", "MSE", "AUC", "r2", "misclassification".

stopping_tolerance

Relative tolerance for metric-based stopping criterion (if relative improvement is not at least this much, stop).

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

offset_column Specify the offset column.

weights_column Specify the weights column.
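The stopping_rounds rule described above can be sketched in base R. This is a simplified reading of the documented rule, not H2O's internal code: `should_stop()` is a hypothetical helper, `history` holds one metric value per scoring event (oldest first), and the choice of reference window (the best moving average ending at least k events before the latest one) is an assumption of the sketch.

```r
# Hypothetical helper illustrating the stopping_rounds rule (not H2O code).
should_stop <- function(history, k, tolerance = 0.001,
                        larger_is_better = FALSE) {
  n <- length(history)
  # k = 0 disables early stopping; the rule needs at least 2k scoring events
  if (k <= 0 || n < 2 * k) return(FALSE)
  # Simple moving average of length k; sma[j] covers events j..(j + k - 1)
  sma <- sapply(seq_len(n - k + 1), function(j) mean(history[j:(j + k - 1)]))
  latest <- sma[length(sma)]
  # Assumed reference: best average over windows ending at least k events earlier
  earlier <- sma[seq_len(n - 2 * k + 1)]
  if (larger_is_better) {            # e.g. AUC, r2
    ref <- max(earlier)
    improved <- latest > ref + tolerance * abs(ref)
  } else {                           # e.g. deviance, logloss, MSE
    ref <- min(earlier)
    improved <- latest < ref - tolerance * abs(ref)
  }
  !improved                          # stop when the relative gain is too small
}
```

For example, `should_stop(10:1, k = 3)` is FALSE because the deviance keeps falling, while `should_stop(rep(1, 6), k = 3)` is TRUE because the metric has plateaued.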

Details

The default distribution function will guess the model type based on the response column type. In order to run properly, the response column must be numeric for "gaussian" or an enum (categorical) for "bernoulli" or "multinomial".
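The "AUTO" guess described above can be sketched as a small base-R function. `guess_distribution()` is hypothetical, and splitting factors into "bernoulli" (two levels) versus "multinomial" (more levels) is an assumption of this sketch rather than something stated on this page.

```r
# Hypothetical sketch of choosing a distribution from the response type.
guess_distribution <- function(response) {
  if (is.factor(response)) {
    # Assumption: two classes -> bernoulli, more classes -> multinomial
    if (nlevels(response) == 2) "bernoulli" else "multinomial"
  } else if (is.numeric(response)) {
    "gaussian"                       # numeric responses are treated as regression
  } else {
    stop("response must be numeric or a factor")
  }
}
```

Under this rule, a 0/1 column stored as numbers is treated as a regression target, which is why the examples convert CAPSULE with as.factor() before fitting a "bernoulli" model.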

See Also

predict.H2OModel for prediction.

Examples

library(h2o)
h2o.init()

# Run regression GBM on australia.hex data
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
independent <- c("premax", "salmax", "minairtemp", "maxairtemp", "maxsst",
                 "maxsoilmoist", "Max_czcs")


dependent <- "runoffnew"
h2o.gbm(y = dependent, x = independent, training_frame = australia.hex,
        ntrees = 3, max_depth = 3, min_rows = 2)

h2o.getConnection Retrieve an H2O Connection

Description

Attempt to recover an h2o connection.

Usage

h2o.getConnection()

Value

Returns an H2OConnection object.

h2o.getFrame Get an R Reference to an H2O Dataset, that will NOT be GC’d by default

Description

Get the reference to a frame with the given id in the H2O instance.

Usage

h2o.getFrame(id)

Arguments

id A string indicating the unique frame id of the dataset to retrieve.


h2o.getFutureModel Get future model

Description

Get future model

Usage

h2o.getFutureModel(object)

Arguments

object H2OModel

h2o.getGrid Get a grid object from H2O distributed K/V store.

Description

Get a grid object from H2O distributed K/V store.

Usage

h2o.getGrid(grid_id, sort_by, decreasing)

Arguments

grid_id ID of existing grid object to fetch

sort_by Sort the models in the grid space by a metric. Choices are "logloss", "residual_deviance", "mse", "auc", "r2", "accuracy", "precision", "recall", "f1", etc.

decreasing Specify whether sort order should be decreasing

Examples

library(h2o)
library(jsonlite)
h2o.init()
iris.hex <- as.h2o(iris)
h2o.grid("gbm", grid_id = "gbm_grid_id", x = c(1:4), y = 5,
         training_frame = iris.hex, hyper_params = list(ntrees = c(1,2,3)))
grid <- h2o.getGrid("gbm_grid_id")
# Get grid summary
summary(grid)
# Fetch grid models


model_ids <- grid@model_ids
models <- lapply(model_ids, function(id) { h2o.getModel(id) })

h2o.getId Get back-end distributed key/value store id from an H2OFrame.

Description

Get back-end distributed key/value store id from an H2OFrame.

Usage

h2o.getId(x)

Arguments

x An H2OFrame

Value

The id

h2o.getModel Get an R reference to an H2O model

Description

Returns a reference to an existing model in the H2O instance.

Usage

h2o.getModel(model_id)

Arguments

model_id A string indicating the unique model_id of the model to retrieve.

Value

Returns an object that is a subclass of H2OModel.


Examples

library(h2o)
h2o.init()

iris.hex <- as.h2o(iris, "iris.hex")
model_id <- h2o.gbm(x = 1:4, y = 5, training_frame = iris.hex)@model_id
model.retrieved <- h2o.getModel(model_id)

h2o.getTimezone Get the Time Zone on the H2O Cloud

Description

Gets the time zone on the H2O cloud. Returns a string.

Usage

h2o.getTimezone()

h2o.getTypes Get the types-per-column

Description

Get the types-per-column

Usage

h2o.getTypes(x)

Arguments

x An H2OFrame

Value

A list of types


h2o.getVersion Get h2o version

Description

Get h2o version

Usage

h2o.getVersion()

h2o.giniCoef Retrieve the GINI Coefficient

Description

Retrieves the GINI coefficient from an H2OBinomialMetrics object. If the "train", "valid", and "xval" parameters are FALSE (default), then the training GINI value is returned. If more than one parameter is set to TRUE, then a named vector of GINIs is returned, where the names are "train", "valid" or "xval".
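As a point of reference, the GINI coefficient for a binary classifier is commonly defined from the AUC as Gini = 2 * AUC - 1. The base-R sketch below assumes that this normalized form is what is reported; `auc_rank()` and `gini_coef()` are hypothetical helpers that compute the AUC with the rank-based (Mann-Whitney) formula.

```r
# Hypothetical helpers: rank-based AUC and the Gini = 2 * AUC - 1 relation.
auc_rank <- function(scores, labels) {
  r <- rank(scores)                  # tied scores receive average ranks
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  # Mann-Whitney U for the positive class, scaled to [0, 1]
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
gini_coef <- function(scores, labels) 2 * auc_rank(scores, labels) - 1
```

With scores c(0.1, 0.4, 0.35, 0.8) and labels c(0, 0, 1, 1), three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75 and the Gini is 0.5.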

Usage

h2o.giniCoef(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object an H2OBinomialMetrics object.

train Retrieve the training GINI Coefficient

valid Retrieve the validation GINI Coefficient

xval Retrieve the cross-validation GINI Coefficient

See Also

h2o.auc for AUC, h2o.giniCoef for the GINI coefficient, and h2o.metric for the various threshold metrics. See h2o.performance for creating H2OModelMetrics objects.


Examples

library(h2o)
h2o.init()

prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
hex <- h2o.uploadFile(prosPath)

hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
perf <- h2o.performance(model, hex)
h2o.giniCoef(perf)

h2o.glm H2O Generalized Linear Models

Description

Fit a generalized linear model, specified by a response variable, a set of predictors, and a description of the error distribution.

Usage

h2o.glm(x, y, training_frame, model_id, validation_frame = NULL,
  ignore_const_cols = TRUE, max_iterations = 50, beta_epsilon = 0,
  solver = c("IRLSM", "L_BFGS"), standardize = TRUE,
  family = c("gaussian", "binomial", "poisson", "gamma", "tweedie",
  "multinomial"), link = c("family_default", "identity", "logit", "log",
  "inverse", "tweedie"), tweedie_variance_power = NaN,
  tweedie_link_power = NaN, alpha = 0.5, prior = NULL, lambda = 1e-05,
  lambda_search = FALSE, nlambdas = -1, lambda_min_ratio = -1,
  nfolds = 0, fold_column = NULL, fold_assignment = c("AUTO", "Random",
  "Modulo"), keep_cross_validation_predictions = FALSE,
  beta_constraints = NULL, offset_column = NULL, weights_column = NULL,
  intercept = TRUE, max_active_predictors = -1, objective_epsilon = -1,
  gradient_epsilon = -1, non_negative = FALSE, compute_p_values = FALSE,
  remove_collinear_columns = FALSE, max_runtime_secs = 0,
  missing_values_handling = c("MeanImputation", "Skip"))

Arguments

x A vector containing the names or indices of the predictor variables to use in building the GLM model.

y A character string or index that represents the response variable in the model.

training_frame An H2OFrame object containing the variables in the model.


model_id (Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

validation_frame

An H2OFrame object containing the variables in the model. Defaults to NULL.

ignore_const_cols

A logical value indicating whether or not to ignore all the constant columns in the training frame.

max_iterations A non-negative integer specifying the maximum number of iterations.

beta_epsilon A non-negative number specifying the magnitude of the maximum difference between the coefficient estimates from successive iterations. Defines the convergence criterion for h2o.glm.

solver A character string specifying the solver used: IRLSM (supports more features), L_BFGS (scales better for datasets with many columns).

standardize A logical value indicating whether the numeric predictors should be standardized to have a mean of 0 and a variance of 1 prior to training the models.

family A character string specifying the distribution of the model: gaussian, binomial, poisson, gamma, tweedie, multinomial.

link A character string specifying the link function. The default is the canonical link for the family. The supported links for each of the family specifications are:
"gaussian": "identity", "log", "inverse"
"binomial": "logit", "log"
"poisson": "log", "identity"
"gamma": "inverse", "log", "identity"
"tweedie": "tweedie"

tweedie_variance_power

A numeric specifying the power for the variance function when family = "tweedie".

tweedie_link_power

A numeric specifying the power for the link function when family = "tweedie".

alpha A numeric in [0, 1] specifying the elastic-net mixing parameter. The elastic-net penalty is defined to be:

$$P(\alpha, \beta) = \frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1 = \sum_j \left[\frac{1-\alpha}{2}\beta_j^2 + \alpha|\beta_j|\right]$$

making alpha = 1 the lasso penalty and alpha = 0 the ridge penalty.

prior (Optional) A numeric specifying the prior probability of class 1 in the response when family = "binomial". The default prior is the observational frequency of class 1. Must be from (0,1) exclusive range or NULL (no prior).

lambda A non-negative shrinkage parameter for the elastic-net, which multiplies P(α, β) in the objective function. When lambda = 0, no elastic-net penalty is applied and ordinary generalized linear models are fit.

lambda_search A logical value indicating whether to conduct a search over the space of lambda values, starting from lambda max; the given lambda is then interpreted as lambda min.

nlambdas The number of lambda values to use when lambda_search = TRUE.


lambda_min_ratio

Smallest value for lambda as a fraction of lambda.max. By default, if the number of observations is greater than the number of variables, then lambda_min_ratio = 0.0001; if the number of observations is less than the number of variables, then lambda_min_ratio = 0.01.

nfolds (Optional) Number of folds for cross-validation. If nfolds >= 2, then validation_frame must remain empty.

fold_column (Optional) Column with cross-validation fold index assignment per observation.

fold_assignment

Cross-validation fold assignment scheme, if fold_column is not specified. Must be "AUTO", "Random" or "Modulo".

keep_cross_validation_predictions

Whether to keep the predictions of the cross-validation models.

beta_constraints

A data.frame or H2OParsedData object with the columns ["names", "lower_bounds", "upper_bounds", "beta_given", "rho"], where each row corresponds to a predictor in the GLM. "names" contains the predictor names, "lower_bounds" and "upper_bounds" are the lower and upper bounds of beta, "beta_given" is some supplied starting values for beta, and "rho" is the proximal penalty constant that is used with "beta_given". If "rho" is not specified when "beta_given" is, then the default rho value of zero is used.

offset_column Specify the offset column.

weights_column Specify the weights column.

intercept Logical, include constant term (intercept) in the model.

max_active_predictors

(Optional) Convergence criteria for number of predictors when using L1 penalty.

objective_epsilon

Convergence criteria. Converge if relative change in objective function is below this threshold.

gradient_epsilon

Convergence criteria. Converge if gradient l-infinity norm is below this threshold.

non_negative Logical, allow only non-negative coefficients.

compute_p_values

(Optional) Logical, compute p-values, only allowed with IRLSM solver and no regularization. May fail if there are collinear predictors.

remove_collinear_columns

(Optional) Logical, valid only with no regularization. If set, co-linear columns will be automatically ignored (coefficient will be 0).

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

missing_values_handling

(Optional) Controls handling of missing values. Can be either "MeanImputation" or "Skip". MeanImputation replaces missing values with mean for numeric and most frequent level for categorical, Skip ignores observations with any missing value. Applied both during model training *AND* scoring.

... (Currently Unimplemented) coefficients.
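The elastic-net penalty P(α, β) from the alpha argument can be evaluated directly. The base-R sketch below is a hypothetical helper that mirrors the per-coefficient form of the formula; in the GLM objective this value is multiplied by lambda, so lambda = 0 removes it entirely.

```r
# Hypothetical helper: elastic-net penalty P(alpha, beta) for a coefficient vector.
elastic_net_penalty <- function(alpha, beta) {
  stopifnot(alpha >= 0, alpha <= 1)
  # Per-coefficient sum of the ridge part (1 - alpha)/2 * b^2 and lasso part alpha * |b|
  sum((1 - alpha) / 2 * beta^2 + alpha * abs(beta))
}
```

For beta = c(1, -2), alpha = 1 gives the lasso value 3, alpha = 0 gives the ridge value 2.5, and alpha = 0.5 gives 2.75.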


Value

A subclass of H2OModel is returned. The specific subclass depends on the machine learning task at hand (if it’s binomial classification, then an H2OBinomialModel is returned; if it’s regression, then an H2ORegressionModel is returned). The default print-out of the models is shown, but further GLM-specific information can be queried out of the object. To access these various items, please refer to the See Also section below.

Upon completion of the GLM, the resulting object has coefficients, normalized coefficients, residual/null deviance, AIC, and a host of model metrics including MSE, AUC (for logistic regression), degrees of freedom, and confusion matrices. Please refer to the more in-depth GLM documentation available here: http://h2o-release.s3.amazonaws.com/h2o-dev/rel-shannon/2/docs-website/h2o-docs/index.html#Data+Science+Algorithms-GLM

See Also

predict.H2OModel for prediction, h2o.mse, h2o.auc, h2o.confusionMatrix, h2o.performance, h2o.giniCoef, h2o.logloss, h2o.varimp, h2o.scoreHistory

Examples

library(h2o)
h2o.init()

# Run GLM of CAPSULE ~ AGE + RACE + PSA + DCAPS
prostatePath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prostatePath, destination_frame = "prostate.hex")
h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"), training_frame = prostate.hex,
        family = "binomial", nfolds = 0, alpha = 0.5, lambda_search = FALSE)

# Run GLM of VOL ~ CAPSULE + AGE + RACE + PSA + GLEASON
myX = setdiff(colnames(prostate.hex), c("ID", "DPROS", "DCAPS", "VOL"))
h2o.glm(y = "VOL", x = myX, training_frame = prostate.hex, family = "gaussian",
        nfolds = 0, alpha = 0.1, lambda_search = FALSE)

# GLM variable importance
# Also see:
# https://github.com/h2oai/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
data.hex = h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv",
  destination_frame = "data.hex")

myX = 1:20
myY = "y"
my.glm = h2o.glm(x = myX, y = myY, training_frame = data.hex, family = "binomial",
                 standardize = TRUE, lambda_search = TRUE)


h2o.glrm Generalized Low Rank Model

Description

Generalized low rank decomposition of an H2O data frame.

Usage

h2o.glrm(training_frame, cols, k, model_id, validation_frame, loading_name,
  ignore_const_cols, transform = c("NONE", "DEMEAN", "DESCALE", "STANDARDIZE",
  "NORMALIZE"), loss = c("Quadratic", "L1", "Huber", "Poisson", "Hinge",
  "Logistic"), multi_loss = c("Categorical", "Ordinal"), loss_by_col = NULL,
  loss_by_col_idx = NULL, regularization_x = c("None", "Quadratic", "L2",
  "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex"),
  regularization_y = c("None", "Quadratic", "L2", "L1", "NonNegative",
  "OneSparse", "UnitOneSparse", "Simplex"), gamma_x = 0, gamma_y = 0,
  max_iterations = 1000, max_updates = 2 * max_iterations,
  init_step_size = 1, min_step_size = 0.001, init = c("Random",
  "PlusPlus", "SVD"), svd_method = c("GramSVD", "Power", "Randomized"),
  user_y = NULL, user_x = NULL, expand_user_y = TRUE,
  impute_original = FALSE, recover_svd = FALSE, seed,
  max_runtime_secs = 0)

Arguments

training_frame An H2OFrame object containing the variables in the model.

cols (Optional) A vector containing the data columns on which GLRM operates.

k The rank of the resulting decomposition. This must be between 1 and the number of columns in the training frame, inclusive.

model_id (Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

validation_frame

An H2OFrame object containing the variables in the model.

loading_name (Optional) The unique name assigned to the loading matrix X in the XY decomposition. Automatically generated if none is provided.

ignore_const_cols

(Optional) A logical value indicating whether to ignore constant columns in the training frame. A column is constant if all of its non-missing values are the same value.

transform A character string that indicates how the training data should be transformed before running GLRM. Possible values are "NONE": for no transformation, "DEMEAN": for subtracting the mean of each column, "DESCALE": for dividing by the standard deviation of each column, "STANDARDIZE": for demeaning and descaling, and "NORMALIZE": for demeaning and dividing each column by its range (max - min).


loss A character string indicating the default loss function for numeric columns. Possible values are "Quadratic" (default), "L1", "Huber", "Poisson", "Hinge" and "Logistic".

multi_loss A character string indicating the default loss function for enum columns. Possible values are "Categorical" and "Ordinal".

loss_by_col A vector of strings indicating the loss function for specific columns by corresponding index in loss_by_col_idx. Will override loss for numeric columns and multi_loss for enum columns.

loss_by_col_idx

A vector of column indices to which the corresponding loss functions in loss_by_col are assigned. Must be zero indexed.

regularization_x

A character string indicating the regularization function for the X matrix. Possible values are "None" (default), "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", and "Simplex".

regularization_y

A character string indicating the regularization function for the Y matrix. Possible values are "None" (default), "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", and "Simplex".

gamma_x The weight on the X matrix regularization term.

gamma_y The weight on the Y matrix regularization term.

max_iterations The maximum number of iterations to run the optimization loop. Each iteration consists of an update of the X matrix, followed by an update of the Y matrix.

max_updates The maximum number of updates of X or Y to run. Each update consists of an update of either the X matrix or the Y matrix. For example, if max_updates = 1 and max_iterations = 1, the algorithm will initialize X and Y, update X once, and terminate without updating Y.

init_step_size Initial step size. Divided by number of columns in the training frame when calculating the proximal gradient update. The algorithm begins at init_step_size and decreases the step size at each iteration until a termination condition is reached.

min_step_size Minimum step size upon which the algorithm is terminated.

init A character string indicating how to select the initial Y matrix. Possible values are "Random": for initialization to a random array from the standard normal distribution, "PlusPlus": for initialization using the clusters from k-means++ initialization, or "SVD": for initialization using the first k right singular vectors. Additionally, the user may specify the initial Y as a matrix, data.frame, H2OFrame, or list of vectors.

svd_method (Optional) A character string that indicates how SVD should be calculated during initialization. Possible values are "GramSVD": distributed computation of the Gram matrix followed by a local SVD using the JAMA package, "Power": computation of the SVD using the power iteration method, "Randomized": (default) approximate SVD by projecting onto a random subspace (see references).

user_y (Optional) A matrix, data.frame, H2OFrame, or list of vectors specifying the initial Y. Only used when init = "User". The number of rows must equal k.


user_x (Optional) A matrix, data.frame, H2OFrame, or list of vectors specifying the initial X. Only used when init = "User". The number of columns must equal k.

expand_user_y A logical value indicating whether the categorical columns of user_y should be one-hot expanded. Only used when init = "User" and user_y is specified.

impute_original

A logical value indicating whether to reconstruct the original training data by reversing the transformation during prediction. Model metrics are calculated with respect to the original data.

recover_svd A logical value indicating whether the singular values and eigenvectors should be recovered during post-processing of the generalized low rank decomposition.

seed (Optional) Random seed used to initialize the X and Y matrices.

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.
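The five transform options listed above can be illustrated on a single numeric column in base R. `apply_transform()` is a hypothetical helper; it assumes the sample standard deviation (sd()) is used for descaling and ignores missing values, so H2O's internal computation may differ in those details.

```r
# Hypothetical sketch of the GLRM `transform` options on one numeric column.
apply_transform <- function(x, transform = c("NONE", "DEMEAN", "DESCALE",
                                             "STANDARDIZE", "NORMALIZE")) {
  transform <- match.arg(transform)
  switch(transform,
    NONE        = x,                                  # leave data unchanged
    DEMEAN      = x - mean(x),                        # subtract the column mean
    DESCALE     = x / sd(x),                          # divide by the column sd
    STANDARDIZE = (x - mean(x)) / sd(x),              # demean, then descale
    NORMALIZE   = (x - mean(x)) / (max(x) - min(x))   # demean, divide by range
  )
}
```

For x = c(2, 4, 6) (mean 4, standard deviation 2, range 4), STANDARDIZE gives c(-1, 0, 1) and NORMALIZE gives c(-0.5, 0, 0.5).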

Value

Returns an object of class H2ODimReductionModel.

References

M. Udell, C. Horn, R. Zadeh, S. Boyd (2014). Generalized Low Rank Models [http://arxiv.org/abs/1410.0342]. Unpublished manuscript, Stanford Electrical Engineering Department.

N. Halko, P.G. Martinsson, J.A. Tropp (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions [http://arxiv.org/abs/0909.4061]. SIAM Rev., Survey and Review section, Vol. 53, num. 2, pp. 217-288, June 2011.

See Also

h2o.kmeans, h2o.svd, h2o.prcomp

Examples

library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
h2o.glrm(training_frame = australia.hex, k = 5, loss = "Quadratic", regularization_x = "L1",
         gamma_x = 0.5, gamma_y = 0, max_iterations = 1000)


h2o.grid H2O Grid Support

Description

Provides a set of functions to launch a grid search and get its results.

Usage

h2o.grid(algorithm, grid_id, ..., hyper_params = list(),
  is_supervised = NULL, do_hyper_params_check = FALSE,
  search_criteria = NULL)

Arguments

algorithm Name of algorithm to use in grid search (gbm, randomForest, kmeans, glm, deeplearning, naivebayes, pca).

grid_id (Optional) ID for resulting grid search. If it is not specified then it is autogenerated.

... Arguments describing parameters to use with algorithm (e.g., x, y, training_frame). Look at the specific algorithm (h2o.gbm, h2o.glm, h2o.kmeans, h2o.deeplearning) for available parameters.

hyper_params List of hyper parameters, each given as a vector of values to try (e.g., list(ntrees = c(1,2), max_depth = c(5,7))).

is_supervised (Optional) If specified then override the default heuristic which decides if the given algorithm name and parameters specify a supervised or unsupervised algorithm.

do_hyper_params_check

Perform client check for specified hyper parameters. It can be time expensive for large hyper space.

search_criteria

(Optional) List of control parameters for smarter hyperparameter search. The default strategy ’Cartesian’ covers the entire space of hyperparameter combinations. Specify the ’RandomDiscrete’ strategy to get random search of all the combinations of your hyperparameters. RandomDiscrete should usually be combined with at least one early stopping criterion, max_models and/or max_runtime_secs, e.g. list(strategy = "RandomDiscrete", max_models = 42, max_runtime_secs = 28800).
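The difference between the two strategies can be sketched in base R. `enumerate_grid()` and `random_discrete()` are hypothetical helpers, not part of the h2o package: the first lists every combination (the Cartesian space), the second draws at most max_models distinct combinations from it, which is one plausible reading of RandomDiscrete with a max_models limit.

```r
# Hypothetical helpers contrasting a Cartesian grid with a random discrete search.
enumerate_grid <- function(hyper_params) {
  # Every combination of the supplied values, one row per candidate model
  expand.grid(hyper_params, stringsAsFactors = FALSE)
}
random_discrete <- function(hyper_params, max_models, seed = 1) {
  grid <- enumerate_grid(hyper_params)
  set.seed(seed)
  # Sample without replacement, never more models than combinations exist
  rows <- sample(nrow(grid), min(max_models, nrow(grid)))
  grid[rows, , drop = FALSE]
}
```

With list(ntrees = c(1, 2), max_depth = c(5, 7)) the Cartesian space has four combinations; random_discrete(..., max_models = 3) returns three of them, and a max_models of 42 simply returns all four.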

Details

Launch grid search with given algorithm and parameters.


Examples

library(h2o)
library(jsonlite)
h2o.init()
iris.hex <- as.h2o(iris)
grid <- h2o.grid("gbm", x = c(1:4), y = 5, training_frame = iris.hex,
                 hyper_params = list(ntrees = c(1,2,3)))
# Get grid summary
summary(grid)
# Fetch grid models
model_ids <- grid@model_ids
models <- lapply(model_ids, function(id) { h2o.getModel(id) })

h2o.group_by Group and Apply by Column

Description

Performs a group by and apply similar to ddply.

Usage

h2o.group_by(data, by, ..., gb.control = list(na.methods = NULL, col.names = NULL))

Arguments

data an H2OFrame object.

by a list of column names

gb.control A list of how to handle NA values in the dataset as well as how to name output columns. See Details for more help.

... any supported aggregate function.

Details

In the case of na.methods within gb.control, there are three possible settings. "all" will include NAs in computation of functions. "rm" will completely remove all NA fields. "ignore" will remove NAs from the numerator but keep the rows for computational purposes. If a list smaller than the number of column groups is supplied, the list will be padded by "ignore".

Similar to na.methods, col.names will pad the list with the default column names if the length is less than the number of column groups supplied.
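For a mean aggregate within one group, the three na.methods settings described above can be read as follows. `group_mean()` is a hypothetical base-R helper written under that reading; it is not how h2o.group_by is implemented.

```r
# Hypothetical sketch of the three na.methods settings for a mean aggregate.
group_mean <- function(x, na_method = c("all", "rm", "ignore")) {
  na_method <- match.arg(na_method)
  switch(na_method,
    all    = mean(x),                          # NAs take part, so any NA gives NA
    rm     = mean(x, na.rm = TRUE),            # NA rows removed entirely
    ignore = sum(x, na.rm = TRUE) / length(x)  # NAs dropped from the numerator only
  )
}
```

For x = c(1, 2, NA, 3), "all" returns NA, "rm" returns 2 (6 / 3), and "ignore" returns 1.5 (6 / 4), since the NA row still counts toward the denominator.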

Value

Returns a new H2OFrame object with columns equivalent to the number of groups created


h2o.gsub String Global Substitute

Description

Creates a copy of the target column in which each string has all occurrences of the regex pattern replaced with the replacement substring.

Usage

h2o.gsub(pattern, replacement, x, ignore.case = FALSE)

Arguments

pattern The pattern to replace.

replacement The replacement pattern.

x The column on which to operate.

ignore.case A logical value indicating whether the pattern match should ignore case.

h2o.head Return the Head or Tail of an H2O Dataset.

Description

Returns the first or last rows of an H2OFrame object.

Usage

h2o.head(x, ..., n = 6L)

## S3 method for class H2OFrame
head(x, ..., n = 6L)

h2o.tail(x, ..., n = 6L)

## S3 method for class H2OFrame
tail(x, ..., n = 6L)

Arguments

x An H2OFrame object.

... Further arguments passed to or from other methods.

n (Optional) A single integer. If positive, number of rows in x to return. If negative, all but the n first/last number of rows in x.


Value

An H2OFrame containing the first or last n rows of an H2OFrame object.

Examples

library(h2o)
h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
head(australia.hex, 10)
tail(australia.hex, 10)

h2o.hist Compute A Histogram

Description

Compute a histogram over a numeric column. If breaks == "FD", the MAD is used over the IQR in computing bin width. Note that we do not beautify the breakpoints as R does.

Usage

h2o.hist(x, breaks = "Sturges", plot = TRUE)

Arguments

x A single numeric column from an H2OFrame.

breaks Can be one of the following:
A string: "Sturges", "Rice", "sqrt", "Doane", "FD", "Scott"
A single number for the number of breaks splitting the range of the vec into number of breaks bins of equal width
A vector of numbers giving the split points, e.g., c(-50, 213.2123, 9324834)

plot A logical value indicating whether or not a plot should be generated (default is TRUE).
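Three of the string rules accepted by breaks have simple textbook bin-count formulas, sketched below in base R. `n_bins()` is a hypothetical helper, and it is an assumption that h2o.hist uses exactly these formulas and rounding; the Sturges rule here matches base R's grDevices::nclass.Sturges.

```r
# Hypothetical sketch: textbook bin counts for n observations.
n_bins <- function(n, rule = c("Sturges", "Rice", "sqrt")) {
  rule <- match.arg(rule)
  switch(rule,
    Sturges = ceiling(log2(n) + 1),    # grows with the log of n
    Rice    = ceiling(2 * n^(1/3)),    # grows with the cube root of n
    sqrt    = ceiling(sqrt(n))         # grows with the square root of n
  )
}
```

For a column of 100 values, these rules give 8, 10 and 10 bins respectively.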


h2o.hit_ratio_table Retrieve the Hit Ratios

Description

Retrieve the Hit Ratios. If the "train", "valid", and "xval" parameters are FALSE (default), then the training Hit Ratios value is returned. If more than one parameter is set to TRUE, then a named list of Hit Ratio tables is returned, where the names are "train", "valid" or "xval".

Usage

h2o.hit_ratio_table(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OModel object.

train Retrieve the training Hit Ratio

valid Retrieve the validation Hit Ratio

xval Retrieve the cross-validation Hit Ratio

h2o.hour Convert Milliseconds to Hour of Day in H2O Datasets

Description

Converts the entries of an H2OFrame object from milliseconds to hours of the day (on a 0 to 23 scale).
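The conversion described above amounts to taking the time of day and dividing by the length of an hour. The base-R sketch assumes the input is milliseconds since the Unix epoch and that the time zone is UTC; H2O uses the cluster time zone (see h2o.getTimezone), which this hypothetical `hour_of_day()` helper ignores.

```r
# Hypothetical sketch: epoch milliseconds to hour of day (0 to 23), in UTC.
hour_of_day <- function(ms) {
  ms_per_hour <- 60 * 60 * 1000
  ms_per_day  <- 24 * ms_per_hour
  # Time elapsed since midnight UTC, expressed in whole hours
  floor((ms %% ms_per_day) / ms_per_hour)
}
```

For example, 1.6e12 ms is 2020-09-13 12:26:40 UTC, so hour_of_day(1.6e12) returns 12, the same value as as.POSIXlt(1.6e9, origin = "1970-01-01", tz = "UTC")$hour.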

Usage

h2o.hour(x)

hour(x)

## S3 method for class H2OFrame
hour(x)

Arguments

x An H2OFrame object.


Value

An H2OFrame object containing the entries of x converted to hours of the day.

See Also

h2o.day

h2o.ifelse H2O Apply Conditional Statement

Description

Applies conditional statements element-wise to numeric vectors in H2O parsed data objects.

Usage

h2o.ifelse(test, yes, no)

ifelse(test, yes, no)

Arguments

test A logical description of the condition to be met (>, <, ==, etc.)

yes The value to return if the condition is TRUE.

no The value to return if the condition is FALSE.

Details

Only numeric values can be tested, and only numeric results can be returned for either condition. Categorical data is not currently supported for this function, and returned values cannot be categorical in nature.

Value

Returns a vector of new values matching the conditions stated in the ifelse call.

Examples

library(h2o)
h2o.init()
ausPath = system.file("extdata", "australia.csv", package = "h2o")
australia.hex = h2o.importFile(path = ausPath)
australia.hex[,9] <- ifelse(australia.hex[,3] < 279.9, 1, 0)
summary(australia.hex)


h2o.importFile Import Files into H2O

Description

Imports files into an H2O cloud. The default behavior is to pass-through to the parse phase automatically.

Usage

h2o.importFolder(path, pattern = "", destination_frame = "", parse = TRUE,
  header = NA, sep = "", col.names = NULL, col.types = NULL,
  na.strings = NULL)

h2o.importURL(path, destination_frame = "", parse = TRUE, header = NA,
  sep = "", col.names = NULL, na.strings = NULL)

h2o.importHDFS(path, pattern = "", destination_frame = "", parse = TRUE,
  header = NA, sep = "", col.names = NULL, na.strings = NULL)

h2o.uploadFile(path, destination_frame = "", parse = TRUE, header = NA,
  sep = "", col.names = NULL, col.types = NULL, na.strings = NULL,
  progressBar = FALSE, parse_type = NULL)

Arguments

path The complete URL or normalized file path of the file to be imported. Each row of data appears as one line of the file.

pattern (Optional) Character string containing a regular expression to match file(s) in the folder.

destination_frame

(Optional) The unique hex key assigned to the imported file. If none is given, a key will automatically be generated based on the URL path.

parse (Optional) A logical value indicating whether the file should be parsed after import.

header (Optional) A logical value indicating whether the first line of the file contains column headers. If left empty, the parser will try to automatically detect this.

sep (Optional) The field separator character. Values on each line of the file are separated by this character. If sep = "", the parser will automatically detect the separator.

col.names (Optional) An H2OFrame object containing a single delimited line with the column names for the file.

col.types (Optional) A vector to specify whether columns should be forced to a certain type upon import parsing.

na.strings (Optional) H2O will interpret these strings as missing.


progressBar (Optional) When FALSE, tell the H2O parse call to block synchronously instead of polling. This can be faster for small datasets but loses the progress bar.

parse_type (Optional) Specify which parser type H2O will use. Valid types are "ARFF", "XLS", "CSV", and "SVMLight".

Details

For all functions other than h2o.uploadFile, a relative path is resolved relative to the start location of the H2O instance, and the file must be on the same machine as the H2O cloud. For h2o.uploadFile, a relative path is resolved relative to the working directory of the current R session.

h2o.importFolder imports an entire directory of files. If the given path is relative, then it will be relative to the start location of the H2O instance. The default behavior is to pass through to the parse phase automatically.

h2o.importURL and h2o.importHDFS are both deprecated functions. Use h2o.importFile instead.
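The pattern argument of h2o.importFolder is a regular expression matched against file names in the folder. As a rough local preview, base R's grepl() can show which names a candidate pattern selects. This assumes H2O's regex dialect behaves like R's default regular expressions for simple patterns such as this one, which is not confirmed above; the file names are hypothetical.

```r
# Hypothetical file names in a folder to be imported.
files <- c("part-0001.csv", "part-0002.csv", "README.txt", "part-0003.csv.bak")
# Anchor the pattern at the end so that only names ending in ".csv" match.
pattern <- "\\.csv$"
selected <- files[grepl(pattern, files)]
print(selected)
```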

Examples

h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
prosPath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.uploadFile(path = prosPath, destination_frame = "prostate.hex")
class(prostate.hex)
summary(prostate.hex)

h2o.impute Basic Imputation of H2O Vectors

Description

Perform simple imputation on a single vector by filling missing values with aggregates computed on the "na.rm’d" vector. Additionally, it’s possible to perform imputation based on groupings of columns from within data; these columns can be passed by index or name to the by parameter. If a factor column is supplied, then the method must be "mode". Any other method stops with an error.

Usage

h2o.impute(data, column = 0, method = c("mean", "median", "mode"), combine_method = c("interpolate", "average", "lo", "hi"), by = NULL, groupByFrame = NULL, values = NULL)


Arguments

data The dataset containing the column to impute.

column The column to impute.

method "mean" replaces NAs with the column mean; "median" replaces NAs with the column median; "mode" replaces NAs with the most common factor level (for factor columns only).

combine_method If method is "median", then choose how to combine quantiles on even sample sizes. This parameter is ignored in all other cases.

by The columns to group by, given as indices or names.

groupByFrame Impute the column with this pre-computed grouped frame.

values A vector of impute values (one per column). NaN indicates that the column should be skipped.

Details

The default method is selected based on the type of the column to impute. If the column is numeric, then "mean" is selected; if it is categorical, then "mode" is selected. Other column types (e.g., String, Time, UUID) are not supported.
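The grouped "mean" method can be pictured with an in-memory data frame. The following is a base R sketch under the assumption that each NA is filled with the mean of the non-missing values sharing its combination of by columns; it is not H2O's implementation, and the helper name impute_mean_by is hypothetical.

```r
# Hypothetical helper: grouped mean imputation in base R.
# Each NA in `column` is replaced by the mean of the non-missing values that
# share the same combination of `by` columns.
impute_mean_by <- function(df, column, by) {
  groups <- interaction(df[by], drop = TRUE)
  group_means <- tapply(df[[column]], groups, mean, na.rm = TRUE)
  missing <- is.na(df[[column]])
  df[[column]][missing] <- group_means[as.character(groups[missing])]
  df
}

df <- data.frame(g = c("a", "a", "b", "b"), x = c(1, NA, 10, NA))
impute_mean_by(df, "x", "g")
```

A group whose values are all missing yields NaN in this sketch, because the mean of an empty set is undefined.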

Value

an H2OFrame with imputed values

Examples

h2o.init()
fr <- as.h2o(iris, destination_frame = "iris")
fr[sample(nrow(fr), 40), 5] <- NA  # randomly replace 40 values with NA
# impute with a group by
fr <- h2o.impute(fr, "Species", "mode", by = c("Sepal.Length", "Sepal.Width"))

h2o.init Initialize and Connect to H2O

Description

Attempts to start and/or connect to an H2O instance.

Usage

h2o.init(ip = "localhost", port = 54321, startH2O = TRUE, forceDL = FALSE, enable_assertions = TRUE, license = NULL, nthreads = -2, max_mem_size = NULL, min_mem_size = NULL, ice_root = tempdir(), strict_version_check = TRUE, proxy = NA_character_, https = FALSE, insecure = FALSE, username = NA_character_, password = NA_character_)


Arguments

ip Object of class character representing the IP address of the server where H2O is running.

port Object of class numeric representing the port number of the H2O server.

startH2O (Optional) A logical value indicating whether to try to start H2O from R if no connection with H2O is detected. This is only possible if ip = "localhost" or ip = "127.0.0.1". If an existing connection is detected, R does not start H2O.

forceDL (Optional) A logical value indicating whether to force download of the H2O executable. Defaults to FALSE, so the executable will only be downloaded if it does not already exist in the h2o R library resources directory h2o/java/h2o.jar. This value is only used when R starts H2O.

enable_assertions

(Optional) A logical value indicating whether H2O should be launched with assertions enabled. Used mainly for error checking and debugging purposes. This value is only used when R starts H2O.

license (Optional) A character string value specifying the full path of the license file. This value is only used when R starts H2O.

nthreads (Optional) Number of threads in the thread pool. This relates very closely to the number of CPUs used. -2 means use the CRAN default of 2 CPUs. -1 means use all CPUs on the host. A positive integer specifies the number of CPUs directly. This value is only used when R starts H2O.

max_mem_size (Optional) A character string specifying the maximum size, in bytes, of the memory allocation pool to H2O. This value must be a multiple of 1024 greater than 2MB. Append the letter m or M to indicate megabytes, or g or G to indicate gigabytes. This value is only used when R starts H2O.

min_mem_size (Optional) A character string specifying the minimum size, in bytes, of the memory allocation pool to H2O. This value must be a multiple of 1024 greater than 2MB. Append the letter m or M to indicate megabytes, or g or G to indicate gigabytes. This value is only used when R starts H2O.

ice_root (Optional) A directory to handle object spillage. The default varies by OS.

strict_version_check

(Optional) Setting this to FALSE is unsupported and should only be done when advised by technical support.

proxy (Optional) A character string specifying the proxy path.

https (Optional) Set this to TRUE to use https instead of http.

insecure (Optional) Set this to TRUE to disable SSL certificate checking.

username (Optional) Username to log in with.

password (Optional) Password to log in with.
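The max_mem_size and min_mem_size rules above (a byte count that is a multiple of 1024 and greater than 2MB, optionally suffixed with m/M or g/G) can be checked before calling h2o.init. The helper below is a hypothetical sketch, not part of the h2o package; the exact validation H2O applies internally is an assumption.

```r
# Hypothetical helper, not part of the h2o package: convert a memory-size
# string such as "5g", "512M" or "4194304" into bytes and check the documented
# constraints (a multiple of 1024 and greater than 2MB).
parse_mem_size <- function(size) {
  parts <- regmatches(size, regexec("^([0-9]+)([mMgG]?)$", size))[[1]]
  if (length(parts) == 0) stop("unrecognised memory size: ", size)
  value <- as.numeric(parts[2])
  unit <- tolower(parts[3])
  multipliers <- c(m = 1024^2, g = 1024^3)
  bytes <- if (unit == "") value else value * multipliers[[unit]]
  if (bytes %% 1024 != 0) stop("memory size must be a multiple of 1024")
  if (bytes <= 2 * 1024^2) stop("memory size must be greater than 2MB")
  bytes
}

parse_mem_size("5g")    # 5 * 1024^3 bytes
parse_mem_size("512M")  # 512 * 1024^2 bytes
```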

Details

By default, this method first checks if an H2O instance is connectible. If it cannot connect and startH2O = TRUE with ip = "localhost", it will attempt to start an instance of H2O at localhost:54321. Otherwise it stops with an error.


When initializing H2O locally, this method searches for h2o.jar in the R library resources (system.file("java", "h2o.jar", package = "h2o")), and if the file does not exist, it will automatically attempt to download the correct version from Amazon S3. The user must have Internet access for this process to be successful.

Once connected, the method checks to see if the local H2O R package version matches the version of H2O running on the server. If there is a mismatch and the user indicates they wish to upgrade, it will remove the local H2O R package and download/install the H2O R package from the server.

Value

Once connected, this method returns an H2OConnection object containing the IP address and port number of the H2O server.

Note

Users may wish to manually upgrade their package (rather than waiting until being prompted), which requires that they fully uninstall and reinstall the H2O package and the H2O client package. You must unload packages running in the environment before upgrading. It’s recommended that users restart R or RStudio after upgrading.

See Also

H2O R package documentation for more details. h2o.shutdown for shutting down from R.

Examples

## Not run:
# Try to connect to a local H2O instance that is already running.
# If not found, start a local H2O instance from R with the default settings.
h2o.init()

# Try to connect to a local H2O instance.
# If not found, raise an error.
h2o.init(startH2O = FALSE)

# Try to connect to a local H2O instance that is already running.
# If not found, start a local H2O instance from R with 5 gigabytes of memory.
h2o.init(max_mem_size = "5g")

## End(Not run)


h2o.insertMissingValues

Insert Missing Values into an H2OFrame

Description

*This is primarily used for testing*. Randomly replaces a user-specified fraction of entries in an H2O dataset with missing values.

Usage

h2o.insertMissingValues(data, fraction = 0.1, seed = -1)

Arguments

data An H2OFrame object representing the dataset.

fraction A number between 0 and 1 indicating the fraction of entries to replace with missing values.

seed A random seed used to select which entries to replace with missing values. The default of seed = -1 will automatically generate a seed in H2O.

WARNING

This will modify the original dataset. Unless this is intended, this function should only be called on a subset of the original.
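The idea of replacing a fraction of entries with missing values can be sketched on an in-memory data frame. This base R version assumes each entry goes missing independently with probability fraction; H2O's exact sampling scheme is not described above. Note one behavioral difference: R's copy-on-modify semantics leave the input data frame unchanged, whereas h2o.insertMissingValues modifies the original dataset.

```r
# Base R sketch (not H2O's implementation): replace each entry with NA
# independently with probability `fraction`. The input is not modified.
insert_missing <- function(df, fraction = 0.1, seed = NULL) {
  stopifnot(fraction >= 0, fraction <= 1)
  if (!is.null(seed)) set.seed(seed)
  mask <- matrix(runif(nrow(df) * ncol(df)) < fraction, nrow = nrow(df))
  df[mask] <- NA
  df
}

iris_miss <- insert_missing(iris, fraction = 0.25, seed = 42)
colSums(is.na(iris_miss))  # per-column count of inserted NAs
```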

Examples

library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris.csv", package = "h2o")
iris.hex <- h2o.importFile(path = irisPath)
summary(iris.hex)
irismiss.hex <- h2o.insertMissingValues(iris.hex, fraction = 0.25)
head(irismiss.hex)
summary(irismiss.hex)


h2o.interaction Categorical Interaction Feature Creation in H2O

Description

Creates a data frame in H2O with n-th order interaction features between categorical columns, as specified by the user.

Usage

h2o.interaction(data, destination_frame, factors, pairwise, max_factors, min_occurrence)

Arguments

data An H2OFrame object containing the categorical columns.

destination_frame

A string indicating the destination key. If empty, this will be auto-generated by H2O.

factors Factor columns (either indices or column names).

pairwise Whether to create pairwise interactions between factors (otherwise create one higher-order interaction). Only applicable if there are 3 or more factors.

max_factors Maximum number of factor levels in pairwise interaction terms (if enforced, one extra catch-all factor level will be made).

min_occurrence Minimum occurrence threshold for factor levels in pairwise interaction terms.

Value

Returns an H2OFrame object.
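To make max_factors and min_occurrence concrete, the base R sketch below builds a pairwise interaction of two categorical vectors, keeps at most max_factors of the most frequent combinations that occur at least min_occurrence times, and collapses the rest into one catch-all level. It is an illustration under those assumptions, not H2O's implementation: the "_" separator, the "other" label, and the helper name pairwise_interaction are all hypothetical.

```r
# Hypothetical helper: pairwise interaction of two categorical vectors with
# frequency-based trimming into a single catch-all level.
pairwise_interaction <- function(a, b, max_factors = 100, min_occurrence = 1,
                                 other = "other") {
  # Combine the two columns level by level, e.g. "x" and "p" become "x_p".
  combined <- interaction(a, b, sep = "_", drop = TRUE)
  # Rank observed combinations by frequency, most frequent first.
  counts <- sort(table(combined), decreasing = TRUE)
  keep <- names(counts)[counts >= min_occurrence]
  keep <- head(keep, max_factors)
  # Everything not kept (including NA combinations) goes into the catch-all level.
  out <- as.character(combined)
  out[!(out %in% keep)] <- other
  factor(out)
}

a <- c("x", "x", "x", "y", "y", "z")
b <- c("p", "p", "p", "q", "q", "p")
pairwise_interaction(a, b, max_factors = 2)
```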

Examples

library(h2o)h2o.init()

# Create some random data
myframe = h2o.createFrame(rows = 20, cols = 5,
                          seed = -12301283, randomize = TRUE, value = 0,
                          categorical_fraction = 0.8, factors = 10, real_range = 1,
                          integer_fraction = 0.2, integer_range = 10,
                          binary_fraction = 0, binary_ones_fraction = 0.5,
                          missing_fraction = 0.2,
                          response_factors = 1)

# Turn integer column into a categorical
myframe[,5] <- as.factor(myframe[,5])
head(myframe, 20)


# Create pairwise interactions
pairwise <- h2o.interaction(myframe, destination_frame = "pairwise",
                            factors = list(c(1,2), c("C2","C3","C4")),
                            pairwise = TRUE, max_factors = 10, min_occurrence = 1)

head(pairwise, 20)
h2o.levels(pairwise, 2)

# Create 5-th order interaction
higherorder <- h2o.interaction(myframe, destination_frame = "higherorder", factors = c(1,2,3,4,5),
                               pairwise = FALSE, max_factors = 10000, min_occurrence = 1)
head(higherorder, 20)

# Limit the number of factors of the "categoricalized" integer column
# to at most 3 factors, and only if they occur at least twice
head(myframe[,5], 20)
trim_integer_levels <- h2o.interaction(myframe, destination_frame = "trim_integers", factors = "C5",
                                       pairwise = FALSE, max_factors = 3, min_occurrence = 2)
head(trim_integer_levels, 20)

# Put all together
myframe <- h2o.cbind(myframe, pairwise, higherorder, trim_integer_levels)
myframe
head(myframe, 20)
summary(myframe)

h2o.is_client Check Client Mode Connection

Description

Check Client Mode Connection

Usage

h2o.is_client()

h2o.killMinus3 Dump the stack into the JVM’s stdout.

Description

A poor man’s profiler, but effective.

Usage

h2o.killMinus3()


h2o.kmeans KMeans Model in H2O

Description

Performs k-means clustering on an H2O dataset.

Usage

h2o.kmeans(training_frame, x, k, model_id, ignore_const_cols = TRUE, max_iterations = 1000, standardize = TRUE, init = c("Furthest", "Random", "PlusPlus"), seed, nfolds = 0, fold_column = NULL, fold_assignment = c("AUTO", "Random", "Modulo"), keep_cross_validation_predictions = FALSE, max_runtime_secs = 0)

Arguments

training_frame An H2OFrame object containing the variables in the model.

x (Optional) A vector containing the data columns on which k-means operates.

k The number of clusters. Must be between 1 and 1e7 inclusive. k may be omitted if the user specifies the initial centers in the init parameter. If k is not omitted in this case, then it should be equal to the number of user-specified centers.

model_id (Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

ignore_const_cols

A logical value indicating whether or not to ignore all the constant columns in the training frame.

max_iterations The maximum number of iterations allowed. Must be between 0

standardize Logical, indicates whether the data should be standardized before running k-means.

init A character string that selects the initial set of k cluster centers. Possible values are "Random": for random initialization, "PlusPlus": for k-means++ initialization, or "Furthest": for initialization at the furthest point from each successive center. Additionally, the user may specify the initial centers as a matrix, data.frame, H2OFrame, or list of vectors. For matrices, data.frames, and H2OFrames, each row of the respective structure is an initial center. For lists of vectors, each vector is an initial center.

seed (Optional) Random seed used to initialize the cluster centroids.

nfolds (Optional) Number of folds for cross-validation. If nfolds >= 2, then validation must remain empty.

fold_column (Optional) Column with cross-validation fold index assignment per observation.

fold_assignment

Cross-validation fold assignment scheme, if fold_column is not specified. Must be "AUTO", "Random" or "Modulo".


keep_cross_validation_predictions

Whether to keep the predictions of the cross-validation models.

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

Value

Returns an object of class H2OClusteringModel.
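For a local point of comparison, base R's stats::kmeans accepts user-specified initial centers as a matrix with one center per row, which matches how the init argument above describes matrix input. The sketch standardizes the columns with scale() as an analogue of standardize = TRUE. This is base R, not H2O: H2O's "Furthest" and "PlusPlus" initializations and its stopping rules are not reproduced.

```r
# Base R analogue using stats::kmeans (not H2O).
x <- scale(as.matrix(iris[, 1:4]))      # analogue of standardize = TRUE
init_centers <- x[c(1, 51, 101), ]      # one initial center per row, so k = 3
fit <- kmeans(x, centers = init_centers, iter.max = 1000)
fit$size                                # number of rows assigned to each cluster
```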

See Also

h2o.cluster_sizes, h2o.totss, h2o.num_iterations, h2o.betweenss, h2o.tot_withinss, h2o.withinss, h2o.centersSTD, h2o.centers

Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.kmeans(training_frame = prostate.hex, k = 10, x = c("AGE", "RACE", "VOL", "GLEASON"))

h2o.levels Return the levels from the requested column.

Description

Return the levels from the requested column.

Usage

h2o.levels(x, i)

Arguments

x An H2OFrame object.

i Optional, the index of the column whose domain is to be returned.

See Also

levels for the base R method.

Examples

h2o.init()
iris.hex <- as.h2o(iris)
h2o.levels(iris.hex, 5)  # returns "setosa" "versicolor" "virginica"


h2o.listTimezones List all of the Time Zones Acceptable by the H2O Cloud.

Description

List all of the Time Zones Acceptable by the H2O Cloud.

Usage

h2o.listTimezones()

h2o.loadModel Load H2O Model from HDFS or Local Disk

Description

Load a saved H2O model from disk.

Usage

h2o.loadModel(path)

Arguments

path The path of the H2O model to be imported.

Value

Returns an H2OModel object of the class corresponding to the type of model built.

See Also

h2o.saveModel, H2OModel

Examples

## Not run:
# library(h2o)
# h2o.init()
# prosPath = system.file("extdata", "prostate.csv", package = "h2o")
# prostate.hex = h2o.importFile(path = prosPath, destination_frame = "prostate.hex")
# prostate.glm = h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"),
#                        training_frame = prostate.hex, family = "binomial", alpha = 0.5)
# glmmodel.path = h2o.saveModel(prostate.glm, dir = "/Users/UserName/Desktop")
# glmmodel.load = h2o.loadModel(glmmodel.path)

## End(Not run)


h2o.logAndEcho Log a message on the server-side logs

Description

This is helpful when running several pieces of work one after the other on a single H2O cluster and you want to make a notation in the H2O server-side log where one piece of work ends and the next piece of work begins.

Usage

h2o.logAndEcho(message)

Arguments

message A character string with the message to write to the log.

Details

h2o.logAndEcho sends a message to H2O for logging. Generally used for debugging purposes.

h2o.logloss Retrieve the Log Loss Value

Description

Retrieves the log loss output for an H2OBinomialMetrics or H2OMultinomialMetrics object. If "train", "valid", and "xval" parameters are FALSE (default), then the training Log Loss value is returned. If more than one parameter is set to TRUE, then a named vector of Log Losses is returned, where the names are "train", "valid" or "xval".

Usage

h2o.logloss(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OModelMetrics object of the correct type.

train Retrieve the training Log Loss

valid Retrieve the validation Log Loss

xval Retrieve the cross-validation Log Loss
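For reference, binomial log loss is the average negative log-likelihood of the true labels under the predicted probabilities. The base R sketch below computes it for 0/1 labels; the clipping constant eps that keeps log() finite is an assumption of this sketch, not a value taken from H2O.

```r
# Sketch of binomial log loss. `actual` holds 0/1 labels and `p` the
# predicted probability of class 1. Probabilities are clipped away from
# 0 and 1 so that log() stays finite; eps is an assumed constant.
binomial_logloss <- function(actual, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(actual * log(p) + (1 - actual) * log(1 - p))
}

binomial_logloss(c(1, 0), c(0.5, 0.5))  # log(2), about 0.693
binomial_logloss(c(1, 0), c(0.9, 0.1))  # smaller: confident and correct
```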


h2o.ls List Keys on an H2O Cluster

Description

Accesses a list of object keys in the running instance of H2O.

Usage

h2o.ls()

Value

Returns a list of hex keys in the current H2O instance.

Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.ls()

h2o.makeGLMModel Set betas of an existing H2O GLM Model

Description

This function allows setting betas of an existing GLM model.

Usage

h2o.makeGLMModel(model, beta)

Arguments

model An H2OModel resulting from an h2o.glm call.

beta A new set of betas (a named vector).


h2o.match Value Matching in H2O

Description

match and %in% return values similar to the base R generic functions.

Usage

h2o.match(x, table, nomatch = 0, incomparables = NULL)

match.H2OFrame(x, table, nomatch = 0, incomparables = NULL)

x %in% table

Arguments

x a categorical vector from an H2OFrame object with values to be matched.

table an R object to match x against.

nomatch the value to be returned in the case when no match is found.

incomparables A vector of values that cannot be matched. Any value in x matching a value in this vector is assigned the nomatch value.
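Since h2o.match is described as returning values similar to the base R generics, the base R behavior is a useful reference. The sketch below shows match() with nomatch = 0, the effect of incomparables, and %in%, all on plain character vectors; that H2O reproduces every detail of this behavior is an assumption.

```r
species <- c("setosa", "virginica", "versicolor")
table_values <- c("setosa", "versicolor")

# Position of each value in the table, 0 when there is no match.
match(species, table_values, nomatch = 0)   # 1 0 2

# Values listed in `incomparables` are never matched.
match(species, table_values, nomatch = 0, incomparables = "setosa")   # 0 0 2

# Logical membership test.
species %in% table_values                   # TRUE FALSE TRUE
```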

See Also

match for base R implementation.

Examples

h2o.init()
hex <- as.h2o(iris)
h2o.match(hex[,5], c("setosa", "versicolor"))

h2o.mean Mean of a column

Description

Obtain the mean of a column of a parsed H2O data object.


Usage

h2o.mean(x, ..., na.rm = TRUE)

## S3 method for class 'H2OFrame'
mean(x, ..., na.rm = TRUE)

Arguments

x An H2OFrame object.

... Further arguments to be passed from or to other methods.

na.rm A logical value indicating whether NA or missing values should be stripped before the computation.

See Also

mean for the base R implementation.

Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
mean(prostate.hex$AGE)

h2o.mean_residual_deviance

Retrieve the Mean Residual Deviance value

Description

Retrieves the Mean Residual Deviance value from an H2O model. If "train", "valid", and "xval" parameters are FALSE (default), then the training Mean Residual Deviance value is returned. If more than one parameter is set to TRUE, then a named vector of Mean Residual Deviances is returned, where the names are "train", "valid" or "xval".

Usage

h2o.mean_residual_deviance(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OModel object.

train Retrieve the training Mean Residual Deviance

valid Retrieve the validation Mean Residual Deviance

xval Retrieve the cross-validation Mean Residual Deviance
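Assuming the standard unit-deviance definitions, the mean residual deviance is the average per-row deviance between actual and predicted values; for a Gaussian model it reduces to the mean squared error. The base R sketch below covers only the Gaussian and Poisson families. Which distribution applies depends on how the model was trained, and this is not H2O's code.

```r
# Sketch: mean of per-row unit deviances for two common families.
mean_residual_deviance <- function(actual, predicted,
                                   family = c("gaussian", "poisson")) {
  family <- match.arg(family)
  unit_deviance <- switch(family,
    # Gaussian: squared error, so the mean equals the MSE.
    gaussian = (actual - predicted)^2,
    # Poisson: 2 * (y * log(y / mu) - (y - mu)), with y * log(y / mu) = 0 at y = 0.
    poisson = 2 * (ifelse(actual > 0, actual * log(actual / predicted), 0) -
                     (actual - predicted))
  )
  mean(unit_deviance)
}

mean_residual_deviance(c(1, 2, 3), c(1, 2, 5))                       # 4/3
mean_residual_deviance(c(0, 2), c(1, 2), family = "poisson")         # 1
```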


Examples

library(h2o)

h <- h2o.init()
fr <- as.h2o(iris)

m <- h2o.deeplearning(x=2:5,y=1,training_frame=fr)

h2o.mean_residual_deviance(m)

h2o.median H2O Median

Description

Compute the median of an H2OFrame.

Usage

h2o.median(x, na.rm = TRUE)

## S3 method for class 'H2OFrame'
median(x, na.rm = TRUE)

Arguments

x An H2OFrame object.

na.rm A logical value indicating whether NAs are omitted.

Examples

h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath, destination_frame = "prostate.hex")
h2o.median(prostate.hex$AGE)


h2o.merge Merge Two H2O Data Frames

Description

Merges two H2OFrame objects by shared column names. Unlike the base R implementation, h2o.merge only supports merging through shared column names.

Usage

h2o.merge(x, y, all.x = FALSE, all.y = FALSE, by.x = NULL, by.y = NULL, method = "hash")

Arguments

x,y H2OFrame objects

all.x If all.x is true, all rows in x will be included, even if there is no matching row in y, and vice-versa for all.y.

all.y see all.x

by.x x columns used for merging.

by.y y columns used for merging.

method auto, radix, or hash (default)

Details

In order for h2o.merge to work in multinode clusters, one of the datasets must be small enough to exist in every node. Currently, this function only supports all.x = TRUE. All other permutations will fail.
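The all.x = TRUE case is a left join: every row of x is kept, and rows without a match in y get missing values in y's columns. Base R's merge() shows the expected shape of the result on the same fruit data used in the example; row order and column types in H2O's output may differ.

```r
# Base R left join (all.x = TRUE) on the shared "fruit" column.
left <- data.frame(fruit = c("apple", "orange", "banana", "lemon", "strawberry", "blueberry"),
                   color = c("red", "orange", "yellow", "yellow", "red", "blue"))
right <- data.frame(fruit = c("apple", "orange", "banana", "lemon", "strawberry", "watermelon"),
                    citrus = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))
merged <- merge(left, right, by = "fruit", all.x = TRUE)
merged  # blueberry has citrus = NA; watermelon (only in right) is dropped
```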

Examples

h2o.init()
left <- data.frame(fruit = c("apple", "orange", "banana", "lemon", "strawberry", "blueberry"),
                   color = c("red", "orange", "yellow", "yellow", "red", "blue"))
right <- data.frame(fruit = c("apple", "orange", "banana", "lemon", "strawberry", "watermelon"),
                    citrus = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))
l.hex <- as.h2o(left)
r.hex <- as.h2o(right)
left.hex <- h2o.merge(l.hex, r.hex, all.x = TRUE)


h2o.metric H2O Model Metric Accessor Functions

Description

A series of functions that retrieve model metric details.

Usage

h2o.metric(object, thresholds, metric)

h2o.F0point5(object, thresholds)

h2o.F1(object, thresholds)

h2o.F2(object, thresholds)

h2o.accuracy(object, thresholds)

h2o.error(object, thresholds)

h2o.maxPerClassError(object, thresholds)

h2o.mcc(object, thresholds)

h2o.precision(object, thresholds)

h2o.tpr(object, thresholds)

h2o.fpr(object, thresholds)

h2o.fnr(object, thresholds)

h2o.tnr(object, thresholds)

h2o.recall(object, thresholds)

h2o.sensitivity(object, thresholds)

h2o.fallout(object, thresholds)

h2o.missrate(object, thresholds)

h2o.specificity(object, thresholds)

Arguments

object An H2OModelMetrics object of the correct type.


thresholds (Optional) A value or a list of values between 0.0 and 1.0.

metric (Optional) A specified parameter to retrieve.

Details

Many of these functions have an optional thresholds parameter. Currently only increments of 0.1 are allowed. If not specified, the functions will return all possible values. Otherwise, the function will return the value for the indicated threshold.

Currently, these functions are only supported by H2OBinomialMetrics objects.
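Most of these accessors are ratios of confusion-matrix counts at a given threshold. The base R sketch below computes several of them from 0/1 labels and predicted probabilities. Treating p >= threshold as a positive prediction is an assumption of this sketch, and the helper name threshold_metrics is hypothetical.

```r
# Hypothetical helper: threshold-based binary classification metrics.
threshold_metrics <- function(actual, p, threshold = 0.5) {
  predicted <- as.integer(p >= threshold)
  tp <- sum(predicted == 1 & actual == 1)
  fp <- sum(predicted == 1 & actual == 0)
  fn <- sum(predicted == 0 & actual == 1)
  tn <- sum(predicted == 0 & actual == 0)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)                     # also called sensitivity or TPR
  c(precision = precision,
    recall = recall,
    f1 = 2 * precision * recall / (precision + recall),
    fpr = fp / (fp + tn),                      # also called fallout
    specificity = tn / (tn + fp),              # also called TNR
    accuracy = (tp + tn) / length(actual))
}

threshold_metrics(actual = c(1, 1, 0, 0), p = c(0.9, 0.4, 0.6, 0.1), threshold = 0.5)
```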

Value

Returns either a single value, or a list of values.

See Also

h2o.auc for AUC, h2o.giniCoef for the GINI coefficient, and h2o.mse for MSE. See h2o.performance for creating H2OModelMetrics objects.

Examples

library(h2o)
h2o.init()

prosPath <- system.file("extdata", "prostate.csv", package="h2o")
hex <- h2o.uploadFile(prosPath)

hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
perf <- h2o.performance(model, hex)
h2o.F1(perf)

h2o.mktime Compute msec since the Unix Epoch

Description

Compute msec since the Unix Epoch

Usage

h2o.mktime(year = 1970, month = 0, day = 0, hour = 0, minute = 0, second = 0, msec = 0)


Arguments

year Defaults to 1970

month zero based (months are 0 to 11)

day zero based (days are 0 to 30)

hour hour

minute minute

second second

msec msec
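Because month and day are zero based, year = 1970 with all other arguments at their defaults is the epoch itself. The base R sketch below reproduces the documented convention by adding 1 to month and day before building the date. It assumes the timestamp is interpreted in UTC; the time zone H2O actually applies may differ (see h2o.listTimezones), and mktime_ms is a hypothetical name.

```r
# Hypothetical helper mirroring the documented zero-based arguments, in UTC.
mktime_ms <- function(year = 1970, month = 0, day = 0, hour = 0, minute = 0,
                      second = 0, msec = 0) {
  t <- ISOdatetime(year, month + 1, day + 1, hour, minute, second, tz = "UTC")
  as.numeric(t) * 1000 + msec
}

mktime_ms()                       # 0: the Unix epoch
mktime_ms(day = 1)                # one day later: 86400000
mktime_ms(second = 1, msec = 250) # 1250
```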

h2o.month Convert Milliseconds to Months in H2O Datasets

Description

Converts the entries of an H2OFrame object from milliseconds to months (on a 1 to 12 scale).

Usage

h2o.month(x)

month(x)

## S3 method for class 'H2OFrame'
month(x)

Arguments

x An H2OFrame object.

Value

An H2OFrame object containing the entries of x converted to months of the year.
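The conversion from milliseconds since the epoch to a 1 to 12 month number can be sketched in base R for a plain numeric vector. This assumes the timestamps are interpreted in UTC, which may not match the time zone configured on the H2O cluster; ms_to_month is a hypothetical name.

```r
# Hypothetical helper: milliseconds since the Unix epoch to month (1 to 12), in UTC.
ms_to_month <- function(ms) {
  t <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
  as.integer(format(t, "%m"))
}

ms_to_month(0)  # 1: January 1970
ms_to_month(as.numeric(ISOdatetime(2016, 2, 22, 0, 0, 0, tz = "UTC")) * 1000)  # 2
```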

See Also

h2o.year


h2o.mse Retrieves Mean Squared Error Value

Description

Retrieves the mean squared error value from an H2OModelMetrics object. If "train", "valid", and "xval" parameters are FALSE (default), then the training MSE value is returned. If more than one parameter is set to TRUE, then a named vector of MSEs is returned, where the names are "train", "valid" or "xval".

Usage

h2o.mse(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OModelMetrics object of the correct type.

train Retrieve the training MSE

valid Retrieve the validation MSE

xval Retrieve the cross-validation MSE

Details

This function only supports H2OBinomialMetrics, H2OMultinomialMetrics, and H2ORegressionMetrics objects.

See Also

h2o.auc for AUC, h2o.mse for MSE, and h2o.metric for the various threshold metrics. See h2o.performance for creating H2OModelMetrics objects.

Examples

library(h2o)
h2o.init()

prosPath <- system.file("extdata", "prostate.csv", package="h2o")
hex <- h2o.uploadFile(prosPath)

hex[,2] <- as.factor(hex[,2])
model <- h2o.gbm(x = 3:9, y = 2, training_frame = hex, distribution = "bernoulli")
perf <- h2o.performance(model, hex)
h2o.mse(perf)


h2o.nacnt Count of NAs per column

Description

Gives the count of NAs per column.

Usage

h2o.nacnt(x)

Arguments

x An H2OFrame object.
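For an in-memory data frame, the same per-column count of NAs is a one-liner in base R, which can be handy for checking results after pulling a small H2OFrame back with as.data.frame.

```r
# Base R per-column NA count on an in-memory data frame.
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "x"))
colSums(is.na(df))  # a = 1, b = 2
```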

Examples

h2o.init()
iris.hex <- as.h2o(iris)
h2o.nacnt(iris.hex)  # should return all 0s
h2o.insertMissingValues(iris.hex)
h2o.nacnt(iris.hex)

h2o.naiveBayes Naive Bayes Model in H2O

Description

Compute naive Bayes probabilities on an H2O dataset.

Usage

h2o.naiveBayes(x, y, training_frame, model_id, ignore_const_cols = TRUE, laplace = 0, threshold = 0.001, eps = 0, compute_metrics = TRUE, max_runtime_secs = 0)

Arguments

x A vector containing the names or indices of the predictor variables to use in building the model.

y The name or index of the response variable. If the data does not contain a header, this is the column index number starting at 1, and increasing from left to right. The response must be a categorical variable with at least two levels.

training_frame An H2OFrame object containing the variables in the model.


model_id (Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

ignore_const_cols

A logical value indicating whether or not to ignore all the constant columns in the training frame.

laplace A positive number controlling Laplace smoothing. The default zero disables smoothing.

threshold The minimum standard deviation to use for observations without enough data. Must be at least 1e-10.

eps A threshold cutoff to deal with numeric instability, must be positive.

compute_metrics

A logical value indicating whether model metrics should be computed. Set to FALSE to reduce the runtime of the algorithm.

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

Details

The naive Bayes classifier assumes independence between predictor variables conditional on the response, and a Gaussian distribution of numeric predictors with mean and standard deviation computed from the training dataset. When building a naive Bayes classifier, every row in the training dataset that contains at least one NA will be skipped completely. If the test dataset has missing values, then those predictors are omitted in the probability calculation during prediction.


Value

Returns an object of class H2OBinomialModel if the response has two categorical levels, and H2OMultinomialModel otherwise.
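The Details above can be illustrated with a textbook naive Bayes for categorical predictors: class priors times per-predictor conditional probabilities with Laplace smoothing, rows containing NA skipped during training, and missing predictors omitted at prediction time. This base R sketch is not H2O's implementation; Gaussian handling of numeric predictors and the threshold and eps parameters are left out, and the helper names are hypothetical.

```r
# Hypothetical helpers: categorical naive Bayes with Laplace smoothing.
nb_train <- function(x, y, laplace = 0) {
  keep <- complete.cases(x, y)                 # rows with any NA are skipped
  x <- x[keep, , drop = FALSE]
  y <- droplevels(factor(y[keep]))
  priors <- table(y) / length(y)
  # Per predictor: class-by-level table of smoothed conditional probabilities.
  conditionals <- lapply(x, function(col) {
    counts <- table(y, factor(col)) + laplace
    counts / rowSums(counts)
  })
  list(priors = priors, conditionals = conditionals)
}

nb_predict_row <- function(model, row) {
  log_post <- log(as.numeric(model$priors))
  names(log_post) <- names(model$priors)
  for (j in names(row)) {
    value <- as.character(row[[j]])
    probs <- model$conditionals[[j]]
    # Missing or unseen predictor values are left out of the product.
    if (is.na(value) || is.null(probs) || !(value %in% colnames(probs))) next
    log_post <- log_post + log(probs[, value])
  }
  post <- exp(log_post - max(log_post))        # subtract max for numerical stability
  post / sum(post)
}

x <- data.frame(a = c("y", "y", "n", "n"), b = c("y", "n", "n", "n"))
y <- c("dem", "dem", "rep", "rep")
model <- nb_train(x, y, laplace = 1)
nb_predict_row(model, list(a = "y", b = "y"))  # dem = 6/7, rep = 1/7
```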

Examples

h2o.init()
votesPath <- system.file("extdata", "housevotes.csv", package="h2o")
votes.hex <- h2o.uploadFile(path = votesPath, header = TRUE)
h2o.naiveBayes(x = 2:17, y = 1, training_frame = votes.hex, laplace = 3)


h2o.nchar String length

Description

String length

Usage

h2o.nchar(x)

Arguments

x The column whose string lengths will be returned.

h2o.networkTest View Network Traffic Speed

Description

View speed with various file sizes.

Usage

h2o.networkTest()

Value

Returns a table listing the network speed for 1B, 10KB, and 10MB.

h2o.nlevels Get the number of factor levels for this frame.

Description

Get the number of factor levels for this frame.

Usage

h2o.nlevels(x)

Arguments

x An H2OFrame object.

See Also

nlevels for the base R method.


h2o.no_progress Disable Progress Bar

Description

Disable Progress Bar

Usage

h2o.no_progress()

h2o.null_deviance Retrieve the null deviance

Description

Retrieve the null deviance. If "train", "valid", and "xval" parameters are FALSE (default), then the training null deviance value is returned. If more than one parameter is set to TRUE, then a named vector of null deviances is returned, where the names are "train", "valid" or "xval".

Usage

h2o.null_deviance(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OModel or H2OModelMetrics object.

train Retrieve the training null deviance

valid Retrieve the validation null deviance

xval Retrieve the cross-validation null deviance


h2o.null_dof Retrieve the null degrees of freedom

Description

Retrieve the null degrees of freedom. If "train", "valid", and "xval" parameters are FALSE (default), then the training null degrees of freedom value is returned. If more than one parameter is set to TRUE, then a named vector of null degrees of freedom is returned, where the names are "train", "valid" or "xval".

Usage

h2o.null_dof(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OModel or H2OModelMetrics object.

train Retrieve the training null degrees of freedom

valid Retrieve the validation null degrees of freedom

xval Retrieve the cross-validation null degrees of freedom

h2o.num_iterations Retrieve the number of iterations.

Description

Retrieve the number of iterations.

Usage

h2o.num_iterations(object)

Arguments

object An H2OClusteringModel object.

... further arguments to be passed on (currently unimplemented)


h2o.openLog View H2O R Logs

Description

Open existing logs of H2O R POST commands and error responses on local disk. Used primarily for debugging purposes.

Usage

h2o.openLog(type)

Arguments

type Currently unimplemented.

See Also

h2o.startLogging, h2o.stopLogging, h2o.clearLog

Examples

## Not run:
h2o.init()

h2o.startLogging()
ausPath = system.file("extdata", "australia.csv", package="h2o")
australia.hex = h2o.importFile(path = ausPath)
h2o.stopLogging()

# Not run to avoid windows being opened during R CMD check
# h2o.openLog("Command")
# h2o.openLog("Error")

## End(Not run)

h2o.parseRaw H2O Data Parsing

Description

The second phase in the data ingestion step.

Usage

h2o.parseRaw(data, destination_frame = "", header = NA, sep = "",
  col.names = NULL, col.types = NULL, na.strings = NULL,
  blocking = FALSE, parse_type = NULL)


Arguments

data An H2OFrame object to be parsed.

destination_frame

(Optional) The hex key assigned to the parsed file.

header (Optional) A logical value indicating whether the first row is the column header. If missing, H2O will automatically try to detect the presence of a header.

sep (Optional) The field separator character. Values on each line of the file are separated by this character. If sep = "", the parser will automatically detect the separator.

col.names (Optional) An H2OFrame object containing a single delimited line with the column names for the file.

col.types (Optional) A vector of column types that H2O will attempt to force onto the columns.

na.strings (Optional) H2O will interpret these strings as missing.

blocking (Optional) If TRUE, the parse call blocks synchronously instead of polling. This can be faster for small datasets, but no progress bar is shown.

parse_type (Optional) Specify which parser type H2O will use. Valid types are "ARFF", "XLS", "CSV", "SVMLight".

Details

Parses the raw data produced by the import phase.
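The parse arguments above mirror the options of a conventional CSV reader. The sketch below is a local base-R analogue using utils::read.csv on an inline string, not an H2O call; the column names, types, and NA markers are made up for illustration.

```r
# Base R analogue, not H2O: header, separator, forced column types, NA strings.
raw <- "id,grade,score\n1,A,3.5\n2,B,NA\n3,A,missing\n"

df <- read.csv(text = raw, header = TRUE, sep = ",",
               colClasses = c("integer", "factor", "numeric"),  # like col.types
               na.strings = c("NA", "missing"))                 # like na.strings
str(df)
```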

h2o.parseSetup Get a parse setup back for the staged data.

Description

Get a parse setup back for the staged data.

Usage

h2o.parseSetup(data, destination_frame = "", header = NA, sep = "",
  col.names = NULL, col.types = NULL, na.strings = NULL,
  parse_type = NULL)

Arguments

data An H2OFrame object to be parsed.

destination_frame

(Optional) The hex key assigned to the parsed file.

header (Optional) A logical value indicating whether the first row is the column header. If missing, H2O will automatically try to detect the presence of a header.


sep (Optional) The field separator character. Values on each line of the file are separated by this character. If sep = "", the parser will automatically detect the separator.

col.names (Optional) An H2OFrame object containing a single delimited line with the column names for the file.

col.types (Optional) A vector of column types that H2O will attempt to force onto the columns.

na.strings (Optional) H2O will interpret these strings as missing.

parse_type (Optional) Specify which parser type H2O will use. Valid types are "ARFF", "XLS", "CSV", "SVMLight".

h2o.performance Model Performance Metrics in H2O

Description

Given a trained H2O model, compute its performance on the given dataset.

Usage

h2o.performance(model, newdata = NULL, train = FALSE, valid = FALSE,
  xval = FALSE, data = NULL)

Arguments

model An H2OModel object

newdata An H2OFrame. The model makes predictions on this dataset and then scores them. The dataset should match the dataset that was used to train the model in terms of column names, types, and dimensions. If newdata is passed in, then train, valid, and xval are ignored.

train A logical value indicating whether to return the training metrics (constructed during training).

valid A logical value indicating whether to return the validation metrics (constructed during training).

xval A logical value indicating whether to return the cross-validation metrics (constructed during training).

data (DEPRECATED) An H2OFrame. This argument is now called 'newdata'.

Value

Returns an object of the H2OModelMetrics subclass.
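Scoring on newdata amounts to predicting on a held-out frame that has the same columns as the training data, then computing metrics on those predictions. A minimal base-R sketch, assuming a linear model on mtcars and mean squared error as the metric (H2OModelMetrics objects report several metrics, not only MSE):

```r
# Base R analogue, not H2O: train on one slice, score on another with the same columns.
train <- mtcars[1:22, ]
test  <- mtcars[23:32, ]

fit  <- lm(mpg ~ wt + hp, data = train)
pred <- predict(fit, newdata = test)   # predictions on the new data

mse  <- mean((test$mpg - pred)^2)      # score the predictions
rmse <- sqrt(mse)
c(MSE = mse, RMSE = rmse)
```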


Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.hex$CAPSULE <- as.factor(prostate.hex$CAPSULE)
prostate.gbm <- h2o.gbm(3:9, "CAPSULE", prostate.hex)
h2o.performance(model = prostate.gbm, newdata=prostate.hex)

h2o.prcomp Principal Components Analysis

Description

Principal components analysis of an H2O data frame using the power method to calculate the singular value decomposition of the Gram matrix.

Usage

h2o.prcomp(training_frame, x, k, model_id, ignore_const_cols = TRUE,
  max_iterations = 1000, transform = c("NONE", "DEMEAN", "DESCALE",
  "STANDARDIZE"), pca_method = c("GramSVD", "Power", "Randomized", "GLRM"),
  use_all_factor_levels = FALSE, compute_metrics = TRUE,
  impute_missing = FALSE, seed, max_runtime_secs = 0)

Arguments

training_frame An H2OFrame object containing the variables in the model.

x (Optional) A vector containing the data columns on which SVD operates.

k The number of principal components to be computed. This must be between 1 and min(ncol(training_frame), nrow(training_frame)) inclusive.

model_id (Optional) The unique hex key assigned to the resulting model. Automatically generated if none is provided.

ignore_const_cols

A logical value indicating whether or not to ignore all the constant columns in the training frame.

max_iterations The maximum number of iterations to run each power iteration loop. Must be between 1 and 1e6 inclusive.

transform A character string that indicates how the training data should be transformed before running PCA. Possible values are "NONE": for no transformation, "DEMEAN": for subtracting the mean of each column, "DESCALE": for dividing by the standard deviation of each column, "STANDARDIZE": for demeaning and descaling, and "NORMALIZE": for demeaning and dividing each column by its range (max - min).


pca_method A character string that indicates how PCA should be calculated. Possible values are "GramSVD": distributed computation of the Gram matrix followed by a local SVD using the JAMA package, "Power": computation of the SVD using the power iteration method, "Randomized": approximate SVD by projecting onto a random subspace (see references), "GLRM": fit a generalized low rank model with an l2 loss function (no regularization) and solve for the SVD using local matrix algebra.

use_all_factor_levels

(Optional) A logical value indicating whether all factor levels should be included in each categorical column expansion. If FALSE, the indicator column corresponding to the first factor level of every categorical variable will be dropped. Defaults to FALSE.

compute_metrics

(Optional) A logical value indicating whether to compute metrics on the training data, which requires additional calculation time. Only used if pca_method = "GLRM". Defaults to TRUE.

impute_missing (Optional) A logical value indicating whether missing values should be imputed with the mean of the corresponding column. This is necessary if too many entries are NA when using methods like GramSVD. Defaults to FALSE.

seed (Optional) Random seed used to initialize the right singular vectors at the beginning of each power method iteration.

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

Value

Returns an object of class H2ODimReductionModel.
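The "GramSVD" method described above can be reproduced on a small scale in base R: standardize the columns, form the Gram matrix, and take a local SVD of it. The sketch below assumes the numeric USArrests data and checks the result against stats::prcomp; it illustrates the idea and is not the H2O implementation.

```r
# Base R sketch of the GramSVD idea, not H2O code.
X <- as.matrix(USArrests)

Z <- scale(X, center = TRUE, scale = TRUE)  # "STANDARDIZE": demean, then divide by sd
G <- crossprod(Z)                           # Gram matrix t(Z) %*% Z, only p x p
s <- svd(G)                                 # small local SVD of the Gram matrix

# Eigenvalues of G are the squared singular values of Z.
sdev     <- sqrt(s$d / (nrow(Z) - 1))       # standard deviations of the components
rotation <- s$v                             # principal directions (up to sign)

ref <- prcomp(X, scale. = TRUE)
rbind(gram_svd = sdev, prcomp = ref$sdev)
```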

References

N. Halko, P.G. Martinsson, J.A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions [http://arxiv.org/abs/0909.4061]. SIAM Rev., Survey and Review section, Vol. 53, num. 2, pp. 217-288, June 2011.

See Also

h2o.svd, h2o.glrm

Examples

library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package="h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
h2o.prcomp(training_frame = australia.hex, k = 8, transform = "STANDARDIZE")


h2o.proj_archetypes Convert Archetypes to Features from H2O GLRM Model

Description

Project each archetype in an H2O GLRM model into the corresponding feature space from the H2O training frame.

Usage

h2o.proj_archetypes(object, data, reverse_transform = FALSE)

Arguments

object An H2ODimReductionModel object that represents the model containing archetypes to be projected.

data An H2OFrame object representing the training data for the H2O GLRM model.

reverse_transform

(Optional) A logical value indicating whether to reverse the transformation from model-building by re-scaling columns and adding back the offset to each column of the projected archetypes.

Value

Returns an H2OFrame object containing the projection of the archetypes down into the originalfeature space, where each row is one archetype.

See Also

h2o.glrm for making an H2ODimReductionModel.

Examples

library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package="h2o")
iris.hex <- h2o.uploadFile(path = irisPath)
iris.glrm <- h2o.glrm(training_frame = iris.hex, k = 4, loss = "Quadratic",
                      multi_loss = "Categorical", max_iterations = 1000)
iris.parch <- h2o.proj_archetypes(iris.glrm, iris.hex)
head(iris.parch)


h2o.quantile Quantiles of H2O Frames.

Description

Obtain and display quantiles for H2O parsed data.

Usage

h2o.quantile(x, probs = c(0.001, 0.01, 0.1, 0.25, 0.333, 0.5, 0.667, 0.75,
  0.9, 0.99, 0.999), combine_method = c("interpolate", "average", "avg",
  "low", "high"), weights_column = NULL, ...)

## S3 method for class H2OFrame
quantile(x, probs = c(0.001, 0.01, 0.1, 0.25, 0.333, 0.5,
  0.667, 0.75, 0.9, 0.99, 0.999), combine_method = c("interpolate", "average",
  "avg", "low", "high"), weights_column = NULL, ...)

Arguments

x An H2OFrame object with a single numeric column.

probs Numeric vector of probabilities with values in [0,1].

combine_method How to combine quantiles for even sample sizes. The default is linear interpolation. For example, if the method is "lo", the lower of the two candidate values is taken. Abbreviations for average, low, and high are acceptable (avg, lo, hi).

weights_column (Optional) String name of the observation weights column in x or an H2OFrame object with a single numeric column of observation weights.

... Further arguments passed to or from other methods.

Details

quantile.H2OFrame, a method for the quantile generic. Obtain and return quantiles for an H2OFrame object.

Value

A vector describing the percentiles at the given cutoffs for the H2OFrame object.
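For an even sample size a requested quantile can fall between two observations, and combine_method decides which value is reported. The helper below is hypothetical (not part of h2o) and assumes the same fractional-rank convention as R's default quantile type 7; H2O's exact convention may differ.

```r
# Hypothetical helper illustrating the combine_method choices; not h2o code.
combine_quantile <- function(x, prob,
                             method = c("interpolate", "average", "low", "high")) {
  method <- match.arg(method)
  x  <- sort(x)
  h  <- (length(x) - 1) * prob + 1   # fractional rank (assumed convention)
  lo <- x[floor(h)]                  # lower neighbouring observation
  hi <- x[ceiling(h)]                # upper neighbouring observation
  switch(method,
         interpolate = lo + (h - floor(h)) * (hi - lo),
         average     = (lo + hi) / 2,
         low         = lo,
         high        = hi)
}

x <- c(1, 2, 3, 4)   # even sample size: the median falls between 2 and 3
sapply(c("interpolate", "average", "low", "high"),
       function(m) combine_quantile(x, 0.5, m))
```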

Examples

# Request quantiles for an H2O parsed data set:
library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
# Request quantiles for a subset of columns in an H2O parsed data set
quantile(prostate.hex[,3])
# Values are not auto-printed inside a for loop, so print explicitly
for(i in 1:ncol(prostate.hex))
  print(quantile(prostate.hex[,i]))

h2o.r2 Retrieve the R2 value

Description

Retrieves the R2 value from an H2O model. If the "train", "valid", and "xval" parameters are all FALSE (default), then the training R2 value is returned. If more than one parameter is set to TRUE, then a named vector of R2 values is returned, where the names are "train", "valid", or "xval".

Usage

h2o.r2(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OModel object.

train Retrieve the training R2

valid Retrieve the validation set R2 if a validation set was passed in during model build time.

xval Retrieve the cross-validation R2

Examples

library(h2o)
h <- h2o.init()
fr <- as.h2o(iris)
m <- h2o.deeplearning(x=2:5,y=1,training_frame=fr)
h2o.r2(m)
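R2 compares the model's residual sum of squares with the total sum of squares around the mean, R2 = 1 - SSE/SST. A base-R check of that formula, assuming a linear model on the same iris columns as the example rather than an H2O deep learning model:

```r
# Base R analogue, not H2O: R2 from sums of squares versus summary.lm.
fit <- lm(Sepal.Length ~ ., data = iris)   # predictors: columns 2:5, as in the example

y   <- iris$Sepal.Length
sse <- sum(residuals(fit)^2)               # residual sum of squares
sst <- sum((y - mean(y))^2)                # total sum of squares
r2  <- 1 - sse / sst
r2
```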


h2o.randomForest Build a Big Data Random Forest Model

Description

Builds a Random Forest Model on an H2OFrame

Usage

h2o.randomForest(x, y, training_frame, model_id, validation_frame = NULL,
  ignore_const_cols = TRUE, checkpoint, mtries = -1, sample_rate = 0.632,
  col_sample_rate_per_tree = 1, build_tree_one_node = FALSE, ntrees = 50,
  max_depth = 20, min_rows = 1, nbins = 20, nbins_top_level,
  nbins_cats = 1024, binomial_double_trees = FALSE,
  balance_classes = FALSE, max_after_balance_size = 5, seed,
  offset_column = NULL, weights_column = NULL, nfolds = 0,
  fold_column = NULL, fold_assignment = c("AUTO", "Random", "Modulo"),
  keep_cross_validation_predictions = FALSE, score_each_iteration = FALSE,
  score_tree_interval = 0, stopping_rounds = 0,
  stopping_metric = c("AUTO", "deviance", "logloss", "MSE", "AUC", "r2",
  "misclassification"), stopping_tolerance = 0.001, max_runtime_secs = 0)

Arguments

x A vector containing the names or indices of the predictor variables to use in building the random forest model.

y The name or index of the response variable. If the data does not contain a header, this is the column index number starting at 1, and increasing from left to right. (The response must be either an integer or a categorical variable.)

training_frame An H2OFrame object containing the variables in the model.

model_id (Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

validation_frame

An H2OFrame object containing the variables in the model. Default is NULL.

ignore_const_cols

A logical value indicating whether or not to ignore all the constant columns in the training frame.

checkpoint Model checkpoint (either a model key or a model object) to resume training with.

mtries Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression, where p is the number of predictors.

sample_rate Sample rate, from 0 to 1.0.

col_sample_rate_per_tree

Column sample rate per tree (from 0.0 to 1.0).


build_tree_one_node

Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets.

ntrees A nonnegative integer that determines the number of trees to grow.

max_depth Maximum depth to grow the tree.

min_rows Minimum number of rows to assign to terminal nodes.

nbins For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point.

nbins_top_level

For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level.

nbins_cats For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.

binomial_double_trees

For binary classification: Build 2x as many trees (one per class) - can lead to higher accuracy.

balance_classes

logical, indicates whether or not to balance training data class counts via over/under-sampling (for imbalanced data)

max_after_balance_size

Maximum relative size of the training data after balancing class counts (can be less than 1.0). Ignored if balance_classes is FALSE, which is the default behavior.

seed Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded

offset_column Specify the offset column.

weights_column Specify the weights column.

nfolds (Optional) Number of folds for cross-validation. If nfolds >= 2, then validation_frame must remain empty.

fold_column (Optional) Column with cross-validation fold index assignment per observation.

fold_assignment

Cross-validation fold assignment scheme, if fold_column is not specified. Must be "AUTO", "Random" or "Modulo".

keep_cross_validation_predictions

Whether to keep the predictions of the cross-validation models.

score_each_iteration

Attempts to score each tree.

score_tree_interval

Score the model after every so many trees. Disabled if set to 0.

stopping_rounds

Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve (by stopping_tolerance) for k = stopping_rounds scoring events. Can only trigger after at least 2k scoring events. Use 0 to disable.


stopping_metric

Metric to use for convergence checking, only used if stopping_rounds > 0. Can be one of "AUTO", "deviance", "logloss", "MSE", "AUC", "r2", "misclassification".

stopping_tolerance

Relative tolerance for metric-based stopping criterion (if relative improvement is not at least this much, stop)

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

... (Currently Unimplemented)
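The stopping_rounds rule can be read as follows: compute k + 1 moving averages of window k over the most recent scoring history, and stop when the newest average is not better than the best of the earlier ones by the relative stopping_tolerance. The helper below is a hypothetical sketch of that reading, not H2O's code, and it assumes a metric with positive values.

```r
# Hypothetical helper (not part of h2o); assumes a positive-valued metric.
should_stop <- function(history, k, tol, more_is_better = FALSE) {
  n <- length(history)
  if (k <= 0 || n < 2 * k) return(FALSE)  # disabled, or fewer than 2k scoring events
  # k + 1 moving averages of window k; element 1 is the most recent window.
  movavg <- vapply(0:k, function(i) mean(history[(n - k + 1 - i):(n - i)]), numeric(1))
  recent      <- movavg[1]
  best_before <- if (more_is_better) max(movavg[-1]) else min(movavg[-1])
  improved <- if (more_is_better) {
    recent > best_before * (1 + tol)      # must beat the best earlier average by tol
  } else {
    recent < best_before * (1 - tol)
  }
  !improved
}

logloss_improving <- 1 / (1:20)                 # keeps decreasing
logloss_flat      <- c(1, 0.8, 0.6, rep(0.5, 9)) # plateaus at 0.5
should_stop(logloss_improving, k = 3, tol = 1e-3)
should_stop(logloss_flat, k = 3, tol = 1e-3)
```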

Value

Creates an H2OModel object of the right type.

See Also

predict.H2OModel for prediction.

h2o.rbind Combine H2O Datasets by Rows

Description

Takes a sequence of H2O data sets and combines them by rows

Usage

h2o.rbind(...)

Arguments

... A sequence of H2OFrame arguments. All datasets must exist on the same H2O instance (IP and port) and contain the same number of columns.

Value

An H2OFrame object containing the combined ... arguments row-wise.
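The column requirement behaves like base R's rbind for data frames, shown below as a local analogue rather than an H2O call: frames with matching columns stack, while a frame with a different number of columns raises an error.

```r
# Base R analogue, not H2O: row-binding needs matching columns.
a  <- data.frame(x = 1:2, y = c("p", "q"))
b  <- data.frame(x = 3L, y = "r")
ab <- rbind(a, b)                    # 3 rows, same 2 columns

bad <- data.frame(x = 4L)            # only one column
res <- tryCatch(rbind(a, bad), error = function(e) e)
inherits(res, "error")               # column counts differ, so rbind fails
```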

See Also

rbind for the base R method.


Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
prostate.rbind <- h2o.rbind(prostate.hex, prostate.hex)
head(prostate.rbind)

h2o.reconstruct Reconstruct Training Data via H2O GLRM Model

Description

Reconstruct the training data and impute missing values from the H2O GLRM model by computing the matrix product of X and Y, and transforming back to the original feature space by minimizing each column's loss function.

Usage

h2o.reconstruct(object, data, reverse_transform = FALSE)

Arguments

object An H2ODimReductionModel object that represents the model to be used for reconstruction.

data An H2OFrame object representing the training data for the H2O GLRM model. Used to set the domain of each column in the reconstructed frame.

reverse_transform

(Optional) A logical value indicating whether to reverse the transformation from model-building by re-scaling columns and adding back the offset to each column of the reconstructed frame.

Value

Returns an H2OFrame object containing the approximate reconstruction of the training data.
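With a quadratic loss and no regularization, the best rank-k product XY is the truncated SVD of the transformed data, so a local analogue of the reconstruction can be written with base R's svd. The sketch below is hypothetical: it uses only the numeric iris columns, assumes the "STANDARDIZE" transform, and applies reverse_transform by rescaling and re-adding the column offsets.

```r
# Hypothetical helper (not part of h2o): rank-k reconstruction via truncated SVD.
reconstruct_lowrank <- function(A, k) {
  Z <- scale(A)                                                # STANDARDIZE transform
  s <- svd(Z)
  X <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k)   # n x k row factors
  Y <- t(s$v[, 1:k, drop = FALSE])                             # k x p archetypes
  Zhat <- X %*% Y                                              # product X Y
  # reverse_transform: multiply back by each column's scale, then add its offset
  Ahat <- sweep(Zhat, 2, attr(Z, "scaled:scale"), "*")
  sweep(Ahat, 2, attr(Z, "scaled:center"), "+")
}

A   <- as.matrix(iris[, 1:4])
err <- sapply(1:4, function(k) sqrt(sum((reconstruct_lowrank(A, k) - A)^2)))
err   # Frobenius error shrinks as k grows and is ~0 at full rank
```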

See Also

h2o.glrm for making an H2ODimReductionModel.


Examples

library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package="h2o")
iris.hex <- h2o.uploadFile(path = irisPath)
iris.glrm <- h2o.glrm(training_frame = iris.hex, k = 4, transform = "STANDARDIZE",
                      loss = "Quadratic", multi_loss = "Categorical", max_iterations = 1000)
iris.rec <- h2o.reconstruct(iris.glrm, iris.hex, reverse_transform = TRUE)
head(iris.rec)

h2o.removeAll Remove All Objects on the H2O Cluster

Description

Removes the data from the h2o cluster, but does not remove the local references.

Usage

h2o.removeAll(timeout_secs = 0)

Arguments

timeout_secs Timeout in seconds. Default is no timeout.

See Also

h2o.rm

Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
h2o.ls()
h2o.removeAll()
h2o.ls()


h2o.removeVecs Delete Columns from an H2OFrame

Description

Delete the specified columns from the H2OFrame. Returns an H2OFrame without the specified columns.

Usage

h2o.removeVecs(data, cols)

Arguments

data The H2OFrame.

cols The columns to remove.

h2o.rep_len Replicate Elements of Vectors or Lists into H2O

Description

h2o.rep_len performs just as rep_len does. It replicates the values in x in the H2O backend.

Usage

h2o.rep_len(x, length.out)

Arguments

x a vector (of any mode including a list) or a factor

length.out A non-negative integer. The desired length of the output vector.

Value

Creates an H2OFrame vector of the same type as x
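Base R's rep_len shows the recycling behavior that h2o.rep_len is described as matching; this is a local example, not an H2O call:

```r
# Base R: recycle x until the output has exactly length.out elements.
rep_len(c(1, 2, 3), length.out = 7)   # 1 2 3 1 2 3 1
rep_len(c("a", "b"), length.out = 3)  # "a" "b" "a"
```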


h2o.residual_deviance Retrieve the residual deviance

Description

Retrieve the residual deviance. If the "train", "valid", and "xval" parameters are all FALSE (default), then the training residual deviance value is returned. If more than one parameter is set to TRUE, then a named vector of residual deviances is returned, where the names are "train", "valid", or "xval".

Usage

h2o.residual_deviance(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OModel or H2OModelMetrics

train Retrieve the training residual deviance

valid Retrieve the validation residual deviance

xval Retrieve the cross-validation residual deviance

h2o.residual_dof Retrieve the residual degrees of freedom

Description

Retrieve the residual degrees of freedom. If the "train", "valid", and "xval" parameters are all FALSE (default), then the training residual degrees of freedom value is returned. If more than one parameter is set to TRUE, then a named vector of residual degrees of freedom is returned, where the names are "train", "valid", or "xval".

Usage

h2o.residual_dof(object, train = FALSE, valid = FALSE, xval = FALSE)


Arguments

object An H2OModel or H2OModelMetrics

train Retrieve the training residual degrees of freedom

valid Retrieve the validation residual degrees of freedom

xval Retrieve the cross-validation residual degrees of freedom
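Residual deviance and residual degrees of freedom describe the fitted model itself. As a base-R analogue (stats::glm on mtcars, not the H2O implementation), the sketch assumes a Gaussian GLM, where the residual deviance is the residual sum of squares and the residual degrees of freedom are n minus the number of estimated coefficients.

```r
# Base R analogue, not H2O: residual deviance and dof of a Gaussian GLM.
fit <- glm(mpg ~ wt + hp, family = gaussian, data = mtcars)
n <- nrow(mtcars)
p <- length(coef(fit))                  # intercept, wt, hp

residual_dof <- fit$df.residual         # n - p
residual_dev <- deviance(fit)           # Gaussian: residual sum of squares
manual_rss   <- sum(residuals(fit, type = "response")^2)
print(c(residual_dof = residual_dof, residual_deviance = residual_dev))
```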

h2o.rm Delete Objects In H2O

Description

Remove the h2o Big Data object(s) having the key name(s) from ids.

Usage

h2o.rm(ids)

Arguments

ids The object or hex key associated with the object to be removed, or a vector/list of those things.

See Also

h2o.assign, h2o.ls

h2o.round Round doubles/floats to the given number of decimal places.

Description

Round doubles/floats to the given number of decimal places.

Usage

h2o.round(x, digits = 0)

round(x, digits = 0)

Arguments

x An H2OFrame object.

digits Number of decimal places to round doubles/floats. Rounding to a negative number of decimal places is


See Also

round for the base R implementation.

h2o.runif Produce a Vector of Random Uniform Numbers

Description

Creates a vector of random uniform numbers equal in length to the length of the specified H2O dataset.

Usage

h2o.runif(x, seed = -1)

Arguments

x An H2OFrame object.

seed A random seed used to generate draws from the uniform distribution.

Value

A vector of random, uniformly distributed numbers. The elements are between 0 and 1.

Examples

library(h2o)
h2o.init()
prosPath = system.file("extdata", "prostate.csv", package="h2o")
prostate.hex = h2o.importFile(path = prosPath, destination_frame = "prostate.hex")
s = h2o.runif(prostate.hex)
summary(s)

prostate.train = prostate.hex[s <= 0.8,]
prostate.train = h2o.assign(prostate.train, "prostate.train")
prostate.test = prostate.hex[s > 0.8,]
prostate.test = h2o.assign(prostate.test, "prostate.test")
nrow(prostate.train) + nrow(prostate.test)


h2o.saveModel Save an H2O Model Object to Disk

Description

Save an H2OModel to disk.

Usage

h2o.saveModel(object, path = "", force = FALSE)

Arguments

object an H2OModel object.

path string indicating the directory the model will be written to.

force logical, indicates how to deal with files that already exist.

Details

In the case of existing files, force = TRUE will overwrite the file. Otherwise, the operation will fail.

See Also

h2o.loadModel for loading a model to H2O from disk

Examples

## Not run:
# library(h2o)
# h2o.init()
# prostate.hex <- h2o.importFile(path = paste("https://raw.github.com",
#   "h2oai/h2o-2/master/smalldata/logreg/prostate.csv", sep = "/"),
#   destination_frame = "prostate.hex")
# prostate.glm <- h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"),
#   training_frame = prostate.hex, family = "binomial", alpha = 0.5)
# h2o.saveModel(object = prostate.glm, path = "/Users/UserName/Desktop", force=TRUE)

## End(Not run)


h2o.scale Scaling and Centering of an H2OFrame

Description

Centers and/or scales the columns of an H2O dataset.

Usage

h2o.scale(x, center = TRUE, scale = TRUE)

## S3 method for class H2OFrame
scale(x, center = TRUE, scale = TRUE)

Arguments

x An H2OFrame object.

center Either a logical value or a numeric vector of length equal to the number of columns of x.

scale Either a logical value or a numeric vector of length equal to the number of columns of x.
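h2o.scale is modeled on base R's scale. The local sketch below, on the numeric iris columns, shows center = TRUE (subtract column means, then divide by column standard deviations) and an explicit numeric center vector with scaling switched off.

```r
# Base R analogue, not H2O: logical versus numeric center/scale arguments.
m  <- as.matrix(iris[, 1:4])
z  <- scale(m)                                          # center = TRUE, scale = TRUE
z2 <- scale(m, center = c(5, 3, 4, 1), scale = FALSE)  # subtract given constants only

round(colMeans(z), 12)   # each column now has mean ~0
apply(z, 2, sd)          # and standard deviation 1
```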

Examples

library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package="h2o")
iris.hex <- h2o.uploadFile(path = irisPath, destination_frame = "iris.hex")
summary(iris.hex)

# Scale and center all the numeric columns in iris data set
scale(iris.hex[, 1:4])

h2o.scoreHistory Retrieve Model Score History

Description

Retrieve Model Score History

Usage

h2o.scoreHistory(object)


Arguments

object An H2OModel object.

h2o.sd Standard Deviation of a column of data.

Description

Obtain the standard deviation of a column of data.

Usage

h2o.sd(x, na.rm = FALSE)

sd(x, na.rm = FALSE)

Arguments

x An H2OFrame object.

na.rm logical. Should missing values be removed?

See Also

h2o.var for variance, and sd for the base R implementation.

Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
sd(prostate.hex$AGE)

h2o.sdev Retrieve the standard deviations of principal components

Description

Retrieve the standard deviations of principal components

Usage

h2o.sdev(object)

Arguments

object An H2ODimReductionModel object.


h2o.setLevels Set Levels of H2O Factor Column

Description

Works on a single categorical vector. New domains must be aligned with the old domains. This call has SIDE EFFECTS and mutates the column in place (does not make a copy).

Usage

h2o.setLevels(x, levels)

Arguments

x A single categorical column.

levels A character vector specifying the new levels. The number of new levels must match the number of old levels.
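Level renaming is positional: the i-th new level replaces the i-th old level. Base R's levels<- behaves the same way on a local factor, as sketched below, except that it works on a copy rather than mutating an H2O column in place.

```r
# Base R analogue, not H2O: new levels are aligned with old levels by position.
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(f) <- c("L", "H")   # "low" -> "L", "high" -> "H"
as.character(f)            # "L" "H" "L"
```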

h2o.setTimezone Set the Time Zone on the H2O Cloud

Description

Set the Time Zone on the H2O Cloud

Usage

h2o.setTimezone(tz)

Arguments

tz The desired timezone.

h2o.show_progress Enable Progress Bar

Description

Enable Progress Bar

Usage

h2o.show_progress()


h2o.shutdown Shut Down H2O Instance

Description

Shut down the specified instance. All data will be lost.

Usage

h2o.shutdown(prompt = TRUE)

Arguments

prompt A logical value indicating whether to prompt the user before shutting down the H2O server.

Details

This method checks if H2O is running at the specified IP address and port, and if it is, shuts down that H2O instance.

WARNING

All data, models, and other values stored on the server will be lost! Only call this function if you and all other clients connected to the H2O server are finished and have saved your work.

Note

Users must call h2o.shutdown explicitly in order to shut down the local H2O instance started by R. If R is closed before H2O, then an attempt will be made to automatically shut down H2O. This only applies to local instances started with h2o.init, not remote H2O servers.

See Also

h2o.init

Examples

# Don't run automatically to prevent accidentally shutting down a cloud
## Not run:
library(h2o)
h2o.init()
h2o.shutdown()

## End(Not run)


h2o.signif Round doubles/floats to the given number of significant digits.

Description

Round doubles/floats to the given number of significant digits.

Usage

h2o.signif(x, digits = 6)

signif(x, digits = 6)

Arguments

x An H2OFrame object.

digits Number of significant digits to round doubles/floats.

See Also

signif for the base R implementation.
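Rounding to decimal places (h2o.round) and rounding to significant digits (h2o.signif) differ in the same way as their base R counterparts, which the following local example uses:

```r
# Base R: decimal places versus significant digits.
x <- c(3.14159, 123456, 0.000123456)
round(x, digits = 2)    # keeps 2 digits after the decimal point
signif(x, digits = 2)   # keeps 2 significant digits, whatever the magnitude
```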

h2o.splitFrame Split an H2O Data Set

Description

Split an existing H2O data set according to user-specified ratios.

Usage

h2o.splitFrame(data, ratios = 0.75, destination_frames, seed = -1)

Arguments

data An H2OFrame object representing the dataset to split.

ratios A numeric value or array indicating the ratio of total rows contained in each split. Must sum to less than 1.

destination_frames

An array of frame IDs equal to the number of ratios specified plus one.

seed Random seed.
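One way to obtain length(ratios) + 1 splits whose expected sizes follow ratios is to draw one uniform number per row and cut at the cumulative ratios, with the remainder going to the last split. The helper below is a hypothetical base-R sketch of that scheme; the procedure inside h2o.splitFrame may differ.

```r
# Hypothetical helper (not part of h2o): split row indices by uniform draws.
split_by_ratios <- function(n, ratios, seed = 42) {
  stopifnot(all(ratios > 0), sum(ratios) < 1)
  set.seed(seed)
  u <- runif(n)                        # one U(0, 1) draw per row
  breaks <- c(0, cumsum(ratios), 1)    # e.g. 0, 0.2, 0.7, 1
  part <- findInterval(u, breaks)      # 1, 2, ..., length(ratios) + 1
  split(seq_len(n), factor(part, levels = seq_len(length(ratios) + 1)))
}

parts <- split_by_ratios(150, ratios = c(0.2, 0.5))
lengths(parts)   # roughly 30, 75 and 45 rows; exact sizes depend on the draws
```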


Examples

library(h2o)
h2o.init()
irisPath = system.file("extdata", "iris.csv", package = "h2o")
iris.hex = h2o.importFile(path = irisPath)
iris.split = h2o.splitFrame(iris.hex, ratios = c(0.2, 0.5))
head(iris.split[[1]])
summary(iris.split[[1]])

h2o.startLogging Start Writing H2O R Logs

Description

Begin logging H2O R POST commands and error responses to local disk. Used primarily for debugging purposes.

Usage

h2o.startLogging(file)

Arguments

file A character string name for the log file; automatically generated if not supplied.

See Also

h2o.stopLogging, h2o.clearLog, h2o.openLog

Examples

library(h2o)
h2o.init()
h2o.startLogging()
ausPath = system.file("extdata", "australia.csv", package="h2o")
australia.hex = h2o.importFile(path = ausPath)
h2o.stopLogging()


h2o.stopLogging Stop Writing H2O R Logs

Description

Halt logging of H2O R POST commands and error responses to local disk. Used primarily for debugging purposes.

Usage

h2o.stopLogging()

See Also

h2o.startLogging, h2o.clearLog, h2o.openLog

Examples

library(h2o)
h2o.init()
h2o.startLogging()
ausPath = system.file("extdata", "australia.csv", package="h2o")
australia.hex = h2o.importFile(path = ausPath)
h2o.stopLogging()

h2o.strsplit String Split

Description

String Split

Usage

h2o.strsplit(x, split)

Arguments

x The column whose strings must be split.

split The pattern to split on.


h2o.sub String Substitute

Description

Creates a copy of the target column in which each string has the first occurrence of the regex pattern replaced with the replacement substring.

Usage

h2o.sub(pattern, replacement, x, ignore.case = FALSE)

Arguments

pattern The pattern to replace.

replacement The replacement pattern.

x The column on which to operate.

ignore.case A logical value indicating whether the pattern match should ignore case.
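h2o.sub is described as replacing only the first match in each string, like base R's sub. The local example below contrasts sub with gsub, which replaces every match, and shows the ignore.case flag:

```r
# Base R analogue, not H2O: first-match substitution versus global substitution.
x <- c("banana", "Apple pie")
sub("a", "_", x)                      # "b_nana" "Apple pie"
gsub("a", "_", x)                     # "b_n_n_" "Apple pie"
sub("a", "_", x, ignore.case = TRUE)  # "b_nana" "_pple pie"
```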

h2o.substring Substring

Description

Returns a copy of the target column that is a substring at the specified start and stop indices, inclusive. If the stop index is not specified, then the substring extends to the end of the original string. If start is longer than the number of characters in the original string, or is greater than stop, an empty string is returned. Negative start is coerced to 0.

Usage

h2o.substring(x, start, stop = "[]")

h2o.substr(x, start, stop = "[]")

Arguments

x The column on which to operate.

start The index of the first element to be included in the substring.

stop (Optional) The index of the last element to be included in the substring.


h2o.summary Summarizes the columns of an H2OFrame.

Description

A method for the summary generic. Summarizes the columns of an H2O data frame or subset of columns and rows using vector notation (e.g. dataset[row, col]).

Usage

h2o.summary(object, factors = 6L, ...)

## S3 method for class H2OFrame
summary(object, factors, ...)

Arguments

object An H2OFrame object.

factors The number of factors to return in the summary. Default is the top 6.

... Further arguments passed to or from other methods.

Value

A table displaying the minimum, 1st quartile, median, mean, 3rd quartile and maximum for each numeric column, and the levels and category counts of the levels in each categorical column.

Examples

library(h2o)
h2o.init()
prosPath = system.file("extdata", "prostate.csv", package="h2o")
prostate.hex = h2o.importFile(path = prosPath)
summary(prostate.hex)
summary(prostate.hex$GLEASON)
summary(prostate.hex[,4:6])


h2o.svd Singular Value Decomposition

Description

Singular value decomposition of an H2O data frame using the power method.

Usage

h2o.svd(training_frame, x, nv, destination_key, max_iterations = 1000,
  transform = "NONE", svd_method = c("GramSVD", "Power", "Randomized"),
  seed, use_all_factor_levels, max_runtime_secs = 0)

Arguments

training_frame An H2OFrame object containing the variables in the model.

x (Optional) A vector containing the data columns on which SVD operates.

nv The number of right singular vectors to be computed. This must be between 1and min(ncol(training_frame), nrow(training_frame)) inclusive.

destination_key

(Optional) The unique hex key assigned to the resulting model. Automaticallygenerated if none is provided.

max_iterations The maximum number of iterations to run each power iteration loop. Must bebetween 1 and 1e6 inclusive.

transform A character string that indicates how the training data should be transformedbefore running PCA. Possible values are: "NONE" for no transformation; "DE-MEAN" for subtracting the mean of each column; "DESCALE" for dividing bythe standard deviation of each column; "STANDARDIZE" for demeaning anddescaling; and "NORMALIZE" for demeaning and dividing each column by itsrange (max - min).

svd_method A character string that indicates how SVD should be calculated. Possible values are "GramSVD": distributed computation of the Gram matrix followed by a local SVD using the JAMA package; "Power": computation of the SVD using the power iteration method; and "Randomized": approximate SVD by projecting onto a random subspace (see References).

seed (Optional) Random seed used to initialize the right singular vectors at the beginning of each power method iteration.

use_all_factor_levels

(Optional) A logical value indicating whether all factor levels should be included in each categorical column expansion. If FALSE, the indicator column corresponding to the first factor level of every categorical variable will be dropped. Defaults to TRUE.

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.
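The five transform options can be sketched in base R as column-wise operations on a numeric matrix. This is an illustration of the definitions given for the transform argument, not H2O's code; the helper name transform_columns is hypothetical, and it assumes the sample standard deviation (n - 1 denominator) computed by stats::sd, which may differ from what H2O uses internally.

```r
# Base-R illustration of the transform options (hypothetical helper).
# Assumes no constant columns (sd or range of 0 would divide by zero).
transform_columns <- function(m, transform = c("NONE", "DEMEAN", "DESCALE",
                                               "STANDARDIZE", "NORMALIZE")) {
  transform <- match.arg(transform)
  mu  <- colMeans(m)                                    # column means
  sdv <- apply(m, 2, stats::sd)                         # column std. deviations
  rng <- apply(m, 2, function(v) max(v) - min(v))       # column ranges
  switch(transform,
    NONE        = m,
    DEMEAN      = sweep(m, 2, mu),                      # x - mean
    DESCALE     = sweep(m, 2, sdv, "/"),                # x / sd
    STANDARDIZE = sweep(sweep(m, 2, mu), 2, sdv, "/"),  # (x - mean) / sd
    NORMALIZE   = sweep(sweep(m, 2, mu), 2, rng, "/")   # (x - mean) / range
  )
}

m <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
transform_columns(m, "STANDARDIZE")
```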


Value

Returns an object of class H2ODimReductionModel.

References

N. Halko, P.G. Martinsson, J.A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions [http://arxiv.org/abs/0909.4061]. SIAM Rev., Survey and Review section, Vol. 53, num. 2, pp. 217-288, June 2011.

Examples

library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
h2o.svd(training_frame = australia.hex, nv = 8)
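The power method named in the Description can be sketched for the leading singular triplet in a few lines of base R: repeatedly multiply a random unit vector by the Gram matrix t(A) %*% A and renormalize until it stops changing. This is a simplified, single-machine illustration, not H2O's distributed implementation; the helper name power_svd1, the tolerance, and the seed default are assumptions, and only the first singular vector is computed (obtaining nv vectors would require deflation or a block method).

```r
# Leading singular value and vectors of A by power iteration on t(A) %*% A.
# Hypothetical helper; computes only the first singular triplet.
power_svd1 <- function(A, max_iterations = 1000, tol = 1e-10, seed = 1234) {
  set.seed(seed)
  gram <- crossprod(A)                 # t(A) %*% A
  v <- stats::rnorm(ncol(A))           # random start for the right vector
  v <- v / sqrt(sum(v^2))
  for (i in seq_len(max_iterations)) {
    v_new <- drop(gram %*% v)
    v_new <- v_new / sqrt(sum(v_new^2))
    delta <- sqrt(sum((v_new - v)^2))  # change between iterations
    v <- v_new
    if (delta < tol) break
  }
  Av <- drop(A %*% v)
  d <- sqrt(sum(Av^2))                 # largest singular value
  list(d = d, u = Av / d, v = v)
}

A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3, byrow = TRUE)
res <- power_svd1(A)
c(power = res$d, lapack = svd(A)$d[1])
```

Convergence speed depends on the gap between the two largest singular values, which is one reason randomized and Gram-based alternatives are offered through svd_method.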

h2o.table Cross Tabulation and Table Creation in H2O

Description

Uses the cross-classifying factors to build a table of counts at each combination of factor levels.

Usage

h2o.table(x, y = NULL, dense = TRUE)

table.H2OFrame(x, y = NULL, dense = TRUE)

Arguments

x An H2OFrame object with at most two columns.

y An H2OFrame similar to x, or NULL.

dense A logical value for dense representation, which lists only non-zero counts, one combination per row. Set to FALSE to expand counts across all combinations.

Value

Returns a tabulated H2OFrame object.


Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath, destination_frame = "prostate.hex")
summary(prostate.hex)

# Counts of the ages of all patients
head(h2o.table(prostate.hex[, 3]))
h2o.table(prostate.hex[, 3])

# Two-way table of ages (rows) and race (cols) of all patients
head(h2o.table(prostate.hex[, c(3, 4)]))
h2o.table(prostate.hex[, c(3, 4)])
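The difference between dense = TRUE and dense = FALSE can be illustrated with base R's table() on local factors: the expanded form has one row for every combination of levels, while the dense form keeps only the combinations that actually occur. This sketch mirrors the layout described for the dense argument; the toy factors are hypothetical, and the column names of H2O's own output may differ.

```r
# Two hypothetical categorical vectors
x <- factor(c("a", "a", "b"), levels = c("a", "b"))
y <- factor(c("u", "v", "u"), levels = c("u", "v", "w"))

# Expanded form: every combination of levels, including zero counts
expanded <- as.data.frame(table(x, y), responseName = "Count")

# Dense form: only the combinations with a non-zero count
dense <- expanded[expanded$Count > 0, , drop = FALSE]

expanded
dense
```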

h2o.tabulate Tabulation between Two Columns of an H2OFrame

Description

Simple co-occurrence-based tabulation of X vs. Y, where X and Y are two Vecs in a given dataset. Uses a histogram of the given resolution in X and Y. Handles numerical and categorical data and missing values. Supports observation weights.

Usage

h2o.tabulate(data, x, y, weights_column = NULL, nbins_x = 50,
  nbins_y = 50)

Arguments

data An H2OFrame object.

x Predictor column.

y Response column.

weights_column (Optional) Observation weights column.

nbins_x Number of bins for the predictor column.

nbins_y Number of bins for the response column.

Value

Returns two TwoDimTables of three columns each:

count_table: X, Y, counts

response_table: X, meanY, counts


Examples

library(h2o)
h2o.init()
df <- as.h2o(iris)
tab <- h2o.tabulate(data = df, x = "Sepal.Length", y = "Petal.Width",
                    weights_column = NULL, nbins_x = 10, nbins_y = 10)
plot(tab)
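The two returned tables can be approximated locally for purely numeric columns: bin X and Y into equal-width histogram bins, sum the observation weights in each (X bin, Y bin) cell, and compute the weighted mean of Y within each X bin. This is a base-R sketch of the idea described above, not H2O's algorithm; the helper name tabulate_sketch is hypothetical, it assumes numeric inputs without missing values, and H2O's bin boundaries and its handling of categoricals and NAs may differ.

```r
# Weighted co-occurrence tabulation of numeric x vs. y (hypothetical helper).
tabulate_sketch <- function(x, y, w = rep(1, length(x)),
                            nbins_x = 50, nbins_y = 50) {
  bin_x <- cut(x, breaks = nbins_x)   # equal-width bins over range(x)
  bin_y <- cut(y, breaks = nbins_y)   # equal-width bins over range(y)

  # count_table: total weight per non-empty (X bin, Y bin) cell
  count_table <- aggregate(list(counts = w),
                           by = list(X = bin_x, Y = bin_y), FUN = sum)

  # response_table: weighted mean of y and total weight per X bin
  sums <- aggregate(list(wy = w * y, counts = w),
                    by = list(X = bin_x), FUN = sum)
  response_table <- data.frame(X = sums$X,
                               meanY = sums$wy / sums$counts,
                               counts = sums$counts)

  list(count_table = count_table, response_table = response_table)
}

tab <- tabulate_sketch(iris$Sepal.Length, iris$Petal.Width,
                       nbins_x = 10, nbins_y = 10)
head(tab$response_table)
```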

h2o.tolower To Lower

Description

Convert the strings in an H2OFrame to lower case.

Usage

h2o.tolower(x)

Arguments

x An H2OFrame object whose strings should be converted to lower case.

h2o.totss Get the Total Sum of Squares

Description

Get the total sum of squares. If the "train", "valid", and "xval" parameters are all FALSE (default), then the training totss value is returned. If more than one parameter is set to TRUE, then a named vector of totss values is returned, where the names are "train", "valid" or "xval".

Usage

h2o.totss(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OClusteringModel object.

train Retrieve the training total sum of squares

valid Retrieve the validation total sum of squares

xval Retrieve the cross-validation total sum of squares


h2o.tot_withinss Get the Total Within-Cluster Sum of Squares

Description

Get the total within-cluster sum of squares. If the "train", "valid", and "xval" parameters are all FALSE (default), then the training tot_withinss value is returned. If more than one parameter is set to TRUE, then a named vector of tot_withinss values is returned, where the names are "train", "valid" or "xval".

Usage

h2o.tot_withinss(object, train = FALSE, valid = FALSE, xval = FALSE)

Arguments

object An H2OClusteringModel object.

train Retrieve the training total within cluster sum of squares

valid Retrieve the validation total within cluster sum of squares

xval Retrieve the cross-validation total within cluster sum of squares
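The sums of squares reported by h2o.totss, h2o.tot_withinss, h2o.withinss and h2o.betweenss are related by totss = tot_withinss + betweenss. The base-R sketch below computes all four quantities for an arbitrary cluster assignment and checks them against stats::kmeans, which reports the same quantities. It only illustrates the definitions; the helper name ss_decomposition is hypothetical, and values reported by an H2O model also depend on H2O's own preprocessing (for example standardization), so they are not directly comparable.

```r
# Sum-of-squares decomposition for a given clustering (hypothetical helper).
ss_decomposition <- function(data, cluster) {
  data <- as.matrix(data)
  grand_mean <- colMeans(data)
  groups <- split(seq_len(nrow(data)), cluster)

  # totss: squared distance of every row to the grand mean
  totss <- sum(sweep(data, 2, grand_mean)^2)

  # withinss: per-cluster squared distance of rows to their cluster mean
  withinss <- vapply(groups, function(idx) {
    block <- data[idx, , drop = FALSE]
    sum(sweep(block, 2, colMeans(block))^2)
  }, numeric(1))

  # betweenss: size-weighted squared distance of cluster means to grand mean
  betweenss <- sum(vapply(groups, function(idx) {
    length(idx) * sum((colMeans(data[idx, , drop = FALSE]) - grand_mean)^2)
  }, numeric(1)))

  list(totss = totss, withinss = unname(withinss),
       tot_withinss = sum(withinss), betweenss = betweenss)
}

set.seed(1)
km <- stats::kmeans(iris[, 1:4], centers = 3)
ss <- ss_decomposition(iris[, 1:4], km$cluster)
ss$totss - (ss$tot_withinss + ss$betweenss)  # approximately zero
```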

h2o.toupper To Upper

Description

Convert the strings in an H2OFrame to upper case.

Usage

h2o.toupper(x)

Arguments

x An H2OFrame object whose strings should be converted to upper case.


h2o.trim Trim Space

Description

Trim leading and trailing white space from strings.

Usage

h2o.trim(x)

Arguments

x The column whose strings should be trimmed.

h2o.unique H2O Unique

Description

Extract unique values in the column.

Usage

h2o.unique(x)

Arguments

x An H2OFrame object.

h2o.var Variance of a column or covariance of columns.

Description

Compute the variance or covariance matrix of one or two H2OFrames.

Usage

h2o.var(x, y = NULL, na.rm = FALSE, use)

var(x, y = NULL, na.rm = FALSE, use)


Arguments

x An H2OFrame object.

y NULL (default) or an H2OFrame. The default is equivalent to y = x.

na.rm logical. Should missing values be removed?

use An optional character string indicating how to handle missing values. This must be one of the following:

See Also

var for the base R implementation. h2o.sd for standard deviation.

Examples

library(h2o)
h2o.init()
prosPath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
var(prostate.hex$AGE)

h2o.varimp Retrieve the variable importance.

Description

Retrieve the variable importance.

Usage

h2o.varimp(object)

Arguments

object An H2OModel object.


h2o.week Convert Milliseconds to Week of Week Year in H2O Datasets

Description

Converts the entries of an H2OFrame object from milliseconds to weeks of the week year (starting from 1).

Usage

h2o.week(x)

week(x)

## S3 method for class 'H2OFrame'
week(x)

Arguments

x An H2OFrame object.

Value

An H2OFrame object containing the entries of x converted to weeks of the week year.

See Also

h2o.month
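A local equivalent of the conversion can be sketched in base R by turning milliseconds since the Unix epoch into a date-time and formatting it with the ISO 8601 week code %V. This assumes that H2O's "week of the week year" follows ISO 8601 week numbering (weeks start on Monday, and the first days of January may belong to week 52 or 53 of the previous week year) and that timestamps are interpreted in UTC; H2O itself uses the cluster time zone (see h2o.setTimezone). The helper name week_from_ms is hypothetical.

```r
# Milliseconds since 1970-01-01 -> ISO 8601 week number (hypothetical helper).
week_from_ms <- function(ms, tz = "UTC") {
  t <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = tz)
  as.integer(format(t, "%V"))
}

# 2016-01-01 (a Friday) still belongs to week 53 of week year 2015;
# 2016-01-04 (a Monday) starts week 1 of week year 2016.
ms <- as.numeric(as.POSIXct(c("2016-01-01", "2016-01-04"), tz = "UTC")) * 1000
week_from_ms(ms)
```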

h2o.weights Retrieve the respective weight matrix

Description

Retrieve the respective weight matrix

Usage

h2o.weights(object, matrix_id = 1)

Arguments

object An H2OModel or H2OModelMetrics

matrix_id An integer, ranging from 1 to the number of layers + 1, that specifies the weight matrix to return.


h2o.which Which indices are TRUE?

Description

Give the TRUE indices of a logical object, allowing for array indices.

Usage

h2o.which(x)

Arguments

x An H2OFrame object.

See Also

which for the base R method.

Examples

library(h2o)
h2o.init()
iris.hex <- as.h2o(iris)
h2o.which(iris.hex[, 1] == 4.4)

h2o.withinss Get the Within SS

Description

Get the within-cluster sum of squares for each cluster.

Usage

h2o.withinss(object)

Arguments

object An H2OClusteringModel object.


h2o.year Convert Milliseconds to Years in H2O Datasets

Description

Convert the entries of an H2OFrame object from milliseconds to years, indexed starting from 1900.

Usage

h2o.year(x)

year(x)

## S3 method for class 'H2OFrame'
year(x)

Arguments

x An H2OFrame object.

Details

This method calls the corresponding function of the MutableDateTime class in Java.

Value

An H2OFrame object containing the entries of x converted to years starting from 1900, e.g. 69 corresponds to the year 1969.

See Also

h2o.month
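Base R's POSIXlt representation uses the same "years since 1900" convention described in the Value section, so the conversion can be sketched locally as below. This is an illustration under the assumption that timestamps are milliseconds since 1970-01-01 interpreted in UTC; H2O uses the cluster time zone instead. The helper name year_from_ms is hypothetical; add 1900 to obtain the calendar year.

```r
# Milliseconds since 1970-01-01 -> years since 1900 (hypothetical helper).
year_from_ms <- function(ms, tz = "UTC") {
  t <- as.POSIXlt(ms / 1000, origin = "1970-01-01", tz = tz)
  t$year  # e.g. 69 for 1969, 70 for 1970
}

year_from_ms(c(0, -86400000))          # 1970-01-01 and 1969-12-31
year_from_ms(c(0, -86400000)) + 1900   # calendar years
```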

H2OClusteringModel-class

The H2OClusteringModel object.

Description

This virtual class represents a clustering model built by H2O.

Details

This object has slots for the key, which is a character string that points to the model key existing in the H2O cloud, and the data used to build the model (an object of class H2OFrame).


Slots

model_id A character string specifying the key for the model fit in the H2O cloud’s key-value store.

algorithm A character string specifying the algorithm that was used to fit the model.

parameters A list containing the parameter settings that were used to fit the model and that differ from the defaults.

allparameters A list containing all parameters used to fit the model.

model A list containing the characteristics of the model returned by the algorithm.

size The number of points in each cluster.

totss Total sum of squared error to grand mean.

withinss A vector of within-cluster sum of squared error.

tot_withinss Total within-cluster sum of squared error.

betweenss Between-cluster sum of squared error.

H2OConnection-class The H2OConnection class.

Description

This class represents a connection to an H2O cloud.

Usage

## S4 method for signature 'H2OConnection'
show(object)

Arguments

object an H2OConnection object.

Details

Because H2O is not a master-slave architecture, there is no restriction on which H2O node is used to establish the connection between R (the client) and H2O (the server).

A new H2O connection is established via the h2o.init() function, which takes as parameters the 'ip' and 'port' of the machine running an instance to connect with. The default behavior is to connect with a local instance of H2O at port 54321, or to boot a new local instance if one is not found at port 54321.


Slots

ip A character string specifying the IP address of the H2O cloud.

port A numeric value specifying the port number of the H2O cloud.

proxy A character string specifying the proxy path of the H2O cloud.

https Set this to TRUE to use https instead of http.

insecure Set this to TRUE to disable SSL certificate checking.

username Username to login with.

password Password to login with.

mutable An H2OConnectionMutableState object to hold the mutable state for the H2O connection.

H2OFrame-Extract Extract or Replace Parts of an H2OFrame Object

Description

Operators to extract or replace parts of H2OFrame objects.

Usage

## S3 method for class 'H2OFrame'
data[row, col, drop = TRUE]

## S3 method for class 'H2OFrame'
x$name

## S3 method for class 'H2OFrame'
x[[i, exact = TRUE]]

## S3 replacement method for class 'H2OFrame'
data[row, col, ...] <- value

## S3 replacement method for class 'H2OFrame'
data$name <- value

## S3 replacement method for class 'H2OFrame'
data[[name]] <- value


Arguments

data object from which to extract element(s) or in which to replace element(s).

row Index specifying the row element(s) to extract or replace. Indices are numeric or character vectors or empty (missing), or will be matched to the names.

col index specifying column element(s) to extract or replace.

drop Unused

x An H2OFrame

name a literal character string or a name (possibly backtick quoted).

i index

exact Controls possible partial matching of [[ when extracting by a character string.

... Further arguments passed to or from other methods.

value To be assigned

H2OGrid-class H2O Grid

Description

A class to contain the information about grid results

Format grid object in user-friendly way

Usage

## S4 method for signature 'H2OGrid'
show(object)

Arguments

object an H2OGrid object.

Slots

grid_id the final identifier of grid

model_ids list of model IDs which are included in the grid object

hyper_names list of parameter names used for grid search

failed_params list of model parameters which caused a failure during model building; it can contain a null value

failure_details list of detailed messages which correspond to failed parameters field

failure_stack_traces list of stack traces corresponding to model failures reported by the failed_params and failure_details fields

failed_raw_params list of failed raw parameters

summary_table table of models built with parameters and metric information.


See Also

H2OModel for the final model types.

H2OModel-class The H2OModel object.

Description

This virtual class represents a model built by H2O.

Usage

## S4 method for signature 'H2OModel'
show(object)

Arguments

object an H2OModel object.

Details

This object has slots for the key, which is a character string that points to the model key existing in the H2O cloud, and the data used to build the model (an object of class H2OFrame).

Slots

model_id A character string specifying the key for the model fit in the H2O cloud’s key-value store.

algorithm A character string specifying the algorithm that was used to fit the model.

parameters A list containing the parameter settings that were used to fit the model and that differ from the defaults.

allparameters A list containing all parameters used to fit the model.

model A list containing the characteristics of the model returned by the algorithm.


H2OModelFuture-class H2O Future Model

Description

A class to contain the information for background model jobs.

Slots

job_key a character key representing the identification of the job process.

model_id the final identifier for the model

See Also

H2OModel for the final model types.

H2OModelMetrics-class The H2OModelMetrics Object.

Description

A class for constructing performance measures of H2O models.

Usage

## S4 method for signature 'H2OModelMetrics'
show(object)

## S4 method for signature 'H2OBinomialMetrics'
show(object)

## S4 method for signature 'H2OMultinomialMetrics'
show(object)

## S4 method for signature 'H2ORegressionMetrics'
show(object)

## S4 method for signature 'H2OClusteringMetrics'
show(object)

## S4 method for signature 'H2OAutoEncoderMetrics'
show(object)

## S4 method for signature 'H2ODimReductionMetrics'
show(object)


Arguments

object An H2OModelMetrics object

housevotes United States Congressional Voting Records 1984

Description

This data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA. The CQA lists nine different types of votes: voted for, paired for, and announced for (these three simplified to yea); voted against, paired against, and announced against (these three simplified to nay); and voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).

Format

A data frame with 435 rows and 17 columns

Source

Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional Quarterly Inc., Washington, D.C., 1985

References

Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

iris Edgar Anderson’s Iris Data

Description

Measurements in centimeters of the sepal length and width and petal length and width, respectively, for three species of iris flowers.

Format

A data frame with 150 rows and 5 columns

Source

Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179-188.

The data were collected by Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2-5.


is.character Check if character

Description

Check if character

Usage

is.character(x)

Arguments

x An H2OFrame object

is.factor Check if factor

Description

Check if factor

Usage

is.factor(x)

Arguments

x An H2OFrame object

is.numeric Check if numeric

Description

Check if numeric

Usage

is.numeric(x)

Arguments

x An H2OFrame object


ModelAccessors Accessor Methods for H2OModel Object

Description

Function accessor methods for various H2O output fields.

Usage

getParms(object)

## S4 method for signature 'H2OModel'
getParms(object)

getCenters(object)

getCentersStd(object)

getWithinSS(object)

getTotWithinSS(object)

getBetweenSS(object)

getTotSS(object)

getIterations(object)

getClusterSizes(object)

## S4 method for signature 'H2OClusteringModel'
getCenters(object)

## S4 method for signature 'H2OClusteringModel'
getCentersStd(object)

## S4 method for signature 'H2OClusteringModel'
getWithinSS(object)

## S4 method for signature 'H2OClusteringModel'
getTotWithinSS(object)

## S4 method for signature 'H2OClusteringModel'
getBetweenSS(object)

## S4 method for signature 'H2OClusteringModel'
getTotSS(object)


## S4 method for signature 'H2OClusteringModel'
getIterations(object)

## S4 method for signature 'H2OClusteringModel'
getClusterSizes(object)

Arguments

object an H2OModel class object.

na.omit.H2OFrame Remove Rows With NAs

Description

Remove Rows With NAs

Usage

## S3 method for class 'H2OFrame'
na.omit(object, ...)

Arguments

object H2OFrame object

... Ignored

names.H2OFrame Column names of an H2OFrame

Description

Column names of an H2OFrame

Usage

## S3 method for class 'H2OFrame'
names(x)

Arguments

x An H2OFrame


Ops.H2OFrame S3 Group Generic Functions for H2O

Description

Methods for group generic functions and H2O objects.

Usage

## S3 method for class 'H2OFrame'
Ops(e1, e2)

## S3 method for class 'H2OFrame'
Math(x, ...)

## S3 method for class 'H2OFrame'
Summary(x, ..., na.rm)

## S3 method for class 'H2OFrame'
!x

## S3 method for class 'H2OFrame'
is.na(x)

## S3 method for class 'H2OFrame'
t(x)

log(x, ...)

log10(x)

log2(x)

log1p(x)

trunc(x, ...)

x %*% y

nrow.H2OFrame(x)


ncol.H2OFrame(x)

## S3 method for class 'H2OFrame'
length(x)

h2o.length(x)

## S3 replacement method for class 'H2OFrame'
names(x) <- value

colnames(x) <- value

Arguments

e1 object

e2 object

x object

... Further arguments passed to or from other methods.

na.rm logical. Whether missing values should be removed.

y object

value To be assigned
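The group generics listed above let a single Ops method handle every arithmetic, comparison, and logical operator, and a single Math method handle functions such as log, sqrt, and trunc; inside the method, .Generic holds the name of the operator or function that triggered dispatch. The sketch below shows this S3 mechanism on a hypothetical toy class that wraps a local numeric vector. It illustrates the dispatch pattern only and is not H2O's implementation, which operates on data held in the H2O cluster; the names toyframe, Ops.toyframe and Math.toyframe are made up for the example.

```r
# Hypothetical toy class wrapping a numeric vector
toyframe <- function(v) structure(list(data = v), class = "toyframe")

# One method serves the whole Ops group (+, -, *, ==, <, &, !, ...).
Ops.toyframe <- function(e1, e2) {
  unwrap <- function(e) if (inherits(e, "toyframe")) e$data else e
  op <- get(.Generic)                  # e.g. `+` when called via a + 1
  if (missing(e2)) {
    return(toyframe(op(unwrap(e1))))   # unary operators such as -a or !a
  }
  toyframe(op(unwrap(e1), unwrap(e2)))
}

# One method serves the whole Math group (log, sqrt, abs, trunc, ...).
Math.toyframe <- function(x, ...) toyframe(get(.Generic)(x$data, ...))

a <- toyframe(c(1, 4, 9))
(a + 1)$data   # arithmetic
(a > 3)$data   # comparison
sqrt(a)$data   # Math group
```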

plot.H2OModel Plot an H2O Model

Description

Plots the training set (and validation set, if available) scoring history for an H2O model.

Usage

## S3 method for class 'H2OModel'
plot(x, timestep = "AUTO", metric = "AUTO", ...)

Arguments

x A fitted H2OModel object for which the scoring history plot is desired.

timestep A unit of measurement for the x-axis.

metric A unit of measurement for the y-axis.

... additional arguments to pass on.

Details

This method dispatches on the type of H2O model to select the correct scoring history. The timestep and metric arguments are restricted to what is available in the scoring history for a particular type of model.


Value

Returns a scoring history plot.

See Also

h2o.deeplearning, h2o.gbm, h2o.glm, h2o.randomForest for model generation in h2o.

Examples

library(h2o)
library(mlbench)
h2o.init()

df <- as.h2o(mlbench::mlbench.friedman1(10000, 1))
rng <- h2o.runif(df, seed = 1234)
train <- df[rng < 0.8, ]
valid <- df[rng >= 0.8, ]

gbm <- h2o.gbm(x = 1:10, y = "y", training_frame = train, validation_frame = valid,
               ntrees = 500, learn_rate = 0.01, score_each_iteration = TRUE)

plot(gbm)
plot(gbm, timestep = "duration", metric = "deviance")
plot(gbm, timestep = "number_of_trees", metric = "deviance")
plot(gbm, timestep = "number_of_trees", metric = "MSE")

plot.H2OTabulate Plot an H2O Tabulate Heatmap

Description

Plots the simple co-occurrence-based tabulation of X vs. Y as a heatmap, where X and Y are two Vecs in a given dataset.

Usage

## S3 method for class 'H2OTabulate'
plot(x, xlab = x$cols[1], ylab = x$cols[2],
  base_size = 12, ...)

Arguments

x An H2OTabulate object for which the heatmap plot is desired.

xlab A title for the x-axis. Defaults to what is specified in the given H2OTabulate object.


ylab A title for the y-axis. Defaults to what is specified in the given H2OTabulate object.

base_size Base font size for the plot.

... Additional arguments to pass on.

Value

Returns a ggplot2-based heatmap of co-occurrence.

See Also

h2o.tabulate

Examples

library(h2o)
h2o.init()
df <- as.h2o(iris)
tab <- h2o.tabulate(data = df, x = "Sepal.Length", y = "Petal.Width",
                    weights_column = NULL, nbins_x = 10, nbins_y = 10)
plot(tab)

predict.H2OModel Predict on an H2O Model

Description

Obtains predictions from various fitted H2O model objects.

Usage

## S3 method for class 'H2OModel'
predict(object, newdata, ...)

h2o.predict(object, newdata, ...)

Arguments

object A fitted H2OModel object for which prediction is desired.

newdata An H2OFrame object in which to look for variables with which to predict.

... Additional arguments to pass on.

Details

This method dispatches on the type of H2O model to select the correct prediction/scoring algorithm. The order of the rows in the results is the same as the order in which the data was loaded, even if some rows fail (for example, due to missing values or unseen factor levels).


Value

Returns an H2OFrame object with probabilities and default predictions.

See Also

h2o.deeplearning, h2o.gbm, h2o.glm, h2o.randomForest for model generation in h2o.

print.H2OFrame Print An H2OFrame

Description

Print An H2OFrame

Usage

## S3 method for class 'H2OFrame'
print(x, ...)

Arguments

x An H2OFrame object

... Further arguments to be passed from or to other methods.

print.H2OTable Print method for H2OTable objects

Description

This will print a truncated view of the table if there are more than 20 rows.

Usage

## S3 method for class 'H2OTable'
print(x, header = TRUE, ...)

Arguments

x An H2OTable object

header A logical value dictating whether or not the table name should be printed.

... Further arguments passed to or from other methods.

Value

The original x object


prostate Prostate Cancer Study

Description

Baseline exam results on prostate cancer patients from Dr. Donn Young at The Ohio State University Comprehensive Cancer Center.

Format

A data frame with 380 rows and 9 columns

Source

Hosmer and Lemeshow (2000) Applied Logistic Regression: Second Edition.

range.H2OFrame Range of an H2O Column

Description

Range of an H2O Column

Usage

## S3 method for class 'H2OFrame'
range(..., na.rm = TRUE)

Arguments

... An H2OFrame object.

na.rm Logical; whether to ignore missing values.


str.H2OFrame Display the structure of an H2OFrame object

Description

Display the structure of an H2OFrame object

Usage

## S3 method for class 'H2OFrame'
str(object, ..., cols = FALSE)

Arguments

object An H2OFrame.

... Further arguments to be passed from or to other methods.

cols Print the per-column str for the H2OFrame

summary,H2OGrid-method

Format grid object in user-friendly way

Description

Format grid object in user-friendly way

Usage

## S4 method for signature 'H2OGrid'
summary(object, show_stack_traces = FALSE)

Arguments

object an H2OGrid object.

show_stack_traces

a flag to show stack traces for model failures


summary,H2OModel-method

Print the Model Summary

Description

Print the Model Summary

Usage

## S4 method for signature 'H2OModel'
summary(object, ...)

Arguments

object An H2OModel object.

... further arguments to be passed on (currently unimplemented)

walking Muscular Actuations for Walking Subject

Description

The musculoskeletal model, experimental data, settings files, and results for three-dimensional, muscle-actuated simulations at walking speed as described in Hamner and Delp (2013). Simulations were generated using OpenSim 2.4. The data is available from https://simtk.org/project/xml/downloads.xml?group_id=603.

Format

A data frame with 151 rows and 124 columns

References

Hamner, S.R., Delp, S.L. Muscle contributions to fore-aft and vertical body mass center accelerations over a range of running speeds. Journal of Biomechanics, vol 46, pp 780-787. (2013)


zzz Shutdown H2O cloud after examples run

Description

Shutdown H2O cloud after examples run

Examples

library(h2o)
h2o.init()
h2o.shutdown(prompt = FALSE)
Sys.sleep(3)
