``pca_method``
--------------

- Available in: PCA
- Hyperparameter: no

Description
~~~~~~~~~~~

Use the ``pca_method`` parameter to specify the algorithm used to compute the principal components. Available options are:

- **GramSVD**: Uses a distributed computation of the Gram matrix, followed by a local SVD using the JAMA package
- **Power**: Computes the SVD using the power iteration method (experimental)
- **Randomized**: Uses the randomized subspace iteration method
- **GLRM**: Fits a generalized low-rank model with an L2 loss function and no regularization, and solves for the SVD using local matrix algebra (experimental)

**Note**: For ``pca_method = Randomized``, the algorithm must deal with matrices of size *m* by *k* and *n* by *k*, where

- *m* is the number of rows,
- *n* is the expanded column size, and
- *k* is the number of eigenvectors desired.

As a result, there is no advantage to be gained by trying to find the eigenvectors of the matrix transpose. In other words, when using PCA with wide datasets, users should not choose the ``Randomized`` method.

Related Parameters
~~~~~~~~~~~~~~~~~~

- None

Example
~~~~~~~

.. tabs::
   .. code-tab:: r R

      library(h2o)
      h2o.init()

      # Load the Birds dataset
      birds <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")

      # Train using the Power pca_method
      birds_pca <- h2o.prcomp(training_frame = birds,
                              transform = "STANDARDIZE",
                              k = 3,
                              pca_method = "Power",
                              use_all_factor_levels = TRUE,
                              impute_missing = TRUE)

      # View the importance of components
      birds_pca@model$importance
      Importance of components:
                                  pc1      pc2      pc3
      Standard deviation     1.496991 1.351000 1.014182
      Proportion of Variance 0.289987 0.236184 0.133098
      Cumulative Proportion  0.289987 0.526171 0.659269

      # View the eigenvectors
      birds_pca@model$eigenvectors
      Rotation:
                        pc1      pc2       pc3
      patch.Ref1a  0.007207 0.007449  0.001161
      patch.Ref1b -0.003090 0.011257 -0.001066
      patch.Ref1c  0.002962 0.008850 -0.000264
      patch.Ref1d -0.001295 0.011003  0.000501
      patch.Ref1e  0.006559 0.006904 -0.001206

      ---
                      pc1       pc2       pc3
      S          0.463591 -0.053410  0.184799
      year      -0.055934  0.009691 -0.968635
      area       0.533375 -0.289381 -0.130338
      log.area.  0.583966 -0.262287 -0.089582
      ENN       -0.270615 -0.573900  0.038835
      log.ENN.  -0.231368 -0.640231  0.026325

      # Train again using the GLRM pca_method
      birds2_pca <- h2o.prcomp(training_frame = birds,
                               transform = "STANDARDIZE",
                               k = 3,
                               pca_method = "GLRM",
                               use_all_factor_levels = TRUE,
                               impute_missing = TRUE)

      # View the importance of components
      birds2_pca@model$importance
      Importance of components:
                                  pc1      pc2      pc3
      Standard deviation     2.659459 0.700971 0.404706
      Proportion of Variance 0.915223 0.063583 0.021194
      Cumulative Proportion  0.915223 0.978806 1.000000

      # View the eigenvectors
      birds2_pca@model$eigenvectors
      Rotation:
                        pc1      pc2       pc3
      patch.Ref1a -0.092008 0.030110 -0.018916
      patch.Ref1b -0.107461 0.040519  0.076546
      patch.Ref1c -0.103785 0.059700  0.016164
      patch.Ref1d -0.105764 0.044823  0.062234
      patch.Ref1e -0.102115 0.058994 -0.037536

      ---
                     pc1       pc2       pc3
      S         0.003558  0.111264 -0.422437
      year      0.000008 -0.004418  0.032813
      area      0.004551  0.049496 -0.444745
      log.area. 0.002756  0.066183 -0.453866
      ENN       0.013259 -0.274711 -0.053960
      log.ENN.  0.009517 -0.282830 -0.107461

   .. code-tab:: python

      import h2o
      h2o.init()
      from h2o.estimators.pca import H2OPrincipalComponentAnalysisEstimator

      # Load the Birds dataset
      birds = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")

      # Train with the Power pca_method
      birds_pca = H2OPrincipalComponentAnalysisEstimator(k = 3,
                                                         transform = "STANDARDIZE",
                                                         pca_method="Power",
                                                         use_all_factor_levels=True,
                                                         impute_missing=True)
      birds_pca.train(x=list(range(4)), training_frame=birds)

      # View the importance of components
      birds_pca.varimp(use_pandas=False)
      [(u'Standard deviation',
        1.0505993078459912,
        0.8950182545325247,
        0.5587566783073901),
       (u'Proportion of Variance',
        0.28699613488673914,
        0.20828865401845226,
        0.08117966990084355),
       (u'Cumulative Proportion',
        0.28699613488673914,
        0.4952847889051914,
        0.5764644588060349)]

      # View the eigenvectors
      birds_pca.rotation()
      Rotation:
                         pc1                 pc2                pc3
      -----------------  ------------------  -----------------  ----------------
      patch.Ref1a        0.00732398141913    -0.0141576160836   0.0294419461081
      patch.Ref1b        -0.00482860843905   0.00867426840498   0.0330778190153
      patch.Ref1c        0.00124768649004    -0.00274167383932  0.0312598825617
      patch.Ref1d        -0.000370181920761  0.000297923901103  0.0317439245635
      patch.Ref1e        0.00223394447742    -0.00459462277502  0.0309648089406
      ---                ---                 ---                ---
      landscape.Bauxite  -0.0638494513759    0.136728811833     0.118858152002
      landscape.Forest   0.0378085502606     -0.0833578672691   0.969316569884
      landscape.Urban    -0.0545759062856    0.111309410422     0.0354475756223
      S                  0.564501605704      -0.767095710638    -0.0466832766991
      year               -0.814596906726     -0.577331674836    -0.0101626722479

      See the whole table with table.as_data_frame()

      # Train again with the GLRM pca_method
      birds2 = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")
      birds2_pca = H2OPrincipalComponentAnalysisEstimator(k = 3,
                                                          transform = "STANDARDIZE",
                                                          pca_method="GLRM",
                                                          use_all_factor_levels=True,
                                                          impute_missing=True)
      birds2_pca.train(x=list(range(4)), training_frame=birds2)

      # View the importance of components
      birds2_pca.varimp(use_pandas=False)
      [(u'Standard deviation',
        1.9286830840160667,
        0.2896650415698226,
        0.2053712844270903),
       (u'Proportion of Variance',
        0.9672162180423401,
        0.021816948059531167,
        0.01096683389812861),
       (u'Cumulative Proportion',
        0.9672162180423401,
        0.9890331661018713,
        0.9999999999999999)]

      # View the eigenvectors
      birds2_pca.rotation()
      Rotation:
                         pc1                 pc2                pc3
      -----------------  -----------------   -----------------  -----------------
      patch.Ref1a        -0.0973454860413    0.0233748845619    -0.0407839669099
      patch.Ref1b        -0.0979880717715    -0.0167446302072   -0.0162149496631
      patch.Ref1c        -0.0971529563124    0.00536661170128   -0.0177009628488
      patch.Ref1d        -0.100657197505     0.00754923938494   -0.018364320893
      patch.Ref1e        -0.0982933822825    0.0158116058361    -0.0193764027317
      ---                ---                 ---                ---
      landscape.Bauxite  -0.0248166745792    -0.504864083913    0.074374750806
      landscape.Forest   -0.0296555294277    0.232678445269     -0.537738667852
      landscape.Urban    -0.0733909967344    -0.112998988851    0.0347355699687
      S                  0.00878461186804    0.649068763107     -0.130282514102
      year               -0.000583301909773  -0.0765116904321   -0.69416666169

      # See the whole table with table.as_data_frame()
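
To make the matrix shapes mentioned in the note on ``Randomized`` more concrete, the following is a minimal NumPy sketch of two of the ideas named in the Description: an SVD obtained from the *n* by *n* Gram matrix, and a randomized subspace iteration that only ever holds *m* by *k* and *n* by *k* matrices. It is an illustration under stated assumptions, not H2O's implementation: the function names ``gram_svd`` and ``randomized_svd``, the iteration count, and the synthetic matrix ``X`` are all hypothetical, and ``X`` is treated as an already standardized data matrix.

```python
import numpy as np


def gram_svd(X, k):
    """Top-k singular values and right singular vectors via the Gram matrix."""
    gram = X.T @ X                                # n x n Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    singular_values = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return singular_values, eigvecs[:, order]     # shapes (k,) and (n, k)


def randomized_svd(X, k, n_iter=10, seed=0):
    """Top-k singular values/vectors via randomized subspace iteration."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Random orthonormal starting basis for the row space: n x k
    Q = np.linalg.qr(rng.standard_normal((n, k)))[0]
    for _ in range(n_iter):
        Y = np.linalg.qr(X @ Q)[0]                # m x k
        Q = np.linalg.qr(X.T @ Y)[0]              # n x k
    # Project onto the subspace and solve a small SVD (B is m x k)
    B = X @ Q
    _, s, Wt = np.linalg.svd(B, full_matrices=False)
    V = Q @ Wt.T                                  # n x k right singular vectors
    return s, V


# Synthetic m x n matrix with a known, well-separated spectrum (illustrative only)
rng = np.random.default_rng(42)
m, n, k = 200, 6, 3
U_true = np.linalg.qr(rng.standard_normal((m, n)))[0]
V_true = np.linalg.qr(rng.standard_normal((n, n)))[0]
spectrum = np.array([10.0, 5.0, 2.0, 0.1, 0.05, 0.01])
X = U_true @ np.diag(spectrum) @ V_true.T

s_gram, V_gram = gram_svd(X, k)
s_rand, V_rand = randomized_svd(X, k)
print("GramSVD singular values:   ", s_gram)
print("Randomized singular values:", s_rand)
```

In this sketch the randomized routine works with one *m* by *k* and one *n* by *k* matrix regardless of whether ``X`` or its transpose is passed in, which is the reason the note gives for there being nothing to gain from transposing a wide dataset.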