Sort columns

Use the sort function in Python or the h2o.arrange function in R to create a new frame that is sorted by one or more columns in ascending (default) or descending order.

If you specify only one column, the result is sorted by that column in ascending (default) or descending order. If you specify more than one column, H2O-3 sorts as described below.

Assuming two columns, X (first column) and Y (second column):

  • H2O-3 sorts on the first specified column first. With the column index list [0,1], rows are ordered by X first; with [1,0], rows are ordered by Y first.

  • H2O-3 then sorts on each subsequent column in the order specified, but only within groups of rows that share the same value in the first sorted column. Subsequent columns act only as tie-breakers: rows whose first-column values differ are never reordered by a later column.
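The ordering rule above can be sketched without a running cluster. The snippet below is an illustration only: plain Python tuples stand in for the rows of a frame, the sample values are made up, and the built-in sorted function stands in for the H2O-3 sort. It shows how the second column only decides the order among rows that tie on the first.

```python
# Illustration only: plain Python tuples stand in for rows of an H2O frame.
# Each tuple is an (X, Y) pair; the values are made up for this sketch.
rows = [(2, 9), (1, 5), (2, 3), (1, 7), (3, 1)]

# Analogous to sorting on [0, 1]: order by X, then by Y among rows with equal X.
by_x_then_y = sorted(rows, key=lambda r: (r[0], r[1]))

# Analogous to sorting on [1, 0]: order by Y, then by X among rows with equal Y.
by_y_then_x = sorted(rows, key=lambda r: (r[1], r[0]))

print(by_x_then_y)  # [(1, 5), (1, 7), (2, 3), (2, 9), (3, 1)]
print(by_y_then_x)  # [(3, 1), (2, 3), (1, 5), (1, 7), (2, 9)]
```

In the first result, Y is ordered only inside the X = 1 and X = 2 groups. In the second, no two rows share a Y value, so X never affects the order.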

import h2o
h2o.init()

# Import the smallIntFloats dataset
df1 = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/synthetic/smallIntFloats.csv.zip")
df1
              C1               C10
----------------  ----------------
 68379                -1.61867e+07
     6.71089e+07   32768
 32768                -8.70946e+08
    32            131072
     2.68435e+08     -29.1003
     1.05383e+08      -2.39721e+08
350191             21551.4
  -188                 2.39872e+07
   493               525.825
     9.31041e+07      -1.63828e+08

[180000 rows x 2 columns]

# Sort on the first column only in ascending order (default)
sorted_column_indices = [0]
df2 = df1.sort(sorted_column_indices)
df2
          C1               C10
------------  ----------------
-1.07359e+09  747438
-1.07356e+09      -2.09715e+06
-1.07352e+09       5.11077e+06
-1.07342e+09       2.22094e+06
-1.07336e+09      -5.7076
-1.07336e+09   -4650.33
-1.07326e+09      -1.04858e+06
-1.07307e+09    8192
-1.07291e+09      -1.49017
-1.07291e+09   -9337.5

[180000 rows x 2 columns]

# Sort on both columns in descending order, with C1 as the first sort column
df3 = df1.sort([0,1], ascending=[False, False])
df3
         C1                C10
-----------  -----------------
1.07359e+09      256
1.07352e+09     -128
1.07326e+09       15.6169
1.07307e+09        1.88421
1.07276e+09      441.817
1.07267e+09     -512
1.07233e+09     1444.14
1.07184e+09  -231812
1.07096e+09        2.00296e+07
1.07082e+09        5.36871e+08

[180000 rows x 2 columns]

# Sort on the second column in descending order
df4 = df1.sort(1, ascending=False)
df4
               C1          C10
-----------------  -----------
      3.21418e+08  1.07366e+09
    448            1.07357e+09
     85            1.07329e+09
  -4096            1.07291e+09
     28            1.07289e+09
     -4.1943e+06   1.07275e+09
      6.61688e+06  1.07254e+09
 -50127            1.07235e+09
-262144            1.07207e+09
     55            1.07175e+09

[180000 rows x 2 columns]
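The Python examples above pass one ascending flag per sort column. As a rough sketch of what those flags mean, the pure-Python snippet below applies a direction per column to made-up rows. The sort_rows helper is hypothetical and is not part of the h2o package. Assuming the flags may differ per column, a call such as df1.sort([0, 1], ascending=[True, False]) should sort C1 in ascending order and break ties on C10 in descending order, although that combination is not shown in the examples above.

```python
# Illustration only: plain Python tuples stand in for rows of an H2O frame.
# Each tuple is a (C1, C10) pair; the values are made up for this sketch.
rows = [(1, 5.0), (2, 3.0), (1, 7.0), (2, 9.0), (3, 1.0)]

def sort_rows(rows, by, ascending):
    """Hypothetical helper: sort by the columns in `by`, one direction flag per column."""
    if len(by) != len(ascending):
        raise ValueError("need one ascending flag per sort column")
    result = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last gives the same order as a single multi-key sort.
    for col, asc in reversed(list(zip(by, ascending))):
        result.sort(key=lambda r: r[col], reverse=not asc)
    return result

# Both columns descending, C1 first (mirrors ascending=[False, False]).
both_desc = sort_rows(rows, by=[0, 1], ascending=[False, False])

# C1 ascending, ties broken by C10 descending (assumed mixed-direction case).
mixed = sort_rows(rows, by=[0, 1], ascending=[True, False])

print(both_desc)  # [(3, 1.0), (2, 9.0), (2, 3.0), (1, 7.0), (1, 5.0)]
print(mixed)      # [(1, 7.0), (1, 5.0), (2, 9.0), (2, 3.0), (3, 1.0)]
```

The helper sorts in several stable passes only to keep the per-column direction explicit; it says nothing about how H2O-3 implements its distributed sort.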

# In R, columns are sorted in ascending order by default; wrap a column in desc() to sort it in descending order.
library(h2o)
h2o.init()

# Import the smallIntFloats dataset
X <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/synthetic/smallIntFloats.csv.zip")
X
         C1           C10
1     68379 -1.618668e+07
2  67108864  3.276800e+04
3     32768 -8.709456e+08
4        32  1.310720e+05
5 268435456 -2.910033e+01
6 105383117 -2.397206e+08

[180000 rows x 2 columns]

# Sort on the first column only in ascending order (default)
X_sorted1 <- h2o.arrange(X, C1)
X_sorted1
           C1           C10
1 -1073593184  7.474380e+05
2 -1073563127 -2.097152e+06
3 -1073521109  5.110769e+06
4 -1073416724  2.220942e+06
5 -1073361973 -5.707598e+00
6 -1073357712 -4.650334e+03

[180000 rows x 2 columns]

# Sort on both columns in descending order, with C1 as the first sort column
X_sorted2 <- h2o.arrange(X, desc(C1), desc(C10))
X_sorted2
          C1         C10
1 1073593184  256.000000
2 1073521109 -128.000000
3 1073257966   15.616867
4 1073072648    1.884208
5 1072757094  441.816579
6 1072669626 -512.000000

[180000 rows x 2 columns]

# Sort on the second column in descending order
X_sorted3 <- h2o.arrange(X, desc(C10))
X_sorted3
         C1        C10
1 321417689 1073662860
2       448 1073574390
3        85 1073288384
4     -4096 1072908385
5        28 1072890306
6  -4194304 1072750253

[180000 rows x 2 columns]