Kubernetes Examples
This section provides complete examples of using the Enterprise Steam Python client on Kubernetes.
Launching and connecting to H2O cluster
This example shows how to log in to Steam and launch an H2O cluster with 4 nodes and 10 GB of memory per node.
The H2O cluster uses H2O version 3.28.0.2 and the profile called default-h2o.
All other H2O parameters are pre-filled according to the selected profile.
Once the cluster is up, we connect to it and start importing data.
import h2o
import h2osteam
from h2osteam.clients import H2oKubernetesClient
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
cluster = H2oKubernetesClient().launch_cluster(name="test-cluster",
                                               profile_name="default-h2o",
                                               version="3.28.0.2",
                                               node_count=4,
                                               cpu_count=4,
                                               gpu_count=0,
                                               memory_gb=10)
cluster.connect()
airlines = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"
airlines_df = h2o.import_file(path=airlines)
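The examples on this page hard-code the access token for brevity. One way to keep the token out of a script is to read the connection settings from environment variables. The following is a minimal sketch; the helper `steam_login_kwargs` and the variable names `STEAM_URL`, `STEAM_USER`, and `STEAM_TOKEN` are assumptions made for illustration and are not part of the h2osteam package.

```python
import os


def steam_login_kwargs(env=None):
    """Build keyword arguments for h2osteam.login from environment variables.

    STEAM_URL, STEAM_USER and STEAM_TOKEN are hypothetical variable names.
    """
    env = os.environ if env is None else env
    required = ("STEAM_URL", "STEAM_USER", "STEAM_TOKEN")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "url": env["STEAM_URL"],
        "username": env["STEAM_USER"],
        "password": env["STEAM_TOKEN"],
        "verify_ssl": True,
    }


# Usage (requires the h2osteam package and a reachable Steam server):
# import h2osteam
# h2osteam.login(**steam_login_kwargs())
```

The returned dictionary uses the same keyword names as the `h2osteam.login` calls shown on this page.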
Providing dataset parameters to preset cluster size
This example shows how to launch an H2O cluster by providing dataset information.
If you are not sure how to size your cluster, you can provide either dataset_size_gb (for a raw data source) or a dataset_dimension tuple (for a compressed data source), and use the using_xgboost parameter to specify whether you plan to use the XGBoost algorithm on the cluster.
Setting these parameters sizes the cluster accordingly.
If your profile does not allow the recommended resources to be allocated, the maximum allowed resources are used.
Any user-specified values of nodes, node_memory_gb, or extra_memory_percent override the recommended values.
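The precedence described above can be sketched for a single resource setting as follows. This is only an illustration of the stated rules, not Steam's implementation; the function `effective_value` and its arguments are hypothetical, and any validation Steam applies to explicit user values against the profile is not modeled here.

```python
def effective_value(user_value, recommended, profile_max):
    """Pick the value used for one resource setting.

    Illustrates the precedence described in the text; not Steam's own code.
    """
    if user_value is not None:
        # An explicit user setting overrides the recommendation.
        return user_value
    # Otherwise the recommendation applies, capped at the profile limit.
    return min(recommended, profile_max)


# Recommendation of 8 nodes with a profile limit of 4 and no user value.
print(effective_value(None, recommended=8, profile_max=4))
# The same recommendation with the user explicitly asking for 2 nodes.
print(effective_value(2, recommended=8, profile_max=4))
```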
Example using dataset_size_gb when using a CSV file as a data source:
import h2o
import h2osteam
from h2osteam.clients import H2oKubernetesClient
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
cluster = H2oKubernetesClient().launch_cluster(name="test-cluster",
                                               profile_name="default-h2o",
                                               version="3.28.0.2",
                                               dataset_size_gb=20)
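If the CSV file is available on a local disk, the value for dataset_size_gb can be derived from the file size rather than estimated by hand. The sketch below uses only the Python standard library; the helper name `dataset_size_gb_for` is hypothetical, and rounding up to a whole number of gigabytes is an assumption, since this page does not state whether fractional values are accepted.

```python
import math
import os


def dataset_size_gb_for(path):
    """Return the size of the file at `path` in whole gigabytes, rounded up."""
    size_bytes = os.path.getsize(path)
    # Round up so that even a small file reports at least 1 GB.
    return max(1, math.ceil(size_bytes / 1024 ** 3))


# Usage (hypothetical local path):
# size_gb = dataset_size_gb_for("/data/airlines.csv")
# cluster = H2oKubernetesClient().launch_cluster(name="test-cluster",
#                                                profile_name="default-h2o",
#                                                version="3.28.0.2",
#                                                dataset_size_gb=size_gb)
```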
Example using dataset_dimension, a tuple of (n_rows, n_cols), when using a compressed file (e.g. Parquet) as a data source:
import h2o
import h2osteam
from h2osteam.clients import H2oKubernetesClient
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
cluster = H2oKubernetesClient().launch_cluster(name="test-cluster",
                                               profile_name="default-h2o",
                                               version="3.28.0.2",
                                               dataset_dimension=(25000, 1250))
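Neither example above sets using_xgboost. A small helper can assemble the sizing arguments in one place, as in the sketch below. The helper `sizing_kwargs` is hypothetical; treating dataset_size_gb and dataset_dimension as mutually exclusive and passing using_xgboost as a boolean are assumptions based on the description on this page, not confirmed behavior of launch_cluster.

```python
def sizing_kwargs(dataset_size_gb=None, dataset_dimension=None, using_xgboost=False):
    """Return the sizing-related keyword arguments for launch_cluster."""
    # Assumption: exactly one of the two dataset descriptions is supplied.
    if (dataset_size_gb is None) == (dataset_dimension is None):
        raise ValueError("provide exactly one of dataset_size_gb or dataset_dimension")
    kwargs = {"using_xgboost": using_xgboost}
    if dataset_size_gb is not None:
        kwargs["dataset_size_gb"] = dataset_size_gb
    else:
        n_rows, n_cols = dataset_dimension
        kwargs["dataset_dimension"] = (int(n_rows), int(n_cols))
    return kwargs


# Usage (requires h2osteam and an active Steam login):
# cluster = H2oKubernetesClient().launch_cluster(
#     name="test-cluster",
#     profile_name="default-h2o",
#     version="3.28.0.2",
#     **sizing_kwargs(dataset_dimension=(25000, 1250), using_xgboost=True))
```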
Connecting to existing H2O cluster
This example shows how to log in to Steam, connect to an existing H2O cluster called test-cluster, and import data.
import h2o
import h2osteam
from h2osteam.clients import H2oKubernetesClient
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
cluster = H2oKubernetesClient().get_cluster("test-cluster")
cluster.connect()
airlines = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"
airlines_df = h2o.import_file(path=airlines)
Launching and connecting to Driverless AI instance
This example shows how to create a Driverless AI v1.8.4.1 instance, connect to it, and upload a dataset.
import h2osteam
from h2osteam.clients import DriverlessClient
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
instance = DriverlessClient().launch_instance(name="test-instance",
                                              version="1.8.4.1",
                                              profile_name="default-driverless-kubernetes")
client = instance.connect()
# Import the iris dataset
ds = client.datasets.create(
    data='s3://h2o-public-test-data/smalldata/iris/iris.csv',
    data_source='s3'
)
Connecting to existing Driverless AI instance
This example shows how to connect to an existing Driverless AI instance called test-instance and upload a dataset.
import h2osteam
from h2osteam.clients import DriverlessClient
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
instance = DriverlessClient().get_instance(name="test-instance")
client = instance.connect()
# Import the iris dataset
ds = client.datasets.create(
    data='s3://h2o-public-test-data/smalldata/iris/iris.csv',
    data_source='s3'
)
Managing multiple Steam connections in one session
This example shows how to manage multiple Steam connections in one session.
import h2osteam
from h2osteam.clients import DriverlessClient
adam_steam = h2osteam.login(url="https://steam.h2o.ai:9555", username="adam", password="adams-token-here", verify_ssl=True)
# Initialize the client with the Steam connection object returned by login
adam_dai_client = DriverlessClient(adam_steam)
# Use client to start an instance
adam_instance = adam_dai_client.launch_instance(name="adams-instance",
                                                version="1.9.2.0",
                                                profile_name="default-driverless-kubernetes")
# Start an instance as a different user within the same session
ben_steam = h2osteam.login(url="https://steam.h2o.ai:9555", username="ben", password="bens-token-here", verify_ssl=True)
ben_dai_client = DriverlessClient(ben_steam)
ben_dai_client.launch_instance(name="bens-instance",
                               version="1.9.2.0",
                               profile_name="default-driverless-kubernetes")
# Terminate instance of the first user
adam_instance.terminate()
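When several instances are started in one session, it is easy to leave one running if an error occurs before terminate() is reached. One possible pattern is a small context manager that always calls terminate() on exit. This is a sketch built on the Python standard library; the helper `terminating` is hypothetical and is not provided by h2osteam. It relies only on the terminate() method used in the example above.

```python
from contextlib import contextmanager


@contextmanager
def terminating(resource):
    """Yield `resource` and call its terminate() method on exit."""
    try:
        yield resource
    finally:
        # Runs whether the body finished normally or raised an exception.
        resource.terminate()


# Usage (requires h2osteam and an active Steam login):
# launched = adam_dai_client.launch_instance(name="adams-instance",
#                                            version="1.9.2.0",
#                                            profile_name="default-driverless-kubernetes")
# with terminating(launched) as adam_instance:
#     client = adam_instance.connect()
```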