Classes for communication with backend H2O servers.
Connect to an existing H2O server and send requests to it.
Start an H2O server on your local machine.
Handle to the remote H2O cluster – used mainly to retrieve information about it.
The h2o module provides convenience functions for accessing these classes, and those functions are
recommended for everyday use. The most common use cases are:
Connect to an existing remote H2O server:
Connect to a local server, or if there isn’t one start it and then connect:
Start multiple H2O servers locally (forming a cluster), and then connect to one of them:
    import h2o
    from h2o.backend import H2OLocalServer

    for _ in range(5):
        hs = H2OLocalServer.start()
    h2o.connect(server=hs)
The functions h2o.connect() and h2o.init() take many parameters that allow you to fine-tune the connection
settings. When used, they create a new H2OConnection object and store it in a global variable. This
connection is then used by all subsequent calls to functions in the h2o module. At the moment there is no effective way to
have connections to several separate H2O servers open at the same time. Such a facility may be added in the future.
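The single-global-connection model described above can be illustrated with a small, self-contained sketch. It assumes nothing about h2o internals. The names `connect`, `api`, `FakeConnection` and `_current_connection` are hypothetical and only mimic the described behaviour: each new connection replaces the one global connection that later calls use.

```python
# Hypothetical sketch of a module-level "current connection" pattern.
# None of these names are part of the h2o package.

_current_connection = None  # the single global connection slot


class FakeConnection:
    """Stand-in for a connection object; it only remembers its target URL."""

    def __init__(self, url):
        self.url = url

    def request(self, endpoint):
        # A real connection would perform an HTTP call here.
        return "%s -> %s" % (self.url, endpoint)


def connect(url):
    """Create a new connection and store it globally, replacing any previous one."""
    global _current_connection
    _current_connection = FakeConnection(url)
    return _current_connection


def api(endpoint):
    """Module-level helper that always routes through the global connection."""
    if _current_connection is None:
        raise RuntimeError("not connected; call connect() first")
    return _current_connection.request(endpoint)
```

Calling `connect()` a second time silently redirects every later `api()` call. Under this model, only one server can be targeted at a time.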
Connection handle to an H2O cluster.
In a typical scenario you don’t need to access this class directly. Instead, use
h2o.connect() to establish a connection, and
h2o.api() to make requests to the backend H2O server. However, if your use case is not typical, read on.
Instances of this class may only be created through the static method open():

    hc = H2OConnection.open(...)
Once opened, the connection remains active until the script exits (or until you explicitly
close() it). If the script exits with an exception, then the connection will fail to close, and the backend server will keep all the temporary frames and the open session.
Alternatively, you can use this class as a context manager, which ensures that the connection gets closed at the end of the
with ... block even if an exception occurs:

    with H2OConnection.open() as hc:
        hc.info().pprint()
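The guarantee that the connection is closed "even if an exception occurs" comes from Python's context-manager protocol. The sketch below shows that mechanism in isolation. `FakeConnection` and its `open()`/`close()` methods are hypothetical stand-ins, not the H2OConnection implementation.

```python
# Hypothetical sketch: a context manager guarantees close() runs even when the
# body of the with-block raises. FakeConnection is invented for illustration.


class FakeConnection:
    def __init__(self):
        self.closed = False

    @classmethod
    def open(cls):
        # Stand-in for a factory method such as H2OConnection.open(...)
        return cls()

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow the exception; let it propagate
```

Returning `False` from `__exit__` means the original exception still reaches the caller after cleanup has run.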
Once the connection is established, you can send REST API requests to the server using request().
open(server=None, url=None, ip=None, port=None, name=None, https=None, auth=None, verify_ssl_certificates=True, cacert=None, proxy=None, cookies=None, verbose=True, _msgs=None)
Establish connection to an existing H2O server.
The connection is not kept alive, so what this method actually does is attempt to connect to the specified server and check that the server is healthy and responds to REST API requests. If the H2O server cannot be reached, an
H2OConnectionError is raised. On success, this method returns a new
H2OConnection object; this is the only “official” way to create instances of this class.
There are 3 ways to specify the target to connect to (these settings are mutually exclusive):
pass a server option (an H2OLocalServer instance),
pass the full url for the connection,
provide a triple of parameters ip, port, https.
server (H2OLocalServer) – connect to the specified local server instance. There is a slight difference between connecting to a local server by specifying its ip and port, and connecting through an H2OLocalServer instance: if the server becomes unresponsive, having access to its process handle allows us to query the server status through the OS, and potentially to include a snapshot of the server’s error log in the exception information.
url – full url of the server to connect to.
ip – target server’s IP address or hostname (default “localhost”).
port – H2O server’s port (default 54321).
name – H2O cluster name.
https – if True then connect using https instead of http (default False).
verify_ssl_certificates – if False then SSL certificate checking will be disabled (default True). This setting should rarely be disabled, as it makes your connection vulnerable to man-in-the-middle attacks. When disabled, it will generate a warning from the requests library. Has no effect when https is False.
cacert – Path to a CA bundle file or a directory with certificates of trusted CAs (optional).
auth – authentication token for connecting to the remote server. This can be either a (username, password) tuple, or an authenticator (AuthBase) object. Please refer to the documentation in the requests.auth module.
proxy – url address of a proxy server. If you do not specify the proxy, then the requests module will attempt to use a proxy specified in the environment (in HTTP_PROXY / HTTPS_PROXY variables). We check for the presence of these variables and issue a warning if they are found. In order to suppress that warning and use proxy from the environment, pass
cookies – a cookie (or a list of cookies) to add to requests.
verbose – if True, then connection progress info will be printed to the stdout.
_msgs – custom messages to display during connection. This is a tuple (initial message, success message, failure message).
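The three mutually exclusive ways of naming a target can be reduced to a single base URL. The following is a minimal sketch under stated assumptions. The function `resolve_base_url` and the `LocalServerStub` type are hypothetical. The defaults (ip "localhost", port 54321, https False) come from the parameter list above. h2o's own validation and error messages may differ.

```python
# Hypothetical sketch: resolve server / url / (ip, port, https) into one base
# URL without a trailing "/". Names are invented; defaults follow the docs above.
from collections import namedtuple

LocalServerStub = namedtuple("LocalServerStub", ["scheme", "ip", "port"])


def resolve_base_url(server=None, url=None, ip=None, port=None, https=None):
    """Return a base URL such as "http://localhost:54321"."""
    given = [
        server is not None,
        url is not None,
        any(v is not None for v in (ip, port, https)),
    ]
    if sum(given) > 1:
        raise ValueError("server, url and ip/port/https are mutually exclusive")
    if server is not None:
        # Take scheme, address and port from the local server handle.
        return "%s://%s:%d" % (server.scheme, server.ip, server.port)
    if url is not None:
        return url.rstrip("/")  # normalise: no trailing slash
    scheme = "https" if https else "http"
    return "%s://%s:%d" % (scheme, ip or "localhost", port or 54321)
```

The sketch fails early when more than one way is used. This mirrors the documented "mutually exclusive" rule instead of silently preferring one option.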
request(endpoint, data=None, json=None, filename=None, save_to=None)
Perform a REST API request to the backend H2O server.
endpoint – (str) The endpoint’s URL, for example “GET /4/schemas/KeyV4”
data – data payload for POST (and sometimes GET) requests. This should be a dictionary of simple key/value pairs (values can also be arrays), which will be sent over in application/x-www-form-urlencoded format.
json – also a data payload, but it will be sent as a JSON body. Cannot be used together with data.
filename – file to upload to the server. Cannot be used with data or json.
save_to – if provided, will write the response to that file (additionally, the response will be streamed, so large files can be downloaded seamlessly). This parameter can be either a file name, or a folder name. If the folder doesn’t exist, it will be created automatically.
Returns: an H2OResponse object representing the server’s response (unless the
save_to parameter is provided, in which case the output file’s name will be returned).
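An endpoint string such as "GET /4/schemas/KeyV4" carries both the HTTP method and the path. The sketch below shows one way to split it and to form-encode a simple `data` dictionary. `split_endpoint` and `build_request` are hypothetical helpers, and the path "/3/Foo" in the test is made up. The sketch does not show how h2o itself serializes array values.

```python
# Hypothetical sketch: split "METHOD /path" into its parts and form-encode a
# simple data payload. split_endpoint and build_request are not h2o functions.
from urllib.parse import urlencode

_METHODS = ("GET", "POST", "PUT", "DELETE", "HEAD", "PATCH")


def split_endpoint(endpoint):
    """Return (method, path) from a string like "GET /4/schemas/KeyV4"."""
    method, _, path = endpoint.partition(" ")
    if method not in _METHODS or not path.startswith("/"):
        raise ValueError("endpoint must look like 'METHOD /path', got %r" % endpoint)
    return method, path


def build_request(base_url, endpoint, data=None):
    """Return (method, url, body); body is application/x-www-form-urlencoded."""
    method, path = split_endpoint(endpoint)
    body = urlencode(data) if data else None
    return method, base_url + path, body
```

Because the base URL has no trailing "/", appending a path that starts with "/" yields a well-formed URL.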
Close an existing connection; once closed it cannot be used again.
Strictly speaking, it is not necessary to close all the connections that you opened: we have several mechanisms in place that do so automatically (__del__(), __exit__() and atexit() handlers). However, there is also no good reason to make this method private.
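Several cleanup hooks can only coexist if `close()` is idempotent. The sketch below illustrates this with a hypothetical `Connection` class, not H2OConnection. It registers `close` with `atexit` and also calls it from `__exit__`, and only the first call does any work.

```python
# Hypothetical sketch: an idempotent close() that both __exit__ and an atexit
# hook may call. The Connection class is invented for illustration.
import atexit


class Connection:
    def __init__(self):
        self.closed = False
        self.close_calls = 0  # counts how many times cleanup actually ran
        atexit.register(self.close)  # fallback: close at interpreter exit

    def close(self):
        if self.closed:
            return  # already closed; later calls are no-ops
        self.closed = True
        self.close_calls += 1
        # A real implementation would end the server-side session here.

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()  # returns None, so exceptions still propagate
```

Because `atexit` holds a reference to the bound method, the object stays alive until interpreter exit. The final hook call is then a harmless no-op.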
Return the session id of the current connection.
The session id is issued (through an API request) the first time it is requested, but no sooner. This is because generating a session id puts it into the DKV on the server, which effectively locks the cluster. Once issued, the session id will stay the same until the connection is closed.
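The lazy-issue behaviour amounts to a cached property. The value is fetched on first access only and reused afterwards. In the minimal sketch below, a callable stands in for the API request. `LazySession` and `issue_session_id` are hypothetical names.

```python
# Hypothetical sketch of lazily issuing a session id: the request is made on
# first access only and the result is cached for the rest of the connection.


class LazySession:
    def __init__(self, issue_session_id):
        self._issue = issue_session_id  # callable standing in for the API request
        self._session_id = None

    @property
    def session_id(self):
        if self._session_id is None:
            self._session_id = self._issue()  # first access: ask the server
        return self._session_id
```

Creating a `LazySession` never triggers the request. Only reading `session_id` does, and only once.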
H2OCluster object describing the underlying cluster.
Base URL of the server, without the trailing "/" (for example, "http://localhost:54321").
URL of the proxy server used for the connection (or None if there is no proxy).
Handle to the H2OLocalServer instance (if connected to one).
Total number of requests made since the connection was opened (used for debugging purposes).
Timeout length for each request, in seconds.
Start logging all API requests to the provided destination.
dest – Where to write the log: either a filename (str), or an open file handle (file). If not given, then a new temporary file will be created.
Stop logging API requests.
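The three kinds of `dest` accepted when logging starts can be handled as shown below. This is a hypothetical sketch: `RequestLogger` is invented here. The choices to open a filename in append mode and to return the destination are assumptions, not documented h2o behaviour.

```python
# Hypothetical sketch of start/stop logging with a flexible destination:
# a filename (str), an already open file handle, or None for a temp file.
import tempfile


class RequestLogger:
    def __init__(self):
        self._dest = None
        self._owns_file = False  # True if we opened the file and must close it

    def start_logging(self, dest=None):
        if dest is None:
            dest = tempfile.NamedTemporaryFile("w", suffix=".log", delete=False)
            self._owns_file = True
        elif isinstance(dest, str):
            dest = open(dest, "a")  # assumption: append to an existing log
            self._owns_file = True
        else:
            self._owns_file = False  # caller keeps ownership of the handle
        self._dest = dest
        return dest

    def log(self, line):
        if self._dest is not None:  # no-op while logging is stopped
            self._dest.write(line + "\n")

    def stop_logging(self):
        if self._dest is not None and self._owns_file:
            self._dest.close()
        self._dest = None
```

Only files the logger opened itself are closed on stop. A handle passed in by the caller is left open.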
Handle to an H2O server launched locally.
    hs = H2OLocalServer.start(...)  # launch a new local H2O server
    hs.is_running()                 # check if the server is running
    hs.shutdown()                   # shut down the server
    hs.scheme                       # either "http" or "https"
    hs.ip                           # ip address of the server, typically "127.0.0.1"
    hs.port                         # port on which the server is listening
Once started, the server will run until the script terminates, or until you call .shutdown() on it. Moreover, if the script terminates with an exception, the server will not stop and will continue to run even after the Python process exits. Such a runaway process may end up in a bad state (e.g. frozen), in which case the only way to terminate it is to kill the java process from the terminal.
Alternatively, it is possible to start the server as a context manager, in which case it will be automatically shut down even if an exception occurs in Python (but not if the Python process is killed):
    import h2o
    from h2o.backend import H2OLocalServer

    with H2OLocalServer.start() as hs:
        # do something with the server -- probably connect to it
        h2o.connect(server=hs)
start(jar_path=None, nthreads=-1, enable_assertions=True, max_mem_size=None, min_mem_size=None, ice_root=None, log_dir=None, log_level=None, max_log_file_size=None, port='54321+', name=None, extra_classpath=None, verbose=True, jvm_custom_args=None, bind_to_localhost=True)
Start new H2O server on the local machine.
jar_path – Path to the h2o.jar executable. If not given, then we will search for h2o.jar in the locations returned by ._jar_paths().
nthreads – Number of threads in the thread pool. This should be related to the number of CPUs used. -1 means use all CPUs on the host. A positive integer specifies the number of CPUs directly.
enable_assertions – If True, pass -ea option to the JVM.
max_mem_size – Maximum heap size (jvm option Xmx), in bytes.
min_mem_size – Minimum heap size (jvm option Xms), in bytes.
log_dir – Directory for H2O logs to be stored if a new instance is started. Default directory is determined by H2O internally.
log_level – The logger level for H2O if a new instance is started.
max_log_file_size – Maximum size of INFO and DEBUG log files. The file is rolled over once the specified size has been reached (default 3MB; minimum 1MB, maximum 99999MB).
ice_root – A directory where H2O stores its temporary files. Default location is determined by tempfile.mkdtemp().
port – Port on which to start the new server. This can be either an integer, or a string of the form “DDDDD+”, indicating that the server should look for an open port starting from DDDDD and going up.
name – Name of the H2O cluster to be started.
extra_classpath – List of paths to libraries that should be included on the Java classpath.
verbose – If True, then connection info will be printed to the stdout.
jvm_custom_args – Custom, user-defined arguments for the JVM in which H2O is instantiated.
bind_to_localhost – A flag indicating whether access to the H2O instance should be restricted to the local machine (default) or if it can be reached from other computers on the network. Only applicable when H2O is started from the Python client.
Returns: a new H2OLocalServer instance.
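Two of the parameters above need interpretation before a JVM can be launched: the "DDDDD+" port form, and memory sizes given in bytes. The sketch below shows one way to handle both. It is an illustration under assumptions, not h2o's launcher. `parse_port_spec`, `find_open_port` and `jvm_options` are hypothetical, and the 100-port search limit is invented. Only the standard JVM options -ea, -Xmx and -Xms are produced, and H2O-specific flags are not shown.

```python
# Hypothetical sketch: interpret a port spec such as "54321+" and build the
# standard JVM options implied by enable_assertions / max_mem_size / min_mem_size.
import socket


def parse_port_spec(port):
    """Return (start_port, search_upward) from 54321 or "54321+"."""
    if isinstance(port, int):
        return port, False
    text = str(port).strip()
    if text.endswith("+"):
        return int(text[:-1]), True
    return int(text), False


def find_open_port(port, host="127.0.0.1", limit=100):
    """Return the first bindable port at or above the start (or the exact port)."""
    start, upward = parse_port_spec(port)
    candidates = range(start, start + limit) if upward else [start]
    for p in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, p))
            except OSError:
                continue  # port already taken; try the next one
            return p
    raise RuntimeError("no open port found starting at %d" % start)


def jvm_options(enable_assertions=True, max_mem_size=None, min_mem_size=None):
    """Standard JVM options; memory sizes are given in bytes."""
    opts = []
    if enable_assertions:
        opts.append("-ea")
    if max_mem_size is not None:
        opts.append("-Xmx%d" % max_mem_size)
    if min_mem_size is not None:
        opts.append("-Xms%d" % min_mem_size)
    return opts
```

Probing a port by binding and releasing it is inherently racy: another process may grab the port before the server starts. A real launcher has to tolerate a failed bind and retry.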
Return True if the server process is still running, False otherwise.
Shut down the server by trying to terminate/kill its process.
First we attempt to terminate the server process gracefully (by sending a SIGTERM signal). If the process has not shut down after _TIME_TO_KILL seconds, we kill it forcefully with a SIGKILL signal.
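The "terminate, wait, then kill" sequence maps directly onto Python's standard `subprocess` API. The sketch below applies it to an arbitrary child process. `TIME_TO_KILL`, `is_running` and `shutdown` are hypothetical stand-ins. The 3-second default is an assumption, since the actual `_TIME_TO_KILL` value is not given here.

```python
# Hypothetical sketch of graceful shutdown: SIGTERM via Popen.terminate(),
# wait up to a timeout, then SIGKILL via Popen.kill() (POSIX semantics).
import subprocess

TIME_TO_KILL = 3  # seconds; assumed value, not taken from h2o


def is_running(proc):
    """True while the child process has not exited."""
    return proc.poll() is None


def shutdown(proc, timeout=TIME_TO_KILL):
    """Stop the process and return its exit code."""
    if not is_running(proc):
        return proc.returncode
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait()
```

On Windows, `terminate()` and `kill()` both call TerminateProcess, so the graceful step has no separate effect there.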
Connection scheme, ‘http’ or ‘https’.
IP address of the server.
Port that the server is listening on.
H2O cluster name.