Version: v0.7.x

Import Amazon S3 data

This tutorial shows you how to import data from Amazon S3 into your H2O Drive workspace. Follow the steps below to connect to your data source, import the data, and share it.

Prerequisites

Before you begin, you will need:

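  • Access to an AWS account
  • The link to the data in your Amazon S3 bucket (an S3 URI or an object URL)
  • For a bucket that is not public, an AWS access key ID and secret access key that can access the bucket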

Step 1: connect the S3 bucket

First, connect H2O Drive to your Amazon S3 bucket.

  1. On the H2O Drive home page, click Import.
  2. Select Amazon S3 from the dropdown list of sources.

If the S3 bucket is public

  1. Leave the Select Credentials field blank.
  2. Select the Object is public checkbox and enter the AWS S3 Link.
  3. Click Next. You have successfully imported the dataset, and it now appears on H2O Drive under its filename.
  4. Skip to Step 2: share the dataset.
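The AWS S3 Link can be written either as an S3 URI (s3://BUCKET/KEY) or as an object URL, and for a public object the object URL can also be opened directly over HTTPS. The following minimal sketch converts between the two forms. It assumes virtual-hosted-style object URLs (https://BUCKET.s3.REGION.amazonaws.com/KEY) and rejects path-style URLs rather than handling them; the function names and the example bucket are hypothetical and are not part of H2O Drive.

```python
from urllib.parse import quote, unquote, urlsplit

S3_URI_PREFIX = "s3://"


def s3_uri_to_object_url(uri: str, region: str) -> str:
    """Turn s3://BUCKET/KEY into a virtual-hosted-style object URL (hypothetical helper)."""
    if not uri.startswith(S3_URI_PREFIX):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Split on the first "/" only: everything after it is the object key.
    bucket, _, key = uri[len(S3_URI_PREFIX):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    # Percent-encode the key but keep "/" so the prefix ("folder") structure survives.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


def object_url_to_s3_uri(url: str) -> str:
    """Turn a virtual-hosted-style object URL back into s3://BUCKET/KEY (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme != "https" or not host.endswith(".amazonaws.com"):
        raise ValueError(f"not an S3 object URL: {url!r}")
    # Bucket names may contain dots, so split on the last ".s3." in the host name.
    bucket, sep, _ = host.rpartition(".s3.")
    key = unquote(parts.path.lstrip("/"))
    if not sep or not bucket or not key:
        raise ValueError(f"unsupported S3 URL form: {url!r}")
    return f"{S3_URI_PREFIX}{bucket}/{key}"
```

For example, s3_uri_to_object_url("s3://example-bucket/data/my file.csv", "us-east-1") produces an object URL whose key is percent-encoded, and object_url_to_s3_uri turns that URL back into the original S3 URI.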

If the S3 bucket is not public

  1. If you do not have a credentials profile already, click Add New Credentials.
  2. Enter the following details to connect to the S3 bucket.
    • Profile Name: A name for your credentials profile. This name appears in the dropdown list of credentials profiles (shown on the previous screen) when you select which profile to use.
    • Bucket Name: The name of the S3 bucket.
    • AWS_ACCESS_KEY_ID: The access key ID of an AWS identity (for example, an IAM user) that has access to the S3 bucket.
    • AWS_SECRET_ACCESS_KEY: The secret access key that is paired with that access key ID.
      Note: For more information, see AWS credentials and Methods for accessing a bucket in the AWS Documentation.
  3. Click Save.
  4. Select the credentials profile that you just created from the dropdown list.
  5. Enter the AWS S3 Link, either as an S3 URI (for example, s3://BUCKET/KEY) or as an object URL (for example, https://BUCKET.s3.REGION.amazonaws.com/KEY).
  6. Click Next. You have successfully imported the dataset, and it now appears on H2O Drive under its filename.
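The Bucket Name field expects the bare bucket name (for example, example-bucket), not an S3 URI or object URL. The following sketch checks a profile's fields locally before you enter them in the form. It models only a subset of the AWS general purpose bucket naming rules (3 to 63 characters; lowercase letters, digits, periods, and hyphens; starts and ends with a letter or digit; no two adjacent periods; not formatted as an IPv4 address). S3CredentialsProfile and is_valid_bucket_name are hypothetical names, not part of any H2O Drive API.

```python
import ipaddress
import re
from dataclasses import dataclass, field

# Lowercase letters, digits, periods and hyphens; 3-63 characters;
# first and last character must be a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 general purpose bucket naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False  # wrong length or disallowed characters (e.g. "s3://", "_", uppercase)
    if ".." in name:
        return False  # two adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
        return False  # must not be formatted as an IP address
    except ValueError:
        return True


@dataclass
class S3CredentialsProfile:
    """Hypothetical local model of the fields on the Add New Credentials form."""

    profile_name: str
    bucket_name: str
    aws_access_key_id: str
    # repr=False keeps the secret out of logs and tracebacks.
    aws_secret_access_key: str = field(repr=False)

    def __post_init__(self) -> None:
        if not self.profile_name.strip():
            raise ValueError("profile name must not be empty")
        if not is_valid_bucket_name(self.bucket_name):
            raise ValueError(f"invalid bucket name: {self.bucket_name!r}")
        if not self.aws_access_key_id or not self.aws_secret_access_key:
            raise ValueError("both the access key ID and the secret access key are required")
```

Passing the full link (for example, s3://example-bucket) as the bucket name raises a ValueError, which catches a common mix-up between the Bucket Name field and the AWS S3 Link field.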

Step 2: share the dataset

  1. Select the imported dataset by clicking its filename.
  2. Click Get Link to get a pre-signed link that you can share with other users or applications that need to access this dataset.
  3. Set the expiration time and click Get Link.
  4. Copy the link that appears and click Close. You can now use the copied link to share this dataset.
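If the link is an S3 Signature Version 4 pre-signed URL (an assumption: this guide does not say how H2O Drive generates its links), the expiration time you set is encoded in the query string: X-Amz-Date is the signing time and X-Amz-Expires is the lifetime in seconds. The following sketch reads those two parameters to report when a copied link stops working; presigned_url_expiry and is_expired are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 query-string pre-signed URL expires."""
    query = parse_qs(urlsplit(url).query)
    try:
        signed_at = query["X-Amz-Date"][0]          # e.g. 20240101T120000Z
        lifetime = int(query["X-Amz-Expires"][0])  # seconds after signing
    except (KeyError, ValueError) as exc:
        raise ValueError("URL has no SigV4 X-Amz-Date/X-Amz-Expires parameters") from exc
    start = datetime.strptime(signed_at, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return start + timedelta(seconds=lifetime)


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True once the link's lifetime has elapsed (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)
```

A link signed at 12:00 UTC with X-Amz-Expires=3600 expires at 13:00 UTC; after that, recipients need a new link from Get Link.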

You can use the pre-signed link to import this dataset into H2O-3 or Driverless AI, or share it with someone who can then import it into their own H2O Drive instance using the HTTP download option.
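Assuming the pre-signed link is an ordinary HTTPS URL, any HTTP client can fetch it until it expires, with no AWS credentials needed on the receiving side. The following minimal sketch downloads the link with Python's standard library; the download function, destination path, and timeout are illustrative assumptions, not part of H2O Drive.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream the resource at `url` into `dest` and return the destination path."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        # copyfileobj copies in chunks, so large datasets are not held in memory at once.
        shutil.copyfileobj(response, out)
    return dest
```

For example, download(link, Path("data/dataset.csv")) saves the shared dataset locally. In H2O-3's Python client, the same link can instead be passed directly to h2o.import_file, which accepts HTTP(S) URLs.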

