Skip to main content
Version: 0.17.6

Load Data

Loading data into DataChat is a two-step process:

  1. Upload your data, connect to a database, or load data from Google Drive or a URL so you can access it in a session.
  2. Bring your data into your active session to begin working with it.
note

Datasets can contain at most 1600 columns.

Upload Files

Uploading a file is the easiest way to get started and is a better option when working with small, static, or one-off datasets.

DataChat supports the following file types:

  • Comma-separated values (.csv)
  • Tab-separated values (.tsv)
  • Excel (.xlsx)
  • JavaScript Object Notation (.json)
  • ZIP archives (.zip)
  • TAR archives (.tar)

To upload a file:

  1. Go to the menu and click My Files.
  2. Click the Upload button. upload button
  3. Either drag and drop your file or click Browse to select the file. upload a file

Create a Database Connection

Setting up a database connection is a better option for larger datasets or datasets that change over time. To connect to a database, you can create a connection file by clicking the + New button, then select Connection.

connection button

DataChat supports the following database systems:

  • BigQuery
  • Databricks
  • Microsoft SQL Server
  • MySQL
  • PostgreSQL
  • Presto
  • Snowflake
  • ConnectionString
note

If you working from DataChat's cloud platform (apps.datachat.ai) and are connecting to a read-only database that's behind a firewall, refer to Add DataChat IP for Read-Only Databases for more information on allowing access to read-only databases.

To add a connection to your BigQuery database:

  1. Give the connection a name.
  2. In the Project field, enter the name of your BigQuery project.
  3. In the Database field, enter the name of your BigQuery database.
  4. Upload your credentials file. Refer to Google's documentation for more information on creating and downloading this file.

If you're not going to use a staging project for your intermediate tables, click Save. Otherwise, continue to the next section.

Add a Staging Project

When working in DataChat, you will create intermediate tables that need to be written to the database. If you want to limit write access to your primary project, you can create a staging project to which those intermediate tables can be written. First, create a new project. Then:

  1. In the Staging Project field, enter the name of the BigQuery project you want to use to stage your data.
  2. In the Staging Database Name field, enter the name of the BigQuery database you want to use to stage your data.
  3. Upload the credentials file used for that database in the Staging Credential File field.
  4. Optionally, set the connection to be read-only.
  5. Test your connection, then click Save.

big query connection window

Edit a Database Connection

To edit an existing database connection:

  1. From the home screen, navigate to the Connections page.
  2. Select the connection to edit.
  3. Click the Edit Connection button: edit connection button
  4. Make your edits and click Save.

Rename a Database Connection

To rename an existing database connection:

  1. From the home screen, navigate to the Connections page.
  2. Select the connection to rename.
  3. Click the Rename Connection button: rename connection button
  4. Rename your connection and click Save.

Delete a Database Connection

To delete an existing database connection:

  1. From the home screen, navigate to the Connections page.
  2. Select the connection to delete.
  3. Click the Delete Connection button: delete connection button

Load Data Into a Session

When you open a new session, the Load Data form appears by default. You can change this setting in the Settings menu. The Load Data form is always accessible from the Load button on the sidebar. The form suggests files, snapshots, databases, and workflows based on your past usage, though you can load data from other sources, including an external URL.

the load card

the load button

From a File

To load data from a file:

  1. Click the Load button to open the Load Data form.
  2. Select a file from the recommendations list.
  3. Click Submit.

Or, if your file isn't listed:

  1. Click the Load button to open the Load Data form.
  2. Click Load Data From My Files.
  3. Select the file to load.
  4. Click Submit.

the file form

You can also enter the following utterance in the chat box: Load data from the file <file name>.

note

For .xslx files that include sheets, you can limit imported data by excluding sheets or specifying cell ranges. See Load for details.

From a Snapshot

A snapshot is similar to a database that contains the datasets from another session that is created with the Save skill, either by you or by another user in your organization and shared with you.

  1. Click the Load button to open the Load Data form.
  2. Select a snapshot from the recommendations list.
  3. Click Submit.

Or, if your snapshot isn't listed:

  1. Click the Load button to open the Load Data form.
  2. Click Load Data From My Snapshots.
  3. Select the snapshot to load.
  4. Click "Submit".

the snapshot form

You can also enter the following utterance in the chat box: Load data from the snapshot <snapshot name>

From a URL

There are two ways you can load data from a file you've uploaded.

If you find a dataset online, you can load it into DataChat directly using the toolbar.

To load data from a URL:

  1. Click the Load button.
  2. Enter the URL to load.
  3. Click Submit.

the url form

For example, to upload the "Titanic" dataset from https://github.com/awesomedata/awesome-public-datasets/blob/master/Datasets/titanic.csv.zip, you can enter https://github.com/awesomedata/awesome-public-datasets/blob/master/Datasets/titanic.csv.zip in the Enter a URL field.

You can also enter the following utterance in the chat box: Load data from the URL <url>

From Google Drive

You can allow DataChat to load data from your Google Drive by authenticating it with Google and then loading data from the Google Drive URL.

In the chat box:

  1. Enter Authenticate with Google Drive. This opens a new browser tab.
  2. Select the Google account to use.
  3. Click Allow to let DataChat access your Google Drive files. You're then redirected back to your session in DataChat.
  4. Enter in the chat box: Load data from the URL <URL>, where <URL> is the URL of your file in Google Drive. Note that you must the use the URL you receive after clicking the Share button in Google Drive, but you do not need to share the file with any other accounts.

You can also enter the following utterance in the chat box: Load data from the database <connection name> where <connection name> is the name you specified in step 4.

From a Database

tip

If there are a large number of tables in your database, DataChat recommends specifying which tables you want to use when connecting to the database. Refer to the reference page for Load for more information.

To load data from a database:

  1. Click the Load button to open the Load Data form.
  2. Select a database from the recommendations list.
  3. Click Submit.

Or, if your database isn't listed:

  1. Click the Load button to open the Load Data form.
  2. Click Load Data From My Databases.
  3. Enter the database from which you'd like to load data.
  4. Optionally, enter how you'd like to load the data.
  5. Click Submit.

the database form

You can also enter the following utterance in the chat box: Load data from the database <connection name> where <connection name> is the name you specified in step 3.

note

For databases with write access, all tables are loaded into DataChat, which might be a time-intensive operation. Once the tables are loaded, you can click a link to list all the loaded datasets.

Browse a Read-Only Database

For read-only databases, when you enter: Load data from the database <connection name>

DataChat displays a table containing the datasets available within the database. Click on a dataset to load it, or specify which tables within the dataset to use in your session, as detailed in the reference page for Load.

Load More Tables from a Database

After connecting to a database and loading some tables, you can load more data from the database by specifying the table or loading a group of tables with the same schema. Refer to the reference page for Load.

Share a Connection

You can Share your existing database connection with another user in your organization. The recipient can then use the Connect skill to directly connect to the database.

From a Workflow

To load data from a saved workflow:

  1. Click the Load button to open the Load Data form.
  2. Select a workflow from the recommendations list.
  3. Click Submit.

Or, if your workflow isn't listed:

  1. Click the Load button to open the Load Data form.
  2. Click Replay One Of My Workflows.
  3. Enter the workflow to replay.
  4. Click Submit.

the workflow replay form

You can also enter the following utterance in the chat box: Replay the workflow <workflow name>

Read-Only Databases

For read-only databases, when you enter Load data from the database <connection name>, DataChat displays a table containing the datasets available within the database. Click on a dataset to load it, or specify which tables within the dataset to use in your session, as detailed in the reference page for Load.

Best Practices

To expedite future load time for datasets containing less than 10M rows, you can load the data and save it as a snapshot. Enter Load data from <read-only database name> and force transfer, followed by Save the current snapshot as <snapshot name>.

Optionally, you can schedule this workflow to refresh on a given time basis to ensure that the snapshot remains updated. Refer to the Schedule and Refresh skills references pages for more information.

Read-Only Databases

For read-only databases, when you enter:

Load data from the database <connection name>

DataChat displays a table containing the datasets available within the database. Click on a dataset to load it, or specify which tables within the dataset to use in your session, as detailed in the reference page for Load.

Best Practices

To expedite future load time for datasets containing less than 10M rows, you can load the data and save it as a snapshot. Enter Load data from <read-only database name> and force transfer, followed by Save the current snapshot as <snapshot name>.

Optionally, you can schedule this workflow to refresh on a given time basis to ensure that the snapshot remains updated. Refer to the Schedule and Refresh skills references pages for more information.

Export Data

note

Exporting large datasets are limited to 10 million cells.

There are two ways to export data from DataChat. You can:

  • Use Export to export a specific dataset or an entire database as CSV files.
  • Use the download button in the upper corner of a table to download the table as a CSV file.

Remove Data

There are a couple of ways to remove data from DataChat. You can:

  • Use the Forget skill to remove saved databases and datasets you've loaded in to your session. Note that forgetting a dataset does not remove the underlying file.
  • Remove the files themselves in the File Manager by going to Settings > Files and selecting the files you want to remove: remove file buttons