Skip to main content

Databases

The Database Browser

Setting up a connection to your database is a better option for larger datasets or datasets that change over time. To open the Database Browser, from the homepage, click New > Dataset > Connect.

database browser

The Database Browser lets you create a database connection and explore tables that you can then load into a session.

Connect to a Database

DataChat supports the following database systems:

  • BigQuery
  • Databricks
  • MicrosoftSQLServer
  • MySQL
  • PostgreSQL
  • Presto
  • Snowflake
  • ConnectionString
note

If you're working from DataChat's cloud platform and are connecting to a read-only database that's behind a firewall, refer to Add DataChat IP for Read-Only Databases for more information on allowing access to read-only databases.

note

DataChat recommends using MySQL version 8 or later for best results.

Database Types

note

To connect to a BigQuery database, ensure that your account has the necessary permissions. The required roles are:

  • BigQuery Data Viewer. This role can be scoped to the level of a specific BigQuery Table or View.
  • BigQuery User. This role can be scoped to the level of a specific BigQuery Dataset.

For more information on scope attributes, see Google Cloud's documentation.

To use BigQuery ML while connected to a BigQuery database, add the following to your account permissions:

  • bigquery.jobs.create
  • bigquery.models.create
  • bigquery.models.getData
  • bigquery.models.updateData
  • bigquery.models.updateMetadata
  • bigquery.models.delete

For more information on BigQuery ML permissions, see Google Cloud's documentation.

To add your BigQuery database:

Big Query database

  1. Give the database a display name. This is how you'll reference the database across DataChat.
  2. In the Project field, enter the name of your BigQuery project.
  3. In the Dataset Name field, enter the name of your BigQuery dataset.
  4. Upload your credentials file. Refer to Google's documentation for more information on creating and downloading this file.

If you're not going to use a staging project for your intermediate tables, click Save. Otherwise, continue to the next section.

Add a Staging Project

When working in DataChat, you will create intermediate tables that need to be written to the database. If you want to limit write access to your primary project, you can create a staging project to which those intermediate tables can be written. First, create a new project. Then:

  1. In the Staging Project field, enter the name of the BigQuery project you want to use to stage your data.
  2. In the Staging Dataset Name field, enter the name of the BigQuery dataset you want to use to stage your data.
  3. Upload the credentials file used for that database in the Staging Credential File field.
  4. Optionally, set the database to be read-only or enable using BigQuery's storage API. Note that selecting this option will increase transfer speeds, but may result in increased BigQuery costs.
  5. Test your database, then click Save.

Load from a Database

After connecting to a database, they appear on the left side of the Database Browser. From here, you can:

  1. Switch between databases by clicking the connection name.
  2. Optionally, add filters to search for a specific table.
  3. Optionally, refresh the database schema by clicking the Refresh button.
  4. Select the tables you'd like to open. Optionally, you can expand table to view column information, or preview a dataset by clicking on it.
  5. Click Import to create dataset objects or click Import and Load to automatically load the datasets into a session.

database browser

note

Sizeable database schemas can cause the Database Browser to time out. If you encounter this issue, reach out to your database administrator to access a more manageable version of the database. Otherwise, you can load from the database within a session.

Edit a Database

To edit a database:

  1. Double-click the database to edit, or select a database and click the Edit button from the toolbar or right-click menu. The Database Browser opens.
  2. Make your edits and click Save.
  3. To edit additional database connections, click the gear icon next connection to edit from the list of database connections.

edit a database

Save a Copy of a Database

To save a copy of an existing database, navigate to the homepage:

  1. Click Databases in the sidebar. You can also find your databases in My Work.
  2. Select the database to save a copy of.
  3. Click Edit Database. The Database Browser opens. You can also right-click on a selected database and click Save As and enter a new name for the copy.
  4. Click Save As and enter a new name for the copy.

copy a database

Within a Session

To connect to a database while in a session:

  1. In the Data tab, select Add Dataset > Connect in the skill menu. You can also click New Dataset > Connect if data has not yet been loaded.
  2. This opens the Database Browser, where you can connect to your database

To load datasets from an existing database connection, refer to Load Datasets into a Session

Create and Use Queries

If you know the layout of your database and only want to work with a certain subset of records, you can use SQL queries to browse and load that specific data.

Define Your Queries When Adding Your Database

You can add custom queries to your database information:

  1. Create a new database or edit an existing one.
  2. Open the Custom Queries field and click Add.
  3. Add a name for the resulting dataset and the SQL query that creates it.
  4. Test your database and then save it.

custom queries

Additional Database Connection Information

Data Retrieval

DataChat retrieves data from the connected database in batches. However, certain skills, such as joins, sometimes require the entire dataset to be materialized to execute successfully. This ensures that any computations that rely on the complete dataset are accurate.

Computation Execution

Computations are executed within your specific database environment. This approach utilizes the database's processing power, ensuring that operations are performed close to the data source, minimizing data transfer and improving performance.

Read-Only vs Writeable Connections

Generally, most database connections in DataChat will be configured as read-only. This is designed to protect your data integrity. If a DataChat skill requires moving data to a new location, such as joining two datasets from different database platforms, DataChat copies the data to its own internal compute engine rather than writing to your database connection.

For writeable connections, DataChat optimizes the performance of skills such as joins and bins by writing the data directly to your external database as a table or view. This is typically isolated in a dedicated schema or namespace to keep the results separate from other data.

note

Data written to external databases is cleaned up when your session is deleted unless you've saved an object to the homepage or published an object to an Insights Board.