Databases
The Database Browser
Setting up a connection to your database is a better option for larger datasets or datasets that change over time. To open the Database Browser, from the homepage, click New > Dataset > Connect.
The Database Browser lets you create a database connection and explore tables that you can then load into a session.
Connect to a Database
DataChat supports the following database systems:
- BigQuery
- Databricks
- MicrosoftSQLServer
- MySQL
- PostgreSQL
- Presto
- Snowflake
- ConnectionString
If you're working from DataChat's cloud platform and are connecting to a read-only database that's behind a firewall, refer to Add DataChat IP for Read-Only Databases for more information on allowing access to read-only databases.
DataChat recommends using MySQL version 8 or later for best results.
Database Types
- BigQuery
- Databricks
- MicrosoftSQLServer
- MySQL
- PostgreSQL
- Presto
- Snowflake
- ConnectionString
To connect to a BigQuery database, ensure that your account has the necessary permissions. The required roles are:
BigQuery Data Viewer
. This role can be scoped to the level of a specific BigQuery Table or View.BigQuery User
. This role can be scoped to the level of a specific BigQuery Dataset.
For more information on scope attributes, see Google Cloud's documentation.
To use BigQuery ML while connected to a BigQuery database, add the following to your account permissions:
bigquery.jobs.create
bigquery.models.create
bigquery.models.getData
bigquery.models.updateData
bigquery.models.updateMetadata
bigquery.models.delete
For more information on BigQuery ML permissions, see Google Cloud's documentation.
To add your BigQuery database:
- Give the database a display name. This is how you'll reference the database across DataChat.
- In the Project field, enter the name of your BigQuery project.
- In the Dataset Name field, enter the name of your BigQuery dataset.
- Upload your credentials file. Refer to Google's documentation for more information on creating and downloading this file.
If you're not going to use a staging project for your intermediate tables, click Save. Otherwise, continue to the next section.
Add a Staging Project
When working in DataChat, you will create intermediate tables that need to be written to the database. If you want to limit write access to your primary project, you can create a staging project to which those intermediate tables can be written. First, create a new project. Then:
- In the Staging Project field, enter the name of the BigQuery project you want to use to stage your data.
- In the Staging Dataset Name field, enter the name of the BigQuery dataset you want to use to stage your data.
- Upload the credentials file used for that database in the Staging Credential File field.
- Optionally, set the database to be read-only or enable using BigQuery's storage API. Note that selecting this option will increase transfer speeds, but may result in increased BigQuery costs.
- Test your database, then click Save.
To add your Databricks database:
- Give the database a display name. This is how you'll reference the database across DataChat.
- In the Database field, enter the name of your Databricks database.
- In the Catalog field, enter the name of your Databricks catalog.
- In the Hostname field, enter the hostname or IP address of the database.
- In the Port Number field, enter the port for the database.
- In the HTTP Path field, enter the URL needed to connect to the database.
- In the Access Token field, enter the access token needed to connect to the database.
- Optionally, add some custom queries.
- Optionally, set the database to be read-only.
- Test your database, then click Save.
To add your MicrosoftSQLServer database:
- Give the database a display name. This is how you'll reference the database across DataChat.
- In the Database field, enter the name of the database itself.
- In the Host Name field, enter the hostname or IP address of the database.
- In the Port Number field, enter the port for the database.
- In the Username field, enter the username needed to connect to the database.
- In the Password field, enter the password needed to connect to the database.
- Optionally, add custom SQL queries.
- Optionally, set the database to be read-only.
- Test your database, then click Save.
To add your MySQL database:
- Give the database a display name. This is how you'll reference the database across DataChat.
- In the Database field, enter the name of the database itself.
- In the Host Name field, enter the hostname or IP address of the database.
- In the Port Number field, enter the port for the database.
- In the Username field, enter the username needed to connect to the database.
- In the Password field, enter the password needed to connect to the database.
- Optionally, add custom SQL queries.
- Optionally, set the database to be read-only.
- Test your database, then click Save.
To add your PostgreSQL database:
- Give the database a display name. This is how you'll reference the database across DataChat.
- In the Database field, enter the name of the database itself.
- In the Host Name field, enter the hostname or IP address of the database.
- In the Port Number field, enter the port for the database.
- In the Username field, enter the username needed to connect to the database.
- In the Password field, enter the password needed to connect to the database.
- Optionally, add custom SQL queries.
- Optionally, set the database to be read-only.
- Test your database, then click Save.
To add your Presto database:
- Give the database a display name. This is how you'll reference the database across DataChat.
- In the URL field, enter the URL for your database.
- In the Port field, enter the port your database is using.
- In the Catalog field, enter the catalog to use.
- In the Schema field, enter the schema to use.
- Optionally, add custom SQL queries.
- Optionally, set the database to be read-only.
- Test your database, then click Save.
To add your Snowflake database:
- Give the database a display name. This is how you'll reference the database across DataChat.
- In the Hostname field, enter the hostname or IP address of the database.
- In the Username field, enter the username needed to connect to the database.
- In the Password field, enter the password needed to connect to the database.
- In the Database field, enter the name of the database.
- In the Compute Warehouse section, optionally set the Compute Warehouse field – enter the default warehouse to use.
- Optionally, set the DataChat Workspace field – enter the name of the workspace to use. A workspace is similar to a schema in other database systems.
- Optionally, add custom SQL queries.
- Optionally, set the database to be read-only.
- Test your database, then click Save.
You can use a raw connection string to connect to a database of your choosing. Ask your administrator if it's provided by your organization.
To add your connection string database:
- Give the database a display name. This is how you'll reference the database across DataChat.
- In the Connection String field, enter the connection string needed to connect to the database.
- Optionally, add custom SQL queries.
- Optionally, set the database to be read-only.
- Test your database, then click Save.
For example, if you'd like to connect to a MySQL or SQL Server database, you could enter something like mssql://<username>:<password>@<hostname>:<port>/<database name>?Encrypt=yes&TrustServerCertificate=no&Connection+Timeout=5
Using a connection string, you can gain read-only access to the following database systems:
- Apache Druid
- Apache Hive
- Apache Kylin
- Apache Pinot
- Apache Solr
- Apache Spark SQL
- Azure MS SQL
- Big Query
- CockroachDB
- Dremio
- Elasticsearch
- Google Sheets
- Firebolt
- Hologres
- IBM Db2
- MySQL
- Oracle
- PostgreSQL
- Trino
- Presto
- SAP Hana
- Snowflake
- SQLite
- SQL Server
- Teradata
- Vertica
- YugabyteDB
Load from a Database
After connecting to a database, they appear on the left side of the Database Browser. From here, you can:
- Switch between databases by clicking the connection name.
- Optionally, add filters to search for a specific table.
- Optionally, refresh the database schema by clicking the Refresh button.
- Select the tables you'd like to open. Optionally, you can expand table to view column information, or preview a dataset by clicking on it.
- Click Import to create dataset objects or click Import and Load to automatically load the datasets into a session.
Sizeable database schemas can cause the Database Browser to time out. If you encounter this issue, reach out to your database administrator to access a more manageable version of the database. Otherwise, you can load from the database within a session.
Edit a Database
To edit a database:
- Double-click the database to edit, or select a database and click the Edit button from the toolbar or right-click menu. The Database Browser opens.
- Make your edits and click Save.
- To edit additional database connections, click the gear icon next connection to edit from the list of database connections.
Save a Copy of a Database
To save a copy of an existing database, navigate to the homepage:
- Click Databases in the sidebar. You can also find your databases in My Work.
- Select the database to save a copy of.
- Click Edit Database. The Database Browser opens. You can also right-click on a selected database and click Save As and enter a new name for the copy.
- Click Save As and enter a new name for the copy.
Within a Session
To connect to a database while in a session:
- In the Data tab, select Add Dataset > Connect in the skill menu. You can also click New Dataset > Connect if data has not yet been loaded.
- This opens the Database Browser, where you can connect to your database
To load datasets from an existing database connection, refer to Load Datasets into a Session
Create and Use Queries
If you know the layout of your database and only want to work with a certain subset of records, you can use SQL queries to browse and load that specific data.
Define Your Queries When Adding Your Database
You can add custom queries to your database information:
- Create a new database or edit an existing one.
- Open the Custom Queries field and click Add.
- Add a name for the resulting dataset and the SQL query that creates it.
- Test your database and then save it.
Additional Database Connection Information
Data Retrieval
DataChat retrieves data from the connected database in batches. However, certain skills, such as joins, sometimes require the entire dataset to be materialized to execute successfully. This ensures that any computations that rely on the complete dataset are accurate.
Computation Execution
Computations are executed within your specific database environment. This approach utilizes the database's processing power, ensuring that operations are performed close to the data source, minimizing data transfer and improving performance.
Read-Only vs Writeable Connections
Generally, most database connections in DataChat will be configured as read-only. This is designed to protect your data integrity. If a DataChat skill requires moving data to a new location, such as joining two datasets from different database platforms, DataChat copies the data to its own internal compute engine rather than writing to your database connection.
For writeable connections, DataChat optimizes the performance of skills such as joins and bins by writing the data directly to your external database as a table or view. This is typically isolated in a dedicated schema or namespace to keep the results separate from other data.
Data written to external databases is cleaned up when your session is deleted unless you've saved an object to the homepage or published an object to an Insights Board.