Datasets
Datasets are created when uploading files and importing tables from database connections.
Create Datasets from Files
Uploading a dataset file into DataChat is the easiest way to get started and is a better option when working with small, static, or one-off datasets.
By default, DataChat comes with 4 datasets:
- telcoCustomerChurn.csv. Dataset from a telecommunications company on customer retention.
- Credit_Risk.csv. Dataset on lending and default likelihood for credit users.
- Heart_Health.csv. Dataset about key indicators of heart disease.
- Hotel_Reservations.csv. Dataset about online hotel reservation platforms and their relationship to the booking landscape and customer behavior.
For best data handling, we recommend the following:
- Use descriptive file names
- Use accurate column headers
- Use one table per sheet
- Align the table to be in the upper-left corner
- Remove any empty columns and rows
Datasets support the following file extensions:
- Comma-separated values (.csv)
- Excel (.xlsx)
Datasets can contain at most 1600 columns.
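The recommendations and limits above can be checked locally before uploading. The following is a minimal sketch, not part of DataChat: the helper names (`is_supported`, `clean_rows`, `check_csv_text`) are hypothetical, and it only handles CSV text using the Python standard library.

```python
import csv
import io

MAX_COLUMNS = 1600  # column limit stated in this section
SUPPORTED_EXTENSIONS = (".csv", ".xlsx")  # extensions listed in this section


def is_supported(filename):
    """Return True if the file name ends with a supported extension."""
    return filename.lower().endswith(SUPPORTED_EXTENSIONS)


def clean_rows(rows):
    """Drop fully empty rows and columns from a list of rows (lists of strings)."""
    # Keep only rows with at least one non-blank cell.
    rows = [r for r in rows if any(cell.strip() for cell in r)]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [r + [""] * (width - len(r)) for r in rows]
    # Keep only columns with at least one non-blank cell (header included).
    keep = [i for i in range(width) if any(r[i].strip() for r in rows)]
    return [[r[i] for i in keep] for r in rows]


def check_csv_text(text):
    """Clean CSV text and report whether it fits within the column limit."""
    cleaned = clean_rows(list(csv.reader(io.StringIO(text))))
    n_columns = len(cleaned[0]) if cleaned else 0
    return cleaned, n_columns <= MAX_COLUMNS


# Hypothetical sample with one empty column and one empty row.
sample = "name,,age\n,,\nAda,,36\n"
cleaned, ok = check_csv_text(sample)
print(cleaned)
print(ok)
```

A column is treated as empty only when every cell in it, including the header, is blank, so a named column with no data is kept.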
Local Datasets
You can upload a new dataset file from either the homepage or within a session:
From the Homepage
- Click New > Dataset > Upload
- Either drag and drop your dataset file, or browse your local machine.
The datasets automatically upload to the Datasets section of the homepage. Note that uploading a file with multiple sheets also creates a new folder <filename> under the My Work section of the homepage that contains each dataset.
Within a Session
- Click Add Dataset > Upload in the Skill menu. Optionally, you can also use New Dataset > Upload if data has not yet been loaded.
- Either drag and drop your dataset file, or browse your local machine.
The datasets automatically load into your session. They are also added to the Datasets section of the homepage.
Online Datasets
You can also load a dataset from a URL or from Google Drive using the Skill form.
Load Data From a URL
Uploading data from a URL also creates a dataset so you can easily load it again in other sessions. To upload data from a URL:
- In the Data tab, select Skill in the Skill menu.
- Enter Load data from the URL <URL>.
- Click Submit.
For example, to upload the "Covid-19" dataset, you can enter Load data from the URL https://github.com/datasets/covid-19/blob/main/data/key-countries-pivoted.csv
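If you prepare the Skill form text in a script, a small helper can assemble the phrase shown above. This is a sketch only: `load_phrase` is a hypothetical name, and the check for an absolute http(s) URL is an assumption, not a documented DataChat rule.

```python
from urllib.parse import urlparse


def load_phrase(url):
    """Build the Skill form text for loading data from a URL."""
    parts = urlparse(url)
    # Assumed sanity check: require an absolute http(s) URL with a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return f"Load data from the URL {url}"


print(load_phrase(
    "https://github.com/datasets/covid-19/blob/main/data/key-countries-pivoted.csv"
))
```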
Load Data From Google Drive
You can allow DataChat to load datasets from your Google Drive by authenticating it with Google and then loading data from the Google Drive URL.
- In the Data tab, select Skill in the Skill menu.
- Enter Authenticate with Google Drive. This opens a new browser tab.
- Select the Google account to use.
- Click Allow to let DataChat access your Google Drive files. You're then redirected back to your session in DataChat.
- Enter in the Skill form: Load data from the URL <URL>, where <URL> is the URL of your file in Google Drive. Note that you must use the URL you receive after clicking the Share button in Google Drive, but you do not need to share the file with any other accounts.
Loading data from Google Drive also uploads the datasets to DataChat, so you can easily load them again in other sessions.
Create Datasets from Connections
To create a dataset, you must first create a database connection or open an existing database connection in the Database Browser, then:
- Select the connection to import tables from.
- Select the tables to import.
- Optionally, preview the tables by clicking the table name.
- Click Import. Optionally, you can also click Import and Load to load the tables directly into a new session.
The imported datasets then appear within a folder under the My Work section labeled database name > schema name > list of datasets by table. The imported datasets can also be found in the Datasets section.
Load Datasets into a Session
From the Homepage
To load a single dataset into a session from the homepage, navigate to the Datasets section and double-click the dataset you'd like to load. This automatically opens a new session and loads the selected dataset.
To load an entire folder of datasets into a session from the homepage, navigate to the My Work section, then:
- Select the folder to load datasets from.
- Right-click and select Load Folder. Optionally, you can also use the Load Folder button in the toolbar.
This automatically opens a new session and loads all the datasets within the selected folder.
Within a Session
To load datasets from within a DataChat session:
- In the Data tab, select Add Dataset > Load in the skill menu. You can also click New Dataset if data has not yet been loaded.
- Select the dataset or folder you'd like to load. Optionally, use the dropdown next to a folder to see its contents.
- If your dataset is not shown, you can Search for the dataset.
- Click Load. Alternatively, for loading a single folder or dataset, you can double-click on the respective item.
Once completed, the window automatically closes and loads the dataset into the session.
Refresh Datasets
Datasets created by saving active datasets in a session can be refreshed. This is especially helpful when the underlying file has been updated to incorporate fresh data.
To refresh a dataset:
- From the homepage, navigate to either My Work or Datasets.
- Click the Refresh button next to the dataset you'd like to update.
This automatically opens the Editor and runs the underlying workflow that created the dataset. If the underlying workflow is run successfully, a green check appears to the left of the dataset name.
Edit a Dataset
The underlying workflows of refreshable datasets can also be edited to add or remove steps. Refer to Edit Steps.
While steps that create other objects such as charts or models can be added, only the steps that modify or change the state of the dataset are applied.
If you edit the underlying workflow, the workflow changes to unverified, and the green check changes to a red icon.
If no errors are encountered, the workflow is automatically verified and saved.