Version: 0.20.7

Use Python with DataChat

Note

This section assumes that you have a working knowledge of Python, Jupyter notebooks, and creating your own Python scripts.

You can use Jupyter notebooks and Python scripts in conjunction with your DataChat English workflows to help clean, analyze, visualize, and model your data in DataChat.

DataChat recommends using our built-in Jupyter notebook integration to create your own Python scripts, though custom scripts that you created elsewhere are also supported. Note that DataChat does not support importing Jupyter notebooks from outside of DataChat.

Scripts

The first step is to create a Jupyter notebook or a separate Python script and load it into DataChat. We'll cover this in more detail in the Create Scripts section.

After creating your script, you can use the Run skill to launch it with input datasets of your choice. Behind the scenes, the script runs in its own Docker container that is isolated from both the network and the host machine's file system. When the Run skill is used, the datasets you specified are copied to a temporary directory that the container can access.

When the script finishes, the final datasets and visualizations are saved back to that temporary directory and loaded into your session as the new "current" dataset. The temporary directory and its contents are then deleted.
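
The lifecycle described above can be modeled locally with Python's standard library. This is only a rough sketch of the copy-in, run, copy-out, delete sequence; it is not DataChat's implementation, it does not involve Docker, and the names run_in_scratch_dir and count_rows are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch_dir(input_files, work):
    """Copy inputs to a temporary directory, run `work` there, and collect outputs.

    Hypothetical model of the lifecycle described in the text, not DataChat code.
    """
    with tempfile.TemporaryDirectory() as scratch:
        scratch_path = Path(scratch)
        input_names = set()
        for src in input_files:
            src = Path(src)
            shutil.copy(src, scratch_path / src.name)
            input_names.add(src.name)

        # The "script" only ever sees the scratch directory.
        work(scratch_path)

        # Read the new files back before the directory is deleted.
        results = {
            p.name: p.read_text()
            for p in scratch_path.iterdir()
            if p.name not in input_names
        }
    return results, scratch_path


# Demo with a stand-in for a session dataset.
with tempfile.TemporaryDirectory() as session_dir:
    source = Path(session_dir) / "titanic.csv"
    source.write_text("Pclass,Age\n1,40\n")

    def count_rows(folder):
        rows = (folder / "titanic.csv").read_text().splitlines()[1:]
        (folder / "row_count.txt").write_text(str(len(rows)))

    results, scratch_path = run_in_scratch_dir([source], count_rows)

print(results)                # the outputs survive
print(scratch_path.exists())  # the scratch directory does not
```

The practical consequence is that anything a script writes outside the outputs it explicitly saves should be treated as lost once the run ends.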

Create Scripts

In this section, we'll cover how to create your script in either a Jupyter notebook or a Python file.

Jupyter Notebooks

When you run the Launch skill the first time, both a Jupyter server and a Jupyter notebook are created. If you run the Launch skill again in the same session, a second notebook is created. There can be only one Jupyter server running per session (with a maximum of five across all sessions), but there is no limit to the number of notebooks that can be running on a given server. If you try to create a sixth server, you are prompted to stop one of the other five servers before continuing. In this case, we recommend saving your work and closing one of your DataChat sessions before continuing to create a new server.

To create a script with a Jupyter notebook:

  1. In a session, Load the datasets you want to use in your script.
  2. Launch Jupyter with the utterance Launch jupyter notebook with the datasets <your datasets> where <your datasets> are the datasets you loaded in step 1.
  3. Click the first link to open the notebook. If you're prompted to log in, click the second link, and then click the first link.

  4. Optionally, if you are handling datetime objects, include import datetime in your Jupyter notebook cell.
  5. Write your script. Refer to Jupyter's documentation for more information on using the Jupyter interface. Also, refer to the Functions section for information about using DataChat's APIs to create, save, and export data and visualizations back to your DataChat session.
  6. When you're finished, refer to the Save or Upload Scripts topic to save your script and make it accessible in DataChat.
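
As an illustration of the datetime step in the list above, a notebook cell that filters rows by date might look like the following sketch. The orders dataframe and its column names are made up for the example; in a real notebook you would work with the dataframe loaded from your own dataset.

```python
import datetime

import pandas as pd

# Hypothetical sample data standing in for a dataset loaded in the notebook.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "order_date": ["2023-01-15", "2023-02-01", "2023-03-20"],
    }
)

# Convert the text column to real timestamps before comparing.
orders["order_date"] = pd.to_datetime(orders["order_date"])

# Keep only the orders placed on or after the cutoff date.
cutoff = datetime.datetime(2023, 2, 1)
recent = orders[orders["order_date"] >= cutoff]
print(recent)
```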

Other Python Scripts

To get started, create a new Python file for your script, or use the following template. You can download it here. Note that your script must follow the template's structure in order to work properly:

# api is a DataChat library that allows you to read and save datasets and figures.
import api
from api import Dataset, Column, Parameter
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Every script needs this main() function. This function defines the datasets, columns, and parameters that should be used in the script.
# It should match the datasets and possible parameters used in the DataChat English utterance.
# For example, if your script uses the "titanic" dataset and two parameters, keep_kids (Boolean) and age_limit (int), your main() function would look like:
# main(titanic: Dataset([Column('Pclass', int), Column('Age', float)]), keep_kids: Parameter('keep_kids', bool), age_limit: Parameter('age_limit', int))
#
# The DataChat English utterance would look like: Run the script my_script on the dataset titanic.csv with parameters keep_kids, True, age_limit, 53
#
# The function takes the following parameters:
# main(<dataset name>: Dataset([Column("<column name>", <type>), Column(...)]))
def main(
    titanic_dataset: Dataset([Column('Pclass', int), Column('Age', float)]),
    change_title: Parameter('change_title', bool) = None,
    chart_title: Parameter('chart_title', str) = None
):
    # Load your dataset. Here, "titanic" is a Pandas dataframe.
    titanic = api.load_dataset(titanic_dataset)

    """ Start working with your data here. In this case, we'll calculate the average age of the passengers in each class and create a bar chart """

    # Save the unique values from the Pclass column for later use
    classes = pd.unique(titanic["Pclass"])

    # Compute the average age for each Pclass. Remember that "titanic" is a Pandas dataframe and can be manipulated as such.
    averages = titanic.groupby("Pclass")["Age"].mean()

    # Store the values alone, in the same order as "classes"
    avg_ages = []
    for pclass in classes:
        avg_ages.append(averages[pclass])

    avg_ages = np.array(avg_ages)
    classes = np.array(classes)

    # Choose the name of the figure. If change_title is set, the name passed in chart_title is used; otherwise a default name is used.
    fig_name = ""
    if change_title:
        fig_name = chart_title
    else:
        fig_name = "AverageAgeByPclass"

    # Plot a bar chart with Pclass on the x-axis and the average age on the y-axis
    plt.bar(classes, avg_ages)
    plt.title("Average Age by Passenger Class")
    plt.xlabel("Passenger Class")
    plt.xticks(classes, classes)
    plt.ylabel("Average Age")

    """ Stop working with your data here. """

    """ Use the api library to save your charts or datasets """

    # Pass the chart you created to the api library so it can be shown in DataChat. Note that this chart is shown as an image, not an interactive chart.
    api.show_figure(plt.figure(1), fig_name)

    # Save the final dataset so you can access it in DataChat
    api.save(titanic, "titanic1")


# Read the arguments passed in from the Run utterance and call main() with them
kwargs = api.read_args()
main(**kwargs)

Note that you must import the api library in order to load data into your script and save it back to your session. See the Functions section for more information about the api library.

Also note that only specific Python libraries are supported. Unsupported libraries cannot run in DataChat and might cause your script to fail. See the Supported Libraries section for more information.
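
The aggregation at the heart of the template can be tried outside DataChat with pandas alone. The sketch below uses a small hand-made dataframe (the values are invented for the example) and leaves out the api calls, which are only available inside DataChat.

```python
import pandas as pd

# Invented sample rows with the two columns the template expects.
sample = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 3, 3],
        "Age": [40.0, 50.0, 30.0, 20.0, 24.0],
    }
)

# Average age per passenger class; the result is indexed by Pclass.
averages = sample.groupby("Pclass")["Age"].mean()

# The index and the values are already aligned, so both arrays
# can be taken directly from the grouped result.
classes = averages.index.to_numpy()
avg_ages = averages.to_numpy()

print(classes)
print(avg_ages)
```

Taking both arrays from the grouped result is an alternative to the loop in the template; it also returns the classes in sorted order, which pd.unique does not guarantee.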

Functions

The following functions are available as part of DataChat's api package. Refer to the package documentation in the Supported Libraries section for other package-specific functions.

  • load_dataset(dataset_name: str). Loads a dataset as a pandas dataframe. It takes only the dataset_name argument, which is a string representing the name of the dataset. Note that if you create a script through a Jupyter notebook, this function is automatically added for you and the dataset_name argument matches the name of the dataset you used in your Launch utterance. Also, note that while the dataset_name argument will always match the name of the original dataset used to create the script, other datasets can still be used in your Run utterance.
  • save(dataframe, dataset_name: str). Saves a pandas dataframe as a dataset which can be used in a DataChat session in conjunction with Run. The save function is designed to be run within a script and is not for use directly from a notebook. The arguments include:
    • dataframe (dataframe) (required). The pandas dataframe to be saved.
    • dataset_name (string) (required). The name of the dataset that the dataframe is saved as.
  • export(dataframe, file_name, index=False, overwrite=False, duplicate=False). Exports a pandas dataframe to a CSV file that can be loaded into a DataChat session. The arguments include:
    • dataframe (dataframe) (required). The pandas dataframe to be exported.
    • file_name (string) (required). The name of the CSV file to be created.
    • index (Boolean) (optional). Whether to export the row indices of the dataframe. By default, this argument is set to False and the row indices are not exported.
    • overwrite (Boolean) (optional). Whether to overwrite an existing CSV file of the same name. By default, this argument is set to False.
    • duplicate (Boolean) (optional). Whether to create a duplicate file if another of the same name already exists. If this argument is True and the overwrite argument is False, a new file is created with a versioned suffix (e.g. my_file_1.csv if my_file.csv already exists).
  • show_figure(figure, figure_name: str). Saves a figure created with matplotlib or plotly so it can be shown as an image in DataChat. The arguments include:
    • figure (matplotlib or plotly figure) (required). The figure to save.
    • figure_name (string) (required). The name of the figure.
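
To make the interaction between the overwrite and duplicate arguments of export concrete, here is a small model of the file naming rule described above. It is a sketch, not DataChat's code: resolve_export_name is a hypothetical helper, and two behaviors are assumptions because the text does not specify them (continuing the suffix past _1, and raising an error when the file exists and both flags are False).

```python
from pathlib import Path


def resolve_export_name(file_name, existing, overwrite=False, duplicate=False):
    """Return the file name an export would write to (hypothetical model)."""
    if file_name not in existing or overwrite:
        # No clash, or the caller asked to replace the existing file.
        return file_name

    if duplicate:
        # Add a versioned suffix: my_file.csv -> my_file_1.csv.
        # Counting past _1 is an assumption.
        path = Path(file_name)
        version = 1
        while f"{path.stem}_{version}{path.suffix}" in existing:
            version += 1
        return f"{path.stem}_{version}{path.suffix}"

    # Assumption: a clash with both flags False is treated as an error.
    raise FileExistsError(f"{file_name} already exists")


print(resolve_export_name("my_file.csv", {"my_file.csv"}, duplicate=True))
print(resolve_export_name("my_file.csv", {"my_file.csv"}, overwrite=True))
```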

Supported Libraries

Because DataChat runs these scripts in an isolated Docker container, only the packages in the container can be used by your script. The container includes:

Save or Upload Scripts

In this section, we'll cover how to save your script and, if it wasn't created in a Jupyter notebook, upload it to DataChat. Then, refer to the Run Scripts section for information on running your scripts in other sessions.

Jupyter Notebooks

Once a Jupyter notebook is created, it is saved in DataChat and can be accessed across all sessions.

Other Python Scripts

You need to upload your Python scripts to DataChat before you can use them in a session. Scripts are uploaded like any other file, through the File Manager. Refer to the Uploading Files topic for more information.

You can view all of the scripts you've uploaded by entering List all saved scripts.

Run Scripts

After you've created a notebook or uploaded your scripts, you can access the script across all sessions. For example, if you had saved a Jupyter notebook as a script named "AnalyzeTitanic," and you wanted to run it on a dataset called "Titanic.csv," you would:

  1. Load the dataset into your session. Refer to the Load Data Into a Session topic for more information.
  2. Use the Run skill to run your script on your dataset with the utterance Run the script AnalyzeTitanic on the dataset Titanic.

When the script finishes, the plots or datasets it created are shown in the display panel, and a link to any associated logs appears in the chat history. If the script fails, you can download the error log to troubleshoot the issue.
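
If you run the same script often with different parameters, it can help to assemble the Run utterance programmatically and paste it into the chat box. The helper below is hypothetical; it only follows the two utterance patterns shown in this topic (with and without parameters) and makes no assumption about other forms, such as multiple datasets.

```python
def build_run_utterance(script, dataset, parameters=None):
    """Build a Run utterance string (hypothetical helper, not part of DataChat)."""
    utterance = f"Run the script {script} on the dataset {dataset}"
    if parameters:
        # Parameters are written as a flat "name, value, name, value" list.
        flat = ", ".join(f"{name}, {value}" for name, value in parameters.items())
        utterance += f" with parameters {flat}"
    return utterance


print(build_run_utterance("AnalyzeTitanic", "Titanic"))
print(
    build_run_utterance(
        "my_script", "titanic.csv", {"keep_kids": True, "age_limit": 53}
    )
)
```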

Error Handling

Sometimes, scripts can encounter errors while they run. Depending on how your script was created, there are different ways to address those errors.

If your script was created with a Jupyter notebook, DataChat creates a second notebook containing your script. You can use the links provided to open the notebook, address any errors, and save the notebook as a new script or overwrite the existing script.

If your script was created outside of a Jupyter notebook, an error log file is returned for you to download. You can then view the error logs, correct your script, and re-upload it to DataChat.
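
One way to make failures easier to diagnose is to check the input dataframe at the top of your script and fail with a clear message. The following sketch uses a hypothetical check_columns helper and assumes that an uncaught exception's message ends up in the error log; that behavior is not confirmed by this topic.

```python
import pandas as pd


def check_columns(df, expected):
    """Raise ValueError if expected columns are missing or not numeric.

    `expected` maps column names to Python types, for example
    {"Pclass": int, "Age": float}. Hypothetical helper.
    """
    problems = []
    for name, py_type in expected.items():
        if name not in df.columns:
            problems.append(f"missing column {name!r}")
        elif py_type in (int, float) and not pd.api.types.is_numeric_dtype(df[name]):
            problems.append(f"column {name!r} is not numeric")
    if problems:
        raise ValueError("; ".join(problems))


# Invented dataframes for the demonstration.
good = pd.DataFrame({"Pclass": [1, 2], "Age": [30.0, 41.5]})
bad = pd.DataFrame({"Pclass": ["first", "second"]})

check_columns(good, {"Pclass": int, "Age": float})  # passes silently

try:
    check_columns(bad, {"Pclass": int, "Age": float})
except ValueError as error:
    message = str(error)
    print(message)
```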

Edit Scripts

In this section, we'll cover how to edit your scripts.

Jupyter Notebooks

There are several ways to edit scripts created in Jupyter notebooks, depending on whether the session that created the notebook is still open and whether the script has been saved.

If your script is saved in DataChat:

  • If the session that created the notebook is still open, from within the session, enter in the chat box:

    Edit the notebook <notebook name> using the datasets <dataset name>

    The Jupyter notebook opens with the script <notebook name> in a new browser tab.

  • If the original session and notebook are unavailable, or you would prefer to work in a new session, you can create a different Jupyter server instance with the saved script. Enter in the chat box:

    Load data from the file <dataset name>

    List all scripts

    Click on a row to see details about the script. If your script is listed in the All Saved Scripts table, enter:

    Edit the notebook <notebook name> using the datasets <dataset name>

    The Jupyter notebook opens with the script <notebook name> in a different browser tab. See Edit for more information.

    If your script is not listed, you must either create it again or copy it into a notebook from a source outside DataChat.

If your script is not saved in DataChat and the session that created the notebook is no longer open:

  1. In a new session, launch a new notebook. Refer to the Create Scripts > Jupyter Notebooks section for more information.
  2. Open your old script in your editor of choice. DataChat recommends using Visual Studio Code to open .ipynb files with their cell structure intact.
  3. Cell by cell, copy your old script from your editor to the new notebook.
  4. Make your changes in the notebook.
  5. Save your changes in the notebook.
  6. Save your notebook as a new script or overwrite the old script. Refer to the Save or Upload Scripts section for more information.
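
If you have a copy of the old notebook as an .ipynb file, the code cells can also be pulled out with a few lines of Python before you copy them over. An .ipynb file is JSON with a top-level "cells" list; the sketch below relies only on that structure. The function name code_cells and the sample notebook content are made up for the example.

```python
import json


def code_cells(notebook_json):
    """Return the source of every code cell in a notebook, in order."""
    notebook = json.loads(notebook_json)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The source may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cells.append(source)
    return cells


# A tiny stand-in for the text of an old .ipynb file.
sample_notebook = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Notes"]},
            {"cell_type": "code", "source": ["import pandas as pd\n", "x = 1"]},
            {"cell_type": "code", "source": "print(x)"},
        ],
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)

extracted = code_cells(sample_notebook)
for number, source in enumerate(extracted, start=1):
    print(f"--- cell {number} ---")
    print(source)
```

For a real file, you would pass the file's text (for example, read with pathlib.Path.read_text) instead of sample_notebook.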

Other Python Scripts

To edit a script created outside of DataChat:

  1. Export the script.
  2. Open the script in your editor of choice.
  3. Make your changes.
  4. Upload the script to DataChat and overwrite the old version.

Share Scripts

You can share scripts with other users through DataChat. Those users can then run the scripts on their data and export the scripts to their machines. Refer to the Collaborate topic for more information.

Export Scripts

You can use the Export skill to export scripts that you've uploaded or that have been shared with you from DataChat to your local machine. This can be useful for troubleshooting or modifying your scripts before re-uploading them to DataChat.

Remove Scripts

You can remove scripts you've uploaded from DataChat using the Forget skill.