Best Practices
If you need some help getting the insights you want from the Data Assistant, try some of the tips below. These tips will help optimize your conversations to gain actionable insights from your data.
Prep Your Data
Before asking questions of your data, make sure it's properly prepared for the model to interpret.
Rename Columns to be Semantically Meaningful
Some column names in your dataset might be ambiguous or lack clear definitions. To help the model interpret columns and establish relationships between them, use semantically meaningful names. For example, a column labeled "ticket" might actually represent purchase sales. While you understand that "ticket" refers to purchase sales, the model might not automatically make that connection. In this instance, renaming the column to "purchase_sales" can help the model interpret the data more effectively.
You can easily ask the Data Assistant to rename columns as needed, or you can follow the steps outlined in Rename Columns.
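If you prefer to prepare data before importing it, the same rename can be sketched in plain Python. This is only an illustration of what the step accomplishes, not how the Data Assistant performs it; the sample rows, the `RENAMES` mapping, and the `rename_columns` helper are all hypothetical.

```python
# Hypothetical sample rows: "ticket" actually holds purchase sales.
rows = [
    {"id": 1, "ticket": 19.99},
    {"id": 2, "ticket": 5.50},
]

# Illustrative mapping from ambiguous names to meaningful ones.
RENAMES = {"ticket": "purchase_sales"}


def rename_columns(rows, renames):
    """Return new rows with keys renamed; unmapped keys are kept as-is."""
    return [
        {renames.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


renamed = rename_columns(rows, RENAMES)
print(renamed[0])
```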
Fix Incorrect Column Types
Imported data can sometimes have incorrect column types. Verify that all columns in your dataset are correctly typed before using the Data Assistant. In particular, date and numeric columns imported as strings cannot be properly interpreted by the model. Correct data types allow the model to execute queries effectively, such as performing calculations or applying conditional cleaning.
To change column types, refer to Change Type.
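To see why string-typed columns cause trouble, consider this minimal sketch in plain Python. The column names, the date format, and the `fix_types` helper are assumptions made for the example; it does not reflect how Change Type is implemented.

```python
from datetime import date, datetime

# Hypothetical rows as they might arrive from an import: everything is a string.
rows = [
    {"order_date": "2023-04-15", "amount": "120.50"},
    {"order_date": "2023-05-02", "amount": "80"},
]

# With string values, "+" concatenates instead of adding.
broken_total = rows[0]["amount"] + rows[1]["amount"]  # "120.5080"


def fix_types(row):
    """Parse the date and numeric columns (assumes ISO dates, YYYY-MM-DD)."""
    return {
        "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d").date(),
        "amount": float(row["amount"]),
    }


typed = [fix_types(row) for row in rows]
total = sum(row["amount"] for row in typed)  # numeric addition now works
print(broken_total, total)
```

Once the columns are typed, date comparisons and arithmetic behave as expected.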
Remove Duplicates
Duplicate data can lead to skewed analysis and unnecessary processing by the Data Assistant. We recommend removing any duplicate entries from your dataset.
To eliminate duplicates, you can simply ask the Data Assistant to "Remove all duplicates".
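For reference, exact-duplicate removal can be sketched as follows. This assumes that "duplicate" means every column value matches. The Data Assistant may apply a different rule, so treat the hypothetical `remove_duplicates` helper as an illustration only.

```python
# Hypothetical rows; the third row repeats the first exactly.
rows = [
    {"id": 1, "region": "Midwest"},
    {"id": 2, "region": "South"},
    {"id": 1, "region": "Midwest"},
]


def remove_duplicates(rows):
    """Keep the first occurrence of each row, preserving order.

    Assumes all values are hashable (strings, numbers, dates).
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


deduped = remove_duplicates(rows)
print(len(rows), "->", len(deduped))
```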
Structure Your Question
Well-structured questions help the Data Assistant return insightful responses.
Clearly Define Objectives
Clearly state the Data Assistant's task or objective. Whether your objective involves data summarization, chart generation, or answering targeted queries, a well-defined goal is essential.
Examples of question starters:
- "Generate a heatmap chart illustrating..."
- "Compute the average of..."
- "Identify the key factors influencing..."
- "What is the total count of..."
- "Show me the relationship between..."
Define Constraints
If there are specific constraints or guidelines you'd like the Data Assistant to adhere to, incorporate them into your questions. For example, when working with a dataset containing a column of mostly null values, consider specifying that this column should be excluded from the analysis. Similarly, if you have a specific column value you want to prioritize, be sure to include it in your question.
Examples of constraint-specific questions:
- "Identify the factors most affecting test scores, excluding the ID column."
- "Find the maximum and minimum temperatures for each season."
- "Determine the number of adult women in first class."
- "Visualize the relationship between gross sales and marketing costs for the Midwest region."
- "Group individuals by age in 15-year increments."
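A constraint such as "15-year increments" removes guesswork about how values are binned. The sketch below shows one possible interpretation, with bins starting at age 0. The sample ages and the `age_bucket` helper are hypothetical, and the Data Assistant may choose bin edges differently.

```python
from collections import Counter

# Hypothetical ages.
ages = [4, 15, 22, 29, 30, 44, 61]


def age_bucket(age, width=15):
    """Label the bin containing `age`, assuming bins start at 0."""
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


counts = Counter(age_bucket(age) for age in ages)
for label, count in counts.items():
    print(label, count)
```

Without the stated width, a 5-year or 10-year grouping would be an equally valid reading of "group individuals by age".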
Ask Specific Questions
Ambiguous or open-ended questions can yield generalized responses that may not fully address your needs. Ambiguity often comes from terms such as "best," "big," and "good," whose meaning depends on context. For instance, the question "Which subscription package is best?" does not say what "best" measures and could be interpreted in various ways. For more accurate results, rephrase such questions into more specific forms, such as:
- "Which subscription package is the most affordable?"
- "Which subscription package has the most channels?"
- "What product performed the best based on total sales?"
Specify Time Ranges
When dealing with time-series data, it's important to specify the time range or period of interest in your questions. This ensures that the Data Assistant focuses its analysis on the relevant temporal context.
Examples of time-specific questions:
- "What are the top-selling items in Q2 2023?"
- "Generate a line chart depicting total sales for each month."
- "Identify businesses most likely to experience an increase in revenue over the next 4 weeks."
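The effect of stating a time range can be sketched with a small example in which the top-selling item differs between all-time data and Q2 2023. The data and the `units_by_item` helper are hypothetical, and Q2 is assumed to mean the calendar quarter from April 1 to June 30.

```python
from collections import Counter
from datetime import date

# Hypothetical sales records.
sales = [
    {"item": "mug", "sold_on": date(2023, 3, 30), "units": 20},
    {"item": "mug", "sold_on": date(2023, 4, 2), "units": 3},
    {"item": "pen", "sold_on": date(2023, 5, 17), "units": 7},
    {"item": "pen", "sold_on": date(2023, 7, 1), "units": 5},
]

# Assumption: Q2 is the calendar quarter, inclusive on both ends.
Q2_START, Q2_END = date(2023, 4, 1), date(2023, 6, 30)


def units_by_item(rows, start=None, end=None):
    """Total units per item, optionally limited to an inclusive date range."""
    totals = Counter()
    for row in rows:
        if start is not None and row["sold_on"] < start:
            continue
        if end is not None and row["sold_on"] > end:
            continue
        totals[row["item"]] += row["units"]
    return totals


top_all_time = units_by_item(sales).most_common(1)[0][0]
top_q2 = units_by_item(sales, Q2_START, Q2_END).most_common(1)[0][0]
print(top_all_time, top_q2)
```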
Troubleshoot Unexpected Responses
To effectively troubleshoot unexpected responses, monitor your questions, the generated results, and any adjustments you make. Refining and rephrasing questions is key, especially if the initial responses don't meet your expectations.
Clear the Topic
The first step is to clear the topic. By default, the Data Assistant considers the context of previous questions and responses. For example, if you first ask, "Compute the average sales for each product in July", and then "List the top performing products", the assistant will only list the top products for July. Clearing the topic resets this context, allowing the assistant to analyze all available data without previous constraints.
Bring Datasets into Focus
Next, ensure that the appropriate datasets are in focus. When you begin a conversation with the Data Assistant, it uses the datasets currently in focus. Hiding a dataset affects the Data Assistant’s responses only at the start of a conversation or when you reset the topic; it doesn’t change which datasets are considered as the conversation progresses.
The Data Assistant tracks any new datasets it creates, automatically using them in follow-up questions. You can also add new datasets, which will be picked up in the next question. While you can’t remove datasets mid-conversation, you can start a new topic with only the datasets you want to focus on by hiding the irrelevant ones in the Data tab.
Check the Workflow
Next, check the underlying workflow. This lets you review each step in the Data Assistant's response, helping to identify where an error might have occurred. For instance, you may discover that the assistant used the right steps but applied them to the wrong dataset, or performed the correct computation on the wrong column. Adjust your question based on these insights for more precise results.
For example, if you ask, "Compute the average price for each product" but your dataset has separate "cost" and "sales" columns and no "price" column, you might receive an unexpected response. Checking the workflow might show that the assistant computed the average cost rather than the average sales. You can then refine your question to, "Compute the average sales for each product" to get the desired results.
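To make the difference concrete, the two readings of "price" can be compared on a tiny hypothetical dataset. The rows and the `average_by_product` helper are assumptions for illustration and do not represent the workflow the Data Assistant generates.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows with "cost" and "sales" columns but no "price" column.
rows = [
    {"product": "A", "cost": 4.0, "sales": 10.0},
    {"product": "A", "cost": 6.0, "sales": 14.0},
    {"product": "B", "cost": 2.0, "sales": 3.0},
]


def average_by_product(rows, column):
    """Average the given column for each product."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["product"]].append(row[column])
    return {product: mean(values) for product, values in grouped.items()}


avg_cost = average_by_product(rows, "cost")    # one reading of "price"
avg_sales = average_by_product(rows, "sales")  # the intended reading
print(avg_cost, avg_sales)
```

Both results are valid computations, so only the workflow reveals which column was actually used.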
Provide Feedback
If clearing the topic, bringing the right datasets into focus, and checking the workflow don't resolve the issue, you can provide feedback on the Data Assistant's responses or submit a detailed ticket to our development team.