Best Practices
If you need help getting the insights you want from the Data Assistant, try the tips below. They can help you structure your conversations to gain actionable insights from your data.
Prep Your Data
Before asking questions of your data, make sure it's properly prepared for the model to interpret.
Add Context to Datasets and Columns
You can use the Data Assistant Dictionary to add important context to your datasets and columns for more accurate responses. This is especially helpful for organization-specific terms, specific values in categorical columns, abbreviations, conversions, and more.
Rename Columns to be Semantically Meaningful
Some column names in your dataset might be ambiguous or lack clear definitions. To help the model interpret columns and establish relationships between them, use semantically meaningful names. For example, a column labeled "ticket" might actually represent purchase sales. While you understand that "ticket" refers to purchase sales, the model might not automatically make that connection. In this instance, renaming the column to "purchase_sales" can help the model interpret the data more effectively.
You can easily ask the Data Assistant to rename columns as needed, or you can follow the steps outlined in Rename Columns. Alternatively, you can define your columns with the Data Assistant Dictionary instead.
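If you prefer to rename columns before importing your data, the idea can be sketched in plain Python. This is only an illustration under stated assumptions: the rows, the "store" column, and the rename_columns helper are hypothetical, and this is not how the Data Assistant performs renames internally.

```python
# Hypothetical rows; the "ticket" column actually holds purchase sales.
rows = [
    {"store": "A", "ticket": 120.5},
    {"store": "B", "ticket": 87.0},
]


def rename_columns(rows, mapping):
    """Return new rows with column names replaced according to mapping.

    Columns that are not listed in mapping keep their original names.
    """
    return [
        {mapping.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


# Replace the ambiguous name with a semantically meaningful one.
renamed = rename_columns(rows, {"ticket": "purchase_sales"})
```

The original rows are left unchanged, so you can compare the old and new column names before importing the renamed data.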
Fix Incorrect Column Types
Imported data can sometimes have incorrect column types. Verify that all columns in your dataset are correctly typed before using the Data Assistant. In particular, date and numeric columns imported as strings cannot be properly interpreted by the model. Correct data types allow the model to execute queries effectively, such as performing calculations or applying conditional cleaning.
You can ask the Data Assistant to change column types as needed, or you can follow the steps outlined in Change Type.
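To see why string-typed columns cause trouble, consider this small Python sketch. The rows, column names, and the convert_types helper are hypothetical, and the date format is an assumption about the sample data. It shows the general problem, not the Data Assistant's own type handling.

```python
from datetime import datetime

# Hypothetical rows as they might look after an import that
# left every column typed as a string.
rows = [
    {"order_date": "2023-04-15", "amount": "19.99"},
    {"order_date": "2023-05-02", "amount": "5.00"},
]


def convert_types(rows):
    """Return new rows with order_date as a date and amount as a float."""
    converted = []
    for row in rows:
        converted.append({
            # Assumes ISO-style dates (year-month-day) in the sample data.
            "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d").date(),
            "amount": float(row["amount"]),
        })
    return converted


typed = convert_types(rows)

# Strings compare character by character, so "5.00" sorts above "19.99".
largest_as_string = max(row["amount"] for row in rows)
# Numbers compare by value, so the largest amount is found correctly.
largest_as_number = max(row["amount"] for row in typed)
```

With string values, the "largest" amount is chosen by character order rather than numeric value, which is the kind of error that correct column types prevent.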
Remove Duplicates
Duplicate data can lead to skewed analysis and unnecessary processing by the Data Assistant. We recommend removing any duplicate entries from your dataset.
To eliminate duplicates, you can simply ask the Data Assistant to "Remove all duplicates".
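The effect of duplicates on an analysis can be illustrated with a short Python sketch. The rows and the remove_duplicates helper are hypothetical. It treats two rows as duplicates only when every column matches, which is an assumption and may differ from how the Data Assistant defines a duplicate.

```python
from statistics import mean

# Hypothetical rows; the last row is an exact copy of the second.
rows = [
    {"order_id": 1, "sales": 100.0},
    {"order_id": 2, "sales": 300.0},
    {"order_id": 2, "sales": 300.0},
]


def remove_duplicates(rows):
    """Return rows with exact duplicates dropped, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        # Sort the items so that column order does not affect the comparison.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


deduped = remove_duplicates(rows)

# The duplicated row pulls the average upward until it is removed.
mean_with_duplicates = mean(row["sales"] for row in rows)
mean_without_duplicates = mean(row["sales"] for row in deduped)
```

Here the duplicated order raises the average sales figure above the value computed from the two distinct orders.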
Structure Your Question
Properly structuring your questions to the Data Assistant can help to ensure the response provided is insightful.
Clearly Define Objectives
Clearly state the Data Assistant's task or objective. Whether your objective involves data summarization, chart generation, or answering targeted queries, a well-defined goal is essential.
Examples of question starters:
- "Generate a heatmap chart illustrating..."
- "Compute the average of..."
- "Identify the key factors influencing..."
- "What is the total count of..."
- "Show me the relationship between..."
Define Constraints
If there are specific constraints or guidelines you'd like the Data Assistant to adhere to, incorporate them into your questions. For example, when working with a dataset containing a column of mostly null values, consider specifying that this column should be excluded from the analysis. Similarly, if you have a specific column value you want to prioritize, be sure to include it in your question.
Examples of constraint-specific questions:
- "Which factors most affect test scores, excluding studentID?"
- "Find the max and min temperatures for each season."
- "How many adult women are in first class?"
- "Visualize the relationship between gross sales and marketing costs for the Midwest region."
Ask Specific Questions
Ambiguous or open-ended questions can yield generalized responses that may not fully address your needs. Ambiguities can come from terms such as "best," "big," and "good," which may carry different meanings depending on the context. For instance, the question "Which subscription package is best?" lacks precision and could be interpreted in various ways. To get accurate results, rephrase such questions into more specific forms, such as:
- "Which subscription package is the most affordable?"
- "Which subscription package has the most channels?"
- "What product performed the best based on total sales?"
You can also define these ambiguities using the Data Assistant Dictionary. For example, you could include in your dataset definition something like "The best salesperson has the highest average sales and highest average customer satisfaction".
Specify Time Ranges
When dealing with time-series data, it's important to specify the time range or period of interest in your questions. This ensures that the Data Assistant focuses its analysis on the relevant temporal context.
Examples of time-specific questions:
- "What are the top-selling items in Q2 2023?"
- "Generate a line chart depicting total sales for each month."
- "Identify businesses most likely to experience an increase in revenue over the next 4 weeks."
Troubleshoot Unexpected Responses
To effectively troubleshoot unexpected responses, monitor your questions, the generated results, and any adjustments you make. Refining and rephrasing questions is key, especially if the initial responses don't meet your expectations.
Break Up Complex Queries
Breaking up questions into smaller, more focused queries can help resolve unexpected responses and uncover insights more effectively. This approach not only makes the questions easier for the assistant to process but also allows problems to be tackled step by step.
For example, if a complex query like "Show me the average sales and customer satisfaction scores for each region over the last year, grouped by product category" doesn't return an expected result, you could break it into smaller queries:
- "What are the average sales for each region over the last year?"
- "What are the customer satisfaction scores for each region over the last year?"
- "Can you group these results by product category?"
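The benefit of smaller steps can be seen in a plain Python sketch, where each intermediate result can be checked on its own before moving on. The rows, column names, and the average_by helper are hypothetical. This illustrates the step-by-step idea only and does not reflect how the Data Assistant computes its answers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for illustration.
rows = [
    {"region": "Midwest", "category": "Audio", "sales": 100.0, "satisfaction": 4.0},
    {"region": "Midwest", "category": "Video", "sales": 300.0, "satisfaction": 3.0},
    {"region": "South", "category": "Audio", "sales": 200.0, "satisfaction": 5.0},
]


def average_by(rows, keys, value):
    """Average the value column for each distinct combination of keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[value])
    return {group: mean(values) for group, values in groups.items()}


# Step 1: average sales for each region.
avg_sales_by_region = average_by(rows, ["region"], "sales")
# Step 2: average satisfaction for each region.
avg_satisfaction_by_region = average_by(rows, ["region"], "satisfaction")
# Step 3: refine the first result by also grouping on product category.
avg_sales_by_region_and_category = average_by(rows, ["region", "category"], "sales")
```

If one step returns something unexpected, you know which part of the original question to rephrase.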
Use Follow Ups to Fine-Tune Responses
The Data Assistant is built for interactivity. If the initial response isn't specific or detailed enough, a follow-up question or instruction can help adjust the results to better match your needs. This iterative process shapes the response progressively without starting over.
For example, a query like "What are the sales for each home entertainment product in Q4?" might return total sales. You could refine this by following up with "Show me the average sales instead".
Clear the Topic
By default, the Data Assistant considers the context of previous questions and responses. For example, if you first ask, "Compute the average sales for each product in July", and then "List the top performing products", the assistant will only list the top products for July. Clearing the topic resets this context, allowing the assistant to analyze all available data without previous constraints.
Bring Datasets into Focus
Ensure that the appropriate datasets are in focus. When you begin a conversation with the Data Assistant, it uses the datasets currently in focus. Hiding a dataset will impact the Data Assistant’s responses only at the start of the conversation or when resetting the topic of the conversation, but it won’t affect how datasets are considered as the conversation progresses.
The Data Assistant tracks any new datasets it creates, automatically using them in follow-up questions. You can also add new datasets, which will be picked up in the next question. While you can’t remove datasets mid-conversation, you can start a new topic with only the datasets you want to focus on by hiding the irrelevant ones in the Data tab.
Check the Workflow
Check the underlying workflow. This lets you review each step in the Data Assistant's response, helping to identify where an error might have occurred. For instance, you may discover that the assistant used the right steps but applied them to the wrong dataset, or performed the correct computation on the wrong column. Adjust your question based on these insights for more precise results.
For example, if you ask, "Compute the average price for each product" but your dataset has separate "cost" and "sales" columns and no "price" column, you might receive an unexpected response. Checking the workflow might show that the assistant computed the average cost rather than the average sales. You can then refine your question to "Compute the average sales for each product" to get the desired results.
Provide Feedback
If the above suggestions don't resolve the issue, you can provide feedback on the Data Assistant's responses or submit a detailed ticket to our development team to improve your experience.
Get Additional Help
Need help using DataChat? Simply ask the Data Assistant. For example, to learn how to add context to the Data Assistant Dictionary, you can ask something like: "How do I add dataset context?"
The Data Assistant will provide a summary of the process and a link to the relevant documentation for more details.