# Compute

### Description

`Compute`

lets you perform calculations and aggregations on one or more columns on your dataset. You can also group values based on certain columns, sort the results, name the resulting columns, and more.

### Format

`Compute the <aggregations> for each <group> where <predicates> sorted by <column name, order> such that <statement> showing the <limit, number of rows> and call the computed columns <labels>`

### Parameters

The parameters used in `Compute`

include:

`aggregations`

(required). A comma-separated list of calculations to perform, including:`average <numeric column> | <math expression>`

. Returns the average of all the values in the column or values returned by the expression.`concatenation of <column> using the delimiter <delim>`

. Concatenates the values of the given column into a single string using the specified delimiter between each value.`count of <records> | <column>`

. Returns the number of records (including duplicates) in the dataset or specified column.`distinct count of <records> | <column>`

. Returns the number of records (excluding duplicates) in the dataset or specified column.`distinct percentage of`

`expression <expression>`

. Returns the output of the expression.`largest <column> | <math expression>`

. Returns the largest (maximum) value in the column or in the values returned by the expression. This is a synonym for the`maximum`

aggregation.`least <column> | <math expression>`

. Returns the smallest (minimum) value in the column or in the values returned by the expression. This is a synonym for the`minimum`

aggregation.`maximum <column> | <math expression>`

. Returns the maximum value in the column or in the values returned by the expression. This is a synonym for the`largest`

aggregation.`median <column> | <math expression>`

. Returns the median value in the column or in the values returned by the expression.`minimum <column> | <math expression>`

. Returns the smallest value in the column. This is a synonym for the`least`

aggregation.`most <column> | <math expression>`

. Returns the value that appears the most often in the column or in the values returned by the expression.`percentage of <column>`

. If no grouping columns are specified with`for each`

, this returns the percentage of records that are not null. If grouping columns are specified, this returns the percentage of the target column's total that is made up by each grouping column.`percentile of <column> at <percent>`

. Returns the value of the column at the specified percentile.`proportion of <column>`

. If no grouping columns are specified with`for each`

, this returns the percentage of rows that are not null. If`proportion of records`

is used, this returns the percentage of valid records in the target column divided by the total number of records in that column.`smallest <column> | <math expression>`

. Returns the smallest (minimum) value in the column or in the values returned by the expression. This is a synonym of the`minimum`

and`least`

`standard deviation of <numeric column> | <math expression>`

(numeric columns only)`absolute median deviation of <numeric column> | <math expression>`

(numeric columns only)`total <numeric column> | <math expression>`

(numeric columns only)`group`

(optional). One or more columns or column groups used to group the results of the aggregations. Note that column groups are available only at skill level 3.

`predicates`

(optional). Operators used to compare two values, including:`is after <aggregate date> | <date or time value>`

`is before <aggregate date> | <date or time value>`

`is (not) between the dates <date1> to <date2>`

`is (not) equal to the aggregate date <date or time value>`

`is on or before <date or time value>`

`is on or after <date or time value>`

`is (not) null`

`is (not) blank`

`is (not) equal to the column <column name>`

`is (not) equal to <value>`

`equals <value>`

`does not equal <value>`

`is (not) <value>`

`is (not) equal to the math expression <expression>`

`equal to the value <value>`

`is (not) greater than <value>`

`(not) greater than <value>`

`is (not) more than <value>`

`is at most <value>`

`is (not) smaller than <value>`

`(not) smaller than <value>`

`is (not) less than <value>`

`is at least <value>`

`less than <value>`

`matches the value <value>`

`starts with <string>`

`limit, number of rows`

(optional). Limits the resulting dataset to the specified number of rows in the specified part of the dataset. You can choose from the top, bottom, first, and last parts of the dataset. For example, "top 5" limits the resulting dataset to the first five rows.`column name, order`

(optional). Used to sort the resulting dataset based on one or more columns in ascending or descending order.`statement`

(optional). Allows you to run a secondary calculation (also known as a nested query) to further refine the results of your calculations.`labels`

(optional). A comma-separated list of labels used as the names of any newly-created columns. Note that this list must match the number of aggregations. Otherwise, the newly-created columns are given names in the order of the aggregations.

### Output

If the calculations are successful, the updated dataset becomes [dataset]_Compute.

### Examples

Consider a dataset called "Titanic" that contains information on each passenger, including the following columns:

`Age`

. Their age.`Gender`

. Their gender.`Name`

. Their name.`PClass`

. Their class.`Survived`

. Whether they survived the disaster.

To compute the average age for all passengers, enter `Compute average Age`

To compute the standard deviation of age for passengers in each class and sort the result by class values in ascending order, enter `Compute standard deviation of Age for each PClass sorted by PClass in ascending order`

To compute both the percentages of first class passengers who survived and those who did not, enter `Compute percentage of records for each Survived where PClass is equal to the value 1`

To compute the number of passengers of each gender in each class group, enter `Compute count of records for each PClass, Gender`

To compute the number and average ages of passengers who did and didn't survive and show the results in two new columns named `AvgAge`

and `NumPeople`

, respectively, enter `Compute count of records, average Age for each Survived calling the output columns NumPeople, AvgAge`

To compute the number of people in each passenger class that have average age greater than 30, enter `Compute count of records for each PClass such that average Age is greater than 30`

#### Computing Proportions vs. Percentages

Consider the following dataset:

ID | Value |
---|---|

1 | 100 |

1 | 100 |

2 | 200 |

3 | 400 |

Computing a proportion calculates a percentage of non-null rows as a proportion of the total data. For example, running `Compute the proportion of Value for each ID`

on the example dataset returns the following dataset:

ID | ProportionOfValue |
---|---|

1 | 50% |

2 | 25% |

3 | 25% |

In this case, of the four rows in the dataset, two of them have an ID of 1 (50 percent of the total) and there is one row for each ID of 2 and 3 (25 percent of the total, respectively).

Computing a percentage calculates the percentage each grouping column makes up of the total of the target column. For example, running `Compute the percentage of Value for each ID`

on the example dataset returns the following dataset:

ID | PercentageOfValue |
---|---|

1 | 25% |

2 | 25% |

3 | 50% |

In this case, we're computing the percentage of the total of the Value column made up by the values of each ID. Here, the total of the Value column is 800, and the two rows with an ID of 1 combine to make up 25 percent (200/800) of that total.