Split
Split lets you clean and explore your data by splitting a column into multiple new columns. You can split a column based on a delimiter or at a certain position, such as the second character or the third number in the column.
Format
Split has two formats:
- Split <column> at the position <position> (calling the output columns <names>)
- Split <column> by the delimiter <delimiter> (into columns <names>)
Parameters
Split uses the following parameters:
- Column (required). The column you want to split.
- Position (required for the position format). The position at which you want to split the value. For example, to split the value at the first character or number, enter 1.
- Delimiter (required for the delimiter format). The delimiter you want to use to split the values in the column. This can be a number, letter, or symbol. A space (“ ”) is the default delimiter.
- Names (optional). If you want to give the resulting columns custom names, enter the names as a comma-separated list.
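To make the difference between the two formats concrete, the following Python sketch mimics each kind of split on a single value. It is an illustration only, not the tool's implementation: the function names split_at_position and split_by_delimiter are hypothetical, and the sketch assumes that a position of N places the first N characters in the first output column.

```python
def split_at_position(value: str, position: int) -> tuple[str, str]:
    """Split `value` after its first `position` characters.

    Assumption: positions are counted from 1, so position=1 puts the
    first character in the left part and the rest in the right part.
    """
    if position < 1:
        raise ValueError("position must be 1 or greater")
    return value[:position], value[position:]


def split_by_delimiter(value: str, delimiter: str = " ") -> list[str]:
    """Split `value` on every occurrence of `delimiter`.

    A space is used when no delimiter is given, mirroring the default
    described in the parameter list.
    """
    return value.split(delimiter)


print(split_at_position("A23", 1))       # ('A', '23')
print(split_by_delimiter("A-23", "-"))   # ['A', '23']
print(split_by_delimiter("first second"))  # ['first', 'second']
```

The position split always yields two parts, while the delimiter split yields one part per delimited segment, which is why the Names parameter accepts a list.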
Output
If the column is split successfully, the new columns are appended to your dataset, a sample of the updated dataset is shown in the Data tab, and a success message is shown in the conversation history.
Otherwise, an error message is shown.
Examples
Consider a dataset called “Titanic” that contains information on each passenger, including the following columns:
- Age. The passenger’s age.
- Gender. The passenger’s gender.
- Name. The passenger’s name.
- PClass. The passenger’s class.
- Cabin. The passenger’s cabin ID.
- Survived. Whether the passenger survived the disaster.
Values in the Cabin column are often listed in the form “level-room number,” such as A-23. To split the Cabin column into CabinLevel and CabinNumber columns, enter: Split Cabin by the delimiter - into columns CabinLevel, CabinNumber
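If you want to check what this split should produce, the following Python sketch applies the same idea to a few rows outside the tool. It is a rough equivalent under stated assumptions, not the tool's own logic: split_column is a hypothetical helper, the sample rows are made up for illustration (only A-23 comes from the example above), and filling missing parts with None is an assumption about values that do not contain the delimiter.

```python
def split_column(rows, column, delimiter, names):
    """Return new rows with `column` split by `delimiter` into `names`.

    The original column is kept and the new columns are appended,
    matching the behavior described in the Output section.
    """
    result = []
    for row in rows:
        # Split into at most len(names) parts so extra delimiters
        # stay inside the last part.
        parts = str(row[column]).split(delimiter, len(names) - 1)
        # Assumption: values with fewer parts than names get None.
        parts += [None] * (len(names) - len(parts))
        result.append({**row, **dict(zip(names, parts))})
    return result


# Illustrative rows; these are not taken from the real dataset.
rows = [
    {"Name": "Passenger 1", "Cabin": "A-23"},
    {"Name": "Passenger 2", "Cabin": "C-85"},
]

for row in split_column(rows, "Cabin", "-", ["CabinLevel", "CabinNumber"]):
    print(row)
```

For the first row, the sketch adds CabinLevel with the value A and CabinNumber with the value 23, which is the result the Split command above is expected to show in the Data tab.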