Deduplicating with related columns

chuxin huo
Jul 21, 2021

Example

Example 1

There is an Excel file Book1.xlsx, and part of the data is as follows:

Remove rows that duplicate both id and name. In addition, if an id has at least one row with a non-empty name, the rows with an empty name for that id are deleted as well. The results are as follows:

Write SPL script:

A1 Read the Excel file

A2 Group by id and deduplicate by name within each group. After deduplication, if a group still contains more than one row, keep only the rows with a non-empty name; otherwise leave the group unchanged. Then merge the results of all groups

A3 Export results to result.xlsx
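The logic of step A2 can be sketched outside SPL as well. The following is a minimal Python illustration, not the author's script: the function name `dedup_by_id_and_name` and the sample rows are hypothetical, an empty name is assumed to be either `None` or an empty string, and reading Book1.xlsx and writing result.xlsx are left out.

```python
def dedup_by_id_and_name(rows):
    """Group rows by id, deduplicate by name inside each group, and drop
    empty-name rows from any group that still has more than one row."""
    # Group by id; a dict keeps groups in order of first appearance.
    groups = {}
    for row in rows:
        groups.setdefault(row["id"], []).append(row)

    result = []
    for members in groups.values():
        # Deduplicate by name, keeping the first row seen for each name.
        seen = set()
        unique = []
        for row in members:
            key = row["name"] or None  # assumption: "" and None both mean empty
            if key not in seen:
                seen.add(key)
                unique.append(row)
        # More than one row left means at least one name is non-empty,
        # so the empty-name row (if any) can be dropped.
        if len(unique) > 1:
            unique = [row for row in unique if row["name"]]
        result.extend(unique)
    return result


# Hypothetical sample data, not taken from Book1.xlsx.
sample_rows = [
    {"id": 1, "name": "Tom"},
    {"id": 1, "name": "Tom"},
    {"id": 1, "name": None},
    {"id": 2, "name": None},
    {"id": 2, "name": None},
    {"id": 3, "name": "Ann"},
    {"id": 3, "name": "Bob"},
]
deduped = dedup_by_id_and_name(sample_rows)
```

With this sample, id 1 keeps only the "Tom" row, id 2 keeps a single empty-name row because it has no named row, and id 3 keeps both of its distinct names.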

Example 2

There is an Excel file Book1.xlsx, and part of the data is as follows:

Remove duplicate rows, where two rows count as duplicates if they contain the same values regardless of column order. The results are as follows:

Write SPL script:

A1 Read the Excel file

A2 Sort the values within each row, group the rows by their sorted values to remove duplicates, and take the first row of each group

A3 Export results to result.xlsx
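The same idea as step A2 can be sketched in Python. This is an illustration under stated assumptions rather than the author's script: the function name `dedup_ignoring_column_order` and the sample rows are hypothetical, each row is assumed to be a tuple of cell values, and the Excel input and output steps are omitted.

```python
def dedup_ignoring_column_order(rows):
    """Keep the first row for each distinct set of values, treating rows
    with the same values in a different column order as duplicates."""
    seen = set()
    result = []
    for row in rows:
        # Sorting the values gives an order-independent key for the row.
        # key=repr lets mixed types (numbers, text, None) be sorted together.
        key = tuple(sorted(row, key=repr))
        if key not in seen:
            seen.add(key)
            result.append(row)  # the first row of each group is kept as-is
    return result


# Hypothetical sample data, not taken from Book1.xlsx.
sample_pairs = [
    ("a", "b"),
    ("b", "a"),
    ("a", "c"),
    ("c", "a"),
    ("b", "c"),
]
unique_pairs = dedup_ignoring_column_order(sample_pairs)
```

Here ("b", "a") and ("c", "a") are dropped because their sorted values match rows that appeared earlier, and the surviving rows keep their original column order.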
