deduplicating for row-style Excel

chuxin huo
2 min readJul 21, 2021

A file has multiple columns of data. The first row is the column name, and the second row is the data record. Compare the key columns in the file, delete the duplicate rows in the key column or keep only the duplicate rows.

The current sales order table order_2018.xlsx for 2018, some of the data are as follows:

1. Remove duplicates

Example 1: Find all the different customer IDs who purchased products in 2018 and save them in the file 2018c.xlsx.

The esProc SPL script is as follows:

A1 Read the data in order_2018.xlsx

A2 Take out all unique CustomerIds in A1

A3 write the data in A2 into the file 2018c.xlsx

Example 2: Find out which different products each customer purchased in 2018, and save CustomerId and ProductId in the file 2018c_p.xlsx.

The esProc SPL script is as follows:

A1 Read in two columns of CustomerId and ProductId in order_2018.xlsx

A2 Groupe by key column, @1 means that only one record in the group will be taken

A3 Write the data in A2 into the file 2018c_p.xlsx

2. Only keep duplicates

Example: Find the order status of repeat customers (customers who bought the same product multiple times) in 2018, and save the results in the file 2018c_rebuy.xlsx.

The esProc SPL script is as follows:

A1 Read order_2018.xlsx data

A2 Groupe by key column, the orders of the same customer for the same product are divided into a group

A3 Select groups with orders greater than 1, and combine the orders of each group into a data table

A4 Write the data in A3 to the file 2018c_rebuy.xlsx

--

--