How to Handle Huge XLS Files

chuxin huo
3 min readJan 13, 2021

--

Excel displays poor performance in handling huge xls files. Usually we load an Excel file to the database and handle it using the database’s computing ability. Sometimes, however, the data can’t be wholly loaded to the database due to specific reasons. It would be great there was an application that can directly deal with massive xls files.

Take the employee information file emp.xls (Below is part of the source data):

And the state information file states.xls as an example (Below is part of the data):

Task: Join the two tables through emp’s STATE column and states’ NAME column and get records where SALARY is above 5000 and POPULATION is below 5 million.

It’s easy to do this with esProc.
You can download esProc installation package and free DSK edition license HERE.

1. Get records of states where POPULATION is below one million:

Write script wherexls.dfx in esProc:

A1 gets records of states where POPULATION is below one million using simple SQL.

Below is A1’s result after execution:

2. Group emp records by genders and count employees in each group:

Write script groupxls.dfx in esProc:

A1 groups emp by gender and count employees in each group using simple SQL.

Below is A1’s result after execution:

3. Join emp table and states table through emp.STATE and states.NAME and select records where state population is below 5 million and employee salary is above 5000:

Write script joinxls.dfx in esProc:

A1 performs join filtering over two tables using simple SQL.

Below is A1’s result after execution:

We can convert an xls file into a bin file. The bin file is esProc’s built-in binary file format. The format uses simple compression mechanism to store same size of data in smaller space and thus enables less time in reading it.

Take orders file orders.xls as an example. To convert it to a bin file, we use the following script:

Below is the disk space used by xls file and bin file:

Like an Excel file, a bin file can be directly handled with esProc. For example:

Group orders file orders.btx by year and list years when the total number of orders is less than 10,000:

Write script groupbtx.dfx in esProc:

A1 groups records by year and count orders for each year using simple SQL.

Below is A1’s result after execution:

--

--

No responses yet