How I escaped Apache-POI and found my dream 3rd party excel writer

Deepak Choudhary
4 min readJun 7, 2019
Apache POI is a popular giant when it comes to handling excel files

This is how I got rid of slow excel file generation, memory exhaustion traps and finally found the one solution which worked wonders for my team and project.

Before we move on to what worked for me, let’s take a look at the problem statement.

PROBLEM

We need to gather records from our SQL database which match a given set of query parameter, populate these results in an excel sheet and let the client[in this case an Angular app] download it. The columns in the excel sheet should be auto adjusted to content width.

OLD FRIEND

Enter Apache POI. Apache POI has long been the go to tool for Java developers to read and write excel files. Hence, we hailed the old conqueror and integrated it in our Spring Boot backend to generate Excel files[.xlsx]. Things seemed to be going well and good until one day the load increased. Result sets started growing larger now. A couple of thousand records cap had now increased to lakhs[1 lakh = 100000] of records.

PANIC has entered the chat room…

OUR VAIN EFFORTS

Developers who have worked with spreadsheets know that Apache POI supports the following : HSSF, XSSF and SXSSF. We were previously using XSSF workbook which was giving us memory issues when generating excel files with lakhs of records. The application would run out of memory and crash.

We switched to the SXSSF workbooks for a while but couldn’t get the auto column width adjustment to work. Although SXSSF did solve majority of problems, there were still some that needed to be addressed.

A DISCOVERY IS MADE…

My search for the perfect solution to my woes landed me at this page : https://github.com/dhatim/fastexcel.

Reading the following excerpt from the page had my mouth salivating already at the possibilities that could be :

There are not many alternatives when you have to generate xlsx Excel workbooks in Java. The most popular one (Apache POI) includes many features, but when it comes down to huge worksheets it quickly becomes a memory hog.

Its streaming API may mitigate this problem but it introduces several limitations:

  • Its sliding window mechanism prevents you from accessing cells above the current writing position.
  • It writes stuff to a temporary file.
  • It comes with an overhead on the file size because shared strings are disabled by default. Enabling shared strings is likely to consume much more heap if you deal with string values.

So, fastexcel has been created to offer an alternative with the following key points:

  • Limited set of features (basic style support, no graph support yet) and very simple API.
  • Reduced memory footprint and high performance by accumulating only necessary elements. XML stuff is piped to the output stream at the end.
  • Multithreading support: each worksheet in the workbook can be generated by a different thread, while fully supporting shared strings and styles.

TALK IS CHEAP, SHOW ME THE STATS

Below is the scenario explained for the comparison.

Report Generated : An excel report with 22 columns.

Server location : Local

Client location : Local

DB location : Singapore

Response time comparison when generating 196 records
Response time comparison when generating 95136 records
Response time comparison when generating 224376 records

NOTE : All timings are average of 5 attempts and network lag has not been taken into account.

Here are some charts from the github page showing the memory footprint and other comparisons :

Generation time comparison with POI
Memory footprint comparison with POI

NOTE : Heap memory usage is measured just before flushing the workbook to the output stream.

CONCLUSION

Fast-excel was like a ray of sunshine after days of a thunderous storm between users, managers and code that crashes when put to test with huge loads.

If you liked this article please send me claps. It’ll let me know you’d like to read more articles like this, and it’ll help other people discover the article as well. Comments and criticism are welcome too.

--

--

Deepak Choudhary

Technology evangelist engineering solutions on weekdays and exploring life on the weekends. The joy of life lies in the gray zone.