If you have experience of business application development, then most likely you will have encountered a requirement for the application to have a flexible reporting mechanism. The company I work for is mainly focused on developing business solutions and reporting is an essential, indeed, must have aspect of all the enterprise systems we develop. To enable flexible reporting in our systems, we have developed our own open-source (distributed under Apache 2.0 license) report generator - YARG (Yet Another Report Generator). Now YARG is a heart of the reporting in the CUBA platform - which itself is the basis of all the systems we develop.
Why is There a Need to Develop a New One
First of all, let me remark that we are not wheel reinventors. We are always looking for solutions to integrate with, as long as those solutions fit us. Unfortunately, in this case we couldn’t find any open source tool that met the following requirements we identified:
- Generate report in template format and/or convert the output to PDF
- Avoid use of external tools for report template creation (Microsoft Office or Libreoffice should be enough)
- Support various formats of templates: DOC, ODT, XLS, DOCX, XLSX, HTML
- Ability to use complex XLS and XLSX templates with charts, formulas and etc.
- Ability to use HTML layout and insert/embed images
- Split out data layer (report structure and data fetching) and presentation layer (report templates)
- Enable various methods of data fetching such as SQL, JPQL or Groovy script
- Ability to be integrated with IoC frameworks (Spring, Guice)
- Functionality to use the tool as a standalone application, to be able to use it bеyond of the Java ecosystem (e.g. to generate reports using PHP)
- Store report structure in a transparent XML format
The closest tool we could find was JasperReports, but there were a few blockers which stopped us from using it:
- The free version could not generate DOC reports (there was a commercial library which provided this functionality)
- XLS reports were very limited, and it was not possible to use charts, formulas and cell formatting
- To create a report, it is necessary to have a certain skillset and knowledge of how to use very specific tools (e.g. iReports)
- There is no clear separation between data and presentation layers
Of course we researched lots of other different tools, but all other libraries we found were focused on some specific format. We wanted to have a panacea for reporting - one tool for all kind of reports.
Taking in consideration all the points and thoughts listed above, we decided to develop another, but bespoke tool for report generation.
What’s Under the Hood
When we started YARG it was not a problem to find a library for XLS integration (POI-HSSF, JXLS, etc.). We decided to go for Apache POI as the most popular and well supported library.
The situation around DOC integration was completely the opposite. Only a very few options were available in the open-source market (POI-HWPF, COM and UNO Runtime). The POI-HWPF library was very limited in a number of ways and we did not consider it as a suitable option. We had to choose between COM and UNO Runtime, which is literally an API for the OpenOffice server side integration.
So, after deep investigation we decided to pick UNO Runtime, mainly because of the positive feedback from people who successfully employed it for systems coded in completely different languages, such as Python, Ruby, C#, etc.
While use of POI-HSSF was quite straightforward (except of charting), we faced a number of challenges integrating UNO Runtime:
- There is no clear API to work with tables
- Each time a report is generated, OpenOffice starts. Initially we used bootstrapconnector to manage OpenOffice processes, but later it became clear that in many cases it doesn’t kill the process after a report is generated. Thus, we had to re-implement the logic of OpenOffice start and shutdown (thanks to jodconverter developers, who pointed many good ideas on this matter)
- Also UNO Runtime (and OpenOffice Server itself) has significant problems with thread-safety and this can lead to the server freezing or terminating itself if an internal error occurs. To overcome this, we had to implement a mechanism to restart reports if the server fails, with an obvious detrimental consequence for performance
Later on, when the DOCX4J library became very mature and popular, we supported XLSX/DOCX. The main advantage of the DOCX4J library is that it provides the necessary low level access to the document structure (basically, you operate with XML). Another benefit of using DOCX4J is that it doesn’t require OpenOffice server integration for DOCX reports generation.
Also, there is an ability to use document with Freemarker markup as a report template. We usually use it to generate very custom reports as HTML and then convert the result to PDF.
Finally, YARG infrastructure is developed in an extendable way, so that experienced users can implement integration with any other template type by themselves.
Hello World Report
Let’s get to know YARG. The main idea of the report generator is to split out the data layer and presentation layer. The Data Layer enables scripting or direct SQL querying to fetch the desired information and the presentation layer represents the markup of the fetched data.
All YARG reports consist of so-called ‘Bands’. A Band is something that links up data and presentation layers together. So, every band knows where to get the data from and where it is placed in the template.
For example, we want to print out all our employees to an Excel spreadsheet. We will need to create the ‘Staff’ band and define an SQL query to fetch the list of the employees:
select name, surname, position from staff
Java Code
ReportBuilder reportBuilder = new ReportBuilder();
ReportTemplateBuilder reportTemplateBuilder = new ReportTemplateBuilder()
.documentPath("/home/haulmont/templates/staff.xls")
.documentName("staff.xls")
.outputType(ReportOutputType.xls)
.readFileFromPath();
reportBuilder.template(reportTemplateBuilder.build());
BandBuilder bandBuilder = new BandBuilder();
ReportBand staff= bandBuilder.name("Staff")
.query("Staff", "select name, surname, position from staff", "sql")
.build();
reportBuilder.band(staff);
Report report = reportBuilder.build();
Reporting reporting = new Reporting();
reporting.setFormatterFactory(new DefaultFormatterFactory());
reporting.setLoaderFactory(
new DefaultLoaderFactory().setSqlDataLoader(new SqlDataLoader(datasource)));
ReportOutputDocument reportOutputDocument = reporting.runReport(
new RunParams(report), new FileOutputStream("/home/haulmont/reports/staff.xls"));
select book.author as "author", book.name as "name", book.price as "price", count(*) as "count"
from book book where book.store_id = ${Shop.id}
group by book.author, book.name, book.price