PDFsam: How to upgrade a Maven application for Debian

pdfsam

In the coming weeks and months I intend to write a mini series about packaging Java software for Debian. The following article basically starts in the middle of this journey because the PDFsam upgrade is still fresh in my mind. It requires some preexisting knowledge about build tools like Maven and some Java terminology. But do not fear. Hopefully it will make sense in the end when all pieces fall into place.
A month ago I decided to upgrade PDFsam, a Java application to split, merge, extract, mix and rotate PDF documents. The current version 1.1.4 is already seven years old and uses Ant as its build system. Unfortunately up to now nobody was interested enough to invest the time to upgrade it to the latest version. A quick internet search unveils that the current sources can be found on github.com. Another brief look reveals we are dealing with a Maven project here because we can find a pom.xml file in the root directory and there is no sign of Ant's typical build.xml file anymore. Here are some general tips how to proceed from this point by using the PDFsam upgrade as an example.

Find out how many new dependencies you really need

The pom.xml file declares its dependencies in the <dependencies> section. It is good practice to inspect the pom.xml file and determine how much work will be required to upgrade the package. A seasoned Java packager will quickly find common dependencies like Hibernate or the Apache Commons libraries. Fortunately for you they are already packaged in Debian because a lot of projects depend on them. If you are unsure what is and what is not packaged for Debian, tracker.debian.org and codesearch.debian.net are useful tools to search for those packages. If in doubt just ask on debian-java@lists.debian.org. There is no automagical tool (yet) to find out what dependencies are really new (we talk about mh_make soon) but if you use the aforementioned tools and websites you will notice that in June 2017 one could not find the following artifacts: fontawesomefx, eventstudio, sejda-* and jackson-jr-objects. There are also jdepend and testFx but notice they are marked as <scope>test</scope> meaning they are only required if you would like to run upstream's test suite as well. For the sake of simplicity, it is best to ignore them for now and to focus on packaging only dependencies which are really needed to compile the application. Test dependencies can always be added later.

This pom.xml investigation leads us to the following conclusion: PDFsam depends on Sejda, a PDF library. Basically Sejda is the product of a major refactoring that happened years ago and allows upstream to develop PDFsam faster and in multiple directions. For Debian packagers it is quite clear now that the "upgrade" of PDFsam is in reality more like packaging a completely new application. The inspection of Sejda's pom.xml file (another Maven project) reveals we also have to package imgscalr, Twelvemonkeys and SAMBox. We continue with these pom.xml analyses and end up with these new source packages: jackson-jr, libimgscalr-java, libsambox-java, libsejda-java, libsejda-injector-java, libsejda-io-java, libsejda-eventstudio-java, libtwelvemonkeys-java, fontawesomefx and libpdfbox2-java. Later I discovered that gettext-maven-plugin was also required.

This was not obvious at first glance if you only check the pom.xml in the root directory but PDFsam and Sejda are multi-module projects! In this case every subdirectory (module) contains another pom.xml with additional information, so ideally you should check those too before you decide to start with your packaging. But don't worry it is often possible to ignore modules with a simple --ignore  rule inside your debian/*.poms file. The package will have less functionality but it can be still useful if you only need a subset of the modules. Of course in this case ignoring the gettext-maven-plugin artifact would result in a runtime error. C'est la vie.

A brief remark about Java package names: Java library packages must be named like libXXX-java. This is important for binary packages to avoid naming collisions. We are more tolerant when it comes to source package names but in general we recommend to use the exact same name as for the binary package. There are exceptions like prefixing source packages with their well known project name like jackson-XXX or jboss-XXX but this should only be used when there are already existing packages that use such a naming scheme. If in doubt, talk to us.

mh_make or how to quickly generate an initial debian directory

Packaging a Maven library is usually not very difficult even if it consists of multiple modules. The tricky part is to get the maven.rules, maven.IgnoreRules and your *.poms file right but debian/rules often only consists of a single dh line and the rest is finding the build-dependencies and adding them to debian/control.
A small tool called mh_make, which is included in maven-debian-helper, can lend you a helping hand. The tool is not perfect yet. It requires that most build-dependencies are already installed on your local system, otherwise it won't create the initial debian directory and will only produce some unfinished (but in some cases still useful) files.
A rule of thumb is to start with a package that does not depend on any other new dependency and requires the fewest build-dependencies.  I have chosen libtwelvemonkeys-java because it was the simplest package and met the aforementioned criteria.

Here is how mh_make looks like in action. (The animated GIF was created with Byzanz) First of all download the release tarball, unpack it and run mh_make inside the root directory.

Ok, what is happening here? First you can choose a source and binary package name. Then disable the tests and don't run javadoc to create the documentation. This will simplify things a little.  Tests and javadoc settings can be added later. Choose the version you want to package and then you can basically follow the default recommendations and confirm them by hitting the Enter key. Throughout the project we choose to transform the upstream version with the symbolic "debian" version. Remember that Java/Maven is version-centric. This will ensure that our Maven dependencies are always satisfied later and we can simply upgrade our Maven libraries and don't have to change the versions by hand in various pom.xml files; maven-debian-helper will automatically transform them for us to "debian". Enable all modules. If you choose not to, you can select each module individually. Note that later on some of the required build-dependencies cannot be found because they are either not installed (libjmagick6-java) or they cannot be found in Debian's Maven repository under /usr/share/maven-repo.  You can fix this by entering a substitution rule or, as I did in this case, you can just ignore these artifacts for now. They will be added to maven.IgnoreRules. In order to successfully compile your program you have to remove them from this file later again, create the correct substitution rule in maven.rules and add the missing build-dependencies to debian/control. For now we just want to quickly create our initial debian directory.

If everything went as planned a complete debian directory should be visible in your root directory. The only thing left is to fix the substitution rule for the Servlet API 3.1. Add libservlet3.1-java to Build-Depends and the following rule to maven.rules:
javax.servlet s/servlet-api/javax.servlet-api/ * s/.*/3.1/ * *
s/javax.servlet/javax.servlet.jsp/ s/jsp-api/javax.servlet.jsp-api/ * s/.*/2.3/ * *

The maven.rules file consists of multiple rows separated by six columns. The values represent groupId, artifactId, type, version number and two fields which I never use. 🙂 You can just use an asterisk to match any value. Every value can be substituted. This is necessary when the value of upstream's pom.xml file differs from Debian's system packages. This happens frequently for API packages which are uploaded to Maven Central multiple times under a different groupId/artifactId but provide the same features. In this case the Twelvemonkeys' pom requires an older API version but Debian is already at version 3.1. Note that we require a strict version number in this case because libservlet3.1-java does not use a symbolic debian version since we provide more than one Servlet API in the archive and this measure prevents conflicts.
Thanks for reading this far. More articles about Java packaging will follow in the near future and hopefully they will clarify some terms and topics which could only be briefly mentioned in this post.

before

and after