Any type of tool, whether it is written in Perl, Java, or something else entirely, can be added to FITS. Some tools can extract technical metadata from files (e.g. Jhove, Exiftool, NLNZ ME), while others can only identify file formats (Droid, FFident, File Utility). Tools also differ in the formats they support: Jhove and NLNZ ME cover a small set of popular preservation formats, while Exiftool and File Utility cover a wide range.
Each tool needs a tool wrapper that encapsulates the complexities of invoking the tool, capturing its output, and converting that output to FITS XML. A tool wrapper must implement the Tool.java interface and extend the ToolBase.java base class. The implementing class has two options for its constructor. The first is a simple no-argument constructor. Alternatively, the class can declare a constructor that takes Fits.java as its sole argument, should the tool need access to data from within the Fits instance. In either case the constructor should call the ToolBase constructor via super(). Both alternatives are supported via Java Reflection in ToolBelt.createToolClassInstance(...). See the existing tools in the codebase for examples.
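The two constructor options can be illustrated with a small reflection sketch. It does not reproduce ToolBelt.createToolClassInstance(...); the classes FitsContext, SketchToolBase, SimpleTool, ContextAwareTool and ToolFactorySketch are hypothetical stand-ins for the FITS types, and the lookup order (single-argument constructor first, then no-argument) is an assumption.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for the Fits instance a wrapper may need.
class FitsContext {
    final String name;
    FitsContext(String name) { this.name = name; }
}

// Hypothetical stand-in for the ToolBase base class.
abstract class SketchToolBase {
    protected SketchToolBase() { }
}

// Option 1: a wrapper with a simple no-argument constructor.
class SimpleTool extends SketchToolBase {
    public SimpleTool() { super(); }
}

// Option 2: a wrapper whose sole constructor argument is the context.
class ContextAwareTool extends SketchToolBase {
    final FitsContext context;
    public ContextAwareTool(FitsContext context) {
        super();
        this.context = context;
    }
}

class ToolFactorySketch {
    // Assumed lookup order: prefer the single-argument constructor,
    // fall back to the no-argument one when it is not declared.
    static SketchToolBase create(Class<? extends SketchToolBase> type, FitsContext context)
            throws ReflectiveOperationException {
        try {
            Constructor<? extends SketchToolBase> ctor =
                    type.getDeclaredConstructor(FitsContext.class);
            return ctor.newInstance(context);
        } catch (NoSuchMethodException e) {
            return type.getDeclaredConstructor().newInstance();
        }
    }
}
```

With this arrangement a wrapper author only has to pick one of the two constructor shapes; the factory needs no per-tool code.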
It is the responsibility of the tool wrapper to convert the tool output into FITS XML and return a valid ToolOutput object. ToolOutput must contain a valid FITS XML JDOM object.
If the tool depends on a specific operating system, the tool wrapper should perform the necessary checks to prevent execution on incompatible systems. For example, Exiftool is written in Perl. The Exiftool tool wrapper checks the operating system type and whether Perl is installed, and then decides whether to use the standard Perl version of Exiftool or the Windows executable.
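A platform check of this kind might look like the sketch below. It is not the actual Exiftool wrapper: the class name ExiftoolCommandSketch, the file paths, and the rule "Windows uses the executable, everything else needs Perl" are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.List;
import java.util.Locale;

class ExiftoolCommandSketch {
    // Returns true when the given os.name value denotes Windows.
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Chooses a command line for the platform.
    // The paths below are hypothetical placeholders, not FITS paths.
    static List<String> chooseCommand(String osName, boolean perlAvailable) {
        if (isWindows(osName)) {
            return List.of("tools/exiftool/windows/exiftool.exe");
        }
        if (perlAvailable) {
            return List.of("perl", "tools/exiftool/perl/exiftool");
        }
        throw new IllegalStateException(
                "Perl is required to run Exiftool on " + osName);
    }

    // Probes for Perl by running "perl -v" and checking the exit code.
    static boolean perlAvailable() {
        try {
            Process p = new ProcessBuilder("perl", "-v")
                    .redirectErrorStream(true)
                    .start();
            p.getInputStream().readAllBytes(); // drain output so the process can exit
            return p.waitFor() == 0;
        } catch (IOException e) {
            return false; // perl is not on the PATH
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A wrapper could call `chooseCommand(System.getProperty("os.name"), perlAvailable())` once in its constructor and refuse to enable the tool if the call throws.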
For tools that natively return XML, XSLT can be used to convert the output to FITS XML. For tools that do not return XML, the output can either be a) directly converted to FITS XML, or b) converted to a basic intermediate XML format and then converted using XSLT.
Tools can output values that look different but mean the same thing. For example, one tool could report the format of a PNG image as “Portable Network Graphics”, while another may report “PNG”. A tool could report a sampling frequency unit of “2”, while another may report the text string “inches”. If left alone, these would cause false positive conflicts to appear in the FITS consolidated output. Such differences are normalized in the XSLT that converts the native tool output into FITS XML. In general, FITS prefers text strings to numeric values (“inches” instead of “2”), and complete format names to abbreviations (“Portable Network Graphics” instead of “PNG”).
If new tools or formats are being added to FITS, thorough testing should be done to ensure that any false positive conflicts are resolved. If a tool outputs numeric values, these can be converted either by using the FITS mapping file or during the conversion from the native tool output to FITS XML.
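For wrappers that build FITS XML directly rather than through XSLT, the same normalization can be done in Java before the values are written out. The following is a minimal sketch under that assumption; the class ValueNormalizerSketch and its lookup tables are hypothetical and contain only the two examples mentioned above.

```java
import java.util.Map;

class ValueNormalizerSketch {
    // Illustrative mappings only; real mappings depend on each tool's output.
    private static final Map<String, String> FORMAT_NAMES = Map.of(
            "PNG", "Portable Network Graphics");

    private static final Map<String, String> SAMPLING_FREQUENCY_UNITS = Map.of(
            "2", "inches");

    // Expands a known abbreviation to the complete format name;
    // unknown values pass through unchanged (trimmed).
    static String normalizeFormat(String raw) {
        String key = raw.trim();
        return FORMAT_NAMES.getOrDefault(key, key);
    }

    // Replaces a known numeric unit code with its text string;
    // unknown values pass through unchanged (trimmed).
    static String normalizeSamplingFrequencyUnit(String raw) {
        String key = raw.trim();
        return SAMPLING_FREQUENCY_UNITS.getOrDefault(key, key);
    }
}
```

Passing unknown values through unchanged keeps the wrapper from hiding data it does not recognize; any remaining disagreement then shows up as a conflict during testing.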
The ToolBase abstract class implements the Tool interface and provides methods for applying XSLT transforms and for checking for unknown identities and excluded extensions. The current tool wrappers all extend ToolBase.
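To give a feel for what applying an XSLT transform involves, here is a sketch that uses only the JDK's javax.xml.transform API. It does not use the ToolBase helper methods, and the class XsltSketch, the input element names (toolOutput, type) and the output element (identity) are invented for illustration; they do not follow the FITS schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Toy stylesheet: copies the reported type into a format attribute,
    // expanding the abbreviation "PNG" to its complete name on the way.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/toolOutput\">"
        + "<identity>"
        + "<xsl:attribute name=\"format\">"
        + "<xsl:choose>"
        + "<xsl:when test=\"type='PNG'\">Portable Network Graphics</xsl:when>"
        + "<xsl:otherwise><xsl:value-of select=\"type\"/></xsl:otherwise>"
        + "</xsl:choose>"
        + "</xsl:attribute>"
        + "</identity>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the tool's native XML and returns the result as text.
    static String transform(String nativeXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(
                new StreamSource(new StringReader(nativeXml)),
                new StreamResult(out));
        return out.toString();
    }
}
```

Calling `XsltSketch.transform("<toolOutput><type>PNG</type></toolOutput>")` yields an identity element whose format attribute holds the complete format name, which is the kind of normalization a real tool stylesheet would perform.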
Each tool's output is validated against the local FITS XML schema (xml/fits_output.xsd) when the ToolOutput object is created.
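When developing a wrapper it can be useful to run a similar check by hand before handing the XML to FITS. The sketch below uses the JDK's javax.xml.validation API with schema and document passed as strings; it is not the validation code FITS runs when it creates a ToolOutput, and the class SchemaCheckSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCheckSketch {
    // Returns true when the XML document validates against the XSD.
    // A schema that fails to compile also yields false in this sketch.
    static boolean isValid(String xsd, String xml) throws IOException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation (or schema compilation) failed
        }
    }
}
```

In a test, the schema text would be read from xml/fits_output.xsd and the document would be the wrapper's output serialized to a string.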
Any new tool wrapper must be added to the xml/fits.xml configuration file. FITS handles initializing the wrapper and sending the input file to it. See here for how to configure a <tool> element within the fits.xml file.
Note: Any Java-based tool should have its JAR files placed within a sub-directory of the ‘lib’ directory. This needs to be configured in fits.xml in the <tool> element. See here for how to add a tool to FITS.
If the tool can be used for technical metadata extraction in addition to format identification, the following additional steps are needed:
- Decide if the new format fits into one of the already supported format genres: image, text, audio, document
- If it does not, request that the new metadata genre be added to the FITS schemas.
- Decide if XSLT will be used to transform the tool output into FITS XML
- If so, create an XSLT file to transform the technical metadata output into FITS XML.
- If an XSLT-to-format mapping file exists, add the new format to it. For example: xml/exiftool/exiftool_xslt_map.xml