FITS converts the native output of each wrapped tool to a format called FITS XML, which is described here. The XML schema for FITS XML is maintained by Harvard Library and is located at http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd.
The schema is divided into the following sections:
- identification
- fileinfo
- filestatus
- metadata
- toolOutput
- statistics
In addition, there are some key things to understand about the FITS XML output:
- status attribute
- tool ordering preference
- the relationship between format identification and technical metadata
- tool output normalization
The <identification> section contains the file's format in one or more <identity> blocks. If every tool that processed the file and could identify it reported the same format, there is only one identity block. If any tool reported a different format, there are multiple identity blocks, one per format. The tools that reported each format are nested inside the corresponding <identity> element. Some examples follow.
Ex. 1: Successful format identification
In this example, two tools (Jhove 1.5 and file utility 5.04) identified the format as Plain text with a MIME media type of text/plain.
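The identity blocks can be read with any XML parser. The Python sketch below uses the standard library's xml.etree.ElementTree on a hypothetical document reconstructed from Ex. 1. Real FITS output carries more attributes and sections, and the namespace URI in the sample is an assumption (the code matches elements in any namespace). It lists each identity with the tools that reported it and treats more than one identity block as a format conflict.

```python
import xml.etree.ElementTree as ET

# Hypothetical FITS XML reconstructed from Ex. 1; real output has more content.
# The namespace URI is an assumption; "{*}" below matches any (or no) namespace.
SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="Plain text" mimetype="text/plain">
      <tool toolname="Jhove" toolversion="1.5"/>
      <tool toolname="file utility" toolversion="5.04"/>
    </identity>
  </identification>
</fits>"""


def read_identities(xml_text):
    """Return one (format, mimetype, [tool names]) tuple per <identity> block."""
    root = ET.fromstring(xml_text)
    identities = []
    # The "{*}" namespace wildcard requires Python 3.8 or later.
    for identity in root.findall("./{*}identification/{*}identity"):
        tools = [t.get("toolname") for t in identity.findall("{*}tool")]
        identities.append((identity.get("format"), identity.get("mimetype"), tools))
    return identities


ids = read_identities(SAMPLE)
for fmt, mime, tools in ids:
    print(f"{fmt} ({mime}) identified by: {', '.join(tools)}")
# More than one identity block means the tools disagreed on the format.
print("format conflict" if len(ids) > 1 else "single identity")
```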
Ex. 2: Format conflict
The <fileinfo> section contains basic technical metadata that is not specific to any format:
- copyrightBasis element
- copyrightNote element
- created element (file creation date)
- creatingApplicationName element (name of the software used to create the file)
- creatingApplicationVersion element (version of the software used to create the file)
- creatingos element (operating system used to create the file)
- filepath element (full path to the file)
- filename element (name of the file)
- fslastmodified element (last modified date based on file system metadata)
- inhibitorType element (type of file inhibitor)
- inhibitorTarget element (what is being inhibited)
- lastmodified element (last modified date based on metadata embedded in the file)
- md5checksum element (MD5 value for the file)
- rightsBasis element
- size element (size of the file in bytes)
Each of the above elements carries toolname and toolversion attributes that record the name and version of the tool that supplied the value. In most cases there is also a status attribute with the value SINGLE_RESULT, which means that only one tool reported the value, so no conflicting information was output. In some cases, for example if tools reported different file creation dates, the status value is CONFLICT.
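A minimal Python sketch of reading these elements, assuming a hypothetical <fileinfo> fragment in which the tool names ToolA and ToolB and all values are invented. It groups the reported values by element name and flags the elements whose status is CONFLICT.

```python
import xml.etree.ElementTree as ET

# Hypothetical <fileinfo> fragment; tool names, versions and values are invented.
SAMPLE = """<fits>
  <fileinfo>
    <size toolname="ToolA" toolversion="1.0" status="SINGLE_RESULT">1024</size>
    <created toolname="ToolA" toolversion="1.0" status="CONFLICT">2010-01-01</created>
    <created toolname="ToolB" toolversion="2.0" status="CONFLICT">2010-01-02</created>
  </fileinfo>
</fits>"""


def fileinfo_values(xml_text):
    """Map each <fileinfo> element name to a list of (value, toolname, status)."""
    root = ET.fromstring(xml_text)
    values = {}
    fileinfo = root.find("{*}fileinfo")
    if fileinfo is None:
        return values
    for elem in fileinfo:
        name = elem.tag.split("}")[-1]  # drop any namespace prefix
        values.setdefault(name, []).append(
            (elem.text, elem.get("toolname"), elem.get("status"))
        )
    return values


def conflicting_elements(values):
    """Names of elements for which at least one value is marked CONFLICT."""
    return sorted(
        name
        for name, reports in values.items()
        if any(status == "CONFLICT" for _, _, status in reports)
    )


info = fileinfo_values(SAMPLE)
print(info["size"])
print(conflicting_elements(info))
```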
If any of the tools can validate files in this format, the <filestatus> section contains validity information:
- message element (more information from tools about what was found)
- valid element (whether or not the file was found to be valid)
- well-formed element (whether or not the file was found to be well-formed)
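A sketch of reading the validity results in Python, assuming a hypothetical <filestatus> fragment and assuming that tools write the literal strings true and false. It returns None for any flag that no tool reported, so "not validated" stays distinct from "invalid".

```python
import xml.etree.ElementTree as ET

# Hypothetical <filestatus> fragment; the tool name and message are invented.
SAMPLE = """<fits>
  <filestatus>
    <well-formed toolname="ToolA" toolversion="1.0" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="ToolA" toolversion="1.0" status="SINGLE_RESULT">false</valid>
    <message toolname="ToolA" toolversion="1.0" status="SINGLE_RESULT">Invalid character at offset 10</message>
  </filestatus>
</fits>"""


def file_status(xml_text):
    """Return well-formed/valid as True, False or None (not reported), plus messages."""
    root = ET.fromstring(xml_text)
    status = root.find("{*}filestatus")
    if status is None:
        # No tool could validate files in this format.
        return {"well-formed": None, "valid": None, "messages": []}

    def flag(name):
        # If tools conflict there are several elements; this takes the first.
        elem = status.find("{*}" + name)
        if elem is None:
            return None
        return (elem.text or "").strip().lower() == "true"

    return {
        "well-formed": flag("well-formed"),
        "valid": flag("valid"),
        "messages": [m.text for m in status.findall("{*}message")],
    }


print(file_status(SAMPLE))
```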
The <metadata> section contains the format-specific technical metadata after FITS has normalized and consolidated each tool's native output. The elements in this section differ depending on the genre of the file format (audio, document, image, text, video). Each genre-specific section below lists the elements that can appear; which ones actually appear depends on what the tools are able to determine about the file.
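Because the elements vary by genre, code that consumes this section usually checks which genre element is present before reading fields. A Python sketch, assuming that each genre appears as a child element named after it (for example <text>) and using a hypothetical fragment with invented values:

```python
import xml.etree.ElementTree as ET

# Assumed genre element names, taken from the genres listed above.
GENRES = {"audio", "document", "image", "text", "video"}

# Hypothetical <metadata> fragment for a text file; values are invented.
SAMPLE = """<fits>
  <metadata>
    <text>
      <linebreak toolname="ToolA" toolversion="1.0" status="SINGLE_RESULT">LF</linebreak>
      <charset toolname="ToolA" toolversion="1.0" status="SINGLE_RESULT">US-ASCII</charset>
    </text>
  </metadata>
</fits>"""


def local_name(tag):
    """Strip a "{namespace}" prefix from an ElementTree tag."""
    return tag.split("}")[-1]


def genre_metadata(xml_text):
    """Return (genre, {element name: [values]}), or (None, {}) if no genre section."""
    root = ET.fromstring(xml_text)
    metadata = root.find("{*}metadata")
    if metadata is None:
        return None, {}
    for section in metadata:
        genre = local_name(section.tag)
        if genre in GENRES:
            fields = {}
            for elem in section:
                fields.setdefault(local_name(elem.tag), []).append(elem.text)
            return genre, fields
    return None, {}


print(genre_metadata(SAMPLE))
```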
When fits.xml is configured to include the native tool output, the <toolOutput> section contains the output of each tool that ran against the file, each wrapped in a tool element like this example:
The <statistics> section, added in later versions of FITS, records how much time each wrapped tool spent processing the file. As shown in this example, when a tool is not run against a file, a status attribute value of "did not run" is output:
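A Python sketch that separates the tools that ran from those that did not. The executionTime attribute name and the fragment itself are assumptions made for illustration; tool names and numbers are invented, and times are kept exactly as reported without assuming a unit.

```python
import xml.etree.ElementTree as ET

# Hypothetical <statistics> fragment. The executionTime attribute name is an
# assumption; tool names and numbers are invented.
SAMPLE = """<fits>
  <statistics>
    <tool toolname="ToolA" toolversion="1.0" executionTime="85"/>
    <tool toolname="ToolB" toolversion="2.0" status="did not run"/>
  </statistics>
</fits>"""


def tool_timings(xml_text):
    """Return ({toolname: execution time as reported}, [tools that did not run])."""
    root = ET.fromstring(xml_text)
    timings, skipped = {}, []
    for tool in root.findall("./{*}statistics/{*}tool"):
        name = tool.get("toolname")
        if tool.get("status") == "did not run":
            skipped.append(name)
        elif tool.get("executionTime") is not None:
            timings[name] = int(tool.get("executionTime"))
    return timings, skipped


timings, skipped = tool_timings(SAMPLE)
print("ran:", timings)
print("did not run:", skipped)
```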
If multiple tools disagree on a format identity or other metadata value, a status attribute with the value "CONFLICT" is added to the element. If only a single tool reports a format identity or other metadata value, a status attribute with the value "SINGLE_RESULT" is added. If multiple tools agree on an identity or value and none disagree, the status attribute is omitted. A "PARTIAL" value is written when the format can only be partially identified, for example when a format name is identified but not a MIME media type.
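The rules above amount to a small decision function. This is a simplified Python sketch of the described behavior, not FITS's own implementation (FITS is written in Java); the function names are hypothetical, and the PARTIAL check covers only the format-name-without-MIME-type case mentioned in the text.

```python
from typing import Optional, Sequence


def consolidated_status(values: Sequence[str]) -> Optional[str]:
    """Status for one element, given the value each reporting tool produced."""
    if not values:
        raise ValueError("no tool reported this element")
    if len(values) == 1:
        return "SINGLE_RESULT"
    if len(set(values)) > 1:
        return "CONFLICT"
    return None  # several tools agree and none disagree: attribute omitted


def identity_is_partial(format_name: Optional[str], mimetype: Optional[str]) -> bool:
    """PARTIAL case from the text: a format name was found but no MIME type."""
    return bool(format_name) and not mimetype


print(consolidated_status(["text/plain"]))
print(consolidated_status(["text/plain", "text/plain"]))
print(consolidated_status(["text/plain", "text/html"]))
print(identity_is_partial("Plain text", None))
```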
The order in which tools are listed in xml/fits.xml sets their preference, and conflicting values are output in that order. If the report-conflict configuration option is set to false, only the value from the first tool in that order that reported the element is output, and the other conflicting values are discarded.
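A Python sketch of this ordering rule, assuming conflicting reports arrive as (toolname, value) pairs and the preference is a plain list of tool names in fits.xml order. The parameter report_conflicts stands in for the report-conflict option; all names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Report = Tuple[str, str]  # (toolname, value)


def order_reports(
    reports: Sequence[Report],
    tool_order: Sequence[str],
    report_conflicts: bool = True,
) -> List[Report]:
    """Order reports by tool preference; on conflict, optionally keep only the first."""
    rank = {name: i for i, name in enumerate(tool_order)}
    # sorted() is stable; tools missing from tool_order go last.
    ordered = sorted(reports, key=lambda r: rank.get(r[0], len(tool_order)))
    conflict = len({value for _, value in ordered}) > 1
    if conflict and not report_conflicts:
        return ordered[:1]  # discard the other conflicting values
    return ordered


preference = ["ToolA", "ToolB"]  # hypothetical order, as listed in fits.xml
reports = [("ToolB", "2010-01-02"), ("ToolA", "2010-01-01")]
print(order_reports(reports, preference))
print(order_reports(reports, preference, report_conflicts=False))
```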
All tools that agree on a format identity are consolidated into a single <identity> section. Technical metadata is output (and consolidated) only from tools that were able to identify the file and that are listed in the first <identity> section. All other output is discarded.
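A Python sketch of this consolidation step, under two stated assumptions: each tool's result is a hypothetical dict holding an identity tuple and a flat metadata dict, and the first <identity> is the one containing the most-preferred tool that identified the file. It illustrates the rule only and is not FITS's implementation.

```python
from typing import Dict, List, Tuple

Identity = Tuple[str, str]  # (format name, MIME type)


def consolidate(
    results: Dict[str, dict], tool_order: List[str]
) -> Tuple[List[Tuple[Identity, List[str]]], Dict[str, List[Tuple[str, str]]]]:
    """Group tools by identity; keep metadata only from the first identity's tools.

    results maps toolname -> {"identity": Identity or None, "metadata": {name: value}}.
    """
    groups: Dict[Identity, List[str]] = {}
    for name in tool_order:  # dicts keep insertion order, so preference order wins
        result = results.get(name)
        if result is None or result["identity"] is None:
            continue  # tool did not run or could not identify the file
        groups.setdefault(result["identity"], []).append(name)
    identities = list(groups.items())
    metadata: Dict[str, List[Tuple[str, str]]] = {}
    if identities:
        _, first_tools = identities[0]
        for name in first_tools:
            for key, value in results[name]["metadata"].items():
                metadata.setdefault(key, []).append((name, value))
    # Metadata from tools outside the first identity is never collected.
    return identities, metadata


results = {
    "ToolA": {"identity": ("Plain text", "text/plain"), "metadata": {"charset": "US-ASCII"}},
    "ToolB": {"identity": ("Plain text", "text/plain"), "metadata": {"charset": "US-ASCII"}},
    "ToolC": {"identity": ("HTML", "text/html"), "metadata": {"charset": "UTF-8"}},
}
identities, metadata = consolidate(results, ["ToolA", "ToolB", "ToolC"])
print(identities)
print(metadata)
```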
Tools can output conflicting data when they actually mean the same thing. For example, one tool could report the format of a PNG image as “Portable Network Graphics”, while another reports “PNG”. Similarly, one tool could report a sampling frequency unit of “2”, while another reports the text string “inches”. Left alone, these differences would cause false positive conflicts in the FITS consolidated output, so they are normalized in the XSLT that converts each tool's native output into FITS XML. In general, FITS prefers text strings to numeric values (“inches” instead of “2”) and complete format names to abbreviations (“Portable Network Graphics” instead of “PNG”). When new tools or formats are added to FITS, they should be tested thoroughly to ensure that any false positive conflicts are resolved.
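The sketch below shows why normalization matters by comparing values only after mapping them to a preferred form. It is written in Python purely for illustration (FITS does this in XSLT), its lookup tables contain only the two examples from this paragraph, and the field names are assumptions.

```python
from typing import Dict, Iterable

# Hypothetical lookup tables built only from the examples above; FITS keeps its
# real mappings in XSLT stylesheets, and these field names are assumed.
NORMALIZATION: Dict[str, Dict[str, str]] = {
    "format": {"PNG": "Portable Network Graphics"},
    "samplingFrequencyUnit": {"2": "inches"},
}


def normalize(field: str, value: str) -> str:
    """Map a tool-specific value to the preferred form; unknown values pass through."""
    return NORMALIZATION.get(field, {}).get(value, value)


def is_conflict(field: str, values: Iterable[str]) -> bool:
    """True only if the values still differ after normalization."""
    return len({normalize(field, v) for v in values}) > 1


print(is_conflict("format", ["PNG", "Portable Network Graphics"]))
print(is_conflict("samplingFrequencyUnit", ["2", "inches"]))
print(is_conflict("samplingFrequencyUnit", ["2", "3"]))
```

Without the lookup step, the first two comparisons would be reported as conflicts even though the tools agree.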