FITS XML

FITS converts the native output of each wrapped tool into a common format called FITS XML, which is described here. The XML schema for FITS XML is maintained by Harvard Library and is located at http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd

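The schema is divided into sections:

  • Identification section
  • Fileinfo section
  • Filestatus section
  • Metadata section
  • ToolOutput section
  • Statistics section

In addition, there are some key things to understand about the FITS XML output:

  • Status attribute
  • Tool ordering preference
  • Relationship between format identification and technical metadata
  • Tool output normalization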

Identification section

This section reports the file format in one or more identity blocks. If every tool that processed the file and could identify it reported the same format, there is only one identity block. If some tools reported an alternative format, there are multiple identity blocks. The tools that identified each format are nested within the corresponding identity element. Some examples follow.

Ex. 1: Successful format identification

In this example, two tools (Jhove 1.5 and file utility 5.04) identified the format as Plain text with a MIME media type of text/plain.

<identification>
     <identity format="Plain text" mimetype="text/plain" toolname="FITS" toolversion="0.8.x">
          <tool toolname="Jhove" toolversion="1.5" />
          <tool toolname="file utility" toolversion="5.04" />
     </identity>
</identification>
 

 

Ex. 2: Format conflict

In this example, there is a "format conflict". The tool Exiftool 9.13 identified the format as PCD with MIME media type image/x-photo-cd, but the tool Tika 1.3 identified the format as MPEG-1 Audio Layer 3 with MIME media type audio/mpeg. Notice that in this case the identification element carries a status attribute with the value CONFLICT.
 
<identification status="CONFLICT">
     <identity format="PCD" mimetype="image/x-photo-cd" toolname="FITS" toolversion="0.8.x">
          <tool toolname="Exiftool" toolversion="9.13" />
     </identity>
     <identity format="MPEG-1 Audio Layer 3" mimetype="audio/mpeg" toolname="FITS" toolversion="0.8.x">
          <tool toolname="Tika" toolversion="1.3" />
     </identity>
</identification>
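
The two examples above can be read programmatically. The following is a minimal Python sketch, not part of FITS itself, that parses the Ex. 2 fragment with the standard library. The function name summarize_identification is hypothetical, and the fragment is parsed exactly as shown; a complete FITS output file may declare an XML namespace, in which case the tag names would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Fragment copied from Ex. 2 above (shown without any XML namespace).
SAMPLE = """
<identification status="CONFLICT">
  <identity format="PCD" mimetype="image/x-photo-cd" toolname="FITS" toolversion="0.8.x">
    <tool toolname="Exiftool" toolversion="9.13" />
  </identity>
  <identity format="MPEG-1 Audio Layer 3" mimetype="audio/mpeg" toolname="FITS" toolversion="0.8.x">
    <tool toolname="Tika" toolversion="1.3" />
  </identity>
</identification>
"""


def summarize_identification(xml_text):
    """Return (has_conflict, identities) for an <identification> fragment."""
    root = ET.fromstring(xml_text)
    has_conflict = root.get("status") == "CONFLICT"
    identities = []
    for identity in root.findall("identity"):
        # Each nested <tool> names a tool that agreed on this identity.
        tools = [
            f'{tool.get("toolname")} {tool.get("toolversion")}'
            for tool in identity.findall("tool")
        ]
        identities.append(
            {
                "format": identity.get("format"),
                "mimetype": identity.get("mimetype"),
                "tools": tools,
            }
        )
    return has_conflict, identities


has_conflict, identities = summarize_identification(SAMPLE)
for entry in identities:
    print(entry["format"], entry["mimetype"], entry["tools"])
```

One identity in the result means the tools agreed; more than one corresponds to the CONFLICT case.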
 

 

Fileinfo section

This section contains basic technical metadata that isn't specific to any format:

  • copyrightBasis element 
  • copyrightNote element
  • created element (file creation date)
  • creatingApplicationName element (name of the software used to create the file)
  • creatingApplicationVersion element (version of the software used to create the file)
  • creatingos element (Operating system used to create the file)
  • filepath element (full filepath to the file)
  • filename element (name of the file)
  • fslastmodified element (last modified date based on file system metadata)
  • inhibitorType element (type of file inhibitor)
  • inhibitorTarget element (what is being inhibited)
  • lastmodified element (last modified date based on metadata embedded in the file)
  • md5checksum element (MD5 value for the file)
  • rightsBasis element
  • size element (size of the file in bytes)

Each of the above elements carries toolname and toolversion attributes that record the tool that is the source of the information. In most cases there is also a status attribute with the value SINGLE_RESULT, which means that no tool output conflicting information. In some cases, for example if tools reported different file creation dates, the status value is CONFLICT.
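
As an illustration of how these attributes might be consumed, here is a small Python sketch. The fileinfo fragment in it is hypothetical: the element and attribute names come from the list above, but the values are invented, and it assumes that conflicting values appear as repeated elements. The function name read_fileinfo is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: names follow the list above, values are invented.
SAMPLE = """
<fileinfo>
  <size toolname="Jhove" toolversion="1.5">1024</size>
  <filename toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">notes.txt</filename>
  <created toolname="Exiftool" toolversion="9.13" status="CONFLICT">2013-01-01</created>
  <created toolname="Tika" toolversion="1.3" status="CONFLICT">2013-01-02</created>
</fileinfo>
"""


def read_fileinfo(xml_text):
    """Group fileinfo values by element name, keeping the source tool and status."""
    root = ET.fromstring(xml_text)
    values = {}
    for element in root:
        values.setdefault(element.tag, []).append(
            {
                "value": (element.text or "").strip(),
                "tool": element.get("toolname"),
                "version": element.get("toolversion"),
                "status": element.get("status"),  # None when the attribute is absent
            }
        )
    return values


fileinfo = read_fileinfo(SAMPLE)
conflicts = sorted(
    name
    for name, entries in fileinfo.items()
    if any(entry["status"] == "CONFLICT" for entry in entries)
)
print(conflicts)
```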

 

Filestatus section

If any of the tools are able to validate files in this format, this section will contain validity information:

  • message element (more information from tools about what was found)
  • valid element (whether or not the file was found to be valid)
  • well-formed element (whether or not the file was found to be well-formed)
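
The following Python sketch shows one way to read these three elements. The fragment is hypothetical, and the sketch assumes that well-formed and valid carry the text values "true" or "false"; check real FITS output before relying on that. The function name read_filestatus is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the "true"/"false" text values are an assumption.
SAMPLE = """
<filestatus>
  <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
  <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">false</valid>
  <message toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">example message</message>
</filestatus>
"""


def read_filestatus(xml_text):
    """Return well-formedness, validity, and any messages from a <filestatus> fragment."""
    root = ET.fromstring(xml_text)

    def flag(tag):
        node = root.find(tag)
        if node is None or node.text is None:
            return None  # no tool reported this element
        return node.text.strip().lower() == "true"

    return {
        "well_formed": flag("well-formed"),
        "valid": flag("valid"),
        "messages": [(m.text or "").strip() for m in root.findall("message")],
    }


status = read_filestatus(SAMPLE)
print(status)
```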

 

Metadata section

 

This section contains the format-specific technical metadata after each tool's native output has been normalized and consolidated by FITS. The elements in this section differ depending on the genre of the file format (audio, document, image, text, video). Each genre-specific section below lists the potential elements that can appear; the actual elements depend on what the tools are able to determine for the file. 

Audio elements

  • audioDataEncoding
  • avgBitRate
  • avgPacketSize
  • bitDepth
  • bitRate
  • blockAlign
  • blockSizeMax
  • blockSizeMin
  • byteOrder
  • channels
  • duration
  • maxBitRate
  • maxPacketSize
  • numPackets
  • numSamples
  • offset
  • sampleRate
  • software
  • soundField
  • time
  • wordSize

Document elements

  • author
  • hasAnnotations
  • hasOutline
  • isProtected
  • isRightsManaged
  • isTagged
  • language
  • pageCount
  • title

Image elements

  • apertureValue
  • bitsPerSample
  • brightnessValue
  • byteOrder
  • captureDevice
  • cfaPattern
  • cfaPattern2
  • colorMap
  • colorSpace
  • compressionScheme
  • digitalCameraManufacturer
  • digitalCameraModelName
  • digitalCameraSerialNo
  • exifVersion
  • exposureBiasValue
  • exposureIndex
  • exposureProgram
  • exposureTime
  • extraSamples
  • flash
  • flashEnergy
  • fNumber
  • focalLength
  • gpsAltitudeRef
  • gpsAltitude
  • gpsAreaInformation
  • gpsDateStamp
  • gpsDestBearing
  • gpsDestBearingRef
  • gpsDestDistance
  • gpsDestDistanceRef
  • gpsDestLatitude
  • gpsDestLatitudeRef
  • gpsDestLongitude
  • gpsDestLongitudeRef
  • gpsDifferential
  • gpsDOP
  • gpsImgDirection
  • gpsImgDirectionRef
  • gpsLatitude
  • gpsLatitudeRef
  • gpsLongitude
  • gpsLongitudeRef
  • gpsMapDatum
  • gpsMeasureMode
  • gpsProcessingMethod
  • gpsSatellites
  • gpsSpeed
  • gpsSpeedRef
  • gpsStatus
  • gpsTimeStamp
  • gpsTrack
  • gpsTrackRef
  • gpsVersionID
  • grayResponseUnit
  • iccProfileName
  • iccProfileVersion
  • imageHeight
  • imageProducer
  • imageWidth
  • isoSpeedRating
  • lightSource
  • maxApertureValue
  • meteringMode
  • oECF
  • orientation
  • primaryChromaticitiesBlueX
  • primaryChromaticitiesBlueY
  • primaryChromaticitiesGreenX
  • primaryChromaticitiesGreenY
  • primaryChromaticitiesRedX
  • primaryChromaticitiesRedY
  • qualityLayers
  • referenceBlackWhite
  • resolutionLevels
  • samplesPerPixel
  • samplingFrequencyUnit
  • scannerManufacturer
  • scannerModelName
  • scannerModelNumber
  • scannerModelSerialNo
  • scanningSoftwareName
  • scanningSoftwareVersionNo
  • sensingMethod
  • shutterSpeedValue
  • spectralSensitivity
  • subjectDistance
  • tileHeight
  • tileWidth
  • whitePointXValue
  • whitePointYValue
  • xSamplingFrequency
  • ySamplingFrequency
  • YCbCrCoefficients
  • YCbCrPositioning
  • YCbCrSubSampling

Text elements

  • charset
  • linebreak
  • markupBasis
  • markupBasisVersion
  • markupLanguage

Video elements

  • apertureSetting
  • bitDepth
  • bitRate
  • blockSizeMax
  • blockSizeMin
  • channels
  • creatingApplicationName
  • dataFormatType
  • digitalCameraManufacturer
  • digitalCameraModelName
  • duration
  • exposureTime
  • exposureProgram
  • fNumber
  • focus
  • frameRate
  • gain
  • gpsAltitude
  • gpsAltitudeRef
  • gpsAreaInformation
  • gpsDateStamp
  • gpsDestBearing
  • gpsDestBearingRef
  • gpsDestDistance
  • gpsDestDistanceRef
  • gpsDestLatitude
  • gpsDestLatitudeRef
  • gpsDestLongitude
  • gpsDestLongitudeRef
  • gpsDifferential
  • gpsDOP
  • gpsImgDirection
  • gpsImgDirectionRef
  • gpsLatitude
  • gpsLatitudeRef
  • gpsLongitude
  • gpsLongitudeRef
  • gpsMapDatum
  • gpsMeasureMode
  • gpsProcessingMethod
  • gpsSatellites
  • gpsSpeed
  • gpsSpeedRef
  • gpsStatus
  • gpsTimeStamp
  • gpsTrack
  • gpsTrackRef
  • gpsVersionID
  • imageHeight
  • imageStabilization
  • imageWidth
  • sampleRate
  • shutterSpeedValue
  • videoStreamType
  • whiteBalance
  • xSamplingFrequency
  • ySamplingFrequency

 

ToolOutput section

 

When the fits.xml file is configured to also include the native tool output, this section contains the output from each tool that ran against the file, each wrapped in a tool element, as in this example:

<toolOutput>
     <tool name="Jhove" version="1.5">
          [Jhove's native output]
     </tool>
     <tool name="file utility" version="5.04">
          [file utility's native output]
     </tool>
     <tool name="Exiftool" version="9.13">
          [ExifTool's native output]
     </tool>
     <tool name="Droid" version="6.1.3">
          [Droid's native output]
     </tool>
     <tool name="NLNZ Metadata Extractor" version="3.4GA">
          [NLNZ Metadata Extractor's native output]
     </tool>
     <tool name="OIS File Information" version="0.1">
          [OIS File Information's native output]
     </tool>
     <tool name="ffident" version="0.2">
          [ffident's native output]
     </tool>
     <tool name="Tika" version="1.3">
          [Tika's native output]
     </tool>
</toolOutput>
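
Note that the tool elements in this section use name and version attributes rather than toolname and toolversion. A minimal Python sketch for collecting the native output per tool could look like this. The nativeOutput child element is a placeholder invented for the sketch, since each tool's real output has its own structure, and native_outputs is a hypothetical function name.

```python
import xml.etree.ElementTree as ET

# Shortened fragment; <nativeOutput> stands in for each tool's real output.
SAMPLE = """
<toolOutput>
  <tool name="Jhove" version="1.5"><nativeOutput>placeholder</nativeOutput></tool>
  <tool name="Tika" version="1.3"><nativeOutput>placeholder</nativeOutput></tool>
</toolOutput>
"""


def native_outputs(xml_text):
    """Map (tool name, tool version) to the serialized child elements of each <tool>."""
    root = ET.fromstring(xml_text)
    outputs = {}
    for tool in root.findall("tool"):
        key = (tool.get("name"), tool.get("version"))
        outputs[key] = [
            ET.tostring(child, encoding="unicode").strip() for child in tool
        ]
    return outputs


outputs = native_outputs(SAMPLE)
for (name, version), chunks in outputs.items():
    print(name, version, len(chunks))
```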
 

 

Statistics section

This section was added in later versions of FITS to record how much time each wrapped tool spent processing the file. As shown in this example, a tool that was not run against the file is reported with a status attribute value of "did not run":

<statistics fitsExecutionTime="3705">
     <tool toolname="OIS Audio Information" toolversion="0.1" status="did not run" />
     <tool toolname="ADL Tool" toolversion="0.1" status="did not run" />
     <tool toolname="Jhove" toolversion="1.5" executionTime="3703" />
     <tool toolname="file utility" toolversion="5.04" executionTime="95" />
     <tool toolname="Exiftool" toolversion="9.13" executionTime="167" />
     <tool toolname="Droid" toolversion="6.1.3" executionTime="18" />
     <tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA" executionTime="11" />
     <tool toolname="OIS File Information" toolversion="0.1" executionTime="43" />
     <tool toolname="OIS XML Metadata" toolversion="0.2" status="did not run" />
     <tool toolname="ffident" toolversion="0.2" executionTime="6" />
     <tool toolname="Tika" toolversion="1.3" executionTime="20" />
</statistics>
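
A short Python sketch, using a subset of the example above, shows how the timings might be extracted. The function name read_statistics is hypothetical, and the sketch makes no assumption about the time unit.

```python
import xml.etree.ElementTree as ET

# Subset of the statistics example above.
SAMPLE = """
<statistics fitsExecutionTime="3705">
  <tool toolname="OIS Audio Information" toolversion="0.1" status="did not run" />
  <tool toolname="Jhove" toolversion="1.5" executionTime="3703" />
  <tool toolname="file utility" toolversion="5.04" executionTime="95" />
  <tool toolname="Tika" toolversion="1.3" executionTime="20" />
</statistics>
"""


def read_statistics(xml_text):
    """Return (total time, per-tool timings, tools that did not run)."""
    root = ET.fromstring(xml_text)
    timings = {}
    not_run = []
    for tool in root.findall("tool"):
        name = tool.get("toolname")
        if tool.get("status") == "did not run":
            not_run.append(name)
        elif tool.get("executionTime") is not None:
            timings[name] = int(tool.get("executionTime"))
    return int(root.get("fitsExecutionTime")), timings, not_run


total, timings, not_run = read_statistics(SAMPLE)
slowest = max(timings, key=timings.get)
print(total, slowest, not_run)
```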
 

 

Status attribute

If multiple tools disagree on a format identity or other metadata value, a status attribute is added to the element with a value of "CONFLICT". If only a single tool reports a format identity or other metadata value, a status attribute is added to the element with a value of "SINGLE_RESULT". If multiple tools agree on an identity or value, and none disagree, the status attribute is omitted. A "PARTIAL" value is written when the format can only be partially identified, for example when a format name is identified but not a MIME media type.
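
The first three rules can be restated as a small Python function. This is only a sketch of the rule as described here, not the FITS implementation: status_for is a hypothetical name, values are compared by simple equality, and the PARTIAL case is not modeled.

```python
def status_for(reported_values):
    """Return the status attribute value for one element, or None if it is omitted.

    reported_values holds one value per tool that reported the element.
    """
    if not reported_values:
        raise ValueError("at least one tool must report a value")
    if len(set(reported_values)) > 1:
        return "CONFLICT"  # tools disagree
    if len(reported_values) == 1:
        return "SINGLE_RESULT"  # only one tool reported a value
    return None  # several tools agree, so the attribute is omitted


print(status_for(["text/plain"]))
print(status_for(["text/plain", "text/plain"]))
print(status_for(["image/x-photo-cd", "audio/mpeg"]))
```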

Tool ordering preference

The ordering preference of the tools in xml/fits.xml determines the order in which conflicting values are output. If the report-conflict configuration option is set to false, then only the value from the tool that first reported the element is displayed, and the other conflicting values are discarded.
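
The effect of the tool ordering can be sketched in Python as follows. The tool order list, the function name order_reports, and the report_conflict parameter are stand-ins for the configuration described above, and the sketch assumes that "first reported" means first in the configured ordering.

```python
def order_reports(reports, tool_order, report_conflict=True):
    """Order conflicting (tool, value) pairs by tool preference.

    With report_conflict=False only the preferred value is kept.
    """
    rank = {tool: position for position, tool in enumerate(tool_order)}
    # Tools missing from the preference list sort last.
    ordered = sorted(reports, key=lambda report: rank.get(report[0], len(rank)))
    return ordered if report_conflict else ordered[:1]


TOOL_ORDER = ["Exiftool", "Jhove", "Tika"]  # hypothetical ordering
reports = [("Tika", "MPEG-1 Audio Layer 3"), ("Exiftool", "PCD")]

print(order_reports(reports, TOOL_ORDER))
print(order_reports(reports, TOOL_ORDER, report_conflict=False))
```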

Relationship between format identification and technical metadata

All tools that agree on a format identity are consolidated into a single <identity> section. Technical metadata is only output (and included in the consolidation process) for tools that were able to identify the file and that are listed in the first <identity> section. Output from all other tools is discarded.
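
In other words, the first identity acts as a filter on which tools contribute technical metadata. A minimal Python sketch of that rule, using hypothetical data structures and a hypothetical function name metadata_to_keep, might look like this:

```python
def metadata_to_keep(identities, metadata_by_tool):
    """Keep metadata only from tools listed in the first identity.

    identities is a list of tool-name lists, in output order;
    metadata_by_tool maps a tool name to its technical metadata.
    """
    if not identities:
        return {}
    trusted = set(identities[0])
    return {
        tool: metadata
        for tool, metadata in metadata_by_tool.items()
        if tool in trusted
    }


identities = [["Exiftool"], ["Tika"]]  # as in the format conflict example
metadata_by_tool = {
    "Exiftool": {"imageWidth": "768"},  # invented values
    "Tika": {"duration": "180"},
}
print(metadata_to_keep(identities, metadata_by_tool))
```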

Tool output normalization

It’s possible for tools to output conflicting data when they actually mean the same thing. For example, one tool could report the format of a PNG image as “Portable Network Graphics”, while another may report “PNG”. A tool could report a sampling frequency unit of “2”, while another may report the text string “inches”. If left alone, these would cause false positive conflicts to appear in the FITS consolidated output. These differences are reconciled in the XSLT that converts the native tool output into FITS XML. In general, FITS prefers text strings to numeric values (“inches” instead of “2”), and complete format names to abbreviations (“Portable Network Graphics” instead of “PNG”). If new tools or formats are being added to FITS, thorough testing should be done to ensure that any false positive conflicts are resolved.
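
FITS performs this normalization in XSLT. The Python sketch below only illustrates the idea with lookup tables built from the two examples given above; the table and function names are hypothetical, and the tables are not a complete list of the mappings FITS applies.

```python
# Lookup tables built only from the examples in the text above.
FORMAT_NAMES = {"PNG": "Portable Network Graphics"}
SAMPLING_FREQUENCY_UNITS = {"2": "inches"}


def normalize(value, mapping):
    """Return the preferred form of a value, or the trimmed value if no mapping exists."""
    cleaned = value.strip()
    return mapping.get(cleaned, cleaned)


# Two tools reporting the same fact in different forms no longer conflict.
reported_formats = ["PNG", "Portable Network Graphics"]
normalized_formats = {normalize(name, FORMAT_NAMES) for name in reported_formats}
print(normalized_formats)
print(normalize("2", SAMPLING_FREQUENCY_UNITS))
```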