Class LexiconSpecification

  • All Implemented Interfaces:
    Specifiable, Serializable

    public class LexiconSpecification
    extends Object
    implements Specifiable, Serializable

    Lexicons are defined using a LexiconSpecification in the following format:

     
     lexicon:(mem|disk):name(:(csv|json))*::RESOURCE(;ARG=VALUE)*
     

    The scheme of the specification is "lexicon", and the currently supported protocols are:

    • mem: An in-memory, Trie-based lexicon.
    • disk: A persistent, on-disk lexicon.

    The name of the lexicon is used during annotation to mark the provider. When creating in-memory lexicons, a format (csv or json) can also be specified; json is the default if none is provided. Finally, a number of query parameters (ARG=VALUE) can be given from the following choices:

    • caseSensitive=(true|false): Is the lexicon case-sensitive (true) or case-insensitive (false) (default false).
    • defaultTag=TAG: The default tag value for an entry when one is not defined (default null).
    • language=LANGUAGE: The default language of entries in the lexicon (default Hermes.defaultLanguage()).
    • and the following for CSV lexicons:
      • lemma=INDEX: The index in the csv row containing the lemma (default 0).
      • tag=INDEX: The index in the csv row containing the tag (default 1).
      • probability=INDEX: The index in the csv row containing the probability (default 2).
      • constraint=INDEX: The index in the csv row containing the constraint (default 3).
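
    To make the format above concrete, here is a minimal sketch of how a specification string breaks down into its parts. It is not the Hermes implementation: the class LexiconSpecSketch, the method split, the regular expression, and the example paths are all hypothetical, and the sketch assumes at most one format segment and a resource that contains no semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; LexiconSpecification.parse may behave differently.
public class LexiconSpecSketch {

    // Approximates: lexicon:(mem|disk):name(:(csv|json))*::RESOURCE(;ARG=VALUE)*
    // Assumption: at most one format segment, and RESOURCE contains no ';'.
    private static final Pattern SPEC = Pattern.compile(
        "^lexicon:(mem|disk):([^:;]+)(?::(csv|json))?::([^;]+)((?:;[^;=]+=[^;]*)*)$");

    /** Splits a specification string into protocol, name, format, resource, and parameters. */
    public static Map<String, String> split(String spec) {
        Matcher m = SPEC.matcher(spec);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid lexicon specification: " + spec);
        }
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("protocol", m.group(1));
        parts.put("name", m.group(2));
        // json is the documented default when no format is given
        parts.put("format", m.group(3) == null ? "json" : m.group(3));
        parts.put("resource", m.group(4));
        // Each ";ARG=VALUE" pair becomes one map entry
        for (String pair : m.group(5).split(";")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            parts.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration
        System.out.println(split("lexicon:mem:people:csv::/data/people.csv;caseSensitive=true;tag=2"));
        System.out.println(split("lexicon:disk:places::/data/places.db"));
    }
}
```

    Under these assumptions, the first example yields an in-memory CSV lexicon named "people" that is case-sensitive and reads tags from column 2; the second yields an on-disk lexicon named "places" with the default json format.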

    Author:
    David B. Bracewell
    See Also:
    Serialized Form
    • Constructor Detail

      • LexiconSpecification

        public LexiconSpecification()
    • Method Detail

      • parse

        public static LexiconSpecification parse(@NonNull String specification)
        Parses the given specification string and constructs a LexiconSpecification from it.
        Parameters:
        specification - the specification
        Returns:
        the LexiconSpecification
      • create

        public Lexicon create()
                       throws IOException
        Create the lexicon from this specification
        Returns:
        the lexicon
        Throws:
        IOException - if something goes wrong while constructing the lexicon