Class CoNLLFormat

  • All Implemented Interfaces:
    DocFormat, OneDocPerFileFormat, Serializable

    public class CoNLLFormat
    extends WholeFileTextFormat
    implements OneDocPerFileFormat, Serializable

    Format Name: conll

    The CoNLL format. The following additional parameters are available when reading/writing in CoNLL format:

    • docPerSentence=[true|false]: One document per sentence when true (default: true).
    • fields=<list of fields>: List of strings denoting the field names (default: ["WORD", "POS", "CHUNK"]).
    • fs=<String>: Field separator (default: "\\s+").
    • overrideSentences=[true|false]: Override the CoNLL sentence boundaries with Hermes boundaries when true (default: false).

    Currently, the following field names are supported:

    • INDEX - The index of the word in the sentence.
    • WORD - The word.
    • LEMMA - The lemmatized form of the word.
    • UPOS - The universal part-of-speech tag of the word.
    • POS - The part-of-speech tag of the word.
    • CHUNK - IOB annotated Phrase Chunks.
    • ENTITY - IOB annotated Named Entities.
    • HEAD - The index of this word’s syntactic head in the sentence.
    • DEP_REL - The dependency relation of this word to its head.
    • IGNORE - Ignores the field.
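
To illustrate how the fields and fs parameters relate to the columns of a CoNLL file, here is a minimal standalone sketch. It does not use the Hermes API: the class ConllFieldSketch and its parse method are hypothetical names introduced only for this example, and the assumption that a blank line ends a sentence follows common CoNLL practice rather than anything stated above. Each non-blank line is split on the field separator, and each column is assigned to the field name at the same position, with IGNORE columns dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Hermes: it only illustrates the fields/fs parameters.
final class ConllFieldSketch {

    // Splits CoNLL text into sentences; each token maps a field name to its column value.
    static List<List<Map<String, String>>> parse(String text, List<String> fields, String fs) {
        List<List<Map<String, String>>> sentences = new ArrayList<>();
        List<Map<String, String>> current = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                // Assumption: a blank line closes the current sentence.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            // fs is a regular expression, e.g. the default "\\s+".
            String[] columns = line.trim().split(fs);
            Map<String, String> token = new LinkedHashMap<>();
            for (int i = 0; i < fields.size() && i < columns.length; i++) {
                String name = fields.get(i);
                if (!name.equals("IGNORE")) {
                    token.put(name, columns[i]);
                }
            }
            current.add(token);
        }
        // Flush the last sentence if the text does not end with a blank line.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "He PRP B-NP\nreckons VBZ B-VP\n\nIt PRP B-NP\nworks VBZ B-VP\n";
        List<List<Map<String, String>>> sentences =
                parse(text, Arrays.asList("WORD", "POS", "CHUNK"), "\\s+");
        for (List<Map<String, String>> sentence : sentences) {
            System.out.println(sentence);
        }
    }
}
```

With the default field list, the first column of each line is read as WORD, the second as POS, and the third as CHUNK. Replacing a name with IGNORE skips that column without shifting the others.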

    See Also:
    Serialized Form