Package com.gengoai.hermes.format
Class CoNLLFormat
- java.lang.Object
  - com.gengoai.hermes.format.WholeFileTextFormat
    - com.gengoai.hermes.format.CoNLLFormat
- All Implemented Interfaces: DocFormat, OneDocPerFileFormat, Serializable
public class CoNLLFormat extends WholeFileTextFormat implements OneDocPerFileFormat, Serializable
Format Name: conll
The CoNLL format. The following additional parameters are available when reading/writing in CoNLL format:
- docPerSentence=[true|false]: One document per sentence when true (default: true).
- fields=<list of fields>: List of strings denoting the field names (default: ["WORD", "POS", "CHUNK"]).
- fs=<String>: Field separator (default: "\\s+")
- overrideSentences=[true|false]: Override the CoNLL sentence boundaries with Hermes boundaries when true (default: false)
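As an illustration of how the fields and fs parameters might work together, here is a minimal, self-contained sketch. It is not the Hermes implementation. It assumes each non-blank line holds one token, splits the line with the fs regular expression, and pairs the resulting cells with the configured field names in order. Columns named IGNORE are skipped. The class ConllRowSketch and the method parseRow are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConllRowSketch {

    // Hypothetical helper (not part of Hermes): maps one CoNLL row to
    // field name -> cell value, in column order.
    public static Map<String, String> parseRow(String line, List<String> fields, String fs) {
        // fs is treated as a regular expression, e.g. "\\s+" for runs of whitespace.
        String[] cells = line.trim().split(fs);
        Map<String, String> row = new LinkedHashMap<>();
        // Stop at the shorter of the two so a short row does not throw.
        for (int i = 0; i < fields.size() && i < cells.length; i++) {
            String name = fields.get(i);
            if (!name.equals("IGNORE")) {
                row.put(name, cells[i]);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("WORD", "POS", "CHUNK");
        // prints {WORD=Hermes, POS=NNP, CHUNK=B-NP}
        System.out.println(parseRow("Hermes NNP B-NP", fields, "\\s+"));
    }
}
```

A file whose columns are separated by tabs would pass "\t" as the separator instead, and a file with extra columns would list IGNORE at the positions to be dropped.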
Currently, the following fields are supported:
- INDEX - The index of the word in the sentence.
- WORD - The word.
- LEMMA - The lemmatized form of the word.
- UPOS - The universal part-of-speech tag of the word.
- POS - The part-of-speech tag of the word.
- CHUNK - IOB annotated Phrase Chunks.
- ENTITY - IOB annotated Named Entities.
- HEAD - The index of this word’s syntactic head in the sentence.
- DEP_REL - The dependency relation of this word to its head.
- IGNORE - Ignores the field.
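The CHUNK and ENTITY fields carry IOB tags, where B-X begins a span of type X, I-X continues it, and O marks a token outside any span. The sketch below shows one way such a tag column could be decoded into spans over token indexes. It is an illustration under those assumptions, not the decoder Hermes uses, and the names IobSpanSketch and decode are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IobSpanSketch {

    // Hypothetical helper: turns IOB tags into strings of the form
    // LABEL[start,end), with end exclusive, over token positions.
    public static List<String> decode(List<String> tags) {
        List<String> spans = new ArrayList<>();
        String label = null; // label of the span currently open, if any
        int start = -1;      // index of the first token of the open span
        // Iterate one step past the end so a trailing span is closed.
        for (int i = 0; i <= tags.size(); i++) {
            String tag = i < tags.size() ? tags.get(i) : "O";
            boolean inside = tag.startsWith("I-");
            boolean continues = inside && label != null && tag.substring(2).equals(label);
            if (label != null && !continues) {
                spans.add(label + "[" + start + "," + i + ")");
                label = null;
            }
            // Lenient choice: an I- tag with no open span of that type starts a new span.
            if (tag.startsWith("B-") || (inside && label == null)) {
                label = tag.substring(2);
                start = i;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // prints [NP[0,2), VP[2,3), NP[4,5)]
        System.out.println(decode(Arrays.asList("B-NP", "I-NP", "B-VP", "O", "B-NP")));
    }
}
```

Treating a stray I- tag as the start of a span is a design choice made here for robustness; a stricter reader could reject such input instead.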
- See Also: Serialized Form
Nested Class Summary
- static class CoNLLFormat.CoNLLParameters: The type CoNLL parameters.
- static class CoNLLFormat.Provider: The type Provider.
Field Summary
- static ParameterDef<Boolean> DOC_PER_SENTENCE: When true, creates one document per sentence; when false, a document may contain multiple sentences.
- static String EMPTY_FIELD: Empty field content.
- static ParameterDef<String> FIELD_SEPARATOR: The string used to separate fields.
- static ParameterDef<List<String>> FIELDS: The names of the fields in the CoNLL file.
- static ParameterDef<Boolean> OVERRIDE_SENTENCES: When true, overrides sentence boundaries with Hermes boundaries.
Method Summary
- DocFormatParameters getParameters()
- protected Stream<Document> readSingleFile(String file): Converts the content of an entire file into one or more documents.
- void write(Document document, Resource outputResource): Writes the given document in this format to the given output resource.
- Methods inherited from class com.gengoai.hermes.format.WholeFileTextFormat: read
- Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
- Methods inherited from interface com.gengoai.hermes.format.OneDocPerFileFormat: write
Field Detail
- DOC_PER_SENTENCE
  public static final ParameterDef<Boolean> DOC_PER_SENTENCE
  When true, creates one document per sentence; when false, a document may contain multiple sentences.
- EMPTY_FIELD
  public static final String EMPTY_FIELD
  Empty field content.
  - See Also: Constant Field Values
- FIELDS
  public static final ParameterDef<List<String>> FIELDS
  The names of the fields in the CoNLL file.
- FIELD_SEPARATOR
  public static final ParameterDef<String> FIELD_SEPARATOR
  The string used to separate fields.
- OVERRIDE_SENTENCES
  public static final ParameterDef<Boolean> OVERRIDE_SENTENCES
  When true, overrides sentence boundaries with Hermes boundaries.
Method Detail
- getParameters
  public DocFormatParameters getParameters()
  - Specified by: getParameters in interface DocFormat
  - Returns: the DocFormatParameters set for the instance of this format
- readSingleFile
  protected Stream<Document> readSingleFile(String file)
  Description copied from class: WholeFileTextFormat
  Converts the content of an entire file into one or more documents.
  - Specified by: readSingleFile in class WholeFileTextFormat
  - Parameters: file - the content
  - Returns: the stream of documents.
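To make the "one or more documents" wording concrete, the sketch below groups the content of a file into sentence blocks. It assumes the usual CoNLL convention that blank lines separate sentences, which this page does not state explicitly. With docPerSentence=true, each block would presumably become its own document. This is not the body of readSingleFile, and the names SentenceBlockSketch and splitSentences are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceBlockSketch {

    // Hypothetical helper: splits file content into sentences, where each
    // sentence is the list of its non-blank token lines.
    public static List<List<String>> splitSentences(String content) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : content.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                // Assumption: a blank line ends the current sentence.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line);
            }
        }
        // Keep the last sentence when the file does not end with a blank line.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String content = "The DT B-NP\ncat NN I-NP\n\nIt PRP B-NP\nsleeps VBZ B-VP\n";
        // prints 2
        System.out.println(splitSentences(content).size());
    }
}
```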
- write
  public void write(Document document, Resource outputResource) throws IOException
  Description copied from interface: DocFormat
  Writes the given document in this format to the given output resource.
  - Specified by: write in interface DocFormat
  - Parameters:
    document - the document
    outputResource - the output resource
  - Throws: IOException - Something went wrong writing the document