Package com.gengoai.hermes.format
Class PennTreebankFormat
- java.lang.Object
  - com.gengoai.hermes.format.WholeFileTextFormat
    - com.gengoai.hermes.format.PennTreebankFormat
- All Implemented Interfaces:
DocFormat, OneDocPerFileFormat, Serializable
public class PennTreebankFormat extends WholeFileTextFormat implements OneDocPerFileFormat
Format Name: ptb
Reader for Penn Treebank mrg files. Provides the following AnnotatableTypes:
- TOKEN
- SENTENCE
- PART_OF_SPEECH
- CONSTITUENT_PARSE, which adds NON_TERMINAL_NODE annotations and SYNTACTIC_HEAD relations
Function tags are represented on the SYNTACTIC_HEAD relation, while the NON_TERMINAL_NODE annotations carry only the base part-of-speech. Note that this removes all -None- entries.
- See Also:
- Serialized Form
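The label handling described above can be illustrated with a minimal, self-contained sketch. It is not the Hermes implementation: the class PtbLabelSketch and its methods are hypothetical names, and the rules for splitting on '-' and '=' and for ignoring numeric co-indexes are assumptions based on common Penn Treebank conventions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these helpers are hypothetical and NOT part of the Hermes API.
final class PtbLabelSketch {

    // Returns the base category of a Penn Treebank label, e.g. "NP-SBJ-1" -> "NP".
    // Labels that begin with '-' (such as -NONE-, -LRB-, -RRB-) are returned unchanged.
    static String baseLabel(String label) {
        if (label.startsWith("-")) {
            return label;
        }
        int cut = label.length();
        for (char sep : new char[] {'-', '='}) {
            int i = label.indexOf(sep);
            if (i > 0 && i < cut) {
                cut = i;
            }
        }
        return label.substring(0, cut);
    }

    // Returns the non-numeric suffixes of a label, e.g. "ADVP-LOC-CLR" -> [LOC, CLR].
    // Purely numeric suffixes (co-indexes such as the "1" in "NP-SBJ-1") are skipped.
    static List<String> functionTags(String label) {
        List<String> tags = new ArrayList<>();
        if (label.startsWith("-")) {
            return tags;
        }
        String[] parts = label.split("[-=]");
        for (int i = 1; i < parts.length; i++) {
            boolean numeric = !parts[i].isEmpty()
                    && parts[i].chars().allMatch(Character::isDigit);
            if (!parts[i].isEmpty() && !numeric) {
                tags.add(parts[i]);
            }
        }
        return tags;
    }

    // True for empty-element labels; the comparison ignores case so that both
    // "-NONE-" and "-None-" match.
    static boolean isEmptyElement(String label) {
        return label.equalsIgnoreCase("-NONE-");
    }
}
```

Under these assumptions, a node labelled NP-SBJ would yield the base label NP for its NON_TERMINAL_NODE annotation and the tag SBJ for its SYNTACTIC_HEAD relation, while nodes for which isEmptyElement returns true would be dropped.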
Nested Class Summary
Nested Classes
Modifier and Type: static class
Class: PennTreebankFormat.Provider
-
Method Summary
DocFormatParameters getParameters()
protected Stream<Document> readSingleFile(String content)
    Converts the content of an entire file into one or more documents.
void write(Document document, Resource outputResource)
    Writes the given document in this format to the given output resource.
-
Methods inherited from class com.gengoai.hermes.format.WholeFileTextFormat
read
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.gengoai.hermes.format.OneDocPerFileFormat
write
Method Detail
-
getParameters
public DocFormatParameters getParameters()
- Specified by:
getParameters in interface DocFormat
- Returns:
the DocFormatParameters set for the instance of this format
-
readSingleFile
protected Stream<Document> readSingleFile(String content)
Description copied from class: WholeFileTextFormat
Converts the content of an entire file into one or more documents.
- Specified by:
readSingleFile in class WholeFileTextFormat
- Parameters:
content - the content
- Returns:
the stream of documents.
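As a rough illustration of the kind of work a reader does on the content string, the sketch below splits mrg-style text into its top-level bracketed trees by tracking parenthesis depth. It is not Hermes code: the class PtbTreeSplitSketch and the method topLevelTrees are hypothetical, and the idea that each top-level tree corresponds to one sentence is an assumption. It relies on the Penn Treebank convention that literal parentheses in the text are encoded as -LRB- and -RRB-.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical helper, NOT part of the Hermes API.
final class PtbTreeSplitSketch {

    // Splits mrg-style content into top-level bracketed trees by tracking
    // parenthesis depth. Text outside any bracket (whitespace, blank lines)
    // is ignored, and an unbalanced trailing fragment is dropped.
    static List<String> topLevelTrees(String content) {
        List<String> trees = new ArrayList<>();
        int depth = 0;
        int start = -1;
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == '(') {
                if (depth == 0) {
                    start = i; // a new top-level tree begins here
                }
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
                if (depth == 0) {
                    trees.add(content.substring(start, i + 1));
                }
            }
        }
        return trees;
    }
}
```

A caller could then turn each returned tree into its annotations. How PennTreebankFormat groups trees into Document instances is not specified in this page.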
-
-