All Classes (com.gengoai:hermes 1.1-SNAPSHOT API)

Class	Description
Action	An action defines a processing step to perform on a `Corpus` with a given `Context` which results in either modifying the corpus or the context.
AffixFeaturizer	The type Affix featurizer.
AnnotatableType	An annotatable type is one that can be added to a document through annotation either by `Corpus.annotate(AnnotatableType...)` or `Document.annotate(AnnotatableType...)`.
AnnotatableType.Deserializer
AnnotatableType.KeyDeserializer
AnnotatableType.Serializer
AnnotatableTypeConverter	Mango converter that automatically converts other objects (JSON and Strings) into `AnnotatableType`s.
Annotate	The type Annotate processor.
Annotation	An annotation is an `HString` that associates an `AnnotationType`, e.g.
AnnotationPipeline	Helper class for determining the correct sequence of annotators to apply on a Document in order to satisfy the given AnnotatableType.
AnnotationSet	An AnnotationSet acts as the storage mechanism for annotations associated with a document.
AnnotationType	An AnnotationType defines an `Annotation`, which is a typed (e.g.
Annotator	An annotator processes documents adding one or more `AnnotatableType`s.
AttributeMap	Specialized HashMap for storing `AttributeType`s and their values that correctly handles JSON serialization / deserialization and allows for type-checked gets.
AttributeType<T>	An AttributeType defines a named Attribute that can be added to an HString.
BackreferenceTransition
BaseHStringMLModel	The type Base h string ml model.
BaseWorkflowIO
BasicCategories	A basic set of categories to describe words which is useful for inferring higher level concepts.
BasicCategoryFeature
BreakIteratorTokenizer	A tokenizer implementation based on Java's BreakIterator class
CaduceusProgram	Caduceus, pronounced ca·du·ceus, is a rule-based information extraction system.
CategoryProcessor
CoNLLColumnProcessor	Interface defining how to process a column from a CoNLL formatted document.
CoNLLEvaluation	Evaluation used in CoNLL for Named Entity Recognition.
CoNLLFormat	Format Name: conll
CoNLLFormat.CoNLLParameters	The type CoNLL parameters.
CoNLLFormat.Provider	The type Provider.
CoNLLRow	A Row (token) in a CoNLL formatted file
Context	A Context is a specialized map that acts as shared memory for a Workflow.
ContextualizedEmbedding
Corpus	A persistent collection of documents each having a unique document ID.
CsvFormat	Format Name: csv
CsvFormat.CSVParameters	The type Csv parameters.
CsvFormat.Provider	The type Provider.
DefaultCategoryAnnotator	Default annotator for basic categories, which is limited to nouns.
DefaultDependencyAnnotator	Default dependency annotator that uses MaltParser.
DefaultEntityAnnotator	Default Entity Annotator that realizes the Entity annotation through sub-annotators defined using the configuration setting: `com.gengoai.hermes.annotator.DefaultEntityAnnotator.subTypes`.
DefaultLemmaAnnotator	Default Lemmatization annotator that uses the `Lemmatizer` registered with the token's language to perform lemmatization.
DefaultMlEntityAnnotator	Default Machine-Learning based Entity annotator.
DefaultPartOfSpeechAnnotator	Default Part-of-Speech annotator that uses a `POSTagger` machine learning model.
DefaultPhraseChunkAnnotator	Default Phrase Chunk annotator that uses an IOBTagger.
DefaultSentenceAnnotator	Default Sentence Annotator that works reasonably well on tokenized text.
DefaultStemAnnotator	Default Stem annotator that uses the `Stemmer` registered with the token's language to perform stemming.
DefaultTokenAnnotator	Default token annotator that uses the `Tokenizer` registered with the token's language to perform tokenization.
DefaultTokenTypeEntityAnnotator	Default annotator for `TokenType` entities that maps `TokenType`s to `EntityType`.
DefaultTransliterationAnnotator	Annotates tokens with their transliteration using ICU4J's Transliterator class.
DependencyLinkProcessor	Processes dependency governor (parent) information in CoNLL Files
DependencyRelationProcessor	Processes dependency relation information in CoNLL Files
DiacriticalMarkNormalizer	Removes diacritics
DiskLexicon	A `PersistentLexicon` that stores `LexiconEntry` objects on disk, facilitating the use of very large lexicons with little memory overhead.
DistributionalLexiconGenerator<T extends Tag>	Generates a lexicon based on similarity in an embedding space where positive and negative examples can be given per tag category.
DocFormat	A DocFormat defines how to read and write documents in a given format.
DocFormatParameters	The type Doc format parameters.
DocFormatProvider	A provider for `DocFormat` for use within Java's service loader framework.
DocFormatService	Service for handling `DocFormat` and `DocFormatProvider`.
Document	A document represents text content with an accompanying set of metadata (Attributes), linguistic overlays (Annotations), and relations between elements in the document.
Document.AnnotationBuilder	Annotation builder for creating annotations associated with a document
DocumentCollection	A document collection represents a temporary collection of documents often used for ad-hoc analytics or to import documents into a corpus
DocumentFactory	A document factory facilitates the creation of document objects performing any predefined preprocessing, e.g.
DocumentFactory.DocumentFactoryBuilder	Builder for `DocumentFactory`s
Downloader
ElmoNERModel
ElmoSeq2SeqModel
ElmoTokenEmbedding
EmbeddingSimilarity	Implementation of a `HStringSimilarity` that calculates similarity based on the similarity between the HStrings in embedding space.
ENEntityAnnotator	Default Entity annotator for English
ENLemmatizer	English language lemmatizer based on WordNet's Morphy
ENLexicons	Lexicons used by the English Tokenizer.
ENPOSTagger	Default English language Part-of-Speech Annotator that uses a combination of machine learning and post-ml corrective rules.
ENPOSValidator	English language sequence labeling validator for part-of-speech tags.
ENStemmer	Default English language stemmer using Porter Stemmer.
ENStopWords	English StopWords
Entities	Predefined set of common entities.
EntityTagger
EntityType	Tag type associated with Entity annotations.
EntityType.Converter	The type Converter.
ENTokenizer	English language tokenizer
Extraction	An extraction is the output generated by an `Extractor`.
Extractor	Fundamental to text mining in Hermes is the concept of an `Extractor` and the `Extraction` it produces.
ExtractorBasedSimilarity	An implementation of an `HStringSimilarity` that uses an Apollo Similarity measure to determine the similarity between two `HString`s based on the extraction from a given `Extractor`.
Features	The type Features.
FeaturizingExtractor	Combines an `Extractor` with an Apollo `Featurizer` allowing for the output of the extractor to be directly used as features for machine learning.
Fragments	Convenience methods for constructing orphaned and empty fragments.
FuzzyLexiconAnnotator	A lexicon annotator that allows gaps to occur in multi-word expressions.
Hermes	Convenience methods for getting common configuration options.
HermesJsonFormat	Format Name: hjson
HermesJsonFormat.Provider	The type Provider.
HString	An HString (Hermes String) is a Java String on steroids.
HStringDataSetGenerator	An extension to a DataSetGenerator that allows for the incoming documents to be broken up into multiple Datum based on a given `AnnotationType`.
HStringDataSetGenerator.Builder	Builder Class for HStringDataSetGenerator
HStringMLModel	The interface H string ml model.
HStringSimilarity	Interface defining a methodology for computing the similarity between two `HString`s.
HtmlEntityNormalizer	Normalizes XML and HTML entities, such as `&amp;`
ImportDocuments
IndexProcessor	Processes token index information in CoNLL Files
IOB
IOBFieldProcessor	Base processor for IOB (Inside, Outside, Beginning) annotations in CoNLL Files
IOBTagger	Creates annotations based on the IOB tag output of an underlying model.
IOBValidator	Sequence validator ensuring correct IOB tag output
KeywordExtraction
KeywordExtractor	A keyword extractor determines the important words, phrases, or concepts in an `HString`, returning a counter of keywords and their corresponding scores.
LemmaProcessor	Processes lemma information in CoNLL Files
Lemmatizer	Defines the interface for lemmatizing tokens.
Lemmatizers	Factory class for creating/retrieving lemmatizers for a given language
LexicalFeatures
Lexicon	A traditional approach to information extraction incorporates the use of lexicons, also called gazetteers, for finding specific lexical items in text.
LexiconAnnotator	Annotator that provides annotations based on a lexicon.
LexiconEntry	An entry in a lexicon defining the lemma, probability, tag, and any constraints on matching
LexiconGenerator<T extends Tag>	Defines a methodology for constructing a lexicon for a set of tags.
LexiconIO	Utility methods for reading and writing Lexicons
LexiconIO.CSVParameters	The type Csv parameters.
LexiconManager	Manages the creation and access to Lexicons
LexiconMatch	Value class for matches made by lexicons
LexiconSpecification	Lexicons are defined using a `LexiconSpecification` in the following format:
LyreDSL	Static functions allowing for a functional style DSL for constructing LyreExpressions.
LyreExpression	A LyreExpression represents a series of steps to perform over an input `HString` which can be used for querying (i.e.
LyreExpressionType	Enumeration of the different types of Lyre expressions
MorphologicalFeatureProcessor
MultiPhaseExtractor	A `FeaturizingExtractor` that breaks the extraction process into the following parts: (1) extracts annotations of the given types; (2) trims the extractions, if a trim method is defined; (3) filters the extractions, if a filter is defined.
MultiPhaseExtractor.MultiPhaseExtractorBuilder<T extends MultiPhaseExtractor,V extends MultiPhaseExtractor.MultiPhaseExtractorBuilder<T,V>>
NamedEntityProcessor	Processes Named Entities in CoNLL Format.
NeuralNERModel
NFA	Implementation of a non-deterministic finite state automaton that works on a Text
NGramExtractor	A `MultiPhaseExtractor` implementation that extracts n-grams over the desired annotation types.
NGramExtractor.Builder	Builder Class for constructing `NGramExtractor`
NoOptProcessor	No Operation Processor
NPClusteringKeywordExtractor	Implementation of the NP Clustering Keyword Extractor presented in:
OneDocPerFileFormat	Defines a format in which only one document is written per file.
PartOfSpeech	Interface defining a part-of-speech.
PartOfSpeechConverter
PennTreeBank	Part-of-speech tags defined by Penn Treebank
PennTreebankFormat	Format Name: ptb
PennTreebankFormat.Provider
PersistentLexicon	Base class for lexicon implementations that are persistent, meaning added entries are persisted between runs.
PhraseChunkProcessor	Processes Shallow Parse information (Phrase Chunks) in CoNLL Format
PhraseChunkTagger
PorterStemmer	Stemmer implementing the Porter Stemming Algorithm. The Stemmer class transforms a word into its root form.
POSCorrection	Corrects POS tags to conform to HERMES format
POSFieldProcessor	Processes part-of-speech fields
POSFormat	Format Name: pos
POSFormat.Provider	The type Provider.
POSTagger	A `HStringMLModel` for assigning `PartOfSpeech` to tokens.
PredefinedFeatures	The type Predefined features.
PredefinedFeatures.PredefinedFeaturizer	The type Predefined featurizer.
PrefixSearchable	Interface defining a lexicon or word list that can be searched using prefixes
ProgressLogger	Defines a logger that keeps track of the number of documents and words processed and reports processing statistics on a given interval.
Query	Defines the methodology for matching documents based on simple boolean logic over term and document level attributes.
QueryParser	Simple query to predicate constructor for basic keyword queries over corpora.
RakeKeywordExtractor	Implementation of the RAKE keyword extraction algorithm as presented in:
RegexAnnotator	Annotator that constructs annotations based on regular expression matches.
RegexExtractor	An Extractor implementation that searches for a given regular expression pattern in the document.
Relation	Relations provide a mechanism to link two Annotations.
RelationDirection	Directionality of a relation.
RelationEdge	A specialized annotation graph edge that stores relation type and value.
RelationEdgeFactory	Factory class for constructing `RelationEdge`s
RelationGraph	A graph where vertices are annotations and edges represent relations.
RelationType	Dynamic enumeration of known types of relations that can exist between annotations.
ResourceType	Defines common resources used by Hermes and methods for finding configuration values and resources for them.
SearchExtractor	An Extractor implementation that searches for a given search text in the document.
SearchResults	A collection of `Document` obtained via querying a `Corpus`.
SentenceLevelAnnotator	Base for annotators that work at the sentence level.
SequentialWorkflow	Entry point for sequentially processing a corpus via one or more `Action`s.
SimpleWordList	Simple implementation of a `WordList` backed by a HashSet
SpellChecker	The type Spellchecker module.
StandardTokenizer	This class is a scanner generated by JFlex 1.5.0-SNAPSHOT from the specification file `/home/ik/prj/gengoai/hermes-pom/core/src/main/jflex/StandardTokenizer.jflex`
State	Defines an action state which can be LOADED where the action has loaded its state or NOT_LOADED meaning the action has no state to load.
Stemmer	Defines the interface for stemming tokens.
Stemmers	Factory class for creating/retrieving stemmers for a given language
StopWords	Defines a methodology for determining if an HString or String is a stopword for a given language.
StopWords.NoOptStopWords	StopWords implementation that treats everything as a content word.
SubTypeAnnotator	An annotator that provides its annotation by annotating for sub-types.
Summarizer	Interface defining an `Extractor` that generates summaries for given `HString` and specifically documents.
SuperSenseProcessor
TagDecoder
TaggedFormat	Format Name: tagged
TaggedFormat.Provider	The type Provider.
TaggedFormat.TaggedParameters	The type Tagged parameters.
TensorFlowSequenceLabeler
TermCounts	The type Term extraction processor.
TermExtractor	Implementation of the `MultiPhaseExtractor` for extracting terms where a term is a single annotation (TOKEN by default).
TermExtractor.Builder	Builder Class for constructing `TermExtractor`
TermKeywordExtractor	Implementation of a `KeywordExtractor` that extracts and scores terms based on a given `FeaturizingExtractor`.
TextNormalization	Class takes care of normalizing text using a number of `TextNormalizer`s.
TextNormalizer	Defines a methodology for normalizing a string.
TextRank	Implementation of the TextRank algorithm for keyword extraction as defined in: Mihalcea, R., Tarau, P.: "Textrank: Bringing order into texts".
TextRankSummarizer	Implementation of the TextRank algorithm for summarization as defined in: Mihalcea, R., Tarau, P.: "Textrank: Bringing order into texts".
TFIDFKeywordExtractor	Keyword extractor that scores words based on their TFIDF value.
Tokenizer	Low level tokenization of strings
Tokenizer.Token	An internal token
Tokenizers	Factory methods for constructing `Tokenizer`s
TokenMatch	A match from a `TokenRegex` pattern on an input HString.
TokenMatcher	The TokenMatcher class allows for iterating over the matches, extracting the match or named groups within the match, retrieving the starting and ending offsets of the match, and converting it into a TokenMatch object which records the current state of the match.
TokenRegex	Hermes provides a token-based regular expression engine that allows for matches on arbitrary annotation types, relation types, and attributes, while providing many of the operators that are possible using standard Java regular expressions.
TokenType	Defines the type for a given token.
TraditionalToSimplified	Preprocessor that converts traditional characters into simplified characters.
TrieLexicon	Implementation of `Lexicon` using a Trie data structure.
TrieWordList	Implementation of a `WordList` backed by a Trie
TwitterSearchFormat	Format Name: twitter_search
TwitterSearchFormat.Provider	The type Provider.
TxtFormat	Format Name: text
TxtFormat.Provider	The type Provider.
Types	Common Annotatable Types.
UnicodeNormalizer	Converts unicode to canonical form and removes smart quotes.
UniversalFeature	Enumeration of the Universal features as defined in the Universal Dependencies (UD) framework
UniversalFeatureSet	A set of `UniversalFeature` and their associated `UniversalFeatureValue`
UniversalFeatureValue	Values associated with `UniversalFeature`s
UniversalSentenceEncoder
UPOSProcessor	Processes universal part-of-speech information
ValueCalculator	The enum Value calculator.
ViterbiAnnotator	An abstract base annotator that uses the Viterbi algorithm to find text items in a document.
WhitespaceNormalizer	Handles normalizing whitespace.
WholeFileTextFormat	Defines a format in which files need to be completely read in order to generate documents.
WordList	Word lists provide a set-like interface to a set of vocabulary items.
WordProcessor	Processes words
Workflow	A workflow represents a set of _actions_ to perform on a document collection.