All Classes
Class |
Description |
Action |
An action defines a processing step to perform on a Corpus with a given Context which results in
either modifying the corpus or the context.
|
AffixFeaturizer |
The type Affix featurizer.
|
AnnotatableType |
|
AnnotatableType.Deserializer |
|
AnnotatableType.KeyDeserializer |
|
AnnotatableType.Serializer |
|
AnnotatableTypeConverter |
Mango converter that automatically converts other objects (JSON and Strings) into AnnotatableTypes.
|
Annotate |
The type Annotate processor.
|
Annotation |
|
AnnotationPipeline |
Helper class for determining the correct sequence of annotators to apply on a Document in order to satisfy the
given AnnotatableType.
|
AnnotationSet |
An AnnotationSet acts as the storage mechanism for annotations associated with a document.
|
AnnotationType |
An AnnotationType defines an Annotation, which is a typed (e.g.
|
Annotator |
|
AttributeMap |
Specialized HashMap for storing AttributeTypes and their values that correctly handles JSON serialization/deserialization
and allows for checked type gets.
|
AttributeType<T> |
An AttributeType defines a named Attribute that can be added to an HString.
|
BackreferenceTransition |
|
BaseHStringMLModel |
The type BaseHStringMLModel.
|
BaseWorkflowIO |
|
BasicCategories |
A basic set of categories to describe words which is useful for inferring higher level concepts.
|
BasicCategoryFeature |
|
BreakIteratorTokenizer |
A tokenizer implementation based on Java's BreakIterator class
|
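For readers unfamiliar with java.text.BreakIterator, the following minimal sketch shows word-boundary tokenization using only the standard JDK API. It is a standalone illustration, not the Hermes BreakIteratorTokenizer: the class name BreakIteratorTokenizeSketch is hypothetical, and dropping whitespace-only segments is an assumption.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not the Hermes API): word tokenization via the JDK BreakIterator.
final class BreakIteratorTokenizeSketch {
    static List<String> tokenize(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = boundaries.first();
        // Walk consecutive boundary pairs; each [start, end) range is one segment.
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            // Assumption: whitespace-only segments are not kept as tokens.
            if (!segment.chars().allMatch(Character::isWhitespace)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }
}
```

With a word instance, punctuation marks come back as their own segments, so "Hello, world!" yields "Hello" and "world" as separate tokens alongside the punctuation.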
CaduceusProgram |
Caduceus, pronounced ca·du·ceus, is a rule-based information extraction system.
|
CategoryProcessor |
|
CoNLLColumnProcessor |
Interface defining how to process a column from a CoNLL formatted document.
|
CoNLLEvaluation |
Evaluation used in CoNLL for Named Entity Recognition.
|
CoNLLFormat |
Format Name: conll
|
CoNLLFormat.CoNLLParameters |
The type CoNLL parameters.
|
CoNLLFormat.Provider |
The type Provider.
|
CoNLLRow |
A Row (token) in a CoNLL formatted file
|
Context |
Contexts are a specialized map that act as a shared memory for a Workflow.
|
ContextualizedEmbedding |
|
Corpus |
A persistent collection of documents each having a unique document ID.
|
CsvFormat |
Format Name: csv
|
CsvFormat.CSVParameters |
The type Csv parameters.
|
CsvFormat.Provider |
The type Provider.
|
DefaultCategoryAnnotator |
Default annotator for basic categories, which is limited to nouns.
|
DefaultDependencyAnnotator |
Default dependency annotator that uses MaltParser.
|
DefaultEntityAnnotator |
Default Entity Annotator that realizes the Entity annotation through sub-annotators defined using the configuration
setting: com.gengoai.hermes.annotator.DefaultEntityAnnotator.subTypes .
|
DefaultLemmaAnnotator |
Default Lemmatization annotator that uses the Lemmatizer registered with the
token's language to perform lemmatization.
|
DefaultMlEntityAnnotator |
Default Machine-Learning based Entity annotator.
|
DefaultPartOfSpeechAnnotator |
Default Part-of-Speech annotator that uses a POSTagger machine learning model.
|
DefaultPhraseChunkAnnotator |
Default Phrase Chunk annotator that uses an IOBTagger.
|
DefaultSentenceAnnotator |
Default Sentence Annotator that works reasonably well on tokenized text.
|
DefaultStemAnnotator |
Default Stem annotator that uses the Stemmer registered with the
token's language to perform stemming.
|
DefaultTokenAnnotator |
Default token annotator that uses the Tokenizer registered with the
token's language to perform tokenization.
|
DefaultTokenTypeEntityAnnotator |
|
DefaultTransliterationAnnotator |
Annotates tokens with their transliteration using ICU4J's Transliterator class.
|
DependencyLinkProcessor |
Processes dependency governor (parent) information in CoNLL Files
|
DependencyRelationProcessor |
Processes dependency relation information in CoNLL Files
|
DiacriticalMarkNormalizer |
Removes diacritics
|
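One common way to remove diacritics, sketched here under the assumption that Hermes does something comparable, is to decompose text with java.text.Normalizer (NFD) and drop the resulting combining marks. The class name DiacriticStripSketch is hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper (not the Hermes API): strip diacritics by decomposing and dropping combining marks.
final class DiacriticStripSketch {
    // \p{M} matches Unicode combining marks (categories Mn, Mc, Me).
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    static String strip(String text) {
        // NFD splits a precomposed letter such as U+00E9 into "e" + U+0301 COMBINING ACUTE ACCENT.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String stripped = COMBINING_MARKS.matcher(decomposed).replaceAll("");
        // Recompose whatever remains so the output is in NFC.
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }
}
```

Letters without a canonical decomposition, such as ø or ł, pass through unchanged with this approach.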
DiskLexicon |
|
DistributionalLexiconGenerator<T extends Tag> |
Generates a lexicon based on similarity in an embedding space where positive and negative examples can be given per
tag category.
|
DocFormat |
A DocFormat defines how to read and write documents in a given format.
|
DocFormatParameters |
The type Doc format parameters.
|
DocFormatProvider |
A provider for DocFormat for use within Java's service loader framework.
|
DocFormatService |
|
Document |
A document represents text content with an accompanying set of metadata (Attributes), linguistic overlays
(Annotations), and relations between elements in the document.
|
Document.AnnotationBuilder |
Annotation builder for creating annotations associated with a document
|
DocumentCollection |
A document collection represents a temporary collection of documents often used for ad-hoc analytics or to import
documents into a corpus
|
DocumentFactory |
A document factory facilitates the creation of document objects performing any predefined preprocessing, e.g.
|
DocumentFactory.DocumentFactoryBuilder |
|
Downloader |
|
ElmoNERModel |
|
ElmoSeq2SeqModel |
|
ElmoTokenEmbedding |
|
EmbeddingSimilarity |
Implementation of a HStringSimilarity that calculates similarity based on the similarity between the
HStrings in embedding space.
|
ENEntityAnnotator |
Default Entity annotator for English
|
ENLemmatizer |
English language lemmatizer based on WordNet's Morphy
|
ENLexicons |
Lexicons used by the English Tokenizer.
|
ENPOSTagger |
Default English language Part-of-Speech Annotator that uses a combination of machine learning and post-ML corrective
rules.
|
ENPOSValidator |
English language sequence labeling validator for part-of-speech tags.
|
ENStemmer |
Default English language stemmer using Porter Stemmer.
|
ENStopWords |
English StopWords
|
Entities |
Predefined set of common entities.
|
EntityTagger |
|
EntityType |
Tag type associated with Entity annotations.
|
EntityType.Converter |
The type Converter.
|
ENTokenizer |
English language tokenizer
|
Extraction |
An extraction is the output generated by an Extractor.
|
Extractor |
Fundamental to text mining in Hermes is the concept of an Extractor and the Extraction it
produces.
|
ExtractorBasedSimilarity |
An implementation of an HStringSimilarity that uses an Apollo Similarity measure to determine the
similarity between two HStrings based on the extraction from a given Extractor.
|
Features |
The type Features.
|
FeaturizingExtractor |
Combines an Extractor with an Apollo Featurizer allowing for the output of the extractor to be
directly used as features for machine learning.
|
Fragments |
Convenience methods for constructing orphaned and empty fragments.
|
FuzzyLexiconAnnotator |
A lexicon annotator that allows gaps to occur in multi-word expressions.
|
Hermes |
Convenience methods for getting common configuration options.
|
HermesJsonFormat |
Format Name: hjson
|
HermesJsonFormat.Provider |
The type Provider.
|
HString |
An HString (Hermes String) is a Java String on steroids.
|
HStringDataSetGenerator |
An extension to a DataSetGenerator that allows the incoming documents to be broken up into multiple Datum objects
based on a given AnnotationType.
|
HStringDataSetGenerator.Builder |
Builder Class for HStringDataSetGenerator
|
HStringMLModel |
The HStringMLModel interface.
|
HStringSimilarity |
Interface defining a methodology for computing the similarity between two HStrings.
|
HtmlEntityNormalizer |
Normalizes XML and HTML entities, such as &amp;.
|
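Entity normalization boils down to replacing character references with the characters they denote. The sketch below is not the Hermes HtmlEntityNormalizer: it handles all numeric references but only a handful of named ones (an assumption), leaves unknown names untouched, and uses the hypothetical class name EntityDecodeSketch. It requires Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not the Hermes API): decode numeric and a few named character references.
final class EntityDecodeSketch {
    // Assumption: only these named entities are recognized.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'", "nbsp", "\u00A0");
    // Digit limits keep Integer.parseInt from overflowing.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]{1,6}|#[0-9]{1,7}|[a-zA-Z]+);");

    static String decode(String text) {
        // quoteReplacement stops "$" or "\" in decoded text from being read as group references.
        return ENTITY.matcher(text)
                .replaceAll(mr -> Matcher.quoteReplacement(resolve(mr.group(1), mr.group())));
    }

    private static String resolve(String body, String original) {
        int codePoint;
        if (body.startsWith("#x") || body.startsWith("#X")) {
            codePoint = Integer.parseInt(body.substring(2), 16);
        } else if (body.startsWith("#")) {
            codePoint = Integer.parseInt(body.substring(1));
        } else {
            return NAMED.getOrDefault(body, original); // unknown names are left as-is
        }
        return Character.isValidCodePoint(codePoint) ? new String(Character.toChars(codePoint)) : original;
    }
}
```

Decoding is a single pass, so a double-escaped "&amp;lt;" becomes "&lt;" rather than "<".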
ImportDocuments |
|
IndexProcessor |
Processes token index information in CoNLL Files
|
IOB |
|
IOBFieldProcessor |
Base processor for IOB (Inside, Outside, Beginning) annotations in CoNLL Files
|
IOBTagger |
Creates annotations based on the IOB tag output of an underlying model.
|
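Turning IOB tag output into annotations can be sketched as a single left-to-right pass over the tags. This is a standalone illustration, not the Hermes IOBTagger: spans are reported as hypothetical TYPE[start,end) strings over token indices, and treating a stray I-X tag as the start of a new span is an assumption (lenient decoding).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the Hermes API): turns IOB tags into TYPE[start,end) token spans.
final class IobDecodeSketch {
    static List<String> decode(List<String> tags) {
        List<String> spans = new ArrayList<>();
        String type = null; // type of the currently open span, or null when none is open
        int start = -1;
        // Run one step past the end with an "O" sentinel so the last open span is flushed.
        for (int i = 0; i <= tags.size(); i++) {
            String tag = i < tags.size() ? tags.get(i) : "O";
            boolean continues = type != null && tag.equals("I-" + type);
            if (type != null && !continues) {
                spans.add(type + "[" + start + "," + i + ")");
                type = null;
            }
            // B-X always opens a span; a stray I-X also opens one (lenient decoding, an assumption).
            if (tag.startsWith("B-") || (tag.startsWith("I-") && !continues)) {
                type = tag.substring(2);
                start = i;
            }
        }
        return spans;
    }
}
```

For the tags B-PER, I-PER, O, B-LOC, I-ORG this yields PER[0,2), LOC[3,4) and ORG[4,5).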
IOBValidator |
Sequence validator ensuring correct IOB tag output
|
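The constraint an IOB validator enforces is that an I-X tag may only follow B-X or I-X of the same type. A minimal sketch of that rule follows; the class and method names are hypothetical and not the Hermes IOBValidator API.

```java
import java.util.List;

// Hypothetical helper (not the Hermes API): checks that I-X only follows B-X or I-X.
final class IobValidatorSketch {
    static boolean isValidTransition(String previous, String current) {
        if (!current.startsWith("I-")) {
            return true; // "O" and "B-X" may follow any tag
        }
        String type = current.substring(2);
        return previous != null && (previous.equals("B-" + type) || previous.equals("I-" + type));
    }

    static boolean isValidSequence(List<String> tags) {
        String previous = null; // null marks the start of the sequence
        for (String tag : tags) {
            if (!isValidTransition(previous, tag)) {
                return false;
            }
            previous = tag;
        }
        return true;
    }
}
```

In a sequence labeler, a per-transition check like isValidTransition would typically be consulted during decoding to rule out label sequences that cannot be turned into well-formed spans.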
KeywordExtraction |
|
KeywordExtractor |
A keyword extractor determines the important words, phrases, or concepts in an HString, returning a counter
of keywords and their corresponding scores.
|
LemmaProcessor |
Processes lemma information in CoNLL Files
|
Lemmatizer |
Defines the interface for lemmatizing tokens.
|
Lemmatizers |
Factory class for creating/retrieving lemmatizers for a given language
|
LexicalFeatures |
|
Lexicon |
A traditional approach to information extraction incorporates the use of lexicons, also called gazetteers, for
finding specific lexical items in text.
|
LexiconAnnotator |
Annotator that provides annotations based on a lexicon.
|
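Assuming greedy longest-match semantics (an assumption; the Hermes LexiconAnnotator may resolve overlapping matches differently), lexicon lookup over a token sequence can be sketched as follows. The lexicon is a hypothetical map from lowercased, space-joined phrases to tags, and matches are reported as TYPE[start,end) token spans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not the Hermes API): greedy longest-match lexicon annotation over tokens.
final class LongestMatchLexiconSketch {
    static List<String> annotate(List<String> tokens, Map<String, String> lexicon, int maxLen) {
        List<String> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matchedEnd = -1;
            String tag = null;
            // Try the longest candidate phrase first, shrinking until a lexicon entry matches.
            for (int end = Math.min(tokens.size(), i + maxLen); end > i; end--) {
                String phrase = String.join(" ", tokens.subList(i, end)).toLowerCase(Locale.ROOT);
                String candidate = lexicon.get(phrase);
                if (candidate != null) {
                    matchedEnd = end;
                    tag = candidate;
                    break;
                }
            }
            if (tag != null) {
                matches.add(tag + "[" + i + "," + matchedEnd + ")");
                i = matchedEnd; // continue after the match, so matches never overlap
            } else {
                i++;
            }
        }
        return matches;
    }
}
```

With "new york" and "new york city" both in the lexicon, the tokens "New York City" produce a single three-token match rather than the shorter one.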
LexiconEntry |
An entry in a lexicon defining the lemma, probability, tag, and any constraints on matching
|
LexiconGenerator<T extends Tag> |
Defines a methodology for constructing a lexicon for a set of tags.
|
LexiconIO |
Utility methods for reading and writing Lexicons.
|
LexiconIO.CSVParameters |
The type Csv parameters.
|
LexiconManager |
Manages the creation of, and access to, Lexicons.
|
LexiconMatch |
Value class for matches made by lexicons
|
LexiconSpecification |
|
LyreDSL |
Static functions allowing for a functional style DSL for constructing LyreExpressions.
|
LyreExpression |
A LyreExpression represents a series of steps to perform over an input HString which can be used for
querying (i.e.
|
LyreExpressionType |
Enumeration of the different types of Lyre expressions.
|
MorphologicalFeatureProcessor |
|
MultiPhaseExtractor |
A FeaturizingExtractor that breaks the extraction process into the following parts:
1. Extracts annotations of the given types.
2. Trims the extractions, if a trim method is defined.
3. Filters the extractions, if a filter is defined.
|
MultiPhaseExtractor.MultiPhaseExtractorBuilder<T extends MultiPhaseExtractor,V extends MultiPhaseExtractor.MultiPhaseExtractorBuilder<T,V>> |
|
NamedEntityProcessor |
Processes Named Entities in CoNLL Format.
|
NeuralNERModel |
|
NFA |
Implementation of a non-deterministic finite state automaton that works on a Text
|
NGramExtractor |
|
NGramExtractor.Builder |
|
NoOptProcessor |
No Operation Processor
|
NPClusteringKeywordExtractor |
Implementation of the NP Clustering Keyword Extractor presented in:
|
OneDocPerFileFormat |
Defines a format in which only one document is written per file.
|
PartOfSpeech |
Interface defining a part-of-speech.
|
PartOfSpeechConverter |
|
PennTreeBank |
Part-of-speech tags defined by Penn Treebank
|
PennTreebankFormat |
Format Name: ptb
|
PennTreebankFormat.Provider |
|
PersistentLexicon |
Base class for lexicon implementations that are persistent, meaning added entries are persisted between runs.
|
PhraseChunkProcessor |
Processes Shallow Parse information (Phrase Chunks) in CoNLL Format
|
PhraseChunkTagger |
|
PorterStemmer |
Stemmer implementing the Porter stemming algorithm. The Stemmer class transforms a word into its root form.
|
POSCorrection |
Corrects POS tags to conform to HERMES format
|
POSFieldProcessor |
Processes part-of-speech fields
|
POSFormat |
Format Name: pos
|
POSFormat.Provider |
The type Provider.
|
POSTagger |
|
PredefinedFeatures |
The type Predefined features.
|
PredefinedFeatures.PredefinedFeaturizer |
The type Predefined featurizer.
|
PrefixSearchable |
Interface defining a lexicon or word list that can be searched using prefixes
|
ProgressLogger |
Defines a logger that keeps track of the number of documents and words processed and reports processing statistics on
a given interval.
|
Query |
Defines the methodology for matching documents based on simple boolean logic over term and document level
attributes.
|
QueryParser |
Simple query-to-predicate constructor for basic keyword queries over corpora.
|
RakeKeywordExtractor |
Implementation of the RAKE keyword extraction algorithm as presented in:
|
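The standard RAKE scoring can be sketched in a few steps: split the text into candidate phrases at punctuation and stop words, score each word as degree / frequency (degree being the total length of the phrases the word occurs in), and score each phrase as the sum of its word scores. This is a standalone sketch, not the Hermes RakeKeywordExtractor; the class name and the caller-supplied stop word set are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not the Hermes API): RAKE phrase scoring with degree/frequency word scores.
final class RakeSketch {
    static Map<String, Double> score(String text, Set<String> stopWords) {
        // 1. Candidate phrases: split at punctuation, then at stop words.
        List<List<String>> phrases = new ArrayList<>();
        for (String fragment : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}\\s']+")) {
            List<String> current = new ArrayList<>();
            for (String word : fragment.trim().split("\\s+")) {
                if (word.isEmpty() || stopWords.contains(word)) {
                    if (!current.isEmpty()) phrases.add(current);
                    current = new ArrayList<>();
                } else {
                    current.add(word);
                }
            }
            if (!current.isEmpty()) phrases.add(current);
        }
        // 2. Frequency and degree (sum of lengths of the phrases containing the word).
        Map<String, Integer> freq = new HashMap<>();
        Map<String, Integer> degree = new HashMap<>();
        for (List<String> phrase : phrases) {
            for (String w : phrase) {
                freq.merge(w, 1, Integer::sum);
                degree.merge(w, phrase.size(), Integer::sum);
            }
        }
        // 3. Phrase score = sum of degree/frequency over its words.
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> phrase : phrases) {
            double s = 0;
            for (String w : phrase) s += degree.get(w).doubleValue() / freq.get(w);
            scores.put(String.join(" ", phrase), s);
        }
        return scores;
    }
}
```

For "Compatibility of systems of linear constraints" with the stop word "of", the multi-word phrase "linear constraints" scores 4.0 while the single words score 1.0 each, which reflects RAKE's preference for longer co-occurring phrases.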
RegexAnnotator |
Annotator that constructs annotations based on regular expression matches.
|
RegexExtractor |
An Extractor implementation that searches for a given regular expression pattern in the document.
|
Relation |
Relations provide a mechanism to link two Annotations.
|
RelationDirection |
Directionality of a relation.
|
RelationEdge |
A specialized annotation graph edge that stores relation type and value.
|
RelationEdgeFactory |
|
RelationGraph |
A graph where vertices are annotations and edges represent relations.
|
RelationType |
Dynamic enumeration of known types of relations that can exist between annotations.
|
ResourceType |
Defines common resource used by Hermes and methods for finding configuration values and resources for them.
|
SearchExtractor |
An Extractor implementation that searches for a given search text in the document.
|
SearchResults |
|
SentenceLevelAnnotator |
Base for annotators that work at the sentence level.
|
SequentialWorkflow |
Entry point for sequentially processing a corpus via one or more Actions.
|
SimpleWordList |
Simple implementation of a WordList backed by a HashSet
|
SpellChecker |
The type Spellchecker module.
|
StandardTokenizer |
This class is a scanner generated by JFlex 1.5.0-SNAPSHOT from the specification file
/home/ik/prj/gengoai/hermes-pom/core/src/main/jflex/StandardTokenizer.jflex
|
State |
Defines an action state which can be LOADED where the action has loaded its state or NOT_LOADED meaning the action
has no state to load.
|
Stemmer |
Defines the interface for stemming tokens.
|
Stemmers |
Factory class for creating/retrieving stemmers for a given language
|
StopWords |
Defines a methodology for determining if an HString or String is a stopword for a given language.
|
StopWords.NoOptStopWords |
StopWords implementation that treats everything as a content word.
|
SubTypeAnnotator |
An annotator that provides its annotation by annotating for sub-types.
|
Summarizer |
Interface defining an Extractor that generates summaries for a given HString, specifically
documents.
|
SuperSenseProcessor |
|
TagDecoder |
|
TaggedFormat |
Format Name: tagged
|
TaggedFormat.Provider |
The type Provider.
|
TaggedFormat.TaggedParameters |
The type Tagged parameters.
|
TensorFlowSequenceLabeler |
|
TermCounts |
The type Term extraction processor.
|
TermExtractor |
Implementation of the MultiPhaseExtractor for extracting terms where a term is a single annotation (TOKEN by
default).
|
TermExtractor.Builder |
|
TermKeywordExtractor |
|
TextNormalization |
Normalizes text using a number of TextNormalizers.
|
TextNormalizer |
Defines a methodology for normalizing a string.
|
TextRank |
Implementation of the TextRank algorithm for keyword extraction as defined in:
Mihalcea, R., Tarau, P.: "TextRank: Bringing Order into Texts".
|
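The keyword variant of TextRank builds an undirected co-occurrence graph over words and iterates the PageRank-style update S(v) = (1 - d) + d * sum over neighbours u of S(u) / |N(u)|. The sketch below shows that core loop only; it is not the Hermes TextRank class, the class name is hypothetical, and candidate filtering (e.g. keeping only nouns and adjectives, as in the paper) is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not the Hermes API): TextRank word scores on a co-occurrence graph.
final class TextRankSketch {
    static Map<String, Double> rank(List<String> words, int window, double damping, int iterations) {
        // Undirected edges between distinct words that occur within `window` positions of each other.
        Map<String, Set<String>> graph = new HashMap<>();
        for (int i = 0; i < words.size(); i++) {
            graph.computeIfAbsent(words.get(i), k -> new HashSet<>());
            for (int j = i + 1; j < Math.min(words.size(), i + window); j++) {
                String a = words.get(i), b = words.get(j);
                if (a.equals(b)) continue;
                graph.get(a).add(b);
                graph.computeIfAbsent(b, k -> new HashSet<>()).add(a);
            }
        }
        // Start every vertex at 1.0 and apply the update a fixed number of times.
        Map<String, Double> score = new HashMap<>();
        graph.keySet().forEach(v -> score.put(v, 1.0));
        for (int it = 0; it < iterations; it++) {
            Map<String, Double> next = new HashMap<>();
            for (String v : graph.keySet()) {
                double sum = 0;
                for (String u : graph.get(v)) sum += score.get(u) / graph.get(u).size();
                next.put(v, (1 - damping) + damping * sum);
            }
            score.clear();
            score.putAll(next);
        }
        return score;
    }
}
```

In the sequence a b a c a d with a window of 2, "a" is the hub of a star-shaped graph and ends up with the highest score.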
TextRankSummarizer |
Implementation of the TextRank algorithm for summarization as defined in:
Mihalcea, R., Tarau, P.: "TextRank: Bringing Order into Texts".
|
TFIDFKeywordExtractor |
Keyword extractor that scores words based on their TFIDF value.
|
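TF-IDF scoring multiplies a term's count in the document by the log of the inverse document frequency. The sketch below uses the plain tf * log(N / df) variant, which is an assumption since the Hermes extractor may weight terms differently; the class name and the convention of treating unseen terms as df = 1 are also hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not the Hermes API): tf * log(totalDocs / df) keyword scores.
final class TfIdfSketch {
    static Map<String, Double> score(List<String> doc, Map<String, Integer> documentFrequency, int totalDocs) {
        // Raw term frequency within this document.
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) tf.merge(term, 1, Integer::sum);
        Map<String, Double> scores = new HashMap<>();
        tf.forEach((term, count) -> {
            // Assumption: unseen (or zero-df) terms are treated as appearing in one document.
            int df = Math.max(1, documentFrequency.getOrDefault(term, 1));
            scores.put(term, count * Math.log((double) totalDocs / df));
        });
        return scores;
    }
}
```

A word that appears in every document (df = N) scores 0 no matter how often it occurs, which is what pushes function words out of the keyword list.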
Tokenizer |
Low-level tokenization of strings
|
Tokenizer.Token |
An internal token
|
Tokenizers |
|
TokenMatch |
A match from a TokenRegex pattern on an input HString.
|
TokenMatcher |
The TokenMatcher class allows for iterating over the matches, extracting the match or named groups within the match,
the starting and ending offsets of the match, and conversion into a TokenMatch object which records the current state
of the match.
|
TokenRegex |
Hermes provides a token-based regular expression engine that allows for matches on arbitrary annotation types,
relation types, and attributes, while providing many of the operators that are possible using standard Java regular
expressions.
|
TokenType |
Defines the type for a given token.
|
TraditionalToSimplified |
Preprocessor that converts traditional characters into simplified characters.
|
TrieLexicon |
Implementation of Lexicon using a Trie data structure.
|
TrieWordList |
Implementation of a WordList backed by a Trie
|
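A trie-backed word list supports both membership tests and prefix search, which is the combination PrefixSearchable describes. The following is a minimal sketch of the data structure, not the Hermes TrieWordList; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not the Hermes API): character trie with exact lookup and prefix search.
final class TrieWordListSketch {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>(); // sorted, so results come out in order
        boolean isWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    boolean contains(String word) {
        Node node = find(word);
        return node != null && node.isWord;
    }

    List<String> prefixSearch(String prefix) {
        List<String> out = new ArrayList<>();
        Node node = find(prefix);
        if (node != null) collect(node, new StringBuilder(prefix), out);
        return out;
    }

    // Follows the path for s; returns null if the path does not exist.
    private Node find(String s) {
        Node node = root;
        for (int i = 0; i < s.length() && node != null; i++) {
            node = node.children.get(s.charAt(i));
        }
        return node;
    }

    // Depth-first walk collecting every word below node.
    private static void collect(Node node, StringBuilder sb, List<String> out) {
        if (node.isWord) out.add(sb.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            sb.append(e.getKey());
            collect(e.getValue(), sb, out);
            sb.setLength(sb.length() - 1);
        }
    }
}
```

Lookup cost depends on the length of the query rather than the number of stored words, which is the usual reason to prefer a trie over a hash set when prefix queries are needed.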
TwitterSearchFormat |
Format Name: twitter_search
|
TwitterSearchFormat.Provider |
The type Provider.
|
TxtFormat |
Format Name: text
|
TxtFormat.Provider |
The type Provider.
|
Types |
Common Annotatable Types.
|
UnicodeNormalizer |
Converts Unicode text to canonical form and removes smart quotes.
|
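A minimal sketch of the two steps in this description, using java.text.Normalizer and plain character replacement. Interpreting "canonical form" as NFC (canonical composition) is an assumption, as is the exact set of smart quotes replaced; the class name UnicodeNormalizeSketch is hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper (not the Hermes API): canonical composition plus smart-quote replacement.
final class UnicodeNormalizeSketch {
    static String normalize(String text) {
        // Assumption: "canonical form" is taken to mean NFC.
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        // Replace typographic single and double quotes with their ASCII counterparts.
        return nfc.replace('\u2018', '\'')
                  .replace('\u2019', '\'')
                  .replace('\u201C', '"')
                  .replace('\u201D', '"');
    }
}
```

NFC composes a base letter followed by a combining accent into the single precomposed character where one exists, so visually identical strings compare equal afterwards.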
UniversalFeature |
|
UniversalFeatureSet |
|
UniversalFeatureValue |
|
UniversalSentenceEncoder |
|
UPOSProcessor |
Processes universal part-of-speech information
|
ValueCalculator |
The enum Value calculator.
|
ViterbiAnnotator |
An abstract base annotator that uses the Viterbi algorithm to find text items in a document.
|
WhitespaceNormalizer |
Handles normalizing whitespace.
|
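Whitespace normalization is commonly implemented as collapsing every run of whitespace into a single space and trimming the ends. The sketch below assumes that behaviour, which may differ from the Hermes WhitespaceNormalizer, and uses the hypothetical class name WhitespaceNormalizeSketch. The (?U) flag makes \s match Unicode whitespace such as the no-break space, not just ASCII.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not the Hermes API): collapse whitespace runs and trim.
final class WhitespaceNormalizeSketch {
    // (?U) turns on UNICODE_CHARACTER_CLASS, so \s covers all Unicode White_Space characters.
    private static final Pattern WHITESPACE = Pattern.compile("(?U)\\s+");

    static String normalize(String text) {
        // Every run becomes one ASCII space; trim() then removes a leading or trailing space.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```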
WholeFileTextFormat |
Defines a format in which files need to be completely read in order to generate documents.
|
WordList |
Word lists provide a set-like interface to a set of vocabulary items.
|
WordProcessor |
Processes words
|
Workflow |
A workflow represents a set of actions to perform on a document collection.
|