Maven Central
<dependency>
<groupId>com.gengoai</groupId>
<artifactId>hermes</artifactId>
<version>1.1</version>
</dependency>
Hermes is easy to learn and ships with models for part-of-speech tagging, shallow parsing, named entity recognition, and dependency parsing out of the box.
Using token-based regular expressions, the Lyre expression language, and Caduceus, you can easily create custom extraction models.
Hermes can scale its annotation processing by using Apache Spark-backed document collections.
//We can construct a document collection by specifying a format (here text_opl), where the documents are located, and optionally any arguments for the format.
DocumentCollection documents = DocumentCollection.create("text_opl::classpath:com/gengoai/hermes/example_docs.txt");
//We can then add token, sentence, and lemma annotations by annotating the collection.
documents = documents.annotate(Types.TOKEN, Types.SENTENCE, Types.LEMMA);
//We will define a term extractor to extract lemmatized tokens.
Extractor termExtractor = TermExtractor.builder().toString(LyreDSL.lemma).build();
//We can extract the term counts from the document collection using the defined extractor.
Counter<String> termFrequencies = documents.termCount(termExtractor);
//Let's print out the top 10 terms.
System.out.println("Top 10 by Term Frequency");
termFrequencies.topN(10).itemsByCount(false).forEach(term -> System.out.println(term + ": " + termFrequencies.get(term)));
System.out.println();
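To make the term-count step concrete without depending on Hermes, here is a minimal plain-Java sketch of what counting terms and listing the most frequent ones involves. Everything in it is an assumption for illustration: the class `TermCountSketch` and its methods `termCounts` and `topN` are hypothetical names, not part of the Hermes API, and lowercasing is used as a crude stand-in for real lemmatization.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the term-count step; none of these names come from Hermes.
final class TermCountSketch {

    // Count tokens across all documents. Tokens are runs of letters or digits,
    // lowercased as a rough substitute for lemmatization (an assumption).
    static Map<String, Long> termCounts(List<String> documents) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String doc : documents) {
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    // Return the n most frequent terms, highest count first; ties are broken alphabetically.
    static List<String> topN(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> docs = List.of("The cat sat on the mat.", "The dog sat.");
        Map<String, Long> counts = termCounts(docs);
        for (String term : topN(counts, 3)) {
            System.out.println(term + ": " + counts.get(term));
        }
    }
}
```

Running `main` on the two sample sentences prints `the: 3`, `sat: 2`, and `cat: 1`. In Hermes itself the tokenization and lemmatization come from the annotators applied to the collection, so this sketch only mirrors the counting and ranking logic.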
A self-contained JAR file is available to install the Hermes libraries and models.