hermes
Scalable Natural Language Processing in Java

Get Started Quickly

Hermes is easy to learn and provides out-of-the-box models for part-of-speech tagging, shallow parsing, named-entity recognition, and dependency parsing.

Easy to Customize

Using token-based regular expressions, the Lyre Expression language, and Caduceus, you can easily create custom extraction models.

Scalable

Hermes scales its annotation processing by using Apache Spark-backed Document Collections.

Easily Annotate and Extract Top Terms
//We can construct a document collection by specifying a format (here, text_opl), where the documents are located, and optionally any arguments for the format.
DocumentCollection documents = DocumentCollection.create("text_opl::classpath:com/gengoai/hermes/example_docs.txt");

//We can then add token, sentence, and lemma annotations by annotating the collection.
documents = documents.annotate(Types.TOKEN, Types.SENTENCE, Types.LEMMA);

//We will define a term extractor to extract lemmatized tokens.
Extractor termExtractor = TermExtractor.builder().toString(LyreDSL.lemma).build();

//We can extract the term counts from the document collection using the defined extractor.
Counter<String> termFrequencies = documents.termCount(termExtractor);

//Let's print out the top 10 terms.
System.out.println("Top 10 by Term Frequency");
termFrequencies.topN(10).itemsByCount(false).forEach(term -> System.out.println(term + ": " + termFrequencies.get(term)));
System.out.println();
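
To make the counting and ranking steps above concrete, here is a minimal sketch of the same idea using only the JDK. It is not how Hermes implements `termCount` or `topN`: the class and method names (`TopTermsSketch`, `termCounts`, `topN`) are hypothetical, and lowercasing plus whitespace splitting stands in, as a simplifying assumption, for Hermes tokenization and lemmatization.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of the Hermes API.
public class TopTermsSketch {

    // Counts lowercased, whitespace-separated tokens across all documents.
    // Assumption: this crude normalization replaces real tokenization and lemmatization.
    public static Map<String, Long> termCounts(List<String> documents) {
        return documents.stream()
                .flatMap(doc -> Arrays.stream(doc.toLowerCase(Locale.ROOT).split("\\s+")))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Returns the n most frequent terms, highest count first.
    // Ties are broken alphabetically so the result is deterministic.
    public static Map<String, Long> topN(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> documents = List.of(
                "the cat sat on the mat",
                "the dog chased the cat");
        Map<String, Long> counts = termCounts(documents);
        topN(counts, 3).forEach((term, count) -> System.out.println(term + ": " + count));
    }
}
```

In the Hermes example the extractor decides what counts as a term (lemmatized tokens), while the counter only aggregates and ranks. The sketch keeps the same split: `termCounts` plays the role of extraction plus counting, and `topN` the role of ranking.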

Releases

Maven Central

<dependency>
        <groupId>com.gengoai</groupId>
        <artifactId>hermes</artifactId>
        <version>1.1</version>
</dependency>

GengoAI Installer

A self-contained JAR file that installs the Hermes libraries and models.

Documentation