Easily Annotate and Extract Top Terms

//We can construct a document collection by specifying a format (text), the location of the documents, and optionally any arguments for the format.
DocumentCollection documents = DocumentCollection.create("text_opl::classpath:com/gengoai/hermes/example_docs.txt");

//We can then add token, sentence, and lemma annotations by annotating the collection.
documents = documents.annotate(Types.TOKEN, Types.SENTENCE, Types.LEMMA);

//We will define a term extractor to extract lemmatized tokens.
Extractor termExtractor = TermExtractor.builder().toString(LyreDSL.lemma).build();

//We can extract the term counts from the document collection using the defined extractor.
Counter<String> termFrequencies = documents.termCount(termExtractor);

//Let's print out the top 10 terms.
System.out.println("Top 10 by Term Frequency");
termFrequencies.topN(10).itemsByCount(false).forEach(term -> System.out.println(term + ": " + termFrequencies.get(term)));
System.out.println();
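
To make the counting and ranking step concrete, here is a minimal, library-free sketch of what a term count followed by a top-N selection amounts to. It does not use Hermes and is not how Hermes implements these calls: the class name `TopTermsSketch` and the methods `termCounts` and `topN` are hypothetical, lowercased word splitting stands in for real tokenization and lemmatization, and breaking ties alphabetically is an assumption made only to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Hermes API.
class TopTermsSketch {

    // Count how often each lowercased word occurs across all documents.
    // Splitting on non-word characters is a crude stand-in for tokenization + lemmatization.
    static Map<String, Integer> termCounts(List<String> documents) {
        Map<String, Integer> counts = new HashMap<>();
        for (String document : documents) {
            for (String token : document.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Return up to n terms ordered by descending count.
    // Ties are broken alphabetically (an assumption, for deterministic output).
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("The cat sat.", "The cat ran.", "A dog ran.");
        Map<String, Integer> counts = termCounts(documents);
        for (String term : topN(counts, 3)) {
            System.out.println(term + ": " + counts.get(term));
        }
    }
}
```

In the Hermes example above, the annotators and the term extractor take the place of the naive word splitting used here, and the `Counter` takes the place of the plain map and the manual sort.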

Releases

Maven Central

<dependency>
        <groupId>com.gengoai</groupId>
        <artifactId>hermes</artifactId>
        <version>1.1</version>
</dependency>

GengoAI Installer

Self-contained jar file to install the Hermes libs and models.

Download

Readme

Documentation

HTML

PDF

JavaDoc
