Package com.gengoai.hermes.lexicon
Class DiskLexicon
- java.lang.Object
  - com.gengoai.hermes.lexicon.Lexicon
    - com.gengoai.hermes.lexicon.PersistentLexicon
      - com.gengoai.hermes.lexicon.DiskLexicon
- All Implemented Interfaces:
Extractor, PrefixSearchable, WordList, Serializable, Iterable<String>, Predicate<HString>
public class DiskLexicon extends PersistentLexicon implements PrefixSearchable
A PersistentLexicon that stores LexiconEntry objects on disk, facilitating the use of very large lexicons with little memory overhead.
- See Also: Serialized Form
-
Constructor Summary
| Constructor | Description |
| --- | --- |
| DiskLexicon(@NonNull KeyValueStoreConnection connection, boolean isCaseSensitive) | Instantiates a new DiskLexicon |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | add(@NonNull LexiconEntry lexiconEntry) | Adds an entry to the lexicon |
| void | addAll(@NonNull Iterable<LexiconEntry> lexiconEntries) | Adds all lexicon entries in the given iterable to the lexicon |
| boolean | contains(String string) | Checks whether the String is contained in the WordList |
| Set<LexiconEntry> | entries() | |
| Set<LexiconEntry> | get(String word) | Returns the LexiconEntry objects associated with a given word in the Lexicon, or an empty set if there are none. |
| int | getMaxLemmaLength() | |
| int | getMaxTokenLength() | |
| String | getName() | |
| boolean | isCaseSensitive() | Is the Lexicon case sensitive or not |
| boolean | isPrefixMatch(@NonNull HString hString) | Check if a prefix matches the given HString |
| boolean | isPrefixMatch(String hString) | Check if a prefix matches the given String |
| boolean | isProbabilistic() | Is the Lexicon probabilistic or not |
| Iterator<String> | iterator() | |
| List<LexiconEntry> | match(@NonNull HString string) | Gets the matched entries for a given HString |
| List<LexiconEntry> | match(String hString) | Returns the LexiconEntry objects associated with a given word in the Lexicon, or an empty list if there are none. |
| Set<String> | prefixes(String string) | Gets the prefixes that match the given string |
| int | size() | The number of lexical items in the lexicon |
-
Methods inherited from class com.gengoai.hermes.lexicon.Lexicon
extract, getProbability, getProbability, getProbability, getProbability, getTag, getTag, normalize, test
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
Constructor Detail
-
DiskLexicon
public DiskLexicon(@NonNull KeyValueStoreConnection connection, boolean isCaseSensitive)
Instantiates a new DiskLexicon
- Parameters:
connection - the KeyValueStoreConnection describing how to connect to the underlying storage.
isCaseSensitive - true if the lexicon is case-sensitive, false if it is case-insensitive (Note: if the lexicon already exists, this value is ignored.)
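The note on isCaseSensitive implies that the setting is persisted with the lexicon and that the stored value wins when an existing lexicon is reopened. The following is a minimal sketch of that behavior, not the Hermes implementation: the class CaseFlagSketch, the metadata key, and the use of a plain Map in place of the on-disk key-value store are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a persistent lexicon's metadata handling.
// A plain Map plays the role of the on-disk key-value store.
public class CaseFlagSketch {
    // Hypothetical metadata key; the real storage layout is not documented here.
    private static final String CASE_KEY = "__case_sensitive__";
    private final boolean caseSensitive;

    public CaseFlagSketch(Map<String, String> store, boolean requestedCaseSensitive) {
        // putIfAbsent returns the previously stored value, or null if none existed.
        String stored = store.putIfAbsent(CASE_KEY, Boolean.toString(requestedCaseSensitive));
        // An existing lexicon keeps its stored flag; the argument is ignored.
        this.caseSensitive = (stored == null) ? requestedCaseSensitive : Boolean.parseBoolean(stored);
    }

    public boolean isCaseSensitive() {
        return caseSensitive;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        CaseFlagSketch first = new CaseFlagSketch(store, true);   // creates the lexicon
        CaseFlagSketch second = new CaseFlagSketch(store, false); // reopens it
        System.out.println(first.isCaseSensitive());
        System.out.println(second.isCaseSensitive());
    }
}
```

In this sketch both instances report a case-sensitive lexicon, because the second constructor call finds the flag written by the first.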
-
-
Method Detail
-
add
public void add(@NonNull LexiconEntry lexiconEntry)
Description copied from class: Lexicon
Adds an entry to the lexicon
-
addAll
public void addAll(@NonNull Iterable<LexiconEntry> lexiconEntries)
Description copied from class: Lexicon
Adds all lexicon entries in the given iterable to the lexicon
-
contains
public boolean contains(String string)
Description copied from interface: WordList
Checks whether the String is contained in the WordList
-
entries
public Set<LexiconEntry> entries()
-
get
public Set<LexiconEntry> get(String word)
Description copied from class: Lexicon
Returns the LexiconEntry objects associated with a given word in the Lexicon, or an empty set if there are none.
- Specified by:
get in class Lexicon
- Parameters:
word - the word in the lexicon whose entries we want
- Returns:
the LexiconEntry objects associated with the given word in the Lexicon, or an empty set if there are none.
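The contract of get (a set of entries for a word, or an empty set rather than null) can be illustrated with a small in-memory sketch. This is not the Hermes implementation: LookupSketch is a hypothetical class, plain String tags stand in for LexiconEntry, and lower-casing as the case-insensitive normalization is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in; String tags play the role of LexiconEntry.
public class LookupSketch {
    private final Map<String, Set<String>> entries = new HashMap<>();
    private final boolean caseSensitive;

    public LookupSketch(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // Assumption: a case-insensitive lexicon lower-cases keys on insert and on lookup.
    private String normalize(String word) {
        return caseSensitive ? word : word.toLowerCase(Locale.ROOT);
    }

    public void add(String word, String tag) {
        entries.computeIfAbsent(normalize(word), k -> new LinkedHashSet<>()).add(tag);
    }

    // Returns the entries for the word, or an empty set (never null) if there are none.
    public Set<String> get(String word) {
        return entries.getOrDefault(normalize(word), Collections.emptySet());
    }
}
```

Returning an empty set lets callers iterate over the result without a null check, which matches the behavior the description above promises.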
-
getMaxLemmaLength
public int getMaxLemmaLength()
- Specified by:
getMaxLemmaLength in class Lexicon
- Returns:
the max lemma length
-
getMaxTokenLength
public int getMaxTokenLength()
- Specified by:
getMaxTokenLength in class Lexicon
- Returns:
the max token length
-
getName
public String getName()
-
isCaseSensitive
public boolean isCaseSensitive()
Description copied from class: Lexicon
Is the Lexicon case sensitive or not
- Specified by:
isCaseSensitive in class Lexicon
- Returns:
True if the lexicon is case sensitive, False if not
-
isPrefixMatch
public boolean isPrefixMatch(@NonNull HString hString)
Description copied from interface: PrefixSearchable
Check if a prefix matches the given HString
- Specified by:
isPrefixMatch in interface PrefixSearchable
- Parameters:
hString - the HString to check for a prefix match
- Returns:
True if a prefix matches, False otherwise
-
isPrefixMatch
public boolean isPrefixMatch(String hString)
Description copied from interface: PrefixSearchable
Check if a prefix matches the given String
- Specified by:
isPrefixMatch in interface PrefixSearchable
- Parameters:
hString - the String to check for a prefix match
- Returns:
True if a prefix matches, False otherwise
-
isProbabilistic
public boolean isProbabilistic()
Description copied from class: Lexicon
Is the Lexicon probabilistic or not
- Specified by:
isProbabilistic in class Lexicon
- Returns:
True if the lexicon is probabilistic, False if not
-
match
public List<LexiconEntry> match(@NonNull HString string)
Description copied from class: Lexicon
Gets the matched entries for a given HString
-
match
public List<LexiconEntry> match(String hString)
Description copied from class: Lexicon
Returns the LexiconEntry objects associated with a given word in the Lexicon, or an empty list if there are none.
- Specified by:
match in class Lexicon
- Parameters:
hString - the word in the lexicon whose entries we want
- Returns:
the LexiconEntry objects associated with the given word in the Lexicon, or an empty list if there are none.
-
prefixes
public Set<String> prefixes(String string)
Description copied from interface: PrefixSearchable
Gets the prefixes that match the given string
- Specified by:
prefixes in interface PrefixSearchable
- Parameters:
string - the string
- Returns:
the set of matching prefixes
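The descriptions of isPrefixMatch and prefixes do not spell out the exact matching rule. One plausible reading, assumed here, is that a string matches when at least one stored key starts with it, and that prefixes returns those keys. The sketch below implements that reading over a sorted set. PrefixSketch is a hypothetical class and does not reflect how DiskLexicon queries its key-value store.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sorted-set sketch of prefix search; not the Hermes implementation.
public class PrefixSketch {
    private final TreeSet<String> keys = new TreeSet<>();

    public void add(String key) {
        keys.add(key);
    }

    // Assumed semantics: true if at least one stored key starts with s.
    // In sorted order, the smallest key >= s starts with s whenever any key does.
    public boolean isPrefixMatch(String s) {
        String candidate = keys.ceiling(s);
        return candidate != null && candidate.startsWith(s);
    }

    // Assumed semantics: all stored keys that start with s.
    // Keys sharing a prefix are contiguous in sorted order, so stop at the first miss.
    public Set<String> prefixes(String s) {
        Set<String> out = new TreeSet<>();
        for (String key : keys.tailSet(s, true)) {
            if (!key.startsWith(s)) {
                break;
            }
            out.add(key);
        }
        return out;
    }
}
```

A sorted structure keeps both operations proportional to the number of matching keys instead of the size of the whole lexicon, which is the property that matters for a large disk-backed store.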