Package com.gengoai.hermes
Class DocumentFactory
- java.lang.Object
-
- com.gengoai.hermes.DocumentFactory
-
- All Implemented Interfaces:
Serializable
public final class DocumentFactory extends Object implements Serializable
A document factory facilitates the creation of document objects, performing any predefined preprocessing (e.g. whitespace normalization) on the document content. A default factory can be obtained by calling getInstance(), or a factory can be built using a DocumentFactory.DocumentFactoryBuilder constructed using builder().
The default factory uses configuration settings to determine the default language and preprocessing normalizers. The default language is defined using the com.gengoai.hermes.DefaultLanguage configuration property, and the normalizers are defined using the com.gengoai.hermes.preprocessing.normalizers configuration property.
- Author:
- David B. Bracewell
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
static class DocumentFactory.DocumentFactoryBuilder
    Builder for DocumentFactory instances.
-
Method Summary
static DocumentFactory.DocumentFactoryBuilder builder()
Document create(@NonNull String content)
    Creates a document with the given content, assigning the new document an auto-generated id and setting its language to the default language.
Document create(@NonNull String content, @NonNull Language language)
    Creates a document with the given content written in the given language, assigning the new document an auto-generated id.
Document create(@NonNull String content, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributeMap)
    Creates a document with the given content written in the given language having the given set of attributes.
Document create(@NonNull String id, @NonNull String content)
    Creates a document with the given id and content, setting its language to the default language.
Document create(@NonNull String id, @NonNull String content, @NonNull Language language)
    Creates a document with the given id and content written in the given language.
Document create(@NonNull String id, @NonNull String content, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributeMap)
    Creates a document with the given id and content written in the given language having the given set of attributes.
Document createRaw(@NonNull String content)
    Creates a document with the given content written in the default language.
Document createRaw(@NonNull String content, @NonNull Language language)
    Creates a document with the given content written in the given language.
Document createRaw(@NonNull String content, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributeMap)
    Creates a document with the given content written in the given language having the given set of attributes.
Document createRaw(@NonNull String id, @NonNull String content)
    Creates a document with the given id and content written in the default language.
Document createRaw(@NonNull String id, @NonNull String content, @NonNull Language language)
    Creates a document with the given id and content written in the given language.
Document createRaw(@NonNull String id, @NonNull String content, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributeMap)
    Creates a document with the given id and content written in the given language having the given set of attributes.
Document fromTokens(@NonNull Language language, @NonNull String... tokens)
    Creates a document from the given tokens.
Document fromTokens(@NonNull Iterable<String> tokens)
    Creates a document from the given tokens using the default language.
Document fromTokens(@NonNull Iterable<String> tokens, @NonNull Language language)
    Creates a document from the given tokens.
Document fromTokens(@NonNull String... tokens)
    Creates a document from the given tokens using the default language.
static DocumentFactory getInstance()
-
-
-
Method Detail
-
builder
public static DocumentFactory.DocumentFactoryBuilder builder()
- Returns:
- a document factory builder
-
getInstance
public static DocumentFactory getInstance()
- Returns:
- A document factory whose preprocessors and default language are set via configuration options
-
create
public Document create(@NonNull String content)
Creates a document with the given content, assigning the new document an auto-generated id and setting its language to the default language.
- Parameters:
- content - the document content
- Returns:
- the document
-
create
public Document create(@NonNull String id, @NonNull String content)
Creates a document with the given id and content, setting its language to the default language.
- Parameters:
- id - the id
- content - the content
- Returns:
- the document
-
create
public Document create(@NonNull String content, @NonNull Language language)
Creates a document with the given content written in the given language, assigning the new document an auto-generated id.
- Parameters:
- content - the content
- language - the language
- Returns:
- the document
-
create
public Document create(@NonNull String id, @NonNull String content, @NonNull Language language)
Creates a document with the given id and content written in the given language.
- Parameters:
- id - the id
- content - the content
- language - the language
- Returns:
- the document
-
create
public Document create(@NonNull String content, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributeMap)
Creates a document with the given content written in the given language having the given set of attributes.
- Parameters:
- content - the content
- language - the language
- attributeMap - the attribute map
- Returns:
- the document
-
create
public Document create(@NonNull String id, @NonNull String content, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributeMap)
Creates a document with the given id and content written in the given language having the given set of attributes.
- Parameters:
- id - the id
- content - the content
- language - the language
- attributeMap - the attribute map
- Returns:
- the document
-
createRaw
public Document createRaw(@NonNull String id, @NonNull String content, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributeMap)
Creates a document with the given id and content written in the given language having the given set of attributes. This method does not apply any TextNormalizer.
- Parameters:
- id - the id
- content - the content
- language - the language
- attributeMap - the attribute map
- Returns:
- the document
-
createRaw
public Document createRaw(@NonNull String content)
Creates a document with the given content written in the default language. This method does not apply any TextNormalizer.
- Parameters:
- content - the content
- Returns:
- the document
-
createRaw
public Document createRaw(@NonNull String id, @NonNull String content)
Creates a document with the given id and content written in the default language. This method does not apply any TextNormalizer.
- Parameters:
- id - the id
- content - the content
- Returns:
- the document
-
createRaw
public Document createRaw(@NonNull String content, @NonNull Language language)
Creates a document with the given content written in the given language. This method does not apply any TextNormalizer.
- Parameters:
- content - the content
- language - the language
- Returns:
- the document
-
createRaw
public Document createRaw(@NonNull String id, @NonNull String content, @NonNull Language language)
Creates a document with the given id and content written in the given language. This method does not apply any TextNormalizer.
- Parameters:
- id - the id
- content - the content
- language - the language
- Returns:
- the document
-
createRaw
public Document createRaw(@NonNull String content, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributeMap)
Creates a document with the given content written in the given language having the given set of attributes. This method does not apply any TextNormalizer.
- Parameters:
- content - the content
- language - the language
- attributeMap - the attribute map
- Returns:
- the document
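The difference between the create and createRaw families can be pictured with a small stand-alone sketch. It does not use Hermes: the class RawVsNormalizedSketch, its two methods, and the two normalizers (trim, then collapse whitespace runs) are hypothetical stand-ins chosen for illustration, and the normalizers a real factory applies depend on its configuration.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical illustration only; this is not the Hermes implementation. */
final class RawVsNormalizedSketch {
    // Assumed normalizers: trim the ends, then collapse whitespace runs to one space.
    static final List<UnaryOperator<String>> NORMALIZERS =
        List.<UnaryOperator<String>>of(
            s -> s.trim(),
            s -> s.replaceAll("\\s+", " "));

    // Analogue of create(...): every normalizer is applied in order.
    static String normalized(String content) {
        String result = content;
        for (UnaryOperator<String> normalizer : NORMALIZERS) {
            result = normalizer.apply(result);
        }
        return result;
    }

    // Analogue of createRaw(...): the content is passed through unchanged.
    static String raw(String content) {
        return content;
    }

    public static void main(String[] args) {
        String input = "  Hello \t  world\n";
        System.out.println("[" + normalized(input) + "]"); // prints "[Hello world]"
        System.out.println("[" + raw(input) + "]");        // original spacing kept
    }
}
```

In this sketch the raw path is the one to reach for when character offsets into the original text must be preserved, which is presumably the motivation for the createRaw variants.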
-
fromTokens
public Document fromTokens(@NonNull Language language, @NonNull String... tokens)
Creates a document from the given tokens. The language parameter controls how the content of the document is created. If the language uses whitespace, tokens are joined with a single space between them; otherwise no space is inserted between tokens.
- Parameters:
- language - the language of the document
- tokens - the tokens making up the document
- Returns:
- the document with tokens provided.
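The joining rule described for fromTokens can be sketched without Hermes. The class TokenJoinSketch and the boolean parameter usesWhitespace are hypothetical: the flag stands in for whatever check the library performs on the Language, which this page does not show.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the Hermes API. */
final class TokenJoinSketch {
    // usesWhitespace is an assumed flag standing in for the language check.
    static String join(Iterable<String> tokens, boolean usesWhitespace) {
        // A single space between tokens for whitespace languages, nothing otherwise.
        return String.join(usesWhitespace ? " " : "", tokens);
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("The", "cat", "sat"), true));  // prints "The cat sat"
        System.out.println(join(List.of("to", "ken", "s"), false));    // prints "tokens"
    }
}
```

Under this reading, the text of a document built from tokens in a whitespace language is the tokens separated by single spaces, regardless of how the source text was originally spaced.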
-
fromTokens
public Document fromTokens(@NonNull String... tokens)
Creates a document from the given tokens using the default language.
- Parameters:
- tokens - the tokens
- Returns:
- the document
-
fromTokens
public Document fromTokens(@NonNull Iterable<String> tokens)
Creates a document from the given tokens using the default language.
- Parameters:
- tokens - the tokens
- Returns:
- the document
-
fromTokens
public Document fromTokens(@NonNull Iterable<String> tokens, @NonNull Language language)
Creates a document from the given tokens. The language parameter controls how the content of the document is created. If the language uses whitespace, tokens are joined with a single space between them; otherwise no space is inserted between tokens.
- Parameters:
- tokens - the tokens
- language - the language
- Returns:
- the document
-
-