Class RakeKeywordExtractor

  • All Implemented Interfaces:
    Extractor, KeywordExtractor, Serializable

    public class RakeKeywordExtractor
    extends Object
    implements KeywordExtractor
    Implementation of the RAKE (Rapid Automatic Keyword Extraction) algorithm as presented in:
     Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents.
     In M. W. Berry & J. Kogan (Eds.), Text Mining: Applications and Theory. John Wiley & Sons.
     
    Author:
    David B. Bracewell
    See Also:
    Serialized Form
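
The scoring idea behind the RAKE algorithm cited above can be illustrated with a small, library-independent sketch. It is not this class's implementation and uses none of its API: the class name RakeSketch, the tiny stop-word list, and the punctuation pattern are all assumptions made for illustration. Candidate phrases are the runs of words between stop words and punctuation, each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical, self-contained sketch of RAKE scoring; not part of the documented library.
class RakeSketch {
    // Tiny illustrative stop list (an assumption); a practical list is far larger.
    static final Set<String> STOP_WORDS =
        Set.of("a", "an", "and", "the", "of", "for", "in", "is", "are", "to", "over");

    /** Splits text into candidate phrases at punctuation and stop words. */
    static List<List<String>> candidatePhrases(String text) {
        List<List<String>> phrases = new ArrayList<>();
        // Punctuation always ends a candidate phrase.
        for (String fragment : text.toLowerCase(Locale.ROOT).split("[.,;:!?()\\[\\]\"]+")) {
            List<String> current = new ArrayList<>();
            for (String word : fragment.trim().split("\\s+")) {
                if (word.isEmpty() || STOP_WORDS.contains(word)) {
                    // A stop word closes the phrase collected so far.
                    if (!current.isEmpty()) {
                        phrases.add(current);
                        current = new ArrayList<>();
                    }
                } else {
                    current.add(word);
                }
            }
            if (!current.isEmpty()) {
                phrases.add(current);
            }
        }
        return phrases;
    }

    /** Scores each candidate phrase as the sum of degree(word) / frequency(word). */
    static Map<String, Double> score(String text) {
        List<List<String>> phrases = candidatePhrases(text);
        Map<String, Integer> freq = new HashMap<>();
        Map<String, Integer> degree = new HashMap<>();
        for (List<String> phrase : phrases) {
            for (String word : phrase) {
                freq.merge(word, 1, Integer::sum);
                // Degree counts co-occurrences within the phrase, including the word itself.
                degree.merge(word, phrase.size(), Integer::sum);
            }
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> phrase : phrases) {
            double total = 0;
            for (String word : phrase) {
                total += (double) degree.get(word) / freq.get(word);
            }
            scores.put(String.join(" ", phrase), total);
        }
        return scores;
    }

    public static void main(String[] args) {
        score("Keyword extraction is fun")
            .forEach((phrase, s) -> System.out.println(phrase + " -> " + s));
    }
}
```

Lowercasing before counting, as this sketch does, means that differently cased forms of a word share one frequency and degree entry.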
    • Constructor Detail

      • RakeKeywordExtractor

        public RakeKeywordExtractor()
        Instantiates a new RAKE keyword extractor using a default FeaturizingExtractor that lowercases words.
      • RakeKeywordExtractor

        public RakeKeywordExtractor(@NonNull LyreExpression toStringExpression)
        Instantiates a new RAKE keyword extractor.
        Parameters:
        toStringExpression - the specification for how to convert tokens/phrases to strings (all other options are ignored).
    • Method Detail

      • extract

        public Extraction extract(@NonNull HString source)
        Description copied from interface: Extractor
        Generate an Extraction from the given HString.
        Specified by:
        extract in interface Extractor
        Parameters:
        source - the source text from which we will generate an Extraction
        Returns:
        the Extraction
      • fit

        public void fit(DocumentCollection corpus)
        Description copied from interface: KeywordExtractor
        In certain cases, a keyword extractor needs to collect corpus-level statistics or build a model of what a good keyword looks like. The fit method lets implementations perform this work at the corpus level.
        Specified by:
        fit in interface KeywordExtractor
        Parameters:
        corpus - the corpus to fit the extractor to