Interface MPairStream<T,​U>

  • Type Parameters:
    T - the type parameter
    U - the type parameter
    All Superinterfaces:
    AutoCloseable
    All Known Implementing Classes:
    AbstractLocalMPairStream, LocalDefaultMPairStream, SparkPairStream

    public interface MPairStream<T,​U>
    extends AutoCloseable

    A facade for stream classes which contain key-value pairs. Provides a common interface to working with an manipulating streams regardless of their backend implementation.

    Author:
    David B. Bracewell
    • Method Detail

      • cache

        MPairStream<T,​U> cache()
        Caches the stream.
        Returns:
        the cached stream
      • collectAsList

        List<Map.Entry<T,​U>> collectAsList()
        Collects the items in the stream as a list
        Returns:
        the list of items in the stream
      • collectAsMap

        Map<T,​U> collectAsMap()
        Collects the items in the stream as a map
        Returns:
        the map of items in the stream
      • count

        long count()
        The number of items in the stream
        Returns:
        the number of items in the stream
      • filter

        MPairStream<T,​U> filter​(SerializableBiPredicate<? super T,​? super U> predicate)
        Filters the stream.
        Parameters:
        predicate - the predicate to use to determine which objects are kept
        Returns:
        the new stream
      • filterByKey

        MPairStream<T,​U> filterByKey​(SerializablePredicate<T> predicate)
        Filters the stream by key.
        Parameters:
        predicate - the predicate to apply to keys in order to determine which objects are kept
        Returns:
        the new stream
      • filterByValue

        MPairStream<T,​U> filterByValue​(SerializablePredicate<U> predicate)
        Filters the stream by value.
        Parameters:
        predicate - the predicate to apply to values in order to determine which objects are kept
        Returns:
        the new stream
      • flatMapToPair

        <R,​V> MPairStream<R,​V> flatMapToPair​(SerializableBiFunction<? super T,​? super U,​Stream<Map.Entry<? extends R,​? extends V>>> function)
        Maps the key-value pairs to one or more new key-value pairs
        Type Parameters:
        R - the new key type parameter
        V - the new value type parameter
        Parameters:
        function - the function to map key-value pairs
        Returns:
        the new pair stream
      • forEach

        void forEach​(SerializableBiConsumer<? super T,​? super U> consumer)
        Performs an operation on each item in the stream
        Parameters:
        consumer - the consumer action to perform
      • forEachLocal

        void forEachLocal​(SerializableBiConsumer<? super T,​? super U> consumer)
        Performs an operation on each item in the stream ensuring that is done locally and not distributed.
        Parameters:
        consumer - the consumer action to perform
      • getContext

        StreamingContext getContext()
        Gets the context used to create the stream
        Returns:
        the context
      • groupByKey

        MPairStream<T,​Iterable<U>> groupByKey()
        Group bys the items in the stream by the key
        Returns:
        the new stream
      • isDistributed

        default boolean isDistributed()
      • isEmpty

        boolean isEmpty()
        Determines if the stream is empty or not
        Returns:
        True if empty, False otherwise
      • isReusable

        default boolean isReusable()
        Can this stream be consumed more the once?
        Returns:
        True the stream can be reused multiple times, False the stream can only be used once
      • join

        <V> MPairStream<T,​Map.Entry<U,​V>> join​(MPairStream<? extends T,​? extends V> stream)
        Performs an inner join between this stream and another using the key to match.
        Type Parameters:
        V - the value type parameter of the stream to join with
        Parameters:
        stream - the other stream to inner join with
        Returns:
        the new stream
      • keys

        MStream<T> keys()
        Returns a stream over the keys in this pair stream
        Returns:
        the key stream
      • leftOuterJoin

        <V> MPairStream<T,​Map.Entry<U,​V>> leftOuterJoin​(MPairStream<? extends T,​? extends V> stream)
        Performs a left outer join between this stream and another using the key to match.
        Type Parameters:
        V - the value type parameter of the stream to join with
        Parameters:
        stream - the other stream to left outer join with
        Returns:
        the new stream
      • map

        <R> MStream<R> map​(SerializableBiFunction<? super T,​? super U,​? extends R> function)
        Maps the key-value pairs to a new object
        Type Parameters:
        R - the type parameter being mapped to
        Parameters:
        function - the function to map from key-value pairs to objects of type R
        Returns:
        the new stream
      • mapToDouble

        MDoubleStream mapToDouble​(SerializableToDoubleBiFunction<? super T,​? super U> function)
        Maps the key-value pairs to doubles
        Parameters:
        function - the function to map from key-value pairs to double values
        Returns:
        the new double stream
      • mapToPair

        <R,​V> MPairStream<R,​V> mapToPair​(SerializableBiFunction<? super T,​? super U,​? extends Map.Entry<? extends R,​? extends V>> function)
        Maps the key-value pairs to new key-value pairs
        Type Parameters:
        R - the new key type parameter
        V - the new value type parameter
        Parameters:
        function - the function to map key-value pairs
        Returns:
        the new pair stream
      • max

        Optional<Map.Entry<T,​U>> max​(SerializableComparator<Map.Entry<T,​U>> comparator)
        Returns the max item in the stream using the given comparator to compare items.
        Parameters:
        comparator - the comparator to use to compare values in the stream
        Returns:
        the optional containing the max value
      • maxByKey

        default Optional<Map.Entry<T,​U>> maxByKey()
        Returns the maximum entry in the stream comparing on key
        Returns:
        the optional containing the entry with max key
      • maxByKey

        default Optional<Map.Entry<T,​U>> maxByKey​(SerializableComparator<? super T> comparator)
        Returns the maximum entry in the stream comparing on key
        Parameters:
        comparator - the comparator to use to compare keys in the stream
        Returns:
        the optional containing the entry with max key
      • maxByValue

        default Optional<Map.Entry<T,​U>> maxByValue()
        Returns the maximum entry in the stream comparing on value
        Returns:
        the optional containing the entry with max value
      • maxByValue

        default Optional<Map.Entry<T,​U>> maxByValue​(SerializableComparator<? super U> comparator)
        Returns the maximum entry in the stream comparing on value
        Parameters:
        comparator - the comparator to use to compare value in the stream
        Returns:
        the optional containing the entry with max value
      • minByKey

        default Optional<Map.Entry<T,​U>> minByKey()
        Returns the minimum entry in the stream comparing on key
        Returns:
        the optional containing the entry with min key
      • minByKey

        default Optional<Map.Entry<T,​U>> minByKey​(SerializableComparator<? super T> comparator)
        Returns the minimum entry in the stream comparing on key
        Parameters:
        comparator - the comparator to use to compare keys in the stream
        Returns:
        the optional containing the entry with min key
      • minByValue

        default Optional<Map.Entry<T,​U>> minByValue()
        Returns the minimum entry in the stream comparing on value
        Returns:
        the optional containing the entry with min value
      • minByValue

        default Optional<Map.Entry<T,​U>> minByValue​(SerializableComparator<? super U> comparator)
        Returns the minimum entry in the stream comparing on value
        Parameters:
        comparator - the comparator to use to compare value in the stream
        Returns:
        the optional containing the entry with min value
      • onClose

        MPairStream<T,​U> onClose​(SerializableRunnable closeHandler)
        Sets the handler to call when the stream is closed. Typically, this is to clean up any open resources, such as file handles.
        Parameters:
        closeHandler - the handler to run when the stream is closed.
      • parallel

        MPairStream<T,​U> parallel()
        Ensures that the stream is parallel or distributed.
        Returns:
        the new stream
      • reduceByKey

        MPairStream<T,​U> reduceByKey​(SerializableBinaryOperator<U> operator)
        Performs a reduction by key on the elements of this stream using the given binary operator.
        Parameters:
        operator - the binary operator used to combine two objects
        Returns:
        the new stream containing keys and reduced values
      • repartition

        MPairStream<T,​U> repartition​(int numPartitions)
        Repartitions the stream to the given number of partitions. This may be a no-op for some streams, i.e. Local Streams.
        Parameters:
        numPartitions - the number of partitions the stream should have
        Returns:
        the new stream
      • rightOuterJoin

        <V> MPairStream<T,​Map.Entry<U,​V>> rightOuterJoin​(MPairStream<? extends T,​? extends V> stream)
        Performs a right outer join between this stream and another using the key to match.
        Type Parameters:
        V - the value type parameter of the stream to join with
        Parameters:
        stream - the other stream to right outer join with
        Returns:
        the new stream
      • sample

        MPairStream<T,​U> sample​(boolean withReplacement,
                                      long number)
        Randomly samples number items from the stream.
        Parameters:
        withReplacement - true allow a single item to be represented in the sample multiple times, false allow a single item to only be picked once.
        number - the number of items desired in the sample
        Returns:
        the new stream
      • shuffle

        default MPairStream<T,​U> shuffle()
        Shuffles the items in the stream.
        Returns:
        the new stream
      • shuffle

        MPairStream<T,​U> shuffle​(Random random)
        Shuffles the items in the string using the given Random object.
        Parameters:
        random - the random number generator
        Returns:
        the new stream
      • sortByKey

        default MPairStream<T,​U> sortByKey​(boolean ascending)
        Sorts the items in the stream by key in ascending or descending order. Requires items to implement the Comparable interface.
        Parameters:
        ascending - determines if the items should be sorted in ascending (true) or descending (false) order
        Returns:
        the new stream
      • sortByKey

        MPairStream<T,​U> sortByKey​(SerializableComparator<T> comparator)
        Sorts the items in the stream by key using the given comparator.
        Parameters:
        comparator - The comparator to use to comapre keys
        Returns:
        the new stream
      • union

        MPairStream<T,​U> union​(MPairStream<? extends T,​? extends U> other)
        Unions this stream with another.
        Parameters:
        other - the other stream to add to this one.
        Returns:
        the new stream
      • updateConfig

        default void updateConfig()
        Updates the config instance used for this String
      • values

        MStream<U> values()
        Returns a stream of values
        Returns:
        the new stream of values