Class SparkPairStream<T,​U>

    • Constructor Detail

      • SparkPairStream

        public SparkPairStream​(org.apache.spark.api.java.JavaPairRDD<T,​U> rdd)
        Instantiates a new Spark pair stream.
        Parameters:
        rdd - the rdd
      • SparkPairStream

        public SparkPairStream​(Map<? extends T,​? extends U> map)
        Instantiates a new Spark pair stream.
        Parameters:
        map - the map
    • Method Detail

      • collectAsMap

        public Map<T,​U> collectAsMap()
        Description copied from interface: MPairStream
        Collects the items in the stream as a map
        Specified by:
        collectAsMap in interface MPairStream<T,​U>
        Returns:
        the map of items in the stream
      • count

        public long count()
        Description copied from interface: MPairStream
        The number of items in the stream
        Specified by:
        count in interface MPairStream<T,​U>
        Returns:
        the number of items in the stream
      • filterByKey

        public MPairStream<T,​U> filterByKey​(SerializablePredicate<T> predicate)
        Description copied from interface: MPairStream
        Filters the stream by key.
        Specified by:
        filterByKey in interface MPairStream<T,​U>
        Parameters:
        predicate - the predicate to apply to keys in order to determine which objects are kept
        Returns:
        the new stream
      • filterByValue

        public MPairStream<T,​U> filterByValue​(SerializablePredicate<U> predicate)
        Description copied from interface: MPairStream
        Filters the stream by value.
        Specified by:
        filterByValue in interface MPairStream<T,​U>
        Parameters:
        predicate - the predicate to apply to values in order to determine which objects are kept
        Returns:
        the new stream
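        As a rough illustration of what filterByKey and filterByValue keep, the sketch below applies the same idea to a plain java.util.Map. It does not use SparkPairStream or Spark; the class FilterSketch and its methods are hypothetical names, and java.util.function.Predicate stands in for SerializablePredicate.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical plain-Java analogue of the two filters; not part of the library.
class FilterSketch {

    // Keeps only the entries whose key satisfies the predicate.
    static <K, V> Map<K, V> filterByKey(Map<K, V> pairs, Predicate<? super K> predicate) {
        Map<K, V> kept = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : pairs.entrySet()) {
            if (predicate.test(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Keeps only the entries whose value satisfies the predicate.
    static <K, V> Map<K, V> filterByValue(Map<K, V> pairs, Predicate<? super V> predicate) {
        Map<K, V> kept = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : pairs.entrySet()) {
            if (predicate.test(e.getValue())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("spark", 3);
        counts.put("map", 1);
        counts.put("stream", 5);
        System.out.println(filterByKey(counts, k -> k.startsWith("s"))); // {spark=3, stream=5}
        System.out.println(filterByValue(counts, v -> v < 4));           // {spark=3, map=1}
    }
}
```

        In both cases the predicate returns true for the pairs that are kept, matching the parameter descriptions above.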
      • flatMapToPair

        public <R,​V> SparkPairStream<R,​V> flatMapToPair​(SerializableBiFunction<? super T,​? super U,​Stream<Map.Entry<? extends R,​? extends V>>> function)
        Description copied from interface: MPairStream
        Maps the key-value pairs to one or more new key-value pairs
        Specified by:
        flatMapToPair in interface MPairStream<T,​U>
        Type Parameters:
        R - the new key type parameter
        V - the new value type parameter
        Parameters:
        function - the function to map key-value pairs
        Returns:
        the new pair stream
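        The shape of the function argument can be easier to see in a small example. The sketch below is a plain-Java analogue built on java.util.stream, assuming a list of entries in place of the pair stream; FlatMapToPairSketch, pair and wordsOf are hypothetical names, and BiFunction stands in for SerializableBiFunction.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical plain-Java analogue of flatMapToPair; not part of the library.
class FlatMapToPairSketch {

    // Small helper that builds an immutable key-value pair.
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleImmutableEntry<>(key, value);
    }

    // Applies the function to every (key, value) pair and flattens the resulting streams.
    static <K, V, R, W> List<Map.Entry<R, W>> flatMapToPair(
            List<Map.Entry<K, V>> pairs,
            BiFunction<? super K, ? super V, Stream<Map.Entry<R, W>>> function) {
        return pairs.stream()
                .flatMap(e -> function.apply(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    // Example function: turns (docId, text) into one (word, docId) pair per word.
    static Stream<Map.Entry<String, Integer>> wordsOf(Integer docId, String text) {
        return Stream.of(text.split(" ")).map(word -> pair(word, docId));
    }
}
```

        Calling flatMapToPair(docs, FlatMapToPairSketch::wordsOf) on the pairs (1, "a b") and (2, "c") would yield three pairs, one per word.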
      • forEach

        public void forEach​(SerializableBiConsumer<? super T,​? super U> consumer)
        Description copied from interface: MPairStream
        Performs an operation on each item in the stream
        Specified by:
        forEach in interface MPairStream<T,​U>
        Parameters:
        consumer - the consumer action to perform
      • forEachLocal

        public void forEachLocal​(SerializableBiConsumer<? super T,​? super U> consumer)
        Description copied from interface: MPairStream
        Performs an operation on each item in the stream, ensuring that it is done locally and not distributed.
        Specified by:
        forEachLocal in interface MPairStream<T,​U>
        Parameters:
        consumer - the consumer action to perform
      • isEmpty

        public boolean isEmpty()
        Description copied from interface: MPairStream
        Determines if the stream is empty or not
        Specified by:
        isEmpty in interface MPairStream<T,​U>
        Returns:
        True if empty, False otherwise
      • isReusable

        public boolean isReusable()
        Description copied from interface: MPairStream
        Can this stream be consumed more than once?
        Specified by:
        isReusable in interface MPairStream<T,​U>
        Returns:
        True if the stream can be reused multiple times, False if the stream can only be used once
      • join

        public <V> MPairStream<T,​Map.Entry<U,​V>> join​(MPairStream<? extends T,​? extends V> stream)
        Description copied from interface: MPairStream
        Performs an inner join between this stream and another using the key to match.
        Specified by:
        join in interface MPairStream<T,​U>
        Type Parameters:
        V - the value type parameter of the stream to join with
        Parameters:
        stream - the other stream to inner join with
        Returns:
        the new stream
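        To make the Map.Entry<U,V> value type in the return signature concrete, here is a minimal plain-Java sketch of an inner join on keys. It assumes each side is a Map with unique keys, which a pair stream need not guarantee; InnerJoinSketch is a hypothetical name and nothing here calls Spark.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java analogue of an inner join by key; not part of the library.
class InnerJoinSketch {

    // For every key present on both sides, pairs the left value with the right value.
    static <K, U, V> Map<K, Map.Entry<U, V>> join(Map<K, U> left, Map<K, V> right) {
        Map<K, Map.Entry<U, V>> joined = new LinkedHashMap<>();
        for (Map.Entry<K, U> e : left.entrySet()) {
            K key = e.getKey();
            if (right.containsKey(key)) {
                joined.put(key, new SimpleImmutableEntry<>(e.getValue(), right.get(key)));
            }
        }
        return joined;
    }
}
```

        Keys that appear on only one side are dropped, which is what distinguishes the inner join from the outer joins documented on this page.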
      • keys

        public MStream<T> keys()
        Description copied from interface: MPairStream
        Returns a stream over the keys in this pair stream
        Specified by:
        keys in interface MPairStream<T,​U>
        Returns:
        the key stream
      • leftOuterJoin

        public <V> MPairStream<T,​Map.Entry<U,​V>> leftOuterJoin​(MPairStream<? extends T,​? extends V> stream)
        Description copied from interface: MPairStream
        Performs a left outer join between this stream and another using the key to match.
        Specified by:
        leftOuterJoin in interface MPairStream<T,​U>
        Type Parameters:
        V - the value type parameter of the stream to join with
        Parameters:
        stream - the other stream to left outer join with
        Returns:
        the new stream
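        A left outer join keeps every key of this stream even when the other stream has no match. The plain-Java sketch below illustrates that behaviour on two maps with unique keys. Representing a missing right-hand value as null is an assumption of the sketch, since the description above does not say how unmatched values are represented; LeftOuterJoinSketch is a hypothetical name.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java analogue of a left outer join by key; not part of the library.
class LeftOuterJoinSketch {

    // Keeps every left key; the right value is null when the key has no match (assumption).
    static <K, U, V> Map<K, Map.Entry<U, V>> leftOuterJoin(Map<K, U> left, Map<K, V> right) {
        Map<K, Map.Entry<U, V>> joined = new LinkedHashMap<>();
        for (Map.Entry<K, U> e : left.entrySet()) {
            V match = right.get(e.getKey()); // null when the key is absent on the right
            joined.put(e.getKey(), new SimpleImmutableEntry<>(e.getValue(), match));
        }
        return joined;
    }
}
```

        A right outer join is the mirror image: iterate over the other side's keys and leave the left-hand value empty when there is no match.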
      • map

        public <R> MStream<R> map​(SerializableBiFunction<? super T,​? super U,​? extends R> function)
        Description copied from interface: MPairStream
        Maps the key-value pairs to a new object
        Specified by:
        map in interface MPairStream<T,​U>
        Type Parameters:
        R - the type parameter being mapped to
        Parameters:
        function - the function to map from key-value pairs to objects of type R
        Returns:
        the new stream
      • mapToPair

        public <R,​V> MPairStream<R,​V> mapToPair​(SerializableBiFunction<? super T,​? super U,​? extends Map.Entry<? extends R,​? extends V>> function)
        Description copied from interface: MPairStream
        Maps the key-value pairs to new key-value pairs
        Specified by:
        mapToPair in interface MPairStream<T,​U>
        Type Parameters:
        R - the new key type parameter
        V - the new value type parameter
        Parameters:
        function - the function to map key-value pairs
        Returns:
        the new pair stream
      • max

        public Optional<Map.Entry<T,​U>> max​(SerializableComparator<Map.Entry<T,​U>> comparator)
        Description copied from interface: MPairStream
        Returns the max item in the stream using the given comparator to compare items.
        Specified by:
        max in interface MPairStream<T,​U>
        Parameters:
        comparator - the comparator to use to compare values in the stream
        Returns:
        the optional containing the max value
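        Because the comparator receives whole Map.Entry objects, it can order pairs by key, by value, or by both. The sketch below shows the value-based case on a plain Map using java.util.stream; MaxSketch is a hypothetical name and Comparator stands in for SerializableComparator.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;

// Hypothetical plain-Java analogue of max over key-value pairs; not part of the library.
class MaxSketch {

    // Returns the greatest entry according to the comparator, or empty if there are no entries.
    static <K, V> Optional<Map.Entry<K, V>> max(Map<K, V> pairs,
                                                Comparator<Map.Entry<K, V>> comparator) {
        return pairs.entrySet().stream().max(comparator);
    }
}
```

        For example, passing Map.Entry.<String, Integer>comparingByValue() selects the pair with the largest value, and an empty input yields an empty Optional.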
      • onClose

        public MPairStream<T,​U> onClose​(SerializableRunnable closeHandler)
        Description copied from interface: MPairStream
        Sets the handler to call when the stream is closed. Typically, this is to clean up any open resources, such as file handles.
        Specified by:
        onClose in interface MPairStream<T,​U>
        Parameters:
        closeHandler - the handler to run when the stream is closed.
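        The standard java.util.stream.Stream has an onClose method with the same general purpose, so it can serve as an illustration of the close-handler pattern. This sketch uses only the JDK stream, not SparkPairStream, and OnCloseSketch is a hypothetical name; whether the library triggers its handler in exactly the same way is not stated above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

// Illustrates the close-handler pattern with the JDK's own Stream; not the library's class.
class OnCloseSketch {

    // Returns true if the registered handler ran when the stream was closed.
    static boolean handlerRunsOnClose() {
        AtomicBoolean cleanedUp = new AtomicBoolean(false);
        // try-with-resources calls close() on the stream, which runs the handler.
        try (Stream<String> lines = Stream.of("a", "b").onClose(() -> cleanedUp.set(true))) {
            lines.forEach(line -> { /* consume the stream */ });
        }
        return cleanedUp.get();
    }
}
```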
      • parallel

        public MPairStream<T,​U> parallel()
        Description copied from interface: MPairStream
        Ensures that the stream is parallel or distributed.
        Specified by:
        parallel in interface MPairStream<T,​U>
        Returns:
        the new stream
      • reduceByKey

        public MPairStream<T,​U> reduceByKey​(SerializableBinaryOperator<U> operator)
        Description copied from interface: MPairStream
        Performs a reduction by key on the elements of this stream using the given binary operator.
        Specified by:
        reduceByKey in interface MPairStream<T,​U>
        Parameters:
        operator - the binary operator used to combine two objects
        Returns:
        the new stream containing keys and reduced values
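        The reduction can be pictured as folding all values that share a key into one value. The sketch below does this for a list of entries using Map.merge from the JDK; it is a plain-Java analogue rather than the Spark-backed implementation, ReduceByKeySketch is a hypothetical name, and BinaryOperator stands in for SerializableBinaryOperator.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical plain-Java analogue of reduceByKey; not part of the library.
class ReduceByKeySketch {

    // Combines all values that share a key with the given operator.
    // Values must be non-null because Map.merge rejects null values.
    static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> operator) {
        Map<K, V> reduced = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : pairs) {
            // First occurrence of a key stores the value; later ones are combined with it.
            reduced.merge(e.getKey(), e.getValue(), operator);
        }
        return reduced;
    }
}
```

        With the pairs (a, 1), (b, 2), (a, 3) and Integer::sum as the operator, the result holds a = 4 and b = 2. In a distributed setting the order in which values are combined is generally not fixed, so an associative and commutative operator is the safe choice.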
      • repartition

        public MPairStream<T,​U> repartition​(int partitions)
        Description copied from interface: MPairStream
        Repartitions the stream into the given number of partitions. This may be a no-op for some streams, e.g. local streams.
        Specified by:
        repartition in interface MPairStream<T,​U>
        Parameters:
        partitions - the number of partitions the stream should have
        Returns:
        the new stream
      • rightOuterJoin

        public <V> MPairStream<T,​Map.Entry<U,​V>> rightOuterJoin​(MPairStream<? extends T,​? extends V> stream)
        Description copied from interface: MPairStream
        Performs a right outer join between this stream and another using the key to match.
        Specified by:
        rightOuterJoin in interface MPairStream<T,​U>
        Type Parameters:
        V - the value type parameter of the stream to join with
        Parameters:
        stream - the other stream to right outer join with
        Returns:
        the new stream
      • sample

        public MPairStream<T,​U> sample​(boolean withReplacement,
                                             long number)
        Description copied from interface: MPairStream
        Randomly samples number items from the stream.
        Specified by:
        sample in interface MPairStream<T,​U>
        Parameters:
        withReplacement - true to allow a single item to appear in the sample multiple times, false to allow each item to be picked at most once.
        number - the number of items desired in the sample
        Returns:
        the new stream
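        The difference between the two withReplacement settings can be shown on an in-memory list. This is a sketch under stated assumptions, not the Spark-backed implementation: SampleSketch is a hypothetical name, the count is an int rather than a long, and an explicit Random is passed in so that runs are repeatable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical plain-Java analogue of sampling with and without replacement.
class SampleSketch {

    static <T> List<T> sample(List<T> items, boolean withReplacement, int number, Random random) {
        List<T> result = new ArrayList<>();
        if (items.isEmpty() || number <= 0) {
            return result;
        }
        if (withReplacement) {
            // Each draw is independent, so the same item may be picked several times.
            for (int i = 0; i < number; i++) {
                result.add(items.get(random.nextInt(items.size())));
            }
        } else {
            // Shuffle a copy and take a prefix, so each item is picked at most once.
            List<T> copy = new ArrayList<>(items);
            Collections.shuffle(copy, random);
            result.addAll(copy.subList(0, Math.min(number, copy.size())));
        }
        return result;
    }
}
```

        Without replacement the sample can never be larger than the input, whereas with replacement it always has exactly the requested size for a non-empty input.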
      • shuffle

        public MPairStream<T,​U> shuffle​(Random random)
        Description copied from interface: MPairStream
        Shuffles the items in the stream using the given Random object.
        Specified by:
        shuffle in interface MPairStream<T,​U>
        Parameters:
        random - the random number generator
        Returns:
        the new stream
      • sortByKey

        public MPairStream<T,​U> sortByKey​(SerializableComparator<T> comparator)
        Description copied from interface: MPairStream
        Sorts the items in the stream by key using the given comparator.
        Specified by:
        sortByKey in interface MPairStream<T,​U>
        Parameters:
        comparator - the comparator to use to compare keys
        Returns:
        the new stream
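        As an illustration of sorting pairs by key only, the sketch below orders a list of entries with Map.Entry.comparingByKey from the JDK. It is a plain-Java analogue, SortByKeySketch is a hypothetical name, and Comparator stands in for SerializableComparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java analogue of sortByKey; not part of the library.
class SortByKeySketch {

    // Returns a new list ordered by key; values play no part in the ordering.
    static <K, V> List<Map.Entry<K, V>> sortByKey(List<Map.Entry<K, V>> pairs,
                                                  Comparator<? super K> comparator) {
        List<Map.Entry<K, V>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey(comparator));
        return sorted;
    }
}
```

        Passing Comparator.reverseOrder() as the comparator would sort the pairs by descending key.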
      • union

        public MPairStream<T,​U> union​(MPairStream<? extends T,​? extends U> other)
        Description copied from interface: MPairStream
        Unions this stream with another.
        Specified by:
        union in interface MPairStream<T,​U>
        Parameters:
        other - the other stream to add to this one.
        Returns:
        the new stream
      • updateConfig

        public void updateConfig()
        Description copied from interface: MPairStream
        Updates the config instance used for this stream
        Specified by:
        updateConfig in interface MPairStream<T,​U>
      • values

        public MStream<U> values()
        Description copied from interface: MPairStream
        Returns a stream over the values in this pair stream
        Specified by:
        values in interface MPairStream<T,​U>
        Returns:
        the new stream of values