Package com.gengoai.stream
Interface MPairStream<T,U>
-
- Type Parameters:
T
- the type parameterU
- the type parameter
- All Superinterfaces:
AutoCloseable
- All Known Implementing Classes:
AbstractLocalMPairStream
,LocalDefaultMPairStream
,SparkPairStream
public interface MPairStream<T,U> extends AutoCloseable
A facade for stream classes which contain key-value pairs. Provides a common interface to working with an manipulating streams regardless of their backend implementation.
- Author:
- David B. Bracewell
-
-
Method Summary
All Methods Instance Methods Abstract Methods Default Methods Modifier and Type Method Description MPairStream<T,U>
cache()
Caches the stream.List<Map.Entry<T,U>>
collectAsList()
Collects the items in the stream as a listMap<T,U>
collectAsMap()
Collects the items in the stream as a maplong
count()
The number of items in the streamMPairStream<T,U>
filter(SerializableBiPredicate<? super T,? super U> predicate)
Filters the stream.MPairStream<T,U>
filterByKey(SerializablePredicate<T> predicate)
Filters the stream by key.MPairStream<T,U>
filterByValue(SerializablePredicate<U> predicate)
Filters the stream by value.<R,V>
MPairStream<R,V>flatMapToPair(SerializableBiFunction<? super T,? super U,Stream<Map.Entry<? extends R,? extends V>>> function)
Maps the key-value pairs to one or more new key-value pairsvoid
forEach(SerializableBiConsumer<? super T,? super U> consumer)
Performs an operation on each item in the streamvoid
forEachLocal(SerializableBiConsumer<? super T,? super U> consumer)
Performs an operation on each item in the stream ensuring that is done locally and not distributed.StreamingContext
getContext()
Gets the context used to create the streamMPairStream<T,Iterable<U>>
groupByKey()
Group bys the items in the stream by the keydefault boolean
isDistributed()
boolean
isEmpty()
Determines if the stream is empty or notdefault boolean
isReusable()
Can this stream be consumed more the once?Stream<Map.Entry<T,U>>
javaStream()
<V> MPairStream<T,Map.Entry<U,V>>
join(MPairStream<? extends T,? extends V> stream)
Performs an inner join between this stream and another using the key to match.MStream<T>
keys()
Returns a stream over the keys in this pair stream<V> MPairStream<T,Map.Entry<U,V>>
leftOuterJoin(MPairStream<? extends T,? extends V> stream)
Performs a left outer join between this stream and another using the key to match.<R> MStream<R>
map(SerializableBiFunction<? super T,? super U,? extends R> function)
Maps the key-value pairs to a new objectMDoubleStream
mapToDouble(SerializableToDoubleBiFunction<? super T,? super U> function)
Maps the key-value pairs to doubles<R,V>
MPairStream<R,V>mapToPair(SerializableBiFunction<? super T,? super U,? extends Map.Entry<? extends R,? extends V>> function)
Maps the key-value pairs to new key-value pairsOptional<Map.Entry<T,U>>
max(SerializableComparator<Map.Entry<T,U>> comparator)
Returns the max item in the stream using the given comparator to compare items.default Optional<Map.Entry<T,U>>
maxByKey()
Returns the maximum entry in the stream comparing on keydefault Optional<Map.Entry<T,U>>
maxByKey(SerializableComparator<? super T> comparator)
Returns the maximum entry in the stream comparing on keydefault Optional<Map.Entry<T,U>>
maxByValue()
Returns the maximum entry in the stream comparing on valuedefault Optional<Map.Entry<T,U>>
maxByValue(SerializableComparator<? super U> comparator)
Returns the maximum entry in the stream comparing on valueOptional<Map.Entry<T,U>>
min(SerializableComparator<Map.Entry<T,U>> comparator)
Returns the min item in the stream requiring that the items be comparable.default Optional<Map.Entry<T,U>>
minByKey()
Returns the minimum entry in the stream comparing on keydefault Optional<Map.Entry<T,U>>
minByKey(SerializableComparator<? super T> comparator)
Returns the minimum entry in the stream comparing on keydefault Optional<Map.Entry<T,U>>
minByValue()
Returns the minimum entry in the stream comparing on valuedefault Optional<Map.Entry<T,U>>
minByValue(SerializableComparator<? super U> comparator)
Returns the minimum entry in the stream comparing on valueMPairStream<T,U>
onClose(SerializableRunnable closeHandler)
Sets the handler to call when the stream is closed.MPairStream<T,U>
parallel()
Ensures that the stream is parallel or distributed.MPairStream<T,U>
persist(StorageLevel storageLevel)
MPairStream<T,U>
reduceByKey(SerializableBinaryOperator<U> operator)
Performs a reduction by key on the elements of this stream using the given binary operator.MPairStream<T,U>
repartition(int numPartitions)
Repartitions the stream to the given number of partitions.<V> MPairStream<T,Map.Entry<U,V>>
rightOuterJoin(MPairStream<? extends T,? extends V> stream)
Performs a right outer join between this stream and another using the key to match.MPairStream<T,U>
sample(boolean withReplacement, long number)
Randomly samplesnumber
items from the stream.default MPairStream<T,U>
shuffle()
Shuffles the items in the stream.MPairStream<T,U>
shuffle(Random random)
Shuffles the items in the string using the givenRandom
object.default MPairStream<T,U>
sortByKey(boolean ascending)
Sorts the items in the stream by key in ascending or descending order.MPairStream<T,U>
sortByKey(SerializableComparator<T> comparator)
Sorts the items in the stream by key using the given comparator.MPairStream<T,U>
union(MPairStream<? extends T,? extends U> other)
Unions this stream with another.default void
updateConfig()
Updates the config instance used for this StringMStream<U>
values()
Returns a stream of values-
Methods inherited from interface java.lang.AutoCloseable
close
-
-
-
-
Method Detail
-
cache
MPairStream<T,U> cache()
Caches the stream.- Returns:
- the cached stream
-
collectAsList
List<Map.Entry<T,U>> collectAsList()
Collects the items in the stream as a list- Returns:
- the list of items in the stream
-
collectAsMap
Map<T,U> collectAsMap()
Collects the items in the stream as a map- Returns:
- the map of items in the stream
-
count
long count()
The number of items in the stream- Returns:
- the number of items in the stream
-
filter
MPairStream<T,U> filter(SerializableBiPredicate<? super T,? super U> predicate)
Filters the stream.- Parameters:
predicate
- the predicate to use to determine which objects are kept- Returns:
- the new stream
-
filterByKey
MPairStream<T,U> filterByKey(SerializablePredicate<T> predicate)
Filters the stream by key.- Parameters:
predicate
- the predicate to apply to keys in order to determine which objects are kept- Returns:
- the new stream
-
filterByValue
MPairStream<T,U> filterByValue(SerializablePredicate<U> predicate)
Filters the stream by value.- Parameters:
predicate
- the predicate to apply to values in order to determine which objects are kept- Returns:
- the new stream
-
flatMapToPair
<R,V> MPairStream<R,V> flatMapToPair(SerializableBiFunction<? super T,? super U,Stream<Map.Entry<? extends R,? extends V>>> function)
Maps the key-value pairs to one or more new key-value pairs- Type Parameters:
R
- the new key type parameterV
- the new value type parameter- Parameters:
function
- the function to map key-value pairs- Returns:
- the new pair stream
-
forEach
void forEach(SerializableBiConsumer<? super T,? super U> consumer)
Performs an operation on each item in the stream- Parameters:
consumer
- the consumer action to perform
-
forEachLocal
void forEachLocal(SerializableBiConsumer<? super T,? super U> consumer)
Performs an operation on each item in the stream ensuring that is done locally and not distributed.- Parameters:
consumer
- the consumer action to perform
-
getContext
StreamingContext getContext()
Gets the context used to create the stream- Returns:
- the context
-
groupByKey
MPairStream<T,Iterable<U>> groupByKey()
Group bys the items in the stream by the key- Returns:
- the new stream
-
isDistributed
default boolean isDistributed()
-
persist
MPairStream<T,U> persist(StorageLevel storageLevel)
-
isEmpty
boolean isEmpty()
Determines if the stream is empty or not- Returns:
- True if empty, False otherwise
-
isReusable
default boolean isReusable()
Can this stream be consumed more the once?- Returns:
- True the stream can be reused multiple times, False the stream can only be used once
-
join
<V> MPairStream<T,Map.Entry<U,V>> join(MPairStream<? extends T,? extends V> stream)
Performs an inner join between this stream and another using the key to match.- Type Parameters:
V
- the value type parameter of the stream to join with- Parameters:
stream
- the other stream to inner join with- Returns:
- the new stream
-
leftOuterJoin
<V> MPairStream<T,Map.Entry<U,V>> leftOuterJoin(MPairStream<? extends T,? extends V> stream)
Performs a left outer join between this stream and another using the key to match.- Type Parameters:
V
- the value type parameter of the stream to join with- Parameters:
stream
- the other stream to left outer join with- Returns:
- the new stream
-
map
<R> MStream<R> map(SerializableBiFunction<? super T,? super U,? extends R> function)
Maps the key-value pairs to a new object- Type Parameters:
R
- the type parameter being mapped to- Parameters:
function
- the function to map from key-value pairs to objects of typeR
- Returns:
- the new stream
-
mapToDouble
MDoubleStream mapToDouble(SerializableToDoubleBiFunction<? super T,? super U> function)
Maps the key-value pairs to doubles- Parameters:
function
- the function to map from key-value pairs to double values- Returns:
- the new double stream
-
mapToPair
<R,V> MPairStream<R,V> mapToPair(SerializableBiFunction<? super T,? super U,? extends Map.Entry<? extends R,? extends V>> function)
Maps the key-value pairs to new key-value pairs- Type Parameters:
R
- the new key type parameterV
- the new value type parameter- Parameters:
function
- the function to map key-value pairs- Returns:
- the new pair stream
-
max
Optional<Map.Entry<T,U>> max(SerializableComparator<Map.Entry<T,U>> comparator)
Returns the max item in the stream using the given comparator to compare items.- Parameters:
comparator
- the comparator to use to compare values in the stream- Returns:
- the optional containing the max value
-
maxByKey
default Optional<Map.Entry<T,U>> maxByKey()
Returns the maximum entry in the stream comparing on key- Returns:
- the optional containing the entry with max key
-
maxByKey
default Optional<Map.Entry<T,U>> maxByKey(SerializableComparator<? super T> comparator)
Returns the maximum entry in the stream comparing on key- Parameters:
comparator
- the comparator to use to compare keys in the stream- Returns:
- the optional containing the entry with max key
-
maxByValue
default Optional<Map.Entry<T,U>> maxByValue()
Returns the maximum entry in the stream comparing on value- Returns:
- the optional containing the entry with max value
-
maxByValue
default Optional<Map.Entry<T,U>> maxByValue(SerializableComparator<? super U> comparator)
Returns the maximum entry in the stream comparing on value- Parameters:
comparator
- the comparator to use to compare value in the stream- Returns:
- the optional containing the entry with max value
-
min
Optional<Map.Entry<T,U>> min(SerializableComparator<Map.Entry<T,U>> comparator)
Returns the min item in the stream requiring that the items be comparable.- Returns:
- the optional containing the min value
-
minByKey
default Optional<Map.Entry<T,U>> minByKey()
Returns the minimum entry in the stream comparing on key- Returns:
- the optional containing the entry with min key
-
minByKey
default Optional<Map.Entry<T,U>> minByKey(SerializableComparator<? super T> comparator)
Returns the minimum entry in the stream comparing on key- Parameters:
comparator
- the comparator to use to compare keys in the stream- Returns:
- the optional containing the entry with min key
-
minByValue
default Optional<Map.Entry<T,U>> minByValue()
Returns the minimum entry in the stream comparing on value- Returns:
- the optional containing the entry with min value
-
minByValue
default Optional<Map.Entry<T,U>> minByValue(SerializableComparator<? super U> comparator)
Returns the minimum entry in the stream comparing on value- Parameters:
comparator
- the comparator to use to compare value in the stream- Returns:
- the optional containing the entry with min value
-
onClose
MPairStream<T,U> onClose(SerializableRunnable closeHandler)
Sets the handler to call when the stream is closed. Typically, this is to clean up any open resources, such as file handles.- Parameters:
closeHandler
- the handler to run when the stream is closed.
-
parallel
MPairStream<T,U> parallel()
Ensures that the stream is parallel or distributed.- Returns:
- the new stream
-
reduceByKey
MPairStream<T,U> reduceByKey(SerializableBinaryOperator<U> operator)
Performs a reduction by key on the elements of this stream using the given binary operator.- Parameters:
operator
- the binary operator used to combine two objects- Returns:
- the new stream containing keys and reduced values
-
repartition
MPairStream<T,U> repartition(int numPartitions)
Repartitions the stream to the given number of partitions. This may be a no-op for some streams, i.e. Local Streams.- Parameters:
numPartitions
- the number of partitions the stream should have- Returns:
- the new stream
-
rightOuterJoin
<V> MPairStream<T,Map.Entry<U,V>> rightOuterJoin(MPairStream<? extends T,? extends V> stream)
Performs a right outer join between this stream and another using the key to match.- Type Parameters:
V
- the value type parameter of the stream to join with- Parameters:
stream
- the other stream to right outer join with- Returns:
- the new stream
-
sample
MPairStream<T,U> sample(boolean withReplacement, long number)
Randomly samplesnumber
items from the stream.- Parameters:
withReplacement
- true allow a single item to be represented in the sample multiple times, false allow a single item to only be picked once.number
- the number of items desired in the sample- Returns:
- the new stream
-
shuffle
default MPairStream<T,U> shuffle()
Shuffles the items in the stream.- Returns:
- the new stream
-
shuffle
MPairStream<T,U> shuffle(Random random)
Shuffles the items in the string using the givenRandom
object.- Parameters:
random
- the random number generator- Returns:
- the new stream
-
sortByKey
default MPairStream<T,U> sortByKey(boolean ascending)
Sorts the items in the stream by key in ascending or descending order. Requires items to implement theComparable
interface.- Parameters:
ascending
- determines if the items should be sorted in ascending (true) or descending (false) order- Returns:
- the new stream
-
sortByKey
MPairStream<T,U> sortByKey(SerializableComparator<T> comparator)
Sorts the items in the stream by key using the given comparator.- Parameters:
comparator
- The comparator to use to comapre keys- Returns:
- the new stream
-
union
MPairStream<T,U> union(MPairStream<? extends T,? extends U> other)
Unions this stream with another.- Parameters:
other
- the other stream to add to this one.- Returns:
- the new stream
-
updateConfig
default void updateConfig()
Updates the config instance used for this String
-
-