Package com.gengoai.stream
Interface MStream<T>
-
- Type Parameters:
T
- the component type of the stream
- All Superinterfaces:
AutoCloseable
,Iterable<T>
- All Known Implementing Classes:
LocalInMemoryMStream
,LocalReusableMStream
,SparkStream
public interface MStream<T> extends AutoCloseable, Iterable<T>
A facade for stream classes, such as Java's
Stream
and Spark'sRDD
objects. Provides a common interface to working with an manipulating streams regardless of their backend implementation.- Author:
- David B. Bracewell
-
-
Method Summary
All Methods Instance Methods Abstract Methods Default Methods Modifier and Type Method Description MStream<T>
cache()
Caches the stream.List<T>
collect()
Collects the items in the stream as a list<R> R
collect(Collector<? super T,?,R> collector)
Performs a reduction on the string using hte given collector.long
count()
The number of items in the streamMap<T,Long>
countByValue()
Counts the number of times each item occurs in the streamMStream<T>
distinct()
Removes duplicates from the streamMStream<T>
filter(SerializablePredicate<? super T> predicate)
Filters the stream.Optional<T>
first()
Gets the first item in the stream<R> MStream<R>
flatMap(SerializableFunction<? super T,Stream<? extends R>> mapper)
Maps the objects in this stream to one or more new objects using the given function.<R,U>
MPairStream<R,U>flatMapToPair(SerializableFunction<? super T,Stream<? extends Map.Entry<? extends R,? extends U>>> function)
Maps the objects in this stream to one or more new key-value pairs using the given function.T
fold(T zeroValue, SerializableBinaryOperator<T> operator)
Performs a reduction on the elements of this stream using the given binary operator.void
forEach(SerializableConsumer<? super T> consumer)
Performs an operation on each item in the streamvoid
forEachLocal(SerializableConsumer<? super T> consumer)
Performs an operation on each item in the stream ensuring that is done locally and not distributed.StreamingContext
getContext()
Gets the context used to create the stream<U> MPairStream<U,Iterable<T>>
groupBy(SerializableFunction<? super T,? extends U> function)
Groups the items in the stream using the given function that maps objects to key valuesMStream<T>
intersection(MStream<T> other)
Returns a new MStream containing the intersection of elements in this stream and the argument stream.boolean
isDistributed()
Is distributed boolean.boolean
isEmpty()
Determines if the stream is empty or notIterator<T>
iterator()
Gets an iterator for the streamStream<T>
javaStream()
Converts this stream into a java streamMStream<T>
limit(long number)
Limits the stream to the firstnumber
items.<R> MStream<R>
map(SerializableFunction<? super T,? extends R> function)
Maps the objects in the stream using the given functionMDoubleStream
mapToDouble(SerializableToDoubleFunction<? super T> function)
Maps objects in this stream to double values<R,U>
MPairStream<R,U>mapToPair(SerializableFunction<? super T,? extends Map.Entry<? extends R,? extends U>> function)
Maps the objects in this stream to a key-value pair using the given function.default Optional<T>
max()
Returns the max item in the stream requiring that the items be comparable.Optional<T>
max(SerializableComparator<? super T> comparator)
Returns the max item in the stream using the given comparator to compare items.default Optional<T>
min()
Returns the min item in the stream requiring that the items be comparable.Optional<T>
min(SerializableComparator<? super T> comparator)
Returns the min item in the stream using the given comparator to compare items.MStream<T>
onClose(SerializableRunnable closeHandler)
Sets the handler to call when the stream is closed.MStream<T>
parallel()
Ensures that the stream is parallel or distributed.MStream<Stream<T>>
partition(long partitionSize)
Partitions the stream into iterables each of size<=
partitionSize
.MStream<T>
persist(StorageLevel storageLevel)
Persists the stream to the given storage levelOptional<T>
reduce(SerializableBinaryOperator<T> reducer)
Performs a reduction on the elements of this stream using the given binary operator.MStream<T>
repartition(int numPartitions)
Repartitions the stream to the given number of partitions.MStream<T>
sample(boolean withReplacement, int number)
Randomly samplesnumber
items from the stream.void
saveAsTextFile(Resource location)
Save as the stream to a text file at the given location.default void
saveAsTextFile(String location)
Save as the stream to a text file at the given location.default MStream<T>
shuffle()
Shuffles the items in the stream.MStream<T>
shuffle(Random random)
Shuffles the items in the string using the givenRandom
object.MStream<T>
skip(long n)
Skips the firstn
items in the stream<R extends Comparable<R>>
MStream<T>sortBy(boolean ascending, SerializableFunction<? super T,? extends R> keyFunction)
Sorts the items in the stream in ascending or descending order using the given keyFunction to determine how to compare.default MStream<T>
sorted(boolean ascending)
Sorts the items in the stream in ascending or descending order.List<T>
take(int n)
Takes the firstn
items from the stream.default SparkStream<T>
toDistributedStream()
To distributed stream spark stream.MStream<T>
union(MStream<T> other)
Unions this stream with another.default void
updateConfig()
Updates the config instance used for this stream<U> MPairStream<T,U>
zip(MStream<U> other)
Zips (combines) this stream together with the given other creating a pair stream.MPairStream<T,Long>
zipWithIndex()
Creates a pair stream where the keys are items in this stream and values are the index (starting at 0) of the item in the stream.-
Methods inherited from interface java.lang.AutoCloseable
close
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
-
-
-
Method Detail
-
collect
<R> R collect(Collector<? super T,?,R> collector)
Performs a reduction on the string using hte given collector.- Type Parameters:
R
- the component type of the collection after applying the collector- Parameters:
collector
- the collector to use in reducing the stream- Returns:
- the result of the collector
-
collect
List<T> collect()
Collects the items in the stream as a list- Returns:
- the list of items in the stream
-
count
long count()
The number of items in the stream- Returns:
- the number of items in the stream
-
countByValue
Map<T,Long> countByValue()
Counts the number of times each item occurs in the stream- Returns:
- a map of object - long counts
-
distinct
MStream<T> distinct()
Removes duplicates from the stream- Returns:
- the new stream without duplicates
-
filter
MStream<T> filter(SerializablePredicate<? super T> predicate)
Filters the stream.- Parameters:
predicate
- the predicate to use to determine which objects are kept- Returns:
- the new stream
-
first
Optional<T> first()
Gets the first item in the stream- Returns:
- the optional containing the first item
-
flatMap
<R> MStream<R> flatMap(SerializableFunction<? super T,Stream<? extends R>> mapper)
Maps the objects in this stream to one or more new objects using the given function.- Type Parameters:
R
- the component type of the returning stream- Parameters:
mapper
- the function to use to map objects- Returns:
- the new stream
-
flatMapToPair
<R,U> MPairStream<R,U> flatMapToPair(SerializableFunction<? super T,Stream<? extends Map.Entry<? extends R,? extends U>>> function)
Maps the objects in this stream to one or more new key-value pairs using the given function.- Type Parameters:
R
- the key type parameterU
- the value type parameter- Parameters:
function
- the function to use to map objects- Returns:
- the new pair stream
-
fold
T fold(T zeroValue, SerializableBinaryOperator<T> operator)
Performs a reduction on the elements of this stream using the given binary operator.- Parameters:
zeroValue
- The initial valueoperator
- the binary operator used to combine two objects- Returns:
- the optional describing the reduction
-
forEach
void forEach(SerializableConsumer<? super T> consumer)
Performs an operation on each item in the stream- Parameters:
consumer
- the consumer action to perform
-
forEachLocal
void forEachLocal(SerializableConsumer<? super T> consumer)
Performs an operation on each item in the stream ensuring that is done locally and not distributed.- Parameters:
consumer
- the consumer action to perform
-
getContext
StreamingContext getContext()
Gets the context used to create the stream- Returns:
- the context
-
groupBy
<U> MPairStream<U,Iterable<T>> groupBy(SerializableFunction<? super T,? extends U> function)
Groups the items in the stream using the given function that maps objects to key values- Type Parameters:
U
- the key type parameter- Parameters:
function
- the function that determines the key of the objects in the stream- Returns:
- the new pair stream
-
intersection
MStream<T> intersection(MStream<T> other)
Returns a new MStream containing the intersection of elements in this stream and the argument stream.- Parameters:
other
- Stream to perform intersection with- Returns:
- the new stream
-
isDistributed
boolean isDistributed()
Is distributed boolean.- Returns:
- True if the stream is distributed
-
isEmpty
boolean isEmpty()
Determines if the stream is empty or not- Returns:
- True if empty, False otherwise
-
limit
MStream<T> limit(long number)
Limits the stream to the firstnumber
items.- Parameters:
number
- the number of items desired- Returns:
- the new stream of size
number
-
map
<R> MStream<R> map(SerializableFunction<? super T,? extends R> function)
Maps the objects in the stream using the given function- Type Parameters:
R
- the component type of the returning stream- Parameters:
function
- the function to use to map objects- Returns:
- the new stream
-
mapToDouble
MDoubleStream mapToDouble(SerializableToDoubleFunction<? super T> function)
Maps objects in this stream to double values- Parameters:
function
- the function to convert objects to doubles- Returns:
- the new double stream
-
mapToPair
<R,U> MPairStream<R,U> mapToPair(SerializableFunction<? super T,? extends Map.Entry<? extends R,? extends U>> function)
Maps the objects in this stream to a key-value pair using the given function.- Type Parameters:
R
- the key type parameterU
- the value type parameter- Parameters:
function
- the function to use to map objects- Returns:
- the new pair stream
-
max
default Optional<T> max()
Returns the max item in the stream requiring that the items be comparable.- Returns:
- the optional containing the max value
-
max
Optional<T> max(SerializableComparator<? super T> comparator)
Returns the max item in the stream using the given comparator to compare items.- Parameters:
comparator
- the comparator to use to compare values in the stream- Returns:
- the optional containing the max value
-
min
default Optional<T> min()
Returns the min item in the stream requiring that the items be comparable.- Returns:
- the optional containing the min value
-
min
Optional<T> min(SerializableComparator<? super T> comparator)
Returns the min item in the stream using the given comparator to compare items.- Parameters:
comparator
- the comparator to use to compare values in the stream- Returns:
- the optional containing the min value
-
onClose
MStream<T> onClose(SerializableRunnable closeHandler)
Sets the handler to call when the stream is closed. Typically, this is to clean up any open resources, such as file handles.- Parameters:
closeHandler
- the handler to run when the stream is closed.- Returns:
- the m stream
-
parallel
MStream<T> parallel()
Ensures that the stream is parallel or distributed.- Returns:
- the new stream
-
partition
MStream<Stream<T>> partition(long partitionSize)
Partitions the stream into iterables each of size<=
partitionSize
.- Parameters:
partitionSize
- the desired number of objects in each partition- Returns:
- the new stream
-
persist
MStream<T> persist(StorageLevel storageLevel)
Persists the stream to the given storage level- Parameters:
storageLevel
- the storage level- Returns:
- the persisted MStream
-
reduce
Optional<T> reduce(SerializableBinaryOperator<T> reducer)
Performs a reduction on the elements of this stream using the given binary operator.- Parameters:
reducer
- the binary operator used to combine two objects- Returns:
- the optional describing the reduction
-
repartition
MStream<T> repartition(int numPartitions)
Repartitions the stream to the given number of partitions. This may be a no-op for some streams, i.e. Local Streams.- Parameters:
numPartitions
- the number of partitions the stream should have- Returns:
- the new stream
-
sample
MStream<T> sample(boolean withReplacement, int number)
Randomly samplesnumber
items from the stream.- Parameters:
withReplacement
- true allow a single item to be represented in the sample multiple times, false allow a single item to only be picked once.number
- the number of items desired in the sample- Returns:
- the new stream
-
saveAsTextFile
void saveAsTextFile(Resource location)
Save as the stream to a text file at the given location. Writing may result in multiple files being created.- Parameters:
location
- the location to write the stream to
-
saveAsTextFile
default void saveAsTextFile(String location)
Save as the stream to a text file at the given location. Writing may result in multiple files being created.- Parameters:
location
- the location to write the stream to
-
shuffle
MStream<T> shuffle(Random random)
Shuffles the items in the string using the givenRandom
object.- Parameters:
random
- the random number generator- Returns:
- the new stream
-
skip
MStream<T> skip(long n)
Skips the firstn
items in the stream- Parameters:
n
- the number of items in the stream- Returns:
- the new stream
-
sortBy
<R extends Comparable<R>> MStream<T> sortBy(boolean ascending, SerializableFunction<? super T,? extends R> keyFunction)
Sorts the items in the stream in ascending or descending order using the given keyFunction to determine how to compare.- Type Parameters:
R
- the type parameter- Parameters:
ascending
- determines if the items should be sorted in ascending (true) or descending (false) orderkeyFunction
- function to use to convert the items in the stream to something that is comparable.- Returns:
- the new stream
-
sorted
default MStream<T> sorted(boolean ascending)
Sorts the items in the stream in ascending or descending order. Requires items to implement theComparable
interface.- Parameters:
ascending
- determines if the items should be sorted in ascending (true) or descending (false) order- Returns:
- the new stream
-
take
List<T> take(int n)
Takes the firstn
items from the stream.- Parameters:
n
- the number of items to take- Returns:
- a list of the first n items
-
toDistributedStream
default SparkStream<T> toDistributedStream()
To distributed stream spark stream.- Returns:
- A distributed version of the stream
-
union
MStream<T> union(MStream<T> other)
Unions this stream with another.- Parameters:
other
- the other stream to add to this one.- Returns:
- the new stream
-
updateConfig
default void updateConfig()
Updates the config instance used for this stream
-
zip
<U> MPairStream<T,U> zip(MStream<U> other)
Zips (combines) this stream together with the given other creating a pair stream. For example, if this stream contains [1,2,3] and stream 2 contains [4,5,6] the result would be a pair stream containing the key value pairs [(1,4), (2,5), (3,6)]. Note that the length of the resulting stream will be the minimum of the two streams.
- Type Parameters:
U
- the component type of the second stream- Parameters:
other
- the stream making up the value in the resulting entries- Returns:
- a new pair stream with keys from this stream and values for the other stream
-
zipWithIndex
MPairStream<T,Long> zipWithIndex()
Creates a pair stream where the keys are items in this stream and values are the index (starting at 0) of the item in the stream.- Returns:
- the new pair stream
-
-