- Apache Spark 2.x for Java Developers
- Sourav Gulati Sumit Kumar
- 2021-07-02 19:01:57
Streams
A stream represents a sequence of elements on which a chain of aggregate operations can be performed lazily. Streams are designed to support both sequential and parallel computation, so that a pipeline can take advantage of the available CPU cores. The Streams API was introduced in Java 8 to support a functional programming style. Streams are not collections; however, they can operate over collections by first converting them into streams. Some of the characteristics that distinguish streams from the Java Collections API are:
- Streams do not store elements. They only carry values from a source, such as an I/O channel, a generating function, or a data structure (Collections API), through a pipeline of computations.
- Streams do not change the underlying data; they only process it and produce a new set of results. When distinct() or sorted() is called on a stream, the source data does not change; the stream produces a result that is distinct or sorted.
- In general, streams are lazily evaluated, which leaves plenty of opportunity for optimization. The rule of thumb is that an operation that returns a stream is lazy; otherwise it is eager.
- Unlike collections, streams can be unbounded. This is particularly useful for processing streaming data. An unbounded stream can be cut short with short-circuiting methods such as limit() and findFirst().
- Streams are consumable, which means that once a terminal operation has been invoked on a stream, the stream cannot be reused. A parallel can be drawn with an iterator, which must be obtained again once it has been iterated over.
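The characteristics listed above can be illustrated with a minimal sketch. The class name StreamTraits and its method names are hypothetical, chosen only for this illustration; the sketch uses only the standard java.util.stream API available since Java 8.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; names are illustrative and not part of any library.
public class StreamTraits {

    // Non-mutating: distinct() and sorted() produce a new result,
    // while the source list keeps its original order and duplicates.
    public static List<Integer> distinctSorted(List<Integer> source) {
        return source.stream().distinct().sorted().collect(Collectors.toList());
    }

    // Lazy: filter() returns a stream, so its predicate does not run
    // until the terminal forEach() starts pulling elements.
    public static List<String> lazyTrace() {
        List<String> trace = new ArrayList<>();
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .filter(n -> { trace.add("filter " + n); return n % 2 == 1; });
        trace.add("pipeline built");   // recorded before any "filter" entry
        pipeline.forEach(n -> trace.add("consume " + n));
        return trace;
    }

    // Unbounded: iterate() describes an infinite stream;
    // limit() short-circuits it after the first `count` elements.
    public static List<Integer> firstPowersOfTwo(int count) {
        return Stream.iterate(1, n -> n * 2).limit(count).collect(Collectors.toList());
    }

    // Consumable: a second terminal operation on the same stream
    // throws IllegalStateException.
    public static boolean reuseFails() {
        Stream<String> letters = Stream.of("a", "b");
        letters.forEach(s -> { });      // first terminal operation consumes the stream
        try {
            letters.forEach(s -> { });  // second attempt on the consumed stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(3, 1, 3, 2);
        System.out.println(distinctSorted(source)); // [1, 2, 3]
        System.out.println(source);                 // [3, 1, 3, 2]
        System.out.println(lazyTrace());
        System.out.println(firstPowersOfTwo(5));    // [1, 2, 4, 8, 16]
        System.out.println(reuseFails());           // true
    }
}
```

In lazyTrace(), the "pipeline built" entry appears before any "filter" entry, which shows that building the pipeline does no work until the terminal operation runs.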