
Record reader

The input reader divides the input into appropriately sized splits (in practice, typically 64 MB to 128 MB), and the framework assigns one split to each map function. The input reader reads data from stable storage (typically a distributed filesystem) and generates key/value pairs.
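As a rough illustration of how a file might be carved into splits, the following Python sketch computes (offset, length) pairs for a file of a given size. The function name `compute_splits` and the fixed split size are assumptions made for this example; a real framework also takes block boundaries and other settings into account.

```python
from typing import List, Tuple


def compute_splits(file_size: int, split_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover a file of file_size bytes.

    Hypothetical helper for illustration only: every split has split_size
    bytes except possibly the last one, which holds the remainder.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


MB = 1024 * 1024
# A 300 MB file with 128 MB splits yields two full splits and one 44 MB split.
example_splits = compute_splits(300 * MB, 128 * MB)
```

Each pair would then be handed to one map task, which reads only its own byte range.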

A common example is an input reader that reads a directory full of text files and returns each line as a record.
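The directory-of-text-files case can be sketched in a few lines of Python. This is a single-machine approximation, not the framework's own reader: the function name `read_records` and the `*.txt` pattern are assumptions for the example, and the byte offset of each line is kept alongside it because that is the kind of positional key a line-oriented reader usually emits.

```python
from pathlib import Path
from typing import Iterator, Tuple


def read_records(directory: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, byte offset, line) for every line of every .txt file.

    Hypothetical helper: files are visited in sorted order and opened in
    binary mode so that offsets count bytes rather than characters.
    """
    for path in sorted(Path(directory).glob("*.txt")):
        offset = 0
        with path.open("rb") as handle:
            for raw in handle:
                # Strip the line terminator; it is not part of the record.
                yield path.name, offset, raw.rstrip(b"\r\n").decode("utf-8")
                offset += len(raw)
```

Calling `list(read_records("some_dir"))` would return one tuple per line across all matching files.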

The record reader translates an input split generated by the input format into records. Its purpose is to parse the data into records, not to parse the records themselves. It passes the data to the mapper as key/value pairs. Usually, the key is positional information and the value is the chunk of data that makes up a record. Customized record readers are outside the scope of this book; we generally assume you have an appropriate record reader for your data.

LineRecordReader is the default RecordReader that TextInputFormat provides. It treats each line of the input file as a value, and the associated key is the byte offset of that line in the file. Because split boundaries rarely coincide with line boundaries, LineRecordReader always skips the first line (or partial line) of a split unless it is the first split, and it reads one line past the end boundary of its split if more data is available (that is, if it is not the last split). Together, these two rules ensure that a line straddling a boundary is read exactly once, by the reader of the split in which it starts.
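The boundary rules of a line-oriented record reader can be simulated on an in-memory byte string. The sketch below is a simplified Python model, not Hadoop's actual Java implementation: the name `read_split` is hypothetical, only `\n` is treated as a line terminator, and the whole input is assumed to fit in memory.

```python
from typing import Iterator, Tuple


def read_split(data: bytes, start: int, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line) records for the split [start, start + length).

    Simplified model of the rules described in the text:
    - a split that is not the first one discards its first (possibly
      partial) line, because the previous split's reader consumes it;
    - a reader keeps going while the next line starts at or before the
      end of its split, so it may read one line past the boundary.
    """
    end = start + length
    pos = start
    if start != 0:
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        yield pos, data[pos:line_end].decode("utf-8")
        pos = len(data) if newline == -1 else newline + 1


data = b"alpha\nbravo\ncharlie\ndelta\n"  # 26 bytes
# Three 10-byte splits (the last one is shorter), chosen to cut through lines.
records = [list(read_split(data, s, n)) for s, n in [(0, 10), (10, 10), (20, 6)]]
# records[0] == [(0, "alpha"), (6, "bravo")]
# records[1] == [(12, "charlie"), (20, "delta")]
# records[2] == []
```

Although the splits cut "bravo" and "charlie" in the middle, every line appears exactly once across the three readers, and the third reader emits nothing because its only line was already consumed by the second.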
