官术网_书友最值得收藏!

Hadoop-specific data types

Up to this point we've glossed over the actual data types used as the input and output of the map and reduce classes. Let's take a look at them now.

The Writable and WritableComparable interfaces

If you browse the Hadoop API for the org.apache.hadoop.io package, you'll see some familiar classes such as Text and IntWritable along with others with the Writable suffix.

This package also contains the Writable interface specified as follows:

import java.io.DataInput ;
import java.io.DataOutput ;
import java.io.IOException ;

public interface Writable
{
void write(DataOutput out) throws IOException ;
void readFields(DataInput in) throws IOException ;
}

The main purpose of this interface is to provide mechanisms for the serialization and deserialization of data as it is passed across the network or read and written from the disk. Every data type to be used as a value input or output from a mapper or reducer (that is, V1, V2, or V3) must implement this interface.

Data to be used as keys (K1, K2, K3) has a stricter requirement: in addition to Writable,it must also provide an implementation of the standard Java Comparable interface. This has the following specifications:

public interface Comparable
{
public int compareTO( Object obj) ;
}

The compare method returns -1, 0, or 1 depending on whether the compared object is less than, equal to, or greater than the current object.

As a convenience interface, Hadoop provides the WritableComparable interface in the org.apache.hadoop.io package.

public interface WritableComparable extends Writable, Comparable
{}

Introducing the wrapper classes

Fortunately, you don't have to start from scratch; as you've already seen, Hadoop provides classes that wrap the Java primitive types and implement WritableComparable. They are provided in the org.apache.hadoop.io package.

Primitive wrapper classes

These classes are conceptually similar to the primitive wrapper classes, such as Integer and Long found in java.lang. They hold a single primitive value that can be set either at construction or via a setter method.

  • BooleanWritable
  • ByteWritable
  • DoubleWritable
  • FloatWritable
  • IntWritable
  • LongWritable
  • VIntWritable – a variable length integer type
  • VLongWritable – a variable length long type

Array wrapper classes

These classes provide writable wrappers for arrays of other Writable objects. For example, an instance of either could hold an array of IntWritable or DoubleWritable but not arrays of the raw int or float types. A specific subclass for the required Writable class will be required. They are as follows:

  • ArrayWritable
  • TwoDArrayWritable

Map wrapper classes

These classes allow implementations of the java.util.Map interface to be used as keys or values. Note that they are defined as Map<Writable, Writable> and effectively manage a degree of internal-runtime-type checking. This does mean that compile type checking is weakened, so be careful.

  • AbstractMapWritable: This is a base class for other concrete Writable map implementations
  • MapWritable: This is a general purpose map mapping Writable keys to Writable values
  • SortedMapWritable: This is a specialization of the MapWritable class that also implements the SortedMap interface
主站蜘蛛池模板: 昆山市| 福清市| 湖口县| 永福县| 高雄县| 交口县| 房产| 栖霞市| 西乌| 岳阳市| 彰武县| 唐山市| 临清市| 荆州市| 胶南市| 汶川县| 年辖:市辖区| 浦城县| 黄龙县| 安乡县| 阿荣旗| 普兰店市| 新丰县| 郑州市| 江口县| 松溪县| 自贡市| 梁平县| 勃利县| 诸暨市| 左云县| 中江县| 库伦旗| 旬阳县| 襄樊市| 西昌市| 龙海市| 临沂市| 盐城市| 鹤山市| 长垣县|