public class BasicTableOutputFormat extends org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
OutputFormat class for creating a BasicTable.
Usage Example:

In the main program, add the following code.

```java
job.setOutputFormatClass(BasicTableOutputFormat.class);
Path outPath = new Path("path/to/the/BasicTable");
BasicTableOutputFormat.setOutputPath(job, outPath);
BasicTableOutputFormat.setSchema(job, "Name, Age, Salary, BonusPct");
```

The above code does the following things:

- Declare BasicTableOutputFormat as the OutputFormat of the MapReduce job.
- Set the output path of the BasicTable to be created.
- Set the schema of the BasicTable to be created.
To create multiple output tables, do the following instead (this is the deprecated String-based variant of setMultipleOutputs; see the Path... form documented below):

```java
String multiLocs = "commaSeparatedPaths";
job.setOutputFormatClass(BasicTableOutputFormat.class);
BasicTableOutputFormat.setMultipleOutputs(
    job, multiLocs, MultipleOutputsTest.OutputPartitionerClass.class);
BasicTableOutputFormat.setSchema(job, "Name, Age, Salary, BonusPct");
```

The user's ZebraOutputPartition implementation should look like this:
```java
static class OutputPartitionerClass implements ZebraOutputPartition {
  @Override
  public int getOutputPartition(BytesWritable key, Tuple value) {
    // Return the index, into the list of output paths, of the table
    // this row should be written to.
    return someIndexInOutputPartitionList;
  }
}
```

The user's Reducer code (or, similarly, Mapper code if it is a map-only job) should look like the following:
```java
static class MyReduceClass implements Reducer<K, V, BytesWritable, Tuple> {
  // Keep the tuple object for reuse.
  Tuple outRow;
  // Indices of the various fields in the output Tuple.
  int idxName, idxAge, idxSalary, idxBonusPct;

  @Override
  public void configure(Job job) {
    Schema outSchema = BasicTableOutputFormat.getSchema(job);
    // Create a tuple that conforms to the output schema.
    outRow = TypesUtils.createTuple(outSchema);
    // Determine the field indices.
    idxName = outSchema.getColumnIndex("Name");
    idxAge = outSchema.getColumnIndex("Age");
    idxSalary = outSchema.getColumnIndex("Salary");
    idxBonusPct = outSchema.getColumnIndex("BonusPct");
  }

  @Override
  public void reduce(K key, Iterator<V> values,
      OutputCollector<BytesWritable, Tuple> output, Reporter reporter)
      throws IOException {
    String name;
    int age;
    int salary;
    double bonusPct;
    // ... Determine the values of the individual fields of the row to be inserted.
    try {
      outRow.set(idxName, name);
      outRow.set(idxAge, new Integer(age));
      outRow.set(idxSalary, new Integer(salary));
      outRow.set(idxBonusPct, new Double(bonusPct));
      output.collect(new BytesWritable(name.getBytes()), outRow);
    } catch (ExecException e) {
      // Should never happen.
    }
  }

  @Override
  public void close() throws IOException {
    // No-op.
  }
}
```
| Constructor and Description |
| --- |
| BasicTableOutputFormat() |
| Modifier and Type | Method and Description |
| --- | --- |
| void | checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext jobContext). Note: we perform the initialization of the table here. |
| static void | close(org.apache.hadoop.mapreduce.JobContext jobContext). Close the output BasicTable; no more rows can be added into the table. |
| org.apache.hadoop.mapreduce.OutputCommitter | getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext taContext) |
| static java.lang.String | getOutputPartitionClassArguments(org.apache.hadoop.conf.Configuration conf). Get the output partition class arguments string from the job configuration. |
| static org.apache.hadoop.fs.Path | getOutputPath(org.apache.hadoop.mapreduce.JobContext jobContext). Get the output path of the BasicTable from the JobContext. |
| static org.apache.hadoop.fs.Path[] | getOutputPaths(org.apache.hadoop.mapreduce.JobContext jobContext). Get the multiple output paths of the BasicTable from the JobContext. |
| org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.BytesWritable,Tuple> | getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext taContext) |
| static Schema | getSchema(org.apache.hadoop.mapreduce.JobContext jobContext). Get the table schema in the JobContext. |
| static SortInfo | getSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext). Get the SortInfo object. |
| static org.apache.hadoop.io.BytesWritable | getSortKey(java.lang.Object builder, Tuple t). Generates a BytesWritable key for the input tuple using the key generator provided. |
| static java.lang.Object | getSortKeyGenerator(org.apache.hadoop.mapreduce.JobContext jobContext). Generates a Zebra-specific sort key generator, which is used to generate BytesWritable keys; the sort key(s) are used to generate this object. |
| static java.lang.String | getStorageHint(org.apache.hadoop.mapreduce.JobContext jobContext). Get the table storage hint in the JobContext. |
| static java.lang.Class<? extends ZebraOutputPartition> | getZebraOutputPartitionClass(org.apache.hadoop.mapreduce.JobContext jobContext) |
| static void | setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.Class<? extends ZebraOutputPartition> theClass, org.apache.hadoop.fs.Path... paths). Set the multiple output paths of the BasicTable in the JobContext. |
| static void | setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.Class<? extends ZebraOutputPartition> theClass, java.lang.String arguments, org.apache.hadoop.fs.Path... paths). Set the multiple output paths of the BasicTable in the JobContext. |
| static void | setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String commaSeparatedLocations, java.lang.Class<? extends ZebraOutputPartition> theClass). Deprecated. Use setMultipleOutputs(JobContext, Class<? extends ZebraOutputPartition>, Path...) instead. |
| static void | setOutputPath(org.apache.hadoop.mapreduce.JobContext jobContext, org.apache.hadoop.fs.Path path). Set the output path of the BasicTable in the JobContext. |
| static void | setSchema(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String schema). Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead. |
| static void | setSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String sortColumns). Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead. |
| static void | setSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String sortColumns, java.lang.Class<? extends org.apache.hadoop.io.RawComparator<java.lang.Object>> comparatorClass). Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead. |
| static void | setStorageHint(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String storehint). Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead. |
| static void | setStorageInfo(org.apache.hadoop.mapreduce.JobContext jobContext, ZebraSchema zSchema, ZebraStorageHint zStorageHint, ZebraSortInfo zSortInfo). Set the table storage info, including ZebraSchema, ZebraStorageHint and ZebraSortInfo. |
public static void setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String commaSeparatedLocations, java.lang.Class<? extends ZebraOutputPartition> theClass) throws java.io.IOException

Deprecated. Use setMultipleOutputs(JobContext, Class<? extends ZebraOutputPartition>, Path...) instead.

Parameters:
    jobContext - The JobContext object.
    commaSeparatedLocations - The comma-separated output paths to the tables. Each path must either not exist or must be an empty directory.
    theClass - Zebra output partitioner class.
Throws:
    java.io.IOException
public static void setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.Class<? extends ZebraOutputPartition> theClass, org.apache.hadoop.fs.Path... paths) throws java.io.IOException

Parameters:
    jobContext - The JobContext object.
    theClass - Zebra output partitioner class.
    paths - The list of output paths. Each path must either not exist or must be an empty directory.
Throws:
    java.io.IOException
public static void setMultipleOutputs(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.Class<? extends ZebraOutputPartition> theClass, java.lang.String arguments, org.apache.hadoop.fs.Path... paths) throws java.io.IOException

Parameters:
    jobContext - The JobContext object.
    theClass - Zebra output partitioner class.
    arguments - Arguments string to the partitioner class.
    paths - The list of output paths. Each path must either not exist or must be an empty directory.
Throws:
    java.io.IOException
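As a sketch of how the non-deprecated variants above might be wired into a driver (the output paths and the MyPartitioner class are hypothetical placeholders, not part of this API's documentation):

```java
// Minimal sketch: configure two output tables with the Path... variant of
// setMultipleOutputs. "out/tableA", "out/tableB" and MyPartitioner are illustrative.
Job job = new Job(conf);  // conf is assumed to be an existing Configuration
job.setOutputFormatClass(BasicTableOutputFormat.class);
BasicTableOutputFormat.setMultipleOutputs(
    job,
    MyPartitioner.class,  // implements ZebraOutputPartition
    new Path("out/tableA"), new Path("out/tableB"));
BasicTableOutputFormat.setSchema(job, "Name, Age, Salary, BonusPct");
```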
public static java.lang.String getOutputPartitionClassArguments(org.apache.hadoop.conf.Configuration conf)

Parameters:
    conf - The job configuration object.

public static org.apache.hadoop.fs.Path[] getOutputPaths(org.apache.hadoop.mapreduce.JobContext jobContext) throws java.io.IOException

Parameters:
    jobContext - The JobContext object.
Throws:
    java.io.IOException

public static java.lang.Class<? extends ZebraOutputPartition> getZebraOutputPartitionClass(org.apache.hadoop.mapreduce.JobContext jobContext) throws java.io.IOException

Throws:
    java.io.IOException
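For illustration (not taken from the original docs), the configuration stored by setMultipleOutputs can be read back from the job with the getters above:

```java
// Sketch: reading back the multi-output configuration.
Path[] outPaths = BasicTableOutputFormat.getOutputPaths(job);
Class<? extends ZebraOutputPartition> partitionerClass =
    BasicTableOutputFormat.getZebraOutputPartitionClass(job);
String partitionerArgs =
    BasicTableOutputFormat.getOutputPartitionClassArguments(job.getConfiguration());
```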
public static void setOutputPath(org.apache.hadoop.mapreduce.JobContext jobContext, org.apache.hadoop.fs.Path path)

Parameters:
    jobContext - The JobContext object.
    path - The output path to the table. The path must either not exist or must be an empty directory.

public static org.apache.hadoop.fs.Path getOutputPath(org.apache.hadoop.mapreduce.JobContext jobContext)

Parameters:
    jobContext - The JobContext object.

public static void setSchema(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String schema)

Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead.

Parameters:
    jobContext - The JobContext object.
    schema - The schema of the BasicTable to be created. For the initial implementation, the schema string is simply a comma-separated list of column names, such as "Col1, Col2, Col3".

public static Schema getSchema(org.apache.hadoop.mapreduce.JobContext jobContext) throws org.apache.hadoop.zebra.parser.ParseException

Parameters:
    jobContext - The JobContext object.
Throws:
    org.apache.hadoop.zebra.parser.ParseException
public static java.lang.Object getSortKeyGenerator(org.apache.hadoop.mapreduce.JobContext jobContext) throws java.io.IOException, org.apache.hadoop.zebra.parser.ParseException

Parameters:
    jobContext - The JobContext object.
Throws:
    java.io.IOException
    org.apache.hadoop.zebra.parser.ParseException
public static org.apache.hadoop.io.BytesWritable getSortKey(java.lang.Object builder, Tuple t) throws java.lang.Exception

Parameters:
    builder - Opaque key generator created by the getSortKeyGenerator() method.
    t - Tuple to create the sort key from.
Throws:
    java.lang.Exception
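The two methods above are designed to be used together; a minimal sketch (assuming sort info has been configured on the job, with outRow and output as in the Reducer example earlier) might look like:

```java
// Sketch: generate the BytesWritable key for a row of a sorted table.
// getSortKeyGenerator() returns an opaque builder; getSortKey() applies it to a tuple.
Object keyBuilder = BasicTableOutputFormat.getSortKeyGenerator(job);
BytesWritable sortKey = BasicTableOutputFormat.getSortKey(keyBuilder, outRow);
output.collect(sortKey, outRow);
```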
public static void setStorageHint(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String storehint) throws org.apache.hadoop.zebra.parser.ParseException, java.io.IOException

Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead.

Parameters:
    jobContext - The JobContext object.
    storehint - The storage hint of the BasicTable to be created. The format would be like "[f1, f2.subfld]; [f3, f4]".
Throws:
    org.apache.hadoop.zebra.parser.ParseException
    java.io.IOException
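To make the hint format concrete, a call against the example schema from the class description might look like the following sketch (the column grouping is illustrative):

```java
// Sketch: store Name and Age in one column group, Salary and BonusPct in another.
BasicTableOutputFormat.setStorageHint(job, "[Name, Age]; [Salary, BonusPct]");
```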
public static java.lang.String getStorageHint(org.apache.hadoop.mapreduce.JobContext jobContext)

Parameters:
    jobContext - The JobContext object.

public static void setSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String sortColumns, java.lang.Class<? extends org.apache.hadoop.io.RawComparator<java.lang.Object>> comparatorClass)

Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead.

Parameters:
    jobContext - The JobContext object.
    sortColumns - Comma-separated sort column names.
    comparatorClass - Comparator class name; null for default.

public static void setSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String sortColumns)

Deprecated. Use setStorageInfo(JobContext, ZebraSchema, ZebraStorageHint, ZebraSortInfo) instead.

Parameters:
    jobContext - The JobContext object.
    sortColumns - Comma-separated sort column names.

public static void setStorageInfo(org.apache.hadoop.mapreduce.JobContext jobContext, ZebraSchema zSchema, ZebraStorageHint zStorageHint, ZebraSortInfo zSortInfo) throws org.apache.hadoop.zebra.parser.ParseException, java.io.IOException

Parameters:
    jobContext - The JobContext object.
    zSchema - The ZebraSchema object containing schema information.
    zStorageHint - The ZebraStorageHint object containing storage hint information.
    zSortInfo - The ZebraSortInfo object containing sorting information.
Throws:
    org.apache.hadoop.zebra.parser.ParseException
    java.io.IOException
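A sketch of the consolidated setter follows. It assumes createZebraSchema / createZebraStorageHint / createZebraSortInfo factory methods on the Zebra wrapper types; those names are an assumption, not confirmed by this page, so check the actual ZebraSchema, ZebraStorageHint and ZebraSortInfo APIs before copying.

```java
// Sketch only: the factory method names below are assumptions.
ZebraSchema zSchema =
    ZebraSchema.createZebraSchema("Name, Age, Salary, BonusPct");
ZebraStorageHint zHint =
    ZebraStorageHint.createZebraStorageHint("[Name, Age]; [Salary, BonusPct]");
ZebraSortInfo zSort =
    ZebraSortInfo.createZebraSortInfo("Name", null);  // sort by Name, default comparator
BasicTableOutputFormat.setStorageInfo(job, zSchema, zHint, zSort);
```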
public static SortInfo getSortInfo(org.apache.hadoop.mapreduce.JobContext jobContext) throws java.io.IOException

Parameters:
    jobContext - The JobContext object.
Throws:
    java.io.IOException
public void checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext jobContext) throws java.io.IOException

Note: we perform the initialization of the table here, so this is expected to be called before BasicTableOutputFormat#getRecordWriter(FileSystem, JobContext, String, Progressable).

Specified by:
    checkOutputSpecs in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
    java.io.IOException
See Also:
    OutputFormat.checkOutputSpecs(JobContext)
public org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.BytesWritable,Tuple> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext taContext) throws java.io.IOException

Specified by:
    getRecordWriter in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
    java.io.IOException
See Also:
    OutputFormat.getRecordWriter(TaskAttemptContext)
public static void close(org.apache.hadoop.mapreduce.JobContext jobContext) throws java.io.IOException

Close the output BasicTable. No more rows can be added into the table.

Parameters:
    jobContext - The JobContext object.
Throws:
    java.io.IOException
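Since close() is static and takes the job itself, a driver would typically invoke it once the job completes, as in this sketch (mirroring the setup example in the class description):

```java
// Sketch: finalize the output BasicTable after the MapReduce job finishes.
if (job.waitForCompletion(true)) {
  BasicTableOutputFormat.close(job);
}
```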
public org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext taContext) throws java.io.IOException, java.lang.InterruptedException

Specified by:
    getOutputCommitter in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
    java.io.IOException
    java.lang.InterruptedException