@Deprecated
public class TableInputFormat
extends java.lang.Object
implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>

InputFormat class for reading one or more BasicTables.

Usage Example:

In the main program, add the following code:

```java
jobConf.setInputFormat(TableInputFormat.class);
TableInputFormat.setInputPaths(jobConf, new Path("path/to/table1"), new Path("path/to/table2"));
TableInputFormat.setProjection(jobConf, "Name, Salary, BonusPct");
```

The above code does the following things:

- Makes TableInputFormat the InputFormat for the job.
- Sets the two BasicTable paths as the job input; splits are produced over the union of the two tables.
- Sets the projection to the three columns Name, Salary, and BonusPct.
Then, in the Mapper implementation:

```java
static class MyMapClass implements Mapper<BytesWritable, Tuple, K, V> {
  // indices of various fields in the input Tuple.
  int idxName, idxSalary, idxBonusPct;

  @Override
  public void configure(JobConf job) {
    try {
      Schema projection = TableInputFormat.getProjection(job);
      // determine the field indices.
      idxName = projection.getColumnIndex("Name");
      idxSalary = projection.getColumnIndex("Salary");
      idxBonusPct = projection.getColumnIndex("BonusPct");
    } catch (Exception e) {
      // getProjection can throw checked exceptions; configure(JobConf) cannot.
      throw new RuntimeException(e);
    }
  }

  @Override
  public void map(BytesWritable key, Tuple value, OutputCollector<K, V> output,
      Reporter reporter) throws IOException {
    try {
      String name = (String) value.get(idxName);
      int salary = (Integer) value.get(idxSalary);
      double bonusPct = (Double) value.get(idxBonusPct);
      // do something with the input data
    } catch (ExecException e) {
      e.printStackTrace();
    }
  }

  @Override
  public void close() throws IOException {
    // no-op
  }
}
```

A bit more explanation on Pig Tuple objects: a Tuple is an ordered list of Pig datum objects. The permitted Pig datum types fall into two categories: scalar types and composite types.
Supported scalar types include seven native Java types (Boolean, Byte, Integer, Long, Float, Double, String) as well as one Pig class, DataByteArray, which represents a typeless byte array.
Supported composite types include:

- Map: the same as the Java Map class, with the additional restriction that the key type must be one of the scalar types Pig recognizes, and the value type any of the scalar or composite types Pig understands.
- DataBag: a collection of Tuples.
- Tuple: yes, a Tuple itself can be a datum in another Tuple.
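As an illustration of these datum rules, here is a minimal sketch (not from this page; it assumes Pig's standard TupleFactory, BagFactory, and DataByteArray classes and requires the Pig jar on the classpath) that builds a Tuple holding scalar datums, a Map, a DataBag, and a nested Tuple:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class TupleSketch {
  public static void main(String[] args) throws Exception {
    TupleFactory tf = TupleFactory.getInstance();

    // Scalar datums: native Java types plus DataByteArray.
    Tuple inner = tf.newTuple();
    inner.append("Alice");                              // String
    inner.append(100000);                               // Integer
    inner.append(0.15);                                 // Double
    inner.append(new DataByteArray("raw".getBytes()));  // typeless byte array

    // Composite datum: a Map whose keys must be a scalar type (String here).
    Map<String, Object> props = new HashMap<String, Object>();
    props.put("dept", "engineering");

    // Composite datum: a DataBag, i.e. a collection of Tuples.
    DataBag bag = BagFactory.getInstance().newDefaultBag();
    bag.add(inner);

    // A Tuple can itself be a datum inside another Tuple.
    Tuple outer = tf.newTuple();
    outer.append(inner);
    outer.append(props);
    outer.append(bag);

    System.out.println(outer.size());                  // 3
    System.out.println(((Tuple) outer.get(0)).get(0)); // Alice
  }
}
```

The same access pattern as the Mapper above applies: fields are read positionally with `get(int)` and cast to the expected scalar or composite type.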
Field Summary

| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | INPUT_DELETED_CGS Deprecated. |
| static java.lang.String | INPUT_EXPR Deprecated. |
| static java.lang.String | INPUT_FE Deprecated. |
| static java.lang.String | INPUT_PROJ Deprecated. |
| static java.lang.String | INPUT_SORT Deprecated. |
Constructor Summary

| Constructor and Description |
|---|
| TableInputFormat() Deprecated. |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| static java.lang.String | getProjection(org.apache.hadoop.mapred.JobConf conf) Deprecated. Get the projection from the JobConf. |
| org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.BytesWritable,Tuple> | getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter) Deprecated. |
| static Schema | getSchema(org.apache.hadoop.mapred.JobConf conf) Deprecated. Get the schema of a table expr. |
| static SortInfo | getSortInfo(org.apache.hadoop.mapred.JobConf conf) Deprecated. Get the SortInfo object regarding a Zebra table. |
| org.apache.hadoop.mapred.InputSplit[] | getSplits(org.apache.hadoop.mapred.JobConf conf, int numSplits) Deprecated. |
| static TableRecordReader | getTableRecordReader(org.apache.hadoop.mapred.JobConf conf, java.lang.String projection) Deprecated. Get a TableRecordReader on a single split. |
| static void | requireSortedTable(org.apache.hadoop.mapred.JobConf conf, ZebraSortInfo sortInfo) Deprecated. Requires a sorted table or table union. |
| static void | setInputPaths(org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.fs.Path... paths) Deprecated. Set the paths to the input table. |
| static void | setMinSplitSize(org.apache.hadoop.mapred.JobConf conf, long minSize) Deprecated. Set the minimum split size. |
| static void | setProjection(org.apache.hadoop.mapred.JobConf conf, java.lang.String projection) Deprecated. Use setProjection(JobConf, ZebraProjection) instead. |
| static void | setProjection(org.apache.hadoop.mapred.JobConf conf, ZebraProjection projection) Deprecated. Set the input projection in the JobConf object. |
| void | validateInput(org.apache.hadoop.mapred.JobConf conf) Deprecated. |
Field Detail

public static final java.lang.String INPUT_EXPR

public static final java.lang.String INPUT_PROJ

public static final java.lang.String INPUT_SORT

public static final java.lang.String INPUT_FE

public static final java.lang.String INPUT_DELETED_CGS
Method Detail

public static void setInputPaths(org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.fs.Path... paths)

Set the paths to the input table.

Parameters:
conf - JobConf object.
paths - one or more paths to BasicTables. The InputFormat class will produce splits on the "union" of these BasicTables.

public static Schema getSchema(org.apache.hadoop.mapred.JobConf conf) throws java.io.IOException

Get the schema of a table expr.

Parameters:
conf - JobConf object.
Throws:
java.io.IOException
public static void setProjection(org.apache.hadoop.mapred.JobConf conf, java.lang.String projection) throws org.apache.hadoop.zebra.parser.ParseException

Deprecated. Use setProjection(JobConf, ZebraProjection) instead.

Parameters:
conf - JobConf object.
projection - A comma-separated list of column names. To select all columns, pass projection == null. The syntax of the projection conforms to the Schema string.
Throws:
org.apache.hadoop.zebra.parser.ParseException
public static void setProjection(org.apache.hadoop.mapred.JobConf conf, ZebraProjection projection) throws org.apache.hadoop.zebra.parser.ParseException

Set the input projection in the JobConf object.

Parameters:
conf - JobConf object.
projection - A comma-separated list of column names. To select all columns, pass projection == null. The syntax of the projection conforms to the Schema string.
Throws:
org.apache.hadoop.zebra.parser.ParseException
public static java.lang.String getProjection(org.apache.hadoop.mapred.JobConf conf) throws java.io.IOException, org.apache.hadoop.zebra.parser.ParseException

Get the projection from the JobConf.

Parameters:
conf - The JobConf object.
Throws:
java.io.IOException
org.apache.hadoop.zebra.parser.ParseException
public static SortInfo getSortInfo(org.apache.hadoop.mapred.JobConf conf) throws java.io.IOException

Get the SortInfo object regarding a Zebra table.

Parameters:
conf - JobConf object.
Throws:
java.io.IOException
public static void requireSortedTable(org.apache.hadoop.mapred.JobConf conf, ZebraSortInfo sortInfo) throws java.io.IOException

Requires a sorted table or table union.

Parameters:
conf - JobConf object.
sortInfo - ZebraSortInfo object containing sorting information.
Throws:
java.io.IOException
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.BytesWritable,Tuple> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException

Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
java.io.IOException
See Also:
InputFormat.getRecordReader(InputSplit, JobConf, Reporter)
public static TableRecordReader getTableRecordReader(org.apache.hadoop.mapred.JobConf conf, java.lang.String projection) throws java.io.IOException, org.apache.hadoop.zebra.parser.ParseException

Get a TableRecordReader on a single split.

Parameters:
conf - JobConf object.
projection - comma-separated column names in the projection; null means all columns.
Throws:
java.io.IOException
org.apache.hadoop.zebra.parser.ParseException
public static void setMinSplitSize(org.apache.hadoop.mapred.JobConf conf, long minSize)

Set the minimum split size.

Parameters:
conf - The JobConf object.
minSize - Minimum split size.

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf conf, int numSplits) throws java.io.IOException
Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
java.io.IOException
See Also:
InputFormat.getSplits(JobConf, int)
@Deprecated
public void validateInput(org.apache.hadoop.mapred.JobConf conf) throws java.io.IOException

Throws:
java.io.IOException
Copyright © 2007-2012 The Apache Software Foundation