public static class BasicTable.Reader
extends java.lang.Object
implements java.io.Closeable
| Modifier and Type | Class and Description |
|---|---|
| static class | BasicTable.Reader.RangeSplit - A range-based split on the zebra table. The content of the split is implementation-dependent. |
| static class | BasicTable.Reader.RowSplit - A row-based split on the zebra table. |
| Constructor and Description |
|---|
| BasicTable.Reader(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) - Create a BasicTable reader. |
| BasicTable.Reader(org.apache.hadoop.fs.Path path, java.lang.String[] deletedCGs, org.apache.hadoop.conf.Configuration conf) |
| Modifier and Type | Method and Description |
|---|---|
| void | close() - Close the BasicTable for reading. |
| BlockDistribution | getBlockDistribution(BasicTable.Reader.RangeSplit split) - Given a split range, calculate how the file data that fall into the range are distributed among hosts. |
| BlockDistribution | getBlockDistribution(BasicTable.Reader.RowSplit split) - Given a row-based split, calculate how the file data that fall into the split are distributed among hosts. |
| java.lang.String | getDeletedCGs() |
| static java.lang.String | getDeletedCGs(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) |
| KeyDistribution | getKeyDistribution(int n, int nTables, BlockDistribution lastBd) - Collect some key samples and use them to partition the table. |
| java.io.DataInputStream | getMetaBlock(java.lang.String name) - Obtain an input stream for reading a meta block. |
| java.lang.String | getName(int i) |
| java.lang.String | getPath() - Get the path to the table. |
| org.apache.hadoop.fs.PathFilter | getPathFilter(org.apache.hadoop.conf.Configuration conf) - Get the path filter used by the table. |
| int | getRowSplitCGIndex() - Get the index of the column group that will be used for row-based split. |
| TableScanner | getScanner(BasicTable.Reader.RangeSplit split, boolean closeReader) - Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RangeSplit object, which should be obtained from previous calls of rangeSplit(int). |
| TableScanner | getScanner(boolean closeReader, BasicTable.Reader.RowSplit rowSplit) - Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RowSplit object. |
| TableScanner | getScanner(org.apache.hadoop.io.BytesWritable beginKey, org.apache.hadoop.io.BytesWritable endKey, boolean closeReader) - Get a scanner that reads all rows whose row keys fall in a specific range. |
| Schema | getSchema() - Get the schema of the table. |
| static Schema | getSchema(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) - Get the BasicTable schema without loading the full table index. |
| SortInfo | getSortInfo() |
| BasicTableStatus | getStatus() - Get the status of the BasicTable. |
| boolean | isSorted() - Is the table sorted? |
| java.util.List<BasicTable.Reader.RangeSplit> | rangeSplit(int n) - Split the table into at most n parts. |
| void | rearrangeFileIndices(org.apache.hadoop.fs.FileStatus[] fileStatus) - Rearrange the files according to the column group index ordering. |
| java.util.List<BasicTable.Reader.RowSplit> | rowSplit(long[] starts, long[] lengths, org.apache.hadoop.fs.Path[] paths, int splitCGIndex, int[] batchSizes, int numBatches) - We already use FileInputFormat to create byte offset-based input splits. |
| void | setProjection(java.lang.String projection) - Set the projection for the reader. |
public BasicTable.Reader(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException

Create a BasicTable reader.

Parameters:
path - The directory path to the BasicTable.
conf - Optional configuration parameters.
Throws:
java.io.IOException

public BasicTable.Reader(org.apache.hadoop.fs.Path path, java.lang.String[] deletedCGs, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException

Throws:
java.io.IOException
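Because the reader implements java.io.Closeable (and therefore AutoCloseable), it can be managed with try-with-resources. A minimal opening sketch; the table path and the org.apache.hadoop.zebra.io package for BasicTable are assumptions, not given by this page:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.zebra.io.BasicTable; // assumed package location

public class ReaderOpenSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/user/me/mytable"); // hypothetical table directory
    // Reader implements Closeable, so try-with-resources closes it for us.
    try (BasicTable.Reader reader = new BasicTable.Reader(path, conf)) {
      System.out.println("sorted: " + reader.isSorted());
      System.out.println("schema: " + reader.getSchema());
    }
  }
}
```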
public boolean isSorted()

public SortInfo getSortInfo()

public java.lang.String getName(int i)
public void setProjection(java.lang.String projection) throws org.apache.hadoop.zebra.parser.ParseException, java.io.IOException

Set the projection for the reader.

Parameters:
projection - The projection on the BasicTable for subsequent read operations. In this version of the implementation, the projection is a comma-separated list of column names, such as "FirstName, LastName, Sex, Department". To select all columns, pass projection == null.
Throws:
java.io.IOException
org.apache.hadoop.zebra.parser.ParseException
See Also:
getScanner(RangeSplit, boolean), getScanner(BytesWritable, BytesWritable, boolean), getStatus(), getSchema()
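A projection is just a comma-separated list of column names, applied before any scanners are obtained. A small sketch under that assumption; the column names and the org.apache.hadoop.zebra.io package are hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.zebra.io.BasicTable; // assumed package location
import org.apache.hadoop.zebra.parser.ParseException;

public class ProjectionSketch {
  // Restrict subsequent reads to three hypothetical columns.
  static void selectNameColumns(BasicTable.Reader reader)
      throws ParseException, IOException {
    String projection = "FirstName,LastName,Department";
    reader.setProjection(projection); // affects getScanner/getStatus/getSchema
    // reader.setProjection(null);    // null selects all columns
  }
}
```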
public BasicTableStatus getStatus() throws java.io.IOException

Get the status of the BasicTable.

Throws:
java.io.IOException
public BlockDistribution getBlockDistribution(BasicTable.Reader.RangeSplit split) throws java.io.IOException

Given a split range, calculate how the file data that fall into the range are distributed among hosts.

Parameters:
split - The range-based split. Can be null to indicate the whole TFile.
Throws:
java.io.IOException
See Also:
rangeSplit(int)
public BlockDistribution getBlockDistribution(BasicTable.Reader.RowSplit split) throws java.io.IOException

Given a row-based split, calculate how the file data that fall into the split are distributed among hosts.

Parameters:
split - The row-based split. Cannot be null.
Throws:
java.io.IOException
public KeyDistribution getKeyDistribution(int n, int nTables, BlockDistribution lastBd) throws java.io.IOException

Collect some key samples and use them to partition the table. The returned KeyDistribution object also contains information on how data are distributed for each key-partitioned bucket.

Parameters:
n - Targeted size of the sampling.
nTables - Number of tables in the union.
Throws:
java.io.IOException
public TableScanner getScanner(org.apache.hadoop.io.BytesWritable beginKey, org.apache.hadoop.io.BytesWritable endKey, boolean closeReader) throws java.io.IOException

Get a scanner that reads all rows whose row keys fall in a specific range.

Parameters:
beginKey - The begin key of the scan range. If null, start from the first row in the table.
endKey - The end key of the scan range. If null, scan till the last row in the table.
closeReader - Close the underlying Reader object when we close the scanner. Should be set to true if we have only one scanner on top of the reader, so that resources are released after the scanner is closed.
Throws:
java.io.IOException
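A key-range scan on a sorted table might look like the sketch below. The org.apache.hadoop.zebra.io package, the UTF-8 key encoding, and the atEnd()/advance() iteration pattern on TableScanner are assumptions, not guaranteed by this page:

```java
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.zebra.io.BasicTable;   // assumed package location
import org.apache.hadoop.zebra.io.TableScanner; // assumed package location

public class KeyRangeScanSketch {
  // Scan all rows with keys in ["k100", "k200"); key encoding is hypothetical.
  static void scanRange(BasicTable.Reader reader) throws Exception {
    BytesWritable begin = new BytesWritable("k100".getBytes("UTF-8"));
    BytesWritable end   = new BytesWritable("k200".getBytes("UTF-8"));
    // closeReader=true: closing the scanner also closes the reader,
    // appropriate when this is the only scanner on top of it.
    TableScanner scanner = reader.getScanner(begin, end, true);
    try {
      while (!scanner.atEnd()) { // iteration pattern assumed
        // read the current row here, then:
        scanner.advance();
      }
    } finally {
      scanner.close();
    }
  }
}
```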
public TableScanner getScanner(BasicTable.Reader.RangeSplit split, boolean closeReader) throws java.io.IOException, org.apache.hadoop.zebra.parser.ParseException

Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RangeSplit object, which should be obtained from previous calls of rangeSplit(int).

Parameters:
split - The split range. If null, get a scanner to read the complete table.
closeReader - Close the underlying Reader object when we close the scanner. Should be set to true if we have only one scanner on top of the reader, so that resources are released after the scanner is closed.
Throws:
java.io.IOException
org.apache.hadoop.zebra.parser.ParseException
public TableScanner getScanner(boolean closeReader, BasicTable.Reader.RowSplit rowSplit) throws java.io.IOException, org.apache.hadoop.zebra.parser.ParseException

Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RowSplit object.

Parameters:
closeReader - Close the underlying Reader object when we close the scanner. Should be set to true if we have only one scanner on top of the reader, so that resources are released after the scanner is closed.
rowSplit - Split based on row numbers.
Throws:
java.io.IOException
org.apache.hadoop.zebra.parser.ParseException
public Schema getSchema()

Get the schema of the table. The returned schema may differ from the one returned by getSchema(Path, Configuration) if a projection has been set on the table.

public static Schema getSchema(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException

Get the BasicTable schema without loading the full table index.

Parameters:
path - The path to the BasicTable.
conf -
Throws:
java.io.IOException
public java.lang.String getPath()

public org.apache.hadoop.fs.PathFilter getPathFilter(org.apache.hadoop.conf.Configuration conf)
public java.util.List<BasicTable.Reader.RangeSplit> rangeSplit(int n) throws java.io.IOException

Split the table into at most n parts.

Parameters:
n - Maximum number of parts in the output list.
Throws:
java.io.IOException
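A typical parallel-read workflow is to split the table and then open one scanner per split. A sketch under the same assumptions as above (org.apache.hadoop.zebra.io package, atEnd()/advance() iteration on TableScanner):

```java
import java.util.List;
import org.apache.hadoop.zebra.io.BasicTable;   // assumed package location
import org.apache.hadoop.zebra.io.TableScanner; // assumed package location

public class RangeSplitSketch {
  // Split the table into at most 4 parts and scan each part in turn.
  static void scanBySplits(BasicTable.Reader reader) throws Exception {
    List<BasicTable.Reader.RangeSplit> splits = reader.rangeSplit(4);
    for (BasicTable.Reader.RangeSplit split : splits) {
      // closeReader=false: the reader stays open across multiple scanners.
      TableScanner scanner = reader.getScanner(split, false);
      try {
        while (!scanner.atEnd()) { // iteration pattern assumed
          scanner.advance();
        }
      } finally {
        scanner.close();
      }
    }
  }
}
```

Passing closeReader=false matters here: the same reader backs every scanner, so only the last user should release it.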
public java.util.List<BasicTable.Reader.RowSplit> rowSplit(long[] starts, long[] lengths, org.apache.hadoop.fs.Path[] paths, int splitCGIndex, int[] batchSizes, int numBatches) throws java.io.IOException

We already use FileInputFormat to create byte offset-based input splits.

Parameters:
starts - Array of starting byte offsets of the fileSplits.
lengths - Array of lengths of the fileSplits.
paths - Array of paths of the fileSplits.
splitCGIndex - Index of the column group that is used to create the fileSplits.
Throws:
java.io.IOException
public void rearrangeFileIndices(org.apache.hadoop.fs.FileStatus[] fileStatus) throws java.io.IOException

Rearrange the files according to the column group index ordering.

Parameters:
fileStatus - Array of FileStatus objects to be rearranged.
Throws:
java.io.IOException
public int getRowSplitCGIndex() throws java.io.IOException

Get the index of the column group that will be used for row-based split.

Throws:
java.io.IOException
public void close() throws java.io.IOException

Close the BasicTable for reading.

Specified by:
close in interface java.io.Closeable
Specified by:
close in interface java.lang.AutoCloseable
Throws:
java.io.IOException
public java.lang.String getDeletedCGs()
public static java.lang.String getDeletedCGs(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException

Throws:
java.io.IOException
public java.io.DataInputStream getMetaBlock(java.lang.String name) throws MetaBlockDoesNotExist, java.io.IOException

Obtain an input stream for reading a meta block.

Parameters:
name - The name of the meta block.
Throws:
java.io.IOException
MetaBlockDoesNotExist
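Since getMetaBlock returns a plain DataInputStream, reading a text meta block reduces to ordinary stream handling. In this sketch, the block name "comment" and its UTF-8 text content are hypothetical, as is the org.apache.hadoop.zebra.io package:

```java
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.InputStreamReader;
import org.apache.hadoop.zebra.io.BasicTable; // assumed package location

public class MetaBlockSketch {
  // Read a hypothetical UTF-8 text meta block named "comment".
  static String readComment(BasicTable.Reader reader) throws Exception {
    DataInputStream in = reader.getMetaBlock("comment");
    try {
      BufferedReader br = new BufferedReader(new InputStreamReader(in, "UTF-8"));
      StringBuilder sb = new StringBuilder();
      String line;
      while ((line = br.readLine()) != null) {
        sb.append(line).append('\n');
      }
      return sb.toString();
    } finally {
      in.close(); // also releases the underlying meta block resources
    }
  }
}
```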
Copyright © 2007-2012 The Apache Software Foundation