org.wikipedia.miner.extraction
Class XmlInputFormat

java.lang.Object
  extended by TextInputFormat
      extended by org.wikipedia.miner.extraction.XmlInputFormat

public class XmlInputFormat
extends TextInputFormat

Reads records that are delimited by a specific begin/end tag.
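
To illustrate what "delimited by a begin/end tag" means, here is a minimal, self-contained sketch of the idea in plain Java. It is not the code of this class: the class name `TagDelimitedRecords` and the method `extract` are hypothetical, it works on an in-memory string rather than a Hadoop input split, and it assumes that each record includes its enclosing tags, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of org.wikipedia.miner.
class TagDelimitedRecords {

    /**
     * Returns every block of text that begins with startTag and ends with the
     * next endTag. Text outside such blocks is skipped, and a trailing block
     * with no closing tag is dropped.
     */
    static List<String> extract(String text, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startTag, from);
            if (start < 0) {
                break; // no further record begins in the remaining text
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record: discard it
            }
            int stop = end + endTag.length();
            // Assumption: the record keeps its start and end tags.
            records.add(text.substring(start, stop));
            from = stop;
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<doc><page>a</page>junk<page>b</page></doc>";
        for (String record : extract(xml, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

In a real job the start and end tags would presumably be supplied through the job configuration under `START_TAG_KEY` and `END_TAG_KEY` rather than passed as arguments.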


Nested Class Summary
static class XmlInputFormat.XmlRecordReader
          Record reader that scans a given XML document and emits, as one record each, the XML blocks enclosed by the specified start tag and end tag.
 
Field Summary
static java.lang.String END_TAG_KEY
           
static java.lang.String START_TAG_KEY
           
 
Constructor Summary
XmlInputFormat()
           
 
Method Summary
 RecordReader<LongWritable,Text> getRecordReader(InputSplit inputSplit, JobConf jobConf, Reporter reporter)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

START_TAG_KEY

public static final java.lang.String START_TAG_KEY
See Also:
Constant Field Values

END_TAG_KEY

public static final java.lang.String END_TAG_KEY
See Also:
Constant Field Values
Constructor Detail

XmlInputFormat

public XmlInputFormat()
Method Detail

getRecordReader

public RecordReader<LongWritable,Text> getRecordReader(InputSplit inputSplit,
                                                       JobConf jobConf,
                                                       Reporter reporter)
                                                throws java.io.IOException
Throws:
java.io.IOException