net.firstpartners.rp.back.extractor
Class WebDataExtractor

java.lang.Object
  extended by net.firstpartners.rp.back.extractor.GenericDataExtractor
      extended by net.firstpartners.rp.back.extractor.WebDataExtractor
All Implemented Interfaces:
IDataExtractor, IPlugin

public class WebDataExtractor
extends GenericDataExtractor

Extracts information from a web query (for example, a Google search).

Author:
brownpa

Field Summary
protected  org.apache.log4j.Logger logger
          Logger for this class and subclasses
 
Constructor Summary
WebDataExtractor()
           
 
Method Summary
 int canHandle(INewInformation info)
          How well the plugin thinks it can handle a new piece of information
 void convert(INewInformation info)
          Convert the web information into a list of documents
 java.lang.String getOriginalUri()
          The original place where we got this data
 void onLoad()
          Carry out any initialization tasks
 
Methods inherited from class net.firstpartners.rp.back.extractor.GenericDataExtractor
getListExtensions, getMaxLengthSummary, getMinLengthWord, getNotIgnoreChars, getReplaceChars, getType, setListExtensions, setMaxLengthSummary, setMinLengthWord, setNotIgnoreChars, setReplaceChars, setType
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected final org.apache.log4j.Logger logger
Logger for this class and subclasses

Constructor Detail

WebDataExtractor

public WebDataExtractor()
Method Detail

getOriginalUri

public java.lang.String getOriginalUri()
The original place where we got this data

Specified by:
getOriginalUri in interface IDataExtractor
Overrides:
getOriginalUri in class GenericDataExtractor
Returns:
the URI of the original source of this data

onLoad

public void onLoad()
Carry out any initialization tasks

Specified by:
onLoad in interface IPlugin
Overrides:
onLoad in class GenericDataExtractor

canHandle

public int canHandle(INewInformation info)
How well the plugin thinks it can handle a new piece of information

Specified by:
canHandle in interface IDataExtractor
Overrides:
canHandle in class GenericDataExtractor
Parameters:
info - the new piece of information to evaluate
Returns:
an int indicating how well this plugin thinks it can handle this new piece of information
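Because canHandle returns a score rather than a boolean, a caller would presumably compare the scores of several extractors and use the one that reports the best fit. The following is a minimal sketch of that idea, not the framework's actual selection logic. The nested Info and Extractor interfaces and the pickBest method are hypothetical stand-ins for INewInformation and IDataExtractor, and the sketch assumes that a higher score means a better match; the scale of the returned int is not documented here.

```java
import java.util.Arrays;
import java.util.List;

public class ExtractorSelectionSketch {

    /** Hypothetical stand-in for INewInformation; the real interface is not shown on this page. */
    public interface Info {
        String getUri();
    }

    /** Hypothetical stand-in exposing only the canHandle method of IDataExtractor. */
    public interface Extractor {
        int canHandle(Info info);
    }

    /**
     * Returns the candidate reporting the highest score, or null if the list is empty.
     * ASSUMPTION: a higher score means the extractor is a better match.
     */
    public static Extractor pickBest(List<Extractor> candidates, Info info) {
        Extractor best = null;
        int bestScore = Integer.MIN_VALUE;
        for (Extractor candidate : candidates) {
            int score = candidate.canHandle(info);
            // Keep the first candidate seen, then replace it only on a strictly higher score.
            if (best == null || score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Info info = () -> "http://example.com/search?q=test";
        // Illustrative scoring rules only; the real extractors decide their own scores.
        Extractor web = i -> i.getUri().startsWith("http") ? 10 : 0;
        Extractor fallback = i -> 1;
        Extractor chosen = pickBest(Arrays.asList(fallback, web), info);
        System.out.println(chosen == web ? "web extractor chosen" : "fallback chosen");
    }
}
```

On ties this sketch keeps the first candidate in the list, which is one possible policy among several.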

convert

public void convert(INewInformation info)
             throws RpException
Convert the web information into a list of documents

Specified by:
convert in interface IDataExtractor
Overrides:
convert in class GenericDataExtractor
Parameters:
info - Information to be converted
Throws:
RpException - if an error occurs while processing the data
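Since convert returns void and declares RpException, a caller needs to handle the failure case explicitly. The sketch below shows one way to wrap the call. The nested RpException, Info and Extractor types and the tryConvert helper are hypothetical stand-ins defined only so the example compiles on its own, and RpException is assumed to be a checked exception. Where the converted documents end up is not documented on this page, so the sketch does not try to read them back.

```java
public class ConvertCallSketch {

    /** Hypothetical stand-in for RpException; ASSUMED to be a checked exception. */
    public static class RpException extends Exception {
        public RpException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for INewInformation. */
    public interface Info {
        String getUri();
    }

    /** Hypothetical stand-in exposing only the convert method of IDataExtractor. */
    public interface Extractor {
        void convert(Info info) throws RpException;
    }

    /** Calls convert and reports whether it completed without throwing RpException. */
    public static boolean tryConvert(Extractor extractor, Info info) {
        try {
            extractor.convert(info);
            return true;
        } catch (RpException e) {
            // Log and continue so one bad source does not stop the caller.
            System.err.println("Conversion failed for " + info.getUri() + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Info info = () -> "http://example.com/search?q=test";
        Extractor succeeding = i -> { };
        Extractor failing = i -> { throw new RpException("simulated failure"); };
        System.out.println(tryConvert(succeeding, info));
        System.out.println(tryConvert(failing, info));
    }
}
```

Returning a boolean is a simplification for the sketch; a real caller might instead rethrow, retry, or record the failure against the original URI.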