net.firstpartners.rp.back.extractor
Class GenericDataExtractor

java.lang.Object
  extended by net.firstpartners.rp.back.extractor.GenericDataExtractor
All Implemented Interfaces:
IDataExtractor, IPlugin
Direct Known Subclasses:
PdfDataExtractor, WebDataExtractor, XmlDataExtractor

public class GenericDataExtractor
extends java.lang.Object
implements IDataExtractor

Extracts information from binary files in a format that can be added to an Index.

Author:
brownpa

Field Summary
private  java.lang.String[] listExtensions
          Supported extension list
protected  org.apache.log4j.Logger logger
          Logger for this class and subclasses
private  int maxLengthSummary
          Maximum length of the summary
private  int minLengthWord
          Minimum length of words
private  java.lang.String notIgnoreChars
          String pattern which defines the characters not to ignore
private  java.lang.String replaceChars
          String pattern which defines the characters to be replaced
private  java.lang.String type
          Extractor type
 
Constructor Summary
GenericDataExtractor()
           
 
Method Summary
 int canHandle(INewInformation info)
          How well the plugin thinks it can handle a new piece of information
 void convert(INewInformation info)
          Convert the file information into tuples
 java.lang.String[] getListExtensions()
          List of the supported extensions
 int getMaxLengthSummary()
           
 int getMinLengthWord()
           
 java.lang.String getNotIgnoreChars()
           
 java.lang.String getOriginalUri()
          The original place where we got this data
 java.lang.String getReplaceChars()
           
 java.lang.String getType()
          The type of the extractor
 void onLoad()
          Carry out any initialization tasks
 void setListExtensions(java.lang.String[] listExtensions)
           
 void setMaxLengthSummary(int maxLengthSummary)
           
 void setMinLengthWord(int minLengthWord)
           
 void setNotIgnoreChars(java.lang.String notIgnoreChars)
           
 void setReplaceChars(java.lang.String replaceChars)
           
 void setType(java.lang.String type)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected final org.apache.log4j.Logger logger
Logger for this class and subclasses


type

private java.lang.String type
Extractor type


listExtensions

private java.lang.String[] listExtensions
Supported extension list


minLengthWord

private int minLengthWord
Minimum length of words


maxLengthSummary

private int maxLengthSummary
Maximum length of the summary


notIgnoreChars

private java.lang.String notIgnoreChars
String pattern which defines the characters not to ignore


replaceChars

private java.lang.String replaceChars
String pattern which defines the characters to be replaced
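
The fields above suggest a simple text-cleaning step. The following is a minimal sketch of how such settings might be applied, not the class's actual implementation. It assumes that replaceChars is a regular expression whose matches are replaced by spaces, that words shorter than minLengthWord are dropped, and that the summary is the leading maxLengthSummary characters. TextCleaningSketch, keepWords and summarize are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented API.
final class TextCleaningSketch {

    // Assumption: replaceChars is a regular expression; every match becomes a space.
    // Words shorter than minLengthWord are discarded.
    static List<String> keepWords(String text, String replaceChars, int minLengthWord) {
        String cleaned = text.replaceAll(replaceChars, " ");
        List<String> words = new ArrayList<>();
        for (String word : cleaned.trim().split("\\s+")) {
            if (word.length() >= minLengthWord) {
                words.add(word);
            }
        }
        return words;
    }

    // Assumption: the summary is simply the first maxLengthSummary characters.
    static String summarize(String text, int maxLengthSummary) {
        return text.length() <= maxLengthSummary
                ? text
                : text.substring(0, maxLengthSummary);
    }
}
```

With replaceChars set to a pattern matching commas and hyphens and minLengthWord set to 3, the single-letter word in "a big, red-hot stove" would be dropped and the remaining words kept.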

Constructor Detail

GenericDataExtractor

public GenericDataExtractor()
Method Detail

getOriginalUri

public java.lang.String getOriginalUri()
The original place where we got this data

Specified by:
getOriginalUri in interface IDataExtractor
Returns:
A pointer to the original location of the data

onLoad

public void onLoad()
Carry out any initialization tasks

Specified by:
onLoad in interface IPlugin

canHandle

public int canHandle(INewInformation info)
How well the plugin thinks it can handle a new piece of information

Specified by:
canHandle in interface IDataExtractor
Parameters:
info - Information to be handled
Returns:
1 if the location can be opened successfully.
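
Because canHandle returns a score rather than a boolean, a caller could compare scores across several extractors. The sketch below assumes that a higher score means a better fit and that a score of 0 or less means the extractor declines the input; neither rule is stated in this page. ScoringExtractor and ExtractorSelectionSketch are hypothetical stand-ins, and a plain String location replaces INewInformation to keep the example self-contained.

```java
import java.util.List;

// Hypothetical stand-in for the documented interface; only canHandle is modeled.
interface ScoringExtractor {
    int canHandle(String location);
}

// Hypothetical helper, not part of the documented API.
final class ExtractorSelectionSketch {

    // Returns the extractor with the highest positive score,
    // or null when no extractor claims the input.
    static ScoringExtractor pickBest(List<ScoringExtractor> extractors, String location) {
        ScoringExtractor best = null;
        int bestScore = 0;
        for (ScoringExtractor candidate : extractors) {
            int score = candidate.canHandle(location);
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Under these assumptions, a generic extractor that always answers 1 acts as a fallback, while a more specialized extractor can outrank it by returning a larger value for inputs it recognizes.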

convert

public void convert(INewInformation info)
             throws RpException
Convert the file information into tuples

Specified by:
convert in interface IDataExtractor
Parameters:
info - Information to be converted
Throws:
RpException - If an error occurs while processing the file

getMaxLengthSummary

public int getMaxLengthSummary()
Returns:
Returns the maxLengthSummary.

setMaxLengthSummary

public void setMaxLengthSummary(int maxLengthSummary)
Parameters:
maxLengthSummary - The maxLengthSummary to set.

getMinLengthWord

public int getMinLengthWord()
Returns:
Returns the minLengthWord.

setMinLengthWord

public void setMinLengthWord(int minLengthWord)
Parameters:
minLengthWord - The minLengthWord to set.

getNotIgnoreChars

public java.lang.String getNotIgnoreChars()
Returns:
Returns the notIgnoreChars.

setNotIgnoreChars

public void setNotIgnoreChars(java.lang.String notIgnoreChars)
Parameters:
notIgnoreChars - The notIgnoreChars to set.

getReplaceChars

public java.lang.String getReplaceChars()
Returns:
Returns the replaceChars.

setReplaceChars

public void setReplaceChars(java.lang.String replaceChars)
Parameters:
replaceChars - The replaceChars to set.

getType

public java.lang.String getType()
Description copied from interface: IDataExtractor
The type of the extractor

Specified by:
getType in interface IDataExtractor
Returns:
Returns the type.

setType

public void setType(java.lang.String type)
Parameters:
type - The type to set.

getListExtensions

public java.lang.String[] getListExtensions()
Description copied from interface: IDataExtractor
List of the supported extensions

Specified by:
getListExtensions in interface IDataExtractor
Returns:
Returns the listExtensions.

setListExtensions

public void setListExtensions(java.lang.String[] listExtensions)
Parameters:
listExtensions - The listExtensions to set.
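
As an illustration of how the supported extension list might be consulted, here is a minimal sketch. It assumes that extensions are stored without a leading dot and compared case-insensitively; this page does not specify either detail. ExtensionMatchSketch and supports are hypothetical names.

```java
import java.util.Locale;

// Hypothetical helper, not part of the documented API.
final class ExtensionMatchSketch {

    // Assumption: listExtensions holds entries such as "pdf" (no leading dot),
    // and matching ignores case.
    static boolean supports(String[] listExtensions, String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension present
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (String supported : listExtensions) {
            if (supported.toLowerCase(Locale.ROOT).equals(extension)) {
                return true;
            }
        }
        return false;
    }
}
```

A file name without any extension is treated as unsupported in this sketch.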