net.firstpartners.rp.back.extractor.util
Class Spider

java.lang.Object
  extended by net.firstpartners.rp.back.extractor.util.Spider

public class Spider
extends java.lang.Object

Class responsible for spidering the document at the given URI location and collecting its information (title, author, description, links and values)

Version:
1.1
Author:
Firstpartners.net

Nested Class Summary
 class Spider.SpiderParserCallback
          Inner class that receives the parser callbacks
 
Field Summary
private  java.lang.String author
          Author of the document
private  java.net.URL base
          Base of the links inside document
private  java.lang.String description
          Description (Summary) of the document
private  java.util.LinkedList links
          Links list
protected  org.apache.log4j.Logger logger
          Logger for this class and subclasses
private  int maxLengthDesc
          Maximum length for the description
private  java.lang.String title
          Title of the document
private  java.lang.String uri
          Uri location of the document to spider
private  java.util.LinkedList values
          Values list
 
Constructor Summary
Spider(java.lang.String uri)
          Creates a new Spider object for the specified location, with no maximum summary length specified
Spider(java.lang.String uri, int lengthSummary)
          Creates a new Spider object for the specified location and maximum length of the required summary
 
Method Summary
 void addLink(java.net.URL u)
          Add the URL object to the links list
 void addValue(java.lang.String value)
          Add the value to the values list
 java.lang.String fixHref(java.lang.String href)
          Repairs a sloppy href: flips backslashes to forward slashes and adds a missing /
 java.lang.String getAuthor()
          Get the author
 java.net.URL getBase()
          Get the document base
 java.lang.String getDescription()
          Get the page description
 java.util.LinkedList getLinks()
          Get the list of the links from the document
 int getMaxLengthDesc()
          Get the maximum length for the description
 java.lang.String getTitle()
          Get the title of the document
 java.lang.String getUri()
          Get the uri of the document
 java.util.LinkedList getValues()
          Get the list of the document values
 void setAuthor(java.lang.String author)
          Set the page author
 void setBase(java.lang.String abase)
          Set the document base
 void setDescription(java.lang.String description)
          Set the page description
 void setLinks(java.util.LinkedList links)
          Set the list of links
 void setTitle(java.lang.String title)
          Set the title of the document
 void start()
          Start to spider the data
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected final org.apache.log4j.Logger logger
Logger for this class and subclasses


uri

private java.lang.String uri
Uri location of the document to spider


maxLengthDesc

private int maxLengthDesc
Maximum length for the description


base

private java.net.URL base
Base of the links inside document


title

private java.lang.String title
Title of the document


author

private java.lang.String author
Author of the document


description

private java.lang.String description
Description (Summary) of the document


links

private java.util.LinkedList links
Links list


values

private java.util.LinkedList values
Values list

Constructor Detail

Spider

public Spider(java.lang.String uri,
              int lengthSummary)
Creates a new Spider object for the specified location and maximum length of the required summary

Parameters:
uri - Uri location to spider
lengthSummary - Maximum length of the required summary

Spider

public Spider(java.lang.String uri)
Creates a new Spider object for the specified location, with no maximum summary length specified

Parameters:
uri - Uri location to spider
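The page does not say how maxLengthDesc is applied to the description. A minimal sketch, assuming the description is simply cut to at most that many characters and that the one-argument constructor means "no limit". The SummaryLimitSketch class and its NO_LIMIT sentinel are hypothetical names, not part of this API.

```java
/**
 * Hypothetical sketch of applying a maximum description length.
 * NO_LIMIT is an invented sentinel standing in for "no summary length specified".
 */
final class SummaryLimitSketch {
    static final int NO_LIMIT = -1;

    private SummaryLimitSketch() {
    }

    /** Returns the description cut to at most maxLengthDesc characters; any negative limit means no limit. */
    static String limit(String description, int maxLengthDesc) {
        if (description == null || maxLengthDesc < 0 || description.length() <= maxLengthDesc) {
            return description;
        }
        return description.substring(0, maxLengthDesc);
    }
}
```

A real implementation might prefer to cut at a word boundary; the sketch keeps to a plain character limit.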
Method Detail

start

public void start()
           throws RpException
Start to spider the data

Throws:
RpException - Exception in parsing the data
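The nested SpiderParserCallback class suggests that start() feeds the page to a parser callback. The sketch below assumes that parser is the JDK's Swing HTML parser (javax.swing.text.html.HTMLEditorKit.ParserCallback driven by ParserDelegator). This page does not state that, so treat it as an assumption. SpiderCallbackSketch and its fields are hypothetical and only mirror the title, author, description and links fields listed above. To stay self-contained, it parses a string instead of fetching the URI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedList;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical callback collecting the title, author/description meta tags and raw hrefs. */
final class SpiderCallbackSketch extends HTMLEditorKit.ParserCallback {
    String title;
    String author;
    String description;
    final LinkedList<String> hrefs = new LinkedList<>();
    private boolean inTitle;
    private final StringBuilder titleText = new StringBuilder();

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TITLE) {
            inTitle = true;
        } else if (t == HTML.Tag.A) {
            Object href = a.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                hrefs.add(href.toString());
            }
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TITLE) {
            inTitle = false;
            title = titleText.toString().trim();
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inTitle) {
            titleText.append(data);
        }
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // <meta> is an empty element, so it arrives here rather than in handleStartTag.
        if (t == HTML.Tag.META) {
            Object name = a.getAttribute(HTML.Attribute.NAME);
            Object content = a.getAttribute(HTML.Attribute.CONTENT);
            if (name != null && content != null) {
                if ("author".equalsIgnoreCase(name.toString())) {
                    author = content.toString();
                } else if ("description".equalsIgnoreCase(name.toString())) {
                    description = content.toString();
                }
            }
        }
    }

    /** Parses an HTML string; a real spider would read the document's stream instead. */
    static SpiderCallbackSketch parse(String html) {
        SpiderCallbackSketch cb = new SpiderCallbackSketch();
        try {
            // ignoreCharSet = true: keep parsing even if the page declares a charset in a <meta> tag.
            new ParserDelegator().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cb;
    }
}
```

A real start() would presumably read from the URI's input stream and turn each href into a java.net.URL before passing it to addLink. Any failure there would then be wrapped in RpException.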

fixHref

public java.lang.String fixHref(java.lang.String href)
Repairs a sloppy href: flips backslashes to forward slashes and adds a missing /

Parameters:
href - web site reference
Returns:
repaired web page reference
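A minimal sketch of one reading of fixHref. Backslashes become forward slashes, and an absolute URL with no path gets the missing / after its host (for example, http://example.com becomes http://example.com/). The page does not document where the original inserts the missing /, so that part is an assumption. HrefFixSketch is a hypothetical name.

```java
/**
 * Hypothetical sketch of href repair: flips backslashes and, for an absolute
 * URL whose host is not followed by a path, inserts the missing "/".
 */
final class HrefFixSketch {
    private HrefFixSketch() {
    }

    static String fixHref(String href) {
        if (href == null) {
            return null;
        }
        // Flip backwards slashes, e.g. "docs\index.html" becomes "docs/index.html".
        String fixed = href.trim().replace('\\', '/');
        int schemeEnd = fixed.indexOf("://");
        if (schemeEnd < 0) {
            return fixed; // relative href: nothing more to repair here
        }
        int hostStart = schemeEnd + 3;
        int hostEnd = fixed.length();
        // The host part ends at the first '/', '?' or '#'.
        for (int i = hostStart; i < fixed.length(); i++) {
            char c = fixed.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                hostEnd = i;
                break;
            }
        }
        // Assumption: "adds missing /" means supplying the empty path after the host.
        if (hostEnd > hostStart && (hostEnd == fixed.length() || fixed.charAt(hostEnd) != '/')) {
            fixed = fixed.substring(0, hostEnd) + "/" + fixed.substring(hostEnd);
        }
        return fixed;
    }
}
```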

addLink

public void addLink(java.net.URL u)
Add the URL object to the links list

Parameters:
u - URL object

addValue

public void addValue(java.lang.String value)
Add the value to the values list

Parameters:
value - Value to add

getAuthor

public java.lang.String getAuthor()
Get the author

Returns:
Author of the page

setAuthor

public void setAuthor(java.lang.String author)
Set the page author

Parameters:
author - Page author

getBase

public java.net.URL getBase()
Get the document base

Returns:
Document base

setBase

public void setBase(java.lang.String abase)
Set the document base

Parameters:
abase - Document base
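setBase takes a String, but the base field and getBase() use java.net.URL, so the string has to be converted somewhere. The following is a minimal sketch. It assumes an unparseable base simply clears the field. It also adds a hypothetical resolve(String) helper that turns a relative href into the absolute URL that addLink(URL) expects. BaseSketch is not part of this API.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/**
 * Hypothetical sketch of the setBase/getBase pair: the base is kept as a
 * java.net.URL, as the base field is, although setBase accepts a String.
 */
final class BaseSketch {
    private URL base;

    void setBase(String abase) {
        if (abase == null) {
            base = null;
            return;
        }
        try {
            // URI.create rejects malformed syntax; toURL() rejects relative URIs
            // and schemes without a protocol handler.
            base = URI.create(abase.trim()).toURL();
        } catch (IllegalArgumentException | MalformedURLException e) {
            base = null; // assumption: an unusable base just clears the field
        }
    }

    URL getBase() {
        return base;
    }

    /** Resolves href against the base; returns null if that is not possible. */
    URL resolve(String href) {
        if (base == null || href == null) {
            return null;
        }
        try {
            return base.toURI().resolve(href.trim()).toURL();
        } catch (URISyntaxException | IllegalArgumentException | MalformedURLException e) {
            return null;
        }
    }
}
```

The sketch goes through java.net.URI for two reasons. URI.resolve normalizes ".." segments, and the public URL constructors are deprecated in recent JDKs.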

getDescription

public java.lang.String getDescription()
Get the page description

Returns:
Page description

setDescription

public void setDescription(java.lang.String description)
Set the page description

Parameters:
description - Page description

getLinks

public java.util.LinkedList getLinks()
Get the list of the links from the document

Returns:
List of links from the document

setLinks

public void setLinks(java.util.LinkedList links)
Set the list of links

Parameters:
links - List of the links of the document
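getLinks() and setLinks() use a raw java.util.LinkedList, so callers have to cast its elements. The following is a minimal sketch of copying that list into a typed List&lt;URL&gt;. It assumes the list holds the java.net.URL objects added through addLink. LinkListSketch is a hypothetical helper that silently skips any element that is not a URL.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper for callers of a raw links list such as the one getLinks() returns. */
final class LinkListSketch {
    private LinkListSketch() {
    }

    /** Copies only the java.net.URL elements into a new typed list. */
    static List<URL> urlsOf(Collection<?> rawLinks) {
        List<URL> urls = new ArrayList<>();
        if (rawLinks == null) {
            return urls;
        }
        for (Object o : rawLinks) {
            if (o instanceof URL) {
                urls.add((URL) o);
            }
        }
        return urls;
    }
}
```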

getMaxLengthDesc

public int getMaxLengthDesc()
Get the maximum length for the description

Returns:
Maximum length for the description

getTitle

public java.lang.String getTitle()
Get the title of the document

Returns:
Title of the document

setTitle

public void setTitle(java.lang.String title)
Set the title of the document

Parameters:
title - Title of the document

getUri

public java.lang.String getUri()
Get the uri of the document

Returns:
Document location

getValues

public java.util.LinkedList getValues()
Get the list of the document values

Returns:
List of the values from the document