java.lang.Object
  com.cleancode.net.URLReader
public class URLReader
Fetches the contents of a URL with a variety of options.
Field Summary | |
---|---|
static String | PSEUDO_LINE_DELIMITER - For POST data on the command line, use this to indicate where actual line breaks go. |
static String | VERSION - Current version of this class. |
Constructor Summary | |
---|---|
URLReader(String urlString, boolean verbose, String proxyProperty) | Creates a URLReader object to fetch URLs. |
Method Summary | |
---|---|
String | getContent() - Returns the HTML content of the previously fetched URL. |
String | getText(boolean keepImages) - Returns text extracted from the content of the previously fetched URL. |
static void | main(String[] args) - Standalone program to fetch a URL. |
void | readFromURL() - Fetches a URL using URL objects to establish a connection. |
void | readFromURLConn(String[] args, String agent, String postData) - Fetches a URL using URLConnection objects. |
void | readRaw() - Fetches a URL in raw mode, using Socket objects to establish a connection. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
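public static final String VERSION
Current version of this class.
public static final String PSEUDO_LINE_DELIMITER
For POST data on the command line, use this to indicate where actual line breaks go.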
Constructor Detail |
---|
public URLReader(String urlString, boolean verbose, String proxyProperty)
Creates a URLReader object to fetch URLs.
urlString
- target URL
verbose
- diagnostic output flag
proxyProperty
- host:port string for the proxy server to use
Method Detail |
---|
public String getContent()
Returns the HTML content of the previously fetched URL.
public String getText(boolean keepImages)
Returns text extracted from the HTML content of the previously fetched URL, using the SimpleHtmlToText converter. Optionally, a marker can be retained for each image (i.e., the image file name in brackets).
keepImages
- whether to keep a marker for each image in the text
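SimpleHtmlToText itself is not documented on this page, so the sketch below swaps in a deliberately crude regex-based stripper purely to illustrate the keepImages behavior: each `<img>` tag becomes its file name in brackets when keepImages is true and disappears otherwise. HtmlToTextSketch and toText are hypothetical names, and regular expressions are not a robust substitute for a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for SimpleHtmlToText; illustrates keepImages only. */
final class HtmlToTextSketch {
    // Matches an <img ...> tag and captures its quoted src attribute value.
    private static final Pattern IMG =
        Pattern.compile("<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>",
                        Pattern.CASE_INSENSITIVE);
    // Matches any other tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String toText(String html, boolean keepImages) {
        Matcher m = IMG.matcher(html);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String src = m.group(1);
            // Keep only the file name, e.g. "images/logo.gif" -> "logo.gif".
            String name = src.substring(src.lastIndexOf('/') + 1);
            String marker = keepImages ? "[" + name + "]" : "";
            m.appendReplacement(sb, Matcher.quoteReplacement(marker));
        }
        m.appendTail(sb);
        // Replace remaining tags with spaces, then collapse runs of whitespace.
        String text = TAG.matcher(sb).replaceAll(" ");
        return text.replaceAll("\\s+", " ").trim();
    }
}
```

For example, `toText("<p>Hi <img src=\"images/logo.gif\"> there</p>", true)` returns `"Hi [logo.gif] there"`, while passing `false` returns `"Hi there"`.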
public void readRaw() throws IOException
Fetches a URL in raw mode, using Socket objects to establish a connection.
A raw HTTP GET command initiates the transaction.
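In raw mode, the request line and headers are written directly to a Socket and everything the server sends back, status line and headers included, is read as-is. The sketch below is a minimal illustration under assumptions this page does not state: plain HTTP on port 80 unless the URL names another port, an HTTP/1.0 request with `Connection: close` so the server closes the socket when it is done, and ISO-8859-1 decoding of the response. RawGetSketch, buildRequest, and fetch are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class RawGetSketch {
    /** Builds the text of a raw HTTP/1.0 GET request (pure; performs no I/O). */
    static String buildRequest(URI uri) {
        String rawPath = uri.getRawPath();
        String target = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        if (uri.getRawQuery() != null) {
            target += "?" + uri.getRawQuery();
        }
        // Include the port in the Host header only when it is not the default.
        String host = uri.getPort() == -1 ? uri.getHost() : uri.getHost() + ":" + uri.getPort();
        return "GET " + target + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    /** Sends the request over a plain Socket and returns the raw response, headers included. */
    static String fetch(URI uri) throws IOException {
        int port = uri.getPort() == -1 ? 80 : uri.getPort(); // assumed: plain HTTP only
        try (Socket socket = new Socket(uri.getHost(), port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(uri).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            in.transferTo(buf); // reads until the server closes the connection
            return buf.toString(StandardCharsets.ISO_8859_1);
        }
    }
}
```

Keeping request construction separate from the socket I/O makes the request text easy to inspect in verbose mode or in tests without touching the network.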
IOException
- if an I/O problem occurs
public void readFromURL() throws IOException
Fetches a URL using URL objects to establish a connection.
No cookies can be sent in this mode.
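Fetching through a URL object amounts to opening its stream and draining it. `URL.openStream()` provides no hook for adding request headers, which fits the restriction that cookies cannot be sent in this mode. In the sketch below, UrlStreamSketch and readAll are hypothetical names, and UTF-8 decoding is an assumption (a real fetcher would honor the charset the server reports).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class UrlStreamSketch {
    /** Reads the whole resource via URL.openStream(); no request headers can be set. */
    static String readAll(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // assumed charset
        }
    }
}
```

A call such as `readAll(URI.create("http://www.dell.com/").toURL())` fetches a page; because URL also handles `file:` URLs, the same method can be exercised offline.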
IOException
- if an I/O problem occurs
public void readFromURLConn(String[] args, String agent, String postData) throws IOException
Fetches a URL using URLConnection objects.
Cookies may be sent with the request.
If postData is empty, an HTTP GET is used. If postData is present,
the string is split into lines at each embedded PSEUDO_LINE_DELIMITER,
and the lines are then sent with an HTTP POST.
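The sketch below shows one way the behavior described above could map onto `java.net.URLConnection`: the elements of args after the URL are joined into a single Cookie header, a non-null agent sets User-Agent, and non-empty postData is split at each delimiter and written line by line as the request body, which turns an HTTP request into a POST. This page does not show the value of PSEUDO_LINE_DELIMITER, so the sketch takes the delimiter as a parameter. UrlConnSketch and its methods, the `"; "` cookie separator, the CRLF line endings, and UTF-8 encoding are all assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class UrlConnSketch {
    /** Joins args[1..] (everything after the URL) into one Cookie header value. */
    static String cookieHeader(String[] args) {
        return String.join("; ", Arrays.asList(args).subList(1, args.length));
    }

    /** Splits postData at each literal occurrence of the delimiter. */
    static List<String> postLines(String postData, String delimiter) {
        return Arrays.asList(postData.split(Pattern.quote(delimiter), -1));
    }

    static String fetch(String[] args, String agent, String postData, String delimiter)
            throws IOException {
        URLConnection conn = URI.create(args[0]).toURL().openConnection();
        if (args.length > 1) {
            conn.setRequestProperty("Cookie", cookieHeader(args));
        }
        if (agent != null) { // null: leave the system default User-Agent
            conn.setRequestProperty("User-Agent", agent);
        }
        if (postData != null && !postData.isEmpty()) {
            // For HTTP URLs, writing a request body switches the method from GET to POST.
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                for (String line : postLines(postData, delimiter)) {
                    out.write((line + "\r\n").getBytes(StandardCharsets.UTF_8));
                }
            }
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

With a hypothetical delimiter `"<BR>"`, `postLines("name=x<BR>age=3", "<BR>")` yields `[name=x, age=3]`; `Pattern.quote` ensures the delimiter is matched literally even if it contains regex metacharacters.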
args
- list of Strings beginning with the URL; the remaining elements are cookies
agent
- user agent string to send (or null to use the system default)
postData
- data to send via HTTP POST, if present
IOException
- if an I/O problem occurs
public static void main(String[] args)
Standalone program to fetch a URL.

Usage:
    java [ options ] URLReader url { cookie... }

Options:
    -Dproxy=<string>   - host:port specification for proxy server
    -Draw              - use sockets
    -Dtext             - convert HTML to text
    -Dtext=1           - convert HTML to text, but leave image references
    -Dverbose          - indicate program actions
    -Dagent=IE         - use IE 5.0 user agent identifier
    -Dagent=NS         - use NS 4.76 user agent identifier
    -Dagent=<string>   - use specified user agent identifier
    -Dpost=<string>    - data for HTTP POST [experimental]

Sample invocations:
    java URLReader "http://www.dell.com/"
    java URLReader "http://www.aaii.com/stkscrns/archive/" "session=LOEFGMO"
args
- command-line arguments.
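The options above are JVM system properties rather than program arguments, so they go before the class name and are read with `System.getProperty`. A bare `-Dtext` sets the property to the empty string, which is how `-Dtext` and `-Dtext=1` can be told apart. The sketch below decodes these options into a hypothetical UrlReaderOptions class; the field names, treating any `-Dtext` value other than `1` as plain text mode, and building the proxy with `InetSocketAddress.createUnresolved` are assumptions. The IE and NS agent strings are left out because this page does not give them.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.Properties;

/** Hypothetical holder for the command-line options described in the usage text. */
final class UrlReaderOptions {
    final boolean raw;         // -Draw
    final boolean text;        // -Dtext or -Dtext=1
    final boolean keepImages;  // -Dtext=1
    final boolean verbose;     // -Dverbose
    final String agent;        // -Dagent=<string>; null means use the system default
    final String post;         // -Dpost=<string>
    final Proxy proxy;         // -Dproxy=host:port, else Proxy.NO_PROXY

    UrlReaderOptions(Properties p) {
        // A bare -Dname sets the property to "", so presence (non-null) is what counts.
        raw = p.getProperty("raw") != null;
        String textValue = p.getProperty("text");
        text = textValue != null;
        keepImages = "1".equals(textValue);
        verbose = p.getProperty("verbose") != null;
        agent = p.getProperty("agent");
        post = p.getProperty("post");
        proxy = parseProxy(p.getProperty("proxy"));
    }

    /** Turns "host:port" into an HTTP proxy; the host is left unresolved (no DNS lookup). */
    static Proxy parseProxy(String spec) {
        if (spec == null || spec.isEmpty()) {
            return Proxy.NO_PROXY;
        }
        int colon = spec.lastIndexOf(':');
        if (colon <= 0 || colon == spec.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + spec);
        }
        String host = spec.substring(0, colon);
        int port = Integer.parseInt(spec.substring(colon + 1));
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }
}
```

In a `main` method this would be constructed as `new UrlReaderOptions(System.getProperties())`; accepting a `Properties` argument instead of reading `System.getProperties()` directly keeps the parsing testable.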
CleanCode Java Libraries | Copyright © 2001-2012 Michael Sorens - Revised 2012.12.10 |