com.cleancode.net
Class WebPageInspector

java.lang.Object
  extended by com.cleancode.net.WebPageInspector

public class WebPageInspector
extends Object

An interactive web page inspector that separates and identifies the components of a web page, including the extracted text of the page and a limited browser view of it. This is a tool to analyze the bits and pieces (some might say flotsam and jetsam) comprising a web page, from details about the connection and the URL to the representation of the page as plain text and rendered as HTML. For those familiar with the 7-layer model of networking, I like to think of this as the 8-layer model of a web page. You specify a URL, which is then fetched from the web. The connection and the page are analyzed into these components, each available as a separate tab in the program: connection, URL, header, document info, cookies, HTML text, plain text, and rendered web page content. (Example illustrations of each tab are available here.)

WebPageInspector is a GUI application constructed with standard Swing components, enhanced by my own com.cleancode.swing.* modules. A JComboBox is the main recipient of user input other than menu commands and buttons. You enter a URL in the box. That URL is then added to the list of choices in that JComboBox for quick repeat selection. Opening the pull-down menu of the JComboBox and selecting an earlier entry also moves that entry to the top. Hence, the pull-down serves as an ordered history list, most recent first. If you wish to load a web page on your local drive, use the open file command in the File menu. The file will also be entered into the history list in the pull-down for subsequent use.
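The move-to-top behavior of the history list can be illustrated with a plain Swing combo-box model. This is only a minimal sketch of the idea, not the actual WebPageInspector code; the class name UrlHistory and its touch method are hypothetical names chosen for the example.

```java
import javax.swing.DefaultComboBoxModel;

/** Hypothetical helper (not part of CleanCode): keeps a combo-box model ordered most recent first. */
final class UrlHistory {
    private final DefaultComboBoxModel<String> model = new DefaultComboBoxModel<>();

    /** Moves the URL to the top of the list, adding it if it is not already present. */
    void touch(String url) {
        model.removeElement(url);      // does nothing when the URL is not in the list
        model.insertElementAt(url, 0); // the newest entry always goes first
        model.setSelectedItem(url);
    }

    DefaultComboBoxModel<String> model() {
        return model;
    }

    public static void main(String[] args) {
        UrlHistory history = new UrlHistory();
        history.touch("http://example.com/a");
        history.touch("http://example.com/b");
        history.touch("http://example.com/a"); // re-selecting an earlier entry moves it back to the top
        for (int i = 0; i < history.model().getSize(); i++) {
            System.out.println(history.model().getElementAt(i));
        }
    }
}
```

A model like this could be handed to a JComboBox via its constructor, so that typing a new URL and picking an old one both go through the same touch call.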

Entering a new URL via the keyboard, selecting one via the pull-down box, or just executing the refresh command on the current URL will initiate a fetch and analysis cycle. The progress of the cycle is displayed, along with an elapsed time, in the status bar, just below the JComboBox. The fetch is the same as if the URL were entered in a browser, i.e. the request is sent over the World Wide Web and the corresponding web page is returned. The analysis involves taking slices of the web page data and its meta-data to populate each of the display component tables, corresponding to the tabs in the UI.

WebPageInspector uses the standard CleanCode diagnostics facility for logging. This allows you to trace the behavior of the program to any level of detail you desire. See the Diagnostic module for further details. Since WebPageInspector is a GUI application, directing diagnostic output to a log file is most appropriate. You specify where to put this log file in the configuration file.

Information about the server connection is presented in the first group of tabs. The Connection tab provides details available from the network transaction itself, including the full URL as known by the server. The URL is then broken down into its components on the URL tab, including the protocol (http, https, etc.), the port, the query (the portion following the question mark), and others. The Header tab displays information from the HTTP header of the transaction, i.e. meta-information sent with the web page but not on the web page proper. It includes, for example, the server response (200 OK vs. 404 Not Found, etc.), the type of the content (text, html, pdf, etc.), the length of the web page, the received cookies, the date of the transaction, the server version, etc. The cookies are further broken down on the Cookie tab, described later.
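To make the URL breakdown concrete, here is a small sketch that splits a URL into the kinds of components the URL tab lists, using the standard java.net.URI class. The class name UrlParts, the row labels, and the example URL are assumptions made for illustration; the actual tab may show different or additional rows.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: splits a URL into named components, in display order. */
final class UrlParts {
    static Map<String, String> split(String url) {
        URI uri = URI.create(url);
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("protocol", uri.getScheme());
        parts.put("host", uri.getHost());
        // getPort() returns -1 when the URL does not name a port explicitly
        parts.put("port", uri.getPort() == -1 ? "(default)" : String.valueOf(uri.getPort()));
        parts.put("path", uri.getPath());
        parts.put("query", uri.getQuery());       // the portion after '?', or null if absent
        parts.put("fragment", uri.getFragment()); // the portion after '#', or null if absent
        return parts;
    }

    public static void main(String[] args) {
        split("https://example.com:8443/search?q=swing&page=2#top")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```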

The Doc Info tab contains details extracted from the web page proper, including the DTD, the title (from the <title> element), meta data (i.e. specific data from the <meta> elements, as opposed to the HTTP header meta-information), included files (stylesheets, images, javascript, etc.), as well as all links on the page.
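As a rough illustration of this kind of extraction, the sketch below pulls the title and the link targets out of an HTML string with the HTML parser that ships with Swing. It is an assumption that a parser callback is a reasonable way to do this; the source does not say how WebPageInspector collects the data, and the class name DocInfoSketch is hypothetical. Included files and meta elements are left out to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: collects the <title> text and all <a href> targets of a page. */
final class DocInfoSketch extends HTMLEditorKit.ParserCallback {
    final StringBuilder title = new StringBuilder();
    final List<String> links = new ArrayList<>();
    private boolean inTitle;

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = true;
        } else if (tag == HTML.Tag.A) {
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = false;
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inTitle) {
            title.append(data); // only text between <title> and </title> is kept
        }
    }

    static DocInfoSketch parse(String html) {
        DocInfoSketch info = new DocInfoSketch();
        try {
            // third argument: ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), info, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return info;
    }

    public static void main(String[] args) {
        DocInfoSketch info = parse("<html><head><title>Demo page</title></head>"
            + "<body><a href=\"a.html\">A</a> <a href=\"http://example.com/b\">B</a></body></html>");
        System.out.println("title: " + info.title);
        System.out.println("links: " + info.links);
    }
}
```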

The Cookie tab separates out the components of each cookie sent by the server (the received cookies), including the name, value, domain, path, expiration, and security setting. The related Xmit Cookies tab handles transmitted cookies, i.e. cookies that you send back to the server. You may choose to send cookies or not with your URL. In the Cookie menu, you'll observe the three possibilities: no cookies, entered cookies, or stored cookies (entered and stored cookies refer to the tables on the Xmit Cookies tab). The stored cookies table provides a permanent repository, whereas the entered cookies table serves as a scratch pad. You may (via the Cookie menu) use either of these when you fetch a URL. If you wish, for example, to simply echo the same set of cookies you received when you fetch the URL, first copy the received cookies to the entered cookies table (via menu or via button on the Xmit Cookies tab). Then set the cookie mode to entered cookies on the Cookie menu.
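The per-cookie breakdown described above can be sketched with the standard java.net.HttpCookie class, which parses a Set-Cookie header into the same kinds of fields. This is only an illustration: the source does not say that WebPageInspector uses HttpCookie, and the class name CookieRows and the sample header are made up for the example. Note that HttpCookie reports expiration as a max-age in seconds rather than as a date.

```java
import java.net.HttpCookie;
import java.util.List;

/** Hypothetical helper: renders one received cookie as a single table-like row. */
final class CookieRows {
    /** Formats a cookie as "name | value | domain | path | max-age | secure". */
    static String describe(HttpCookie cookie) {
        return String.join(" | ",
            cookie.getName(),
            cookie.getValue(),
            String.valueOf(cookie.getDomain()),  // null when the header names no domain
            String.valueOf(cookie.getPath()),    // null when the header names no path
            String.valueOf(cookie.getMaxAge()),  // seconds until expiry; negative when unspecified
            String.valueOf(cookie.getSecure()));
    }

    public static void main(String[] args) {
        List<HttpCookie> cookies =
            HttpCookie.parse("Set-Cookie: sid=abc123; Domain=example.com; Path=/app; Secure");
        for (HttpCookie cookie : cookies) {
            System.out.println(describe(cookie));
        }
    }
}
```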

You may add, delete, or edit the cookies in the entered cookie table. Since the received cookie table (in the Cookie tab) is overwritten with each fetch of a URL, the stored cookie table is provided as a storage area to, for example, keep a copy of a cookie set before you start editing it. Note that the stored cookie table is not directly editable. To edit stored cookies, you must transfer the cookie set to the entered table, edit, then transfer them back to the stored cookie table.
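The three cookie modes (none, entered, stored) amount to choosing which table, if any, supplies the cookies sent with a request. The following sketch shows that selection under the assumption that each table can be viewed as a simple name-to-value map; the enum, the class name, and the method are hypothetical and only illustrate the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of choosing which cookie table feeds the outgoing request. */
final class CookieModeSketch {
    enum CookieMode { NONE, ENTERED, STORED }

    /** Returns the value for a Cookie request header, or null when nothing should be sent. */
    static String cookieHeader(CookieMode mode,
                               Map<String, String> entered,
                               Map<String, String> stored) {
        Map<String, String> source;
        switch (mode) {
            case ENTERED: source = entered; break; // the scratch-pad table
            case STORED:  source = stored;  break; // the persistent table
            default:      return null;             // NONE: send no cookies
        }
        if (source.isEmpty()) {
            return null;
        }
        StringJoiner header = new StringJoiner("; ");
        source.forEach((name, value) -> header.add(name + "=" + value));
        return header.toString();
    }

    public static void main(String[] args) {
        Map<String, String> entered = new LinkedHashMap<>();
        entered.put("sid", "abc123");
        entered.put("theme", "dark");
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("sid", "saved-last-month");

        for (CookieMode mode : CookieMode.values()) {
            System.out.println(mode + " -> " + cookieHeader(mode, entered, stored));
        }
    }
}
```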

The stored cookie table is also used for persistence across invocations. That is, whatever cookies you have moved to the stored cookie table for the current URL will be available the next time you run WebPageInspector. When you enter a URL in the input field (either by typing or by selecting from past history in the pull-down), then along with fetching the URL and getting new received cookies, the stored cookie table is reloaded with the cookies you last stored there. This allows you to, for example, compare cookies between one fetch and another, even if the earlier set is something you saved last month. If you wish to scan through the cookies that you have stored for various URLs in your history, switch to offline mode, then use the pull-down in the input field to switch to each URL (or Ctrl-N and Ctrl-P to scroll through them). Offline mode brings the selected URL to the top and loads the stored cookies (if any), but does not fetch the contents of the URL from the net.

The next group of tabs displays the contents of the web page itself, in various formats. The HTML tab displays the raw HTML, the source code of the web page. The Text and Content tabs, on the other hand, display variations of the rendering of the web page.

The Text tab gives a text-only representation of the web page, i.e. stripping out all HTML encoding, and adding minimal formatting that a text-only display could support (i.e. tabs, spaces, and returns).
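One way to approximate such a text-only rendering is to feed the HTML through the parser bundled with Swing and keep only the character data, adding a line break where a block element ends. This is a simplified sketch under that assumption, not the actual Text tab implementation; the class name PlainTextSketch is hypothetical, and real pages (scripts, styles, tables) would need more care.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: reduces HTML to plain text with minimal line-break formatting. */
final class PlainTextSketch {
    static String toPlainText(String html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TITLE) {
                    inTitle = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TITLE) {
                    inTitle = false;
                } else if (tag.isBlock()) {
                    text.append('\n'); // end of a paragraph, heading, etc.
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.BR) {
                    text.append('\n'); // explicit line break
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (!inTitle) {
                    text.append(data); // keep body text, skip the page title
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText(
            "<html><body><h1>Heading</h1><p>Hello <b>world</b></p></body></html>"));
    }
}
```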

The Content tab is a mini-browser using Java's native support for HTML display. Unfortunately, this is not terribly robust. The web pages I have tried just don't display with very clean formatting. I have, at least, added hyperlink support on the page with a chunk of code I found from Sun, so links on the page will work, and will add to the JComboBox and history list, just as if you had typed in the URL.
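Hyperlink support in a Swing HTML view is normally done with a HyperlinkListener registered on a non-editable JEditorPane. The sketch below is not the code from Sun mentioned above; it is a minimal stand-in showing the pattern, with the hypothetical class name LinkFollower and a callback that stands in for "load this URL and add it to the history".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.swing.event.HyperlinkEvent;
import javax.swing.event.HyperlinkListener;

/** Hypothetical listener: forwards activated links to a callback, ignoring mouse-over events. */
final class LinkFollower implements HyperlinkListener {
    private final Consumer<String> onNavigate;

    LinkFollower(Consumer<String> onNavigate) {
        this.onNavigate = onNavigate;
    }

    @Override
    public void hyperlinkUpdate(HyperlinkEvent event) {
        if (event.getEventType() != HyperlinkEvent.EventType.ACTIVATED) {
            return; // ENTERED and EXITED fire on mouse-over only
        }
        // getURL() may be null (e.g. a malformed link); fall back to the raw href text
        String target = event.getURL() != null
            ? event.getURL().toExternalForm()
            : event.getDescription();
        if (target != null) {
            onNavigate.accept(target);
        }
    }

    public static void main(String[] args) {
        List<String> visited = new ArrayList<>();
        LinkFollower follower = new LinkFollower(visited::add);
        // Simulate a click without building a GUI
        follower.hyperlinkUpdate(new HyperlinkEvent(
            new Object(), HyperlinkEvent.EventType.ACTIVATED, null, "http://example.com/next"));
        System.out.println(visited);
    }
}
```

In a real window the listener would be attached with addHyperlinkListener on a JEditorPane whose setEditable(false) has been called, since an editable pane does not fire link events.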

There are several files WebPageInspector users should be aware of. First, the configuration file CONFIG_FILE must be stored in the current directory (from which you execute the program). This configuration allows you to specify diagnostic settings as well as the location of the other files used by WebPageInspector. The history file--used to store the URLs that you visit and their cookies, for later recall--is by default stored in StorageMgr.DEFAULT_WPI_HISTORY_FILE, but you may specify a different location by setting the StorageMgr.HISTORY_FILE_PARAM parameter. Note that the history file is automatically created/saved when you exit the program, but you may force a save at any time via the Save History command on the File menu. Finally, if you enable diagnostics to output to a log file, the log directory is specified via the LOG_DIR parameter in the configuration file. The default is Diagnostic.DEFAULT_LOG_DIR.
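The precedence among built-in defaults, the configuration file, and command-line settings can be pictured with a short sketch. Everything about the format here is an assumption: the source does not show the configuration file syntax or the command-line syntax, so the example simply treats both as KEY=VALUE pairs read with java.util.Properties. Only the LOG_DIR parameter name comes from the text above; the class name ConfigSketch and the fallback value "logs" are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch of layered settings: default, then config file, then command line. */
final class ConfigSketch {
    // Assumed fallback; the real default is Diagnostic.DEFAULT_LOG_DIR, whose value is not shown here.
    static final String ASSUMED_DEFAULT_LOG_DIR = "logs";

    /** Loads KEY=VALUE text, then lets KEY=VALUE command-line arguments override it. */
    static Properties load(String configFileText, String[] args) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(configFileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq > 0) {
                config.setProperty(arg.substring(0, eq), arg.substring(eq + 1));
            }
        }
        return config;
    }

    static String logDir(Properties config) {
        // The default applies only when neither the file nor the command line sets LOG_DIR
        return config.getProperty("LOG_DIR", ASSUMED_DEFAULT_LOG_DIR);
    }

    public static void main(String[] args) {
        Properties config = load("LOG_DIR=/var/log/wpi\n", args);
        System.out.println("log directory: " + logDir(config));
    }
}
```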

Since:
CleanCode 0.9
Version:
$Revision: 9 $
Author:
Michael Sorens

Field Summary
static String CONFIG_FILE
          WebPageInspector configuration file name.
 String VERSION
          Current version of this class.
 
Constructor Summary
WebPageInspector(JFrame frame)
          Creates an instance of a WebPageInspector.
 
Method Summary
 void doAbout(ActionEvent event)
          Command: ABOUT -- describe program.
 void doAddCookie(ActionEvent event)
          Command: ADD-COOKIE -- add row for cookie.
 void doClear(ActionEvent event)
          Command: CLEAR -- Erase all URL history.
 void doClearCookies(ActionEvent event)
          Command: CLEAR-COOKIES -- erase entered cookie table.
 void doCopyRecvCookies(ActionEvent event)
          Command: COPY-RECV-COOKIES -- copy received cookies.
 void doCopyStoredCookies(ActionEvent event)
          Command: COPY-STORED-COOKIES -- copy stored cookies.
 void doCopyTab(ActionEvent event)
          Command: COPY-TAB -- copy tab contents to clipboard.
 void doDelCookie(ActionEvent event)
          Command: DELETE-COOKIE -- delete row for cookie.
 void doDeleteUrl(ActionEvent event)
          Command: DELETE-URL -- delete current URL from JComboBox.
 void doExit(ActionEvent event)
          Command: EXIT -- exit.
 void doOpenFile(ActionEvent event)
          Command: OPEN-FILE -- open local file.
 void doPasteCookie(ActionEvent event)
          Command: PASTE-COOKIE -- paste cookie on clipboard into 'entered' table.
 void doRefresh(ActionEvent event)
          Command: REFRESH -- fetch current URL again.
 void doSaveHistory(ActionEvent event)
          Command: SAVE-HISTORY -- save history to a file.
 void doSaveHtml(ActionEvent event)
          Command: SAVE-HTML-FILE -- save string to a file.
 void doSaveText(ActionEvent event)
          Command: SAVE-TEXT-FILE -- save string to a file.
 void doSetEnteredCookieMode(ActionEvent event)
          Command: SET COOKIE MODE -- entered.
 void doSetNoCookieMode(ActionEvent event)
          Command: SET COOKIE MODE -- none.
 void doSetOfflineStatus(ActionEvent event)
          Command: OFFLINE -- toggle offline state.
 void doSetStoredCookieMode(ActionEvent event)
          Command: SET COOKIE MODE -- stored.
 void doShowNextTab(ActionEvent event)
          Command: NEXT-TAB -- show next tab.
 void doShowNextUrl(ActionEvent event)
          Command: NEXT-URL -- show next URL in JComboBox.
 void doShowPrevTab(ActionEvent event)
          Command: PREV-TAB -- show previous tab.
 void doShowPrevUrl(ActionEvent event)
          Command: PREV-URL -- show previous URL in JComboBox.
 void doStoreCookies(ActionEvent event)
          Command: STORE-COOKIES -- store cookies.
static void main(String[] args)
          Main routine to operate WebPageInspector as a standalone GUI application.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CONFIG_FILE

public static final String CONFIG_FILE
WebPageInspector configuration file name. The configuration file must be in the current directory at the time you execute the program.

See Also:
Constant Field Values

VERSION

public final String VERSION
Current version of this class.

Constructor Detail

WebPageInspector

public WebPageInspector(JFrame frame)
Creates an instance of a WebPageInspector.

Parameters:
frame - JFrame object in which to build GUI.

Method Detail

doStoreCookies

public void doStoreCookies(ActionEvent event)
Command: STORE-COOKIES -- store cookies.

Parameters:
event - actionable event

doCopyStoredCookies

public void doCopyStoredCookies(ActionEvent event)
Command: COPY-STORED-COOKIES -- copy stored cookies.

Parameters:
event - actionable event

doCopyRecvCookies

public void doCopyRecvCookies(ActionEvent event)
Command: COPY-RECV-COOKIES -- copy received cookies.

Parameters:
event - actionable event

doDelCookie

public void doDelCookie(ActionEvent event)
Command: DELETE-COOKIE -- delete row for cookie.

Parameters:
event - actionable event

doAddCookie

public void doAddCookie(ActionEvent event)
Command: ADD-COOKIE -- add row for cookie.

Parameters:
event - actionable event

doClearCookies

public void doClearCookies(ActionEvent event)
Command: CLEAR-COOKIES -- erase entered cookie table.

Parameters:
event - actionable event

doSaveHistory

public void doSaveHistory(ActionEvent event)
Command: SAVE-HISTORY -- save history to a file.

Parameters:
event - actionable event

doOpenFile

public void doOpenFile(ActionEvent event)
Command: OPEN-FILE -- open local file.

Parameters:
event - actionable event

doSaveHtml

public void doSaveHtml(ActionEvent event)
Command: SAVE-HTML-FILE -- save string to a file.

Parameters:
event - actionable event

doSaveText

public void doSaveText(ActionEvent event)
Command: SAVE-TEXT-FILE -- save string to a file.

Parameters:
event - actionable event

doRefresh

public void doRefresh(ActionEvent event)
Command: REFRESH -- fetch current URL again.

Parameters:
event - actionable event

doShowNextTab

public void doShowNextTab(ActionEvent event)
Command: NEXT-TAB -- show next tab.

Parameters:
event - actionable event

doShowPrevTab

public void doShowPrevTab(ActionEvent event)
Command: PREV-TAB -- show previous tab.

Parameters:
event - actionable event

doShowNextUrl

public void doShowNextUrl(ActionEvent event)
Command: NEXT-URL -- show next URL in JComboBox.

Parameters:
event - actionable event

doShowPrevUrl

public void doShowPrevUrl(ActionEvent event)
Command: PREV-URL -- show previous URL in JComboBox.

Parameters:
event - actionable event

doSetOfflineStatus

public void doSetOfflineStatus(ActionEvent event)
Command: OFFLINE -- toggle offline state.

Parameters:
event - actionable event

doSetNoCookieMode

public void doSetNoCookieMode(ActionEvent event)
Command: SET COOKIE MODE -- none.

Parameters:
event - actionable event

doSetEnteredCookieMode

public void doSetEnteredCookieMode(ActionEvent event)
Command: SET COOKIE MODE -- entered.

Parameters:
event - actionable event

doSetStoredCookieMode

public void doSetStoredCookieMode(ActionEvent event)
Command: SET COOKIE MODE -- stored.

Parameters:
event - actionable event

doClear

public void doClear(ActionEvent event)
Command: CLEAR -- Erase all URL history.

Parameters:
event - actionable event

doCopyTab

public void doCopyTab(ActionEvent event)
Command: COPY-TAB -- copy tab contents to clipboard.

Parameters:
event - actionable event

doPasteCookie

public void doPasteCookie(ActionEvent event)
Command: PASTE-COOKIE -- paste cookie on clipboard into 'entered' table.

Parameters:
event - actionable event

doDeleteUrl

public void doDeleteUrl(ActionEvent event)
Command: DELETE-URL -- delete current URL from JComboBox.

Parameters:
event - actionable event

doExit

public void doExit(ActionEvent event)
Command: EXIT -- exit.

Parameters:
event - actionable event

doAbout

public void doAbout(ActionEvent event)
Command: ABOUT -- describe program.

Parameters:
event - actionable event

main

public static void main(String[] args)
                 throws IOException
Main routine to operate WebPageInspector as a standalone GUI application.

Parameters:
args - command-line settings to override configuration file.
Throws:
IOException - if default configuration file cannot be read


CleanCode Java Libraries Copyright © 2001-2012 Michael Sorens - Revised 2012.12.10