CleanCode Perl Libraries |
slice - Return a slice of one or more files specified by pattern and offset.
slice options files
--bodyTag | --nobodyTag
Adds html and body tag brackets around the extracted text, i.e. <html><body>...</body></html>.
--titleTag=string
String used to generate a title tag and an h1 tag. Requires the bodyTag option as well. Changes the bracketing tags to: <html><head><title>...</title></head><body><h1>...</h1>...</body></html>.
--startText=string
String (typically opening HTML fragment) printed preceding each sliced file. See the note on interpolated text below.
--middleText=string
String (typically HTML row/cell tag fragments) printed between each pair of sliced files. Not printed if only one file to slice. See the note on interpolated text below.
--endText=string
String (typically closing HTML fragment) printed following each sliced file. See the note on interpolated text below.
--startPat=pattern
Start extraction with first occurrence of pattern.
--stopPat=pattern
Stop extraction with first occurrence of pattern. If omitted, or not found, extracts through end of file.
--startAdj=[!]pattern | integer
If a pattern, adjusts starting line determined by startPat by searching forward (or backward with ! prefix). If a number, adjusts the starting line by the number (positive or negative).
--stopAdj=[!]pattern | integer
If a pattern, adjusts ending line determined by stopPat by searching forward (or backward with ! prefix). If a number, adjusts the ending line by the number (positive or negative).
--colPattern=pattern
After slicing by rows via the various start and stop options, you may additionally slice by columns by specifying a pattern to match within each line. If omitted, entire line is returned as part of the extraction. If included, you must include exactly one subexpression group (with parentheses) to grab a piece of text; otherwise, you'll just get a count of what was matched. If the pattern fails, the entire line is skipped (i.e. you do not get the original line, nor a blank line--you get no line!).
--verbose | --noverbose
If true, prints info about matched line numbers.
One or more files to slice. If no file specified, reads from STDIN.
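The column slicing described under --colPattern can be sketched in a few lines of plain Perl. This is only an illustration of the documented behavior, not the code inside slice.pl; the helper name slice_columns and the sample rows are hypothetical. Note that in this sketch a pattern without a group yields the value 1 for each matching line, which is how an ordinary Perl match behaves in list context; the script itself may report something different.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the captured piece of each matching line.
sub slice_columns {
    my ($pattern, @lines) = @_;
    my @out;
    for my $line (@lines) {
        # In list context a successful match returns its captured groups;
        # with no group in the pattern it returns the single value 1.
        my ($piece) = $line =~ /$pattern/;
        next unless defined $piece;     # pattern failed: skip the line entirely
        push @out, $piece;
    }
    return @out;
}

my @rows = ("name=alpha size=10", "no fields here", "name=beta size=7");
print join(",", slice_columns('name=(\w+)', @rows)), "\n";   # prints alpha,beta
print join(",", slice_columns('size=\d+',   @rows)), "\n";   # prints 1,1
```

The second call shows why exactly one group is needed: without it, the matched text itself is never returned.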
Perl5.005, Getopt::Long, Data::Handy, Array::Slice
Slice extracts a piece of a text file (or a set of files). It was named after the analogous array slice concept in Perl. If you think of a text file as an array of lines, slice returns an array slice of that array, but rather than specifying by line number, you specify by pattern (i.e. regular expression).
startPat and stopPat are the main selection patterns that define a range within a file. Each matches the first occurrence of its pattern in the file. You may refine the range with startAdj and stopAdj, which offset the range boundaries either forward or backward. startAdj and stopAdj may be patterns or signed integers. A pattern p moves the range boundary forward, while !p (i.e. the pattern prefixed with a "!") moves it backward. Similarly, a positive integer moves the boundary forward and a negative integer moves it backward. (All of these movements are by line.)
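The range selection just described can be pictured with the following minimal sketch. It is not the implementation in slice.pl, and the helper names adjust_boundary and slice_lines are hypothetical. It also makes several assumptions the text does not settle: both boundary lines are included in the result, a pattern adjustment starts searching on the line next to the current boundary, a boundary stays put when its adjustment pattern is never found, and nothing is returned when startPat does not match.

```perl
use strict;
use warnings;

# Hypothetical helper: move a boundary by a signed line count or a pattern.
sub adjust_boundary {
    my ($lines, $idx, $adj) = @_;
    return $idx unless defined $adj && length $adj;
    my $last = $#$lines;
    if ($adj =~ /^[+-]?\d+$/) {                 # signed integer: shift by that many lines
        my $new = $idx + $adj;
        return $new < 0 ? 0 : $new > $last ? $last : $new;   # clamp to the file
    }
    my $step = ($adj =~ s/^!//) ? -1 : 1;       # leading "!" means search backward
    for (my $i = $idx + $step; $i >= 0 && $i <= $last; $i += $step) {
        return $i if $lines->[$i] =~ /$adj/;
    }
    return $idx;                                # assumption: no match leaves the boundary alone
}

# Hypothetical helper: return the lines selected by the four options.
sub slice_lines {
    my ($lines, %opt) = @_;
    my ($start, $stop) = (0, $#$lines);         # defaults: whole file
    if (defined $opt{startPat}) {
        my ($hit) = grep { $lines->[$_] =~ /$opt{startPat}/ } 0 .. $#$lines;
        return () unless defined $hit;          # assumption: no start match, no output
        $start = $hit;
    }
    if (defined $opt{stopPat}) {
        my ($hit) = grep { $lines->[$_] =~ /$opt{stopPat}/ } 0 .. $#$lines;
        $stop = $hit if defined $hit;           # not found: extract through end of file
    }
    $start = adjust_boundary($lines, $start, $opt{startAdj});
    $stop  = adjust_boundary($lines, $stop,  $opt{stopAdj});
    return $start <= $stop ? @{$lines}[$start .. $stop] : ();
}

my @page = (
    "<html>",
    "<tr>",
    "<td>Total Match</td>",
    "<td>42</td>",
    "</tr>",
    "<td colspan=10>",
    "</html>",
);
my @got = slice_lines(\@page,
    startPat => 'Total Match', stopPat  => 'colspan=10',
    startAdj => '!tr',         stopAdj  => -1);
print "$_\n" for @got;    # prints the four lines from <tr> through </tr>
```

Here startPat lands on the "Total Match" line and !tr backs up to the enclosing row tag, while stopAdj of -1 pulls the end of the range back one line from the colspan match.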
When this program is used with a web page, one would generally lose the proper HTML structure by extracting a middle section. The command-line options bodyTag, titleTag, startText, middleText, and endText provide some correction for this.
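As a rough illustration of how these five options could combine, the sketch below assembles already-extracted slices into one output string. The helper name wrap_slices is hypothetical, and the exact order in which slice.pl emits these pieces is an assumption based only on the option descriptions.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble the final output from per-file slices.
# $slices is an array ref holding the extracted text of each file.
sub wrap_slices {
    my ($slices, %given) = @_;
    my %opt = (startText => '', middleText => '', endText => '', %given);

    # startText and endText surround every slice; middleText appears
    # only between pairs, so a single file never produces it.
    my @parts = map { $opt{startText} . $_ . $opt{endText} } @$slices;
    my $body  = join($opt{middleText}, @parts);
    return $body unless $opt{bodyTag};

    my ($head, $h1) = ('', '');
    if (defined $opt{titleTag}) {               # titleTag only applies together with bodyTag
        $head = "<head><title>$opt{titleTag}</title></head>";
        $h1   = "<h1>$opt{titleTag}</h1>";
    }
    return "<html>$head<body>$h1$body</body></html>";
}

print wrap_slices(
    [ "<tr><td>A</td></tr>\n", "<tr><td>B</td></tr>\n" ],
    bodyTag   => 1,
    titleTag  => 'Listing',
    startText => "<table>\n",
    endText   => "</table>\n",
), "\n";
```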
The startText, middleText, and endText command-line options are subject to text interpolation as follows. Instances of \n and \t are converted to actual newlines and tabs, respectively.
<FILE_PATH> is replaced with the full current file specification.
<FILE_NAME> is replaced with the current file name (i.e. no path).
<FILE_BASE> is replaced with the base name (i.e. no path or extension).
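The interpolation rules above can be sketched as a handful of substitutions. This is a minimal illustration, not the script's own code: the helper name interpolate is hypothetical, the path is used as given for <FILE_PATH>, and "extension" is assumed to mean the final dot-suffix only.

```perl
use strict;
use warnings;
use File::Basename qw(basename fileparse);

# Hypothetical helper: expand escapes and file placeholders in an option string.
sub interpolate {
    my ($text, $path) = @_;
    my $name   = basename($path);                  # file name without directories
    my ($base) = fileparse($path, qr/\.[^.]*/);    # name without its final extension
    $text =~ s/\\n/\n/g;                           # literal backslash-n becomes a newline
    $text =~ s/\\t/\t/g;                           # literal backslash-t becomes a tab
    $text =~ s/<FILE_PATH>/$path/g;
    $text =~ s/<FILE_NAME>/$name/g;
    $text =~ s/<FILE_BASE>/$base/g;
    return $text;
}

# Single quotes keep the backslash sequence literal, as it would arrive from the shell.
print interpolate('<h2><FILE_BASE></h2>\n', 'pages/p01.htm');   # prints <h2>p01</h2> and a newline
```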
Example for market guide screen:
% slice.pl --bodyTag \
--startText="<table>\n" --endText="\t</td></tr>\n</table>\n" \
--startPat="Total Match" --stopPat="colspan=10" \
--startAdj=!tr --stopAdj="colspan=10" < input.htm
Example for series of pages from www.entertainmentpublications.com.au stored in files p01.htm through p14.htm:
% perl -I/mydocu~1/ms/devel/perl slice.pl \
--startText="<table>\n" --endText="</table>\n" --bodyTag \
--titleTag="Melbourne E-Book Listing" \
--startPat="search results" --stopPat=zone --startAdj=9 --stopAdj=-7 p*.htm
None
Michael Sorens
$Revision: 1178 $ $Date: 2011-10-31 14:26:51 -0700 (Mon, 31 Oct 2011) $
CleanCode 0.9
CleanCode Perl Libraries | Copyright © 2001-2013 Michael Sorens - Revised 2013.06.30 |