Perl
Revision as of 20:57, 23 November 2009
Using Modules
- perldoc perllocal = lists the modules that have been installed on the current system, whether via cpan or manually
- get cpan if necessary and install modules with it (you may also need Test::Simple and Digest::SHA to help download and verify modules)
- sudo cpan -i Foo::Bar (get name of Foo::Bar from cpan.org)
- perldoc Foo::Bar = read the user-contributed documentation (man page) of the module you just installed
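You can also check from inside Perl whether a module loads and which version is installed. This is a minimal sketch; the module list is only an example, and VERSION returns undef for modules that do not define $VERSION.

```perl
use strict;
use warnings;

# Example module names; replace them with the ones you care about.
my @modules = qw(Test::Simple Digest::SHA HTML::TreeBuilder);
my %installed;    # module name => version string, or undef if it failed to load

for my $module (@modules) {
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm, the form require expects
    if ( eval { require $file; 1 } ) {
        $installed{$module} = $module->VERSION;
        printf "%-20s %s\n", $module, $installed{$module} // '(no $VERSION)';
    }
    else {
        $installed{$module} = undef;
        printf "%-20s not installed\n", $module;
    }
}
```

Loading the module with require inside eval means a missing module is reported instead of killing the script.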
Executables
- cpan = interactive shell and command-line tool for downloading and installing CPAN modules
- perldoc = Look up Perl documentation in Pod format.
  - perldoc -f function = look up a specific builtin function (as documented in perlfunc), e.g. perldoc -f sprintf
Running Perl (see "perldoc perlrun")
- perl -V = Summary of perl5 configuration.
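- perl -d = run the script under the interactive debugger; perl -D = set internal debugging flags (only works if perl was built with -DDEBUGGING)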
Syntax
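- command-line arguments are passed in @ARGV: access one with $ARGV[n] and count them with scalar(@ARGV); $#ARGV is the index of the last argument (count - 1), not the count
- mark the end of the program with the "__END__" token; perl ignores everything after it
- require a minimum perl version with a statement such as "use 5.005_54;"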
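A minimal sketch of the @ARGV points above; the script name args.pl is hypothetical. To try __END__, put it on a line of its own after the last statement; perl ignores anything below it.

```perl
use strict;
use warnings;
use 5.010;    # dies at compile time on perls older than 5.10

# Run as, for example:  perl args.pl one two three
my $count      = scalar @ARGV;   # number of arguments
my $last_index = $#ARGV;         # index of the last argument: $count - 1 (-1 when there are none)

print "got $count argument(s)\n";
for my $i (0 .. $#ARGV) {
    print "ARGV[$i] = $ARGV[$i]\n";
}
```

With three arguments, $count is 3 while $#ARGV is 2, which is why $#ARGV alone undercounts by one.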
HTML::Tree module
- my $tree = HTML::TreeBuilder->new_from_file($file_path_string);
- $tree->delete(); = destroy the tree and free its memory when you are done with it
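A sketch of the load/use/delete cycle, assuming HTML::Tree is installed from CPAN. It writes a throwaway page with File::Temp so that new_from_file has a real file to read; the sample markup is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TreeBuilder;

# Write a small sample page to a temporary file (stand-in for a real $file_path_string).
my ($fh, $file_path_string) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} '<html><head><title>Demo</title></head><body><p>Hello</p></body></html>';
close $fh;

my $tree  = HTML::TreeBuilder->new_from_file($file_path_string);
my $title = $tree->look_down('_tag', 'title')->as_text;   # first <title> element's text
print "title: $title\n";

$tree->delete();   # tear the tree down once the data has been pulled out
```

Copy anything you need (here the title string) out of the tree before calling delete, since the element objects are emptied afterwards.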
Traversing
- $e->tag = get element's tag name
- $e->parent = get element's parent element
- $e->attr('name') = get value of element's 'name' attribute
- $h->descendants() vs $h->lineage() = all elements below $h vs the ancestors of $h, from its parent up to the root
- $h->find( 'tag', ... ) = returns a list of elements at or under $h that have any of the specified tag names.
- $h->look_down( ...criteria... ) = THE BIG ONE
  - looks down the subtree starting at the given object ($h), looking for elements that meet the criteria you provide
  - the argument list is (key, value, key, value, ...); a sub can be given in place of a pair
  - in scalar context, returns the first matching element and stops searching
  - in list context, returns all matching elements
my @wide_pix_images = $h->look_down( "_tag", "img", "alt", "pix!", sub { $_[0]->attr('width') > 350 });
  - multiple key/value pairs (and subs) are combined with an implicit AND; to express OR, use a sub that tests the alternatives
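The traversal methods above applied to a small made-up page, assuming HTML::Tree is installed. new_from_content parses a string instead of a file, and the subs guard against missing attributes with // (Perl 5.10+).

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample markup for this sketch.
my $html = <<'HTML';
<html><body>
<div id="gallery">
  <img src="a.png" alt="pix!" width="400">
  <img src="b.png" alt="pix!" width="200">
  <img src="c.png" alt="logo" width="500">
</div>
</body></html>
HTML

my $h = HTML::TreeBuilder->new_from_content($html);

# tag, parent, attr
my $first_img = $h->look_down('_tag', 'img');    # scalar context: first match only
my $parent    = $first_img->parent;
my $parent_id = $parent->attr('id');
print "first img is inside a ", $parent->tag, " with id=$parent_id\n";

# lineage: ancestors from the parent up to the root
my @ancestor_tags = map { $_->tag } $first_img->lineage;
print "lineage: @ancestor_tags\n";                # div body html

# descendants: every element below $h (text segments are not included)
my @below = $h->descendants;

# find: all elements having any of the listed tag names
my @imgs = $h->find('img');

# look_down with an implicit AND: tag, exact alt value, and a sub
my @wide_pix_images = $h->look_down(
    '_tag', 'img',
    'alt',  'pix!',
    sub { ($_[0]->attr('width') // 0) > 350 },
);

# OR needs a sub: images whose alt is "logo" OR whose width exceeds 350
my @logo_or_wide = $h->look_down(
    '_tag', 'img',
    sub {
        my $e = shift;
        ($e->attr('alt') // '') eq 'logo' || ($e->attr('width') // 0) > 350;
    },
);

printf "descendants=%d imgs=%d wide_pix=%d logo_or_wide=%d\n",
    scalar @below, scalar @imgs, scalar @wide_pix_images, scalar @logo_or_wide;

$h->delete;
```

Here only a.png passes the AND query, while the OR query also picks up c.png.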
Extracting Data
- $h1->as_text = returns a string that contains all the text bits that are children (or otherwise descendants) of the given node.
- $e->content_list = returns the list of the element's direct children, both elements and text segments
- $h1->as_HTML = returns the element and all its descendants as a string of HTML
- use a list slice to pick out the element at a given (0-based) position
my $col3 = ( $row2->look_down('_tag', 'td') )[2];
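The extraction methods applied to a made-up two-row table, assuming HTML::Tree is installed. The slices follow the $row2/$col3 pattern above, so index 1 is the second row and index 2 the third cell.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample table for this sketch.
my $html = <<'HTML';
<table>
  <tr><td>r1c1</td><td>r1c2</td><td>r1c3</td></tr>
  <tr><td>r2c1</td><td>r2c2</td><td>price=<b>42</b></td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my $row2 = ( $tree->look_down('_tag', 'tr') )[1];   # second row (indexes start at 0)
my $col3 = ( $row2->look_down('_tag', 'td') )[2];   # third cell of that row

my $text     = $col3->as_text;        # text of the cell and all its descendants
my @children = $col3->content_list;   # direct children: text strings and element objects
my @kinds    = map { ref $_ ? $_->tag : 'text' } @children;
my $markup   = $col3->as_HTML;        # the cell serialized back to HTML

print "as_text:      $text\n";
print "content_list: @kinds\n";
print "as_HTML:      $markup\n";

$tree->delete;
```

Text children come back as plain strings and elements as objects, which is why the map checks ref before calling tag.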