Difference between revisions of "Perl"

From Colettapedia

Revision as of 20:57, 23 November 2009

Using Modules

  • perldoc perllocal = shows what modules have been installed via cpan or manually on the current system
  • get cpan if necessary and install it (may also need Test::Simple and Digest::SHA to aid in downloading and verifying modules)
  • sudo cpan -i Foo::Bar (get name of Foo::Bar from cpan.org)
  • perldoc Foo::Bar = see user contributed perl documentation (man pages) of the module you just downloaded

Executables

  • cpan
  • perldoc = Look up Perl documentation in Pod format.
    • -f function = look up a specific builtin function from perlfunc, ex: perldoc -f sprintf

Running Perl "perlrun"

  • perl -V = Summary of perl5 configuration.
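  • perl -d = run the program under the Perl debugger; perl -D = set debugging flags (only works if perl was compiled with -DDEBUGGING)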

Syntax

  • command line arguments arrive in @ARGV; access one with $ARGV[n]; $#ARGV is the index of the last argument (count minus one), so use scalar(@ARGV) for the count
  • indicate end of program using the "__END__" token
  • specify certain version of perl by statement "use 5.005_54;" or similar
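
The difference between $#ARGV and the argument count is easy to get wrong, so here is a minimal sketch. The helper name describe_args is hypothetical (it is not part of Perl); it takes a list so the same logic can be tried on @ARGV or on any other array.

```perl
use strict;
use warnings;

# Hypothetical helper: report the count, the last index, and the first item
# of an argument list. Works the same for @ARGV as for any other array.
sub describe_args {
    my @args       = @_;
    my $count      = scalar @args;                 # number of elements
    my $last_index = $#args;                       # count - 1, or -1 when empty
    my $first      = $count ? $args[0] : undef;    # same as $ARGV[0] for @ARGV
    return ($count, $last_index, $first);
}

my ($count, $last_index, $first) = describe_args(@ARGV);
print "count=$count last_index=$last_index\n";
print "first=$first\n" if defined $first;
```

Run it as "perl args.pl a b c" (the file name is only an example) to compare the two numbers.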

HTML::Tree module

  • my $tree = HTML::TreeBuilder->new_from_file($file_path_string);
  • $tree->delete();

Traversing

  • $e->tag = get element's tag name
  • $e->parent = get element's parent element
  • $e->attr('name') = get value of element's 'name' attribute
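  • $h->descendants() vs $h->lineage() = descendants returns every element below $h; lineage returns $h's ancestors, from its parent up to the root
  • $h->find( 'tag', ... ) = returns a list of elements at or under $h that have any of the specified tag names (alias for find_by_tag_name; in scalar context returns only the first match).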
  • $h->look_down( ...criteria... ) = THE BIG ONE
    • looks down at the subtree starting at the given object ($h), looking for elements that meet the criteria you provide.
    • arg list is (key, value, key, value, ...), optionally mixed with code refs that get the element as $_[0] and return true for a match
    • if called in scalar context (assigning to a $ variable), returns the first element that matches and stops
    • if called in list context (assigning to an @ variable), returns all matching elements
my @wide_pix_images = $h->look_down( "_tag", "img", "alt", "pix!", sub { $_[0]->attr('width') > 350 });
    • If listing more than one key/value pair, there is an implicit AND. Have to use a sub if you need an "OR" operation.
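
A short sketch of the points above: list vs scalar context, the implicit AND, and an OR written as a sub. The HTML snippet and the variable names are made up for illustration; new_from_content is used instead of new_from_file so the example needs no input file. It assumes HTML::TreeBuilder is installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample document: three images with different alt text and widths.
my $html = <<'END_HTML';
<html><body>
<img src="a.png" alt="pix!" width="400">
<img src="b.png" alt="pix!" width="200">
<img src="c.png" alt="logo" width="500">
</body></html>
END_HTML

my $h = HTML::TreeBuilder->new_from_content($html);

# List context: all elements that satisfy every criterion (implicit AND).
my @wide_pix_images = $h->look_down(
    '_tag', 'img',
    'alt',  'pix!',
    sub { ( $_[0]->attr('width') || 0 ) > 350 },
);

# Scalar context: only the first matching element.
my $first_img = $h->look_down( '_tag', 'img' );

# OR has to live inside a sub: alt is "pix!" OR width is over 350.
my @pix_or_wide = $h->look_down(
    '_tag', 'img',
    sub {
        my $e     = shift;
        my $alt   = $e->attr('alt')   || '';
        my $width = $e->attr('width') || 0;
        return ( $alt eq 'pix!' ) || ( $width > 350 );
    },
);

# Copy results into plain scalars before the tree is deleted.
my $wide_count = scalar @wide_pix_images;
my $first_src  = $first_img->attr('src');
my $or_count   = scalar @pix_or_wide;

print "wide pix images: $wide_count\n";
print "first img src:   $first_src\n";
print "pix OR wide:     $or_count\n";

$h->delete();
```

The "|| 0" and "|| ''" guards are a defensive choice for images that lack the attribute; the one-line version above would warn under "use warnings" in that case.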

Extracting Data

  • $h1->as_text = returns a string that contains all the text bits that are children (or otherwise descendants) of the given node.
  • $e->content_list = returns a list of the element's child nodes: child elements as objects, text segments as plain strings
  • $h1->as_HTML = returns the HTML source for the current element with all its descendants
  • Use index notation to pull out elements that have a certain numbered position
my $col3  = ( $row2->look_down('_tag', 'td') )[2];
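
The extraction methods above can be tried together on a small table. This is only a sketch: the table contents and variable names are invented, and it again assumes HTML::TreeBuilder is installed. Index [2] picks the third cell because list indexes start at 0.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up two-row table; the third cell of row 2 mixes an element and text.
my $tree = HTML::TreeBuilder->new_from_content(
      '<table><tr><td>a</td><td>b</td></tr>'
    . '<tr><td>one</td><td>two</td><td><b>three</b>!</td></tr></table>'
);

my @rows = $tree->look_down( '_tag', 'tr' );
my $row2 = $rows[1];                                   # second row

# List slice on the look_down result: third <td> of the row.
my $col3 = ( $row2->look_down( '_tag', 'td' ) )[2];

my $text     = $col3->as_text;         # all text under the cell, joined
my @children = $col3->content_list;    # child nodes: elements and strings
my $markup   = $col3->as_HTML;         # the cell serialized back to HTML

# Child elements are objects (ref is true); text segments are plain strings.
my $child_count   = scalar @children;
my $first_is_elem = ref $children[0] ? 1 : 0;
my $second_child  = $children[1];

print "text:     $text\n";
print "children: $child_count\n";
print "markup:   $markup\n";

$tree->delete();
```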