Perl Style Guide

    Style guide

    A style guide, while some people find it constraining, eases your life as a programmer by making it easier to read your co-workers' code. If you can more easily read and understand code, you will be forced to re-create the same code less often. Style is also important when using source code control systems such as CVS, RCS, or SCCS because it minimizes the number of diffs that represent purely formatting changes.

    Please contribute your ideas for discussion.

    Documentation

    Header elements

    Every Perl source file should contain the following:

    1. Standard executable and pragmas
    2. Name of script/package/module
      • use CVS $Id:$ if possible; consider using CVS for anything you may reuse over time
    3. Statement of function
    4. Major Revisions
      • use CVS $Log:$; previously we've had this at the top, but I think it should be at the bottom because at the top it makes it hard to search for function names

    Standard executable and pragmas

    #!/usr/bin/perl -Tw                   # for web-server scripts
    #!/usr/bin/perl  -w                   # uses system perl; OK for other scripts
    use strict;
    # taint checking is switched on by -T above; core Perl has no "taint" pragma
    
    • Taint should be used for CGI scripts (use #!perl -T)
    • Scripts should compile without warnings under perl -w (use #!perl -w)
    • Strict should always be used
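
    Under -T, data that comes from outside the program (form parameters, the environment, file input) is tainted and cannot be used in operations that affect the system until it has been validated. The following is a minimal sketch of the usual untainting idiom; the function name untaintFileName and the set of allowed characters are assumptions made for illustration, not part of this guide.

```perl
use strict;
use warnings;

#------------------------------------------------------------------------------
# untaintFileName (hypothetical helper)
#
# return the file name if it contains only safe characters, otherwise undef.
# Under -T, a value captured by a regular expression is no longer tainted.
#
# usage
#   my $safe_name = untaintFileName( $user_input );
#------------------------------------------------------------------------------
sub untaintFileName {
    my ( $name ) = @_;

    # the whole string must be letters, digits, underscore, dot, or hyphen
    my ( $clean ) = $name =~ /^([\w.-]+)$/;
    return $clean;
}

print defined( untaintFileName( 'sequence.fasta' ) ) ? "ok\n" : "rejected\n";
print defined( untaintFileName( 'seq; rm -rf /' ) )  ? "ok\n" : "rejected\n";
```

    The pattern is deliberately strict: it is safer to list the characters you accept than to try to list the ones you reject.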

    Name of script/package/module

    Even though it may be obvious, this can be a help when searching for particular programs. Include the name as a plain text comment or include the CVS keyword $Id:$.

    #!/usr/bin/perl -Tw
    
    #------------------------------------------------------------------------------
    # $Id:$           # preferred, or
    # myscript.cgi:
    #
    # description of what this code does
    #
    # usage
    #    myscript.cgi [-h] [-t <d|r>] <sequence_file>
    #------------------------------------------------------------------------------
    use strict;
    <Your code here>
    ...
    <at end of file>
    #------------------------------------------------------------------------------
    # $Log:$           # preferred, or
    # list of revisions
    #------------------------------------------------------------------------------
    

    Statement of function

    Include a concise statement of what the program does. Include any necessary input files and if appropriate what programs create them. Explain any command line switches or options. For scripts that are more complicated, such as CGI scripts that call themselves, provide a description of the important control parameters.

    Function Headers

    Each function should be preceded by a comment block giving the function name, a description of what it does, and example usage.

    #------------------------------------------------------------------------------
    # addPairCoord
    #
    # add edges in pair format to the edge list.  Each element of the @pairs
    # array should be a complete structure in a string. This is similar to addPair except
    # that it adds fake coordinates to each vertex
    #
    # usage
    #   $dfs->addPairCoord( @pairs );
    #------------------------------------------------------------------------------
    • Use some kind of mark to separate the function documentation from the code. This makes it much easier to identify the functions in large programs. I suggest a # followed by 78 - characters. This also acts as a ruler and helps you know where you should break long lines in your code.

    Usage

    A usage statement should be provided in every function or subroutine. In addition, the header of the file should summarize all of the functions/subroutines/methods in the file. For stand-alone programs, the command line argument -h should return a usage statement. Use mnemonic file names and you shouldn’t need to explain the usage much, but do not hesitate to provide explanations.

    # usage
    #    findseq.pl < sequence.fasta >list_of_sequences.text
    

    Programs should reserve the -h switch for a usage message (which should also be displayed on input errors). Use the variable $USAGE for a usage string.

    my $USAGE = "findseq.pl < sequence.fasta >list_of_sequences.text\n";
    if ( $opt_h ) {        # $opt_h: set when -h is given, e.g. by Getopt::Std
        print $USAGE;
        exit 0;            # asking for help is not an error
    }
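
    The fragment above leaves open how -h is detected. One possible sketch uses the core Getopt::Std module; the function name checkOptions and the choice of exit status 1 for input errors are assumptions for illustration.

```perl
use strict;
use warnings;
use Getopt::Std;

my $USAGE = "findseq.pl [-h] < sequence.fasta >list_of_sequences.text\n";

#------------------------------------------------------------------------------
# checkOptions (hypothetical helper)
#
# parse command line switches. Returns an exit status and a message; the
# message is empty when the program should simply carry on.
#
# usage
#   my ( $status, $message ) = checkOptions( @ARGV );
#------------------------------------------------------------------------------
sub checkOptions {
    my ( @args ) = @_;

    local @ARGV = @args;                  # getopts() reads @ARGV
    my %opt;
    my $ok = getopts( 'h', \%opt );       # false if an unknown switch is seen

    return ( 1, $USAGE ) if !$ok;         # input error: usage, non-zero status
    return ( 0, $USAGE ) if $opt{h};      # -h: usage, status 0
    return ( 0, '' );
}

my ( $status, $message ) = checkOptions( @ARGV );
if ( $message ) {
    print STDERR $message;
    exit $status;
}
```

    Passing a hash reference to getopts keeps the option values in a lexical variable, which fits the rule of always using lexical variables.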

    Major Revisions

    CVS will keep a revision log for your code. Each time you commit, the comments you write in CVS will be added to the file if you include the $Log:$ CVS tag. (Be careful not to write a CVS tag in the commit comment; this gets crazy fast.)

    # $Log:$
    

    For Perl packages, it is sometimes useful to provide the package variable $VERSION so that programs can check for the version of a package when they include it. This can also be generated from CVS as follows:

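    my $REVISION = '$Revision: 1.33 $';
    $REVISION =~ s/\$//g;            # remove the $ keyword delimiters
    # capture the revision number; $VERSION must be a package variable (our)
    our ( $VERSION ) = $REVISION =~ /Revision: ([\d.]*) /;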
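
    As a sketch of how a caller can use this, the example below defines a package with a CVS-derived $VERSION and then asks for a minimum version through the built-in VERSION method. The package name ProcessText is borrowed from the naming example in this guide; the version numbers requested are made up for illustration.

```perl
use strict;
use warnings;

package ProcessText;

my $REVISION = '$Revision: 1.33 $';
# capture the digits and dots between "Revision: " and the trailing space
our ( $VERSION ) = $REVISION =~ /Revision: ([\d.]+) /;

package main;

print 'ProcessText version ', ProcessText->VERSION, "\n";

# VERSION( $minimum ) dies if the package is older than $minimum
my $new_enough = eval { ProcessText->VERSION( 1.20 ); 1 } ? 1 : 0;
my $too_old    = eval { ProcessText->VERSION( 2.00 ); 1 } ? 0 : 1;

print "at least 1.20: $new_enough\n";
print "older than 2.00: $too_old\n";
```

    Note that the check compares versions as decimal numbers, so CVS revision 1.10 counts as older than 1.9. Keep this in mind if you rely on CVS revisions for version checks.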

    HTML pages

    It is important that we can trace every HTML file, and in many cases portions of the file to their source. This can greatly speed up debugging when problems occur.

    Use comments to mark the beginning and ending of the HTML block, and give the location of the source file.

    <!-- template/bogus.html start -->
    <!-- $Id:$ -->
    …
    <!-- template/bogus.html end -->
    

    CVS keywords: see [1] 
    Some useful CVS keywords

    • $Id:$ - a standard header giving the file name (without path), revision, date, and author - use this at the beginning of HTML pages
    • $Revision:$ - CVS revision
    • $Author:$ - Last editing author
      • Use a string like $update = 'Last update $Date:$ by $Author:$'; to get a printable string with update information. You must use single quotes to prevent the Perl interpreter from interpolating the CVS tags (which look like Perl variables)
    • $Date:$ - Last edited date
    • $Log:$ - revision comments - usually this should not be in HTML files, but should always be in CGI scripts.
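
    Once CVS has expanded a keyword, the string still carries the keyword name and the $ delimiters. The sketch below strips them to leave only the value. The helper name keywordValue is hypothetical, and the date and author shown are invented sample values, not taken from a real file.

```perl
use strict;
use warnings;

# sample expanded keywords; the values are made up for illustration
my $date_tag   = '$Date: 2004/03/15 10:22:41 $';
my $author_tag = '$Author: jsmith $';

#------------------------------------------------------------------------------
# keywordValue (hypothetical helper)
#
# return the value part of an expanded CVS keyword, or an empty string if
# the keyword has not been expanded.
#
# usage
#   my $date = keywordValue( '$Date:$' );
#------------------------------------------------------------------------------
sub keywordValue {
    my ( $tag ) = @_;

    # "$Keyword: value $": capture the value without surrounding spaces
    my ( $value ) = $tag =~ /^\$\w+:\s*(.*?)\s*\$$/;
    return defined $value ? $value : '';
}

my $update = 'Last update ' . keywordValue( $date_tag )
           . ' by '          . keywordValue( $author_tag );
print "$update\n";
```

    All keyword strings are in single quotes, as required above, so Perl never tries to interpolate them.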

    Perl Style

    Variables and Functions

    • Always use lexical variables (defined by my), and explicitly pass all variables to subroutines.
    • Use underlines in variable names to separate words
    • Use upper/lower case in function names to separate words; begin with lowercase
    • Packages begin with a capital letter
    package ProcessText;     # package name begins with capital
    my $line_len = 0;        # a variable
    sub getLineLen { ... }   # functions use upper/lower case
    my $DEFAULT_LEN = 0;     # use all uppercase for constant parameters
    • Always use @_ to get subroutine parameters, not shift. The former is significantly faster.
    • Always include a return statement, even if nothing is actually returned. Maybe your function should return something, if only a true or false value indicating whether it succeeded?
    sub myFunction {
        my ( $param1, $param2 ) = @_;
        ...
        return 1;
    }
    
    # end of myFunction
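
    The naming and parameter rules above can be seen together in one small, complete example. This is only a sketch: the function wrapLine and the default length of 60 are invented for illustration, while the package name ProcessText comes from the example above.

```perl
use strict;
use warnings;

package ProcessText;         # package name begins with a capital

my $DEFAULT_LEN = 60;        # all uppercase for a constant parameter

#------------------------------------------------------------------------------
# wrapLine (hypothetical function)
#
# split a string into pieces of at most $line_len characters.
#
# usage
#   my @pieces = ProcessText::wrapLine( $line, $line_len );
#------------------------------------------------------------------------------
sub wrapLine {
    my ( $line, $line_len ) = @_;        # all parameters taken from @_

    $line_len = $DEFAULT_LEN if !defined $line_len;

    # take 1 to $line_len characters at a time until the string is used up
    my @pieces = $line =~ /(.{1,$line_len})/g;
    return @pieces;
}

package main;

my @pieces = ProcessText::wrapLine( 'ACGTACGTAC', 4 );
print join( "\n", @pieces ), "\n";
```

    Because every input arrives through @_ and every result leaves through return, the function can be read and tested without knowing anything about the rest of the program.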

    Style

    • Indent 4 spaces; use spaces not tabs (your editor will handle this for you)
      if you are using vi/vim, include 'set expandtab' in your .vimrc
      Getting indenting right is critical to readability.
      foreach my $sequence ( @list ) {
          my $sequence_clean = removeJunk( $sequence );
          if ( $sequence =~ /^>/ ) {            # string begins with >
              # probably a FASTA sequence
              ...
          }
      }
    • Use cuddled braces (braces go on the line with the foreach, while, or if that starts the block)
      foreach my $residue ( @peptide ) {    # this is cuddled braces
          # some code here;
      }
      
      # is preferred over
      
      foreach my $residue ( @peptide ) 
          {    
          # some code here;
          }
    • Align the closing brace with the statement that opened the block, indent the rest of the block
    • Line breaks: Break lines that are more than 80 characters long. When making multiple logical tests in an if statement, put each test on its own line. Where reasonable, align hash elements in a logical way.
      #---------------------------------------------------------------------
      sub update {
          my ( $temp_dfs_1, $idf1, $jdf1, $idf2, $jdf2, 
               $translation_ptr, $used_big_row_ptr) = @_;        # long line
      
       if ( $vertex[$v1]{begin} < $vertex[$v2]{begin} &&         # multiple logical tests
            $vertex[$v1]{end}   > $vertex[$v2]{end}      ) {
      
       my $stem_hash = { left1        => $left,                  # complicated hash
                         left2        => $left+1,
                         right1       => $right,
                         right2       => $right+1,
                         vienna_left  => '(',
                         vienna_right => ')',
                       };
      my %triangle_types = (        # big hash
          '000' =>  0, '001' =>  0, '002' =>  1, '010' =>  2, '101' =>  2, '011' =>  2,
          '100' =>  3, '012' =>  4, '102' =>  4, '021' =>  5, '200' =>  5, '020' =>  6,
          '201' =>  6, '022' =>  7, '202' =>  7, '121' =>  8, '210' =>  8, '120' =>  9,
          '211' =>  9, '122' => 10, '212' => 10, '220' => 11, '221' => 11, '222' => 12,
          '110' => 13, '111' => 13, '112' => 14, '00S' => 15, '10S' => 16, '01S' => 16,
          '11S' => 17, '20S' => 18, '02S' => 18, '21S' => 19, '12S' => 19, '22S' => 20,
          );
      
      %genetic_code = ( TCA => 'S', TCC => 'S', TCG => 'S', TCT => 'S',     # another big hash.
                        TTC => 'F', TTT => 'F', TTA => 'L', TTG => 'L',     # following the format used to  
                        TAC => 'Y', TAT => 'Y', TAA => '*', TAG => '*',     # present the table in texts
                        TGC => 'C', TGT => 'C', TGA => '*', TGG => 'W',     # makes it much easier to read and
                                                                            # to verify that it is correct
                        CTA => 'L', CTC => 'L', CTG => 'L', CTT => 'L',
                        CCA => 'P', CCC => 'P', CCG => 'P', CCT => 'P',
                        CAC => 'H', CAT => 'H', CAA => 'Q', CAG => 'Q',
                        CGA => 'R', CGC => 'R', CGG => 'R', CGT => 'R',

                        ATA => 'I', ATC => 'I', ATT => 'I', ATG => 'M',
                        ACA => 'T', ACC => 'T', ACG => 'T', ACT => 'T',
                        AAC => 'N', AAT => 'N', AAA => 'K', AAG => 'K',
                        AGC => 'S', AGT => 'S', AGA => 'R', AGG => 'R',
      
                        GTA => 'V', GTC => 'V', GTG => 'V', GTT => 'V', 
                        GCA => 'A', GCC => 'A', GCG => 'A', GCT => 'A', 
                        GAC => 'D', GAT => 'D', GAA => 'E', GAG => 'E', 
                        GGA => 'G', GGC => 'G', GGG => 'G', GGT => 'G', 
                    );
    • Include space inside braces/parentheses in logical tests
    • Include space between braces/parentheses and preceding/following text
    • Include a comment explaining every regular expression
    • Use spaces to line up key/value pairs in hashes
    • Use => instead of , in hashes
    • Leave space around operators such as + - / * = eq == < lt > gt etc.
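
    The fragments above are excerpts and will not run on their own. As a small runnable sketch of the same layout rules (cuddled braces, one logical test per line, aligned hash elements, a comment on every regular expression), consider the following. The function names isNested and isFastaHeader and all the data values are assumptions for illustration.

```perl
use strict;
use warnings;

my $stem = { begin        => 10,          # aligned key/value pairs
             end          => 40,
             vienna_left  => '(',
             vienna_right => ')',
           };
my $loop = { begin => 15, end => 30 };

#------------------------------------------------------------------------------
# isNested (hypothetical function)
#
# return 1 if the $inner interval lies strictly inside $outer, otherwise 0.
#
# usage
#   if ( isNested( $outer, $inner ) ) { ... }
#------------------------------------------------------------------------------
sub isNested {
    my ( $outer, $inner ) = @_;

    if ( $outer->{begin} < $inner->{begin} &&       # one test per line
         $outer->{end}   > $inner->{end}      ) {
        return 1;
    }
    return 0;
}

#------------------------------------------------------------------------------
# isFastaHeader (hypothetical function)
#
# return 1 if the line looks like a FASTA header, otherwise 0.
#
# usage
#   next if isFastaHeader( $line );
#------------------------------------------------------------------------------
sub isFastaHeader {
    my ( $line ) = @_;

    return $line =~ /^>/ ? 1 : 0;        # string begins with >
}

print 'loop inside stem: ', isNested( $stem, $loop ), "\n";
print 'header line: ',      isFastaHeader( '>seq1' ), "\n";
```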