2014 Genomics (AGRY60000)


    Syllabus

    See the syllabus.  Only the first section should be considered reliable at this time.


    Questions and Answers

    You can add questions and answers here.  Feel free to amplify the answers if you try them and feel they need more information.

    1.  I successfully installed Secure CRT plus Secure FX on my laptop. Unfortunately, I am unable to connect to the server. Attached is the screenshot of the error that I am receiving.
      I added a step-by-step example of setting up a new session with SecureCRT and connecting; see the Windows section below.

    2. I was trying to see if I could get from you some of those nice .bash_profile things you were showing in class but don’t have written down how to access them.  Where do you have them hidden?  
      See my example .bash_profile or the Unix Tutorials section for more general information.

    3. I was able to download the SecureCRT program and get it set up, and I was also able to log in to scholar.rcac.purdue.edu server. I "found" my scratch directory, but I think I wrote down the commands for accessing the data file incorrectly or incompletely.  Can you tell me how to find the data file?
      The file is /scratch/carter/m/mgribsko/monascus/data/Monpu1.genome.rawReads.fastq.  See also the sequence data page.  To see the first part of the file, type

      more /scratch/carter/m/mgribsko/monascus/data/Monpu1.genome.rawReads.fastq
    4. I was able to get into my 'scratch' directory (/scratch/carter/n/nalbrigh) but when I try to access the monascus file it says there is no such file or directory. When I run ls -l it shows that my directory is empty, nothing in it. Should I be seeing sub-directories and monascus or am I in the wrong directory?
      You will only see files or subdirectories in your scratch directory if you have created them yourself, with an editor (files) or mkdir (directories).  The data file is not in your directory; access it by its full path, as in the answer to question 3.

    5. How do I create and edit new pages on the wiki?
      See the help page and its included links.  It's easy, try it on your personal page.

    6. When I start SecureCRT it says that the evaluation copy has expired and I have to enter a license. What do I do?
      When you downloaded SecureCRT, you should have received an email with the license information (search your mail for "Van Dyke").  Go to "Enter license data" in the Help pulldown menu in SecureCRT.

    7.  When I access the filtered reads file (Monpu1.genome.filteredReads.fastq.gz) it shows a bunch of crazy garbage i.e.
      : ?w?QKY??Q?r????u??ߚVv???????]????־??????i???[???m?1????|?s?? ?>???W????R??n?˽???W?Hy
      ?>#?w??8}?????g??X^??X?????!?/?KRĿ?OI?????c?z?????v??|?~].???޻??m??r?m???gu?W???
      This file is compressed.  The .gz suffix indicates a file compressed with gzip.  You must uncompress it (for example with gunzip) before you can read it.
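
As a minimal sketch of reading a gzip-compressed FASTQ file, the example below builds a tiny demo file in a temporary directory so it can be run anywhere, then shows two ways to read it. The file name demo.fastq.gz and the two reads in it are made up for illustration; on scholar you would substitute the path to Monpu1.genome.filteredReads.fastq.gz.

```shell
# Work in a throwaway directory; demo.fastq.gz is a hypothetical stand-in
# for the real compressed reads file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a tiny two-read FASTQ file and compress it with gzip.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > demo.fastq.gz

# Option 1: look at the contents without creating an uncompressed copy.
first_line=$(gzip -dc demo.fastq.gz | head -n 1)
echo "$first_line"

# Option 2: write an uncompressed copy, leaving the .gz file untouched.
gzip -dc demo.fastq.gz > demo.fastq
n_lines=$(wc -l < demo.fastq)
echo "$n_lines"
```

To page through a compressed file, pipe the first form into more (gzip -dc file.gz | more). The uncompressed copy of a real reads file is large, so write it to your own scratch directory rather than your home directory.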

    8. Tried to run seqyclean with the r1 and r2 fastq files and the job killed itself after 30 min telling me to submit using qsub.  I tried fooling with k-mer size thinking that might help but it does not.  The link explaining how to do qsub has no page attached to it anymore.  How do I submit a batch job and recover the result?
      The front-end nodes limit you to 30 minutes of CPU time.  30 minutes is also the default limit on compute nodes unless you request more, for example with "-l walltime=24:00:00".  See this intro, and the sample job file below.  Submit the job (start it running) with the command "qsub <jobfile_name>".  Use qstat to check whether it is running or done.

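      #!/bin/sh -l
      
      # Job name, queue, resources (1 node, 16 cores), and wall-clock limit (168 hours)
      #PBS -N seqyclean_monpu1
      #PBS -q scholar
      #PBS -l nodes=1:ppn=16
      #PBS -l walltime=168:00:00
      
      # Load seqyclean and move to the directory the job was submitted from
      module load seqyclean
      cd "$PBS_O_WORKDIR"
      pwd
      
      # Clean the paired reads (r1, r2) using 16 threads to match ppn=16
      seqyclean -t 16 \
      -1 ../../data/Monpu1.genome.rawReads.r1.fq \
      -2 ../../data/Monpu1.genome.rawReads.r2.fq \
      -v adapter.fa \
      -qual 20 10 \
      -minimum_read_length 30 \
      -o Monpu1.genome.rawReads.seqyclean.stats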
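
To recover the result, here is a rough sketch, assuming scholar runs the PBS/TORQUE scheduler that qsub and the #PBS directives above target. qsub prints a job id when you submit, and by default the job's standard output and standard error are written to files named <jobname>.o<number> and <jobname>.e<number> in the directory you submitted from. The helper job_output_files and the job file name seqyclean.job are hypothetical, for illustration only.

```shell
# Hypothetical helper: given a job name and the job id printed by qsub
# (typically of the form <number>.<server>), print the default names of the
# stdout and stderr files that PBS/TORQUE writes when the job ends.
job_output_files() {
    jobnum=${2%%.*}          # keep only the part before the first dot
    echo "$1.o$jobnum $1.e$jobnum"
}

jobfile=seqyclean.job        # hypothetical name for the job file above

# Only try to submit where qsub exists and the job file is present.
if command -v qsub >/dev/null 2>&1 && [ -f "$jobfile" ]; then
    jobid=$(qsub "$jobfile")                 # submit; qsub prints the job id
    qstat -u "${USER:-$(whoami)}"            # check whether it is queued or running
    job_output_files seqyclean_monpu1 "$jobid"
else
    echo "qsub or $jobfile not found; run this on scholar in the job directory"
fi
```

Once qstat no longer lists the job, look at the .o and .e files with more. Files that seqyclean itself writes appear in the submission directory, since the job file changes to $PBS_O_WORKDIR before running.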
    9. Our group was wondering what exactly you wanted for Thursday? We gathered that you wanted us to report on our best assembly and why we chose that. Is there anything else in particular that we should be prepared to report?
      A brief summary of the methods used including assembly, scaffolding, gap-filling
      N50
      longest contig/scaffold
      total number of bases in assembly
      if you have it, fraction of reads that map to assembly


    Computer Information

    RCAC servers - scholar.rcac.purdue.edu

    (apparently carter.rcac.purdue.edu does not work unless you have a separate carter account)

    RCAC operates a large number of servers with an aggregate of tens of thousands of CPUs.  For this course we will use a server called scholar.rcac.purdue.edu, or simply scholar.  We may use other servers at certain times, but most of our work will be done on scholar.  Scholar is actually part of another server called Carter; you can find out more about this server in the RCAC Carter manual.  The Carter manual is long, so for a quicker introduction you can look at some of the material I posted for my research group.

    Connecting from Windows

    On a PC you will need a terminal emulator program.  There are many options; here are two:

    • SecureCRT - available for free from ITAP.  See the webpage.  Be sure to get the SecureCRT + SecureFX combo pack.  Here is a step-by-step example of connecting with SecureCRT.
    • PuTTY - PuTTY is a free terminal emulator.  See the webpage.

    My preference is SecureCRT because I find it easier to transfer files to and from the server (this is what SecureFX does).

    Connecting from Mac or Linux

    If you use a macOS or Linux computer you do not need to install a terminal emulator; you already have one built in.  On a Mac, you can start a local terminal window from "Applications->Utilities".  MacSSH is another free SSH client.  Log in using

    ssh username@scholar.rcac.purdue.edu

    Unix

    Here are some UNIX commands you will need to know.  Much of this is covered in Part 3, "Essential Unix", of the book Unix and Perl to the Rescue.

    • ls - show the files in the current directory
      -a show all files (including those whose names begin with .)
      -l show files in long format
      -t sort files by modification time, newest first
      these can be combined, e.g. ls -alt
    • cd <directory_name> - move to the specified directory
    • mkdir <directory_name> - create a new subdirectory with the specified name
    • more <filename> - type a file to the screen page-by-page.  Use space to move ahead one page.
    • pwd - show what directory I am currently in (AKA current working directory)
    • wc <file_name> - count the number of lines, words, and bytes in the specified file.  We usually only care about the number of lines.
    • grep <search_pattern> <file_name> - find particular words or strings of letters in a file.  You might use this to confirm a specific read is present, or to find a specific sequence in a fastq file.
    • ln -s <real_file_name> <symbolic_name> - create a symbolic link.  A symbolic link looks like a file in your directory, but the file is really somewhere else.  Saves typing long paths.

    There is a lot to learn about Unix, but you don't need to learn it all.  See this brief introduction for more information.
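
The commands in this section can be tried together in a short session. The sketch below works in a temporary directory on a tiny made-up FASTQ file (demo.fastq and its two reads are hypothetical), so it is safe to run anywhere. It also uses head, which is not covered in this section, to print only the first line of a file.

```shell
# Work in a throwaway directory so nothing in your real directories changes.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir data                      # create a subdirectory
cd data || exit 1               # move into it
pwd                             # confirm the current working directory

# Create a tiny made-up FASTQ file with two reads (four lines per read).
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGACCAA\n+\nIIIIIIII\n' > demo.fastq

ls -alt                         # long listing, all files, newest first

# A FASTQ record is four lines, so reads = lines / 4.
n_lines=$(wc -l < demo.fastq)
n_reads=$((n_lines / 4))
echo "$n_reads reads"

# Count the lines that contain a particular sequence.
n_hits=$(grep -c 'TTGACC' demo.fastq)
echo "$n_hits matching line(s)"

# Symbolic link: the real file comes first, the link name second.
cd ..
ln -s data/demo.fastq reads.fq
link_first=$(head -n 1 reads.fq)   # reading the link reads the real file
echo "$link_first"
```

Dividing the line count by four is a more dependable way to count reads than grepping for lines that start with @, because quality lines can also begin with @.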


    Editing Files on UNIX

    Option 1.  Edit the files on your PC or Mac using an editor you are used to.
    Option 2.  Edit files on the server using a UNIX Editor such as


    RCAC

    When you first log in you will be in your home directory.  Your home directory is small, too small to really do any work.  The first thing you should do is move to your scratch directory.  The path to your scratch directory is stored in the environment variable $RCAC_SCRATCH, and will be something like /scratch/carter/your_initial/your_name, where your_initial is the first letter of your username and your_name is your username.  For example, either of the following moves to the scratch directory (the second is for the user mgribsko):

    cd $RCAC_SCRATCH
    cd /scratch/carter/m/mgribsko
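
As a small sketch of the naming pattern just described, the hypothetical helper scratch_path below builds the expected scratch directory from a username. It only illustrates the pattern; on scholar, $RCAC_SCRATCH remains the reliable way to find your directory.

```shell
# Hypothetical helper: build /scratch/carter/<first letter>/<username>.
scratch_path() {
    initial=$(printf '%s' "$1" | cut -c1)   # first letter of the username
    echo "/scratch/carter/$initial/$1"
}

scratch_path mgribsko    # prints /scratch/carter/m/mgribsko

# Prefer $RCAC_SCRATCH when it is set (as on RCAC systems); otherwise fall
# back to the constructed path. Nothing is created or changed here.
me=${USER:-$(whoami)}
my_scratch=${RCAC_SCRATCH:-$(scratch_path "$me")}
echo "$my_scratch"
```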

    There are a couple of other RCAC commands that are useful.

    • myquota - shows how much disk space you are using and have available (only works on front-end)
    • qsub <jobfile_name> - starts a job running.  You must use qsub for any job that runs over 30 min.
    • qlist - shows how many CPUs are available on queues that you have access to (we are using the queue called scholar)
    • qstat -u <username> - shows the status of all of your queued jobs on all queues

    See also Getting Started


    Files (3)

    • securecrt_problem.pdf - securecrt error screen (72.26 kB, attached 08:27, 27 Aug 2014 by gribskov)
    • securecrt_step_by_step.pdf - step by step example of connecting with SecureCRT (266.83 kB, attached 08:19, 27 Aug 2014 by gribskov)
    • syllabus_rough. v2-1.xlsx - Current syllabus 8/25 (13.17 kB, attached 15:17, 25 Aug 2014 by gribskov)

    Comments
    Now am connected, thanks uploading the securecrt procedure, i was stuck
    Patrick
    Posted 21:37, 27 Aug 2014