
  1. Print Strings

    Write a series of print statements that print the following (include a blank line between answers):

    1. Post hoc ergo propter hoc
    2. What’s up with scientists using all of this snooty latin?
    3. 'atgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgc'. Do this using the * operator to make 15 copies of 'atgc'.
    4. Darwin’s “On the origin of species” is a seminal work in biology.
    Expected outputs for Print Strings
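The techniques this exercise relies on can be sketched on made-up strings, so the exercise itself stays unsolved. This is a minimal sketch: the variable names and example text are hypothetical, and print is written in its function-call form, which is an assumption about the Python version (with a single argument it behaves the same in Python 2).

```python
# Made-up strings for illustration; they are not the exercise answers.
motif = 'ga'
repeated = motif * 4          # the * operator joins four copies end to end
print(repeated)
print('')                     # printing an empty string gives a blank line

# Double quotes on the outside let an apostrophe sit inside the string.
apostrophe = "It's a short sequence."
print(apostrophe)
print('')

# Single quotes on the outside let double quotes sit inside the string.
quoted = 'The label reads "sample 1".'
print(quoted)
```

Choosing the outer quote type to differ from the quotes inside the text avoids the need for escape characters.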
  2. string Functions

    Use functions from the string module or from base Python to print the following strings.

    1. 'species' in all capital letters
    2. 'gcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgg' with all of the occurrences of 'a' replaced with 'A'
    3. “    Thank goodness it’s Friday” without the leading white space (i.e., without the spaces before “Thank”)
    4. The number of 'a's in 'gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg'.
    5. The length of the DNA sequence 'gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg'.
    Expected outputs for string Functions
  3. string Methods

    Use string methods to print the following strings. Remember that a method is called by appending its name to the object name with a period (.), for example:

    mystring = 'Hello World'
    print(mystring.lower())
    1. 'species' in all capital letters
    2. 'gcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgg' with all of the occurrences of 'a' replaced with 'A'
    3. "    Thank goodness it's Friday" without the leading white space (i.e., without the spaces before "Thank")
    4. The number of 'a's in 'gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg'.
    Expected outputs for string Methods
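As a rough illustration of the method-call pattern, the sketch below applies four standard string methods to made-up values rather than to the exercise strings. The variable names and example text are hypothetical.

```python
# Made-up values for illustration; they are not the exercise strings.
name = 'genus'
dna = 'gattaca'
padded = '   leading spaces'

shouted = name.upper()            # every letter converted to upper case
swapped = dna.replace('t', 'T')   # every 't' replaced with 'T'
trimmed = padded.lstrip()         # white space removed from the left only
a_count = dna.count('a')          # number of non-overlapping 'a' matches

print(shouted)
print(swapped)
print(trimmed)
print(a_count)
```

Each method returns a new value and leaves the original string unchanged, so the result has to be printed or assigned to be of any use.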
  4. Long Strings

    For the DNA sequence below, determine the following properties and print them to the screen. You can cut and paste the sequence into your code. It is much longer than what is visible on the screen, but if you select the whole thing and paste it into Python you will see the full sequence:


    1. How many times does 'gagg' occur in the sequence?
    2. What is the starting position of the first occurrence of 'atta'? Report the actual base pair position as a human would understand it.
    3. How long is the sequence?
    4. What is the GC content of the sequence? The GC content is the percentage of bases that are either G or C (as a percentage of total base pairs). Print the result as “The GC content of this sequence is XX.XX%” where XX.XX is the actual GC content. Do this using a formatted string.
    Expected outputs for Long Strings
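The four properties can be sketched on a short, made-up sequence. This is an illustration under stated assumptions, not the expected solution: the sequence and variable names are hypothetical, and str.format is one of several ways to build a formatted string.

```python
# A short made-up sequence standing in for the long one in the exercise.
seq = 'ccgaggattacgaggtt'

motif_count = seq.count('gagg')        # non-overlapping occurrences

# find() returns a zero-based index, so add 1 to get the base pair
# position as a human would count it.
first_atta = seq.find('atta') + 1

seq_length = len(seq)

# 100.0 keeps the division in floating point, also under Python 2.
gc_bases = seq.count('g') + seq.count('c')
gc_percent = 100.0 * gc_bases / seq_length

# {:.2f} rounds the value to two decimal places.
message = 'The GC content of this sequence is {:.2f}%'.format(gc_percent)

print(motif_count)
print(first_atta)
print(seq_length)
print(message)
```

Note that find() returns -1 when the motif is absent, so adding 1 would give 0 in that case. A more careful version would check for that before reporting a position.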
  5. GC Content 1

    A colleague has produced a file with one DNA sequence on each line. Download the file and load it into Python using numpy.loadtxt(). You will need to use the optional argument dtype = str to tell loadtxt() that the data is composed of strings.

    Calculate the GC content of each sequence. The GC content is the percentage of bases that are either G or C (as a percentage of total base pairs). Print the result for each sequence as “The GC content of the sequence is XX.XX%” where XX.XX is the actual GC content.

    Expected outputs for GC Content 1
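One way to organize the per-sequence calculation is a small function called inside a loop. In the sketch below a hard-coded list of made-up sequences stands in for the array of strings that numpy.loadtxt() would return when called with dtype = str. The function name gc_content and the variable names are hypothetical.

```python
def gc_content(sequence):
    """Return the percentage of bases in sequence that are G or C."""
    sequence = sequence.lower()        # treat upper and lower case alike
    gc_bases = sequence.count('g') + sequence.count('c')
    return 100.0 * gc_bases / len(sequence)


# Made-up sequences standing in for the lines of the downloaded file.
sequences = ['atgc', 'ggcc', 'atat']

messages = []
for sequence in sequences:
    message = 'The GC content of the sequence is {:.2f}%'.format(
        gc_content(sequence))
    messages.append(message)
    print(message)
```

Wrapping the calculation in a function keeps the loop body short and lets the same code be reused for any other collection of sequences.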
  6. Split Strings

    You have a data file with a single taxonomy column in it. This column contains the family, genus, and species for a single taxonomic group. You need to figure out how to split that information into separate values for family, genus, and species. To solve the basic problem take a single example string, 'Ornithorhynchidae Ornithorhynchus anatinus', split it into three separate strings using a Python command, and then print the family, genus, and species, each on a separate line.

    Expected outputs for Split Strings
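The splitting step can be sketched with a different taxonomy string than the one in the exercise. The example string and variable names below are chosen for illustration only, and the sketch assumes each name is a single word separated by white space.

```python
# A different example string than the one used in the exercise.
taxonomy = 'Hominidae Homo sapiens'

# split() with no arguments breaks the string on runs of white space
# and returns a list; the three items are unpacked into three names.
family, genus, species = taxonomy.split()

print(family)
print(genus)
print(species)
```

Unpacking into three names raises a ValueError if the string does not contain exactly three words, which gives an early warning about malformed rows in a real data file.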

