[Tutor] please help

Steven D'Aprano steve at pearwood.info
Sat Mar 22 00:15:39 CET 2014


On Fri, Mar 21, 2014 at 08:31:07PM +1100, Mustafa Musameh wrote:

> Please help. I have been searching the internet to understand how to
> write a simple program/script with Python, but I have not been able to do anything.
> I have a file that looks like this:
> >ID 1
> agtcgtacgt…
> >ID 2
> attttaaaaggggcccttcc
> .
> .
> .
> in other words, it contains several IDs, each of which has a sequence of 'acgt' letters.
> I need to write a script in Python whose output will be, for example, like this:
> >ID 1
> a = 10%, c = 40%, g = 40%, t = 10%
> >ID 2
> a = 15%, c = 35%, g = 35%, t = 15%
> .
> .
> .
> (I mean the first line is the ID and the second line is the frequency of each letter.)
> How can I tell Python to print the first line as it is, and count the
> characters from the second line up to the beginning of the
> next '>', and so on?


This sounds like a homework exercise, and I have a policy of trying not 
to do homework for people. But I will show you the features you need.

Firstly, explain what you would do if you were solving this problem in 
your head. Write out the steps in English (or the language of your 
choice). Don't worry about writing code yet, you're writing instructions 
for a human being at this stage.

Those instructions might look like this:

Open a file (which file?).
Read two lines at a time.
The first line will look like ">ID 42". Print that line unchanged.
The second line will look like "gatacacagtatta...". Count how
    many "g", "a", "t", "c" letters there are, then print the
    results as percentages.
Stop when there are no more lines to be read.

Now that you know what needs to be done, you can start using Python for 
it. Start off by opening a file for reading:

f = open("some file")

There are lots of ways to read the file one line at a time. Here's one 
way:

for line in f:
    print(line.rstrip('\n'))  # each line already ends with "\n"

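A common refinement (not part of the steps above) is to open the file in a
"with" statement, so it is closed automatically when the block ends, even if
an error occurs. The function name show_file and its path parameter are
hypothetical, chosen just for this sketch:

```python
def show_file(path):
    """Print every line of the file at `path`, one line at a time."""
    # The "with" block closes the file automatically when it ends,
    # even if an exception is raised inside it.
    with open(path) as f:
        for line in f:
            # Lines read from a file keep their trailing "\n", so strip it
            # to avoid printing an extra blank line after each one.
            print(line.rstrip("\n"))
```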

But you want to read it *two* lines at a time. Here is one way:

for first_line in f:
    second_line = next(f, '')  # '' if there is no line left to pair with
    print(first_line.rstrip('\n'))  # lines already end with "\n"
    print(second_line.rstrip('\n'))


Here's another way:

while True:
    first_line = f.readline()
    if first_line == '':  # readline() returns '' only at end of file
        break
    second_line = f.readline()

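A third option, which the reply above does not use, is to hand the same file
object to zip() twice: each tuple then holds one line and the line right
after it, i.e. an ID line and its sequence line. In this sketch io.StringIO
stands in for a file opened with open(), and pair_lines is a hypothetical
helper name. Note that zip() silently drops a final unpaired line, whereas
next(f, '') pairs it with an empty string.

```python
import io


def pair_lines(f):
    """Return (first_line, second_line) tuples with newlines stripped."""
    pairs = []
    # zip() pulls one line from f, then the next line from the same f,
    # so consecutive lines end up together in one tuple.
    for first_line, second_line in zip(f, f):
        pairs.append((first_line.rstrip("\n"), second_line.rstrip("\n")))
    return pairs


# io.StringIO behaves like a file opened for reading.
sample = io.StringIO(">ID 1\nagtcgtacgt\n>ID 2\nattttaaaaggggcccttcc\n")
for id_line, sequence in pair_lines(sample):
    print(id_line)
    print(sequence)
```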

Now that you have the lines, what do you do with them? Printing the 
first line is easy. How about the second?


second_line = "gatacattgacaaccggaataccgagta"

Now you need to do four things:

- count the total number of characters, ignoring the newline at the end
- count the number of g, a, t, c characters individually
- work out the percentages of the total
- print each character and its percentage
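For the first step, a line read from a file still carries its trailing
newline, which would otherwise be counted as a character. One way to drop it
is str.rstrip(); called with no argument it also removes a "\r" left over
from Windows-style line endings. The name raw_line is just illustrative:

```python
# A line as it comes out of a file: the newline is still attached.
raw_line = "gatacattgacaaccggaataccgagta\n"

# rstrip() with no argument removes trailing whitespace, including "\n" and "\r".
sequence = raw_line.rstrip()

print(repr(raw_line))
print(repr(sequence))
```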


Here is one way to count the total number of characters:

count = 0
for c in second_line:
    count += 1


Can you think of a better way? Do you think that maybe Python has a
built-in function to calculate the length of a string?

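Python does have such a built-in, len(). This sketch checks that it agrees
with the counting loop for the example string above. Like the loop, it counts
every character, so a trailing newline should be removed first:

```python
second_line = "gatacattgacaaccggaataccgagta"

# The manual counting loop from above.
count = 0
for c in second_line:
    count += 1

# The built-in len() gives the same answer in one step.
total = len(second_line)
print(count, total)
```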

Here is one way to count the number of 'g' characters:

count_of_g = 0
for c in second_line:
    if c == 'g':  # only count the 'g' characters
        count_of_g += 1


(Does this look familiar?)

Can you think of another way to count characters? Hint: strings have a 
count method:

py> s = "fjejevffveejf"
py> s.count("j")
3
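Calling count once per letter gives all four tallies. This sketch stores them
in a dictionary keyed by letter; the name counts is illustrative, and the
letters are taken in the order a, c, g, t used in the requested output:

```python
second_line = "gatacattgacaaccggaataccgagta"

# One str.count() call per nucleotide letter.
counts = {}
for letter in "acgt":
    counts[letter] = second_line.count(letter)

print(counts)
```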


Now you need to calculate the percentages. Do you know how to calculate 
the percentage of a total? Hint: you'll need to divide two numbers and 
multiply by 100.
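As a worked example with the string above: it is 28 characters long and
contains 6 g's, so the percentage of g is 6 / 28 * 100, about 21.4%. On
Python 2, 6 / 28 is integer division and gives 0, so this sketch multiplies
by 100.0 first to keep the arithmetic in floating point on either version:

```python
second_line = "gatacattgacaaccggaataccgagta"

total = len(second_line)
count_of_g = second_line.count("g")

# Multiplying by 100.0 first keeps the arithmetic in floating point,
# which matters on Python 2 where 6 / 28 would be integer division (0).
percent_g = 100.0 * count_of_g / total

print(round(percent_g, 1))
```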

Finally, you need to print the results.
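For the output format in the question ("a = 10%, c = 40%, ..."),
str.format() can round each percentage and ", ".join() can glue the pieces
together. The helper name format_percentages and its dictionary argument are
hypothetical, and the values passed in below are the made-up example numbers
from the question, not real results:

```python
def format_percentages(percentages):
    """Build a line like 'a = 10%, c = 40%, g = 40%, t = 10%'."""
    parts = []
    for letter in "acgt":
        # {:.0f} rounds the percentage to a whole number.
        parts.append("{} = {:.0f}%".format(letter, percentages[letter]))
    return ", ".join(parts)


print(format_percentages({"a": 10.0, "c": 40.0, "g": 40.0, "t": 10.0}))
```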


Putting all these parts together should give you a solution. Good luck! 
Write as much code as you can, and come back with any specific questions 
you may have.



-- 
Steven


