[Tutor] Building dictionary from large txt file

Tue Jul 26 20:33:39 EDT 2022

On 26/07/2022 21:58, bobx ander wrote:
> Hi all,
> I'm trying to build a dictionary from a rather large file of the following
> format after it has been read into a list (excerpt from start of list below)
> --------
>
> Atomic Number = 1
>     Atomic Symbol = H
>     Mass Number = 1
>     Relative Atomic Mass = 1.00782503223(9)
>     Isotopic Composition = 0.999885(70)
>     Standard Atomic Weight = [1.00784,1.00811]
>     Notes = m
> --------
>
> My goal is to extract the content into a dictionary that displays each
> unique triplet as indicated below
> {'H1': {'Z': 1,'A': 1,'m': 1.00782503223},
>               'D2': {'Z': 1,'A': 2,'m': 2.01410177812}
>                ...} etc

Unfortunately to those of us unfamiliar with your data that is as clear
as mud.

You refer to a triplet but your sample file entry has 7 fields, some of
which have multiple values. Where is the triplet among all that data?

Then you show us a dictionary with keys that do not correspond to any of
the fields in your data sample. How do the fields correspond - the only
"obvious" one is the mass which evidently corresponds with the key 'm'.

But what are H1 and D2? Another file record or some derived value from
the record shown above? Similarly for Z, A and m. How do they relate to
the data?

You need to specify your requirement more explicitly for us to be sure we
are giving valid advice.

> My code that I have attempted is as follows:
>
> filename='ex.txt'
>
> afile=open(filename,'r') #opens the file
> content=afile.readlines()
> afile.close()

You probably don't need to read the file into a list if you
are going to process it line by line. Just read the lines
from the file and process them as you go.
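
As a rough sketch of what I mean: a file object can be looped over
directly, one line at a time, so readlines() is not needed. The helper
name non_blank_lines is made up for illustration, and io.StringIO stands
in for your real file so the snippet runs on its own.

```python
import io

def non_blank_lines(lines):
    """Yield each non-blank line with surrounding whitespace removed."""
    for line in lines:
        line = line.strip()
        if line:
            yield line

# io.StringIO behaves like an open text file for iteration purposes.
sample = io.StringIO("Atomic Number = 1\n    Atomic Symbol = H\n\n")
result = list(non_blank_lines(sample))
print(result)  # ['Atomic Number = 1', 'Atomic Symbol = H']
```

With the real file you would write "with open('ex.txt') as afile:" and
loop over non_blank_lines(afile); the with statement also closes the
file for you.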

> # start to create subdictionary for each case of atoms with its unique
> # keys and values
> isotope_data={'Z':0,'A':0,'m':0}
> for line in content:
>     data=line.strip().split()
>
>     if len(data)<1:
>         pass
>     elif data[0]=="Atomic" and data[1]=="Number":
>         atomic_number=data[3]
>
>
>     elif data[0]=="Mass" and data[1]=="Number":
>         mass_number=data[3]
>
>
>
>     elif data[0]=="Relative" and data[1]=="Atomic" and data[2]=="Mass":
>         relative_atomic_mass=data[4]
>
Rather than splitting the line and then comparing each field, it might be
easier (and more readable) to test the whole string with the startswith()
method and only then split it. Strip the line first, because most of the
lines in your sample are indented:

for line in file:
    line = line.strip()
    if line.startswith("Atomic Number"):
        atomic_number = line.split()[3]
    # etc...
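
One possible way to package the startswith() idea, assuming every field
line has the form "Name = value" as in your excerpt. The function name
field_value is hypothetical. It uses str.partition() to take everything
after the "=" instead of counting words with split().

```python
def field_value(line, name):
    """Return the text after '=' if the line holds field `name`, else None."""
    line = line.strip()  # indented lines would otherwise never match
    if not line.startswith(name):
        return None
    _, _, value = line.partition("=")
    return value.strip()

print(field_value("Atomic Number = 1", "Atomic Number"))    # 1
print(field_value("    Mass Number = 1", "Mass Number"))    # 1
print(field_value("    Notes = m", "Mass Number"))          # None
```

Note that the value comes back as a string, exactly as it appears in
the file.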

> isotope_data['Z']=atomic_number
> isotope_data['A']=mass_number
> isotope_data['m']=relative_atomic_mass
> isotope_data
>
> the output from the programme is only
>
> {'Z': '118', 'A': '295', 'm': '295.21624(69#)'}
>
> I seem to be overwriting each dictionary

Yes, you never detect the end of a record - you never explain how records
are separated in the file either!

You need something like

master = {}   # empty dict

for line in file:
    if line.startswith("Atomic Number"):
        create variable....
    if line.startswith(....): ....etc
    if <record separator detected>:   # we don't know what this is...
        # save variables in a dictionary
        record = {key1: variable1, key2: variable2....}
        # insert dictionary into master dictionary
        master[key] = record

How you generate the keys is a mystery to me but presumably you know.

You could write the values directly into the master dictionary if you
prefer.
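
To make that concrete, here is a sketch built on several guesses about
your data, so treat it as a starting point only. It assumes that each
record begins with an "Atomic Number" line, that the fields arrive in
the order shown in your excerpt, and that the key is the atomic symbol
followed by the mass number (which would give 'H1'). Since I don't know
the record separator, it saves the record as soon as the mass line has
been seen. The name build_master is made up.

```python
def build_master(lines):
    """Collect Z, A and m for each record into a dict of dicts."""
    master = {}
    record = {}
    symbol = None
    for line in lines:
        name, sep, value = line.strip().partition("=")
        if not sep:
            continue  # blank lines, separators, anything without '='
        name, value = name.strip(), value.strip()
        if name == "Atomic Number":
            record = {"Z": value}  # assumed start of a new record
        elif name == "Atomic Symbol":
            symbol = value
        elif name == "Mass Number":
            record["A"] = value
        elif name == "Relative Atomic Mass":
            record["m"] = value
            master[symbol + record["A"]] = record  # assumed key, e.g. 'H1'
    return master

# The single record quoted in the original post.
sample = """\
Atomic Number = 1
    Atomic Symbol = H
    Mass Number = 1
    Relative Atomic Mass = 1.00782503223(9)
    Isotopic Composition = 0.999885(70)
    Standard Atomic Weight = [1.00784,1.00811]
    Notes = m
"""
master = build_master(sample.splitlines())
print(master)  # {'H1': {'Z': '1', 'A': '1', 'm': '1.00782503223(9)'}}
```

Because a fresh record dictionary is created at every "Atomic Number"
line, earlier entries in master are no longer overwritten.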

Also note that you are currently storing strings. If you want the
numeric data you will need to convert it with int() or float() as
appropriate.
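
One wrinkle: the masses in your sample and output carry a bracketed
suffix, such as 1.00782503223(9) and 295.21624(69#), and float() will
reject those as they stand. A small sketch, assuming you simply want to
drop everything from the "(" onwards (the name to_number is
hypothetical):

```python
def to_number(text, kind=float):
    """Convert text such as '1.00782503223(9)' to a number.

    Anything from the first '(' onwards is discarded before conversion.
    """
    return kind(text.split("(")[0])

print(to_number("1", int))              # 1
print(to_number("1.00782503223(9)"))    # 1.00782503223
print(to_number("295.21624(69#)"))      # 295.21624
```

Whether discarding the bracketed part is acceptable depends on what you
need the numbers for.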

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos