Extracting data from Python dictionary object

Fri Feb 9 14:32:25 EST 2018

On 09/02/18 18:35, Stanley Denman wrote:
> On Friday, February 9, 2018 at 1:08:27 AM UTC-6, dieter wrote:
>> Stanley Denman <dallasdisabilityattorney at gmail.com> writes:
>>
>>> I am new to Python. I am trying to extract text from the bookmarks in a PDF file that would provide the data for a Word template merge. I have gotten down to a string of text pulled out of the list object that I got from using the PyPDF2 module.  I am stuck on how to get the data I need out of the string.  I am calling it a string, but Python recognizes it as a dictionary object.
>>>
>>> Here is the string:
>>>
>>> {'/Title': '1F:  Progress Notes  Src.:  MILANI, JOHN C Tmt. Dt.:  05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}
>>>
>>> What I want is the following to end up as fields on my Word template merge:
>>> MedSourceFirstName: "John"
>>> MedSourceLastName: "Milani"
>>> MedSourceLastTreatment: "05/28/2014"
>>>
>>> If I use keys() on the dictionary I get this:
>>> ['/Title', '/Page', '/Type']
>>>
>>> I was hoping "Src." and "Tmt. Dt." would be treated as keys.  Seems like the key/value pair of a dictionary would translate nicely to fieldname and fielddata for a Word document merge.  Here is my code so far.
>>
>> A Python "dict" is a mapping of keys to values. Its "keys" method
>> gives you the keys (as you have used above).
>> The subscription syntax ("<some_dict>[<some_key>]"; e.g.
>> "pdf_info['/Title']") allows you to access the value associated with
>> "<some_key>".
>>
>> In your case, relevant information is coded inside the values themselves.
>> You will need to extract this information yourself. Python's "re" module
>> might be of help (see the "library reference", for details).
> 
> Thanks for your response.  Nice to know I am at least on the right path.  Sounds like I am going to have to dig into regex to get at the text I want.
> 

Maybe using string methods is simpler than a regex.

 >>> data = '1F:  Progress Notes  Src.:  MILANI, JOHN C Tmt. Dt.:  05/12/2014 - 05/28/2014 (9 pages)'
 >>> bits = data.split(':')
 >>> bits
['1F', '  Progress Notes  Src.', '  MILANI, JOHN C Tmt. Dt.', '  05/12/2014 - 05/28/2014 (9 pages)']
 >>> namebits = bits[2].split()
 >>> namebits
['MILANI,', 'JOHN', 'C', 'Tmt.', 'Dt.']
# I'll leave you to grab the names, and strip the comma from the last name.
 >>> start = bits[3].find('- ')
 >>> stop = bits[3].find('(')
 >>> date = bits[3][start + 2: stop].strip()
 >>> date
'05/28/2014'
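
For completeness, here is one possible way to finish the name step with the same string methods. It is only a sketch: it assumes the name is always written as "LAST, FIRST" (optionally followed by a middle initial), and the names first_name and last_name are made up for illustration.

```python
# Sketch only: assumes the source name always looks like "LAST, FIRST [MIDDLE]".
data = '1F:  Progress Notes  Src.:  MILANI, JOHN C Tmt. Dt.:  05/12/2014 - 05/28/2014 (9 pages)'

bits = data.split(':')
namebits = bits[2].split()      # ['MILANI,', 'JOHN', 'C', 'Tmt.', 'Dt.']

# Drop the trailing comma from the last name and convert both names
# from upper case to capitalised form.
last_name = namebits[0].rstrip(',').title()
first_name = namebits[1].title()

print(first_name, last_name)
```

A last name containing a space (for example "VAN DYKE, ...") would break the positional indexing, so real bookmarks may need more care.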

Apologies for the variable names used, I'm sure that you can think of 
something better :)
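
If you do want to try the "re" route suggested earlier in the thread, a rough sketch might look like the following. The pattern is an assumption based on the single title shown above (a "Src.:" label, a "LAST, FIRST" name, and a "Tmt. Dt.:" label followed by a start and end date); parse_title and TITLE_RE are hypothetical names, not part of PyPDF2.

```python
import re

# Assumed layout, inferred from the one example title in this thread:
#   ... Src.:  LAST, FIRST [MIDDLE] Tmt. Dt.:  MM/DD/YYYY - MM/DD/YYYY (...)
TITLE_RE = re.compile(
    r"Src\.:\s*(?P<last>[^,]+),\s*(?P<first>\S+)"
    r".*?Tmt\. Dt\.:\s*\S+\s*-\s*(?P<last_treatment>\d{2}/\d{2}/\d{4})"
)


def parse_title(title):
    """Return the three merge fields as a dict, or None if the title does not match."""
    match = TITLE_RE.search(title)
    if match is None:
        return None
    return {
        'MedSourceFirstName': match.group('first').title(),
        'MedSourceLastName': match.group('last').strip().title(),
        'MedSourceLastTreatment': match.group('last_treatment'),
    }


title = '1F:  Progress Notes  Src.:  MILANI, JOHN C Tmt. Dt.:  05/12/2014 - 05/28/2014 (9 pages)'
fields = parse_title(title)
print(fields)
```

Returning a dict keeps the fieldname/fielddata pairing the original question was after. Titles with a single treatment date rather than a range would not match this pattern and would give None.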

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence