TypeError: expected string or Unicode object, NoneType found

Sat May 19 13:47:26 EDT 2018

On 5/19/2018 12:47 PM, Peter Otten wrote:
> subhabangalore at gmail.com wrote:
> 
>> I wrote the following small piece of code
>>
>> import nltk
>> from nltk.corpus.reader import TaggedCorpusReader
>> from nltk.tag import CRFTagger

To implement Peter's suggestion:

>> def NE_TAGGER():

def tagger(stop):

>>      reader = TaggedCorpusReader('/python27/', r'.*\.pos')
>>      f1=reader.fileids()
>>      print "The Files of Corpus are:",f1
>>      sents=reader.tagged_sents()
>>      ls=len(sents)
>>      print "Length of Corpus Is:",ls
>>      train_data=sents[:300]
>>      test_data=sents[301:350]
> 
> Offtopic: note that sents[300] is neither in the training nor in the test
> data; Python uses half-open intervals.

       train_data=sents[:stop]
       test_data=sents[stop:stop+50]
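
To see the half-open interval point on a plain list, here is a small sketch. The list of integers is only a stand-in for the tagged sentences, and the variable names are mine, not from the original script.

```python
# Stand-in for reader.tagged_sents(); the real data are tagged sentences.
sents = list(range(400))

# Original split: index 300 ends up in neither slice.
train_old, test_old = sents[:300], sents[301:350]
missing = set(sents[:350]) - set(train_old) - set(test_old)

# Reusing the same boundary for both slices leaves no gap and no overlap.
stop = 300
train_new, test_new = sents[:stop], sents[stop:stop + 50]
```

Here `missing` contains only the skipped index, while `train_new + test_new` covers the first 350 items exactly once.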

>>      ct = CRFTagger()
>>      crf_tagger=ct.train(train_data,'model.crf.tagger')
>>
>> This code is working fine.
>> Now if I change the training data size to, say, 500 or 3000 by giving
>> train_data=sents[:500] or
>> train_data=sents[:3000], I get the following error.
> 
> What about sents[:499], sents[:498], ...?

Do a rough binary search for the first stop value that raises.

tagger(400)
tagger(350)  # or tagger(450), depending on whether tagger(400) raised
...

You could automate this with the bisect module, but bisection by eye
should be faster.
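
If you do want to automate it, a hand-written bisection is probably simpler than adapting the bisect module. The following is only a sketch. It assumes that training on sents[:lo] works, that training on sents[:hi] raises, and that once a stop value fails every larger one fails too. The names first_failing_stop, raises and train_raises are made up for illustration.

```python
def first_failing_stop(raises, lo, hi):
    """Return the smallest stop in (lo, hi] for which raises(stop) is true.

    Assumes raises(lo) is False, raises(hi) is True, and that a failing
    stop value implies every larger stop value fails as well.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if raises(mid):
            hi = mid      # first failure is at mid or below
        else:
            lo = mid      # first failure is above mid
    return hi


def train_raises(stop):
    # Hypothetical wrapper around the tagger(stop) function defined
    # earlier in this message; it is not defined in this block.
    try:
        tagger(stop)
    except TypeError:
        return True
    return False
```

With the numbers from the original post one would call first_failing_stop(train_raises, 300, 500), since 300 worked and 500 failed. Each probe retrains the tagger, so this costs about eight training runs for that range.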

> I'm not an nltk user, but to debug the problem I suggest that you identify
> the exact index that triggers the exception, and then print it
> 
> print sents[minimal_index_that_causes_typeerror]
> 
> Perhaps you can spot a problem with the input data.
>   
> (In the spirit of the "offtopic" remark: if sents[:333] triggers the failure
> you have to print sents[332])

Or mentally subtract 1 from the minimal failing stop value.
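
Since the message complains about a NoneType where a string was expected, one thing worth checking in the offending sentence is whether any word or tag is None. That is a guess on my part, not something confirmed in this thread. A sketch, assuming each sentence is a sequence of (word, tag) pairs; find_bad_tokens and the sample data are made up for illustration.

```python
def find_bad_tokens(tagged_sents):
    """Yield (sentence_index, token_index, token) for suspicious tokens.

    Assumes every sentence is a sequence of (word, tag) pairs. A token is
    reported when its word or its tag is None.
    """
    for i, sent in enumerate(tagged_sents):
        for j, (word, tag) in enumerate(sent):
            if word is None or tag is None:
                yield i, j, (word, tag)


# Made-up sample data; the second sentence has a token without a tag.
sample = [
    [("Ram", "NNP"), ("ghar", "NN")],
    [("gaya", None)],
]
bad = list(find_bad_tokens(sample))
```

Run over the real sents, this would list every sentence index that contains such a token, which can then be compared with the index found by bisection.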

> 
>> Traceback (most recent call last):
>>    File "<pyshell#2>", line 1, in <module>
>>      NE_TAGGER()
>>    File "C:\Python27\HindiCRFNERTagger1.py", line 20, in NE_TAGGER
>>      crf_tagger=ct.train(train_data,'model.crf.tagger')
>>    File "C:\Python27\lib\site-packages\nltk\tag\crf.py", line 185, in train
>>      trainer.append(features,labels)
>>    File "pycrfsuite\_pycrfsuite.pyx", line 312, in pycrfsuite._pycrfsuite.BaseTrainer.append (pycrfsuite/_pycrfsuite.cpp:3800)
>>    File "stringsource", line 53, in vector.from_py.__pyx_convert_vector_from_py_std_3a__3a_string (pycrfsuite/_pycrfsuite.cpp:10738)
>>    File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string (pycrfsuite/_pycrfsuite.cpp:10633)
>> TypeError: expected string or Unicode object, NoneType found
>>>>>
>>
>> I searched the web for solutions and found the following links,
>> https://stackoverflow.com/questions/14219038/python-multiprocessing-typeerror-expected-string-or-unicode-object-nonetype-f
>> or
>> https://github.com/kamakazikamikaze/easysnmp/issues/50
>>
>> I reloaded Python, but that did not help much.
>>
>> I am using Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:22:17) [MSC
>> v.1500 32 bit (Intel)] on win32
>>
>> My O/S is, MS-Windows 7.
>>
>> Could anybody kindly suggest a resolution?
> 
> 

-- 
Terry Jan Reedy