TypeError: expected string or Unicode object, NoneType found

Sat May 19 12:47:34 EDT 2018

subhabangalore at gmail.com wrote:

> I wrote the following small piece of code:
> 
> import nltk
> from nltk.corpus.reader import TaggedCorpusReader
> from nltk.tag import CRFTagger
> def NE_TAGGER():
>     reader = TaggedCorpusReader('/python27/', r'.*\.pos')
>     f1=reader.fileids()
>     print "The Files of Corpus are:",f1
>     sents=reader.tagged_sents()
>     ls=len(sents)
>     print "Length of Corpus Is:",ls
>     train_data=sents[:300]
>     test_data=sents[301:350]

Offtopic: note that sents[300] is in neither the training nor the test 
data; Python uses half-open intervals.
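
A small sketch of what "half-open" means here, using plain integers in place 
of the tagged sentences so the indices are visible. The names gapless_test 
and skipped are mine and do not appear in the original code:

```python
# Stand-in for the tagged sentences; integers make the indices easy to see.
sents = list(range(400))

train_data = sents[:300]        # indices 0..299 (the stop index is excluded)
test_data = sents[301:350]      # indices 301..349, so index 300 is skipped
gapless_test = sents[300:350]   # indices 300..349, starts where training ends

# Everything below index 350 that ended up in neither slice.
skipped = [s for s in sents[:350]
           if s not in train_data and s not in test_data]
print(skipped)
```

If the gap is unintended, sents[300:350] is the slice that continues right 
after sents[:300].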

>     ct = CRFTagger()
>     crf_tagger=ct.train(train_data,'model.crf.tagger')
> 
> This code is working fine.
> Now if I change the size of train_data to, say, 500 or 3000 by giving
> train_data=sents[:500] or
> train_data=sents[:3000], it gives me the following error.

What about sents[:499], sents[:498], ...? 

I'm not an nltk user, but to debug the problem I suggest that you identify 
the exact index that triggers the exception, and then print that item:

print sents[minimal_index_that_causes_typeerror]

Perhaps you can spot a problem with the input data.
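
Since the error message complains about a NoneType, one thing to look for is 
a None hiding in the data. The following is only a sketch: it assumes that 
each sentence is a list of (word, tag) tuples, and both the helper name 
find_none_tokens and the sample data are made up for illustration.

```python
def find_none_tokens(tagged_sents):
    """Return (sentence_index, token_index, token) for each token holding a None."""
    problems = []
    for i, sent in enumerate(tagged_sents):
        for j, token in enumerate(sent):
            # A token is expected to be a (word, tag) tuple of strings.
            if any(part is None for part in token):
                problems.append((i, j, token))
    return problems


# Made-up sample, not taken from the actual corpus.
sample = [
    [("Ram", "NNP"), ("reads", "VB")],
    [("Delhi", None), ("is", "VB")],   # the tag is missing here
]
print(find_none_tokens(sample))
```

On the real data this would be called as find_none_tokens(sents[:500]); an 
empty result would mean the None comes from somewhere else.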

(In the spirit of the "offtopic" remark: if sents[:333] triggers the failure 
you have to print sents[332])
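
Trying every prefix length by hand gets tedious, so the search can be done by 
bisection. This is a sketch under two assumptions: training on an empty list 
does not raise a TypeError, and once a prefix fails every longer prefix fails 
too. smallest_failing_prefix and fake_train are hypothetical names; 
fake_train merely stands in for the real ct.train() call.

```python
def smallest_failing_prefix(items, train):
    """Return the smallest n such that train(items[:n]) raises TypeError.

    Returns None if even the full list trains without a TypeError.
    """
    def fails(n):
        try:
            train(items[:n])
        except TypeError:
            return True
        return False

    if not fails(len(items)):
        return None
    lo, hi = 0, len(items)   # invariant: prefix lo passes, prefix hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi


def fake_train(data):
    # Hypothetical stand-in for ct.train(): rejects None like the traceback.
    for item in data:
        if item is None:
            raise TypeError("expected string or Unicode object, NoneType found")


data = ["a"] * 332 + [None] + ["b"] * 100
n = smallest_failing_prefix(data, fake_train)
print(n)             # 333
print(data[n - 1])   # None
```

With the real tagger each probe is a full training run, so this may take a 
while, but it needs only about log2(len(sents)) runs.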

> Traceback (most recent call last):
>   File "<pyshell#2>", line 1, in <module>
>     NE_TAGGER()
>   File "C:\Python27\HindiCRFNERTagger1.py", line 20, in NE_TAGGER
>     crf_tagger=ct.train(train_data,'model.crf.tagger')
>   File "C:\Python27\lib\site-packages\nltk\tag\crf.py", line 185, in train
>     trainer.append(features,labels)
>   File "pycrfsuite\_pycrfsuite.pyx", line 312, in pycrfsuite._pycrfsuite.BaseTrainer.append (pycrfsuite/_pycrfsuite.cpp:3800)
>   File "stringsource", line 53, in vector.from_py.__pyx_convert_vector_from_py_std_3a__3a_string (pycrfsuite/_pycrfsuite.cpp:10738)
>   File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string (pycrfsuite/_pycrfsuite.cpp:10633)
> TypeError: expected string or Unicode object, NoneType found
> >>>
> 
> I have searched the web for solutions and found the following links:
> https://stackoverflow.com/questions/14219038/python-multiprocessing-typeerror-expected-string-or-unicode-object-nonetype-f
> or
> https://github.com/kamakazikamikaze/easysnmp/issues/50
> 
> I reloaded Python but did not find much help.
> 
> I am using Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:22:17) [MSC
> v.1500 32 bit (Intel)] on win32
> 
> My O/S is, MS-Windows 7.
> 
> Could anybody kindly suggest a resolution?