Some Issues on Tagging Text
MRAB
python at mrabarnett.plus.com
Fri May 25 20:01:43 EDT 2018
On 2018-05-25 23:24, Cameron Simpson wrote:
[snip]
> You can reduce that list by generating the "wordlist" form from something
> smaller:
>
> base_phrases = ["Kilauea volcano", "government of Mexico", "Hawaii"]
> wordlist = [
>     (base_phrase, " ".join([word + "/TAG" for word in base_phrase.split()]))
>     for base_phrase in base_phrases
> ]
>
> You could even autosplit the longer phrases so that your base_phrases
> _automatically_ becomes:
>
> base_phrases = ["Kilauea volcano", "Kilauea", "volcano",
>                 "government of Mexico", "government", "Mexico", "Hawaii"]
>
That autosplit list should also include "of".

As the OP doesn't want every instance of "of" to be tagged, there could
be a separate exceptions list containing the sub-phrases that should not
be tagged on their own; those would be dropped from the generated
base_phrases list.
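
A rough sketch of that idea, in case it helps. The names expand_phrases
and exceptions are made up for illustration, and I'm assuming that
"autosplit" just means splitting each phrase on whitespace and that
duplicates should be kept only once:

```python
def expand_phrases(base_phrases, exceptions=()):
    """Return each phrase followed by its individual words.

    Anything listed in `exceptions` is dropped, and duplicates are kept
    only once (first occurrence wins). Hypothetical helper, not from the
    original thread.
    """
    excluded = set(exceptions)
    seen = set()
    expanded = []
    for phrase in base_phrases:
        words = phrase.split()
        # A single-word phrase has no sub-phrases other than itself.
        candidates = [phrase] + (words if len(words) > 1 else [])
        for candidate in candidates:
            if candidate in excluded or candidate in seen:
                continue
            seen.add(candidate)
            expanded.append(candidate)
    return expanded


base_phrases = ["Kilauea volcano", "government of Mexico", "Hawaii"]
exceptions = ["of"]  # sub-phrases that should not be tagged on their own

expanded_phrases = expand_phrases(base_phrases, exceptions)

# Same "wordlist" construction as in the quoted code, applied to the
# expanded list.
wordlist = [
    (phrase, " ".join(word + "/TAG" for word in phrase.split()))
    for phrase in expanded_phrases
]
```

Note that "of" is still tagged as part of "government of Mexico"; it is
only the standalone entry that gets dropped.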
[snip]