Python library to break text into words

beliavsky at aol.com
Thu May 31 16:26:56 EDT 2018


I bought some e-books in a Humble Bundle. The file names are shown below. I would like to separate the words within the file names with underscores, so that the first three titles become:

a_devils_chaplain.pdf
atomic_accidents.pdf
chaos_making_a_new_science.pdf

Is there a Python library that uses intelligent guesses to break sequences of characters into words? The general strategy would be to break strings into the longest words possible. The library would need to "know" a sizable subset of words in English.

adevilschaplain.pdf
atomicaccidents.pdf
chaos_makinganewscience.pdf
dinosaurswithoutbones.pdf
essaysinscience.pdf
genius_thelifeandscienceofrichardfeynman.pdf
louisagassiz_creatorofamericanscience.pdf
martiansummer.pdf
mind_aunifiedtheoryoflifeandintelligence.pdf
noturningback.pdf
onshakyground.pdf
scienceandphilosophy.pdf
sevenelementsthatchangedtheworld.pdf
strangeangel.pdf
theboywhoplayedwithfusion.pdf
thecanon.pdf
theedgeofphysics.pdf
thegenome.pdf
thegoldilocksenigma.pdf
thesphinxatdawn.pdf
unnaturalselection.pdf
water_thefateofourmostpreciousresource.pdf
x-15diary.pdf
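
To make the strategy concrete, here is a rough sketch of the kind of splitting I have in mind, not a reference to any existing library. It assumes a tiny stand-in vocabulary (WORDS), uses hypothetical helper names (segment, rename), and treats "fewest words" as a proxy for "longest words possible". A real tool would need a much larger English word list.

```python
# Tiny stand-in vocabulary (an assumption for illustration only);
# a real tool would load a large English word list instead.
WORDS = {"a", "devils", "chaplain", "atomic", "accidents",
         "chaos", "making", "new", "science"}


def segment(text, words=WORDS):
    """Split text into the fewest known words, or return None if impossible."""
    # best[i] holds the shortest word list found for text[:i], or None.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            head = best[start]
            if head is None or text[start:end] not in words:
                continue
            if best[end] is None or len(head) + 1 < len(best[end]):
                best[end] = head + [text[start:end]]
    return best[len(text)]


def rename(filename, words=WORDS):
    """Put underscores between words, keeping existing ones and the extension."""
    stem, dot, ext = filename.rpartition(".")
    parts = []
    for chunk in stem.split("_"):
        pieces = segment(chunk, words)
        # Leave a chunk untouched when it cannot be segmented.
        parts.extend(pieces if pieces is not None else [chunk])
    return "_".join(parts) + dot + ext


for name in ["adevilschaplain.pdf", "atomicaccidents.pdf",
             "chaos_makinganewscience.pdf"]:
    print(rename(name))
```

With this vocabulary the three names come out as the three titles listed at the top. A name such as x-15diary.pdf is left unchanged, because its characters cannot be covered by known words.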


