[Chennaipy] Chennaipy - Monday Module - 06 Jun 2022
selvi dct
selvi.dct at gmail.com
Mon Jun 6 12:38:13 EDT 2022
Introduction:
Extracting text from a binary file format is always hard. Today we will look
at a module that comes to the rescue by converting .docx files to Markdown (.md) files.
Module: docx2md
Installation: pip install docx2md
About:
Converts Microsoft Word document files (.docx extension) to Markdown files.
Execution:
% python -m docx2md ~/Downloads/example.docx output.msd
# save output.msd
# save media/image1.png
# save media/image4.jpg
# save media/image3.gif
# save media/image2.png
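If you want to run the same conversion from a Python script instead of the shell, one option is to call the module through subprocess. This is only a minimal sketch: it reuses the exact command line shown above (python -m docx2md SRC DST) and does not assume anything about docx2md's internal Python API. The helper names build_command and convert are my own, not part of the package.

```python
import subprocess
import sys


def build_command(src, dst):
    """Build the argv list for: python -m docx2md SRC DST (same call as above)."""
    # sys.executable keeps us on the interpreter where docx2md was pip-installed.
    return [sys.executable, "-m", "docx2md", str(src), str(dst)]


def convert(src, dst):
    """Run the conversion; raises CalledProcessError if the command fails.

    Hypothetical helper, assumes docx2md is already installed.
    """
    subprocess.run(build_command(src, dst), check=True)


if __name__ == "__main__":
    # Example paths only; replace them with your own files.
    print(" ".join(build_command("example.docx", "output.md")))
```

Calling convert("example.docx", "output.md") should then behave like the shell command, as long as the package is installed for that interpreter.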
Output:
% cat output.msd
<div class="break"></div>
# chapter 1
text of chapter 1
## section 1-1
text of section 1-1
### subsection 1-1-1
text of subsection 1-1-1
<div class="break"></div>
insert png
<img src="media/image1.png" id="image1">
insert bmp
<img src="media/image2.png" id="image2">
insert gif
<img src="media/image3.gif" id="image3">
insert jpg
<img src="media/image4.jpg" id="image4">
<div class="break"></div>
* aaaaa
* bbbbb
* ccccc
* ddddd
* eeeee
* fffff
* ggggg
* hhhhh
* iiiii
* jjjjj
<table id="table1">
<tr>
<td>a</td>
<td>b</td>
<td>c</td>
</tr>
<tr>
<td>d</td>
<td>e</td>
<td>f</td>
</tr>
<tr>
<td>g</td>
<td>h</td>
<td>i</td>
</tr>
</table>
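Since the images come out as plain <img> tags, it can be handy to list which media files the generated file refers to, for example to check that they were all saved. A small sketch using only the standard library is below. The names ImageCollector and image_sources are hypothetical, and it assumes the output looks like the sample above.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(text):
    """Return the image paths referenced in the converted text, in order."""
    parser = ImageCollector()
    parser.feed(text)
    parser.close()
    return parser.sources


if __name__ == "__main__":
    sample = 'insert png\n<img src="media/image1.png" id="image1">\n'
    print(image_sources(sample))
```

To use it on a real file, read the converted output into a string and pass it to image_sources.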
Reference:
https://pypi.org/project/docx2md/