[Chennaipy] Chennaipy - Monday Module - 06 Jun 2022

selvi dct selvi.dct at gmail.com
Mon Jun 6 12:38:13 EDT 2022


Introduction:

Extracting text from a binary file is always hard. Today we will look at a
module that comes to the rescue by converting .docx files to Markdown (.md) files.


Module: docx2md


Installation: pip install docx2md


About:

Converts Microsoft Word document files (.docx extension) to Markdown files.


Execution:

% python -m docx2md ~/Downloads/example.docx output.msd

# save output.msd

# save media/image1.png

# save media/image4.jpg

# save media/image3.gif

# save media/image2.png
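
The same command can also be driven from a Python script. The following is only a minimal sketch: it assumes docx2md is installed for the interpreter running the script, and that the "# save ..." lines shown above are printed to standard output. The helper names build_command and convert are made up for this illustration and are not part of docx2md.

```python
import subprocess
import sys


def build_command(src, dst):
    """Return the argument list equivalent to `python -m docx2md SRC DST`."""
    # sys.executable keeps the call inside the current interpreter/virtualenv.
    return [sys.executable, "-m", "docx2md", str(src), str(dst)]


def convert(src, dst):
    """Run the converter and return whatever lines it printed.

    Assumption: progress lines such as '# save ...' go to stdout.
    check=True raises CalledProcessError if the command fails.
    """
    result = subprocess.run(
        build_command(src, dst),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Hypothetical paths, used only for illustration.
    for line in convert("example.docx", "output.md"):
        print(line)
```

Building the argument list in its own function keeps the command easy to inspect before anything is actually executed.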



Output:

% cat output.msd

<div class="break"></div>


# chapter 1


text of chapter 1


## section 1-1


text of section 1-1


### subsection 1-1-1


text of subsection 1-1-1


<div class="break"></div>


insert png


<img src="media/image1.png" id="image1">


insert bmp


<img src="media/image2.png" id="image2">


insert gif


<img src="media/image3.gif" id="image3">


insert jpg


<img src="media/image4.jpg" id="image4">


<div class="break"></div>


* aaaaa

* bbbbb

* ccccc


* ddddd

    * eeeee

* fffff

    * ggggg

* hhhhh

    * iiiii

* jjjjj


<table id="table1">

<tr>

<td>a</td>

<td>b</td>

<td>c</td>

</tr>

<tr>

<td>d</td>

<td>e</td>

<td>f</td>

</tr>

<tr>

<td>g</td>

<td>h</td>

<td>i</td>

</tr>

</table>
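
As the output above shows, images and tables are written as inline HTML rather than Markdown syntax. If a later step needs the image paths or a pipe-style table, the output can be scanned with the standard library. This is a rough sketch, not something docx2md provides: the class OutputScanner, the function to_pipe_table, and the SAMPLE text are all invented for this example, and treating the first table row as a header is an assumption.

```python
from html.parser import HTMLParser


class OutputScanner(HTMLParser):
    """Collect <img> sources and <table> cell text from docx2md-style output."""

    def __init__(self):
        super().__init__()
        self.images = []       # src attribute of every <img> tag
        self.tables = []       # one list of rows per <table>
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)
        elif tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag == "td" and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a table cell is kept; Markdown text is ignored.
        if self._in_cell:
            self.tables[-1][-1][-1] += data


def to_pipe_table(rows):
    """Render rows as a Markdown pipe table; the first row becomes the header."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Shortened sample in the shape of the output shown above.
SAMPLE = """
# chapter 1

<img src="media/image1.png" id="image1">

<table id="table1">
<tr>
<td>a</td>
<td>b</td>
</tr>
<tr>
<td>d</td>
<td>e</td>
</tr>
</table>
"""

if __name__ == "__main__":
    scanner = OutputScanner()
    scanner.feed(SAMPLE)
    print(scanner.images)
    print(to_pipe_table(scanner.tables[0]))
```

Keeping the scanner separate from the rendering step means the image list can be used on its own, for example to check that every referenced file exists in the media directory.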




Reference:

https://pypi.org/project/docx2md/