Help to find a regular expression to parse po file

MRAB python at mrabarnett.plus.com
Mon Jul 6 11:12:18 EDT 2009


gialloporpora wrote:
> Hi all,
> I would like to extract strings from a PO file. To do this I have written 
> a little Python function to parse the PO file and extract the strings:
> 
> import re
> regex=re.compile("msgid (.*)\\nmsgstr (.*)\\n\\n")
> m=regex.findall(s)
> 
> where s is a po file like this:
> 
> msgctxt "write ubiquity commands.description"
> msgid "Takes you to the Ubiquity <a 
> href=\"chrome://ubiquity/content/editor.html\">command editor</a> page."
> msgstr "Apre l'<a href=\"chrome://ubiquity/content/editor.html\">editor 
> dei comandi</a> di Ubiquity."
> 
> 
> #. list ubiquity commands command:
> #. use | to separate multiple name values:
> msgctxt "list ubiquity commands.names"
> msgid "list ubiquity commands"
> msgstr "elenco comandi disponibili"
> 
> msgctxt "list ubiquity commands.description"
> msgid "Opens <a href=\"chrome://ubiquity/content/cmdlist.html\">the 
> list</a>\n"
> "      of all Ubiquity commands available and what they all do."
> msgstr "Apre una <a 
> href=\"chrome://ubiquity/content/cmdlist.html\">pagina</a>\n"
> "      in cui sono elencati tutti i comandi disponibili e per ognuno 
> viene spiegato in breve a cosa serve."
> 
> 
> 
> #. change ubiquity settings command:
> #. use | to separate multiple name values:
> msgctxt "change ubiquity settings.names"
> msgid "change ubiquity settings|change ubiquity preferences|change 
> ubiquity skin"
> msgstr "modifica impostazioni di ubiquity|modifica preferenze di 
> ubiquity|modifica tema di ubiquity"
> 
> msgctxt "change ubiquity settings.description"
> msgid "Takes you to the <a 
> href=\"chrome://ubiquity/content/settings.html\">settings</a> page,\n"
> "      where you can change your skin, key combinations, etc."
> msgstr "Apre la pagina  <a 
> href=\"chrome://ubiquity/content/settings.html\">delle impostazioni</a> 
> di Ubiquity,\n"
> "     dalla quale è possibile modificare la combinazione da tastiera 
> utilizzata per richiamare Ubiquity, il tema, ecc."
> 
> 
> 
> but, obviously, with the code above the last string is not matched. If 
> I use re.DOTALL so that "." also matches newline characters, it does not 
> work because it matches the entire file. I would like to stop the 
> matching when "msgstr" is found.
> 
> regex=re.compile("msgid (.*)\\nmsgstr (.*)\\n\\n\\n",re.DOTALL)
> 
> Is it possible or not?
> 
You could try:

regex = re.compile(r'msgid (.*(?:\n".*")*)\nmsgstr (.*(?:\n".*")*)$', re.MULTILINE)

and then, if necessary, tidy what you get.
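
As a rough sketch of how the pattern and the tidying step might fit together: the names ENTRY_RE, tidy and SAMPLE below are made up for illustration, and the sample is a shortened version of the quoted file with the mail line-wrapping undone. The pattern is written with single quotes so the embedded double quotes need no escaping, and re.MULTILINE is assumed so that "$" matches at the end of each line. The tidying only handles the \" and \n escapes, nothing more.

```python
import re

# Pattern from the reply; re.MULTILINE makes "$" match at every line end.
ENTRY_RE = re.compile(
    r'msgid (.*(?:\n".*")*)\nmsgstr (.*(?:\n".*")*)$',
    re.MULTILINE,
)

# Hypothetical sample: two entries, the second one with continuation lines.
SAMPLE = r'''msgctxt "list ubiquity commands.names"
msgid "list ubiquity commands"
msgstr "elenco comandi disponibili"

msgctxt "list ubiquity commands.description"
msgid "Opens <a href=\"chrome://ubiquity/content/cmdlist.html\">the list</a>\n"
"      of all Ubiquity commands available and what they all do."
msgstr "Apre una <a href=\"chrome://ubiquity/content/cmdlist.html\">pagina</a>\n"
"      in cui sono elencati tutti i comandi disponibili."
'''


def tidy(raw):
    """Join the quoted pieces of one msgid/msgstr value into a single string."""
    parts = []
    for line in raw.split("\n"):
        line = line.strip()
        # Drop the surrounding double quotes of each piece.
        if len(line) >= 2 and line[0] == '"' and line[-1] == '"':
            line = line[1:-1]
        parts.append(line)
    text = "".join(parts)
    # Undo only the two escapes that occur in the sample: \" and \n.
    return text.replace('\\"', '"').replace("\\n", "\n")


entries = [(tidy(msgid), tidy(msgstr)) for msgid, msgstr in ENTRY_RE.findall(SAMPLE)]

for msgid, msgstr in entries:
    print(repr(msgid), "->", repr(msgstr))
```

Continuation lines are only picked up when they start with a double quote, so a file whose long strings have been re-wrapped by a mail client would need to be restored first.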


