[moin-user] Coding of attachment file names

Volker Wysk post at volker-wysk.de
Mon Aug 8 11:35:31 EDT 2022


On Monday, 08.08.2022 at 16:58 +0200, Paul Boddie wrote:
> On Monday, 8 August 2022 16:13:53 CEST Volker Wysk wrote:
> > Sometimes I have an attachment file name with a quote in it. This quote gets
> > replaced by an underscore when the attachment is uploaded. I can still
> > refer to the attachment with the quote(s), but the attachment is saved to
> > disk with underscores instead of quotes. (Why?)
> > 
> > I guess this doesn't affect just quotes, but some other characters too. I
> > need to know which characters are replaced, and how the attachment file
> > name on disk is derived.
> 
> When storing the attachment in a file, Moin has to encode the filename 
> according to the limitations of the underlying filesystem and to prevent 
> potential filename processing exploits. So, you will see various characters 
> get replaced in the supplied filename.

I just assumed it was Linux.  :-)

> Looking at the code, you'll find the add_attachment function in MoinMoin/
> action/AttachFile.py where the wikiutil.taintfilename function is called. This 
> function resides in MoinMoin/wikiutil.py and performs a substitution as 
> follows:
> 
> re.sub('[\x00-\x1f:/\\\\<>"*?%|]', '_', basename)
> 
> This effectively replaces all control characters (codes 0 to 31), colon,
> slash, backslash, less than, greater than, double quote, asterisk, question
> mark, percent, and pipe with an underscore, as you can see.
> 
> I was somewhat surprised that the substitution was less sophisticated than the 
> one used for page names, which you can find in the same module in the 
> quoteWikinameFS function. That performs some kind of encoding where you will 
> see things like "(20)" in filenames, with that particular example being used 
> for spaces.
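The taintfilename substitution quoted above can be tried outside Moin to see exactly which characters change. This is a minimal standalone sketch: the regular expression is the one from the thread, rewritten as an equivalent raw string, while the helper names `taint_filename` and `replaced_chars` are hypothetical and not part of MoinMoin.

```python
import re

# Character class as quoted from taintfilename in MoinMoin/wikiutil.py:
# control characters 0x00-0x1f, then : / \ < > " * ? % |
UNSAFE_ATTACHMENT_CHARS = re.compile(r'[\x00-\x1f:/\\<>"*?%|]')


def taint_filename(basename):
    """Replace every unsafe character with '_' (standalone sketch, not Moin's code)."""
    return UNSAFE_ATTACHMENT_CHARS.sub('_', basename)


def replaced_chars(basename):
    """Return the set of characters in basename that the substitution rewrites."""
    return set(UNSAFE_ATTACHMENT_CHARS.findall(basename))


if __name__ == '__main__':
    for name in ['report "final"?.pdf', "it's 50%.txt"]:
        print(repr(name), '->', repr(taint_filename(name)), sorted(replaced_chars(name)))
```

For example, `taint_filename('report "final"?.pdf')` gives `'report _final__.pdf'`. Note that the character class, as quoted, contains the double quote but not the single quote (apostrophe), so `"it's 50%.txt"` keeps its apostrophe and only the percent sign becomes an underscore.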

Yes, I had to deal with this too. You get a LOT of bracketed numbers, even
for things like spaces, which wouldn't need to be encoded.
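For comparison, here is a rough sketch of the kind of bracketed encoding described for page names. It is not the quoteWikinameFS source: it assumes that only ASCII letters, digits and underscore pass through unchanged, and that each run of other characters becomes one parenthesised group of lowercase hex UTF-8 bytes, which is consistent with spaces appearing as "(20)". Both function names are hypothetical.

```python
import re

# Assumption: "safe" means ASCII letters, digits and underscore; any run of
# other characters is encoded as one "(...)" group of UTF-8 bytes in hex.
UNSAFE_RUN = re.compile(r'[^a-zA-Z0-9_]+')
HEX_GROUP = re.compile(r'\(([0-9a-f]+)\)')


def quote_wikiname_fs(name):
    """Approximate a page-name-to-directory-name encoding (hypothetical helper)."""
    def encode_run(match):
        hex_bytes = ''.join('%02x' % b for b in match.group().encode('utf-8'))
        return '(' + hex_bytes + ')'
    return UNSAFE_RUN.sub(encode_run, name)


def unquote_wikiname_fs(quoted):
    """Reverse the sketch above by decoding each parenthesised hex group."""
    def decode_group(match):
        return bytes.fromhex(match.group(1)).decode('utf-8')
    return HEX_GROUP.sub(decode_group, quoted)


if __name__ == '__main__':
    for page in ['Foo Bar', 'Parent/Child', 'Größe']:
        print(page, '->', quote_wikiname_fs(page))
```

Under these assumptions `quote_wikiname_fs('Foo Bar')` returns `'Foo(20)Bar'`, and a non-ASCII name such as `'Größe'` becomes `'Gr(c3b6c39f)e'`. Because parentheses are themselves outside the safe set, they are always encoded, so the decoder can treat every literal `(...)` group in the quoted form as encoded bytes.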

> I don't know what the history is behind this functionality, but I imagine that 
> the basic functionality was done a couple of decades ago and it has been 
> refined since then, although I do think that the above differences in filename 
> processing are a bit odd. Then again, once things become established, they 
> tend to stay that way forever.

Yes, they do!  :-)

Volker


