[Mailman-Developers] "@" in mail text gets replaced in archives

Sun Sep 28 10:33:45 EDT 2003

At 1:37 AM -0400 2003/09/28, Barry Warsaw wrote:

>  I really really want to use something like message-ids to generate
>  message file names.

	IIRC, Earl talks about this in the FAQ.  In short, for security 
reasons, you can't trust any of the information given anywhere in 
the message unless you can scrub that information and guarantee that 
it is now safe.  Otherwise, you could get a message-id like 
"<../.htaccess>" or something equally nasty that could cause other 
files to be overwritten inappropriately.
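	To make "scrubbing" concrete, here is a minimal sketch, assuming 
Python 3.  The helper `scrub_message_id()` is hypothetical, not 
anything in Mailman today.  It percent-encodes everything outside a 
small whitelist, so "/" can never survive into the name, and it 
escapes a leading dot so the result can't be ".", "..", or a hidden 
file like ".htaccess".

```python
from urllib.parse import quote


def scrub_message_id(msgid: str) -> str:
    """Hypothetical helper: map an untrusted Message-ID to a safe file name.

    quote(..., safe="") leaves only ASCII letters, digits and "_.-~"
    unescaped, so "/" becomes "%2F" and no path separator survives.
    "%" itself becomes "%25", which keeps the mapping reversible.
    """
    msgid = msgid.strip()
    if msgid.startswith("<") and msgid.endswith(">"):
        msgid = msgid[1:-1]
    if not msgid:
        raise ValueError("empty Message-ID")
    name = quote(msgid, safe="")
    # "." is still unescaped, so escape a leading dot to rule out ".",
    # ".." and hidden files such as ".htaccess".
    if name.startswith("."):
        name = "%2E" + name[1:]
    return name


print(scrub_message_id("<../.htaccess>"))     # %2E.%2F.htaccess
print(scrub_message_id("<abc@example.com>"))  # abc%40example.com
```

This only addresses path tricks; it does nothing about file name 
length limits or two ids that differ only in case landing on the same 
file on a case-insensitive file system.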

	Moreover, a lot of people out there have home networks using 
RFC 1918 private addressing, and that information is used to help 
generate otherwise properly formatted message-ids.  Since the same 
private addresses are reused on countless unrelated networks, the 
probability of message-id collision increases significantly.  This 
issue was recently brought to my attention by my own RFC 1918 private 
network here at home, and by the information my MUA uses to generate 
message-ids.
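	As a rough illustration of how an archiver could at least spot 
the most obvious case, here is a sketch (Python 3; the helper name 
`has_rfc1918_domain()` is hypothetical) that checks whether the part 
of a message-id after the "@" is a literal address from one of the 
three RFC 1918 blocks.  Such ids could then be treated as weaker 
uniqueness keys.  It does not catch host names such as 
"localhost.localdomain", which can cause the same kind of collision.

```python
import ipaddress

# The three private address blocks defined by RFC 1918.
RFC1918_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def has_rfc1918_domain(msgid: str) -> bool:
    """Hypothetical check: is the id's right-hand side an RFC 1918 address?

    Accepts a bare dotted quad or a bracketed domain literal, e.g.
    <123.456@192.168.1.10> or <123.456@[10.0.0.5]>.
    """
    msgid = msgid.strip().strip("<>")
    _, sep, right = msgid.rpartition("@")
    if not sep:
        return False  # no "@" at all
    try:
        addr = ipaddress.ip_address(right.strip("[]"))
    except ValueError:
        return False  # a host name, not an address literal
    return any(addr in net for net in RFC1918_NETS)


print(has_rfc1918_domain("<20030928.1234@192.168.1.10>"))  # True
print(has_rfc1918_domain("<abc@python.org>"))              # False
```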

	Therefore, I think we might want to be a bit more careful in how 
we generate the file names.

>                       I want to be able to generate links to archived
>  messages in the footers, but I think the best way to do that is to agree
>  on a reproducible, independent algorithm for calculating them.

	One thing that MHonArc does for messages that are not assigned a 
message-id (to help detect and eliminate duplicates) is to calculate 
an MD5 hash of the message headers and use that as a substitute.  We 
could do the same, or perhaps even use the MD5 hash instead of the 
message-id, and then store hash/message-id mappings in a database.
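	A minimal sketch of the hash-plus-mapping idea, assuming Python 3 
with the standard `hashlib`, `email` and `sqlite3` modules.  The 
table name `msgmap` and the functions below are hypothetical, and the 
exact header recipe (lowercased names, one "name: value" line per 
field) is an assumption, not necessarily what MHonArc does.

```python
import email
import hashlib
import sqlite3
from email.message import Message
from typing import Optional


def header_digest(msg: Message) -> str:
    """MD5 hex digest over all header fields, in their original order.

    Assumption: each field is hashed as "lowercased-name: value\n".
    """
    h = hashlib.md5()
    for name, value in msg.items():
        line = f"{name.lower()}: {value}\n"
        h.update(line.encode("utf-8", "surrogateescape"))
    return h.hexdigest()


def open_map(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical digest -> Message-ID table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS msgmap ("
                 "digest TEXT PRIMARY KEY, message_id TEXT)")
    return conn


def archive_key(conn: sqlite3.Connection, msg: Message) -> str:
    """Return the digest used as the archive key and record the mapping.

    Messages without a Message-ID are stored with a NULL message_id.
    """
    digest = header_digest(msg)
    with conn:  # commits the transaction on success
        conn.execute("INSERT OR IGNORE INTO msgmap VALUES (?, ?)",
                     (digest, msg.get("Message-ID")))
    return digest


def lookup_digest(conn: sqlite3.Connection, message_id: str) -> Optional[str]:
    """Find the archive key for a known Message-ID, if any."""
    row = conn.execute("SELECT digest FROM msgmap WHERE message_id = ?",
                       (message_id,)).fetchone()
    return row[0] if row else None


conn = open_map()
msg = email.message_from_string(
    "Message-ID: <x@example.com>\nSubject: test\n\nbody\n")
key = archive_key(conn, msg)
print(len(key), lookup_digest(conn, "<x@example.com>") == key)  # 32 True
```

For the key to be reproducible, the digest would have to be taken over 
the headers as received, before the list adds or rewrites any of its 
own.  And since only headers are hashed, two messages differing only 
in their bodies would share a key, which is exactly what makes this 
useful for spotting duplicates.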

>                                                                  Another
>  approach would be to put even the public archives behind a cgi and have
>  that implement a mapping between message-id derived links and the
>  sequential file names (although that won't fix the regen problem).

	One problem that most OSes have is with too many files in a 
single directory: go much over 1000 files in a directory, and 
accessing anything in that directory starts taking significantly 
longer than it used to.  If you use sequential message numbering, 
it's hard to break the messages up into smaller chunks in a hashed 
directory scheme.  With MD5 hashes, it would be a lot easier to 
convert the hash into a path name, just by inserting a slash every 
few characters of the hash value.
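	A sketch of that conversion, assuming Python 3 and hex-encoded MD5 
digests (`hashed_path()` and its `depth`/`width` parameters are made 
up for illustration).  Since MD5 output is close to uniformly 
distributed, two levels of two hex digits give 16^4 = 65,536 leaf 
directories, so even a million messages would average only about 15 
files per directory.

```python
import hashlib
import os

HEX = set("0123456789abcdef")


def hashed_path(root: str, digest: str, depth: int = 2, width: int = 2) -> str:
    """Hypothetical layout: use leading hex digits of the digest as directories.

    With the defaults, 'd41d8cd9...' is stored as root/d4/1d/d41d8cd9...
    Rejecting non-hex input also rules out "/" and ".." in the result.
    """
    if len(digest) < depth * width or not set(digest) <= HEX:
        raise ValueError("expected a lowercase hex digest")
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(root, *parts, digest)


digest = hashlib.md5(b"").hexdigest()  # d41d8cd98f00b204e9800998ecf8427e
print(hashed_path("archive", digest))
# on POSIX: archive/d4/1d/d41d8cd98f00b204e9800998ecf8427e
```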

-- 
Brad Knowles, <brad.knowles at skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
     -Benjamin Franklin, Historical Review of Pennsylvania.
