[Mailman-Users] [Mailman-cabal] GDPR

Thu May 31 03:31:45 EDT 2018

Grant Taylor via Mailman-Users writes:

 > What is their working definition of "thread"?

I don't know.  I gave what I think is a reasonable definition, and I
would argue that going to parents of that message is not required by
GDPR, even if for some reason you need to remove whole posts.

 > I'm afraid that the infinite wisdom of politicians will say that the 
 > entire paper needs to be shredded.

We know what the politicians said.  It's in the GDPR law.  Forget
politicians' stupidity.  What matters now is (1) what courts will say,
and (2) what courts will refuse to call frivolous (so that the party
with the uglier lawyer wins at great expense to the party with the
beautiful lawyer).

Appeals judges generally are pretty sensible in the U.S. and Japan,
and usually they do understand the issues.  I suppose it's similar in
the EU.

What I'm concerned with is where PII can enter Mailman and be stored
on the host.  Whether the law reaches that or not is not really
important here.  We look at each place, decide how easy it is to (1)
find all instances of a particular identifier, (2) determine whether
and by whom it has been accessed, and (3) redact that identifier.
Then we look at costs and start implementing the cheaper cases.

 > I think it also significantly depends on what needs to be redacted. 
 > Removing "supercalifragilisticexpialidocious" is a LOT different than 
 > removing "Grant Taylor" from the Mailman-Users archive. 

It needs to be personally identifying, and pragmatically (1) above
means either (a) it will be found in certain header fields which we
can remove entirely or redact in full or part, or (b) a full-text
search will find it.  This means that descriptions like "the US
politician known to lie 6 times a day" are out -- there are too many
ways to express that.  If GDPR requires finding and redacting that,
the list will have to fold up shop.  But I don't think it does: I
think here PII refers to numbers, names, and addresses (as we usually
understand those words!) that uniquely identify a person for purposes
such as delivering goods, services, and information, or as part of an
authentication process for accessing services (e.g., financial or
informational).
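
To make (a) and (b) concrete, here is a minimal sketch in Python of
redacting known identifiers from one stored message: first in a few
header fields, then by plain text search of the body.  It is not
Mailman code.  The function name, the PII_HEADERS list, and the
placeholder string are all assumptions made for illustration, and it
only handles simple non-multipart messages.

```python
import re
from email import message_from_string

# Assumption: header fields that normally carry names and addresses.
PII_HEADERS = ("From", "Sender", "Reply-To", "Cc")


def redact_message(raw, identifiers, placeholder="[redacted]"):
    """Return (redacted_text, hit_count) for one message given as a string.

    `identifiers` is a non-empty list of literal strings (names,
    addresses) to remove.  Matching is case-insensitive and literal.
    """
    if not identifiers:
        raise ValueError("need at least one identifier to search for")
    pattern = re.compile(
        "|".join(re.escape(ident) for ident in identifiers), re.IGNORECASE
    )
    msg = message_from_string(raw)
    hits = 0

    # (a) Header fields that are presumed to contain PII.
    for name in PII_HEADERS:
        values = msg.get_all(name)
        if not values:
            continue
        del msg[name]  # removes every occurrence of this field
        for value in values:
            new_value, count = pattern.subn(placeholder, value)
            hits += count
            msg[name] = new_value  # re-added at the end of the headers

    # (b) Full-text search of the body (simple text payloads only).
    body = msg.get_payload()
    if isinstance(body, str):
        new_body, count = pattern.subn(placeholder, body)
        hits += count
        msg.set_payload(new_body)

    return msg.as_string(), hits
```

Called with ["Grant Taylor", "gtaylor@example.org"] (a made-up
address), this replaces both the display name and the address in From
and in the body.  A description like the "politician" one above has no
literal string to search for, so a search of this kind cannot find it.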

 > I wonder if there's any correlation between the IP that authenticated 
 > and the IP that accessed data.

Not in Mailman, although it could be done.  HTTP is a stateless
protocol, so to maintain a session you need to provide a token
(typically a "cookie").  That token can be passed around in the user's
network.  It would be possible to include the IP in the data hashed to
create the auth token, and validate that, but we don't.
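
For illustration only, binding the token to the client IP could look
roughly like the following sketch.  This is not what Mailman does; the
secret, the function names, and the "user|ip" layout of the hashed data
are assumptions.

```python
import hashlib
import hmac

# Assumption: a per-site secret known only to the server.
SECRET = b"example-only-secret"


def make_token(user, client_ip, secret=SECRET):
    """Derive an auth token from the user name and the client's IP."""
    data = f"{user}|{client_ip}".encode("utf-8")
    return hmac.new(secret, data, hashlib.sha256).hexdigest()


def check_token(token, user, client_ip, secret=SECRET):
    """Accept the token only if it was issued for this user and this IP."""
    expected = make_token(user, client_ip, secret)
    return hmac.compare_digest(token, expected)
```

A token issued to 192.0.2.10 would then fail validation when presented
from 198.51.100.7, at the cost of logging out users whose address
changes in mid-session.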

 > 2)  *sigh*  It sounds like GDPR is talking about specific fields that 
 > could contain PII, even if they don't, while ignoring other fields that 
 > erroneously do contain PII.

It's not GDPR.  *I* wrote that.  What I was trying to say is that
there are fields like display name and email that are normally used
for data that is PII, and so would be presumed to contain PII if
populated in a database record.

 > > However, in Mailman 2 the various list passwords are shared, and
 > > would not identify individuals in cases with multiple moderators
 > > or list owners.
 > 
 > IMHO that's an operational mis-step.

It's a FACT, and it's not going to change in Mailman 2.  We need to
work with it, or perhaps European lists simply won't be able to use
Mailman 2 with multiple admins if GDPR requires auth that identifies a
single individual.  (Mailman 3 does allow identifying a single
individual, but I don't think we log auth attempts or successes
yet.)

 > (Part of) GDPR is not about (just) knowing who has (had at the
 > time) legitimate access to data, but additionally making it more
 > difficult for other 3rd parties to gain access to the data in the
 > future.  By the fact that the data is removed from the corpus that
 > the 3rd party is subsequently given access to.

I don't think "make it difficult to access data" is a requirement in
GDPR.  I think making reconstruction of history difficult is the
*intent* of GDPR's "right to be forgotten", but that doesn't mean you
need to conceal data (such as social network "handles") that is
normally used to identify users in operation.

The access logging is about a different aspect of privacy, which is
knowing who had access to that data.

AFAICS, the privacy policy itself is up to the host and/or the
industry and its regulators.  Wikis may have zero privacy in normal
operation, but I suppose you still need to log accesses to people's
profiles.  Banking privacy is presumably specified by banking laws,
not GDPR, but again GDPR mandates logging of accesses.

 > I'm talking about 3rd party spam filtering services that are in the
 > path between, downstream in between Mailman and the recipient's
 > server.  They collect logs / data all the time.  Usually those logs
 > and that data are what help them be better at their job of spam
 > filtering.

The Mailman admins don't have access to that data in this scenario, I
assume.  I don't really think the Mailman host is implicated there,
even if they're the direct client of such a service.  I suspect what
the Mailman host needs to worry about most is interruption of service
if the vendor gets put out of business for GDPR violation.

Steve