[TriZPUG] RDF and Open Data (was Re: TriZPUG Digest, Vol 64, Issue 6)

Tue Aug 20 21:16:16 CEST 2013

On 8/19/2013 6:24 PM, Eric Leary wrote:
> Colin warmed everyone up perfectly last month to a lot of the same
> material that's in Josh's book - so I think we are ready for a
> presentation that gets a little closer to the realities of
> implementation for coders. Recently I've gone down the rabbit hole on
> RDF, RDFa, and JSON-LD in trying to understand their future role or
> rejection. Are they dead, or do they just smell funny?

Open Data has breathed new life into RDF via two developments:

http://en.wikipedia.org/wiki/SPARQL

http://en.wikipedia.org/wiki/GeoSPARQL

My co-worker in the office next door does SPARQL for metadata ontologies.
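
For anyone who hasn't seen one, a SPARQL query is essentially pattern 
matching over subject-predicate-object triples. Here is a toy sketch of 
that idea in plain Python. It is not a real SPARQL engine and not what my 
co-worker actually runs; the ex: names, the TRIPLES data, and the match() 
helper are all made up for illustration:

```python
# Toy illustration only: a hand-rolled triple matcher, not a SPARQL engine.
# Every identifier and data value here is hypothetical.

TRIPLES = [
    ("ex:buoy1", "rdf:type", "ex:Buoy"),
    ("ex:buoy1", "ex:measures", "ex:WaterTemperature"),
    ("ex:buoy2", "rdf:type", "ex:Buoy"),
    ("ex:buoy2", "ex:measures", "ex:Salinity"),
]


def match(pattern, triples):
    """Yield variable bindings for a single (subject, predicate, object) pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind consistently.
                if bindings.get(term, value) != value:
                    break
                bindings[term] = value
            elif term != value:
                break
        else:
            yield bindings


# Roughly what SELECT ?s WHERE { ?s ex:measures ex:Salinity } asks for.
results = list(match(("?s", "ex:measures", "ex:Salinity"), TRIPLES))
print(results)
```

A real query engine joins many such patterns and works against shared 
vocabularies, which is exactly where the agreement problem comes in.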

I have my doubts about whether ontologies are ever going to be useful, 
however. If disciplines can't agree about metadata and vocabularies, who 
is going to arbitrate metadata translation? Does having a Russian to 
English dictionary translate War and Peace into English on its own?

I've never believed in the semantic web.

> Chris and James were able to point out a lot of paradoxes in principle
> and in day-to-day tradecraft that made me realize how naive I am about
> the "power of open data" and "open anything."

Just to be open, things I pointed out to Eric:

1) Data liability is an obstacle to openness. If I provide data, and you 
provide a service on top of that data, and then your service fails 
because the data I provided you were faulty, am I liable to you even if 
I were providing the best available data in good faith? Many open data 
providers will post a policy that tries to wash off any liability. But 
the law may not recognize such policies. If open data is "use at your 
own risk," can open data ever be useful for public safety? For 
investment decisions? Aren't those the kind of things we need open data 
for, the things that matter?

2) Personal privacy is an obstacle to openness and openness is an 
obstacle to personal privacy. Data about people generally needs to be 
anonymized in order to protect individual privacy. Yet calls for openness 
have gone so far that the personal names, home addresses, home phone 
numbers, and *salaries* of state government employees are available 
through open data services (luckily NC teachers and university employees 
have somehow been overlooked in this particular boondoggle). I can 
freely look up how much you paid for your house and how much you had to 
borrow for a mortgage. I can look up your political party affiliation, 
your age, race, gender, home address, and which elections you voted in 
over the last several years.

3) Governments hide public data through third party vendor access. Some 
government agencies may hold public data but have no legal or budgetary 
mandate to help you find or access it. There's an opportunity there for 
agencies to make money giving private companies the raw public data, and 
then the private companies will charge you for organized search and 
access to public data. Arrest records are public data but you'll need to 
fork over some dollars to private companies to look at this data, just 
enough to discourage it in most cases. Sometimes companies can get 
exclusive rights to distribute public data.

4) Governments will only go so far to allow access to data. The more 
valuable or politically sensitive data is, the more likely it is to be 
"classified" even if paid for by tax dollars. Governments also respond 
to business interests to suppress access to or defund generation of 
data, particularly scientific data. It's easy to access data from 
successful clinical trials. But it's not easy to access data from 
unsuccessful clinical trials. Even when the unsuccessful trials 
outnumber the successful ones by orders of magnitude for a particular drug.

5) Governments can and do own intellectual property which they can and 
do decide to keep proprietary rather than openly license, going so far 
as to generate revenue streams with proprietary licensing. There have 
been bills in front of Congress, so far fortunately unsuccessful, to 
restrict access to nationally financed weather data to only certain 
companies such that you would have to pay those companies to get weather 
reports. Most state university systems operate intellectual property 
offices to capture patents and copyrights for royalties.

6) Providing access to public data is an expensive public service. 
Archiving data long term is super expensive. Cataloging and classifying 
data is labor intensive. Who pays for it? If it comes from access fees, 
does that provide unfair advantages to those who can afford access over 
those who can't but who did help pay for creating the data? When 
financial times are tight and just keeping the public safety net patched 
is a challenge compared to other interests, is open data all that 
important? What if you visited the public library, and all the books 
were piled on a table without Dewey decimal numbers and there were no 
card catalog? What if there were no public library? Data requires the 
online equivalent of libraries and librarians. Is public data an 
essential governmental service?

We have this recent and rather toothless presidential executive order:

http://www.whitehouse.gov/the-press-office/2013/05/09/executive-order-making-open-and-machine-readable-new-default-government-

I'm witnessing the speed of at least one government agency's response to 
this order, however, and know it will be years in the making if at all. 
There are so many silos within that have to agree to all manner of 
standards to make this work. There are so many offerings of standards 
and methods to implement standards from each silo. Every department 
already has a half-assed skunk-works of an open data project in 
operation. And the management style for most agencies making these 
decisions on how to "come together" is via consensus (so no one gets 
blamed for bad decisions). Having to make the decisions and then 
implement them is also generally not within the agencies' missions, 
so not only are the decisions by consensus, but the implementations are 
pretty much volunteer work. The public at large would be amazed to know 
just how much government function is accomplished via volunteer work by 
mid- to low-level government employees in addition to their regular jobs.

But it's a start. The real open data movement is occurring at individual 
municipal levels, such as what the City of Raleigh is doing, and also 
occurring by private uplift, such as Code for America in Durham. There's 
also an overlap in what people consider open data and crowdsourced 
data. Open data is more about unlocking access to already existing 
government data.

-- 
Sincerely,

Chris Calloway http://nccoos.org/Members/cbc
office: 3313 Venable Hall   phone: (919) 599-3530
mail: Campus Box #3300, UNC-CH, Chapel Hill, NC 27599