At the NDF 2015 conference I gave a talk on measuring the success of your open data work. Here is the text and the YouTube video for the talk.
I said in my pitch for this talk that no GLAM organisation in New Zealand provides truly open data, and this makes me a little sad. Now, I’m not going to go into why you should do open data. There is a great presentation called The Future is Open by Michael Edson, which I recommend you take a look at if you need to be convinced.
The good news is that there are very few if any organizations worldwide who are doing it right. So, we’re not alone.
But what is right?
Well, firstly we should define data. And sometimes it is easier to define what data isn’t. Data is not metadata, data is not numbers, data is not charts, data is not image files, data is not essays.
Data is all of the things. Data is everything that your organization outputs. To steal a proverb, one man’s essay is another man’s corpus of text mining training data. And when you think about it an image is simply data on a point in time.
So if we think about data like this, how do we make it open? Contrary to popular belief, open data is not a CC license. Now, NZGOAL and other tools are great initiatives, and we hear a lot about them in the GLAM sector.
But we actually have the rather dryly titled “New Zealand Data and Information Management Principles” which came out in 2011. It’s a great framework for thinking about what open data actually means in practice.
3. 7 principles
There are 7 principles:
- Open
- Protected
- Readily Available
- Trusted and Authoritative
- Well Managed
- Reasonably Priced
- Reusable
And when we dive into these one by one we can easily measure ourselves against the principles.
Data should be open. You need a really, really, really good reason not to release it. I’m not going to go into the OIA in 7 minutes but national security probably isn’t the reason why you are choosing to not open something, although the archivist for the GCSB may beg to differ. So what is your moral argument for not opening something?
Of course, some items are going to be personal or confidential — so how do we deal with those? At what point does a soldier’s medical record become acceptable? For our sector I’d go further and bring in issues of cultural sensitivity. The National Library has really good policies around this and maybe this is something that we can work on as a sector to come up with a starting point for all organizations to follow.
6. Readily Available:
You think about making the information accessible from day 1. You don’t give Google something and not everybody else. And you need to make sure it is well documented and easy to find. Have a page or catalogue outlining what data you have, what your policies are and list this data in data.govt.nz. Wouldn’t it be great if you could go to any natlib.govt.nz/data or tepapa.govt.nz/data or aucklandmuseum.govt.nz/data and know that you will find their open data policies and what data they have available?
Earlier today there were questions about the licensing of the Cenotaph database. The licensing is not clear on the website, even though the database contains a lot of reusable content and is accessible via the Auckland Museum API.
7. Trusted and Authoritative, Well Managed:
We should have this nailed, right? This is what we do, we’re memory institutions! On the flip side, don’t be afraid to open something that isn’t perfect. People will forgive you if you are upfront about your imperfections.
8. Reasonably Priced:
A pretty binary decision here. The cost of dissemination is trending to zero, so there is no reason to charge if you are a reasonably sized organization. In fact, charging can cost you money: we have yet to see an organization that makes a profit from licensing images once people’s time has been taken into account. Now, I get the issues that small museums face with funding, and selling images can help. A few hundred dollars a year can make a real difference when volunteers are fulfilling the requests.
This is the nuts and bolts so let me dig deeper here.
10. Original versions:
I don’t care how good your lossy jpeg is, it’s the source or it isn’t original. Now, feel free to provide reusable derivatives as a default, but providing only derivatives is not providing the original. You may protect the originals behind some form of key to limit the effects of network traffic, but they should still be readily available if requested.
It needs to have a proper license. In New Zealand an NZGOAL license is understood and well documented. And let me reiterate, non-commercial licenses are not truly open.
12. Machine-readable format:
Understand how coders think. If we can write a script to do something then we will. Make sure that your data can be downloaded and processed with a script. This can be as simple as dumping a CSV file on your web server or as complex as an API. A data dump of some key fields and the URL to the original images is a perfect starting point for most collections data sets.
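To make the "dump a CSV file" idea concrete, here is a minimal sketch in Python. The records, field names and example.org links are all made up for illustration; a real collection would export whatever key fields it actually holds. The point is only that a plain CSV with a link to the original image can be written and read back by a short script.

```python
import csv
import io

# Hypothetical collection records. The field names and URLs are
# illustrative assumptions, not any institution's real schema.
RECORDS = [
    {"id": "1", "title": "Portrait of a soldier", "date": "1916",
     "licence": "CC BY", "image_url": "https://example.org/originals/1.tif"},
    {"id": "2", "title": "Harbour, looking east", "date": "1923",
     "licence": "CC BY", "image_url": "https://example.org/originals/2.tif"},
]

FIELDS = ["id", "title", "date", "licence", "image_url"]


def dump_csv(records, fields=FIELDS):
    """Return the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def load_csv(text):
    """Read CSV text back into a list of dicts, as a re-user's script would."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    print(dump_csv(RECORDS))
```

Saving the output of `dump_csv` to a file on your web server is enough for someone else to pick it up with `load_csv` or any spreadsheet tool.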
13. With metadata:
Well documented data is critical. I can’t tell you the number of times I’ve run into an API only to find that I can’t get it to work. Just last week with the Cooper Hewitt API it took me a few tries to work out if the ‘has_images’ parameter needed a ‘yes’, ‘true’ or ‘1’ as the value. There are a bunch of tools out there now which make it really easy to document your APIs and datasets. Use them.
14. In aggregate or modified forms if they cannot be released in their original state:
Because sometimes we can’t release the originals. If you have a dataset with lots of personal information it isn’t something that you want to, or should, release. But think about what you can release. Is there aggregated data that you can release? Can you strip personal information out and still release it?
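As a rough sketch of those two options, the Python below drops personal fields from each record and, separately, produces aggregate counts. The records and the list of personal fields are hypothetical. Deciding which fields are actually personal or sensitive is a policy question, and simply dropping columns may not be enough to anonymise a real dataset.

```python
from collections import Counter

# Hypothetical set of fields treated as personal. This is an assumption
# for illustration; your own policies decide what belongs here.
PERSONAL_FIELDS = {"name", "address", "medical_notes"}

# Made-up records standing in for a dataset with personal information.
RECORDS = [
    {"name": "A. Example", "address": "1 Example St",
     "medical_notes": "example text", "unit": "Infantry", "enlisted": "1915"},
    {"name": "B. Example", "address": "2 Example St",
     "medical_notes": "example text", "unit": "Artillery", "enlisted": "1915"},
    {"name": "C. Example", "address": "3 Example St",
     "medical_notes": "example text", "unit": "Infantry", "enlisted": "1916"},
]


def strip_personal(record, personal_fields=PERSONAL_FIELDS):
    """Return a copy of the record without the personal fields."""
    return {key: value for key, value in record.items()
            if key not in personal_fields}


def count_by(records, field):
    """Aggregate: how many records share each value of the given field."""
    return dict(Counter(record[field] for record in records))


if __name__ == "__main__":
    print([strip_personal(record) for record in RECORDS])
    print(count_by(RECORDS, "enlisted"))
```

Either output, the stripped records or the counts, could be published where the full records cannot.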
15. Non-proprietary formats
Data and information released in proprietary formats are also released in open, non-proprietary formats:
Sometimes you do want to release something in a proprietary format to make it really easy to integrate with some industry standard software. That’s OK as long as you also release it in an open format. I’ll go further and say that you should release the data in simple formats even if you are already releasing it in an open but hard-to-use format.
16. Digital rights technologies are not imposed on materials made available for re-use:
No watermarks, no DRM.
So there you have it. 7 principles in 7 minutes that can guide you in opening data and help you measure where you are.
So how do you measure up?