I’ve been following, of course, the revelations and debate about US and UK security services’ interventions (is that an appropriate word?) in the world of electronically-mediated communications. It’s no great surprise to many of us that the NSA and GCHQ have their fingers deeply into these pies.
When enterprise IT people discuss cloud solutions for data storage, email and so on, the requirements of EU Data Protection legislation in particular come up often. If you outsource to a large cloud provider then they may assure you that your European data will be stored in the EU: in a data centre in Ireland, say. Ask them where the backup is kept, though, and the answer is sometimes Texas. Even if it isn’t, a provider that is a large US corporation is susceptible to US government pressure; and if it’s based outside the US but does a lot of business with the US then almost the same applies. We’ve seen this in the revelations about interference with encryption standards; and it’s generally believed that this isn’t the only commercial sector where it happens.
What’s outside expectations is the scale and depth of the interference.
So today’s Guardian carries reports of editor Alan Rusbridger’s appearance at the House of Commons Home Affairs committee. In parallel, yesterday a supplement was published covering the issues, with some interesting infographics. For example: the number of deaths in the UK from terrorism-related actions is marginally higher than that from bee and wasp stings. More critically, there’s an interview with Tim Berners-Lee and, following this up, outlines showing how the so-called counter-terrorism activity blows a hole in internet security generally and cripples, therefore, any internet business relying on open standards for its confidentiality. Apparently Indian embassies have been instructed to return to using manual typewriters to prepare confidential documents.
It’s worth publicising, too, that the United Nations’ chief counter-terrorism official is launching an investigation: not into the Guardian’s publishing, but into the governments’ activities. Ben Emmerson, interestingly, is a British Queen’s Counsel, a very senior lawyer. He’s quoted in the same supplement to the effect that publishing reports of this activity is at the forefront of public interest. This is not some interest-group’s independent review. This is above government, an international review at the very highest level. In the US, very senior people accept that the issues need to be debated. Only in the UK, apparently, is there an attempt to shoot the messenger, something Alan Rusbridger labelled yesterday as a classic diversionary tactic.
That’s background. But there’s one thing in all this reporting that is going unchallenged and shouldn’t be. It’s the use of the word “metadata”.
Any IT graduate or professional with any training in data management learns what metadata is. It’s “data about data”. That is, it describes, for example, the type of a data attribute, its name in the database, its range of permissible values, and what it means (in so far as this can be codified). Metadata says that the attributes recorded for a phone call, for example, are a date stored as an eight-digit yyyymmdd literal, a time stored in 24-hour format to the nearest 0.1 second, a location stored as a Greenwich latitude and longitude pair, and so on.
Information about when and where I use my mobile phone, or the subject line of an email, is not metadata. It is data. A particular instance – the phone call I made yesterday – is a data item and the time, date, location and content of that call are attributes of the data item. I repeat: this is data. It is not metadata.
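For the programmers among us, the distinction can be made concrete. What follows is only a minimal sketch in Python, not anyone's real schema: the names AttributeSpec, CALL_METADATA, example_call and conforms are all my own inventions for illustration, and the call itself is made up. The first structure is metadata, because it describes what a call record looks like. The second is data, because it describes one particular call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeSpec:
    """Metadata: a description of one attribute of a call record."""
    name: str      # the attribute's name in the (hypothetical) database
    py_type: type  # the type its values must have
    fmt: str       # how values are laid out, in words


# Metadata. It says how any call is recorded and nothing about any one call.
CALL_METADATA = (
    AttributeSpec("date", str, "eight-digit yyyymmdd literal"),
    AttributeSpec("time", str, "24-hour clock, to the nearest 0.1 second"),
    AttributeSpec("latitude", float, "decimal degrees"),
    AttributeSpec("longitude", float, "decimal degrees, relative to Greenwich"),
)

# Data. One invented call: when it was made and where the caller was.
example_call = {
    "date": "20131203",
    "time": "14:07:32.5",
    "latitude": 51.5,
    "longitude": -0.1,
}


def conforms(record, metadata):
    """Return True if the record has exactly the attributes the metadata
    names, each holding a value of the declared type."""
    if set(record) != {spec.name for spec in metadata}:
        return False
    return all(isinstance(record[spec.name], spec.py_type) for spec in metadata)


if __name__ == "__main__":
    # The metadata can be published without revealing anything about anyone.
    for spec in CALL_METADATA:
        print(f"{spec.name}: {spec.py_type.__name__}, {spec.fmt}")
    # The record is what reveals where someone was, and when.
    print(example_call, conforms(example_call, CALL_METADATA))
```

Nothing in CALL_METADATA identifies a person or a place. Everything in example_call does.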
Even the Guardian itself is misusing the term. A Guardian article asks “Is metadata just billing data …”. This is the wrong question. Is billing data metadata? No: it is computed from attributes of the call (such as the start and end times, location and so on) and a tariff table. These are data, not metadata. Moreover they are personally identifiable information and should be afforded the full protection of the Data Protection Act.
This is far from a technical detail. By mis-labelling these data attributes as metadata, security agencies are making them seem less harmful. Not least, because metadata is not a term in normal everyday use, so lay people will simply adopt the definition they are offered. IT-literate people need to shout.
• Guardian editor Alan Rusbridger appears before MPs: Guardian, online, offering live coverage from 3 Dec 2013
• Edward Snowden revelations prompt UN investigation into surveillance: Guardian, online, 2 Dec 2013 (print, 3 Dec)
• Metadata: is it simply ‘billing data’, or something more personal? Guardian online, 2 Dec 2013
• Wikipedia: Metadata (this is a good definition, as reviewed today, with references to ISO standards and other sources)