History’s black hole: the Internet age

Dame Lynne Brindley, Chief Executive of the British Library (BL), spoke last week to highlight the ephemeral nature of digital media and, in consequence, the potential for large areas of knowledge to become invisible to analysts and historians in the future. A podcast of the speech is promised.

Historical research, in particular, relies on sources: public and private correspondence, official records, photographs, architectural plans and much more. The railway preservation movement relies heavily on archived drawings. We rediscover unknown music by respected composers. We pore over Cabinet and government papers (notice: papers) when they are released. And so on. The BL already has an Endangered Archives programme focussed on “pre-modern” material.

But, as Dame Lynne points out, increasingly our records are digital. And things move on, in both file formats and media. The National Archives uses archaic software to read old documents (though actually, Office is pretty good at importing its precursors). I still have a small box of five-and-a-quarter-inch floppy discs, but no way to read them. Even the three-and-a-half-inch format is a problem. I have a box of research data somewhere on 8-hole paper tape … some hope!

Professor Fred Hoyle envisaged the problem decades ago in his novel The Black Cloud (1957): at the end his character wonders what to do with data in an outdated format. And it is a problem for anyone maintaining long-term archives. Could you read, now, an old magnetic tape written at 200 bpi? It’s for this reason, among others, that electronic submissions to the US Food and Drug Administration (FDA) have to include, with the data files, the computer system on which they are to be read. I believe that’s still the case!

There are a couple of examples in the news reports. President Obama’s White House website now contains no material whatsoever from the Bush era, so far as can be determined, even though the Bush version of whitehouse.gov was still online half an hour before the new man was sworn in. Even Google’s cache holds very little, though copies of some material were lifted onto blogs and other sites.

And, going back a little further: in 1986 the BBC ran the Domesday Project, which recorded a wide range of facts about the state of the nation. But it was held on two 12-inch videodiscs, and was rescued for posterity only thanks to a team working with the last surviving player. Meanwhile William the Conqueror’s original Domesday Book, ink on parchment and nearly a thousand years old, is still readable thanks to its ancient and very conventional technology.

Or my own personal example. My Pocket Website ran a successful Year 2000 information section for about three years. When I published a note saying I proposed to take it down, I got emails asking for it to be retained because, even quite a short time after the rollover, much of the material had disappeared. (So it’s still there.)

How about a standard electronic format? A version of Adobe’s PDF has been designed for this purpose. PDF/A is an ISO standard (ISO 19005-1:2005); it is based on an older version of PDF (1.4) and is being updated. It adds restrictions that make files self-contained, such as requiring all fonts to be embedded, and others that ensure lowest-common-denominator interoperability, such as prohibiting multimedia and executable content. Its use would, of course, require existing archives, whether document-based (Word, for the most part) or image-based (TIFF), to be converted. And it says nothing about media; but, as repository managers know, long-term files have to be copied periodically onto new storage.
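
To give a feel for what those restrictions look like inside a file, here is a rough sketch in Python (my choice; nothing in PDF/A prescribes a language). It is emphatically not a PDF/A validator. It only reads the version number in the file header and scans the raw bytes for a few PDF name objects tied to features PDF/A-1 prohibits: JavaScript, launch actions, and movie and sound content. That list of names is an illustrative subset I have assumed, not the full set from ISO 19005-1, and the function names screen_pdf_bytes and screen_pdf are hypothetical. Because PDF 1.5 and later can compress dictionaries inside object streams, a clean result proves nothing; real conformance checking needs a proper validator.

```python
import re
from pathlib import Path
from typing import Optional, Set, Tuple

# Assumed, illustrative subset of PDF names for features PDF/A-1 disallows:
# JavaScript actions (/S /JavaScript, with the script under /JS),
# launch actions, and Movie/Sound multimedia content.
FORBIDDEN_NAMES = (b"JavaScript", b"JS", b"Launch", b"Movie", b"Sound")

# A PDF name ends at whitespace or a delimiter, so require one of those
# (or end of data) after the match, so that e.g. /JSON is not flagged as /JS.
_NAME_RE = re.compile(
    rb"/(" + b"|".join(FORBIDDEN_NAMES) + rb")(?=[\s()<>\[\]{}/%]|$)"
)

# PDF/A expects the "%PDF-n.n" header at byte offset 0, so match() is used.
_HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def screen_pdf_bytes(data: bytes) -> Tuple[Optional[Tuple[int, int]], Set[str]]:
    """Hypothetical heuristic: return (declared version, flagged names)."""
    header = _HEADER_RE.match(data)
    version = (int(header.group(1)), int(header.group(2))) if header else None
    flagged = {m.group(1).decode("ascii") for m in _NAME_RE.finditer(data)}
    return version, flagged


def screen_pdf(path: str) -> Tuple[Optional[Tuple[int, int]], Set[str]]:
    """Read a file from disk and screen its raw bytes."""
    return screen_pdf_bytes(Path(path).read_bytes())
```

Fed a fragment containing a JavaScript action, such as b"%PDF-1.4\n1 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj", screen_pdf_bytes reports version (1, 4) and flags both JavaScript and JS. A file that raises no flags has merely passed a crude first filter.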

And maybe another element is the increasing move of “stuff” onto the Web. Once upon a time, the family’s snapshots were in photo albums on the bookshelf or fading at the back of the cupboard. Now they’re digital, on the home PC. If they’re lost through carelessness, burglary, or deliberate decision, they’re really gone. But increasingly they’re also on Flickr or Facebook or YouTube, and likely to stay there. One up to the Web!

There’s something else important here, which we might capture as “What’s a standard?” MS Word version 2 was a “standard” when it was current. So was the “big” floppy. Neither is now, and Microsoft has just radically changed the Word document format again. Twenty years is a long time in IT development, but not long in the lifetime of information. We need to be more aware of how quickly today’s standards become obsolete.

• Dame Lynne Brindley challenges Government on Digital Britain, British Library, 21 Jan 2009
• We’re in danger of losing our memories, Lynne Brindley, Observer, 25 Jan 2009 (there’s also a news report in the same paper)
• Domesday Redux: The rescue of the BBC Domesday Project videodiscs, Ariadne (UK online archives and libraries journal), Jul 2003
• Pocket Year 2000, part of InformationSpan’s Pocket Website
• PDF/A-1, PDF for Long-term Preservation, Use of PDF 1.4, entry at Digital Preservation, an initiative of the Library of Congress
• The Endangered Archives Programme at the BL