An idea a day - indexed digital magazine library
As a participant in the Canadian small press / literary magazine community, I subscribe to a lot of magazines. But I have a big problem. My magazines pile up everywhere. I ran out of shelf space about a year ago, and now I have magazines on the floor, behind my monitor, on the windowsill, and peeking out from between pots and pans. Since I mostly subscribe to quarterlies, I also get a glut of them all at the same time, making it really difficult to pick out the one I want to read.
I want to find a way to create an electronic index of my magazines. The ideal situation is to scan every magazine that I own into a privately held collection, and to convert the text into a machine-readable format for cross-referencing. I imagine public libraries have similar systems to deal with their backlog, but they usually use large, expensive archival services like LexisNexis, which are out of reach of the ordinary individual. I admit, the idea is not new: Google Books does the same thing with books, and they are trying to get a patent on scanning and indexing newsprint and magazines. I want something that is much more personal - something under my direct control. E-readers and magazine websites are not ideal, because they cut out the design element and instead treat content as discrete blocks: a chunk of text, a few images, a video or audio clip here and there - definitely not reflective of the designed magazine package that I love.
I tried to use Abbyy, an OCR software suite, to scan a copy of Ricepaper Magazine, but that didn't do much good because of the amount of design integrated into the magazine. I need something a lot more robust - something that can handle small and large fonts mixed in with foreign languages, with design elements that are not only decorative pieces but also form the bulk of meaning. Google encountered the same problems - their software had difficulties in recognizing multi-column text, large banner style headers, etc.
This is a great opportunity for the publishing industry - no one really understands how to parse design-heavy packages of content. Publishers often have access to the original digital data files for magazines and books, but the value is locked inside those files and never used.
A much wider challenge, however, is to create a collective search engine that indexes all Canadian literary magazines with the ability to recommend content depending on search terms - then enticing the searcher to purchase the content.
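To make the idea of a shared search engine a little more concrete, here is a minimal sketch in Python of ranking articles against search terms. It is only an illustration under simple assumptions: the article text is already available as plain strings, relevance is nothing more than how often the query words appear, and the names `tokenize`, `build_index`, and `search` are made up for this example. A real service would need far better ranking, and the purchase step is not shown at all.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(documents):
    """Map each document id to a Counter of the words it contains.

    `documents` is a dict of {doc_id: plain text}; how the text was
    obtained (OCR, original layout files) is outside this sketch.
    """
    return {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}


def search(index, query, limit=5):
    """Return up to `limit` document ids, best match first.

    The score is the total number of times any query word occurs in
    the document; ties are broken alphabetically by document id.
    """
    terms = tokenize(query)
    scores = {}
    for doc_id, counts in index.items():
        score = sum(counts[term] for term in terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:limit]


# Placeholder documents, not real magazine content.
sample = {
    "essay-1": "an essay on rivers",
    "poem-1": "poetry about rivers and more rivers",
    "story-1": "short fiction set in a city",
}
print(search(build_index(sample), "rivers poetry"))
```

Counting raw word matches is about the simplest ranking there is; it is used here only because it fits in a few lines, not because it is how a recommendation engine should work.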
This sounds like a "me-too" idea - a riff off of the recommendation engine née search engine that Google brought into the world, but with a major distinction: It is our niche. We need to do this ourselves because we understand the market and the product intimately. We need to, at all costs, maintain that intimate bond with our customers, fans, readers, authors, and community. Without those bonds, our industry wouldn't exist.
If we don't run our own services - and confront our own fears about copyright, consumer rights, authorship, and the enjoyment of the media that we produced, that we own as the collective Canadian small press - someone else will do it for us. Google is eyeing the magazine business, licking its robotic lips as it dreams about how much ad revenue it can add to its bottom line by digitizing magazines the way it digitized books.
When that happens, we won't get a choice. So my rallying call is for every independent publisher in Canada to take a second look at their archives - to scan them, digitize them, find the original electronic copies, put them in a database, and index them. Even something as simple as going through the back issues and out-of-print items and adding keywords, tags, and metadata is a start. Your own employees, authors, editors and volunteers can use this very valuable resource to make your business and your art a stronger presence in the digital world.
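For publishers wondering what "put them in a database" and "adding keywords, tags, and metadata" might look like in practice, here is a small sketch using Python's built-in `sqlite3` module. The table layout, the column names, and the helper functions `build_catalogue`, `add_article`, and `find_by_tag` are all assumptions made up for this example, and the magazine and article names are placeholders. Treat it as a starting point, not a finished archive system.

```python
import sqlite3


def build_catalogue(conn):
    """Create two tables: one row per article, one row per (article, tag)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS articles (
            id INTEGER PRIMARY KEY,
            magazine TEXT NOT NULL,
            issue TEXT NOT NULL,
            title TEXT NOT NULL,
            author TEXT
        );
        CREATE TABLE IF NOT EXISTS tags (
            article_id INTEGER NOT NULL REFERENCES articles(id),
            tag TEXT NOT NULL,
            UNIQUE (article_id, tag)
        );
    """)


def add_article(conn, magazine, issue, title, author, tags):
    """Insert one article with its tags and return the new article id."""
    cur = conn.execute(
        "INSERT INTO articles (magazine, issue, title, author) VALUES (?, ?, ?, ?)",
        (magazine, issue, title, author),
    )
    article_id = cur.lastrowid
    # Tags are trimmed and lowercased so "Poetry" and "poetry" are the same tag;
    # OR IGNORE skips duplicates that violate the UNIQUE constraint.
    conn.executemany(
        "INSERT OR IGNORE INTO tags (article_id, tag) VALUES (?, ?)",
        [(article_id, tag.strip().lower()) for tag in tags],
    )
    return article_id


def find_by_tag(conn, tag):
    """Return (magazine, issue, title) for every article carrying the tag."""
    rows = conn.execute(
        """SELECT a.magazine, a.issue, a.title
           FROM articles a JOIN tags t ON t.article_id = a.id
           WHERE t.tag = ?
           ORDER BY a.magazine, a.issue, a.title""",
        (tag.strip().lower(),),
    )
    return rows.fetchall()


# Placeholder data; an in-memory database keeps the demo self-contained.
conn = sqlite3.connect(":memory:")
build_catalogue(conn)
add_article(conn, "Example Quarterly", "Issue 1", "A Placeholder Essay",
            "A. Author", ["essay", "translation"])
print(find_by_tag(conn, "translation"))
```

Swapping `":memory:"` for a file name would keep the catalogue on disk between sessions. A volunteer working through back issues only needs to call something like `add_article` once per piece.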
The result of this kind of effort can be a private repository or a public free-for-all - but this is an issue that we all need to take a moment in our busy day to address.
I will be building a prototype system using the magazines that I have in my possession. The deadline is BookNet Canada Technology Forum 2010 on March 25th.
If you, as a publisher, are curious about these topics, perhaps I can help by showing you how to do it yourself. I'll even work on this with you for free - if you toil under the tight budget constraints of honorariums and grants. I don't want to ignore copyrights and authorship. But I do want publishers to start taking care of their own copyrights and do their own indexing, before someone with a much heavier commercial interest takes a greedy look at our content.
David Winer posted a very similar entry in his blog titled Big change in the tech world (Scripting News). It's well worth the read - now it's up to us to do something about it. Quoted here is a truly honest opinion of the tech / media landscape:
If you're in the media industry, stop partnering with the tech industry, and hire away some of their best people and give them power to run your business. This is how your boat will stay afloat. Pretending these companies are your friends is ridiculous. They don't care about you. Look at how well they're doing monetizing your content. This is probably what you need to learn to do, and there's no time to learn. Hire their people away and get ready to compete.