User:Martind/Document Log Discovery Platform
From London Hackspace Wiki
Problem Statement
We're seeing an increase in the publication of vast corpora of data logs, often in the form of message archives and usually in a structured message format. They are all quite overwhelming: how can anyone make sense of such a vast amount of text, or identify the sections that are relevant?
- Can we allow a large number of interested parties (anyone, really) to annotate these documents?
- What kinds of annotations do we want to make? (Information structure)
- How can we make that easy? (Tools)
- ...
Exemplary Publications
- WikiLeaks datalog dumps
  - Iraq War Logs
  - Embassy Cables
- http://spacelog.org/
- etc
Observations
- What constitutes an "interesting" section of a document is a matter of perspective.
- Thus, such annotations become more useful if they're linked to a context (e.g. "this cable relates to news story X")
- Many parties will build (or have already built) browsers for such publications, each with its own way of navigating the content.
- We don't need to duplicate those efforts, but we should integrate with them.
- ...
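One possible shape for such context-linked annotations, as a minimal Python sketch. The `Annotation` record, its field names, and the example identifiers (`cable-42`, `para-3`) are hypothetical; the page does not define a schema. The point is only that each annotation carries the perspective it was made from, so annotations can be filtered by context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record: one party's remark on one document section."""
    document_id: str  # stable identifier of the annotated document (assumed)
    section: str      # which part of the document, e.g. a paragraph anchor (assumed)
    author: str       # who made the annotation
    context: str      # what the annotation relates to, e.g. a news story
    note: str = ""    # optional free-text comment


def annotations_for_context(annotations, context):
    """Return only the annotations made from one perspective, in input order."""
    return [a for a in annotations if a.context == context]


a1 = Annotation("cable-42", "para-3", "alice", "news story X", "relevant to the story")
a2 = Annotation("cable-42", "para-7", "bob", "news story Y")
print(annotations_for_context([a1, a2], "news story X"))
```

Because the same section can be annotated under many contexts, the context is a field of the annotation rather than of the document.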
Requirements
- Need a shared addressing scheme that works across document browsers
- Based on permalinks
- Must allow translating between the addressing approaches of different browsers
- ...
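A minimal sketch of the translation requirement, assuming each browser's permalinks encode a document identifier plus an optional section anchor. The two URL patterns (`browser-a.example`, `browser-b.example`) and all names below are hypothetical placeholders, not the formats of any real site. Each browser gets a parse/format pair; a permalink is parsed into a browser-neutral `Address` and then formatted for the target browser.

```python
import re
from typing import Callable, NamedTuple, Optional


class Address(NamedTuple):
    """Browser-neutral address of a document section (hypothetical scheme)."""
    document_id: str
    section: Optional[str] = None


class Browser(NamedTuple):
    """How one (hypothetical) document browser encodes addresses in permalinks."""
    name: str
    pattern: re.Pattern                 # named groups: 'doc' and optional 'sec'
    format: Callable[[Address], str]    # Address -> permalink for this browser


def _format_a(addr: Address) -> str:
    url = f"https://browser-a.example/doc/{addr.document_id}"
    return url + (f"#{addr.section}" if addr.section else "")


def _format_b(addr: Address) -> str:
    url = f"https://browser-b.example/view?id={addr.document_id}"
    return url + (f"&part={addr.section}" if addr.section else "")


BROWSERS = [
    Browser("a", re.compile(r"https://browser-a\.example/doc/(?P<doc>[^#/]+)(?:#(?P<sec>.+))?$"), _format_a),
    Browser("b", re.compile(r"https://browser-b\.example/view\?id=(?P<doc>[^&]+)(?:&part=(?P<sec>.+))?$"), _format_b),
]


def parse(permalink: str) -> Address:
    """Map a permalink from any known browser to a shared Address."""
    for browser in BROWSERS:
        m = browser.pattern.match(permalink)
        if m:
            return Address(m.group("doc"), m.group("sec"))
    raise ValueError(f"unrecognised permalink: {permalink}")


def translate(permalink: str, target: str) -> str:
    """Rewrite a permalink from one browser as the equivalent link in another."""
    addr = parse(permalink)
    for browser in BROWSERS:
        if browser.name == target:
            return browser.format(addr)
    raise KeyError(f"unknown browser: {target}")


print(translate("https://browser-a.example/doc/cable-42#para-3", "b"))
```

Going through a browser-neutral address means each new browser needs only one parse/format pair, rather than a separate converter for every other browser.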