User:Martind/Document Log Discovery Platform
From London Hackspace Wiki
Problem Statement
We're seeing an increase in the publication of vast corpora of data logs, often message archives in a structured message format. Their sheer size is overwhelming: how can a reader make sense of so much text, and how can they identify the sections that are relevant?
- Can we allow a large number of interested parties (anyone, really) to annotate these documents?
- What kinds of annotations do we want to make? (Information structure)
- How can we make that easy? (Tools)
- Can we identify good conventions and techniques for the above that are more generally applicable?
- Finally, can we think of these functions as a layer on top of mere archives, and construct them as a physically separate service?
Exemplary Publications
- WikiLeaks datalog dumps
  - Iraq War Logs
  - Embassy Cables
- http://spacelog.org/
- http://news.ycombinator.com/item?id=1958292 "I would love to get an in-depth technical explanation of the requests and procedures -- how all this stuff works and insights into the troubleshooting process."
- etc
Observations
- What constitutes an "interesting" section of a document is a matter of perspective.
- Thus, such annotations become more useful if they're linked to a context (e.g. "this cable relates to news story X")
- Many parties already build browsers for such publications, each with its own way of navigating the content.
- We don't need to duplicate those efforts, but we should integrate with them.
- ...
Requirements
- Need a shared addressing scheme that works across document browsers
  - Based on permalinks
  - We can translate between the addressing approaches of different browsers
- Need ways to link/group individual messages
- Need ways to identify level of "interestingness" (e.g. voting, cf. Q&A sites)
- Need ways to annotate content (with text, links)
- Contribution should be possible anonymously
- ...
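The requirements above can be sketched as a small data model. This is only an illustration under stated assumptions, not a proposed design: the two browser sites, their URL layouts, and every name in the code (Address, Annotation, BROWSERS, to_address, to_permalink) are hypothetical. It shows one way a browser-independent address could be recovered from a permalink and re-rendered for another browser, and how an annotation could group several messages, carry text, links and a context, stay anonymous, and collect votes.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Address:
    """Browser-independent address of one message (hypothetical scheme)."""
    corpus: str
    message_id: str


# Hypothetical document browsers: a permalink template and a matching regex.
# Neither the hosts nor the path layouts come from a real site.
BROWSERS = {
    "browser-a": (
        "https://browser-a.example/{corpus}/{message_id}",
        re.compile(r"^https://browser-a\.example/"
                   r"(?P<corpus>[^/?#]+)/(?P<message_id>[^/?#]+)$"),
    ),
    "browser-b": (
        "https://browser-b.example/view?corpus={corpus}&id={message_id}",
        re.compile(r"^https://browser-b\.example/view\?"
                   r"corpus=(?P<corpus>[^&#]+)&id=(?P<message_id>[^&#]+)$"),
    ),
}


def to_address(permalink: str) -> Optional[Address]:
    """Map a permalink from any known browser to the shared address."""
    for _template, pattern in BROWSERS.values():
        match = pattern.match(permalink)
        if match:
            return Address(match.group("corpus"), match.group("message_id"))
    return None  # permalink does not belong to a known browser


def to_permalink(address: Address, browser: str) -> str:
    """Render the shared address as a permalink for one specific browser."""
    template, _pattern = BROWSERS[browser]
    return template.format(corpus=address.corpus,
                           message_id=address.message_id)


@dataclass
class Annotation:
    """One annotation; several targets let it link or group messages."""
    targets: List[Address]
    text: str = ""
    links: List[str] = field(default_factory=list)
    context: str = ""             # e.g. the news story this relates to
    author: Optional[str] = None  # None means an anonymous contribution
    votes: int = 0                # crude "interestingness" score

    def vote(self, delta: int) -> None:
        """Record a single up-vote (+1) or down-vote (-1)."""
        if delta not in (1, -1):
            raise ValueError("a vote must be +1 or -1")
        self.votes += delta
```

For example, to_address("https://browser-a.example/cables/12345") yields an Address that to_permalink can render for "browser-b", so an annotation made while reading in one browser can be shown in another. Keeping annotations keyed on the shared address rather than on any one site's URL is what would let this layer live in a service separate from the archives themselves.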
Links
- http://booktwo.org/notebook/openbookmarks/ (check these sites for bookmark/annotation conventions)
- http://www.openbookmarks.org/ (focused on ebooks)