User:Martind/Document Log Discovery Platform
From London Hackspace Wiki
Problem Statement
We're seeing an increase in the publication of vast corpora of data logs, often in the form of message archives, usually in a structured message format. They can be overwhelming: how do we make sense of such a vast amount of text, and how do we identify the sections that are relevant?
- Can we allow a large number of interested parties (anyone, really) to annotate these documents?
- What kinds of annotations do we want to make? (Information structure)
- How can we make that easy? (Tools)
- Can we identify good conventions and techniques for the above that are more generally applicable? (Patterns of use)
- Finally, can we think of these functions as a layer on top of mere archives, and construct them as a physically separate service?
Exemplary Publications
- WikiLeaks datalog dumps
- Iraq War Logs
- Embassy Cables
- http://spacelog.org/
- http://news.ycombinator.com/item?id=1958292 "I would love to get an in-depth technical explanation of the requests and procedures -- how all this stuff works and insights into the troubleshooting process."
- etc
Observations
- What constitutes an "interesting" section of a document is a matter of perspective.
- Thus, such annotations become more useful if they're linked to a context (e.g. "this cable relates to news story X")
- Relationship between "popular" and "interesting" items:
- Much easier to establish "popularity" via simple (implicit, explicit) voting mechanisms: Q&A sites, collaborative news sites, click tracking etc.
- "Interestingness" requires work, since it is the result of an editorial process. This makes it slower, potentially tedious to establish.
- The latter could however feed into the former: items that are widely perceived as interesting are likely to become popular as well.
- To best accomplish this we could simplify the workflow of an editorial process.
- Many parties will already build browsers for data log archives, with varying ways of navigating such content.
- We don't need to duplicate those efforts, but we should integrate with them.
- It seems useful to be able to link/group individual messages
- It seems useful to be able to annotate content (with text, links)
- It seems useful to be able to contribute anonymously
- ...
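The last few observations (linking and grouping messages, annotating with text and links, tying an annotation to a context, and allowing anonymous contributions) could be captured in a record along these lines. This is only a minimal sketch: the `Annotation` class, its field names, the `by_context` helper and the `archive.example` URLs are all assumptions made for illustration, not an agreed format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    """One annotation on one or more archive messages (hypothetical schema)."""
    targets: Tuple[str, ...]        # permalinks of the annotated messages; several targets form a group
    text: str = ""                  # free-text note
    links: Tuple[str, ...] = ()     # related URLs
    context: Optional[str] = None   # what this relates to, e.g. a news story
    author: Optional[str] = None    # None marks an anonymous contribution


def by_context(annotations, context):
    """Return the annotations that were made for the given context."""
    return [a for a in annotations if a.context == context]


# Example data; the URLs and context names are placeholders.
notes = [
    Annotation(targets=("https://archive.example/msg/1",),
               text="Relates to story X", context="story-x"),
    Annotation(targets=("https://archive.example/msg/1",
                        "https://archive.example/msg/2"),
               context="story-y", author="alice"),
]
story_x_notes = by_context(notes, "story-x")
```

Keeping the message permalinks as plain strings means such a record does not depend on any particular archive browser, which fits the idea of running annotations as a separate layer on top of the archives.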
Addressing Schemes for Archives
- Need a shared addressing scheme that works across archives and archive browsers
- Based on permalinks
- Alternatively: need a method of translating between different addressing schemes
- Start with a review of link structures of a wide spectrum of archives
- Should publish best practices for a good addressing scheme
- Document and possibly share the structure of individual addressing schemes
- Publish recommendations for addressing schemes and the terminology used; common conventions help
- ...
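The "translate between addressing schemes" option could look roughly like the sketch below: each archive browser's permalink format is documented as a template plus a matching pattern, and every permalink is mapped through a shared (archive, message id) address. The scheme names, the `a.example` and `b.example` URL layouts and all function names are invented for illustration; real archives would need their own patterns, which is what the proposed review of link structures would provide.

```python
import re
from typing import NamedTuple, Optional


class MessageAddress(NamedTuple):
    """Shared address of a message, independent of any browser (assumed shape)."""
    archive: str
    message_id: str


# Hypothetical permalink formats of two archive browsers:
# a format template for building URLs and a regex for parsing them.
SCHEMES = {
    "browser-a": (
        "https://a.example/{archive}/{message_id}",
        re.compile(r"^https://a\.example/(?P<archive>[^/]+)/(?P<message_id>[^/]+)$"),
    ),
    "browser-b": (
        "https://b.example/view?archive={archive}&msg={message_id}",
        re.compile(r"^https://b\.example/view\?archive=(?P<archive>[^&]+)&msg=(?P<message_id>[^&]+)$"),
    ),
}


def parse(url: str) -> Optional[MessageAddress]:
    """Try every known scheme and return the shared address, or None."""
    for _template, pattern in SCHEMES.values():
        match = pattern.match(url)
        if match:
            return MessageAddress(match["archive"], match["message_id"])
    return None


def to_url(address: MessageAddress, scheme: str) -> str:
    """Build a permalink for the given address in the named scheme."""
    template, _pattern = SCHEMES[scheme]
    return template.format(**address._asdict())


def translate(url: str, scheme: str) -> Optional[str]:
    """Rewrite a permalink from any known scheme into the target scheme."""
    address = parse(url)
    return to_url(address, scheme) if address else None
```

With these definitions, `translate("https://a.example/cables/msg-42", "browser-b")` yields the equivalent permalink in the second browser's format, and an unrecognised URL yields `None` rather than a guess.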
Links
- http://booktwo.org/notebook/openbookmarks/ check these sites for bookmark/annotation conventions
- http://www.openbookmarks.org/ (focused on ebooks)