User:Martind/Document Log Discovery Platform

From London Hackspace Wiki

Problem Statement

We're seeing an increase in the publication of vast corpora of data logs, often in the form of message archives, usually in a structured message format. Their sheer size is overwhelming: how can a reader make sense of so much text, and how can they identify the sections that are relevant?

  • Can we allow a large number of interested parties (anyone, really) to annotate these documents?
    • What kinds of annotations do we want to make? (Information structure)
    • How can we make that easy? (Tools)
    • Can we identify good conventions and techniques for the above that are more generally applicable? (Patterns of use)
  • Finally, can we think of these functions as a layer on top of mere archives, and construct them as a physically separate service?

Exemplary Publications

Observations

  • What constitutes an "interesting" section of a document is a matter of perspective.
    • Thus, such annotations become more useful if they're linked to a context (e.g. "this cable relates to news story X")
  • Many parties are already building browsers for such publications, each with its own way of navigating the content.
    • We don't need to duplicate those efforts, but we should integrate with them.
  • ...

Requirements

  • Need a shared addressing scheme that works across document browsers
    • Based on permalinks
    • We can translate between the different browsers' addressing approaches
  • Need ways to link/group individual messages
  • Need ways to indicate a message's level of "interestingness" (e.g. voting, cf. Q&A sites)
  • Need ways to annotate content (with text, links)
  • Contribution should be possible anonymously
  • ...
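
As a rough illustration of how the requirements above might fit together, here is a minimal Python sketch. Everything in it is an assumption made for the example, not a settled design: the browser names and URL templates are hypothetical (using reserved example domains), as are the class and function names. It shows a browser-independent address derived from permalinks, translation between two browsers' permalink formats, and an annotation record that carries text, links, a context, votes, and an optional author (left empty for anonymous contributions).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class MessageAddress:
    """Browser-independent address of one message (hypothetical scheme)."""
    corpus: str
    message_id: str


# Hypothetical permalink templates, one per document browser.
BROWSER_TEMPLATES = {
    "browser-a": "https://browser-a.example.org/{corpus}/{message_id}",
    "browser-b": "https://browser-b.example.net/view?corpus={corpus}&id={message_id}",
}


def _template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a template with {name} placeholders into an anchored regex."""
    # re.split with a capture group yields: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", template)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))            # literal URL text
        else:
            pieces.append(f"(?P<{part}>[^/&?#]+)")    # placeholder value
    return re.compile("^" + "".join(pieces) + "$")


def parse_permalink(url: str) -> MessageAddress:
    """Map a permalink from any known browser to the shared address."""
    for template in BROWSER_TEMPLATES.values():
        match = _template_to_regex(template).match(url)
        if match:
            return MessageAddress(**match.groupdict())
    raise ValueError(f"unrecognised permalink: {url}")


def to_permalink(address: MessageAddress, browser: str) -> str:
    """Render the shared address as a permalink for the given browser."""
    return BROWSER_TEMPLATES[browser].format(
        corpus=address.corpus, message_id=address.message_id
    )


@dataclass
class Annotation:
    """One contribution attached to a message."""
    target: MessageAddress
    text: str = ""
    links: List[str] = field(default_factory=list)
    context: Optional[str] = None   # e.g. "relates to news story X"
    author: Optional[str] = None    # None means an anonymous contribution
    votes: int = 0                  # crude "interestingness" score


@dataclass
class MessageGroup:
    """A named set of messages that belong together."""
    label: str
    members: Set[MessageAddress] = field(default_factory=set)


address = parse_permalink("https://browser-a.example.org/cables/12345")
note = Annotation(target=address, text="Mentions the meeting", context="news story X")
note.votes += 1
group = MessageGroup(label="news story X", members={address})
translated = to_permalink(address, "browser-b")
```

In this sketch, annotations and groups refer only to the shared address, so they stay valid whichever browser a reader uses. The sketch ignores URL percent-encoding and does not address storage or abuse prevention for anonymous contributions.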

Links