User:Martind/Document Log Discovery Platform: Difference between revisions

From London Hackspace Wiki

Revision as of 16:05, 18 December 2010

Problem Statement

We're seeing an increase in the publication of vast corpora of data logs, often as message archives in a structured message format. They are overwhelming: how do we make sense of such a vast amount of text, and how do we identify the sections that are relevant?

  • Can we allow a large number of interested parties (anyone, really) to annotate these documents?
    • What kinds of annotations do we want to make? (Information structure)
    • How can we make that easy? (Tools)
    • Can we identify good conventions and techniques for the above that are more generally applicable? (Patterns of use)
  • Finally, can we think of these functions as a layer on top of mere archives, and construct them as a physically separate service?

Exemplary Publications

Observations

  • What constitutes an "interesting" section of a document is a matter of perspective.
    • Thus, such annotations become more useful if they're linked to a context (e.g. "this cable relates to news story X")
  • Relationship between "popular" and "interesting" items:
    • Much easier to establish "popularity" via simple (implicit, explicit) voting mechanisms: Q&A sites, collaborative news sites, click tracking etc.
    • "Interestingness" requires work, since it is the result of an editorial process. This makes it slower, potentially tedious to establish.
    • The latter could, however, feed into the former: items that are widely perceived as interesting tend to become popular.
    • To best accomplish this we could simplify the workflow of an editorial process.
  • Many parties already build browsers for data log archives, each with its own way of navigating the content.
    • We don't need to duplicate those efforts, but we should integrate with them.
  • It seems useful to be able to link/group individual messages
  • It seems useful to be able to annotate content (with text, links)
  • It seems useful to be able to contribute anonymously
  • ...
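
The last few observations (linking and grouping messages, annotating them with text and links, tying an annotation to a context, contributing anonymously) hint at a small information structure. The following is only a sketch of what such records might look like; every class and field name here is a hypothetical choice for illustration, not an agreed format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One note attached to a single archived message (hypothetical structure)."""
    target: str                     # permalink of the annotated message
    text: str = ""                  # free-text comment
    links: tuple = ()               # related URLs
    context: Optional[str] = None   # e.g. the news story this note relates to
    author: Optional[str] = None    # None means an anonymous contribution


@dataclass
class Group:
    """A named set of messages that a contributor considers related."""
    label: str
    members: set = field(default_factory=set)  # message permalinks

    def add(self, permalink: str) -> None:
        self.members.add(permalink)


def annotations_for(context, annotations):
    """Return only the annotations that were made in the given context."""
    return [a for a in annotations if a.context == context]
```

Keeping the context on the annotation itself, rather than on the message, reflects the observation that "interesting" depends on perspective: the same message can carry different notes for different stories.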

Addressing Schemes for Archives

  • Need a shared addressing scheme that works across archives and archive browsers
    • Based on permalinks
  • Alternatively: need a method of translating between different addressing schemes
    • Start with a review of link structures of a wide spectrum of archives
  • Should publish best practices for a good addressing scheme
    • Document and possibly share the structure of individual addressing schemes
    • Publish recommendations for addressing schemes and the terminology used; common conventions help
  • ...
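
Translating between addressing schemes could work by mapping each browser's permalink to a browser-independent address and back. The sketch below assumes that every permalink encodes an archive name and a message identifier; the two browsers, their URL layouts, and all function names are hypothetical and stand in for whatever the review of real link structures turns up.

```python
import re
from typing import NamedTuple, Optional


class Address(NamedTuple):
    """Browser-independent address of one message (hypothetical canonical form)."""
    archive: str
    message_id: str


# Hypothetical browsers and URL layouts, used for illustration only.
SCHEMES = {
    "browser-a": {
        "pattern": re.compile(
            r"^https://a\.example/(?P<archive>[\w-]+)/msg/(?P<message_id>[\w-]+)$"
        ),
        "template": "https://a.example/{archive}/msg/{message_id}",
    },
    "browser-b": {
        "pattern": re.compile(
            r"^https://b\.example/view\?archive=(?P<archive>[\w-]+)&id=(?P<message_id>[\w-]+)$"
        ),
        "template": "https://b.example/view?archive={archive}&id={message_id}",
    },
}


def parse(url: str) -> Optional[Address]:
    """Extract the canonical address from a permalink of any known browser."""
    for scheme in SCHEMES.values():
        match = scheme["pattern"].match(url)
        if match:
            return Address(match.group("archive"), match.group("message_id"))
    return None


def translate(url: str, target: str) -> Optional[str]:
    """Rewrite a permalink into the target browser's scheme, or None if unknown."""
    address = parse(url)
    if address is None:
        return None
    return SCHEMES[target]["template"].format(**address._asdict())
```

With these made-up schemes, `translate("https://a.example/cables/msg/42", "browser-b")` yields `https://b.example/view?archive=cables&id=42`. Supporting another browser then means documenting one pattern and one template, which is also an argument for publishing the structure of each addressing scheme.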

Links