User:Martind/Document Log Discovery Platform: Difference between revisions
From London Hackspace Wiki
Revision as of 16:17, 18 December 2010
Problem Statement
We're seeing an increase in the publication of vast corpuses of data logs, often in the form of message archives, usually in a structured message format. They're all quite overwhelming: how to make sense of such a vast amount of text? How to identify sections that are relevant?
- Can we allow a large number of interested parties (anyone really) to annotate these documents?
- What kinds of annotations do we want to make? (Information structure)
- How can we make that easy? (Tools)
- Can we identify good conventions and techniques for the above that are more generally applicable? (Patterns of use)
- Finally, can we think of these functions as a layer on top of mere archives, and construct them as a physically separate service?
Exemplary Publications
- WikiLeaks datalog dumps
- Iraq War Logs
- Embassy Cables
- http://spacelog.org/
- http://news.ycombinator.com/item?id=1958292 "I would love to get an in-depth technical explanation of the requests and procedures -- how all this stuff works and insights into the troubleshooting process."
- etc
Observations
Editorial Functions
- It seems useful to be able to link/group individual messages
- It seems useful to be able to annotate content (with text, links)
- It seems useful to be able to contribute anonymously
- It seems useful to be able to annotate/qualify editorial contributions by others
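The editorial functions above could be modelled with a small annotation record. The following is a minimal sketch rather than a design taken from these notes: the `Annotation` class, its field names and the example URLs are all hypothetical. It assumes that a message is identified by a permalink string, that an anonymous contribution is simply one without an author, and that qualifying someone else's contribution means creating an annotation whose target is the other annotation's id.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

_ids = count(1)  # hypothetical id generator, only for this sketch


@dataclass
class Annotation:
    """One editorial contribution (all field names are assumptions)."""
    targets: List[str]            # permalinks of messages, or ids of other annotations
    text: str = ""                # free-text note
    links: List[str] = field(default_factory=list)  # related URLs, e.g. a news story
    author: Optional[str] = None  # None means an anonymous contribution
    id: str = field(default_factory=lambda: f"annotation:{next(_ids)}")

    @property
    def is_anonymous(self) -> bool:
        return self.author is None

    @property
    def is_group(self) -> bool:
        # An annotation with several targets links/groups those messages.
        return len(self.targets) > 1


# Link two messages and give them a context (placeholder URLs).
group = Annotation(
    targets=["http://archive.example/msg/1", "http://archive.example/msg/2"],
    text="Both messages relate to news story X",
    links=["http://news.example/story-x"],
)

# Qualify someone else's contribution by targeting its id.
review = Annotation(
    targets=[group.id],
    text="The second message seems unrelated",
    author="alice",
)
```

Using one record type for both cases keeps the model small: grouping, annotating and reviewing differ only in what `targets` points at.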
Interestingness, Popularity
- What constitutes an "interesting" section of a document is a matter of perspective.
  - Such annotations become more useful if they're linked to a context (e.g. "this cable relates to news story X")
- Relationship between "popular" and "interesting" items:
  - Much easier to establish "popularity" via simple (implicit, explicit) voting mechanisms: Q&A sites, collaborative news sites, click tracking etc.
  - "Interestingness" requires more work, since it is the result of an editorial process. This makes it slower, potentially tedious to demonstrate, and error-prone.
  - The latter could however feed into the former: items that are widely perceived as interesting are likely to become popular.
    - To best accomplish this we should attempt to simplify the workflow of an editorial process.
Interoperability
- Many parties will already build browsers for data log archives, with varying ways of navigating such content.
  - We don't need to duplicate those efforts, but we should integrate with them.
Content licenses
- A good discovery platform may want to republish content from linked archives to be able to present them in a coherent manner
- In the case of data published by governments and NGOs the data may either be in the public domain, or will have an explicit license
- In the case of document leaks the legal status of this data may not be clear, and there may not be an explicit license
Addressing Schemes for Archives
- Need a shared addressing scheme that works across archives and archive browsers
- Based on permalinks
- Alternatively: need a method of translating between different addressing schemes
- Start with a review of link structures of a wide spectrum of archives
- Should publish best practices for a good addressing scheme
- Document and possibly share the structure of individual addressing schemes
- Publish recommendations for addressing schemes, terminology used: common conventions help
- ...
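One way to translate between addressing schemes is to parse each archive's permalinks into a shared, archive-independent address, then format that address back into any other scheme. The following is a minimal sketch under stated assumptions: both permalink layouts are invented (`archive-a.example` and `browser-b.example` are placeholders, not real archive browsers), and the `MessageAddress` type with its corpus and message id is hypothetical. A review of real link structures would decide what the shared address actually needs to contain.

```python
import re
from typing import NamedTuple, Optional


class MessageAddress(NamedTuple):
    """Archive-independent address (hypothetical: corpus name + message id)."""
    corpus: str
    message_id: str


# Each scheme is a (regex for parsing, template for formatting) pair.
# Both URL layouts are invented for illustration.
SCHEMES = {
    "archive-a": (
        re.compile(r"^http://archive-a\.example/(?P<corpus>[\w-]+)/(?P<message_id>[\w-]+)$"),
        "http://archive-a.example/{corpus}/{message_id}",
    ),
    "browser-b": (
        re.compile(r"^http://browser-b\.example/view\?c=(?P<corpus>[\w-]+)&m=(?P<message_id>[\w-]+)$"),
        "http://browser-b.example/view?c={corpus}&m={message_id}",
    ),
}


def parse(url: str) -> Optional[MessageAddress]:
    """Return the shared address for a permalink, or None if no scheme matches."""
    for pattern, _template in SCHEMES.values():
        match = pattern.match(url)
        if match:
            return MessageAddress(**match.groupdict())
    return None


def translate(url: str, target_scheme: str) -> Optional[str]:
    """Rewrite a permalink from any known scheme into the target scheme."""
    address = parse(url)
    if address is None:
        return None
    _pattern, template = SCHEMES[target_scheme]
    return template.format(**address._asdict())
```

With these made-up schemes, `translate("http://archive-a.example/cables/msg-42", "browser-b")` returns `"http://browser-b.example/view?c=cables&m=msg-42"`. Documenting each scheme as a pattern and template pair is also one possible form for sharing the structure of individual addressing schemes.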
Links
- http://booktwo.org/notebook/openbookmarks/ check these sites for bookmark/annotation conventions
- http://www.openbookmarks.org/ (focused on ebooks)