User:Martind/Document Log Discovery Platform

From London Hackspace Wiki
== Problem Statement ==

We're seeing an increase in the publication of vast corpuses of document logs, often in the form of message archives and usually in a structured message format. Their sheer size is overwhelming: how do we make sense of such a vast amount of text, and how do we identify the sections that are relevant?

* Can we allow a large number of interested parties (anyone, really) to annotate these documents?
** Can we identify good conventions and techniques for the above that are more generally applicable? (Patterns of use)
* Finally, can we think of these functions as a layer on top of mere archives, and construct them as a physically separate service?
Note:
* These notes are limited to text document corpuses, and won't attempt to incorporate numerical/statistical/other data repositories.


== Exemplary Publications ==