User:Martind/Document Log Discovery Platform: Difference between revisions

144 bytes added, 18 December 2010

@@ Line 1: / Line 1: @@
 == Problem Statement ==
-We're seeing an increase in the publication of vast corpuses of data logs, often in the form of message archives, usually in a structured message format. They're all quite overwhelming: how to make sense of such a vast amount of text? How to identify sections that are relevant?
+We're seeing an increase in the publication of vast corpuses of document logs, often in the form of message archives, usually in a structured message format. They're all quite overwhelming: how to make sense of such a vast amount of text? How to identify sections that are relevant?
 * Can we allow a large number of interested parties (anyone, really) to annotate these documents?
@@ Line 8: / Line 8: @@
 ** Can we identify good conventions and techniques for the above that are more generally applicable? (Patterns of use)
 * Finally, can we think of these functions as a layer on top of mere archives, and construct them as a physically separate service?
+Note:
+* These notes are limited to text document corpuses, and won't attempt to incorporate numerical/statistical/other data repositories.
 == Exemplary Publications ==