User:Martind/Document Log Discovery Platform: Difference between revisions


Revision as of 19:19, 18 December 2010

646 bytes added, 18 December 2010

no edit summary

Martind (Administrators, 1,496 edits)

@@ Line 11: / Line 11: @@
 Note:
-* These notes are limited to (text) document corpuses, and won't attempt to incorporate numerical/statistical/other data repositories.
+* These notes are limited to text document corpuses, and won't attempt to incorporate numerical/statistical/other data repositories.
 * Specifically no attempt is made to address information within a document, or to address information aggregated across documents, if such derivative forms don't already exist.
 == Exemplary Publications ==
-The ultimate goal: being able to construct links between multiple representations of the same documents
+The ultimate goal:
-* Across mirrors (same addressing scheme, different location)
+* Being able to construct links between multiple representations of the same documents
-* Across types of archive browsers (may have different addressing schemes, will have different locations)
+** Across mirrors (same content, same addressing scheme, different location)
+** Across types of archive browsers (some may have further annotations, all may have different addressing schemes, all will have different locations)
+* Being able to identify existing references by detecting such links
+** E.g. via Twitter/Google search
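A minimal sketch of the link-detection goal, in Python. It scans free text, such as a tweet or a search result, for known document URLs and reduces each one to a (corpus, document ID) key, so that a direct link and an archive.org snapshot of the same document compare equal. Only the warlogs.wikileaks.org URL shape and the archive.org wrapping come from these notes. The corpus label, the single-entry pattern table and the 8-4-4-16 hex ID shape (generalised from one example ID) are assumptions.

```python
import re

# Assumed ID shape: hex groups of 8, 4, 4 and 16 digits, generalised from the
# single Iraq War Logs example in these notes.
DOC_ID = r"[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{16}"

# Hypothetical pattern table: one entry per archive, keyed by a made-up corpus label.
PATTERNS = {
    "wikileaks-iraq-war-logs": re.compile(
        r"https?://warlogs\.wikileaks\.org/id/(" + DOC_ID + r")", re.IGNORECASE
    ),
}

# archive.org snapshots wrap the original URL:
# http://web.archive.org/web/<timestamp>/<original URL>
WAYBACK_PREFIX = re.compile(r"https?://web\.archive\.org/web/\d+/", re.IGNORECASE)


def find_references(text):
    """Return a (corpus, document ID) pair for every known document link in text."""
    # Strip archive.org prefixes so snapshots resolve to the same key as direct links.
    unwrapped = WAYBACK_PREFIX.sub("", text)
    refs = []
    for corpus, pattern in PATTERNS.items():
        for match in pattern.finditer(unwrapped):
            # Normalise case so differently-cased links compare equal.
            refs.append((corpus, match.group(1).upper()))
    return refs
```

Further archives would be further entries in the pattern table; a mirror with the same addressing scheme only needs an extra host alternative in the existing pattern, while a browser with a different addressing scheme gets its own pattern that maps back to the same corpus label.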
 Look out for:
@@ Line 27: / Line 30: @@
 * Additional presentation information, e.g. reading offset
 * How to construct canonical URLs from document IDs
+* Existing services that interact with this archive
 '''WikiLeaks Iraq War Logs'''
 * Document URL: http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/ ([http://web.archive.org/web/20101030053024/http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/ archive.org])
 * Document ID: BCD499A0-F0A3-2B1D-B27A2F1D750FE720
+* Document also has a "Tracking number": 20091223210038SMB
 * To construct a canonical document URL: Base URL + document ID
+* Other services:
+** URLs are actively being shared (and annotated) on Twitter
+* Other observations:
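A sketch of "Base URL + document ID" for this archive, in Python. The base URL is read off the example document URL above. The trailing slash (copied from that example), the upper-casing and the ID check (8-4-4-16 hex digits, generalised from a single example) are assumptions.

```python
import re

# Base URL as it appears in the example document URL above.
BASE_URL = "http://warlogs.wikileaks.org/id/"

# Assumed ID shape, generalised from the single example ID in these notes:
# hex groups of 8, 4, 4 and 16 digits.
DOC_ID_RE = re.compile(r"[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{16}")


def canonical_url(doc_id):
    """Build the canonical document URL as base URL + document ID."""
    # Assumption: IDs are written in upper case, as in the example.
    doc_id = doc_id.strip().upper()
    if not DOC_ID_RE.fullmatch(doc_id):
        raise ValueError("not an Iraq War Logs document ID: %r" % doc_id)
    # The example URL ends in a slash, so one is appended here.
    return BASE_URL + doc_id + "/"
```

With this check the tracking number (20091223210038SMB) is rejected, since it is a separate identifier rather than a document ID.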
 '''WikiLeaks Embassy Cables'''
@@ Line 38: / Line 46: @@
 * Browsers: [http://cablesearch.org/ cablesearch.org]
 * To construct a canonical document URL: Base URL + document ID
+* Other services:
+** URLs are actively being shared (and annotated) on Twitter
 '''SpaceLog'''
@@ Line 50: / Line 60: @@
 ** Document range URL: URL template, corpus ID, document IDs, reading offset
 ** No means to query the corpus ID or reading offset for a document ID
+* Other services:
+** URLs are actively being shared (and annotated) on Twitter
 * Other observations:
 ** Has rel="canonical"
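Because SpaceLog pages carry rel="canonical", a crawler landing on any representation of a page could read the canonical URL directly instead of rebuilding it from the URL template. A minimal sketch using Python's standard html.parser; the class and function names are hypothetical, and relative hrefs are resolved against the page URL with urllib.parse.urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated tokens, and a bare attribute has value None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical_url(html, page_url=None):
    """Return the page's canonical URL, or None if it declares none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    parser.close()
    if parser.canonical is None or page_url is None:
        return parser.canonical
    # Resolve a relative href against the URL the page was fetched from.
    return urljoin(page_url, parser.canonical)
```

Self-closing tags such as <link ... /> also reach handle_starttag, because HTMLParser's default handle_startendtag forwards to it.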
@@ Line 64: / Line 76: @@
 * To construct a canonical document URL: base URL + corpus ID + document ID
 ** Can query corpus ID (username) via e.g. http://dev.twitter.com/doc/get/statuses/show/:id
+* Other services:
+** ExquisiteTweets allows users to group tweets
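A sketch of "base URL + corpus ID + document ID" for tweets, in Python, working on the JSON body returned by the statuses/show call mentioned above (fetching it is left out). It assumes the response carries the usual Twitter status fields user.screen_name and id_str/id, and the twitter.com/<screen name>/status/<id> URL shape; the function name and the default base URL are assumptions.

```python
import json

# Assumed base URL; the notes only give the recipe "base URL + corpus ID + document ID".
DEFAULT_BASE_URL = "http://twitter.com/"


def canonical_status_url(status_json, base_url=DEFAULT_BASE_URL):
    """Build a tweet URL from the JSON body of a statuses/show response."""
    status = json.loads(status_json)
    # Corpus ID: the author's screen name, taken from the embedded user object.
    screen_name = status["user"]["screen_name"]
    # Document ID: prefer the string form, since large numeric IDs can lose
    # precision in JSON parsers that use floating point (e.g. JavaScript's).
    status_id = status.get("id_str") or str(status["id"])
    return "%s%s/status/%s" % (base_url, screen_name, status_id)
```

Taking the base URL as a parameter keeps the recipe usable if the site's URL scheme changes.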
 '''Eur-Lex'''
@@ Line 75: / Line 89: @@
 * Other observations:
 ** Corpus ID could be understood as part of the document ID; we may not need to treat them separately
+* Other services:
+** TODO. They are likely to exist.
 '''TODO'''