Line 11: | Line 11: | ||
Note: | Note: | ||
* These notes are limited to | * These notes are limited to text document corpuses, and won't attempt to incorporate numerical/statistical/other data repositories. | ||
* Specifically no attempt is made to address information within a document, or to address information aggregated across documents, if such derivative forms don't already exist. | * Specifically no attempt is made to address information within a document, or to address information aggregated across documents, if such derivative forms don't already exist. | ||
== Exemplary Publications == | == Exemplary Publications == | ||
The ultimate goal: | The ultimate goal: | ||
* Across mirrors (same addressing scheme, different location) | * Being able to construct links between multiple representations of the same documents | ||
* Across types of archive browsers (may have different addressing schemes, will have different locations) | ** Across mirrors (same content, same addressing scheme, different location) | ||
** Across types of archive browsers (some may have further annotations, all may have different addressing schemes, all will have different locations) | |||
* Being able to identify existing references by detecting such links | |||
** E.g. via Twitter/Google search | |||
Look out for: | Look out for: | ||
Line 27: | Line 30: | ||
* Additional presentation information, e.g. reading offset | * Additional presentation information, e.g. reading offset | ||
* How to construct canonical URLs from document IDs | * How to construct canonical URLs from document IDs | ||
* Existing services that interact with this archive | |||
'''WikiLeaks Iraq War Logs''' | '''WikiLeaks Iraq War Logs''' | ||
* Document URL: http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/ ([http://web.archive.org/web/20101030053024/http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/ archive.org]) | * Document URL: http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/ ([http://web.archive.org/web/20101030053024/http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/ archive.org]) | ||
* Document ID: BCD499A0-F0A3-2B1D-B27A2F1D750FE720 | * Document ID: BCD499A0-F0A3-2B1D-B27A2F1D750FE720 | ||
* Document also has a "Tracking number": 20091223210038SMB | |||
* To construct a canonical document URL: Base URL + document ID | * To construct a canonical document URL: Base URL + document ID | ||
* Other services: | |||
** URLs are actively being shared (and annotated) on Twitter | |||
* Other observations: | |||
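
The "base URL + document ID" rule for the War Logs could be sketched roughly as follows. This is a minimal sketch, not the archive's own method: the base URL is read off the example document URL above, the ID pattern (8-4-4-16 hexadecimal characters) is an assumption generalised from the single example ID, and the function names are hypothetical. The same pattern can recover the document ID from a mirror such as the archive.org copy, which is one way to link two representations of the same document.

```python
import re

# Assumption: base URL inferred from the example document URL above.
WARLOGS_BASE_URL = "http://warlogs.wikileaks.org/id/"

# Assumption: all IDs follow the 8-4-4-16 hex layout of the one example ID.
DOC_ID_PATTERN = re.compile(r"[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{16}")


def canonical_url(doc_id):
    """Build a canonical document URL as base URL + document ID."""
    if not DOC_ID_PATTERN.fullmatch(doc_id):
        raise ValueError(f"unexpected document ID format: {doc_id!r}")
    return f"{WARLOGS_BASE_URL}{doc_id}/"


def extract_doc_id(url):
    """Return the first document ID found in a URL, or None if there is none."""
    match = DOC_ID_PATTERN.search(url)
    return match.group(0) if match else None


if __name__ == "__main__":
    mirror = ("http://web.archive.org/web/20101030053024/"
              "http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/")
    doc_id = extract_doc_id(mirror)
    print(doc_id)
    print(canonical_url(doc_id))
```

Matching on the document ID rather than on the full URL means a mirror URL and the original URL resolve to the same key, whatever their location.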
'''WikiLeaks Embassy Cables''' | '''WikiLeaks Embassy Cables''' | ||
Line 38: | Line 46: | ||
* Browsers: [http://cablesearch.org/ cablesearch.org], | * Browsers: [http://cablesearch.org/ cablesearch.org], | ||
* To construct a canonical document URL: Base URL + document ID | * To construct a canonical document URL: Base URL + document ID | ||
* Other services: | |||
** URLs are actively being shared (and annotated) on Twitter | |||
'''SpaceLog''' | '''SpaceLog''' | ||
Line 50: | Line 60: | ||
** Document range URL: URL template, corpus ID, document IDs, reading offset | ** Document range URL: URL template, corpus ID, document IDs, reading offset | ||
** No means to query corpus ID, reading offset for a document ID | ** No means to query corpus ID, reading offset for a document ID | ||
* Other services: | |||
** URLs are actively being shared (and annotated) on Twitter | |||
* Other observations: | * Other observations: | ||
** Has rel="canonical" | ** Has rel="canonical" | ||
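
Where a page declares rel="canonical", the canonical URL can be read from the page itself instead of being constructed. The following is a minimal sketch using only the Python standard library; the class and function names are hypothetical, and the sample markup uses a placeholder example.org URL, not an actual SpaceLog page.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Collect the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # The rel attribute may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical_url(html_text):
    """Return the declared canonical URL of an HTML page, or None."""
    parser = CanonicalLinkParser()
    parser.feed(html_text)
    return parser.canonical


if __name__ == "__main__":
    # Placeholder markup for illustration only.
    sample = '<html><head><link rel="canonical" href="http://example.org/doc/1/"></head></html>'
    print(find_canonical_url(sample))
```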
Line 64: | Line 76: | ||
* To construct canonical document URL: base URL + corpus ID + document ID | * To construct canonical document URL: base URL + corpus ID + document ID | ||
** Can query corpus ID (username) via e.g. http://dev.twitter.com/doc/get/statuses/show/:id | ** Can query corpus ID (username) via e.g. http://dev.twitter.com/doc/get/statuses/show/:id | ||
* Other services: | |||
** ExquisiteTweets allows tweets to be grouped | |||
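
The rule "base URL + corpus ID + document ID" for tweets might look like the sketch below. The URL template `http://twitter.com/<username>/status/<id>` is an assumption based on the commonly seen form of tweet links, not something these notes specify, and the function name and example values are hypothetical. Looking up the username for a given tweet ID would need the API call mentioned above and is not shown.

```python
# Assumption: tweet URLs have the form <base>/<username>/status/<tweet id>.
TWITTER_BASE_URL = "http://twitter.com/"


def canonical_tweet_url(username, tweet_id):
    """Build a tweet URL from corpus ID (username) and document ID (tweet ID)."""
    tweet_id = str(tweet_id)
    if not tweet_id.isdigit():
        raise ValueError(f"tweet ID must be numeric: {tweet_id!r}")
    if not username:
        raise ValueError("username (corpus ID) must not be empty")
    return f"{TWITTER_BASE_URL}{username}/status/{tweet_id}"


if __name__ == "__main__":
    # Hypothetical username and tweet ID, for illustration only.
    print(canonical_tweet_url("example_user", 12345))
```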
'''Eur-Lex''' | '''Eur-Lex''' | ||
Line 75: | Line 89: | ||
* Other observations: | * Other observations: | ||
** Corpus ID could be understood as part of the document ID, we may not need to treat them separately | ** Corpus ID could be understood as part of the document ID, we may not need to treat them separately | ||
* Other services: | |||
** TODO. They are likely to exist. | |||
'''TODO''' | '''TODO''' |