User:Martind/Document Log Discovery Platform: Difference between revisions

From London Hackspace Wiki
Line 15: Line 15:


Look out for:
Look out for:
* Canonical URLs (implicit, or explicit via rel="canonical")
* Canonical document URLs (implicit, or explicit via rel="canonical")
* Corpus IDs
* Document IDs
* Document IDs
* Document ranges for timeline browsers
* Document ranges for timeline browsers
* Additional presentation information, e.g. reading offset
* Additional presentation information, e.g. reading offset
* How to construct canonical URLs from document IDs


'''WikiLeaks Iraq War Logs'''
'''WikiLeaks Iraq War Logs'''
* Document URL: http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/ ([http://web.archive.org/web/20101030053024/http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/ archive.org])
* Document URL: http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/ ([http://web.archive.org/web/20101030053024/http://warlogs.wikileaks.org/id/BCD499A0-F0A3-2B1D-B27A2F1D750FE720/ archive.org])
* Document ID: BCD499A0-F0A3-2B1D-B27A2F1D750FE720
* Document ID: BCD499A0-F0A3-2B1D-B27A2F1D750FE720
* To construct a canonical document URL: Base URL + document ID
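A minimal sketch of the "base URL + document ID" rule for this corpus. The base URL and trailing slash are read off the example document URL above; the function name is hypothetical and only for illustration.

```python
from urllib.parse import quote

# Base URL inferred from the example document URL above.
WARLOGS_BASE_URL = "http://warlogs.wikileaks.org/id/"


def warlogs_document_url(document_id: str) -> str:
    """Join base URL and document ID, keeping the trailing slash from the example."""
    # quote() leaves letters, digits and hyphens untouched, so the ID passes through as-is.
    return f"{WARLOGS_BASE_URL}{quote(document_id, safe='')}/"


print(warlogs_document_url("BCD499A0-F0A3-2B1D-B27A2F1D750FE720"))
```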


'''WikiLeaks Embassy Cables'''
'''WikiLeaks Embassy Cables'''
Line 28: Line 31:
* Document ID: 10COPENHAGEN69
* Document ID: 10COPENHAGEN69
* Browsers: [http://cablesearch.org/ cablesearch.org]
* Browsers: [http://cablesearch.org/ cablesearch.org]
* To construct a canonical document URL: Base URL + document ID


'''SpaceLog'''
'''SpaceLog'''
Line 35: Line 39:
* Document ID: 01:06:43:11
* Document ID: 01:06:43:11
** Not enough to form a URL: the corpus ID is also needed to construct a permalink
** Not enough to form a URL: the corpus ID is also needed to construct a permalink
* Reading offset ID: #log-line-110591 ("log-line-110591" alone doesn't seem to suffice to construct a link)
* Reading offset ID: #log-line-110591
* has rel="canonical"
* has rel="canonical"
* To construct canonical URLs:
** Document URL: URL template, corpus ID, document ID, reading offset
** Document range URL: URL template, corpus ID, document IDs, reading offset
** No means to query the corpus ID or reading offset for a given document ID
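The template-based construction listed above could be sketched as follows. Both URL templates, the example.org host, the corpus ID "some-mission" and the range end ID are placeholders invented for illustration; the real SpaceLog URL layout is not stated here. Only the document ID and the reading offset come from the notes above.

```python
# Hypothetical templates: placeholders, not SpaceLog's actual URL layout.
DOCUMENT_TEMPLATE = "http://{corpus_id}.example.org/{document_id}/#{reading_offset}"
RANGE_TEMPLATE = "http://{corpus_id}.example.org/{start_id}/{end_id}/#{reading_offset}"


def document_url(corpus_id: str, document_id: str, reading_offset: str) -> str:
    """Fill the document template; all three values must be supplied by the caller."""
    return DOCUMENT_TEMPLATE.format(
        corpus_id=corpus_id,
        document_id=document_id,
        reading_offset=reading_offset,
    )


def document_range_url(corpus_id: str, start_id: str, end_id: str, reading_offset: str) -> str:
    """Fill the range template with the first and last document IDs of the range."""
    return RANGE_TEMPLATE.format(
        corpus_id=corpus_id,
        start_id=start_id,
        end_id=end_id,
        reading_offset=reading_offset,
    )


# "some-mission" and "01:06:50:00" are made-up values for the example.
print(document_url("some-mission", "01:06:43:11", "log-line-110591"))
print(document_range_url("some-mission", "01:06:43:11", "01:06:50:00", "log-line-110591"))
```

Since neither the corpus ID nor the reading offset can be looked up from a document ID, the caller has to carry all of them alongside it.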


'''Twitter'''
'''Twitter'''
Line 44: Line 52:
* Corpus ID: wikileaks
* Corpus ID: wikileaks
* Document ID: 15975805188317184  
* Document ID: 15975805188317184  
** not enough to form a URL. Need to query corpus ID (username) via e.g. http://dev.twitter.com/doc/get/statuses/show/:id and then manually construct permalink
* To construct canonical document URL: base URL + corpus ID + document ID
** Can query corpus ID (username) via e.g. http://dev.twitter.com/doc/get/statuses/show/:id
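A sketch of "base URL + corpus ID + document ID" for tweets. The base URL and the "/status/" path segment are assumptions about the permalink layout, not taken from the notes above. The username lookup is passed in as a function so that no particular API client is implied; the stub below stands in for a call to the statuses/show endpoint.

```python
from typing import Callable

# Assumed base URL and path layout for tweet permalinks.
TWITTER_BASE_URL = "http://twitter.com/"


def tweet_url(corpus_id: str, document_id: str) -> str:
    """Build a permalink from the username (corpus ID) and the tweet ID (document ID)."""
    return f"{TWITTER_BASE_URL}{corpus_id}/status/{document_id}"


def tweet_url_from_id(document_id: str, lookup_username: Callable[[str], str]) -> str:
    """Resolve the username for a tweet ID first, then build the permalink."""
    return tweet_url(lookup_username(document_id), document_id)


# Stub lookup for the example; a real one would query the API for the tweet's author.
def stub_lookup(document_id: str) -> str:
    return {"15975805188317184": "wikileaks"}[document_id]


print(tweet_url_from_id("15975805188317184", stub_lookup))
```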


== Observations ==
== Observations ==