Mashups: embedding content from other websites, mostly maps: In netarchive: no map – just a...


Page 1:

Mashups (embedding content from other websites, mostly maps): http://dinby.dk

In the netarchive (Netarkivet): no map, just a "black hole". No solution.
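One way to spot pages that are likely to show such a "black hole" is to list the resources a page embeds from other hosts. The following is only a minimal sketch using the Python standard library; the sample HTML, the host `maps.example.org`, and the names `EmbedFinder` and `find_external_embeds` are assumptions made for illustration and are not part of the Netarkivet tooling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class EmbedFinder(HTMLParser):
    """Collect URLs of embedded resources served from a different host than the page."""

    # Tags that pull in external content, mapped to the attribute holding the URL.
    EMBED_ATTRS = {"iframe": "src", "script": "src", "embed": "src", "object": "data"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlsplit(page_url).hostname
        self.external = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.EMBED_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if not value:
            return
        # Resolve relative references against the page URL before comparing hosts.
        absolute = urljoin(self.page_url, value)
        if urlsplit(absolute).hostname != self.page_host:
            self.external.append(absolute)


def find_external_embeds(page_url, html):
    """Return the embedded URLs in `html` that point to a host other than the page's."""
    finder = EmbedFinder(page_url)
    finder.feed(html)
    return finder.external


# Hypothetical page content: one local script and one map embedded from another host.
SAMPLE_HTML = """
<html><body>
  <script src="/js/site.js"></script>
  <iframe src="http://maps.example.org/embed?city=1"></iframe>
</body></html>
"""

external_embeds = find_external_embeds("http://dinby.dk/", SAMPLE_HTML)
print(external_embeds)
```

A list like this only shows where the embedded content comes from; whether the harvester can actually fetch and replay it is a separate question.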

Page 2:

Flash
Ex.:
http://www.b.dk/billedeserier/skoejtekongerne
http://viborg-folkeblad.dk/foto/galleri-volume-and-dance-ix-2011-i-tinghallen

In netarchive: the Flash player keeps trying to load (ongoing); only thumbnails are visible

Page 3:

Sound (radio)
Streaming does not work
mp3 files: test harvesting with a new template: Default_order_with_xml-extraction_10levels (2 levels would be enough)
Ex.:
den2radio.dk -> http://feed.podcastmachine.com/podcasts/70/mp3.rss
Midifiles.dk -> http://www.midifiles.dk/articlelist.51

Solution: creation of a new template
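The idea behind XML extraction for podcasts can be sketched as follows: the mp3 files are referenced from the RSS feed, so a harvester has to read the feed and follow the links it finds there. This is only an illustration in Python, not the template's actual implementation; the sample feed, the host `media.example.org`, and the function name `enclosure_urls` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed with two episodes; real feeds carry more metadata.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://media.example.org/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://media.example.org/ep2.mp3" type="audio/mpeg" length="456"/>
    </item>
  </channel>
</rss>
"""


def enclosure_urls(feed_xml):
    """Return the media URLs listed in the <enclosure> elements of an RSS feed."""
    root = ET.fromstring(feed_xml)
    return [enc.get("url") for enc in root.iter("enclosure") if enc.get("url")]


mp3_urls = enclosure_urls(SAMPLE_FEED)
print(mp3_urls)
```

In this simplified picture the feed is one hop from the site and each mp3 file is one hop from the feed, which fits the remark that 2 levels would be enough.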

Page 4:

Video
Harvested, ex.:
Trier.gyldendal.dk -> http://lmp.lynxmedia.dk/Trier/intro_til_temaplayer/1181299957075374541/export_popup?serverinfo=1181299892540916715&skin=Trier/Trier&mode=clip#
Not harvested, ex.:
Folketinget (http://www.ft.dk/webtv/video/20111/salen/14.aspx?as=1) -> (live) streaming
http://kino.dk -> streaming
Jp.dk -> http://jp.dk/jptv/
More and more news sites display videos on their sites.

Page 5:

More than 50% of the problems we have are about video content.

Another big issue: login/password-protected content

Scenario 1: the website owner gives access to our harvesters' IP addresses -> that works fine (ex.: mediawatch.dk)

Scenario 2: the website owner provides a login and password -> a developer task *)

Last but not least: Facebook and Twitter... never-ending stories **)


Page 6:

*) Password content: 3 methods

with cookies (does not work any more)
html-login: no solution
http-login: addition to the template

Ex. http login: finanswatch.dk, template addition:

<newObject name="finanswatch_login_1" class="org.archive.crawler.datamodel.credential.HtmlFormCredential">
  <string name="credential-domain">finanswatch.dk/login</string>
  <string name="login-uri">https://secure.finanswatch.dk/mainLogin?hidden=true&amp;mode=</string>
  <string name="http-method">POST</string>
  <map name="form-items">
    <string name="j_username">[email protected]</string>
    <string name="j_password">netarkivet</string>
    <string name="_spring_security_remember_me"/>
    <string name="loginButton">Log ind</string>
    <string name="spring-security-redirect">https://secure.finanswatch.dk/mainLogin?hidden=true&amp;mode=</string>
  </map>
</newObject>
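To make the template addition easier to read, the request it describes can be sketched in Python: a POST to the login URI with the form items as a URL-encoded body. This is only an approximation of what a crawler would send, not Heritrix code. The function name `build_login_request` and the credentials below are placeholders, and the request is built but never sent.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Login URI taken from the template addition above.
LOGIN_URI = "https://secure.finanswatch.dk/mainLogin?hidden=true&mode="


def build_login_request(username, password):
    """Build (but do not send) a form-login POST mirroring the template's form-items."""
    form_items = {
        "j_username": username,
        "j_password": password,
        "_spring_security_remember_me": "",  # empty value, as in the template
        "loginButton": "Log ind",
        "spring-security-redirect": LOGIN_URI,
    }
    body = urlencode(form_items).encode("ascii")
    return Request(
        LOGIN_URI,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder credentials, purely for illustration.
request = build_login_request("user@example.org", "placeholder-password")
print(request.get_method(), request.full_url)
print(parse_qs(request.data.decode("ascii"), keep_blank_values=True))
```

Keeping the form fields in a single mapping mirrors the `form-items` map in the template, so a change to the site's login form would need to be made in one place only.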

Page 7:

**) Facebook: Special template, under ongoing revision (following Facebook changes)

Twitter: the Next button does not work in the archive. Solution: harvesting 6 times a day (front page). Twitter harvests have not worked at all since Twitter began putting #! into the URLs of Twitter profiles.
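A likely reason such URLs are hard on a harvester is that everything after the `#` is a fragment, which the client keeps to itself and never sends to the server. The sketch below illustrates this with the Python standard library, assuming profile URLs of the form https://twitter.com/#!/name; the profile name and the function `server_visible_part` are hypothetical.

```python
from urllib.parse import urlsplit


def server_visible_part(url):
    """Split a URL into the host, the request target sent to the server, and the fragment."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment stays in the browser; it is not part of the HTTP request.
    return parts.netloc, target, parts.fragment


host, target, fragment = server_visible_part("https://twitter.com/#!/example_profile")
print(host, target, fragment)
```

Under this assumption every profile URL leads to the same request for the front page, and the profile itself is loaded afterwards by JavaScript, which a crawler that does not execute scripts never sees.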