Aerial Imagery Update for Digimap for Schools

We have recently updated the aerial imagery in the Digimap for Schools service. This was a major event in our calendar, with a huge amount of data being refreshed: approximately 80,000 individual 1km tiles, all captured in 2015, covering roughly 30% of the country.

Prior to this update, just over 50% of the data was from 2013 or later; that figure has now increased to 77%.

This means that more up-to-date imagery is now available in Digimap for Schools for a significant part of the country. The map below shows the approximate distribution of the updated data: http://digimap.blogs.edina.ac.uk/files/2017/03/2015_aerial_update.png


Click on the map to view a larger version

This is the first update we have received from Getmapping, and we are expecting another update later this year containing data captured in 2016. We will introduce that data into the service as quickly as possible, ensuring that the most up-to-date imagery is always available to Digimap for Schools users.

We’ve included a couple of nice images we stumbled upon whilst exploring the new aerial imagery. The first is an image of a cruise liner in the Firth of Forth. It illustrates the quality of the imagery particularly well: you can even make out, and measure, the basketball court on deck.


Cruise Liner on the Firth of Forth

We also found another fantastic example, one which surprised the entire Digimap for Schools team: it has been built with such precision that it looks somewhat otherworldly…

Canworthy Solar Farm, which became operational in 2014 and covers approximately 55 hectares (roughly 67 football pitches)

NOTE: If you want to find it yourself, search for Canworthy in Digimap for Schools, then use the Buffer tool to measure 1 mile from the T-junction at Canworthy Water. Switch to the Aerial or AerialX layer and you’ll see it to the north-east of Canworthy Water, just beyond the buffer circle.

Please feel free to have a good dig around, as there are undoubtedly plenty of other hidden gems out there. Do let us know if you find anything of interest; we like to share these discoveries with our users.

 

SUNCAT updated

SUNCAT has been updated. Updates from the following libraries were loaded into the service over the past week. The dates displayed indicate when files were received by SUNCAT.

  • Bath University (01 Jun 17)
  • British Library (23 Jun 17)
  • CONSER (Not UK Holdings) (21 Jun 17)
  • Dundee University (01 Jun 17)
  • Edinburgh Napier University (01 Jun 17)
  • Exeter University (02 Jun 17)
  • Kingston University (01 Jun 17)
  • Lancaster University (01 Jun 17)
  • Leeds University (16 Jun 17)
  • Nottingham University (01 Jun 17)
  • Royal College of Music (16 Jun 17)
  • Royal College of Nursing (01 Jun 17)
  • School of Oriental and African Studies (SOAS) (12 Jun 17)
  • Southampton University (17 Jun 17)
  • York University (01 Jun 17)

To check the currency of other libraries on SUNCAT, please see the updates page for further details.


Updated User Guide

Darren Bailey from the Ordnance Survey recently contacted us with an updated Digimap for Schools User Guide (thank you Darren!). This User Guide is incredibly comprehensive, covering every element of the service, and does a fantastic job of instructing users on some of our newly released features, such as the Map Manager and Geograph functionality.

This User Guide is a fantastic resource, giving clear and simple instructions on how to use the full functionality of the service. We recommend that all users refer to it if they encounter any problems when using the service.

 

To download this resource follow this link: http://digimapforschools.edina.ac.uk/schools/Resources/allstages/userguide.pdf

 


 

 

Guardian Teacher Network Seminar: Technology in schools: money saver or money waster? – Belated Liveblog

Last Thursday I attended the Guardian Teacher Network Seminar: Technology in schools: money saver or money waster? at Kings Place, London. The panel was chaired by Kate Hodge (KH), head of content strategy at Jaywing Content and former editor of the Guardian Teacher Network, and featured:

  • John Galloway (JG), advisory teacher for ICT/special educational needs and inclusion, Tower Hamlets Council.
  • Donald Clark (DC), founder, PlanB Learning and investor in EdTech companies with experience of teaching maths and physics in FE in the UK and US.
  • Michael Mann (MM), senior programme manager, education team, Nesta Innovation Lab.
  • Naureen Khalid (NK), school governor and co-founder of @UkGovChat.

These are my live notes from the event – although these are a wee bit belated they are more or less unedited so comments, corrections, additions etc. are welcomed. 

The panel began with introductions, mainly giving an overview of their background. The two who said a wee bit more were:

John Galloway: I am a specialist on technologies for students with special needs and inclusion. I work half time at Tower Hamlets with students, but also do a lot of training – it’s the skills of adults that is often the challenge. The rest of my time I consult, I’m a freelance writer, and I am a judge of the BETT awards.

Michael Mann (MM), NESTA, our interest is that we don’t think EdTech has reached its potential yet… Our feeling is that we haven’t seen that impact yet. And since our report five years ago we’ve invested in companies and charities who focus on impact. Also do research with UCL, and work with teachers to trial things in real classrooms.

All comments below are credited to the speakers with their initials (see above), and audience comments and questions are marked as such… 

KH: What’s the next big thing in tech?

DC: It’s AI… It’s the new UI no matter what you use really… I only invest in AI now… Education is curiously immune from this at the moment but it won’t be… It is perfect for providing feedback and improving the eLearning experience – that crappy gamification or read-then-quiz experience… We are in a funny transitional phase…

MM: There has been an interesting trend recently where specialist kit is becoming mainstream… touch screens for instance, or speech to text… So, I think that is closing the gap between our minds and our machines… The gap is closing… The latest thing in special educational needs has been eye games – your eyes are the controller… That is moving into mainstream gaming so that will become bigger… So I see a bigger convergence there… And the other thing I see happening is VR. That will allow children to go places they can’t go – for all kids, but that has particular benefits and relevance for, say, a child in a wheelchair. For autistic children you put them in environments so they can understand size, lights, noise, and deal with the anxiety… before they visit…

KH: What are the challenges of implementing that in the classroom

JG: The tech – and costs, the space… But also the creativity… A lot of what’s created is not particularly engaging or educational. I’d like to see teachers able to make things themselves… And then we need to think about pedagogy… But that’s the big issue…

DC: I can give you an example in the context of teaching Newton’s Laws with kids… We downloaded a bunch of VR apps… And one NASA app there was great for understanding and really feeling Newton’s three laws… Couldn’t do that with a blackboard… And that’s all free…

KH: How accessible is that… ?

DC: Almost every kid has a smartphone… Google Cardboard is maybe £5… It’s very cheap… It won’t replace a teacher, at least not yet. I wouldn’t teach basic mathematics with VR, but I wouldn’t teach Newton’s three laws any other way…

MM: We are piloting a thing called RocketFund and one of the first people to use VR used it in history… After that ran we have about 10 projects because they’d seen what was possible…

DC: “Fieldtrips” can be free… I’ve also seen a brilliant project with a 360 degree camera in a classroom used in a teaching space – a £250 camera – and brilliant for showing issues with behaviour, managing the classroom etc.

NK: Now if something is free, I would have no objection at all!

KH: How do you measure impact?

NK: Well if someone has a really old PC and it runs slow… that’s a quick and clear impact. But it’s about how they will use it, what studies are there and are they reliable… Could you do this any other way? What’s different?

MM: A lot of these technologies do not have evidence on them… But you will have toolkits, ideas that are well grounded on peer instruction, or tutoring… If you can take pedagogical approaches and link it to a tool you are using, that’s great. There’s work on online tutoring, and there is a company which provides tutoring from India… And I want to know how they ensure that they follow established criteria…

DC: I think we’ve had a lot of device fetishism… We’ve seen huge amounts of tablets imposed… and abandoned… You have to regard tech as a medium – not a gadget or a school. I think we’ve had disastrous experiences with iPads in secondary schools… They work in primary schools, but actually writing on iPads doesn’t work well… It’s a disaster… And they are consumer devices, not enabling higher order writing, coding, creation skills… I recommend that you look at Audrey Mullen’s work – she was a school kid when she started a company called Kite Reviews… She said we don’t want tablets or mobiles, that laptops were better…

Comment: What about iPads in schools… I did a David Hockney project with Year 10 students that riffed off his use of iPads, and the students really engaged with it… I’ve also used it in a portrait project as well… And one of the things I’m interested in is how you use it for more than writing and literacy…

JG: I just want to come back to measuring impact… It depends what you want to use it for… Donald gave us an example of using an iPad for the wrong thing, and from the audience that example of using iPads in the right ways… No-one in industry would code on an iPad… We have to use technology appropriate to the context and the wider world.

KH: How would you know that?

JG: As a teacher you have to gain expertise and transfer that to your teaching…

KH: You might be an expert in history but not in ICT…

JG: As a teacher you have to understand the technology you are being given to use… You have to understand the pedagogy… And you have to prove to teachers that the technology will improve their practice… I’m not sure any teacher has ever taught the perfect lesson; you can always think of ways to improve… And that’s how you consider your work… One of the best innovations in teaching has been TeachMeets – informal exchanges of practice, experiences, etc. The reasons technology in classrooms is not as successful as it should be are complex…

NK: I know of someone who purchased an app, bought into it, sent people off to training… But it was the wrong app for what they were trying to do… So do the research first before you purchase anything…

DC: I think that the key word here is procurement… And teachers shouldn’t be doing that with hardware… You have to start with teaching needs, but actually general school software too – website, comms with parents, VLEs etc… It’s back end stuff… Take the art example… I know lots of artists… none using iPads… They use more sophisticated computers that enable the same stuff and more… It’s not David Hockney, that’s the tail wagging the dog… It’s general needs… Most kids have devices… I’d spend money on topping up for inclusion… And you have to do that cost benefit analysis first…

MM: Cost benefit analysis and expert approaches aren’t realistic in many schools… Often it’s more realistic to do small scale trialling… If it works, guide their peers; if not, then quit there… Practical experimentation, test and learn, is the way forward I would say…

JG: I think that the challenge is often the enthusiast… You need to give things to the cynic!

DC: There is a role for sensible professional advice. In Higher Ed we have Jisc, we are quite sensible… But we don’t have that advice available for schools… It all goes a bit odd… It’s all anecdotal rather than evidence based… Otherwise we are just pottering about… And we end up with the lowest common denominator in terms of skills and understanding…

JG: I’m getting a bit nostalgic for BECTA, and NESTA FutureLab… doing interesting stuff. A lot of research now is funded by companies engaged in the research…

MM: I agree… but there is no evidence for white boards, tablets, whatever as they don’t work on their own… Has to be evidence informed…

DC: Cost effectiveness is always about tech as an intervention in education… The evidence for schools is that writing accuracy goes down 31% and is a huge problem on tablets… Unless…

NK: There’s good evidence that typing notes in class doesn’t work

DC: Absolutely… Although there is plenty of evidence that lectures don’t work and we still do that… They have power devolved and in my view they are not really teachers… That happens every day…

Comment from audience: That doesn’t happen every day…

MM: We have to be careful about how we use the word evidence… Lectures may not be correlated with success but that may be to do with the quality of teaching staff, of lecturers…

KH: One of you talked about giving technology to the cynic… How do you overcome this…

JG: I think that the doubter, the cynic… will ask all the questions, find all the faults… But also see what works if it works…

KH: Often use of tech comes down to the enthusiasts and evangelists… But teachers lack space to be creative… How can we adopt technology if we lack that time and opportunity…

JG: We have so much more technology now, it has permeated our lives more… Our thinking, our discussion, potentially our classrooms… But I haven’t seen smartphones in schools much yet… We haven’t talked about bring your own device… There is an element of risk.. potential for videoing, for sharing bad practice, for bullying and harassment… But there is a lot of nervousness there…

DC: I think we have to move away from just thinking about technology in the classroom. I’m dead against it. Bring tech into a room in a one-to-many context… I’d rather use learner technology… Good teachers are teachers in the classroom… Kids really use tech at home, with homework… When you struggled when I was a kid you got stuck… but now you can use devices… to find the answer but also the method… And we have adaptive learning that can tailor to every kid. I think learner technology and away from the classroom is where it needs to be… Rather than the smart board debacle… Where one minister brought that in, Promethean made millions…

JG: I don’t recognise the classroom you are describing… I see teachers using technology, with big changes over the last twenty years… It is the appropriate use of technology in the appropriate places in learning… And thinking about the right technology for the job… If we took technology out of the classroom we’d just have lectures wouldn’t we?!

DC: The issue of collaboration is interesting… There is work from Stanford on group work and collaborative technology-driven things in the classroom… that most kids aren’t doing anything, but it looks collaborative… versus a good teacher doing the Socratic thing…

MM: I don’t think the in/outside the classroom thing is as important as the issue of what works, how things adapt, immediate feedback with FitTech… But it all comes back to pedagogy…

NK: It all comes back to what the problem is that you are trying to solve…

KH: What about the right way to do this… There’s the start-up like run fast, fail fast approach… Then the procurement approach…

NK: We want evidence based procurement… I don’t want to fund trials… Schools are poor…

KH: Start ups don’t throw it and see if it works… They use data to change their approach…And that’s what I’m talking about… Trialling then using evidence to inform decisions…

DC: The last thing I want to do is to waste time or money with start ups going into schools… I think taking risks in schools like that is very risky… I’m also not sure governors should be procuring… The senior team should… But often there is no digital strategy… It needs to be strategic, not tactical…

JG: Suppose we get the kids to assess the start up product… There is a great project called Apps For Good… It gets kids to engage in the idea, the design process, the entrepreneurial aspect… There is a role for start ups for teaching kids about how this happens… I think education is a risky business anyway… We think something good will happen, kids have to trust the teacher… I think risk can be quite a healthy thing, and managing risk… Introducing something new can be edgy and can be quite invigorating…

NK: As a governor I don’t want my school going into the red financially… We need to operate within our means…

KH: It wasn’t about start ups in the classrooms… Even a small spend… can be risky…

MM: Isn’t there a risk of a big roll out of something that doesn’t work for your school? Some risks will feel riskier than others… School culture and character all matter…

JG: We do have examples of technologies that didn’t work but now do… VLEs didn’t take off… Schools don’t use them… It was an expensive risk… But many use Google Classroom which is essentially the same thing… It’s free but needs maintenance…

DC: Actually with new start ups… you want evidence, you want research to prove the usefulness. 50% of start ups fail, and you don’t want to adopt stuff that will fail…

JG: But someone has to try things first, to try new things, to bring something new into the classroom.

KH: How do we take Ed Tech forward… ?

DC: At risk of repeating myself… Professional procurement, technology strategy, strategic leadership in this…

Comment from crowd: Where do you get the evidence if you don’t test it in the classroom…

DC: I am involved in a big adaptive learning company… We are doing research with Cambridge University…

Comment from crowd: so for the schools taking part, that is a risk!

DC: No, it’s all carefully set up, with control groups… Not just by recommendation by colleagues…

JG: Setting up trials in schools is incredibly difficult, especially with control groups… Even if you do that you have to look at who was teaching, who was unwell at the time, etc. It’s very, very hard to compare… And if it is showing improvement, then morally should you withhold that technology from some pupils? One of the trials I can think of was around use of iPads… Give them their own budget for apps… give them free choice… And then have them talk about that… It’s a trial but it’s very low cost, it’s very effective, it’s judging the fit of tech to the space…

NK: I’ve known schools go for the iPad whether or not it works… Why go for the most expensive tablets… to try them!

DC: In the US there was a $1.3bn deal with Apple in California… And iPads are not there now… They now use Chromebooks…

JG: But that was imposed from the top.. And that’s an important issue…

Comment: I want to take issue with something Donald was talking about… I am all in favour of evidence based research and everything… But it is hard to find time to find the research, and a lot of effort to actually read through it… 3 pages of methodology before the conclusion… By the time it’s published it’s out of date anyway… I write about evidence on my website and often no firm conclusions come out of this… Ultimately anecdotal evidence matters… Asking questions of what was this trying to solve, what worked, what didn’t… Question: does Donald agree with me?

DC: No!

Comment: We all know the digital age is coming, kids have to work with computers, how can schools prepare children for that work and keep traditional teaching too..?

MM: For me there are two aspects: digital skills like codeclubs, programming… The other side is that when we are in this world with automation, what sort of jobs will survive… We have a report at Nesta called Creativity vs Robots… Skills that are most robust are creative, collaborative, dexterous… Preparing kids for the future still requires factual knowledge but also collaborative and problem solving skills… It’s not that it doesn’t exist, we just really need to focus on that…

JG: Maybe controversially I will say that we don’t… We should teach flexibility and how to learn. A few years back I wrote for TimeEd… I visited Harrow – relatively unlimited funding… They don’t teach computing… They don’t get there until Year 9… Prep schools don’t teach it… Not “academic” enough for A-level or GCSE. They do some ICT skills… I guess they will get jobs, good ones… But they don’t prepare them for that… They prepare them to be leaders and the elite… I’m not necessarily sold on the idea that you have to prepare kids to be the makers… We teach reading and writing, but not digital literacy… Or how to read a film or a computer game, why failure is important… We don’t teach that… We might teach them how to create the game… So in part “don’t” and in part “expand the curriculum”…

Comment: For Mr Galloway… Why did you go to Harrow not Eton… They invest in innovation and you get to be amused at top hats and tails?

JG: Tube ride!

DC: It would be madness to ignore technology in schools… But coding is this year’s thing… ! Kids need skills when they leave school…

NK: I have great problems with the idea of 21st Century skills… We can’t train kids for jobs that don’t exist… Jobs from hundreds of years ago…

MM: There is a social justice aspect here… Mark Zuckerberg went to one of the top schools… If we don’t expose all children to technology opportunities they can miss out…

JG: In Harrow they don’t impose technology on teachers… but they get it if they ask for it. They also give kids Facebook accounts and teach them how to use them…

Comment: When we think about technology in schools, when do we think about teachers perspective… can we motivate and engage students with 21st century skills and possibilities…

NK: With all the money in the world, yes. We are in the position where schools can barely afford the teachers… We have to live within our means…

DC: Are teachers the right people to teach these skills… Is that what teachers are best suited to? … Not sure subject-orientated teachers are well placed for that.

JG: Teachers do teach collaboration. Social media is about relationships… It’s just a form of that… CPD for teachers is outside of school time and that means keen teachers engage there…

MM: Some teachers are into smartphones, some are not… Some teachers are into outdoor education and camping, others are not… You wouldn’t want to exclude kids from the experience of camping… That’s how you can think about the idea of digital literacy here… Finding the enthusiasm and route in…

Comment: A lot of what we, in this room, know of technology is through past exposure and experience of technology. Children are sponges… They can often teach the teachers, with scaffolding from the teachers, about this era of technology… The kids are often better and quicker at using the technology… We have to think about where this might lead them…

Comment: On procurement and evidence… Michael talked about small trials… Do the specific and unique contexts of schools not justify that type of small scale trialling…

MM: I think context is key in trials… Even outside of tech… Approaches like peer learning have great evidence… But the actual implementation can make a big difference… But you have to weigh up whether your context is as unique as you think…

DC: That can also be an excuse… Having been involved in procurement in tech… You don’t throw tech about… You think about what the context is, do serious homework before spending the money… You need the strategy and change management to roll things out and sustaining the effort… That’s almost invariably absent in the school context… Quite haphazard… “everyone’s unique… Let’s just play with this stuff”

Comment (from the director of a startup using augmented reality to encourage primary-aged girls into STEM subjects): In terms of costs and being a governor… Start ups are obsessed with evidence. One of the best things you can do is work with start ups; they really want that evidence… If you are worried about costs you can trial things… But it is a risk when you are teaching… You were also talking about jobs that don’t exist at the moment… That means new jobs in new fields… One thing that strikes me this evening is that no one has talked about science, technology, arts and maths… And teachers don’t come in from that route into schools… We’ve been talking to Jim Knight. In primary schools you don’t get labs, but you can use AR to do experiments… to look in this area… My point is you’ve been talking about technology, is it worth it… It would have been great to hear some positive experiences, or from an Ed Tech company… This feels like a lot of slamming down of technology…

JG: Can I talk about positive experiences… Technology is life changing and amazing… removing technology from classrooms would be horrendous… Your example of not having enough good qualified science teachers is an important one…

DC: I am not sure about AR and VR… I’d be careful with some of these things… HoloLens isn’t there yet… Leading edge tech is a bit of a honeytrap… I raise VR as it’s on every phone… and free…

Commenter: AR is on phones… !

KH: Thank you for a really lively discussion!

And with that the rather spirited discussions came to an end! Some interesting things to consider, but I felt like there was so much that wasn’t discussed properly because of the direction the conversation took – issues like access to wifi; measures to make technology safe, and what they mean for information literacy; technology beyond devices… So, I’d love to hear your comments below on Ed Tech in Schools.


SUNCAT updated

SUNCAT has been updated. Updates from the following libraries were loaded into the service over the past week. The dates displayed indicate when files were received by SUNCAT.

  • Aberystwyth University (01 Jun 17)
  • British Library (15 Jun 17)
  • City, University of London (24 May 17)
  • CONSER (Not UK Holdings) (14 Jun 17)
  • Courtauld Institute of Art (07 Jun 17)
  • King’s College London (01 Jun 17)
  • National Archives (01 Jun 17)
  • National Library of Scotland (01 Jun 17)
  • National Library of Wales (01 Jun 17)
  • Natural History Museum (01 Jun 17)
  • Northumbria University (01 Jun 17)
  • St. Andrews University (09 Jun 17)
  • Sheffield University (01 Jun 17)
  • Southampton University (10 Jun 17)
  • Sussex University (01 Jun 17)
  • Swansea University (01 Jun 17)

To check the currency of other libraries on SUNCAT, please see the updates page for further details.


IIPC WAC / RESAW Conference 2017 – Day Three Liveblog

It’s the final day of the IIPC/RESAW conference in London. See my day one and day two posts for more information on this. I’m back in the main track today and, as usual, these are live notes so comments, additions, corrections, etc. are all welcome.

Collection development panel (Chair: Nicola Bingham)

James R. Jacobs, Pamela M. Graham & Kris Kasianovitz: What’s in your web archive? Subject specialist strategies for collection development

We’ve been archiving the web for many years but the need for web archiving really hit home for me in 2013 when NASA took down every one of their technical reports – for review on various grounds. And the web archiving community was very concerned. Michael Nelson said in a post “NASA information is too important to be left on nasa.gov computers”. And I wrote about when we rely on pointing not archiving.

So, as we planned for this panel we looked back on previous IIPC events and we didn’t see a lot about collection curation. We posed three topics, all around these areas. For each theme we’ll watch a brief screencast by Kris to introduce it…

  1. Collection development and roles

Kris (via video): I wanted to talk about my role as a subject specialist and how collection development fits into that. As a subject specialist that is a core part of the role, and I use various tools to develop the collection. I see web archiving as absolutely being part of this. Our collection is books, journals, audio visual content, quantitative and qualitative data sets… Web archives are just another piece of the pie. And when we develop our collection we are looking at what is needed now, but in anticipation of what will be needed 10 or 20 years in the future, building a solid historical record that will persist in collections. And we think about how our archives fit into the bigger context of other archives around the country and around the world.

For the two web archives I work on – CA.gov and the Bay Area Governments archives – I am the primary person engaged in planning, collecting, describing and making available that content. And when you look at the web capture life cycle you need to ensure the subject specialist is included and their role understood and valued.

The CA.gov archive involves a group from several organisations including the government library. We have been archiving since 2007 in the California Digital Library initially. We moved into Archive-It in 2013.

The Bay Area Governments archives includes materials on 9 counties, but primarily and comprehensively focused on two key counties here. We bring in regional governments and special districts where policy making for these areas occur.

Archiving these collections has been incredibly useful for understanding government, their processes, how to work with government agencies and the dissemination of this work. But as the sole responsible person that is not ideal. We have had really good technical support from Internet Archive around scoping rules, problems with crawls, thinking about writing regular expressions, how to understand and manage what we see from crawls. We’ve also benefitted from working with our colleague Nicholas Taylor here at Stanford who wrote a great QA report which has helped us.
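The scoping rules and regular expressions mentioned above can be sketched as a simple accept/reject filter on discovered URLs. This is a minimal, hypothetical illustration of the technique – the seeds and patterns below are invented for the example, not the actual CA.gov archive configuration:

```python
import re

# Hypothetical seed list: sub-domains are treated as their own seeds,
# as described for the regional transport sites.
SEEDS = [
    "http://www.ca.gov/",
    "http://www.dot.ca.gov/",
]

# Accept anything under ca.gov; reject common crawler traps such as
# calendar pages and deep pagination query strings (illustrative only).
ACCEPT = re.compile(r"^https?://([a-z0-9-]+\.)*ca\.gov/")
REJECT = re.compile(r"(/calendar/|[?&]page=\d{3,})")

def in_scope(url: str) -> bool:
    """Return True if a discovered URL should be queued for crawling."""
    return bool(ACCEPT.match(url)) and not REJECT.search(url)

print(in_scope("http://www.dot.ca.gov/hq/research/report.pdf"))  # True
print(in_scope("http://www.ca.gov/calendar/2017/06/"))           # False
print(in_scope("http://example.com/ca.gov/"))                    # False
```

In practice services like Archive-It express these rules through their own scoping configuration rather than hand-written Python, but the accept/reject logic is the same.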

We are heavily reliant on crawlers, on tools and technologies created by you and others, to gather information for our archive. And since most subject selectors have pretty big portfolios of work – outreach, instruction, as well as collection development – good ties to developers, and to the wider community with whom we can share ideas and questions, are really vital.

Pamela: I’m going to talk about two Columbia archives, the Human Rights Web Archive (HRWA) and Historic Preservation and Urban Planning. I’d like to echo Kris’ comments about the importance of subject specialists. The Historic Preservation and Urban Planning archive is led by our architecture subject specialist and we’d reached a point where we had to collect web materials to continue that archive – and she’s done a great job of bringing that together. Human Rights seems to have long been networked – using the idea of the “internet” long before the web and hypertext. We work closely with Alex Thurman, and have an additional specially supported web curator, but there are many more ways to collaborate and work together.

James: I will also reflect on my experience. The FDLP – Federal Depository Library Program – involves libraries receiving absolutely every government publication in order to ensure a comprehensive archive. There is a wider programme allowing selective collection. At Stanford we are 85% selective – we only weed out content (after five years) very lightly, usually flyers etc. As a librarian I curate content. As an FDLP library we have to think of our collection as part of the wider set of archives, and I like that.

As archivists we also have to understand provenance… How do we do that with the web archive. And at this point I have to shout out to Jefferson Bailey and colleagues for the “End of Term” collection – archiving all gov sites at the end of government terms. This year has been the most expansive, and the most collaborative – including FTP and social media. And, due to the Trump administration’s hostility to science and technology we’ve had huge support – proposals of seed sites, data capture events etc.

2. Collection Development approaches to web archives, perspectives from subject specialists

As subject specialists we all have to engage in collection development – there are no vendors in this space…

Kris: Looking again at the two government archives I work on, there are Depository Program statuses to act as a starting point… But these haven’t been updated for the web. However, this is really a continuation of the print collection programme. And web archiving actually lets us collect more – we are no longer reliant on agencies putting content into the Depository Program.

So, for CA.gov we really treat this as a domain collection. And no one is really doing this except some UCs, myself, and the state library and archives – not the other depository libraries. However, we don’t collect think tanks, or the not-for-profit players that influence policy – that keeps the scope clear, although that content provides important context.

We also had to think about granularity… For instance, for California transport there is a top-level domain and subdomains for each regional transport group, so we treat all of these as seeds.

Scoping rules matter a great deal, partly because our resources are not unlimited. We have been fortunate with the CA.gov archive that we have about 3 TB of space for this year, and have been able to utilise it all… We may not need all of that going forwards, but it has been useful to have that much space.

Pamela: Much of what Kris has said reflects our experience at Columbia. Our web archiving strengths mirror many of our other collection strengths, and indeed I think web archiving is an important bridge from print to fully digital. I spent some time talking with our librarian (Chris) recently; she will add sites as they come up in discussion, and she monitors the news for sites that could be seeds for our collection… She is very integrated in her approach to this work.

For the human rights work one of the challenges is the time that we have to contribute. And this is a truly interdisciplinary area with unclear boundaries, and those are both challenging aspects. We do look at subject guides and other practice to improve and develop our collections. And each fall we sponsor about two dozen human rights scholars to visit and engage, and that feeds into what we collect… The other thing that I hope to do in the future is more assessment, looking at authoritative lists in order to compare with other places… Colleagues look at a site called Idealist, which lists opportunities and funding in these types of spaces. We also try to capture sites that look more vulnerable – small activist groups – although it is not clear if they actually are that at risk.

Cost-wise, the expensive parts of collecting are the human effort to catalogue, and the permissions process. And there was yesterday’s discussion of a possible need for ethics groups as part of the permissions process.

In the web archiving space we have to be clearer on scope and boundaries as there is such a big, almost limitless, set of materials to pick from. But otherwise plenty of parallels.

James: For me the material we collect is in the public domain, so permissions are not part of my challenge here. But there are other aspects of my work, including LOCKSS. In the case of the Fugitive US Agencies Collection we take entire sites (e.g. CBO, GAO, EPA) plus sites at risk (e.g. Census, Current Industrial Reports). These “fugitive” agencies have publications that should be in the depository programme but are not. Those lost documents that fail to make it out are what this collection is about. When a library notes a lost document I will share that on the Lost Docs Project blog, and then I am also able to collect it and seed the cloud and web archive – using the WordPress Amber plugin for links. For instance, the CBO report on the health bill, aka Trump Care, was missing… In fact many CBO publications were missing, so I have added it as a seed for our Archive-It collection.

3. Discovery and use of web archives

Discovery and use of web archives is becoming increasingly important as we look for needles in ever larger haystacks. So, firstly, over to Kris:

Kris: One way we get archives out there is in our catalogue, and into WorldCat. That’s one place to help other libraries know what we are collecting, and how to find and understand it… I would be interested to do some work with users around what they want to find and how… I suspect it will be about a specific request – e.g. a city council in one place over a ten-year period… But they won’t be looking for a web archive per se… We have to think about that, and what kind of intermediaries are needed to make that work… Can we also provide better seed lists and documentation for this? In the social sciences we have the codebook, and I think we need to share the equivalent information for web archives, to expose documentation on how the archive was built… And link to seeds and other parts of collections.

One other thing we have to think about is documenting the process and ingest mechanism. We are trying to do this for CA.gov, to better describe what we do… But maybe there is a standard way to produce that sort of documentation – like the codebook…
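
A machine-readable “codebook”-style entry for one seed might look something like the sketch below. The field names here are purely illustrative – there is no standard being quoted – but they show the kind of documentation being described: scope, frequency, and rationale alongside the seed URL.

```python
# Sketch: a "codebook"-style record documenting how one seed was collected.
# All field names are illustrative, not a standard.
import json

seed_record = {
    "seed": "https://www.ca.gov/",
    "collection": "CA.gov domain crawl",
    "scope": "host + subdomains",
    "frequency": "quarterly",
    "first_crawled": "2016-01-15",
    "rationale": "Top-level entry point for California state government",
}
print(json.dumps(seed_record, indent=2))
```

Publishing records like this alongside the archive would let researchers see how the collection was built without asking the curator.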

Pamela: Very quickly… At Columbia we catalogue individual sites. We also have a customised portal for the Human Rights Web Archive. That has facets for “search as research”, so you can search, develop and learn by working through facets – that’s often more useful than item searches… And, in terms of collecting for the web, we do have to think of what we collect as data for analysis, as part of larger data sets…

James: In the interests of time we have to wrap up, but there was one comment I wanted to make, which is that there are tools we use but also gaps that we see for subject specialists [see slide]… And Andrew’s comments about the catalogue struck home with me…

Q&A

Q1) Can you expand on that issue of the catalogue?

A1) Yes, I think we have to see web archives both as bulk data AND collections as collections. We have to be able to pull out the documents and reports – the traditional materials – and combine them with other material in the catalogue… So it is exciting to think about that, about the workflow… And about web archives working into the normal library work flows…

Q2) Pamela, you commented about a permissions framework as possibly vital for IRB considerations for web research… Is that from conversations with your IRB, or speculative?

A2) That came from Matt Weber’s comment yesterday on IRBs becoming more concerned about web archive-based research. We have been looking for faster processes… But I am always very aware of the ethical concern… People do wonder about ethics and permissions when they see the archive… It will be interesting to see how we can navigate these challenges going forward…

Q3) Do you use LCSH and are there any issues?

A3) Yes, we do use LCSH for some items and the collections… Luckily someone from our metadata team worked with me. He used Dublin Core, with LCSH within that. He hasn’t indicated issues. Government documents in the US (and at state level) typically use LCSH so no, no issues that I’m aware of.

 


IIPC WAC / RESAW Conference 2017 – Day Two (Technical Strand) Liveblog

I am again at the IIPC WAC / RESAW Conference 2017 and, for today, I am in the technical strand.

Tools for web archives analysis & record extraction (chair Nicholas Taylor)

Digging documents out of the archived web – Andrew Jackson

This is the technical counterpoint to the presentation I gave yesterday… So I talked yesterday about the physical workflow of catalogue items… We found that the Digital ePrints team had started processing eprints the same way…

  • staff looked in an outlook calendar for reminders
  • looked for new updates since last check
  • download each to local folder and open
  • check catalogue to avoid re-submitting
  • upload to internal submission portal
  • add essential metadata
  • submit for ingest
  • clean up local files
  • update stats sheet
  • Then ingest, usually automated (but can require intervention)
  • Updates catalogue once complete
  • New catalogue records processed or enhanced as necessary.

It was very manual, and very inefficient… So we have created a harvester:

  • Setup: specify “watched targets” then…
  • Harvest (the harvester crawls targets as usual) –> Ingested… but also…
  • Document extraction:
    • spot documents in the crawl
    • find landing page
    • extract machine-readable metadata
    • submit to W3ACT (curation tool) for review
  • Acquisition:
    • check document harvester for new publications
    • edit essential metadata
    • submit to catalogue
  • Cataloguing
    • cataloguing records processed as necessary
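
The document-extraction step above might be sketched as follows. This is a minimal illustration only – the function names and landing-page heuristic are hypothetical, not the actual harvester or W3ACT code:

```python
# Sketch: spot likely "documents" in a list of crawled URLs and guess
# their landing pages. Hypothetical logic, not the real harvester.
from urllib.parse import urlparse

DOC_EXTENSIONS = (".pdf",)  # scope deliberately kept tight, as in the talk

def spot_documents(crawled_urls):
    """Return (document_url, guessed_landing_page) pairs."""
    pairs = []
    for url in crawled_urls:
        path = urlparse(url).path.lower()
        if path.endswith(DOC_EXTENSIONS):
            # Crude landing-page guess: the directory containing the asset.
            landing = url.rsplit("/", 1)[0] + "/"
            pairs.append((url, landing))
    return pairs

crawl = [
    "https://www.gov.uk/government/publications/report",
    "https://assets.example.gov.uk/media/123/report.pdf",
]
print(spot_documents(crawl))
```

In the real workflow the guessed landing page and extracted metadata then go to W3ACT for human review rather than straight to the catalogue.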

This is better, but there are challenges. Firstly, what is a “publication”? With the eprints team there was a one-to-one print/digital relationship. But now there is no more one-to-one. For example, gov.uk publications… An original report will have an ISBN… But the landing page is a representation of the publication, and that’s where the assets are… When stuff is catalogued, what can frustrate technical folk is that you take the date and text from the page – honouring what is there rather than normalising it… We can dishonour intent by the way we capture the pages… It is challenging…

MARC is initially alarming… For a developer used to current data formats, it’s quite weird to get used to. But really it is just encoding… There is how we say we use MARC, how we do use MARC, and where we want to be now…

One of the intentions of the metadata extraction work was to provide an initial guess at the catalogue data – hoping to save cataloguers and curators time. But you probably won’t be surprised that authors’ names etc. in the document metadata are rarely correct. We use the worst extractor first, and layer up so we have the best shot. What works best is extracting from the HTML. Gov.uk is a big and consistent publishing space, so it’s worth us working on extracting that.
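
That “layer up” approach can be sketched like this – the extractor functions and field names are hypothetical, but the merge order (least trusted first, so better sources overwrite weaker guesses) is the point:

```python
# Sketch of layering metadata guesses: apply extractors from least to most
# trusted, letting later (better) sources overwrite earlier ones.
def extract_from_pdf(doc):
    # Embedded PDF metadata is often wrong, so it goes first.
    return {"title": doc.get("pdf_title"), "author": doc.get("pdf_author")}

def extract_from_html(doc):
    # The landing page's HTML is usually more reliable.
    return {"title": doc.get("html_title")}

def layered_extract(doc, extractors):
    merged = {}
    for extract in extractors:  # ordered worst -> best
        merged.update({k: v for k, v in extract(doc).items() if v})
    return merged

doc = {"pdf_title": "untitled1.docx", "pdf_author": "user42",
       "html_title": "Annual Report 2016"}
print(layered_extract(doc, [extract_from_pdf, extract_from_html]))
```

Here the unreliable embedded PDF title is overwritten by the HTML title, while the author guess survives because no better source supplied one.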

What works even better is the gov.uk API data – it’s in JSON, it’s easy to parse, it’s worth coding as it is a bigger publisher for us.
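
Pulling catalogue-record guesses out of gov.uk-style JSON might look like the sketch below. The payload here is an illustrative stand-in, not the content API’s exact schema:

```python
# Sketch: mapping gov.uk content-API-style JSON onto a catalogue-record
# guess. The payload shape is illustrative, not a real API response.
import json

payload = json.loads("""{
  "title": "Example statistical report",
  "description": "An illustrative publication record.",
  "first_published_at": "2017-03-01T09:30:00Z"
}""")

record = {
    "title": payload.get("title"),
    "abstract": payload.get("description"),
    "published": payload.get("first_published_at", "")[:10],
}
print(record)
```

Because the fields arrive already structured, there is no scraping or heuristic extraction to go wrong – which is why a publisher API beats even good HTML extraction.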

But now we have to resolve references… Multiple use cases for “records about this record”:

  • publisher metadata
  • third party data sources (e.g. Wikipedia)
  • Our own annotations and catalogues
  • Revisit records

We can’t ignore the revisit records… Have to do a great big join at some point… To get best possible quality data for every single thing….

And this is where the layers of transformation come in… Lots of opportunities to try again and build up… But… when I retry document extraction I can accidentally run up another chain each time… If we do our Solr searches correctly it should be easy, so we will be correcting this…

We do need to do more future experimentation… Multiple workflows bring synchronisation problems. We need to ensure documents are accessible when discoverable. And we need to be able to re-run automated extraction.

We want to iteratively improve automated metadata extraction:

  • improve HTML data extraction rules, e.g. Zotero translators (and I think LOCKSS are working on this).
  • Bring together different sources
  • Smarter extractors – Stanford NER, GROBID (built for sophisticated extraction from ejournals)

And we still have that tension over what a publication is… A tension between established practice and publisher output. We need to trial different approaches with catalogues and users… and close that whole loop.

Q&A

Q1) Is the PDF you extract going into another repository… You probably have a different preservation goal for those PDFs and the archive…

A1) Currently the same copy for archive and access. Format migration probably will be an issue in the future.

Q2) This is quite similar to issues we’ve faced in LOCKSS… I’ve written a paper with Herbert Van de Sompel and Michael Nelson about this thing of describing a document…

A2) That’s great. I’ve been working with the Government Digital Service and they are keen to do this consistently….

Q2) Geoffrey Bilder also working on this…

A2) And that’s the ideal… To improve the standards more broadly…

Q3) Are these all PDF files?

A3) At the moment, yes. We deliberately kept scope tight… We don’t get a lot of ePub or open formats… We’ll need to… Now publishers are moving to HTML – which is good for the archive – but that’s more complex in other ways…

Q4) What does the user see at the end of this… Is it a PDF?

A4) This work ends up in our search service, and that metadata helps them find what they are looking for…

Q4) Do they know its from the website, or don’t they care?

A4) Officially, the way the library thinks about monographs and serials, would be that the user doesn’t care… But I’d like to speak to more users… The library does a lot of downstream processing here too..

Q4) For me as an archivist, all that data on where the document is from, what issues there were in accessing it, etc. would be extremely useful…

Q5) You spoke yesterday about engaging with machine learning… Can you say more?

A5) This is where I’d like to do more user work. The library is keen on subject headings – that’s a big high-level challenge, so it’s quite amenable to machine learning. We have a massive golden data set… There’s at least a master’s thesis in there, right! And if we built something, then ran it over the 3 million-ish items with little metadata, it could be incredibly useful. In my opinion this is what big organisations will need to do more and more of… making best use of human time to tailor and tune machine learning to do much of the work…

Comment) That thing of everything ending up as a PDF is on the way out, by the way… You should look at Distill.pub – a new journal from Google and Y Combinator people – and that’s the future of these sorts of formats: it’s JavaScript and GitHub. Can you collect it? Yes, you can. You can visit the page, switch off the network, and it still works… And it’s there and will update…

A6) As things are more dynamic the re-collecting issue gets more and more important. That’s hard for the organisation to adjust to.

Nick Ruest & Ian Milligan: Learning to WALK (Web Archives for Longitudinal Knowledge): building a national web archiving collaborative platform

Ian: Before I start, thank you to my wider colleagues and funders as this is a collaborative project.

So, we have fantastic web archival collections in Canada… They collect political parties, activist groups, major events, etc. But, whilst these are amazing collections, they aren’t accessed or used much. I think this is mainly down to two issues: people don’t know they are there; and the access mechanisms don’t fit well with their practices. Maybe when the Archive-It API is live that will fix it all… Right now, though, it’s hard to find the right thing, and the Canadian archive is quite siloed. There are about 25 organisations collecting, most using the Archive-It service. But if you are a researcher… to use web archives you really have to be interested and engaged; you need to be an expert.

So, building this portal is about making this easier to use… We want web archives to be used on page 150 of some random book. And that’s what the WALK project is trying to do. Our goal is to break down the silos, to take down walls between collections and between institutions. We are starting out slow… We signed Memoranda of Understanding with Toronto, Alberta, Victoria, Winnipeg, Dalhousie, and Simon Fraser University – that represents about half of the archives in Canada.

We work on workflow… We run workshops… We separated the collections so that postdocs can look at them.

We are using Warcbase (warcbase.org) and command line tools. We transferred data from the Internet Archive and generate checksums; we generate scholarly derivatives – plain text, hypertext graph, etc. In the front end you enter basic information, describe the collection, and make sure that the user can engage directly themselves… And those visualisations are really useful… Looking at visualisations of the Canadian political parties and political interest group web crawls, which track changes – although that may include crawler issues.
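
The checksum step for the transferred WARCs can be sketched like this – `hashlib` is Python’s standard library, but the file and workflow here are illustrative stand-ins, not the project’s actual scripts:

```python
# Sketch: checksumming a transferred WARC by streaming it in chunks,
# so multi-GB files never have to fit in memory. Illustrative only.
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Tiny demo with a stand-in "WARC" file:
with tempfile.NamedTemporaryFile(delete=False, suffix=".warc") as tmp:
    tmp.write(b"WARC/1.0\r\n")
print(sha256_of(tmp.name))
os.remove(tmp.name)
```

Recording a digest like this at transfer time is what lets you later prove the copy matches what the Internet Archive sent.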

Then, with all that generated, we create landing pages, including tagging, data information, visualizations, etc.

Nick: So, on a technical level… I’ve spent the last ten years in open source digital repository communities… This community is small and tight-knit, and I like how we build, share and develop on each other’s work. Last year we presented webarchives.ca. We’ve indexed 10 TB of WARCs since then, representing 200+ million Solr docs. We have grown from one collection and have needed additional facets: institution; collection name; collection ID, etc.

Then we have also dealt with scaling issues… from a 30–40 GB index to a 1 TB index. You probably think that’s kinda cute… But we do have more scaling to do… So we are learning from others in the community about how to manage this… We have Solr running on an OpenStack… Right now it isn’t at production scale, but it is getting there. We are looking at SolrCloud and potentially using a shard per collection.
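
A faceted query of the kind described (faceting on institution, collection name, and collection ID) might be built like this. The endpoint, core name, and field names below are assumptions for illustration, not the project’s actual schema:

```python
# Sketch: building a faceted Solr select query over a web-archive index.
# Endpoint, core name, and field names are assumed, not real.
from urllib.parse import urlencode

params = urlencode([
    ("q", "pipeline"),              # full-text query term
    ("rows", "10"),
    ("facet", "true"),
    ("facet.field", "institution"),     # repeated facet.field params
    ("facet.field", "collection_name"),
    ("facet.field", "collection_id"),
])
url = "http://localhost:8983/solr/walk/select?" + params
print(url)
```

Repeating `facet.field` is how Solr takes multiple facets in one request, which is what lets a single index serve many institutions’ collections side by side.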

Last year we had a Solr index using the Shine front end… It’s great but… it doesn’t have an active open source community… We love the UK Web Archive but… Meanwhile there is Blacklight, which is in wide use in libraries. There is a bigger community, better APIs, bug fixes, etc… So we have set up a prototype called Warclight. It does almost all that Shine does, except the tree structure and the advanced searching…

Ian spoke about derivative datasets… For each collection, via Blacklight or Scholars Portal, we want domain/URL counts; full text; graphs. Rather than them having to do the work, they can just engage with particular datasets or collections.

So, that goal Ian talked about: one central hub for archived data and derivatives…

Q&A

Q1) Do you plan to make graphs interactive, by using Kibana rather than Gephi?

A1 – Ian) We tried some stuff out… One colleague tried R in the browser… That was great but didn’t look great in the browser. But it would be great if the casual user could look at drag and drop R type visualisations. We haven’t quite found the best option for interactive network diagrams in the browser…

A1 – Nick) Generally the data is so big it will bring down the browser. I’ve started looking at Kibana for stuff, so in due course we may bring that in…

Q2) Interesting as we are doing similar things at the BnF. We did use Shine, looked at Blacklight, but built our own thing…. But we are looking at what we can do… We are interested in that web archive discovery collections approaches, useful in other contexts too…

A2 – Nick) I kinda did this the ugly way… There is a more elegant way to do it but haven’t done that yet..

Q2) We tried to give people WARC files… Our actual users didn’t want that, they want full text…

A2 – Ian) My students are quite biased… Right now if you search it will flake out… But by fall it should be available, I suspect that full text will be of most interest… Sociologists etc. think that network diagram view will be interesting but it’s hard to know what will happen when you give them that. People are quickly put off by raw data without visualisation though so we think it will be useful…

Q3) Do you think this will scale in a few years’ time…?

A3) Right now that doesn’t scale… We want this more cloud-based – that’s our next three years and next wave of funded work… We do have capacity to write new scripts right now as needed, but when we scale that will be harder…

Q4) What are some of the organisational, admin and social challenges of building this?

A4 – Nick) Going out and connecting with the archives is a big part of this… Having time to do this can be challenging…. “is an institution going to devote a person to this?”

A4 – Ian) This is about making this more accessible… People are more used to Blacklight than Shine. People respond poorly to WARCs. But they can deal with PDFs and CSVs – those are familiar formats…

A4 – Nick) And when I get back I’m going to be doing some work and sharing, to enable an actual community to work on this…

 


Digital Conversations @BL: Web Archives: truth, lies and politics in the 21st century (part of IIPC/RESAW 2017)

Following on from Day One of IIPC/RESAW I’m at the British Library for a connected Web Archiving Week 2017 event: Digital Conversations @BL, Web Archives: truth, lies and politics in the 21st century. This is a panel session chaired by Elaine Glaser (EG) with Jane Winters (JW), Valerie Schafer (VS), Jefferson Bailey (JB) and Andrew Jackson (AJ). 

As usual, this is a liveblog so corrections, additions, etc. are welcomed. 

EG: Really excited to be chairing this session. I’ll let everyone speak for a few minutes, then ask some questions, then open it out…

JB: I thought I’d talk a bit about our archiving strategy at the Internet Archive. We don’t archive the whole of the internet, but we aim to collect a lot of it. The approach is multi-pronged: taking entire web domains in a shallow but broad strategy; working with other libraries and archives to focus on particular subjects, areas or collections; and working with researchers who are mining or scraping the web but not necessarily having preservation strategies. So, when we talk about political archiving or web archiving, it’s about getting as much as possible, with different volumes and frequencies: we know we can’t collect everything, so we collect important things frequently, less important things less frequently. And we work with national governments, with national libraries…

The other thing I wanted to raise is T.R. Schellenberg, who was an important archivist at the National Archives in the US. He had an idea about archival strategies: that there is a primary documentation strategy, and a secondary strategy. The primary is for a government and its agencies to do for their own use; the secondary is for future use in unknown ways… And it includes documentary and evidentiary material (the latter being how and why things are done). Those evidentiary elements become much more meaningful on the web, and that has emerged and become more meaningful in the context of our current political environment.

AJ: My role is to build a web archive for the United Kingdom. So I want to ask a question that comes out of this: “Can a web archive lie?” Even putting to one side that it isn’t possible to archive the whole web… There is confusion because we can’t get every version of everything we capture… Then there are biases in our work. We choose all UK sites, but some are captured more than others… And our team isn’t as diverse as it could be. What we collect is also constrained by technological capability. And we are limited by time issues… We don’t normally know when material was created… The crawler often finds things only when they become popular… So an academic paper is picked up after a BBC News item – they are out of order. We would like to use more structured data, such as Twitter, which has a clear publication date…

But can the archive lie? Well, digital material is much easier than print to make an untraceable change to. As digital is increasingly predominant we need to be aware that our archive could be hacked… So we have to protect against that, and evidence that we haven’t been hacked… And we have to build systems that are secure and can maintain that trust. Libraries will have to take care of each other.

JW: The Oxford Dictionaries word of the year in 2016 was “post-truth”, whilst the Australian dictionary went for “fake news”. Fake news for them is either disinformation on websites for political purposes, or for commercial benefit. Merriam-Webster went for “surreal” – their most searched-for word. It feels like we live in very strange times… There aren’t calls for resignation where there once were… Hasn’t it always been thus, though? For all the good citizens who point out the errors of a fake image circulated on Twitter, for many the truth never catches the lie. Fakes, lies and forgeries have helped change human history…

But modern fake news is different to what existed before. Firstly there is the speed of fake news… Mainstream media can only counteract or address this after the fact. Some newspapers and websites do public corrections, but that isn’t the norm. Once, publishing took time and means. Social media has made it much easier to self-publish. One can create, but one can also check accuracy and integrity – reverse image searching to see when a photo has been photoshopped, or actually shows an earlier, unrelated event…

And we have politicians making claims that they believe can be deleted and disappear from our memory… But we have web archives – on both sides of the Atlantic. The EU Referendum NHS pledge claim is archived and lasts long beyond the bus – which was bought by Greenpeace and repainted. The archives have also been capturing political parties’ websites throughout our endless election cycle… The DUP website crashed after the announcement of the election results because of demand… But the archive copy was available throughout. There was also a rumour that a hacker was creating an Irish-language version of the DUP website… But that wasn’t a new story, it was from 2011… And again the archive shows that, and archives of news websites show that.

Social Networks Responses to Terrorist Attacks in France – Valerie Schafer. 

Before 9/11 we had some digital archives of terrorist materials on the web. But this event challenged archivists and researchers. The Charlie Hebdo, Paris Bataclan and Nice attacks are archived… People can search at the BnF to explore these archives, to provide users a way to see what has been said. And at the INA you can also explore the archive, including Twitter archives. You can search, see keywords, explore timelines crossing key hashtags… And you can search for images… including the emojis used in discussion of Charlie Hebdo and the Bataclan.

We also have Archive-It collections for Charlie Hebdo. This raises some questions of what should and should not be collected… We did not normally collect newspaper and audiovisual sites, but decided to in this case as we faced a special event. But we still face challenges – it is easier to collect data from Twitter than from Facebook. And it is free to collect Twitter data in real time, but archived/older data is charged for, so you have to capture it in the moment. And there are limits on API collection… INA captured more than 12 million tweets for Charlie Hebdo, for instance; it is very complete but not exhaustive.

We continue to collect for #jesuischarlie and #bataclan… They are continually used and added to, in similar or related attacks, etc. There is a time for exploring and reflecting on this data, and space for critique too…

But we also see that content gets deleted… It is hard to find fake news on social media unless you are looking for it… Searching for #fakenews just won’t cut it… So we had a study on fake news… And we recommend that authorities are cautious about material they share. But there is also a need for cross-checking – the kinds of projects with Facebook and Twitter. Web archives are full of fake news, but also full of others’ attempts to correct and check fake news as well…

EG: I wanted to go back in time to the idea of the term “fake news”… In order to understand what “fake news” actually is, we have to understand how it differs from previous lies and mistruths… I’m from outside the web world… We are often looking at tactics to fight fire with fire, to use an unfortunate metaphor… How new is it? And who is to blame, and why?

JW: Talking about it as a web problem, or a social media issue, isn’t right. It’s about humans making decisions about whether or not to critique that content. But it is also about algorithmic sharing and the visibility of that information.

JB: I agree. What is new is the way media is produced, disseminated and consumed – those have technological underpinnings. And they have been disruptive of publication and interpretation in a web world.

EG: Shouldn’t we be talking about a culture, not just technology… It’s not just the “vessel”… Doesn’t the dissemination have more of a role than perhaps we are suggesting…

AJ: When you build a social network, or any digital space, you build in different affordances… So Facebook and Twitter are different. And you can create automated accounts – with Twitter especially offering an affordance for robots etc., which allows you to give the impression of a movement. There are ways to change those affordances, but there will also always be fake news and issues…

EG: There are degrees of agency in fake news.. from bots to deliberate posts…

JW: I think there is also the aspect of performing your popularity – creating content for likes and shares, regardless of whether what you share is true or not.

VS: I know terrorism is different… But for any tweet sharing fake news you get four retweets denying it… You have more tweets denying than sharing fake news…

AJ: One wonders about the filter bubble impact here… Facebook encourages inward-looking discussion… Social media has helped like-minded people find each other, and perhaps they can be clipped off more easily from the wider discussion…

VS: I think what is also interesting is the game between social media and traditional media… You have questions and relationships there…

EG: All the internet can do is reflect the crooked timber of reality… We know that people have confirmation bias: we are quite tolerant of untruths, and less tolerant of information that contradicts our perceptions, even if it is true. You have people and the net being equally tolerant of lies and mistruths… But isn’t there another factor here… The people demonised as gatekeepers… By putting in place structures of authority – which were journalism and academia… Their resources are reduced now… So what role do you see for those traditional gatekeepers…

VS: These gatekeepers are no longer the traditional gatekeepers that they were… They work in 24-hour news cycles and have to work to that. In France they are trying to rethink that role; there were a lot of questions about this… Whether that’s about how you react to changing events, and what happens during elections… People are thinking about that…

JB: There is an authority and responsibility for media still, but has the web changed that? Looking back it’s surprising now how few organisations controlled most of the media… But is that that different now?

EG: I still think you are being too easy on the internet… We’ve had investigative journalism by Carole Cadwalladr and others on Cambridge Analytica and others who deliberately manipulate reality… You talked about witness testimony in relation to terrorism… Isn’t there an immediacy and authenticity challenge there… Donald Trump’s tweets… They are transparent but not accountable… Haven’t we created a problem that we are now trying to fix?

AJ: Yes. But there are two things going on… It seems to me that people care less about lying… People see Trump lying and they don’t care, and media organisations don’t care as long as advertising money comes in… There is a parallel for that in social media – the flow of content and ads takes priority over truth. There is an economic driver common to both mediums that is warping that…

JW: There is an unpopularity aspect too… a (nameless) newspaper here that shares content to generate “I can’t believe this!” reactions, and then sharing generates advertising income… But on a positive note, there is scope and appetite for strong investigative journalism… and that is facilitated by the web and digital methods…

VS: Citizens do use different media and cross media… Colleagues are working on how TV is used… And different channels, to compare… Mainstream and social media are strongly crossed together…

EG: I did want to talk about the temporal element… Twitter exists in the moment, making it easy to hold people accountable… Do you see Twitter doing what newspapers did?

AJ: Yes… A substrate…

JB: It’s amazing how much of the web is archived… With “Save Page Now” we see all kinds of things archived – including the pages that exposed the Russian downing of a Ukrainian plane… Citizen action, spotting the need to capture data whilst it is still there, happens all the time…

EG: I am still sceptical about citizen journalism… It’s a small group of people from narrow demographics, and it’s time consuming… Perhaps there is still a need for journalist roles… We did talk about filter bubbles… We hear about newspapers and media as biased… But isn’t the issue that communities of misinformation are not penetrated by the other side, nor by the truth…

JW: I think bias in newspapers is quite interesting and different to unacknowledged bias… Most papers are explicit in their perspective… So you know what you will get…

AJ: I think so, but bias can be quite subtle… Different perspectives on a common issue allows comparison… But other stories only appear in one type of paper… That selection case is harder to compare…

EG: This really is a key point… There is a difference between facts and truth, and explicitly framed interpretation or commentary… Those things are different… That’s where I wonder about web archives… When I look at Wikipedia… It’s almost better to go to a source with an explicit bias where I can see a take on something, unlike Wikipedia which tries to focus on fact. Talking about politicians lying misses the point… It should be about a specific rhetorical position… That definition of truth comes up when we think of the role of the archive… How do you deal with that slightly differing definition of what truth is…

JB: I talked about different complementary collecting strategies… The archivist as a role has some political power in deciding what goes into the historical record… The volume of the web undercuts that power in a way that I think is good – archives have historically been about the rich and the powerful… So making archives non-exclusive somewhat addresses that… But there will be fake news in the archive…

JW: But that’s great! Archives aren’t about collecting truth. Things will be in there that are not true, partially true, or factual… It’s for researchers to sort that out later…

VS: Your comment on Wikipedia… They do try to be factual, neutral… But not truth… And to have a good balance of power… For us as researchers we can be surprised by the neutral point of view… Fortunately the web archive does capture a mixture of opinions…

EG: Yeah, so that captures what people believed at a point in time – true or not… So I would like to talk about the archive itself… Do you see your role as being successors to journalists… Or as being able to harvest the world’s record in a different way…

JB: I am an archivist with that training and background, as are a lot of people working on web archives and in related spaces. Certainly historic preservation drives a lot of collecting aspects… But so do engineering and technological aspects. So it’s people interested in archiving and preservation, but also technology… And software engineers interested in web archiving.

AJ: I’m a physicist but I’m now running web archives. And for us it’s an extension of the legal deposit role… Anything made public on the web should go into the legal deposit… That’s the theory, in practice there are questions of scope, and where we expend quality assurance energy. That’s the source of possible collection bias. And I want tools to support archivists… And also to prompt for challenging bias – if we can recognise that taking place.

JW: There are also questions of what you foreground in Special Collections. There are decisions being made about collections that will be archived and catalogued more deeply…

VS: At the BnF my colleagues work in an area with a tradition, with legal deposit responsibility… There are politics of heritage and what it should be. I think that is the case for many places where that activity sits with other archivists and librarians.

EG: You do have this huge responsibility to curate the record of human history… How do you match the top down requirements with the bottom up nature of the web as we now talk about it?

JW: One way is to have others come in to your department to curate particular collections…

JB: We do have special collections – people can choose their own, public suggestions, feeds from researchers, all sorts of projects to get the tools in place for building web archives for their own communities… I think for the sake of longevity and use going forward, the curated collections will probably have more value… Even if they seem more narrow now.

VS: It is also interesting that not all archives selected bottom-up curation. In Switzerland they went top down – there are a variety of approaches across Europe.

JW: We heard about the 1916 Easter Rising archive earlier, which was through public nominations… Which is really interesting…

AJ: And social media can help us – by seeing links and hashtags. When we looked at this 4-5 years ago everyone linked to the BBC, but now we have more fake news sites etc…

VS: We do have this question of what should be archived… We see capture of the vernacular web – kitten or unicorn gifs etc… !

EG: I have a dystopian scenario in my head… Could you see a time, years from now, when newspapers are dead, public broadcasters are more or less dead… And we have flotsam and jetsam… We have all this data out there… And all kinds of actors who use all this social media data… Can you reassure me?

AJ: No…

JW: I think academics are always ready to pick holes in things, I hope that that continues…

JB: I think more interesting is the idea that there may not be a web… Apps, walled gardens… Facebook is pretty hard to web archive – they make it intentionally more challenging than it should be. There are lots of communication tools that disappeared… So I worry more about loss of a web that allows the positive affordances of participation and engagement…

EG: There is the issue of privatising and sequestering the web… I am becoming increasingly aware of the importance of organisations like the BL and the Internet Archive… Those roles used to be taken on by publicly appointed organisations and bodies… How are they impacted by commercial privatisation… And how are those roles changing… How do you envisage that public sphere of collecting…

JW: For me more money for organisations like the British Library is important. Trust is crucial, and I trust that they will continue to do that in a trustworthy way. Commercial entities cannot be trusted to protect our cultural heritage…

AJ: A lot of people know what we do with physical material, but are surprised by our digital work. We have to advocate for ourselves. We are also constrained by the legal framework we operate within, and we have to challenge that over time…

JB: It’s super exciting to see libraries and archives recognised for their responsibility and trust… But that also puts them at higher risk from those they hold accountable – being recognised as bastions of accountability makes them more vulnerable.

VS: Recently we had the 20th birthday of the Internet Archive, and 10 years of French internet archiving… This is all so fast moving… People are more and more aware of web archiving… We will see new developments, ways to make things open… How to find and search and explore the archive more easily…

EG: The question then is how we access this data… The new masters of the universe will be those emerging gatekeepers who can explore the data… What is the role between them and the public’s ability to access data…

VS: It is not easy to explain everything around web archives but people will demand access…

JW: There are different levels of access… Most people will be able to access what they want. But there is also a great deal of expertise in organisations – it isn’t just commercial data work. And working with the Alan Turing Institute and cutting edge research helps here…

EG: One of the founders of the internet, Vint Cerf, says that “if you want to keep your treasured family pictures, print them out”. Are we overly optimistic about the permanence of the record?

AJ: We believe we have the skills and capabilities to maintain most if not all of it over time… There is an aspect of benign neglect… But if you are active about your digital archive you could have a copy on every continent… Digital allows you to protect content from different types of risk… I’m confident that the library can do this as part of its mission.

Q&A

Q1) Coming back to fake news and journalists… There is a changing role between the web as a communications medium, and web archiving… Web archives are about documenting this stuff for journalists and researchers as a source; they don’t build the discussion… They are not the journalism itself.

Q2) I wanted to come back to the idea of the Filter Bubble, in the sense that it mediates the experience of the web now… It is important to capture that in some way, but how do we archive that… And changes from one year to the next?

Q3) It’s kind of ironic to have nostalgia about journalism and traditional media as gatekeepers, in a country where Rupert Murdoch is traditionally that gatekeeper. Global funding for web archiving is tens of millions; the budget for the web is tens of billions… The challenges are getting harder – right now you can use robots.txt but we have DRM coming and that will make it illegal to archive the web – and the budgets have to increase to match that to keep archives doing their job.

AJ: To respond to Q3… Under the legislation it will not be illegal for us to archive that data… But it will make it more expensive and difficult to do, especially at scale. So your point stands, even with that. In terms of the Filter Bubble, they are out of our scope, but we know they are important… It would be good to partner with an organisation where the modern experience of media is explicitly part of its role.

JW: I think that idea of the data not being the only thing that matters is important. Ethnography is important for understanding that context around all that other stuff…  To help you with supplementary research. On the expense side, it is increasingly important to demonstrate the value of that archiving… Need to think in terms of financial return to digital and creative economies, which is why researchers have to engage with this.

VS: Regarding the first two questions… Archives reflect reality, so there will be lies there… Of course web archives must be cross-referenced and compared with other archives… And contextualisation matters – the digital environment in which the web was living… And with the terrorist archive we tried to document the process of how we selected content, and archive that too, for future researchers to have in mind and understand what is there and why…

JB: I was interested in the first question, this idea of what happens in preserving the conversation… That timeline was sometimes decades before but is now weeks or days or less… In terms of experience, websites are now personalised and our ability to capture that broadly is impossible. So we need to capture that experience, and the emergent personalisation… The web wasn’t public before, as ARPAnet, then it became public, but that seems to be ebbing a bit…

JW: With a longer term view… I wonder if the open stuff which is easier to archive may survive beyond the gated stuff that traditionally was more likely to survive.

Q4) Today we are 24 years into advertising on the web. We take ad-driven models as a given, and we see fake news as a consequence of that… So, my question is: Minitel was a large system that ran on a different model… Are there ways to change the revenue model that would change how fake or true news is shared…

Q5) Theresa May has been outspoken on fake news and wants a crackdown… The way I interpret that is censorship and banning of sites she does not like… Jefferson said that he’s been archiving sites that she won’t like… What will you do if she asks you to delete parts of your archive…

JB: In the US?!

Q6) Do you think we have sufficient web literacy amongst policy makers, researchers and citizens?


IIPC WAC / RESAW Conference 2017 – Day One Liveblog

From today until Friday I will be at the International Internet Preservation Consortium (IIPC) Web Archiving Conference 2017, which is being held jointly with the second RESAW: Research Infrastructure for the Study of Archived Web Materials Conference. I’ll be attending the main strand at the School of Advanced Study, University of London, today and Friday, and the technical strand (at the British Library) on Thursday.

I’m here wearing my “Reference Rot in Theses: A HiberActive Pilot” – aka “HiberActive” – hat. HiberActive is looking at how we can better enable PhD candidates to archive web materials they are using in their research, and citing in their thesis. I’m managing the project and working with developers, library and information services stakeholders, and a fab team of five postgraduate interns who are, whilst I’m here, out and about around the University of Edinburgh talking to PhD students to find out how they collect, manage and cite their web references, and what issues they may be having with “reference rot” – content that changes, decays, disappears, etc. We will have a webpage for the project and some further information to share soon but if you are interested in finding out more, leave me a comment below or email me: nicola.osborne@ed.ac.uk.

These notes are being taken live so, as usual for my liveblogs, I welcome corrections, additions, comment etc. (and, as usual, you’ll see the structure of the day appearing below with notes added at each session). 

Opening remarks: Jane Winters and Nicholas Taylor

Opening plenary: Leah Lievrouw – Web history and the landscape of communication/media research Chair: Nicholas Taylor


Welcome to the University of Wales Trinity Saint David!

SUNCAT is very pleased to announce that the University of Wales Trinity Saint David has become our newest Contributing Library. This is the sixth Welsh institution to join SUNCAT. We invited John Dalling, Head of Collections, to write a few words about the University’s Library and Learning Resources service and its collections.

______________________________________________

Carmarthen Campus Library. (© UWTSD)

The University of Wales Trinity Saint David (UWTSD) was formed in November 2010 through the merger of the University of Wales Lampeter and Trinity University College Carmarthen, under Lampeter’s Royal Charter of 1828. On 1 August 2013, Swansea Metropolitan University became part of UWTSD.  The University’s main campuses are situated in various locations in and around Swansea’s city centre, as well as in the rural towns of Lampeter and Carmarthen in South West Wales.

Interior of the Carmarthen Campus Library. (© UWTSD)

UWTSD Library and Learning Resources has seven campus libraries in Carmarthen, Lampeter, Swansea and London, which include a collection of over 500,000 printed volumes and provide access to approximately 20,000 electronic books and 50,000 electronic journals.  The University’s special collections are held in the Roderic Bowen Library and Archives at Lampeter, which include over 35,000 printed works, featuring several medieval and post-medieval manuscripts.

Library interior. (© UWTSD)

Our libraries support programmes of study and research covering a wide range of subject areas, with particular strengths in education, humanities, art and design, architecture, engineering, and business.  The University subscribes to approximately 500 individual journals and holds archives for many more titles including a number in the Welsh language and of local interest to South and West Wales.  In addition, we also provide access to many more journals electronically through subscription packages.

We are delighted to be able to contribute our holdings to SUNCAT and hope that widening access to our periodical collections will prove beneficial to researchers throughout Wales and the whole of the UK.

_______________________________________________

SUNCAT would like to thank John for introducing UWTSD Library and Learning Resources and its journal collection. If you would like to write a post on your SUNCAT Contributing Library and its serials collections or would like to join SUNCAT please contact us at suncat@ed.ac.uk.