Welcome, everybody, to the Help! I’m an Accidental Government Information
Librarian webinar series, which is brought to you by the North Carolina Library
Association’s Government Resources Section. My name is Lynda Kellam, and I
organize these webinars for our group. We have four presenters today: Shari
Laster, who is the government information librarian and data services librarian
at the University of California, Santa Barbara; James R. Jacobs, who is the US
government information librarian at Stanford University Libraries; James A.
Jacobs, who is a data services librarian emeritus from the University of
California, San Diego; and finally Laurie Allen, who is the assistant director
for digital scholarship at Penn Libraries. Thank you very much, everyone.

All right, well, thank you all so much for joining us, and thanks to Lynda
for hosting us. Today’s webinar is, I think, a little bit different from
what has been hosted in the past. It is a little bit of an experiment for us to
get to talk about different projects that are happening, and to do so in a way
that gives you some context about these projects and avenues to participate. I
want to begin with a little round of introductions, so Laurie, do you want to go
ahead?

This is Laurie Allen. I work with the Penn Libraries and have become involved in
Data Refuge here.

Hi, I’m James A. Jacobs. You can call me Jim, so it’ll be completely clear that
I’m a different person from James R. Jacobs.

This is James R. Jacobs. You can call me James, and I am indeed different from
Jim.

Now that we’ve established that, my name is Shari Laster. We’re calling our
conversation today a conversation with
the future. The reason that I like to think about collections this way is that
when we’re collecting materials, we’re doing so on behalf of both the people we
work with every day and the people who are not working with us yet but someday
will be relying on these collections. The collections tell our users a lot about
what our priorities are, what we care about, what we may be overlooking, or what
we may be making up for from the past. At the same time, it’s a two-way
conversation, because we’re also trying to anticipate what it is that people
will want to access and use and interact with in the future.

With that in mind, we came up with five questions, and we’re going to go through
these and share our perspectives as well as current work that’s happening. Our
questions are: Why save government data, and why do so right now? What is it
that we are talking about when we’re talking about data? What’s happened over
the last six months, and what’s happening right as we’re speaking? What’s the
sustainability, the future outlook, for these kinds of projects? And finally,
what can everybody do to have a chance to participate in this work and be
involved?

Beginning with our question, why rescue data, I’m going to turn it over to Jim
and then James to talk about this context.

Thanks, Shari, and thank you all for joining us this morning. The first
thing to understand about what we’re doing is that digital government
information is not at risk because of politics alone. It’s at risk because
there’s no plan, no policy, no budget for preserving government information.
With few exceptions, there’s no law or regulation that requires government
agencies to preserve their own information or to provide free access to it.
Digital information is fragile. Preservation of that information requires
conscious and planned action. Such action requires people and organizations
that have preservation as a mission and as a priority. Government agencies are,
with a few exceptions, tasked with collecting and creating information, but not
with the preservation of that information. What that means is that almost all
digital government information is at risk of being lost.

Such losses can be intentional or unintentional. They can be driven directly by
political policy or bureaucratic procedures; they can be driven by budgetary
priorities; they can be caused by technological decisions or even just
inattention or inaction. In short, the need for data rescue that we face today
is not new. Nothing has changed to make data more at risk today than it was six
months ago or four years ago. Two things have changed. First, more people are
aware of the risk today than ever before. Second, many of those who understand
the risks are in communities of users. They’re not just librarians; they’re
users who are concerned that explicit and announced changes in government policy
could result in decisions that would threaten the data on which they rely. And
I’ll turn it over now to James.

Yeah, and I just
wanted to make a couple of points to jump on to what Jim said. First, yes,
digital information is fragile. This is not a new thing, and it is not an
inherently political thing. You have things like link rot and content drift,
and a quick Google search on those terms will get you some more understanding of
that and of how fragile government web-based information is. From my
perspective, I think control is a key here, both historically and currently. As
we often like to say over on Free Government Information, linking is not
preserving. And so, since web-based information is inherently fragile, it
behooves us as librarians to collect it in order to preserve it, in order to
control it, in order to describe it and give access to it.

The silver lining (I like to look for silver linings; especially in this
political climate, silver linings are really important) is, I think, that, as
Jim pointed out, the public is suddenly aware of the need for digital
preservation outside of the .gov domain. Some of us in the Federal Depository
Library Program have been arguing for this concept for many years. I think
you’ll find that Jim wrote an article, oh, probably in the mid-90s, about
preserving information from bulletin boards. So this is not a new idea. But
preservation requires action, and action requires actors and government agencies
working together. The current system is primarily centered around creation and
access, not preservation. Preservation isn’t happening system-wide, and we need
to figure out ways to build and sustain government information systems.

Just to give you an image of what it is that
we’re talking about, this is a graph that Jim did for a 2014 report for the
Center for Research Libraries called Born-Digital U.S. Federal Government
Information: Preservation and Access. I think it really shows what the problem
is at the five-thousand-foot level. What you see is the number of tangible
documents, meaning government documents, that were distributed to FDLP libraries
in 2011; it’s a small sliver there. The slightly larger sliver in the middle is
all FDLP items that have been distributed to FDLP libraries throughout the
200-plus years of the depository program. That number is somewhere between 3 and
5 million. The third, large piece there is the number of URLs that were
collected for the 2008 End of Term crawl. This is somewhere around 160 million,
and the 2012 and 2016 crawls will be exponentially bigger than that. You see
that even if some of those quote-unquote URLs were spacer GIFs and images and
oddities of the web, you still have a lot of information and data being
produced, in far larger quantities than has ever been produced before. So this
is the nature of the problem we’re dealing with.

So I’m glad that you mentioned data, James. What are we talking about
when we talk about data?

So this is Jim again. I’ll start off. I just want to make one quick point. The
word data is sometimes used to mean anything digital. But when climatologists
and geographers and demographers and economists and scientists use the word
data, they mean one very specific kind of digital information. When they talk
about data, they are usually talking about what we might call data sets. These
data sets contain information that is collected using things like public opinion
surveys, social surveys, satellites, and other kinds of instrumentation. The
data is stored in files and databases in a highly structured way so that the
information, or data, can be analyzed in some statistical software. This is raw
information, raw data if you will. It’s often just a bunch of numbers, like
measurements and codes of various kinds, and is normally not intended for direct
reading by humans.

Now, when we talk about rescuing anything and everything digital and we use that
definition of data, a lot of us are thinking about the stuff that you see on the
web: web pages, PDF files, simple spreadsheets, images, audio and video files,
and so forth. This information, unlike the data in data sets, is very definitely
intended for direct human consumption: reading and viewing and listening. This
kind of information is relatively easy to find and identify and capture. This is
what the Internet Archive does every day. But the other kind of data, the data
in data sets that so many scientists use and analyze in their research, is not
always easy to find or identify or download, and a lot of those data sets are
not visible on the web. These data sets often require special attention and
tactics to identify and download them. So when we’re talking about data, what
we’re talking about is both; it’s a broad field of information. And with that,
I’ll turn it back to Laurie.

In the Data Refuge project
that we started, we wanted to think about how we could rescue federal
information, federal data. We were, I think, at the beginning really most
concerned with research data. But as we thought about, learned about, and talked
to people about how federal data lives, we found that it really runs the gamut
from data in research data sets all the way to basic HTML. This distinction
between web pages and data sets is really important for how we find and reuse
data. There are standards for making and sharing data sets in various
disciplines, and they are really different from the practices and tools for
sharing web pages and PDFs and that sort of thing.

But what we found over and over again is that, without attention to the
preservation or reuse of data, many of the data sets that are made available on
government websites are treated, even by their producers, more like web pages
than like traditional data sets. They’re conceived of in terms of the interfaces
that provide access to them, rather than, as we would want, as data sets that
can travel with all their own metadata. On federal websites, the data set is
very often not packaged with the accompanying data dictionary, instrument data,
methodological overviews, and download tools that we would generally expect from
data sets within particular disciplines in academia. Instead, the metadata that
we would want to include in a set of files describing a data set is just
sprinkled through web pages or on a related page, but isn’t really connected to,
or easily connectable to, the data themselves.

This conflation of websites, data visualizations, data sets, and informational
sites is one of the reasons that addressing the problem is so tricky. The
problem of backing things up becomes more complicated when the lines are
blurred, as they are when we talk about the data we’re thinking of for Data
Refuge. The meaning of data needs to be pretty clearly spelled out at various
times during the conversation, and we need to be open to accepting that data
producers, future users, journalists, open data advocates, software developers,
librarians, and others all need more or less different things when we talk about
data, but share an interest in helping to ensure that each community has access
to the data they need. And so we may need to develop a new vocabulary, a new way
of talking about these problems.

With that kind of context, let’s talk about what kinds of projects are happening
right
now. We will have Laurie and James talk about two different major projects, Data
Rescue and the End of Term archive. So first off, James, can you just give a
quick overview of what these current efforts are, at the 50,000-foot level?

So a lot of efforts are already happening. There’s a lot of crawling of the
government web in bits and pieces. There’s a variety of organizations that are
collecting bits and pieces of the government web domain, but no one organization
is mandated to do it, unlike in other countries, where national libraries have a
mandate to collect their country’s domain. You have organizations like the
Library of Congress, GPO, NARA, and the Internet Archive, all fairly large
organizations, all doing web capture for preservation and access. You have
agencies using Archive-It and other web tools to archive their own domains for
various reasons, whether it’s to deposit into the National Archives or for their
own preservation efforts. You have universities like the University of North
Texas, Stanford, and other libraries that are building topical .gov collections
focusing on everything from FOIA to CRS reports to other sorts of subject-based
collections. And then you have, more recently, these community efforts. This
comes more from the fear that government data or government information is going
to disappear. So you have a lot of these citizen-driven grassroots efforts like
Data Rescue and Climate Mirror and Azimuth and those kinds of efforts, as well
as larger, longer-term efforts like the End of Term. And so with that, I’ll give
it over to Laurie.

I’ll talk a little bit about the Data Rescue events that have been happening
around the
country. The Data Rescue events are supported by Data Refuge and by the
Environmental Data and Governance Initiative, which is a separate organization,
and really they’re driven by the people, often within libraries but also
elsewhere, who come together to form an event that is designed to rescue data.
One part is the process that we go through at a Data Refuge rescue event, in
which folks work to move data sets of the sort we were talking about earlier
into our catalog and the associated storage space that we have at
datarefuge.org; there are about 180 data sets in there. But at Data Rescue
events there is also a really close connection with understanding the ways that
the data are used: bringing together people for teach-ins, and bringing together
people to tell stories about how federal data impacts their lives and how
climate and environmental data is connected to policy.

We are hoping and encouraging folks to really experiment with different kinds of
Data Rescue events. Some events are continuing to route data into our workflow,
but we want people to continue to experiment [unclear], so we’re hoping that
people will think of new ways to tackle this problem. The Data Rescue events
have been focused so far,
for the most part, on moving data into one of two locations. For the first
several months, we were feeding the End of Term harvest project, so we were
directing URLs to that. If you think of this continuum of basic HTML pages all
the way through to research data sets, the goal was that those things the
Internet Archive can capture, that are web-archivable, like basic HTML pages as
well as directories of files and FTP servers, would go there; our events were
designed to systematically push that content to the Wayback Machine, to the
Internet Archive, on that end. Then, for those things that can’t comfortably be
sucked into a web harvest, things like the data that’s behind a query interface,
or a research data set that’s already prepared and laid out, we have a system
for moving those, while minding chain of custody and being aware of the quality
of the data, into Data Refuge, which is at datarefuge.org. So that’s been the
kind of approach. I want to call out that there is that embedded content, which
doesn’t have an arrow, and that’s a kind of stand-in for all of the various
kinds of interfaces or visualizations where we might be able to capture the data
behind them, but some of those interfaces can’t be comfortably web archived, or
we don’t yet have a plan to capture them. I just wanted to call that out.

I think I’ll spend just a moment on the workflow at an event and how that goes,
but mostly I wanted to
really encourage folks to think differently about this. In the original workflow
that we laid out here at Penn for our event in January, the idea was that there
would be people who are figuring out what can go to the Internet Archive and
pushing it there, and what can’t; those are the seeding and sorting steps. Then,
once a site has been identified as containing data that can’t be pushed to the
Internet Archive, it advances to a process of research and then harvesting that
data, whether by download, by using an API, or by scraping an interface to get
the data behind it. Then we take some measure of preservation steps, basically
BagIt, which is a Library of Congress protocol, and then it gets described and
shared through our CKAN instance. There are some exceptions to where they’re
being shared, where we haven’t figured out yet how to get some of the bigger
data sets there, or haven’t done it yet. But for the most part, that’s the
process.

The more we learned, though, and the more we tried, the more we learned what
others in the larger data community have known for a long time, which is that
for proper preservation, describing really can’t come that late in the process.
A huge part of the work is going to be describing and researching and creating
data sets out of the web pages and associated interfaces so that they can be
better harvested and preserved in the future. We are hoping we can create plans
and an understanding of what constitutes a data set for sharing, and we are
hoping to see some experiments and pilots in libraries that take on that
project.

And, yes, to that next slide, just to illustrate what we mean: in many cases,
what we want to see is a package of data. The appropriate data needs to be
identified, as I described a little bit; data files are pulled from the
interface, or ideally someone works with the agency to get them not by scraping
but actually from the agency; and then the associated context, which might come
from those HTML pages or from metadata that is actually created to describe the
information, is added, and a research data set is composed. Then we can back it
up and think about it in that way. So that’s kind of the direction that we’re
hoping that events will turn, and it’s certainly the direction that we think is
the most productive going forward.

I’ll just talk briefly about the End of Term crawl,
which we lovingly call EOT. The End of Term crawl has been happening since 2008.
Every four years, at the end of each presidential term, a group of interested
folks has gotten together, with no budget and no direction really, including the
Internet Archive, the Library of Congress, the University of North Texas, the
California Digital Library, George Washington University, Stanford, and several
others. We basically start six or eight months before the term is going to end,
and we collect seeds, both from prior crawls and by asking for nominations of
seeds, in order to crawl the web and do as much of that as we can. This time
around we are focusing not only on HTTP and HTTPS but also on FTP and social
media, which we didn’t do in the 2012 crawl. George Washington University is
focusing on social media. You probably don’t need to know the technical aspects
of it, but we’re not crawling social media per se; it’s more like querying APIs
to collect social media: Facebook, Twitter, YouTube, those kinds of things.
We’ve got hundreds of volunteers this time nominating seeds, many from Data
Refuge and Data Rescue events, which I think is really amazing: that these
grassroots efforts came together and found a space to fit with the End of Term,
which is sort of throwing a big wide net around the .gov and .mil domains. So
this time around we’re
looking at collecting somewhere around 250 terabytes of harvested content. We’ve
got 9,000 social media accounts and lots of domains, lots of .gov as well as
non-.gov domains that belong to government organizations, things like
commissions that might have a .org or whatever. The crowd-sourced nomination of
seeds is particularly amazing this time around. As you can see, in 2008 we had
26 nominators and 457 URLs. This time around we had 393 nominators, some of
those individuals and some of those events acting as quote-unquote nominators,
and over 11,000 seeds were nominated to the crawl. So it’s been a whirlwind
eight months of capturing lots of government information and lots of government
data. The entire crawl will be indexed and searchable from the End of Term
website, which the California Digital Library hosts. You can currently get
access to the 2012 and 2008 crawls, and 2016 will be coming too. All of the data
will also be hosted at the Internet Archive, and there’ll be a copy at the
University of North Texas and, I believe, the Library of Congress. So there will
be redundancy and multiple access avenues to this data.

So I think these are a lot of exciting projects that have come together at a
particular point in time, and I’d like to ask both James and Laurie to talk
about what’s next for each of these projects.

So for Data Rescue, you know, as
we were having events, it was so exciting, and, as I said, we started to see
what everyone already knows, which is that we need a more sustainable plan. We
need to take advantage of the tremendous work that’s been going on within the
government documents librarian community and also within library development.
But beyond that, one of the amazing opportunities in this has been the
connections that we’ve been able to make with open source software developers,
and especially with people who have been really active in the open government
movement: the people who made data.gov, and the people who have been working on
open data efforts in cities and states around the country, and the work that
they’ve been doing connecting federal, state, and local data producers to get
them to share and make public their data. We’ve seen that that effort has been
really successful. Data.gov, on the federal government side, has hundreds of
thousands of records for open data and has made that data available and
searchable. That said, those efforts are generally not made with an eye to
preservation beyond the data producer’s interest in maintaining the data. And
that’s why we are moving towards what we’re now calling the Libraries+ Network.
This
is a collaboration right now with the Association of Research Libraries, and we
expect to bring other collaborators on soon. The first thing that we’re going to
do is an event, which is quickly filling up, where we really want to bring
together some of the people from the open data and open government communities
with people in research libraries to begin thinking about what a revived Federal
Depository Library Program could look like if it were conceived of for
born-digital data. We understand that, in the current environment, federal
regulations that would help protect data need to be advocated for, so we are
also making plans to help support long-term, sustainable access to this data
without assuming that the federal government’s behavior will change. The folks
coming to that meeting include the Mozilla Foundation, agency data producers
from NOAA and NASA, and then folks from within a few of the libraries that
contributed early on, but also from the California Digital Library, as well as a
lot of groups like Dataverse and DLF and HathiTrust and ICPSR and the Center for
Open Science, and on and on. So we’re really excited for the meeting, where we
hope to coalesce a group of people who are able to make plans that might pull
some of this together.

As Laurie
said, and as we’ve said throughout this whole webinar, this is a huge issue.
It’s great that we’ve had so much energy to do this at this point in time, at
this point in history, but how do you sustain efforts going forward? This is a
huge issue, technically, socially, and politically, and one in which there’s
plenty of space and frankly plenty of need for collaborative action. So I’m
really glad that ARL and all these other groups are coalescing. There’s another
group called PEGI, which is the Preservation of Electronic Government
Information, that held two summits, at the fall and last spring CNI meetings, to
get interested people in the same room discussing these kinds of issues. So I
think the Libraries+ Network and PEGI will probably coalesce into one larger
movement. As you know, End of Term has always been an ad hoc project among
interested people and organizations, but there’s never been direct funding for
it. I’m hopeful that direct funding will be coming, but I think there are two
other things that are key to these efforts going forward.

One thing is that we must base our work on accepted standards and guiding
principles. This is critical for any effort going forward. These are just the
guiding principles that we’ve written about on Free Government Information
before. These guiding principles come out of OAIS, which is an accepted digital
library standard and, by the way, is a close relative, at least in my mind, of
Ranganathan’s five laws of library science, which are: books are for use; every
reader his or her book; every book its reader; save the time of the reader; and
the library is a growing organism. So this is just sort of an updated guiding
principle, I think, of Ranganathan’s ideas and ideals.

Also, since this is such a huge issue, any effort going forward must, I
think, include the publishing agencies and the people and organizations that use
government information, like politicians and policy experts, researchers, think
tanks, government watchdogs, and even the public. Therefore we need to ensure
that non-librarians and non-library organizations are aware of library issues
and are able to cooperate in efforts going forward, but also that those library
efforts are informing policy going forward.

We’ve spent a lot of time chasing the data tail for the last eight months, with
both Data Refuge and these other grassroots efforts and the End of Term crawl,
and the Internet Archive frankly has been chasing that tail for a long time. So
I think what we really need at the policy level is a way to structure government
policy so that these efforts are not always chasing the tail, but that
preservation becomes part of data publication and .gov information publication
in general. Jim and I have written a little bit on the libraries.network web
page (sorry, I didn’t give the link here, but I could send it later) about the
idea of information management plans. This comes out of the idea of data
management plans, which I’m hoping that many of you on this call already know
about: the idea that if you receive federal funding from a government agency,
any data that you create in the research process must have a plan for
preservation going forward. So we’ve turned that on its head and said that
government agencies should have information management plans so that they are
forward-thinking and explicit about what they’re going to do in terms of
preservation and access going forward. I should note that PEGI is also having a
meeting in May, the week after the Libraries+ Network meeting, and there are
many people in PEGI who are going to both, including myself.

So our
final question is what I think is the biggest question that our discussion is
based on, which is: what can everyone, what can anyone, what can all of us do to
participate in this work and to move it forward? I’d like to start with Laurie’s
answer to that question.

Thank you. I’ll answer the question that was in the chat box as well, but the
basic answer, as far as I’m concerned at this point, is: experiment. The problem
is really, really huge. It has so many angles, and in our experience, I have
learned so much, and my community has learned so much, in trying to tackle
actually solving a chunk of it. So when people have been asking how to get
involved: certainly you can go to the Data Refuge website at PPEH Lab (I’ll put
the URL in the chat) and host an event. But really, the more people just take a
stab, the better. We hear so many questions about how we make sure that we’re
not duplicating efforts. All I can say is that we as a community have so much to
learn that I’m actually not sure duplication of effort is a problem. We
duplicate effort all the time. We all learn how to do some of the same things
because they are so needed. So learn what it takes for your organization to
attempt to tackle a piece of this problem. What are the technical needs? What
are the social and bureaucratic needs? What are the resource needs, and how much
does it cost? These are all questions where I think our community is going to
need a lot more evidence from experiments before we know what the best practices
are going to be. And so I would say start small. You know,
we’ve talked to people; I’m thinking of a librarian who works at the American
Institute of Architects, who said, you know what, the architects that I work
with need this data, so I’m going to make sure that my library has this data,
because that’s my job. And whether someone else already has it, that’s okay; I’m
going to make sure that my library has this data, and I’m going to build a
collection that includes web archives and data sets that meets this need. I
think we do need a much broader plan, some of which we’ve been talking about on
this call, and I think that will come out of these various meetings going
forward. But more than that, we need everyone to just jump on board with trying
something, and to do it in a way that helps libraries see this work as our work.
So that’s my kind of plea for experimentation.

Hi, thank you, Shari. I think what’s amazing is that
a lot has already been done to rescue data in these last couple of months. But
data preservation can’t be done by just a few volunteers, and a lot of this has
been done by volunteers. It also can’t be done by communities of data users by
themselves, and a lot of it has been done by users so far. This is a time when
libraries, I think, need to accept their traditional roles, that is, selecting,
acquiring, organizing, and preserving information, and providing services for
those collections of information. I think there’s a place for everybody to work
on this: whether you work in a library or not, whether your library is tiny or
huge, whether your institution or management authority supports this kind of
activity or not, there’s a lot that can be done. You’re already aware of the
issues, so your first job should be to keep informed, because things are
changing fast, particularly as we experiment and develop new techniques and
tactics. Then you can share what you know and understand with your colleagues
and your constituents, your users. You can participate individually when
possible, and also with the help of your own institution. You can also work with
your professional groups to forge alliances for action.

I had a brief conversation with a librarian colleague just this week who was
telling me about how libraries just won’t work together. They’re competing
against each other: schools are competing for students and tuition and
reputation, libraries are competing for grants, and they all want to work by
themselves. This area that we’re looking at today, government information, can’t
be done if we work by ourselves, individually. We have to collaborate on this.
And there’s a lot of room, no matter what your skills are: whether your skills
are political, in pushing activities forward; or organizational, to get groups
together to work; or technical, to do things; or government information skills,
where you can help identify collections that need to be preserved. We all have
to act to preserve government information.

This is James
again. Yeah, just one quick point to jump onto Lori’s idea about duplication
of effort. I don’t see that as a problem. I actually think that is an
amazing part of what has happened in the last eight months, because you have
all of these communities of interest or, in OAIS parlance, designated communities
who have jumped in and started acting, and this has caused a far better and
more thorough preservation effort than the End of Term could do or that
any one organization could do. So I really think duplication of
effort is not a bug, it’s a feature, and it’s an amazing thing to watch. Going
forward, there are some avenues of participation that
you could start doing now. The University of North Texas has put up an
interface; they pulled out a subset of the PDFs that End of Term has crawled
and they’re doing an experiment in cataloging, so if you’re
interested in cataloging you could go in there and help catalog PDFs.
Metadata is a critical and often underfunded piece of this
whole function of preservation. You can start planning now for the End of Term
nominations for 2020. We’ve stopped receiving nominations for this crawl, but
2020 is coming pretty quickly. You can also start preserving web
pages that you are interested in for yourself or for your users or for your
research communities, and the Internet Archive has written a nice long post
called See Something, Save Something on ways that you can ensure that web pages
and web content are already in the Internet Archive and, if not, how to
get them into the Internet Archive. I’ve had some conversations as well with
some folks at the University of Michigan and some other folks about this idea of
using Zooniverse, which is a crowdsourcing public science interface,
and using Zooniverse to help with metadata and description of crawled
content. That’s in the discussion phase now, but it’s something that
hopefully will be happening soon. So there’s lots of ways to pitch in now,
even if you don’t have support from your administration, even if you
only have an hour a week. There’s lots of work to be done and we hope that
you’ll jump in. Sorry, can I just jump in there before we go to the questions and
discussion, just to also say, you know, events are still happening. I think I’m
bang we’re really experiment with day rescue of it but they have been very
special and seeing people out I’m certain encourage you if this is push
data rescue has been in the segments that will work in your community do it
you’re welcome to use it once loaded subscribe in our pages but we also i
would love take a stab at doing some other things in the collection and
another lady. And then I also just want to call attention to the Endangered Data
Week efforts at pls and supporting the sponsoring that. It’s, again, another way to
do events. My experience with running these events and seeing what
happened in communities around raising awareness of what this means
has been empowering, and it’s amazing to see, you know, our libraries
overrun by people who are there not just to
learn but to actually take action. So just once again, I would really encourage
people to think of ways that people can help with this work
because they want to, and it’s a great way to help them learn about what’s
important and get better at data management themselves. Thanks, Lori.
That’s a great point too, because so much of the work that’s been happening has
inspired a larger part of our community to be engaged in these issues, and as we
move from building and developing these workflows and collections to both
sustaining and changing the way that we as libraries are managing and thinking
about this content, but then also increasing our voices when it comes to
the policy side, that’s a bigger group of people who can help us do that work. And
as Jim and James have pointed out, in the long run that is going to be crucial for
access to what we are working on right now and, of course,
making things better in the future. So thank you to all three of you for
bringing these ideas. I’d like to open up to the whole group to see if there are
questions or discussion points you’d like to raise, either for individual
members of the panel or for the group at large. There was one question that came in
saying that duplication of effort is good; however, the first question from
an administration is, is this data available elsewhere? So in terms of
administrative support, how do you make a case for duplication? My experience has
been here has been first of all there is great duplication and it certainly will
check item vs neuro check other big repositories but right now they’re the
problem is so incredibly huge that that data is pinned by a community you know
fair to note you just are in a position to TJ yeah that that way is fine except
for I would say so to my perspective except for icpsr
I would say if something is, you know, an different from Data Refuge, and I’m
very proud of the work that we’ve done and I think that’s a really,
really important thing, but it’s not like this stuff is so safe that no one needs
to worry about it. You know, we captured things at a particular moment.
And so in my experience it hasn’t been a convincing issue at
all. It’s been a question of, we are working with faculty who care about this
data, they want to see it backed up, it is true that it is vulnerable, and so making
the case that it’s vulnerable is easy because it’s true. It’s not true that
it’s vulnerable for silly political reasons for dr. K; it’s still
vulnerable for all the reasons that James and Jim were talking about
earlier. And perhaps the fact that this has been a project in collaboration with
the Program in Environmental Humanities here, that it’s been a project in
collaboration with scientific researchers, has made that a lot easier
for us. This is Jim. I’ll say two quick things about it. First of all, redundancy
is good. So you don’t want to duplicate effort, but you
do want to have more than one copy of things. So, as James was pointing out,
the End of Term archive is going to be stored in at least three different
locations; that’s a good thing. You do want to avoid redundancy of effort to
get that stuff, but that brings me to my second point, which is this isn’t just
about getting the stuff and storing it. It’s about discoverability by users and
usability by users. Different users will want different collections of
information for different reasons, and they will search for information
differently, and they will use the information that they get differently,
and that means that in some cases we’ll actually need the same information in
different collections. Some of that can be done through APIs and interfaces
to a single stored file or group of files, but some of it will mean that we
want to build collections, organized collections, for communities of
users, and that may mean we’ll have the same information in more than one of
those collections, and that’s OK. That’s a good thing. I think there’s also a
really good argument to be made in terms of building local capacity for certain
kinds of work. So if we are participating in these larger efforts,
whether it’s taking a little piece of a very large project or even
learning the same steps that are being used in a workflow that’s been developed
somewhere else, that improves our ability to then capture and preserve and provide
access to content that no one else is going to be managing. In a lot of cases
for research libraries that may be local government information, so data from
local organizations, local governments, and other kinds of content
where we are the only or the primary interested party and the only
potential steward for that content. And just one last point. I think my argument
that I always make is that format doesn’t matter. Just like if the library’s
users are interested in a book, you’re
not going to say, well, they can get it on Amazon or they can get it through
interlibrary loan. If your users need a book, then it behooves you as a library or
a librarian to collect and give access to that book by putting it in your
catalog. You don’t know if your users will find that book in your catalog or
will find it through a general web search or some other kind of search or
through a friend, and so having more avenues of access is always
a good thing, whether it’s a book or whether it’s a data set. Oh, and I’d like
to add one more thing very quickly. If you’re in a conversation with an
administration that’s saying, I don’t want you to do this because somebody
else might do it, there’s a second tack that you can take, which is collaboration
is important between everybody that’s doing work. So if your administration is
saying, I don’t want you to start getting stuff because somebody else might get it,
what your administration might support is your work on helping collaborate to
make sure that there’s not unnecessary redundancy in identifying things, a way
to make sure that there’s sharing of the experimentation of getting stuff, a way
to make sure that libraries are working together collaboratively and not
individually, and your administration might support that. One should be there
kind of, if you’ve done, if you come of it how do we know the time not even done
before, and you’ve done a kind of cursory search, chances are your cursory
search is better than your users’ cursory search, and so if you didn’t find
a safe, citable backup copy version of this in a collection other than the
original, then chances are your division leader and so, so that’s kind of an easy,
like, I didn’t see it out there so they won’t either. So we have a question about
if a group wants to preserve or work on data specifically, is there a
step-by-step guide to get started, or specifically how can a group get started
on something like hosting a data rescue event? Sure. So I put the
link in there to the data rescue page, which is ppehlab.org,
and then the Data Refuge page, and there’s lots of information there
about how to host an event. I would say the first thing to do is make
connections within your community, you know, whoever is within the
library, outside the library, but make sure that you have connections with
a broad set of collaborators who agree
about what they want this event to be and what’s important about it. And then
there’s also a URL there, and I
will type it there again, but there’s also, I put a link in there to the way
that an event in Portland worked, and this is a, it’s a little it done to have
and it might not appeal to some, but it’s actually a really
cool way that they actually went through and held an event where
instead of harvesting data they actually had people make metadata about data
sets that exist, which is, I can say from lots of conversations with people inside
government, something that they would really like some help with. And so
there’s also, we have some blog posts up at the Libraries Network, which I’ll put
up after, which are written by James and Jim, and we would really welcome more. So
if you feel like you know a way to get into a new way of tackling this problem, we’d
be really excited to hear about it and put it on the website. We’re trying to sort
of gather information about what folks are doing and working right now on a
little bit of a who’s doing what in terms of the efforts that are going to
dinosaur tracks. But yes, we have a really specific workflow that’s, like,
exactly what to do; that’s linked from the Data Refuge page. OK, it looks like
we are out of time here. I want to thank everyone for attending or
listening to this, and also for your interest and excitement about this work,
and particularly thank you to Lori Allen, Jim Jacobs, and James Jacobs for being
willing to talk about this work today, and to Lynda Kellam for hosting us. We
look forward to working with you, and please send questions to any and all of
us. We’re always happy to talk about this some more. Thanks, and we’ll post
some links and our slides over at freegovinfo.info.