John Unsworth at CLIR Camp plus Libraries and Scholars Interacting Over Time

John Unsworth spoke at the CLIR-DLF postdoctoral camp last week. I thought he presented some interesting ideas about the emerging cross-connections between libraries and publishers, as well as some speculations about the development of scholarly data communities around the existence of large-scale data resources.

Unsworth works at Brandeis as a vice provost, CIO, and university librarian. He also serves on the executive committee of the HathiTrust Research Center. As part of the HathiTrust he has been directly involved in the development of a digital collection of 11 million books and journals scanned from the collections of member libraries. Providing access to these data is a challenge that the HathiTrust is currently working to solve.

The first obstacle is the persistent problem of intellectual property. Determining who owns a book when its publisher has disappeared or its author is deceased is complex. Access to works in the public domain is a bit easier from a legal standpoint but is still a challenge for the technical designers. The most likely solution will be a system that allows researchers to upload batch programs to the HathiTrust database. Once a program is uploaded, the system will disconnect from the researcher, connect to the HathiTrust database, run the research program, and then send the results back to the researcher. It is a potentially cumbersome process, determined as much by the legal challenges of copyright as by the technical challenges of doing research on such a massive amount of data.
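To make the batch idea concrete, here is a minimal sketch of how such a workflow might look. It is not HathiTrust's actual system or API: the corpus, `run_batch_job`, and `word_counts` are all hypothetical names invented for illustration, and the "corpus" is two made-up sentences. The point is only that the researcher's program travels to the data, and nothing but derived counts travels back.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

# Hypothetical stand-in for the restricted collection. In this sketch the
# full texts are only ever read inside run_batch_job.
_CORPUS = {
    "vol-001": "the library is a growing organism",
    "vol-002": "the scholar walks the stacks of the library",
}

# A job takes the texts and returns a table of word -> count.
Job = Callable[[Iterable[str]], Dict[str, int]]


def run_batch_job(job: Job) -> Dict[str, int]:
    """Run a submitted job next to the data and return only derived results."""
    results = job(_CORPUS.values())
    # Assumed policy for this sketch: a job may hand back counts, not text.
    for key, value in results.items():
        if not isinstance(key, str) or not isinstance(value, int):
            raise ValueError("jobs may return only word -> count tables")
    return dict(results)


def word_counts(texts: Iterable[str]) -> Dict[str, int]:
    """Example researcher-supplied job: count whitespace-separated words."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(text.split())
    return dict(counts)


if __name__ == "__main__":
    print(run_batch_job(word_counts)["library"])
```

A real service would also need queuing, sandboxing, and review of what counts as an acceptable result, all of which this sketch leaves out.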

The discussion of the HathiTrust led to one of the most interesting comments Unsworth made during his presentation. He proposed that research will increasingly be organized around large data sets, such as the HathiTrust database, and that research communities will develop around these accumulated data sets. The idea of data communities is intriguing and may represent a transformation in the way libraries relate to academic researchers.

Before his presentation Unsworth distributed a link to a Windsor lecture delivered by Andrew Abbott in 2008 at the University of Illinois. Abbott discussed “Library Research and its Infrastructure in the Twentieth Century.” In the lecture Abbott reflects upon his experience as chairperson of a task force on the future of the library at his university. He turned the administrative assignment into a research problem: why do expert academic researchers ignore the detailed bibliographic tools and databases provided or compiled in libraries in favor of a more ad hoc process of searching, evaluating, and searching again?

“… that library researchers have projects with clear designs is a myth. A few library researchers may actually have such clear designs. And the rest of us pretend to have them before the fact. And we all force dissertation students to pretend to have them before the fact. But it’s all a myth. We don’t have clear questions ahead of time. The logical sequence of our articles is unrelated to the chronological sequence of our investigations. Our graduate students’ pretend questions in their proposals are not the ones their dissertations will end up answering. Not only is known item searching a relatively minor part of expert library research, precisely structured research questions are also a relatively minor part of expert library research. They are its result, not its beginning.”

The polemical purpose of his lecture was to show that the digital library is not a novel invention. Powerful library tools have existed for many years, even before the development of the internet, in the form of specialized reference works. So his question began to morph: why were these tools not being used, and when did they become less attractive for scholars to consult?

“Maybe they were too powerful for their own good. Maybe the academic knowledge system had broken down before the internet. If the crucial problem of the internet is welter - the sheer availability of too much stuff - surely we had gotten to that stage long ago. The 1970s looked like the disaster point to me - the time when the demographically-driven rise in hiring standards pushed publishing to epidemic proportions, when the social sciences and humanities citation indexes brought the most ephemeral and third-rate publications onto the same page as the elite core, when it became impossible to do a comprehensive bibliography of anything.”

Abbott goes on to demonstrate the wandering process by which he modified and expanded his research question, until it became a question of when scholars and academic libraries began to go their separate ways. His insight comes as he investigates the history of departmental libraries.

“A huge shoe drops as I am looking around here. The topic that I am struggling with is related to departmental libraries. I figure this out as a problem in bibliography…I see that departmental libraries are a metonym of the argument I should be making. Departmental libraries are where scholars wanted to do most of their work - because of the relative density of tools. Everything was at hand, just as it is in the wonder world of the internet, only departmental libraries were better because all the tools and only the tools were immediately in your hands… Departmental libraries are limited but highly ordered and highly particular subsets of the library. It was the librarians’ contention that there ought to be one master index, but the research scholars always want partial indexes, indexes slanted their way, organized by their way of seeing the world, not by a generic view from nowhere.”

From this realization Abbott concludes that a major change for humanists and humanistic social scientists occurred in the 1920s, when departmental libraries across the United States were consolidated into large centralized buildings. The result was a divide between research and teaching. Researchers now had to go to another building, the library, to do their work, and to search through material merged into one large grouping instead of walking down the hall to a departmental library where everything was neatly organized in a single collection with no extraneous material.

“I believe that library research was in fact already in something of a crisis before the arrival of the internet and the digital library. The mechanisms of that crisis are rooted in processes continual since the 1920s. The librarians have pushed for centralized reference tools and bibliographic structures, counting on indexing to save the day and guide the investigator through the welter that comes with increasing power to locate and access material. Their central metaphors have always been scientific, their poster children for success have been the natural sciences and in particular chemistry, and their model has basically been to make the library a universal identification, location, and access machine. The digital library world is in that sense simply the latest version of a quite familiar paradigm.

“By contrast, library researchers started withdrawing from this universalist project in the 1920s and gradually erected a system of specialty tools and a set of research practices that enabled them to bypass the hugely inefficient searches that were the only possibility under the universal bibliographic system. By the 1950s and 1960s this alternate system of specialty tools and practices was mature. It could therefore survive the race to the bottom that culminated in the ISI databases on the one hand and WorldCat on the other. But I am not sure it has survived the current mass of work and the degradation of its crucial database of published citations of high quality articles.”

Abbott makes no predictions about the future, nor does he claim that the past system was a utopia for the researcher. Instead, the shifting environments of the library make some research methods easier while making others more difficult. Computational methods may become easier in the future, especially as researchers become more adept at building digital tools to mine the welter of information provided by the internet and other resources. But other scholars will continue to use older methods that depend on wandering the proverbial stacks in search of better formulated questions.

The danger for library design is that we will design for some idealized version of research and destroy the environmental niches that are still fertile for many scholars. I think this is a big reason why some scholars fear the digital humanities. They fear another enclosure-like movement, in which the discourses and power of science become synonyms for good research and are then built into the infrastructure of the digital future, making it impossible to go back.

I recently moved to the University of Alberta and I’ve been adjusting to the new library research tool. According to a poster I found in the hallway, UofA recently moved to an integrated search system provided by EBSCO. I’m already frustrated by the system because it lumps everything into one giant bucket - journals, books, archives, theses, dissertations, and more. When I go to the library catalog, I go because I want to look up a book. If I want to find an article, I’ll go to a subject database like Web of Knowledge or LITA. But newer searchers, especially undergraduates, have only the experience of using Google and expect to find everything lumped together in a single search box.

At the CLIR-DLF postdoctoral workshop there were a number of times when the goal of unification and centralization raised its head, but mostly it remained a tacit assumption beneath the surface of our discussions. Projects like the ARL effort to connect digital repositories, DataONE, and many others are focused on bringing collections together into centralized databases. Is this centralization really a good thing?

At the same time, the idea of data communities centered on particular data sets or resources pushes the library and information world back toward a research model that last made sense in the 1920s. My argument, if I have one, is that we should accept a diversity of research models. Not every database needs to be interconnected with other resources; sometimes disconnections can be beneficial. Development and growth often occur on the margins, where the freedom to experiment is greater and the threat of central control lesser.