Betsy Kruger’s Notes from the Open Content Alliance Workshop
October 20, 2006 – San Francisco, CA
1. Highlights of Brewster Kahle’s presentation
- Google continues to “keep the mindshare” around online books; Brewster feels that if Google would move 5 degrees to the left, we’d have one project.
- Microsoft, which has funded much OCA scanning without any restrictions on content use, is now talking of starting to impose some restrictions, although these will still be fewer than Google’s restrictions. The OCA is trying to ramp up fund seeking from non-commercial foundation sources.
- Big challenge—Delivering services with our content to show “the power of open.”
2. Presentation by Robert Miller, Director of Books, Internet Archive
- Reviewed OCA’s activities during its first year, 2005. Experimented with sending over 100K books to India for scanning, but problems with QA, workflow, and metadata creation, plus participants' unwillingness to have their materials sent overseas for scanning, led them to change direction. 2006 saw the opening of two scanning centers in the U.S. (both in California) and one in Toronto, Canada.
- Discussed the factors needed for success in these centers, as well as the financial, staffing, and facility-related requirements. Described deliverables and types of materials that can be scanned with OCA Scribes.
- Discussed book flow and quality assurance issues.
- Said OCA is becoming “fanatical about quality;” wants material to look really good online; have instituted statistical sampling and an immediate pass/fail system in their quality control workflow.
- Link to Robert’s presentation: Books: Looking Back, Looking Forward http://ono.cdlib.org/shimenawa/oca/rmiller%20oca%20200610.ppt
3. Presentation by Robin Chandler, California Digital Library
- Dawn of the Embedded Library--Understanding our Users in their Space: Tools and Collections and where we need to be
- “Touchless Curation”: Mass Digitization Collection Management--What’s required by libraries
- Reflections on the Future--Tools and Collections needed by Users
- There’s LOTS of excellent information in Robin’s presentation, which can be found at Picas to Pixels: Collection Management and Mass Digitization http://ono.cdlib.org/shimenawa/oca/rchandler%20oca%20200610.ppt
4. Short presentations by vendors and others:
5. Presentation by Bill Carney, OCLC on WorldCat Synchronization Program
- Registration of digital collections via synchronization--compiling libraries' digital collections and pushing them out to the network
- Described and showed diagrams of how synchronization process will work
- OCLC is working with Google Book Project, OCA, Microsoft Live Books, and other mass digitization projects
- Design meetings are occurring now; followed by pilot project and a phased implementation
- Bill's complete PowerPoint is at http://ono.cdlib.org/shimenawa/oca/wcarney%20oca%20200610.ppt Very interesting--check it out.
6. Presentation by IA staff member Raj on the Open Book Factory
- A web-based application developed by IA to perform automatic skew correction and cropping for digital images scanned from microfilm
- Can upload scanned images from microfilm to the Open Book Factory, then use this tool to crop and de-skew very rapidly. This might be something we'd want to look into more.
7. Presentation by Sayeed Choudhury, Johns Hopkins University WHITHER THE OPEN LIBRARY?
- Sayeed gathered feedback from the group on the service layer products on display at the workshop, as well as other products scholars want and need.
- Group also discussed relationship between strictly academic and marginal for-profit exploitation, and the implications of both for OCA.
- Funding for digitization projects through faculty projects/grants
- OCA needs not just ideas but a structure
- How do you search a book in any useful way-- XTF from California Digital Library is designed for book searching http://www.cdlib.org/inside/projects/xtf/
- IA now has an API to the archive
- OCA--joint collections with unified service layers on top -- "Killer applications;" Brewster wants IA to be a backbone and a facilitator
- Question: Sustainability of "openness"--openness needs strong advocates right now. Need to reach out to sources like the Wellcome Trust, which provided funding to keep the human genome project from going down a proprietary road
8. Other bits and pieces of information:
- OCA file storage estimated at 4MB per page (2MB of that is JPEG2000 archival file)
- Michael and I were approached by Thomas Garnett of Smithsonian and the Biodiversity Heritage Library, re: content our library might contribute. I've contacted Bryan Heidorn, Diane Schmidt, and Beth Wohlgemuth about this.
- CDL folks (per Robin Chandler) are testing a METS feeder to grab their files from the IA website and bring them into Melvyl
- CDL does QA on their OCA content at the point of ingesting it into their digital preservation repository
- Need to load our OCA metadata with as many "handles" as we can (IA #, OCLC #, Voyager ID, etc.)
- In conversation with Mary Elings in UC's Bancroft Library, learned that of 4,200 potential candidates for OCA scanning, 40% had to be rejected due to tight bindings, inadequate space in gutters, condition issues, or the presence of foldouts (although OCA is working on being able to accept books with foldouts). This drove home to me the need for us to begin selection for OCA VERY SOON. We are making some progress there, but really need to ramp up.
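
For planning purposes, the two figures above (4MB of storage per page and a 40% rejection rate among candidates) can be turned into a rough estimate. The following is only a sketch: the function names are made up for illustration, and the 300 pages per book is an assumption of mine, not a number from the workshop.

```python
# Rough planning arithmetic based on figures quoted in these notes.
# Figures from the notes: 4 MB per page, 40% of candidates rejected.
# ASSUMPTION (not from the workshop): an average of 300 pages per book.

MB_PER_PAGE = 4                 # OCA storage estimate per page (from the notes)
REJECTION_PERCENT = 40          # Bancroft Library experience (from the notes)
ASSUMED_PAGES_PER_BOOK = 300    # hypothetical average, adjust for a real collection


def usable_candidates(candidates, rejection_percent=REJECTION_PERCENT):
    """Number of candidate books expected to survive physical screening."""
    return candidates * (100 - rejection_percent) // 100


def storage_mb(books, pages_per_book=ASSUMED_PAGES_PER_BOOK, mb_per_page=MB_PER_PAGE):
    """Total storage in MB for the given number of scanned books."""
    return books * pages_per_book * mb_per_page


if __name__ == "__main__":
    scannable = usable_candidates(4200)
    total_mb = storage_mb(scannable)
    print(f"Scannable books: {scannable}")
    print(f"Estimated storage: {total_mb} MB (about {total_mb / 1000:.0f} GB)")
```

If our own rejection rate turns out to be similar, we would need to identify well over the number of volumes we actually hope to scan, which is another reason to start selection early.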