Collaborative Electronic Records Project (CERP) finalized - Email Parser
The three-year Collaborative Electronic Records Project (CERP) of the Smithsonian Institution Archives and the Rockefeller Archive Center concluded in December 2008. One of the project's outcomes is the CERP Email Parser, which we are pleased to offer to the archival and related communities as an open source software tool for preserving email accounts. The Email Parser (http://siarchives.si.edu/cerp/parserdownload.htm) migrates an email account and its messages into a single XML file using the Email Account XML Schema, developed in collaboration with the North Carolina State Archives and the EMCAP project.
The CERP Email Parser migrates an email account in MBOX format into XML. It uses the schema to preserve the full body of each message together with its attachments, and it keeps the account’s internal organization intact (e.g., an Inbox containing subfolders labeled Policies, Special Events, and Projects). The CERP team successfully preserved email accounts from a variety of applications, including Microsoft Outlook, Apple Mail, Lotus Notes, and Netscape. All email messages retain their full header content, in contrast to some tools produced in earlier research efforts.
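The general idea of this migration (MBOX in, one XML document out, with every header kept, attachments carried along, and the folder hierarchy preserved) can be sketched with Python's standard `mailbox` and `xml.etree.ElementTree` modules. This is not the CERP Email Parser's code and it does not follow the Email Account XML Schema: the element names (`Account`, `Folder`, `Message`, `Header`, `Body`, `Attachment`) and the functions `account_to_xml` and `message_to_xml` are hypothetical. It also assumes each folder has been exported as its own MBOX file and that the hierarchy is supplied as slash-separated paths such as `Inbox/Policies`.

```python
import base64
import mailbox
import xml.etree.ElementTree as ET
from email.message import Message

# NOTE: all element names below are hypothetical placeholders,
# not the element names of the Email Account XML Schema.


def message_to_xml(msg: Message) -> ET.Element:
    """Serialize one message: every header, then each leaf MIME part."""
    msg_el = ET.Element("Message")
    headers_el = ET.SubElement(msg_el, "Headers")
    for name, value in msg.items():  # all headers, in original order
        ET.SubElement(headers_el, "Header", name=name).text = str(value)
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        filename = part.get_filename()
        if filename:
            # Named parts are treated as attachments and stored as base64.
            att = ET.SubElement(msg_el, "Attachment", filename=filename,
                                contentType=part.get_content_type())
            att.text = base64.b64encode(payload).decode("ascii")
        else:
            charset = part.get_content_charset() or "utf-8"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset label: fall back to UTF-8
                text = payload.decode("utf-8", errors="replace")
            ET.SubElement(msg_el, "Body",
                          contentType=part.get_content_type()).text = text
    return msg_el


def account_to_xml(account_name: str, folders: dict[str, str]) -> ET.Element:
    """Build one XML tree for an account.

    `folders` maps a slash-separated folder path (e.g. "Inbox/Policies")
    to the MBOX file exported for that folder (an assumed input layout).
    """
    root = ET.Element("Account", name=account_name)
    folder_els: dict[str, ET.Element] = {}
    for path in sorted(folders):  # deterministic order; parents sort first
        parent = root
        parts = path.split("/")
        # Create any missing ancestor folders so nesting is preserved.
        for depth in range(1, len(parts) + 1):
            key = "/".join(parts[:depth])
            if key not in folder_els:
                folder_els[key] = ET.SubElement(parent, "Folder",
                                                name=parts[depth - 1])
            parent = folder_els[key]
        box = mailbox.mbox(folders[path], create=False)
        try:
            for msg in box:
                parent.append(message_to_xml(msg))
        finally:
            box.close()
    return root
```

Writing the result to a single file is then one call, e.g. `ET.ElementTree(account_to_xml("archivist", {"Inbox/Policies": "policies.mbox"})).write("account.xml", encoding="utf-8")`. Storing attachments as base64 inside the same document mirrors the "single XML file" approach described above; a production tool would also need to handle malformed messages and characters that are not legal in XML.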
The parser runs on a workstation in a virtual machine environment compatible with Windows, Macintosh, Linux, and some Unix platforms. CERP testing was limited to the Windows XP environment. The CERP Email Parser is licensed as open source software so that it may be used, supported, and enhanced by all organizations that adopt it.
The Email Parser is designed to preserve bodies of email, such as an entire account, without requiring access to the original email system. Email accounts from active email systems can also be preserved with this tool. The CERP Email Parser will be featured in the pre-conference workshop “Achieving Email Account Preservation With XML” at the Society of American Archivists 2009 Annual Meeting this August.
For more information and to download the parser, visit http://siarchives.si.edu/cerp/parserdownload.htm. For more on the Collaborative Electronic Records Project, visit http://siarchives.si.edu/cerp/. Please direct email inquiries to FerranteR@si.edu.
Riccardo Ferrante
IT Archivist and Electronic Records Program Director
Smithsonian Institution Archives
600 Maryland Ave SW MRC 507
Washington, DC 20013-7012
jhagmann - 20. Jul, 09:50