TEI as an archival format - Aix-Marseille Université
Conference paper, Year: 2013

TEI as an archival format

Abstract

The adoption of the TEI as a common storage format for digital resources in the Humanities has many consequences for those wishing to interchange, integrate, or process such resources. The TEI community is highly diverse, but there is a general feeling that all of its members share an understanding of the best way to use the TEI Guidelines, and that those Guidelines express a common understanding of how text formats should be documented and defined. There is also (usually) a general willingness to make resources encoded according to the TEI Guidelines available in that format, as well as in whatever other publishing or distribution format has been adopted by the project. The question arises as to whether such TEI-encoded resources are also suitable for long-term preservation: more specifically, if a project wishes to ensure long-term preservation of its resources, should it archive them in a TEI format? And if so, what other components (schema files, stylesheets, etc.) should accompany the primary resource files when submitting them for long-term preservation in a digital archive? TEI-encoded resources typically contain mostly XML-encoded text, possibly with links to files expressed using other commonly encountered web formats for graphics or audio; is there any advantage to be gained in treating them differently from any other such XML-encoded resource? This is not an entirely theoretical question: as more and more digitization projects seek to go beyond simply archiving digital page images, the quantity of richly encoded TEI XML resources representing primary print or manuscript sources continues to increase.
In France alone, we may cite projects such as the ATILF, OpenEditions, BVH, BFM, Obvil and many more, for all of which the TEI format is likely to be seen as the basic storage format, enabling the project to provide a usefully organised structural representation of the texts, either to complement the digital page images, or even to replace them for such purposes as the production of open online editions. When such resources are deposited in a digital archive, how should the archivist ensure that they are valid TEI and will continue to be usable? One possibility might be to require that such resources first be converted to some other commonly recognised display format such as PDF or XHTML; and indeed for projects where the TEI form is considered only as a means to the end of displaying the texts, this may well be adequate. But since TEI to HTML and TEI to PDF are lossy transformations, in which the added value constituted by TEI structural annotation is systematically removed, this seems to us in general a less than desirable solution. We would like to be able to preserve our digital resources without loss of information, so as to facilitate future use of that information by means of technologies not yet in existence. Such data-independence was, after all, one of the promises XML (and before it SGML) offered.
Main file
TEI as an archival format.pdf (56.8 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02153026, version 1 (13-06-2019)

License

Attribution

Identifiers

  • HAL Id: hal-02153026, version 1

Cite

Lou Burnard, Nicolas Larrousse. TEI as an archival format. TEI (Text Encoding in the Web) Conference, Oct 2013, Rome, Italy. ⟨hal-02153026⟩