About this Dictionary
This electronic Dictionary of the Irish Language (eDIL) is a digital edition of the complete contents of the Dictionary of the Irish Language based mainly on Old and Middle Irish materials. Publication of the Royal Irish Academy's Dictionary began in 1913 with the appearance of the first fascicle (D-degóir) under the editorship of Carl Marstrander. The next fascicle (E) did not appear until 1932, and in 1936 the Academy moved to expedite publication under the revised title, Contributions to a Dictionary of the Irish Language. However, work on several other fascicles was then already at an advanced stage and, despite the changes in title and format (with 86 lines to the column rather than 65 as in the original scheme), the Contributions closely followed the original plan. Subsequent fascicles appeared at more or less regular intervals thereafter, and the Dictionary was completed under the general editorship of E.G. Quin with the publication of H in 1976. In all, the Dictionary comprises 2,525 pages in 23 fascicles and approximately 35,000 entries.
Despite justified criticism, the Dictionary has been an invaluable tool to scholars and students since its publication began, and it is the most comprehensive and ambitious dictionary of the Irish language ever compiled. There can be no doubt that, were it not for DIL, Irish textual scholarship would now be in a much more parlous state than it currently is. The decision to include supporting citations has provided a sound scholarly foundation for each entry, while simultaneously supplying the tools for its own critical evaluation and future development.
The difficulties in using the paper edition are widely recognised. It contains many inconsistencies and inaccuracies (some fascicles more than others); even headwords are not consistently rendered, and cross-referencing is less than full. It is the result of the work of generations of scholars and this reveals itself in varying editorial approaches from fascicle to fascicle. These editorial problems are compounded by the huge chronological span covered by the Dictionary (over one thousand years), the variations in spelling in the sources, the complexity of the grammar of the language and its impact on word forms, and the lack of adequate textual editions from which to work. It would have been desirable to arrange forms and senses chronologically, thereby illustrating the historical development of the lexicon, but the problem of dating Irish texts was, and remains, huge, and the editors were no doubt correct in avoiding this hurdle.
This digital edition will ameliorate many of these problems and for the first time users will be able to make complex searches of discrete data types such as translations, citations, grammatical descriptions and sources. It is hoped that the completed work will be of use to a wide range of students and scholars interested in medieval Ireland including linguists, historians, archaeologists, and geographers, as well as those working in Modern Irish. Students and non-specialists will find it a considerable advantage to be able to find the meaning of a word they encounter in a text without having to necessarily know beforehand which headword it will be found under. Editors will be able to search for matches for words which are only partially legible in the manuscript; linguists will be able instantly to compile lists of particular forms of words.
Nevertheless, the digital edition is somewhat restricted by the format of the original hard copy, as our concern was with producing a searchable DIL rather than a revised edition. Thus, if DIL is inconsistent in its treatment of headwords, so is eDIL (although the fact that the user can now search the whole dictionary for a particular spelling and employ fuzzy searching helps considerably). Similarly, definitions, senses, division of entries, etc. are as in the original dictionary and whatever errors occur there will be repeated here. Nor have we been able to address the problem of chronology of forms or senses. Some things have been improved. Cross-references, which often lead to a dead end or are vague in the hard copy, are corrected here. Parts of speech, usually absent in DIL, have been added throughout. While we have not attempted to verify external references, the automatic linking to CELT's corpus of texts will be of considerable use in tracking down forms.
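The two kinds of search mentioned above, matching a partially legible word and fuzzy matching of a spelling, can be illustrated with a minimal sketch. It uses only the Python standard library and a four-word list chosen for illustration; the function names are hypothetical, and nothing here describes how the eDIL search engine is actually implemented.

```python
import difflib
from fnmatch import fnmatchcase

# Tiny illustrative word list; the real Dictionary has c. 35,000 entries.
HEADWORDS = ["ben", "ech", "fer", "mac"]

def wildcard_search(pattern, words):
    """Match a partially legible word: '?' stands for one illegible
    letter, '*' for any run of letters."""
    return [w for w in words if fnmatchcase(w, pattern)]

def fuzzy_search(form, words, cutoff=0.6):
    """Return words whose spelling is close to `form`, best match first."""
    return difflib.get_close_matches(form, words, n=5, cutoff=cutoff)

print(wildcard_search("e?h", HEADWORDS))  # middle letter illegible
print(fuzzy_search("eich", HEADWORDS))    # spelling differs from the headword
```

Both calls return the single headword "ech" from this list. The similarity cutoff of 0.6 is simply the library default and would need tuning against real data.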
The eDIL team is now beginning the task of revising the content of the Dictionary itself. This project will provide a supplement to the Dictionary based on lexicographical work published since 1932 (when the second fascicle of DIL appeared). This research will be integrated into eDIL and published as the work progresses, and we hope to publish it as a supplement to DIL when the project is completed in 2012.
The text of eDIL is identical to that of the Academy's Dictionary, except that obvious errors have been corrected where possible and that the Additions and Corrections for the letters A-C and F have been incorporated. The original format of the Dictionary has been preserved throughout, and the original column and line numbers have been retained so as to allow references given in this form to be located in the electronic version.
In order to permit meaningful searches of the Dictionary, the digital text has been marked up in Extensible Mark-up Language (XML) following the guidelines of the Text Encoding Initiative (TEI) for Print Dictionaries. Adherence to TEI guidelines is intended to ensure non-dependence on proprietary software so that the Dictionary will remain accessible in the future regardless of technological developments.
The following discrete data types were identified and tagged accordingly:
Parts of speech are not routinely given in DIL and these have been added where they can be determined. It would have been impractical to check back to the original sources to ascertain the part of speech, so we have followed the internal evidence of each entry and indications given by the editors of DIL. In most cases, there is little doubt about a part of speech but we have often had to rely on the context provided by one or two citations. Where we have been unable to determine the part of speech with any certainty, for example where the citations provide no diagnostic evidence, the part of speech is given as indeterminate.
We have generally distinguished between definitions of headwords (including sub-senses) and translations of citations given in the text as examples. Users wishing to find the medieval Irish equivalent of an English word can, therefore, search through definitions alone, as this will lead them to the equivalent headword. Searching on the translation will not produce a direct equivalent of the search term but will enable the user to consider a wider range of equivalents from among the Dictionary's many citations. The single exception to this approach is in those entries which provide no formal definition or meaning for the headword; rather, the meaning of the headword is to be inferred from the translation of citations. In such cases, we have selected a word, words or phrase from the translation to stand for a definition. Where no translation or definition appears in the printed Dictionary none is offered here.
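The practical effect of tagging definitions and translations separately can be shown with a small sketch. The element names below are loosely modelled on the TEI dictionary module, but the actual eDIL tag set is not reproduced here and may differ; the element name `trans`, the function name and the two sample entries (including their citations) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample entries; not taken from the Dictionary.
SAMPLE = """<body>
  <entry>
    <form><orth>ech</orth></form>
    <gramGrp><pos>noun</pos></gramGrp>
    <sense>
      <def>horse</def>
      <cit><quote>ech dub</quote><trans>a black horse</trans></cit>
    </sense>
  </entry>
  <entry>
    <form><orth>fer</orth></form>
    <gramGrp><pos>noun</pos></gramGrp>
    <sense>
      <def>man</def>
      <cit><quote>fer maith</quote><trans>a good man</trans></cit>
    </sense>
  </entry>
</body>"""

ROOT = ET.fromstring(SAMPLE)

def headwords_matching(root, term, field):
    """Return headwords of entries in which any `field` element
    ('def' or 'trans') contains `term`, ignoring case."""
    term = term.lower()
    hits = []
    for entry in root.iter("entry"):
        texts = (el.text or "" for el in entry.iter(field))
        if any(term in text.lower() for text in texts):
            hits.append(entry.findtext("form/orth"))
    return hits

print(headwords_matching(ROOT, "horse", "def"))    # search definitions only
print(headwords_matching(ROOT, "black", "def"))    # no definition contains it
print(headwords_matching(ROOT, "black", "trans"))  # found among translations
```

Searching definitions for "horse" leads directly to the headword, whereas "black" is found only when the search is widened to translations of citations.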
History of the Project
Funding for the digitisation was provided by the. This award allowed us to commission outside contractors to capture the text and build the search engine, and to employ two full-time research associates, Dr Maxim Fomin (2003-07) and Dr Tom Torma (2003-05), the latter being succeeded by Dr Grigory Bondarenko (2006-07). The Royal Irish Academy generously gave permission to digitally capture the text of the Dictionary of the Irish Language and copyright of the original text resides with it.
The text of the Dictionary was digitally captured by, an external agency with expertise in this area. The text was both scanned and triple-keyed (that is, typed in full by three separate typists). The outputs of the three typists were compared with one another and with the scanned version, and any discrepancies were flagged for further attention. This method, which is commonly used for capturing legal documents, produced an accuracy rate of 99.992%, that is, fewer than one error in every 10,000 characters. Many remaining errors, including errors in the original Dictionary, were corrected during the subsequent mark-up stage. During final editing, the text was digitally compared with the original captured text to ensure that no additional errors had unintentionally been introduced during the mark-up phase.
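The comparison step and the accuracy figure can be illustrated with a short sketch. It assumes, for simplicity, that the keyed and scanned versions are already aligned character by character; how the agency actually aligned and compared its files is not described here, and the function names are hypothetical.

```python
def flag_discrepancies(versions):
    """Return (position, characters) for every position at which the
    versions disagree. Assumes the versions are already aligned."""
    length = min(len(v) for v in versions)
    flagged = []
    for i in range(length):
        chars = [v[i] for v in versions]
        if len(set(chars)) > 1:
            flagged.append((i, chars))
    return flagged

def errors_per_10k(accuracy_percent):
    """Convert an accuracy percentage into expected errors per 10,000 characters."""
    return (100.0 - accuracy_percent) / 100.0 * 10_000

# Three keyed versions plus the scan; the third typist has mis-keyed one letter.
versions = ["ech", "ech", "eoh", "ech"]
print(flag_discrepancies(versions))
print(round(errors_per_10k(99.992), 3))
```

An accuracy rate of 99.992% corresponds to about 0.8 errors per 10,000 characters, which is consistent with the figure of fewer than one error in every 10,000.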
A structural analysis of DIL revealed that typefaces were used in a sufficiently consistent manner as to allow automatic XML tagging of a significant portion of the text. Formatting and structural layout of the hard copy, including fonts, line breaks, and column and line numbers were coded as HTML tags during the capture phase so that these could be used to automate some of the generation of more meaningful XML tags. Bold print is used in the hard copy for headwords and to mark section letters/numbers, so it was a relatively straightforward task to convert the HTML tag for bold to the XML tag for headword once the section markers had been converted. Similarly, italic is used in the hard copy almost exclusively for definitions, translations and lemmas, so once the definitions had been marked manually, we were able to tag translations and lemmas automatically. A certain degree of manual manipulation was then required, for example, to expand those headwords which are partially contracted in the hard copy. Much of the grammatical information (gender, stem, case, person, number, tense, mood etc.) was automatically tagged at this stage, and again visual inspection and manual correction were often required.
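The two-step conversion of bold type described above can be sketched as follows. The tag names `section` and `headword`, and the pattern used to recognise section markers, are assumptions made for this example; the project's own conversion scripts and tag names are not given here.

```python
import re

# Assumed shapes of section markers: Roman numerals, digits, or a
# bracketed letter such as (a), optionally followed by a full stop.
SECTION_MARKER = r"(?:[IVX]+|\d+|\([a-z]\))\.?"

def tag_section_markers(text):
    """Step 1: bold text that looks like a section marker becomes <section>."""
    return re.sub(rf"<b>({SECTION_MARKER})</b>", r"<section>\1</section>", text)

def tag_headwords(text):
    """Step 2: any bold text still remaining is treated as a headword."""
    return re.sub(r"<b>(.*?)</b>", r"<headword>\1</headword>", text)

captured = "<b>ech</b> <b>I</b> <i>horse</i>"
print(tag_headwords(tag_section_markers(captured)))
```

The order matters: if headwords were tagged first, section markers would be swept up with them, which is why the section markers are converted before the remaining bold text. Italic text is left untouched here because, as described above, definitions had to be marked manually before the other italic categories could be tagged automatically.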
During the second phase of the project, Old Irish citations and variants of the headword in the body of the entry were manually tagged, along with definitions. Parts of speech, largely absent from the original, were also added at this stage.
In the third phase, translations of citations were tagged as described above and the language of the text was marked where this deviated from the norm (Irish for citations, English for translations, and Latin for lemmas). In order to allow searches on source and to enable the generation of links to the CELT corpus, sources and accompanying page references had to be marked appropriately. This proved to be a more difficult task to automate than might appear at first because of the considerable inconsistency of the abbreviations used in DIL, the bewildering array of possible formats of the page/line references, and the breaking of abbreviation and reference over two lines. Nevertheless, Julianne Nyhan of CELT was able to use pattern matching to successfully identify in excess of 95% of references and tag them appropriately. Her program also tagged orphans, that is, page references for which the source was elided, and linked them to the previous title through a unique identifier. The remaining references and sources were tagged manually. The inconsistency of the abbreviations has been allowed to stand in eDIL, but each source has also been assigned a standardised abbreviation which is read by the search engine but is not visible to the user.
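The general idea of matching references, normalising abbreviations and linking orphans can be sketched as follows. This is not the program used in the project: the abbreviation table, the standardised identifiers, the single reference format (folio, column letter, line) and the function name are all assumptions chosen to keep the example short.

```python
import re

# Hypothetical table mapping printed variants of an abbreviation
# to one standardised identifier.
STANDARD = {"Wb.": "WB", "Wb": "WB", "Ml.": "ML", "Ml": "ML"}

# Longest variants first, so that "Wb." is tried before "Wb".
_abbr = "|".join(re.escape(a) for a in sorted(STANDARD, key=len, reverse=True))

# An optional abbreviation followed by a reference such as 5b11.
# \s* also allows the pair to be broken across two lines.
REF = re.compile(rf"(?:(?P<abbr>{_abbr})\s*)?(?P<ref>\d+[a-d]\d+)")

def tag_references(text):
    """Return one record per reference; a reference with no source of
    its own (an orphan) is linked to the most recent source."""
    tagged, current = [], None
    for m in REF.finditer(text):
        orphan = m.group("abbr") is None
        if not orphan:
            current = STANDARD[m.group("abbr")]
        tagged.append({"source": current, "ref": m.group("ref"), "orphan": orphan})
    return tagged

for record in tag_references("Wb. 5b11 , 12c4 . Ml\n23c5"):
    print(record)
```

In this sample the second reference has no abbreviation of its own and is therefore attributed to the preceding source, while the third is recognised even though its abbreviation lacks the full stop and is separated from the reference by a line break.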
In the final phase of the project, outstanding problems were addressed and both the data and the mark-up were systematically checked. Each file was digitally compared to the original files to ensure that no errors had crept in during the mark-up phases. Common mark-up and layout errors were identified by visual inspection and corrected. Some mark-up errors undoubtedly remain, but they are of limited extent and significance. For example, where two citations appear close together without an intervening reference, they have sometimes been inadvertently treated as a single citation. Variant or inflected forms of the headword are supposed to be marked where they appear in the grammatical section of an entry, but they have sometimes been overlooked, and occasionally a word other than a variant has been incorrectly tagged as a variant. In the current interface, words marked as variants are extracted from the body of the text and displayed in a list in the left-hand column; users should be aware that these lists may not be complete and may occasionally contain forms that are not variants of the headword. Nevertheless, we considered it useful to display the variants in this way. It is hoped that these errors can be corrected in a future edition. The marked variants are also used in searches on Irish words to prioritize the list of search results: hits are sorted in priority order of headwords, marked variants, and finally any other occurrences in citations. Existing errors will have a minimal effect on this search function.
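The priority ordering of search results described above can be sketched in a few lines. The data structure, the function name and the placeholder strings below are hypothetical and are not real Irish forms; the sketch shows only the ordering rule, not the search engine itself.

```python
def rank_hits(term, entries):
    """Return entries containing `term`, ordered by where it occurs:
    headword first, then marked variants, then other occurrences
    in citations."""
    def priority(entry):
        if term == entry["headword"]:
            return 0
        if term in entry["variants"]:
            return 1
        return 2

    def contains(entry):
        in_citations = any(term in c.split() for c in entry["citations"])
        return term == entry["headword"] or term in entry["variants"] or in_citations

    # sorted() is stable, so entries of equal priority keep their original order.
    return sorted((e for e in entries if contains(e)), key=priority)

# Placeholder data, not real Irish forms.
entries = [
    {"headword": "gamma", "variants": [], "citations": ["... beta ..."]},
    {"headword": "alpha", "variants": ["beta"], "citations": []},
    {"headword": "beta", "variants": [], "citations": []},
    {"headword": "delta", "variants": [], "citations": []},
]
print([e["headword"] for e in rank_hits("beta", entries)])
```

The entry whose headword matches comes first, followed by the entry with a matching marked variant and then the entry in which the term occurs only in a citation. A variant that was overlooked during mark-up would still be found, but only at the lowest priority, which is why such errors have a limited effect on searching.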
The interface was designed byin late 2006. Programming of the search engine and interface was carried out by between December 2006 and May 2007.
Gregory Toner, May 2007.