Journal of English Linguistics

 

Journal of English Linguistics, Vol. 32, No. 2, 124-143 (2004)
DOI: 10.1177/0075424204265944

Developing Methods for Very-Large-Scale Searches in Proquest Historical Newspapers Collection and Infotrac the Times Digital Archive: The Case of Two Million Versus Two Millions

Donald S. MacQueen

Uppsala University

Historical corpora designed for linguistic research are often too small to provide statistically robust information about infrequent items. Alternative sources exist in the form of historical collections available online, but these databases may present methodological problems. Some of these problems can be circumvented, and useful results can be gleaned, including a proxy for incidence. In studies on the integration of the word million into the English system of number words, based on billions of words from historical newspapers, it was possible to determine that parity was reached between obsolescent (two millions) and Present-Day (two million) forms in American papers around 1880 and in The Times around 1920. The explosive growth in the use of million proved to start with WWII in the U.S. and in the 1950s in the U.K. This information could not be teased from a 20-million-word ‘megacorpus’ of commonly used diachronic and synthetic corpora designed by linguists.

Key Words: historical corpus linguistics • number words • s-inflection • newspaper language • American-British variation • electronic sources

