Cognate Linguistics


Free Resources, Stats, and Much More


Students' collaborative class writing

° English Cognate Readers: "Eddie Beneti" (sample)
° Spanish Cognate Readers: "Eddie Beneti" (sample)

Project's Spanish Corpus

° Ruben Moran's Spanish Corpus

Cognates in the News 2015

Aren't newspapers difficult reading for language learners, especially at an introductory level? Not from a cognate perspective.

   

In Cognate Linguistics we claim that "even in the most disadvantageous cases, cognates represent at least 25% of the unique English written words met by Romance language speakers, and vice versa, when exposed to modern language."

That 25% is a true minimum. In the statistical data below, for example, cognates represent an average of more than 30% of both total and unique words in the texts.

Keep in mind, too, that we should not assume English language learners understand only the cognates in a text and know no other English words. If we added to these statistics just the four most frequent non-cognate words (the, to, of, a), the comprehension range would increase by an additional 10%.

Here we present screenshots and plain text of the front pages of news websites, all retrieved on January 2, 2015. The data was processed using Prof. Paul Nation's statistical software Frequency (version 1.40) and Range (version 1.32). Wordlists may differ by up to 4 words because of items (usually symbols or isolated letters) that the software may either count or omit; this number of words is immaterial to the statistical results.
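As an illustration of what these statistics measure, here is a minimal Python sketch of a cognate share computed over total words (tokens) and over unique words. It is not the Frequency or Range software. The function name `cognate_coverage`, the tokenizing rule, the sample sentence, and the tiny cognate list are all assumptions made for the example. The optional `extra_known` argument mimics the effect of adding a few frequent non-cognate words such as the, to, of, a.

```python
import re
from collections import Counter


def cognate_coverage(text, cognates, extra_known=frozenset()):
    """Return (share of total words, share of unique words) counted as known."""
    # Assumed tokenizing rule: lowercase alphabetic runs, optional inner apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0, 0.0
    counts = Counter(tokens)
    known = set(cognates) | set(extra_known)
    known_tokens = sum(n for word, n in counts.items() if word in known)
    known_types = sum(1 for word in counts if word in known)
    return known_tokens / len(tokens), known_types / len(counts)


# Hypothetical sample sentence and cognate list, for illustration only.
sample = "The president announced a new economic plan to the nation"
cognates = {"president", "announced", "economic", "plan", "nation"}

total_share, unique_share = cognate_coverage(sample, cognates)
with_function_words = cognate_coverage(sample, cognates, {"the", "to", "of", "a"})

print(f"Cognates only: {total_share:.0%} of total, {unique_share:.0%} of unique words")
print(f"Plus the/to/of/a: {with_function_words[0]:.0%} of total words")
```

The two shares differ because a frequent word counts once among unique words but many times among total words, which is why very common function words move the total-word figure so quickly.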

The New York Times

Cover image  |  Plain text  |  Highlighted Text  |  Analysis  |  Unique words in frequency order


Measure            Cognates      Remainder      Total
Number of words    753/34.54%    1427/65.46%    2180
Unique words       420/41.50%    592/58.50%     1012


The Washington Post

Cover image  |  Plain text  |  Highlighted Text  |  Analysis  |  Unique words in frequency order


Measure            Cognates      Remainder      Total
Number of words    610/30.42%    1395/69.58%    2005
Unique words       374/36.45%    652/63.55%     1026


The Wall Street Journal

Cover image  |  Plain text  |  Highlighted Text  |  Analysis  |  Unique words in frequency order


Measure            Cognates      Remainder      Total
Number of words    538/29.32%    1297/70.68%    1835
Unique words       337/34.11%    651/65.89%     988


CNN

Cover image  |  Plain text  |  Highlighted Text  |  Analysis  |  Unique words in frequency order


Measure            Cognates      Remainder      Total
Number of words    410/33.63%    809/66.37%     1219
Unique words       259/35.87%    463/64.13%     722


The Guardian

Cover image  |  Plain text  |  Highlighted Text  |  Analysis  |  Unique words in frequency order


Measure            Cognates      Remainder      Total
Number of words    907/32.81%    1857/67.19%    2764
Unique words       501/37.95%    819/62.05%     1320


BBC

Cover image  |  Plain text  |  Highlighted Text  |  Analysis  |  Unique words in frequency order


Measure            Cognates      Remainder      Total
Number of words    380/35.78%    682/64.22%     1062
Unique words       255/38.87%    401/61.13%     656


Financial Times

Cover image  |  Plain text  |  Highlighted Text  |  Analysis  |  Unique words in frequency order


Measure            Cognates      Remainder      Total
Number of words    658/36.27%    1156/63.73%    1814
Unique words       379/37.52%    631/62.48%     1010


The Australian

Cover image  |  Plain text  |  Highlighted Text  |  Analysis  |  Unique words in frequency order


Measure            Cognates      Remainder      Total
Number of words    730/31.70%    1573/68.30%    2303
Unique words       411/35.43%    749/64.57%     1160


News 24 - South Africa

Cover image  |  Plain text  |  Highlighted Text  |  Analysis  |  Unique words in frequency order


Measure            Cognates      Remainder      Total
Number of words    913/25.36%    2687/74.64%    3600
Unique words       488/31.83%    1045/68.17%    1533


Shanghai Daily

Cover image  |  Plain text  |  Highlighted Text  |  Analysis  |  Unique words in frequency order


Measure            Cognates      Remainder      Total
Number of words    591/33.50%    1173/66.50%    1764
Unique words       327/36.87%    560/63.13%     887
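
The per-site figures above can be aggregated with a few lines of Python. This is a sketch for checking the averages, not part of the original analysis. The counts are copied from the ten tables on this page; the variable names and the choice of an unweighted mean across sites are assumptions.

```python
# (site, cognate words, total words, unique cognates, unique words),
# copied from the ten tables on this page.
sites = [
    ("The New York Times", 753, 2180, 420, 1012),
    ("The Washington Post", 610, 2005, 374, 1026),
    ("The Wall Street Journal", 538, 1835, 337, 988),
    ("CNN", 410, 1219, 259, 722),
    ("The Guardian", 907, 2764, 501, 1320),
    ("BBC", 380, 1062, 255, 656),
    ("Financial Times", 658, 1814, 379, 1010),
    ("The Australian", 730, 2303, 411, 1160),
    ("News 24 - South Africa", 913, 3600, 488, 1533),
    ("Shanghai Daily", 591, 1764, 327, 887),
]


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Average the per-site percentages (each site weighs the same).
token_mean = mean(100 * cog / total for _, cog, total, _, _ in sites)
unique_mean = mean(100 * ucog / uniq for _, _, _, ucog, uniq in sites)

# Site with the lowest cognate share of total words.
lowest = min(sites, key=lambda row: row[1] / row[2])

print(f"Mean cognate share of total words:  {token_mean:.1f}%")
print(f"Mean cognate share of unique words: {unique_mean:.1f}%")
print(f"Lowest total-word share: {lowest[0]} ({100 * lowest[1] / lowest[2]:.2f}%)")
```

With these figures both means come out somewhat above 30%, and even the lowest site stays above the 25% minimum.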


Cognate Linguistics: Research's Support Documentation

(Cognates & False Cognates Database 2007 - 2010)

Here we present some evidence of English and Spanish cognates drawn from unique-word analysis. As Table 2.1 shows, texts written before the 1900s sometimes fall below the usual minimum of 25% cognates in unique written words.

Wordlists

° GSL - The General Service List (West, 1953)
   Words: 2284. Cognates: 736 (32%). False cognates: 158 (7%)
° AWL - The Academic Wordlist (Coxhead, 2000)
   Words: 570. Cognates: 405 (71%). False cognates: 41 (7%)
° Brown 5000 - The first 5000 most frequently used words in The Brown Corpus of Standard
   American English (Francis and Kucera, 1964).
   Words: 5000. Cognates: 2207 (44%). False cognates: 184 (4%)
° Ogden's Basic English Word List (Ogden, 1930)
   Words: 850. Cognates: 245 (29%). False cognates: 49 (6%)
° CIDE Defining Vocabulary (Cambridge International Dictionary of English, 1995)
   Words: 3732. Cognates: 1375 (37%). False cognates: 148 (4%)
° Oxford 3000 Word List (Oxford University Press)
   Words: 3457. Cognates: 1441 (42%). False cognates: 207 (6%)
° Oxford Business - Oxford University Press' Business and Finance Words
   Words: 270. Cognates: 150 (55%). False cognates: 9 (3%)
° Brian Kelk's UK English Word List (First 20,000 MFW)
   Words: 20833. Cognates: 8316 (40%). False cognates: 431 (2%)
° Voice of America's Special English Word List
   Words: 1477. Cognates: 584 (39%). False cognates: 98 (7%)

News 2007 - Highlighted Web Covers

° El Telegrafo, Ecuador (Spanish cognates)
° El Comercio, Ecuador (Spanish cognates)
° CNN Español, USA (Spanish cognates)
° BBC Español, USA (Spanish cognates)
° Expresso, Portugal (Portuguese cognates)
° La Repubblica, Italy (Italian cognates)
° Le Monde, France (French cognates)
° LeTemps, Switzerland (French cognates)
° Jurnalul, Romania (Romanian cognates)

Business English

° From Deloitte's International Tax and Business Guides - Australia's Guide 2007: 41%

Vocabulary for admission to higher education in the USA

° TOEFL vocabulary practice, from supervoca.net: 44%
° SAT vocabulary practice, from freevocabulary.com: 46%
° GRE vocabulary practice, from supervoca.net: 43%

Miscellaneous

° From The Australian Constitution: 46.5%
° From the speech "I Have a Dream", by Martin Luther King, Jr.: 29.3% (spoken English)
° Frankenstein (full novel highlighted, 1.27 MB)
° Frankenstein (unique words)
° President Obama's Inaugural Speech
° From the headwords of the Taboo board game, by Hasbro, 1989 version: 37.6%
° From the headwords of the Taboo board game, by Hasbro, 2000 version: 35.5%

Table 2.1 Cognates in British and American Classics: unique word analysis

  Title, year                               Total tokens  Unique words  Cognates
  By Charles Dickens
  A Christmas Carol, 1843                   28956         4392          989 (22.5%)
  David Copperfield, 1850                   361758        14933         3894 (26.1%)
  By Mark Twain
  The Adventures of Huckleberry Finn, 1885  114766        7092          868 (12.2%)
  The Adventures of Tom Sawyer, 1876        72240         8006          1732 (21.6%)
  The Prince & The Pauper, 1882             70081         8667          1947 (22.5%)
  By Jane Austen
  Emma, 1816                                160364        8504          2312 (27.2%)
  Pride & Prejudice, 1813                   122234        6658          2153 (32.3%)
  Sense & Sensibility, 1896                 122850        7091          2197 (31.0%)
  By Oscar Wilde
  The Canterville Ghost, 1906               11699         2534          679 (26.8%)
  The Importance of Being Earnest, 1895     20916         2646          876 (33.1%)
  The Picture of Dorian Gray, 1891          79678         7226          1932 (26.7%)
  By Shakespeare
  Hamlet, 1603                              32637         4635          1029 (22.2%)
  Macbeth, 1606                             19130         3311          700 (21.1%)
  Romeo and Juliet, 1595                    26864         3722          671 (18.0%)
  By M. W. Shelley
  Frankenstein, 1818                        75024         7124          2435 (34.2%)
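
The percentages in Table 2.1 are cognates divided by unique words. The short Python sketch below recomputes them and lists the titles that fall below the 25% minimum. It assumes each row of the table reads as year, total tokens, unique words, cognates; the names `classics` and `THRESHOLD` are introduced only for this example.

```python
# (title, year as printed in Table 2.1, unique words, cognates)
classics = [
    ("A Christmas Carol", 1843, 4392, 989),
    ("David Copperfield", 1850, 14933, 3894),
    ("The Adventures of Huckleberry Finn", 1885, 7092, 868),
    ("The Adventures of Tom Sawyer", 1876, 8006, 1732),
    ("The Prince & The Pauper", 1882, 8667, 1947),
    ("Emma", 1816, 8504, 2312),
    ("Pride & Prejudice", 1813, 6658, 2153),
    ("Sense & Sensibility", 1896, 7091, 2197),
    ("The Canterville Ghost", 1906, 2534, 679),
    ("The Importance of Being Earnest", 1895, 2646, 876),
    ("The Picture of Dorian Gray", 1891, 7226, 1932),
    ("Hamlet", 1603, 4635, 1029),
    ("Macbeth", 1606, 3311, 700),
    ("Romeo and Juliet", 1595, 3722, 671),
    ("Frankenstein", 1818, 7124, 2435),
]

THRESHOLD = 25.0  # the minimum cognate share claimed for modern language

# Recompute each share and keep the titles below the threshold.
below = [
    (title, year, round(100 * cognates / unique, 1))
    for title, year, unique, cognates in classics
    if 100 * cognates / unique < THRESHOLD
]

for title, year, share in below:
    print(f"{title} ({year}): {share}%")
print(f"{len(below)} of {len(classics)} titles fall below {THRESHOLD:.0f}%")
```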