Content
TNC is a 50-million-word corpus consisting of written texts (98%) across a wide variety of genres, covering a period of 20 years (1990-2009). The remaining 2% of TNC consists of transcribed spoken data. The distribution of the number of words in the corpus is determined proportionally for each text domain, time period and medium.
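A quick worked check of the 98%/2% split. This assumes the written total from the tables below (49,664,303 words) and the speaker-level spoken total (1,053,532 words); the text does not say which spoken total the stated percentages are based on:

$$
\frac{49{,}664{,}303}{49{,}664{,}303 + 1{,}053{,}532} = \frac{49{,}664{,}303}{50{,}717{,}835} \approx 0.979
$$

On this basis, written text makes up about 97.9 % and speech about 2.1 % of roughly 50.7 million words. Using the text-level spoken total (974,857 words) instead gives about 98.1 % and 1.9 %.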
Written Texts

Domain

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Imaginative prose | 9,290,712 | 18.71 % | 674 | 13.51 % |
2 | Informative: Natural and pure sciences | 1,380,481 | 2.78 % | 255 | 5.11 % |
3 | Informative: Applied science | 3,439,177 | 6.92 % | 461 | 9.24 % |
4 | Informative: Social science | 7,128,807 | 14.35 % | 673 | 13.49 % |
5 | Informative: World affairs | 9,768,340 | 19.67 % | 757 | 15.17 % |
6 | Informative: Commerce and finance | 4,472,310 | 9.01 % | 429 | 8.6 % |
7 | Informative: Arts | 3,633,316 | 7.32 % | 347 | 6.95 % |
8 | Informative: Belief and thought | 2,167,924 | 4.37 % | 297 | 5.95 % |
9 | Informative: Leisure | 8,383,236 | 16.88 % | 1097 | 21.98 % |
Total | | 49,664,303 | | 4990 | |

Gender

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Unknown | 677,918 | 1.37 % | 85 | 1.7 % |
2 | Female | 7,438,319 | 14.98 % | 884 | 17.72 % |
3 | Male | 25,673,465 | 51.69 % | 2506 | 50.22 % |
4 | Mixed | 15,874,601 | 31.96 % | 1515 | 30.36 % |
Total | | 49,664,303 | | 4990 | |

Audience

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Unknown | 4,504 | 0.01 % | 1 | 0.02 % |
2 | Child | 1,347,767 | 2.71 % | 128 | 2.57 % |
3 | Teenager | 1,006,277 | 2.03 % | 70 | 1.4 % |
4 | Adult | 43,750,084 | 88.09 % | 4388 | 87.94 % |
5 | Any | 3,555,671 | 7.16 % | 403 | 8.08 % |
Total | | 49,664,303 | | 4990 | |

Media

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Unknown | 10,397 | 0.02 % | 1 | 0.02 % |
2 | Book | 31,262,290 | 62.95 % | 2145 | 42.99 % |
3 | Periodical | 15,845,281 | 31.9 % | 2094 | 41.96 % |
4 | Miscellaneous: published | 947,824 | 1.91 % | 294 | 5.89 % |
5 | Miscellaneous: unpublished | 1,598,511 | 3.22 % | 456 | 9.14 % |
Total | | 49,664,303 | | 4990 | |

Sample

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Unknown | 2,166,031 | 4.36 % | 108 | 2.16 % |
2 | Whole text | 23,102,208 | 46.52 % | 3237 | 64.87 % |
3 | Beginning sample | 9,790,755 | 19.71 % | 669 | 13.41 % |
4 | Middle sample | 8,279,315 | 16.67 % | 577 | 11.56 % |
5 | End sample | 6,246,245 | 12.58 % | 394 | 7.9 % |
6 | Composite | 79,749 | 0.16 % | 5 | 0.1 % |
Total | | 49,664,303 | | 4990 | |

Register

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Academic prose: Medicine | 714,460 | 1.44 % | 145 | 2.91 % |
2 | Academic prose: Social, behavioral sciences | 2,892,961 | 5.83 % | 432 | 8.66 % |
3 | Academic prose: Humanities/Arts | 2,604,645 | 5.24 % | 354 | 7.09 % |
4 | Academic prose: Natural sciences | 1,236,958 | 2.49 % | 251 | 5.03 % |
5 | Academic prose: Politics, law, education | 3,857,971 | 7.77 % | 587 | 11.76 % |
6 | Academic prose: Technology, computing, engineering | 1,653,909 | 3.33 % | 251 | 5.03 % |
7 | Administrative and regulatory texts, in house use | 155,054 | 0.31 % | 11 | 0.22 % |
8 | Print Advertisements | 22,311 | 0.04 % | 164 | 3.29 % |
9 | Biographies/Autobiographies | 2,372,093 | 4.78 % | 158 | 3.17 % |
10 | Commerce&Finance/Economics | 2,282,709 | 4.6 % | 120 | 2.4 % |
11 | | 31,316 | 0.06 % | 261 | 5.23 % |
12 | School essays | 56,545 | 0.11 % | 10 | 0.2 % |
13 | Essay | 494,747 | 1 % | 99 | 1.98 % |
14 | Excerpts from modern drama scripts | 655,618 | 1.32 % | 63 | 1.26 % |
15 | Single and multiple author collections of poems | 279,984 | 0.56 % | 35 | 0.7 % |
16 | Novels/short stories | 8,271,257 | 16.65 % | 566 | 11.34 % |
17 | Official/governmental documents/leaflets, company annual reports, etc.; excludes Hansard | 594,450 | 1.2 % | 56 | 1.12 % |
18 | Instructional texts | 305,829 | 0.62 % | 29 | 0.58 % |
19 | Personal letters | 105,693 | 0.21 % | 4 | 0.08 % |
20 | Professional/business letters | 20,092 | 0.04 % | 1 | 0.02 % |
21 | Miscellaneous texts | 1,932,821 | 3.89 % | 108 | 2.16 % |
22 | Broadsheet national newspapers: arts/cultural material | 573,701 | 1.16 % | 43 | 0.86 % |
23 | Broadsheet national newspapers: commerce & finance | 759,850 | 1.53 % | 67 | 1.34 % |
24 | Broadsheet national newspapers: miscellaneous material | 545,078 | 1.1 % | 56 | 1.12 % |
25 | Broadsheet national newspapers: science material | 378,432 | 0.76 % | 23 | 0.46 % |
26 | Broadsheet national newspapers: material on lifestyle leisure belief & thought | 1,600,828 | 3.22 % | 114 | 2.28 % |
27 | Broadsheet national newspapers: sports material | 662,518 | 1.33 % | 57 | 1.14 % |
28 | Broadsheet national newspapers: column | 418,734 | 0.84 % | 109 | 2.18 % |
29 | Non-academic: medical/health matters | 99,878 | 0.2 % | 5 | 0.1 % |
30 | Non-academic: social & behavioural sciences | 2,411,122 | 4.85 % | 87 | 1.74 % |
31 | Non-academic/non-fiction: humanities&arts | 2,644,260 | 5.32 % | 156 | 3.13 % |
32 | Non-academic: natural sciences | 93,182 | 0.19 % | 7 | 0.14 % |
33 | Non-academic: politics law education | 4,934,042 | 9.93 % | 247 | 4.95 % |
34 | Non-academic: technology, computing, engineering | 235,169 | 0.47 % | 9 | 0.18 % |
35 | Popular magazines | 667,094 | 1.34 % | 48 | 0.96 % |
36 | Religious texts | 975,833 | 1.96 % | 46 | 0.92 % |
37 | Planned speech, whether dialogue or monologue | 455,194 | 0.92 % | 24 | 0.48 % |
38 | Forum | 468,038 | 0.94 % | 68 | 1.36 % |
39 | Blog | 1,199,927 | 2.42 % | 119 | 2.38 % |
Total | | 49,664,303 | | 4990 | |

Derived Text Type

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Unknown | 52,280 | 0.11 % | 4 | 0.08 % |
2 | Academic prose | 14,447,035 | 29.09 % | 2123 | 42.55 % |
3 | Fiction and verse | 9,260,896 | 18.65 % | 672 | 13.47 % |
4 | Non-academic prose and biography | 12,226,131 | 24.62 % | 763 | 15.29 % |
5 | Newspaper | 4,907,380 | 9.88 % | 465 | 9.32 % |
6 | Other published written material | 7,034,023 | 14.16 % | 508 | 10.18 % |
7 | Unpublished written material | 1,736,558 | 3.5 % | 455 | 9.12 % |
Total | | 49,664,303 | | 4990 | |

Type(s) of Author

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Corporate | 6,651,998 | 13.39 % | 495 | 9.92 % |
2 | Multiple | 9,336,415 | 18.8 % | 1028 | 20.6 % |
3 | Sole | 33,675,890 | 67.81 % | 3467 | 69.48 % |
Total | | 49,664,303 | | 4990 | |
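The % columns appear to give each category's share of the table's total, rounded to two decimal places (for example, 495 of 4990 documents is 9.92 %). As a minimal sketch under that assumption, the Python snippet below recomputes both percentage columns for the Type(s) of Author table and raises an error if the rows do not add up to the Total row. The `shares` helper and the `Counts` alias are purely illustrative and are not part of any TNC tool.

```python
from typing import Dict, Tuple

# Each category maps to (number of words, number of documents).
Counts = Dict[str, Tuple[int, int]]


def shares(rows: Counts, total_words: int, total_docs: int) -> Dict[str, Tuple[float, float]]:
    """Return (% of words, % of documents) per category, rounded to two decimals."""
    word_sum = sum(words for words, _ in rows.values())
    doc_sum = sum(docs for _, docs in rows.values())
    # If the rows do not add up to the Total row, a count was probably
    # mistyped or some texts are not assigned to any category.
    if word_sum != total_words or doc_sum != total_docs:
        raise ValueError(
            f"rows sum to {word_sum} words / {doc_sum} documents, "
            f"expected {total_words} / {total_docs}"
        )
    return {
        name: (round(100 * words / total_words, 2), round(100 * docs / total_docs, 2))
        for name, (words, docs) in rows.items()
    }


# Figures copied from the Type(s) of Author table.
author_types: Counts = {
    "Corporate": (6_651_998, 495),
    "Multiple": (9_336_415, 1_028),
    "Sole": (33_675_890, 3_467),
}

for name, (word_pct, doc_pct) in shares(author_types, 49_664_303, 4_990).items():
    print(f"{name:<10} {word_pct:6.2f} % {doc_pct:6.2f} %")
```

The same check is a quick way to spot a mistyped count, because a row whose value disagrees with its percentage, or rows that no longer sum to the Total, stand out immediately. The written Years table's percentages appear to be computed against that table's own total (49,559,700 words) rather than the full written total.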
Years

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | 1989 | 104,105 | 0.21 % | 11 | 0.22 % |
2 | 1990 | 1,362,002 | 2.75 % | 170 | 3.42 % |
3 | 1991 | 1,442,241 | 2.91 % | 178 | 3.58 % |
4 | 1992 | 1,613,471 | 3.26 % | 177 | 3.56 % |
5 | 1993 | 1,696,560 | 3.42 % | 204 | 4.1 % |
6 | 1994 | 1,713,875 | 3.46 % | 187 | 3.76 % |
7 | 1995 | 1,747,712 | 3.53 % | 188 | 3.78 % |
8 | 1996 | 2,110,762 | 4.26 % | 195 | 3.92 % |
9 | 1997 | 2,160,150 | 4.36 % | 208 | 4.18 % |
10 | 1998 | 2,507,265 | 5.06 % | 225 | 4.52 % |
11 | 1999 | 2,359,312 | 4.76 % | 208 | 4.18 % |
12 | 2000 | 2,447,196 | 4.94 % | 228 | 4.58 % |
13 | 2001 | 2,663,871 | 5.38 % | 244 | 4.9 % |
14 | 2002 | 2,828,517 | 5.71 % | 247 | 4.96 % |
15 | 2003 | 2,945,000 | 5.94 % | 261 | 5.24 % |
16 | 2004 | 2,903,510 | 5.86 % | 240 | 4.82 % |
17 | 2005 | 3,380,326 | 6.82 % | 326 | 6.55 % |
18 | 2006 | 3,711,493 | 7.49 % | 350 | 7.03 % |
19 | 2007 | 3,491,378 | 7.04 % | 297 | 5.97 % |
20 | 2008 | 3,185,726 | 6.43 % | 311 | 6.25 % |
21 | 2009 | 2,476,538 | 5 % | 331 | 6.65 % |
22 | 2010 | 298,291 | 0.6 % | 101 | 2.03 % |
23 | 2011 | 158,881 | 0.32 % | 72 | 1.45 % |
24 | 2012 | 145,144 | 0.29 % | 11 | 0.22 % |
25 | 2013 | 106,374 | 0.21 % | 8 | 0.16 % |
Total | | 49,559,700 | | 4978 | |

Spoken Texts

Gender

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Female | 378,180 | 35.9 % | 347 | 47.66 % |
2 | Male | 662,243 | 62.86 % | 349 | 47.94 % |
3 | Unknown | 13,109 | 1.24 % | 32 | 4.4 % |
Total | | 1,053,532 | | 728 | |

Education

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Ongoing | 282,239 | 26.79 % | 191 | 25.64 % |
2 | Uneducated | 8,774 | 0.83 % | 12 | 1.61 % |
3 | Elementary school | 37,111 | 3.52 % | 41 | 5.5 % |
4 | High school level | 60,153 | 5.71 % | 62 | 8.32 % |
5 | Graduate level | 544,571 | 51.69 % | 307 | 41.21 % |
6 | Unknown | 120,684 | 11.46 % | 132 | 17.72 % |
Total | | 1,053,532 | | 745 | |

Social Status

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Upper class | 591,369 | 56.13 % | 299 | 41.47 % |
2 | Middle class | 311,942 | 29.61 % | 268 | 37.17 % |
3 | Lower class | 12,591 | 1.2 % | 31 | 4.3 % |
4 | Unknown | 137,630 | 13.06 % | 123 | 17.06 % |
Total | | 1,053,532 | | 721 | |

Dialect

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | No | 991,511 | 94.11 % | 448 | 87.67 % |
2 | Yes | 62,021 | 5.89 % | 63 | 12.33 % |
Total | | 1,053,532 | | 511 | |

Role of the speaker

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Announcer | 81,952 | 7.78 % | 197 | 22.75 % |
2 | Introductory Speaker | 528 | 0.05 % | 1 | 0.12 % |
3 | Correspondent | 12,497 | 1.19 % | 85 | 9.82 % |
4 | Speaker | 818,047 | 77.65 % | 401 | 46.3 % |
5 | Recorder | 115,434 | 10.96 % | 159 | 18.36 % |
6 | President | 15,333 | 1.46 % | 11 | 1.27 % |
7 | Secretary member | 669 | 0.06 % | 2 | 0.23 % |
8 | Reporter | 2,130 | 0.2 % | 6 | 0.69 % |
9 | Speakers | 11 | 0 % | 1 | 0.12 % |
10 | Moderator | 6,877 | 0.65 % | 2 | 0.23 % |
11 | Audience | 54 | 0.01 % | 1 | 0.12 % |
Total | | 1,053,532 | | 866 | |

Domain

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Context-governed | 545,139 | 55.92 % | 229 | 52.76 % |
2 | Demographically sampled | 429,718 | 44.08 % | 205 | 47.24 % |
Total | | 974,857 | | 434 | |

Media Context

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Educational/informative | 13,185 | 1.35 % | 6 | 1.38 % |
2 | Public/Institutional | 166,542 | 17.08 % | 26 | 5.99 % |
3 | Miscellaneous | 297,242 | 30.49 % | 183 | 42.17 % |
4 | Unspecified | 497,888 | 51.07 % | 219 | 50.46 % |
Total | | 974,857 | | 434 | |

Register

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Spoken | 926,299 | 95.02 % | 431 | 99.31 % |
2 | Written-to-be-spoken | 48,558 | 4.98 % | 3 | 0.69 % |
Total | | 974,857 | | 434 | |

Interaction Type

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Monologue | 48,241 | 4.95 % | 42 | 9.68 % |
2 | Dialogue | 926,616 | 95.05 % | 392 | 90.32 % |
Total | | 974,857 | | 434 | |

Place

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Unspecified | 34,538 | 3.54 % | 15 | 3.46 % |
2 | Shopping center | 14,813 | 1.52 % | 4 | 0.92 % |
3 | Car | 737 | 0.08 % | 1 | 0.23 % |
4 | Elevator, street, home | 1,320 | 0.14 % | 1 | 0.23 % |
5 | Garden | 4,303 | 0.44 % | 1 | 0.23 % |
6 | Balcony | 1,203 | 0.12 % | 1 | 0.23 % |
7 | Bank | 449 | 0.05 % | 2 | 0.46 % |
8 | Gas Station | 48 | 0 % | 1 | 0.23 % |
9 | Office | 3,382 | 0.35 % | 1 | 0.23 % |
10 | Cafe | 4,196 | 0.43 % | 1 | 0.23 % |
11 | Mosque | 5,449 | 0.56 % | 1 | 0.23 % |
12 | Private Course Centers | 4,449 | 0.46 % | 2 | 0.46 % |
13 | Dolmush | 218 | 0.02 % | 1 | 0.23 % |
14 | Doner shop | 3,022 | 0.31 % | 1 | 0.23 % |
15 | Home | 227,037 | 23.29 % | 107 | 24.65 % |
16 | Home/Playing chess | 99 | 0.01 % | 1 | 0.23 % |
17 | Home/Doing sports | 2,917 | 0.3 % | 1 | 0.23 % |
18 | Home/Watching TV | 2,642 | 0.27 % | 1 | 0.23 % |
19 | Garden | 1,164 | 0.12 % | 1 | 0.23 % |
20 | Photo shop | 3,334 | 0.34 % | 1 | 0.23 % |
21 | Hotel, reception | 464 | 0.05 % | 1 | 0.23 % |
22 | Kadir Has University | 522 | 0.05 % | 1 | 0.23 % |
23 | Cafe | 15,539 | 1.59 % | 5 | 1.15 % |
24 | Cafe | 8,578 | 0.88 % | 1 | 0.23 % |
25 | Coffeehouse | 1,153 | 0.12 % | 1 | 0.23 % |
26 | Conference hall | 228,756 | 23.47 % | 37 | 8.53 % |
27 | Hairdresser | 1,943 | 0.2 % | 1 | 0.23 % |
28 | Market | 6,230 | 0.64 % | 2 | 0.46 % |
29 | Municipal City Theatre | 5,128 | 0.53 % | 2 | 0.46 % |
30 | Office | 34,634 | 3.55 % | 4 | 0.92 % |
31 | Schoolyard | 327 | 0.03 % | 1 | 0.23 % |
32 | Hotel | 96 | 0.01 % | 1 | 0.23 % |
33 | Hotel dining hall | 401 | 0.04 % | 1 | 0.23 % |
34 | Hotel, office | 170 | 0.02 % | 1 | 0.23 % |
35 | Hotel, reception | 3,110 | 0.32 % | 2 | 0.46 % |
36 | Hotel, reception, office | 1,479 | 0.15 % | 2 | 0.46 % |
37 | Hotel/ dining hall | 255 | 0.03 % | 1 | 0.23 % |
38 | Bus | 2,172 | 0.22 % | 2 | 0.46 % |
39 | Park | 18,032 | 1.85 % | 4 | 0.92 % |
40 | Bazaar | 121 | 0.01 % | 1 | 0.23 % |
41 | Rehearsal hall | 5,202 | 0.53 % | 1 | 0.23 % |
42 | Restaurant | 3,125 | 0.32 % | 1 | 0.23 % |
43 | Restaurant | 134 | 0.01 % | 1 | 0.23 % |
44 | Coast | 2,826 | 0.29 % | 1 | 0.23 % |
45 | Health care center | 490 | 0.05 % | 1 | 0.23 % |
46 | Street | 17,093 | 1.75 % | 10 | 2.3 % |
47 | Studio | 119,304 | 12.24 % | 146 | 33.64 % |
48 | Class | 24,998 | 2.56 % | 14 | 3.23 % |
49 | Turkish Grand National Assembly | 83,381 | 8.55 % | 11 | 2.53 % |
50 | Boat | 1,526 | 0.16 % | 1 | 0.23 % |
51 | Theatre | 26,450 | 2.71 % | 3 | 0.69 % |
52 | Train | 1,117 | 0.11 % | 1 | 0.23 % |
53 | Dormitory | 10,062 | 1.03 % | 6 | 1.38 % |
54 | Dormitory canteen | 845 | 0.09 % | 1 | 0.23 % |
55 | Dormitory room | 26,283 | 2.7 % | 16 | 3.69 % |
56 | Dormitory dining hall | 416 | 0.04 % | 1 | 0.23 % |
57 | Office | 948 | 0.1 % | 1 | 0.23 % |
58 | Tea garden | 3,601 | 0.37 % | 1 | 0.23 % |
59 | Internet cafe | 2,626 | 0.27 % | 1 | 0.23 % |
Total | | 974,857 | | 434 | |

Activity Type

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | Unspecified | 2,379 | 0.24 % | 6 | 1.38 % |
2 | S: broadcast:discussion | 62,621 | 6.42 % | 15 | 3.46 % |
3 | S: broadcast:news | 43,133 | 4.42 % | 123 | 28.34 % |
4 | S: classroom | 4,538 | 0.47 % | 3 | 0.69 % |
5 | S: conv | 11,141 | 1.14 % | 3 | 0.69 % |
6 | S: interview | 67,899 | 6.97 % | 15 | 3.46 % |
7 | S: lect: soc_science | 14,486 | 1.49 % | 7 | 1.61 % |
8 | S: meeting | 134,609 | 13.81 % | 26 | 5.99 % |
9 | S: parliament | 83,381 | 8.55 % | 11 | 2.53 % |
10 | S: sermon | 5,449 | 0.56 % | 1 | 0.23 % |
11 | S: speech:scripted | 58,561 | 6.01 % | 4 | 0.92 % |
12 | S: speech:unscripted | 35,898 | 3.68 % | 5 | 1.15 % |
13 | S: chat | 426,443 | 43.74 % | 205 | 47.24 % |
14 | S: broadcast:discussion | 3,544 | 0.36 % | 2 | 0.46 % |
15 | S: broadcast:interview | 20,775 | 2.13 % | 8 | 1.84 % |
Total | | 974,857 | | 434 | |

Years

ID | Section Name | Number of Words | % of Words | Documents (total) | % of Documents |
--- | --- | ---: | ---: | ---: | ---: |
1 | 1997 | 17,690 | 1.81 % | 1 | 0.23 % |
2 | 1998 | 11,140 | 1.14 % | 4 | 0.92 % |
3 | 1999 | 7,071 | 0.73 % | 3 | 0.69 % |
4 | 2000 | 14,049 | 1.44 % | 2 | 0.46 % |
5 | 2001 | 3,769 | 0.39 % | 1 | 0.23 % |
6 | 2002 | 14,060 | 1.44 % | 2 | 0.46 % |
7 | 2003 | 28,784 | 2.95 % | 2 | 0.46 % |
8 | 2004 | 38,793 | 3.98 % | 3 | 0.69 % |
9 | 2005 | 6,958 | 0.71 % | 1 | 0.23 % |
10 | 2006 | 77,489 | 7.95 % | 6 | 1.38 % |
11 | 2007 | 95,574 | 9.8 % | 10 | 2.3 % |
12 | 2008 | 25,126 | 2.58 % | 9 | 2.07 % |
13 | 2009 | 186,019 | 19.08 % | 228 | 52.53 % |
14 | 2010 | 228,283 | 23.42 % | 90 | 20.74 % |
15 | 2011 | 109,139 | 11.2 % | 36 | 8.29 % |
16 | 2012 | 110,913 | 11.38 % | 36 | 8.29 % |
Total | | 974,857 | | 434 | |