Content

TNC is a 50-million-word corpus. Written texts make up 98% of it, spanning a wide variety of genres over a 20-year period (1990-2009); the remaining 2% consists of transcribed spoken data. The number of words in the corpus is distributed proportionally across text domain, time period, and medium of text.
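
The 98% / 2% split can be checked against the word totals reported in the tables on this page. The following is a minimal sketch, not part of the corpus tooling: the constant and function names are illustrative, and it assumes the written total of 49,664,303 words and the spoken total of 1,053,532 words (some spoken tables report 974,857 instead).

```python
# Totals copied from the "Total" rows on this page (assumed to be the
# relevant figures; some spoken tables report 974,857 words instead).
WRITTEN_WORDS = 49_664_303
SPOKEN_WORDS = 1_053_532


def medium_shares(written: int, spoken: int) -> dict:
    """Return the written and spoken shares of the corpus, in percent."""
    total = written + spoken
    return {
        "written": round(100 * written / total, 2),
        "spoken": round(100 * spoken / total, 2),
    }


shares = medium_shares(WRITTEN_WORDS, SPOKEN_WORDS)
print(shares)
```

Under these assumptions the spoken share comes out slightly above 2%, which is consistent with the rounded figures quoted above.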

Written Texts
Domain          
ID Section Name Number of Words % Documents(total) %
1 Imaginative prose 9,290,712 18.71 % 674 13.51 %
2 Informative: Natural and pure sciences 1,380,481 2.78 % 255 5.11 %
3 Informative: Applied science 3,439,177 6.92 % 461 9.24 %
4 Informative: Social science 7,128,807 14.35 % 673 13.49 %
5 Informative: World affairs 9,768,340 19.67 % 757 15.17 %
6 Informative: Commerce and finance 4,472,310 9.01 % 429 8.6 %
7 Informative: Arts 3,633,316 7.32 % 347 6.95 %
8 Informative: Belief and thought 2,167,924 4.37 % 297 5.95 %
9 Informative: Leisure 8,383,236 16.88 % 1097 21.98 %
  Total 49,664,303   4990  
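
The percentage column of the Domain table above can be recomputed from the word counts. This is a small illustrative sketch; the dictionary and function names are hypothetical and the counts are copied from the table, so it only demonstrates the arithmetic, not how the corpus statistics were actually produced.

```python
# Word counts per written-text domain, copied from the Domain table above.
DOMAIN_WORDS = {
    "Imaginative prose": 9_290_712,
    "Informative: Natural and pure sciences": 1_380_481,
    "Informative: Applied science": 3_439_177,
    "Informative: Social science": 7_128_807,
    "Informative: World affairs": 9_768_340,
    "Informative: Commerce and finance": 4_472_310,
    "Informative: Arts": 3_633_316,
    "Informative: Belief and thought": 2_167_924,
    "Informative: Leisure": 8_383_236,
}


def percentage_shares(counts: dict) -> dict:
    """Return each category's share of the total, in percent (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


total_words = sum(DOMAIN_WORDS.values())
shares = percentage_shares(DOMAIN_WORDS)
for name, pct in shares.items():
    print(f"{name}: {pct} %")
```

The same function can be applied to any of the other tables (Gender, Audience, Media, and so on) by swapping in that table's word counts.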
Gender          
ID Section Name Number of Words % Documents(total) %
1 Unknown 677,918 1.37 % 85 1.7 %
2 Female 7,438,319 14.98 % 884 17.72 %
3 Male 25,673,465 51.69 % 2506 50.22 %
4 Mixed 15,874,601 31.96 % 1515 30.36 %
  Total 49,664,303   4990  
Audience          
ID Section Name Number of Words % Documents(total) %
1 Unknown 4,504 0.01 % 1 0.02 %
2 Child 1,347,767 2.71 % 128 2.57 %
3 Teenager 1,006,277 2.03 % 70 1.4 %
4 Adult 43,750,084 88.09 % 4388 87.94 %
5 Any 3,555,671 7.16 % 403 8.08 %
  Total 49,664,303   4990  
Media          
ID Section Name Number of Words % Documents(total) %
1 Unknown 10,397 0.02 % 1 0.02 %
2 Book 31,262,290 62.95 % 2145 42.99 %
3 Periodical 15,845,281 31.9 % 2094 41.96 %
4 Miscellaneous: published 947,824 1.91 % 294 5.89 %
5 Miscellaneous: unpublished 1,598,511 3.22 % 456 9.14 %
  Total 49,664,303   4990  
Sample          
ID Section Name Number of Words % Documents(total) %
1 Unknown 2,166,031 4.36 % 108 2.16 %
2 Whole text 23,102,208 46.52 % 3237 64.87 %
3 Beginning sample 9,790,755 19.71 % 669 13.41 %
4 Middle sample 8,279,315 16.67 % 577 11.56 %
5 End sample 6,246,245 12.58 % 394 7.9 %
6 Composite 79,749 0.16 % 5 0.1 %
  Total 49,664,303   4990  
Register          
ID Section Name Number of Words % Documents(total) %
1 Academic prose: Medicine 714,460 1.44 % 145 2.91 %
2 Academic prose: Social, behavioral sciences 2,892,961 5.83 % 432 8.66 %
3 Academic prose: Humanities/Arts 2,604,645 5.24 % 354 7.09 %
4 Academic prose: Natural sciences 1,236,958 2.49 % 251 5.03 %
5 Academic prose: Politics, law, education 3,857,971 7.77 % 587 11.76 %
6 Academic prose: Technology, computing, engineering 1,653,909 3.33 % 251 5.03 %
7 Administrative and regulatory texts, in house use 155,054 0.31 % 11 0.22 %
8 Print Advertisements 22,311 0.04 % 164 3.29 %
9 Biographies/Autobiographies 2,372,093 4.78 % 158 3.17 %
10 Commerce&Finance/Economics 2,282,709 4.6 % 120 2.4 %
11 E-mail 31,316 0.06 % 261 5.23 %
12 School essays 56,545 0.11 % 10 0.2 %
13 Essay 494,747 1 % 99 1.98 %
14 Excerpts from modern drama scripts 655,618 1.32 % 63 1.26 %
15 Single and multiple author collections of poems 279,984 0.56 % 35 0.7 %
16 Novels/short stories 8,271,257 16.65 % 566 11.34 %
17 Official/governmental documents/leaflets, company annual reports, etc.; excludes Hansard 594,450 1.2 % 56 1.12 %
18 Instructional texts 305,829 0.62 % 29 0.58 %
19 Personal letters 105,693 0.21 % 4 0.08 %
20 Professional/business letters 20,092 0.04 % 1 0.02 %
21 Miscellaneous texts 1,932,821 3.89 % 108 2.16 %
22 Broadsheet national newspapers: arts/cultural material 573,701 1.16 % 43 0.86 %
23 Broadsheet national newspapers: commerce & finance 759,850 1.53 % 67 1.34 %
24 Broadsheet national newspapers: miscellaneous material 545,078 1.1 % 56 1.12 %
25 Broadsheet national newspapers: science material 378,432 0.76 % 23 0.46 %
26 Broadsheet national newspapers: material on lifestyle leisure belief & thought 1,600,828 3.22 % 114 2.28 %
27 Broadsheet national newspapers: sports material 662,518 1.33 % 57 1.14 %
28 Broadsheet national newspapers: column 418,734 0.84 % 109 2.18 %
29 Non-academic: medical/health matters 99,878 0.2 % 5 0.1 %
30 Non-academic: social & behavioural sciences 2,411,122 4.85 % 87 1.74 %
31 Non-academic/non-fiction: humanities&arts 2,644,260 5.32 % 156 3.13 %
32 Non-academic: natural sciences 93,182 0.19 % 7 0.14 %
33 Non-academic: politics law education 4,934,042 9.93 % 247 4.95 %
34 Non-academic: technology, computing, engineering 235,169 0.47 % 9 0.18 %
35 Popular magazines 667,094 1.34 % 48 0.96 %
36 Religious texts 975,833 1.96 % 46 0.92 %
37 Planned speech, whether dialogue or monologue 455,194 0.92 % 24 0.48 %
38 Forum 468,038 0.94 % 68 1.36 %
39 Blog 1,199,927 2.42 % 119 2.38 %
  Total 49,664,303   4990  
Derived Text Type          
ID Section Name Number of Words % Documents(total) %
1 Unknown 52,280 0.11 % 4 0.08 %
2 Academic prose 14,447,035 29.09 % 2123 42.55 %
3 Fiction and verse 9,260,896 18.65 % 672 13.47 %
4 Non-academic prose and biography 12,226,131 24.62 % 763 15.29 %
5 Newspaper 4,907,380 9.88 % 465 9.32 %
6 Other published written material 7,034,023 14.16 % 508 10.18 %
7 Unpublished written material 1,736,558 3.5 % 455 9.12 %
  Total 49,664,303   4990  
Type(s) of Author          
ID Section Name Number of Words % Documents(total) %
1 Corporate 6,651,998 13.39 % 495 9.92 %
2 Multiple 9,336,415 18.8 % 1028 20.6 %
3 Sole 33,675,890 67.81 % 3467 69.48 %
  Total 49,664,303   4990  
Years          
ID Section Name Number of Words % Documents(total) %
1 1989 104,105 0.21 % 11 0.22 %
2 1990 1,362,002 2.75 % 170 3.42 %
3 1991 1,442,241 2.91 % 178 3.58 %
4 1992 1,613,471 3.26 % 177 3.56 %
5 1993 1,696,560 3.42 % 204 4.1 %
6 1994 1,713,875 3.46 % 187 3.76 %
7 1995 1,747,712 3.53 % 188 3.78 %
8 1996 2,110,762 4.26 % 195 3.92 %
9 1997 2,160,150 4.36 % 208 4.18 %
10 1998 2,507,265 5.06 % 225 4.52 %
11 1999 2,359,312 4.76 % 208 4.18 %
12 2000 2,447,196 4.94 % 228 4.58 %
13 2001 2,663,871 5.38 % 244 4.9 %
14 2002 2,828,517 5.71 % 247 4.96 %
15 2003 2,945,000 5.94 % 261 5.24 %
16 2004 2,903,510 5.86 % 240 4.82 %
17 2005 3,380,326 6.82 % 326 6.55 %
18 2006 3,711,493 7.49 % 350 7.03 %
19 2007 3,491,378 7.04 % 297 5.97 %
20 2008 3,185,726 6.43 % 311 6.25 %
21 2009 2,476,538 5 % 331 6.65 %
22 2010 298,291 0.6 % 101 2.03 %
23 2011 158,881 0.32 % 72 1.45 %
24 2012 145,144 0.29 % 11 0.22 %
25 2013 106,374 0.21 % 8 0.16 %
  Total 49,559,700   4978  

 

Spoken Texts    
Gender          
ID Section Name Number of Words % Documents(total) %
1 Female 378,180 35.9 % 347 47.66 %
2 Male 662,243 62.86 % 349 47.94 %
3 Unknown 13,109 1.24 % 32 4.4 %
  Total 1,053,532   728  
Education          
ID Section Name Number of Words % Documents(total) %
1 Ongoing 282,239 26.79 % 191 25.64 %
2 Uneducated 8,774 0.83 % 12 1.61 %
3 Elementary school 37,111 3.52 % 41 5.5 %
4 High school level 60,153 5.71 % 62 8.32 %
5 Graduate level 544,571 51.69 % 307 41.21 %
6 Unknown 120,684 11.46 % 132 17.72 %
  Total 1,053,532   745  
Social Status          
ID Section Name Number of Words % Documents(total) %
1 Upper class 591,369 56.13 % 299 41.47 %
2 Middle class 311,942 29.61 % 268 37.17 %
3 Lower class 12,591 1.2 % 31 4.3 %
4 Unknown 137,630 13.06 % 123 17.06 %
  Total 1,053,532   721  
Dialect          
ID Section Name Number of Words % Documents(total) %
1 No 991,511 94.11 % 448 87.67 %
2 Yes 62,021 5.89 % 63 12.33 %
  Total 1,053,532   511  
Role of the speaker          
ID Section Name Number of Words % Documents(total) %
1 Announcer 81,952 7.78 % 197 22.75 %
2 Introductory Speaker 528 0.05 % 1 0.12 %
3 Correspondent 12,497 1.19 % 85 9.82 %
4 Speaker 818,047 77.65 % 401 46.3 %
5 Recorder 115,434 10.96 % 159 18.36 %
6 President 15,333 1.46 % 11 1.27 %
7 Secretary member 669 0.06 % 2 0.23 %
8 Reporter 2,130 0.2 % 6 0.69 %
9 Speakers 11 0 % 1 0.12 %
10 Moderator 6,877 0.65 % 2 0.23 %
11 Audience 54 0.01 % 1 0.12 %
  Total 1,053,532   866  
Domain          
ID Section Name Number of Words % Documents(total) %
1 Context-governed 545,139 55.92 % 229 52.76 %
2 Demographically sampled 429,718 44.08 % 205 47.24 %
  Total 974,857   434  
Media Context          
ID Section Name Number of Words % Documents(total) %
1 Educational/informative 13,185 1.35 % 6 1.38 %
2 Public/Institutional 166,542 17.08 % 26 5.99 %
3 Miscellaneous 297,242 30.49 % 183 42.17 %
4 Unspecified 497,888 51.07 % 219 50.46 %
  Total 974,857   434  
Register          
ID Section Name Number of Words % Documents(total) %
1 Spoken 926,299 95.02 % 431 99.31 %
2 Written-to-be-spoken 48,558 4.98 % 3 0.69 %
  Total 974,857   434  
Interaction Type          
ID Section Name Number of Words % Documents(total) %
1 Monologue 48,241 4.95 % 42 9.68 %
2 Dialogue 926,616 95.05 % 392 90.32 %
  Total 974,857   434  
Place          
ID Section Name Number of Words % Documents(total) %
1 Unspecified 34,538 3.54 % 15 3.46 %
2 Shopping center 14,813 1.52 % 4 0.92 %
3 Car 737 0.08 % 1 0.23 %
4 Elevator, street, home 1,320 0.14 % 1 0.23 %
5 Garden 4,303 0.44 % 1 0.23 %
6 Balcony 1,203 0.12 % 1 0.23 %
7 Bank 449 0.05 % 2 0.46 %
8 Gas Station 48 0 % 1 0.23 %
9 Office 3,382 0.35 % 1 0.23 %
10 Cafe 4,196 0.43 % 1 0.23 %
11 Mosque 5,449 0.56 % 1 0.23 %
12 Private Course Centers 4,449 0.46 % 2 0.46 %
13 Dolmush 218 0.02 % 1 0.23 %
14 Doner shop 3,022 0.31 % 1 0.23 %
15 Home 227,037 23.29 % 107 24.65 %
16 Home/Playing chess 99 0.01 % 1 0.23 %
17 Home/Doing sports 2,917 0.3 % 1 0.23 %
18 Home/Watching TV 2,642 0.27 % 1 0.23 %
19 Garden 1,164 0.12 % 1 0.23 %
20 Photo shop 3,334 0.34 % 1 0.23 %
21 Hotel, reception 464 0.05 % 1 0.23 %
22 Kadir Has University 522 0.05 % 1 0.23 %
23 Cafe 15,539 1.59 % 5 1.15 %
24 Cafe 8,578 0.88 % 1 0.23 %
25 Coffeehouse 1,153 0.12 % 1 0.23 %
26 Conference hall 228,756 23.47 % 37 8.53 %
27 Hairdresser 1,943 0.2 % 1 0.23 %
28 Market 6,230 0.64 % 2 0.46 %
29 Municipal City Theatre 5,128 0.53 % 2 0.46 %
30 Office 34,634 3.55 % 4 0.92 %
31 Schoolyard 327 0.03 % 1 0.23 %
32 Hotel 96 0.01 % 1 0.23 %
33 Hotel dining hall 401 0.04 % 1 0.23 %
34 Hotel, office 170 0.02 % 1 0.23 %
35 Hotel, reception 3,110 0.32 % 2 0.46 %
36 Hotel, reception, office 1,479 0.15 % 2 0.46 %
37 Hotel/ dining hall 255 0.03 % 1 0.23 %
38 Bus 2,172 0.22 % 2 0.46 %
39 Park 18,032 1.85 % 4 0.92 %
40 Bazaar 121 0.01 % 1 0.23 %
41 Rehearsal hall 5,202 0.53 % 1 0.23 %
42 Restaurant 3,125 0.32 % 1 0.23 %
43 Restaurant 134 0.01 % 1 0.23 %
44 Coast 2,826 0.29 % 1 0.23 %
45 Health care center 490 0.05 % 1 0.23 %
46 Street 17,093 1.75 % 10 2.3 %
47 Studio 119,304 12.24 % 146 33.64 %
48 Class 24,998 2.56 % 14 3.23 %
49 Turkish Grand National Assembly 83,381 8.55 % 11 2.53 %
50 Boat 1,526 0.16 % 1 0.23 %
51 Theatre 26,450 2.71 % 3 0.69 %
52 Train 1,117 0.11 % 1 0.23 %
53 Dormitory 10,062 1.03 % 6 1.38 %
54 Dormitory canteen 845 0.09 % 1 0.23 %
55 Dormitory room 26,283 2.7 % 16 3.69 %
56 Dormitory dining hall 416 0.04 % 1 0.23 %
57 Office 948 0.1 % 1 0.23 %
58 Tea garden 3,601 0.37 % 1 0.23 %
59 Internet cafe 2,626 0.27 % 1 0.23 %
  Total 974,857   434  
Activity Type          
ID Section Name Number of Words % Documents(total) %
1 Unspecified 2,379 0.24 % 6 1.38 %
2 S: broadcast:discussion 62,621 6.42 % 15 3.46 %
3 S: broadcast:news 43,133 4.42 % 123 28.34 %
4 S: classroom 4,538 0.47 % 3 0.69 %
5 S: conv 11,141 1.14 % 3 0.69 %
6 S: interview 67,899 6.97 % 15 3.46 %
7 S: lect: soc_science 14,486 1.49 % 7 1.61 %
8 S: meeting 134,609 13.81 % 26 5.99 %
9 S: parliament 83,381 8.55 % 11 2.53 %
10 S: sermon 5,449 0.56 % 1 0.23 %
11 S: speech:scripted 58,561 6.01 % 4 0.92 %
12 S: speech:unscripted 35,898 3.68 % 5 1.15 %
13 S: chat 426,443 43.74 % 205 47.24 %
14 S: broadcast:discussion 3,544 0.36 % 2 0.46 %
15 S: broadcast:interview 20,775 2.13 % 8 1.84 %
  Total 974,857   434  
Years          
ID Section Name Number of Words % Documents(total) %
1 1997 17,690 1.81 % 1 0.23 %
2 1998 11,140 1.14 % 4 0.92 %
3 1999 7,071 0.73 % 3 0.69 %
4 2000 14,049 1.44 % 2 0.46 %
5 2001 3,769 0.39 % 1 0.23 %
6 2002 14,060 1.44 % 2 0.46 %
7 2003 28,784 2.95 % 2 0.46 %
8 2004 38,793 3.98 % 3 0.69 %
9 2005 6,958 0.71 % 1 0.23 %
10 2006 77,489 7.95 % 6 1.38 %
11 2007 95,574 9.8 % 10 2.3 %
12 2008 25,126 2.58 % 9 2.07 %
13 2009 186,019 19.08 % 228 52.53 %
14 2010 228,283 23.42 % 90 20.74 %
15 2011 109,139 11.2 % 36 8.29 %
16 2012 110,913 11.38 % 36 8.29 %
  Total 974,857   434