TNC v2-Word and Document Counts

Content

TNC is a 50-million-word corpus. Written texts make up 98% of it, spanning a wide variety of genres and a 20-year period (1990-2009); the remaining 2% is transcribed spoken data. Word counts are distributed proportionally across text domain, time period and medium.

Composition of the written component of TNC

Domain                %     Medium                       %
Imaginative          19     Book                        58
Social science       16     Periodicals                 32
Art                   7     Miscellaneous published      5
Commerce/finance      8     Miscellaneous unpublished    3
Belief and thought    4     To-be-spoken                 2
World affairs        20
Applied science       8
Natural science       4
Leisure              14
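As a rough illustration of the proportional distribution, the domain percentages can be converted into approximate word counts. The sketch below assumes that the corpus holds exactly 50 million words, that the percentages are shares of the written component's word count, and that the published figures are rounded; the names `estimate_words`, `DOMAIN_PERCENT` and `domain_words` are hypothetical and not part of any TNC tool. The results are estimates, not the corpus's actual counts.

```python
# Estimate words per domain from the published percentages.
# Assumptions (not confirmed by the source): the total is exactly
# 50 million words, and each percentage is a share of the written
# component's words.

TOTAL_WORDS = 50_000_000   # stated corpus size
WRITTEN_PERCENT = 98       # written share of the corpus

DOMAIN_PERCENT = {
    "Imaginative": 19,
    "Social science": 16,
    "Art": 7,
    "Commerce/finance": 8,
    "Belief and thought": 4,
    "World affairs": 20,
    "Applied science": 8,
    "Natural science": 4,
    "Leisure": 14,
}


def estimate_words(total: int, percent: int) -> int:
    """Return the integer number of words that `percent` of `total` represents."""
    return total * percent // 100


# Size of the written component under the assumptions above.
written_words = estimate_words(TOTAL_WORDS, WRITTEN_PERCENT)

# Approximate word count for each domain of the written component.
domain_words = {
    name: estimate_words(written_words, pct)
    for name, pct in DOMAIN_PERCENT.items()
}

for name, words in domain_words.items():
    print(f"{name:<20}{words:>12,}")
```

Under these assumptions the written component comes to about 49 million words, of which World affairs, the largest domain, accounts for roughly 9.8 million.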

Transcriptions of spoken data make up 2% of TNC's database. They cover spontaneous everyday conversations and speeches collected in particular communicative settings, such as meetings and lectures.