Welcome to the homepage of the Turkish National Corpus.
The Turkish National Corpus (TNC), with a size of 50 million words, is a balanced and representative corpus of contemporary Turkish. It consists of samples of textual data across a wide variety of genres covering a period of 20 years (1990-2009). The written component consists of texts produced in different domains on various topics. Transcriptions of spoken data constitute 2% of TNC’s database and comprise spontaneous, everyday conversations and speeches collected in particular communicative settings.
The TNC-Demo Version, with its 4438 different text samples, represents 9 domains and 34 different genres. Users can query this 48-million-word collection and restrict the output by media, text sample, domain, derived text type, sex of author, type of author, text genre, and audience of the text.
TNC-Demo Version is RELEASED
For registration and full access, check out the "Query Interface" menu on the left.
Publishing TNC-based studies: