From Daily Conversation to Public Speech: A Quantitative Analysis of Lexical and Grammatical Characteristics of the Corpus of Spoken Lithuanian
Keywords: spoken language, register of spoken language, parts of speech, lexical diversity, inverted word order
Abstract
The aim of the paper is to describe the registers of spoken Lithuanian and to compare their lexical and grammatical characteristics. In this article we: (1) characterize different registers and genres of spoken Lithuanian; (2) present the distribution of parts of speech in the Corpus; (3) discuss lexical diversity; (4) present the most typical patterns of inverted word order; and (5) where possible, compare our results with those obtained for written Lithuanian. Quantitative and statistical methods and the methodology of corpus linguistics were employed. The study was based on data from the Corpus of Spoken Lithuanian. All the conversations stored in the corpus were classified into five registers: academic, mass-media, casual, intimate, and consultative. An analysis of the distribution of parts of speech revealed that the registers of spontaneous speech (namely, the casual, intimate, and consultative registers) did not differ from one another. However, significant differences were found between spontaneous and prepared speech: nouns and adjectives were more frequent in academic and mass-media discourse (both of which may be characterized as prepared speech), while adverbs, pronouns, and particles were used more often in spontaneous speech. A comparative analysis confirmed that, in terms of the distribution of parts of speech, the academic and mass-media registers are the most similar to written language, especially to fictional texts. This might be explained by such characteristics of fictional texts as their stylistic flexibility and the presence of conversations among characters. An analysis of lexical diversity revealed only one difference among the registers: the index of noun lexical diversity distinguished the registers, while the indices of adjective and verb lexical diversity were rather similar.
Independently of the register, the most frequent verbs were sakyti ‘to say’, žinoti ‘to know’, reikėti ‘to need’, norėti ‘to want’, turėti ‘to have’, galėti ‘to be able’, and būti ‘to be’; the most frequent adjectives were didelis ‘big’, įdomus ‘interesting’, naujas ‘new’, mažas ‘small’, svarbus ‘important’, and įvairus ‘various’; and the most frequent nouns were diena ‘a day’, laikas ‘time’, žmogus ‘a man/human’, and metai ‘a year’. An analysis of inverted word order revealed that in spontaneous speech (especially in the consultative register) attributes quite often follow their agreement controllers, subordinate clauses precede their head nouns, and objects occur at the onset of clauses. However, inverted word order was also observed in prepared speech, that is, in the academic and mass-media registers.