Investigating lexical diversity and lexical sophistication of productive vocabulary in the written discourse of Indonesian EFL learners

This paper reports the findings of an investigation into the lexical diversity and lexical sophistication of productive vocabulary in the written discourse of Indonesian EFL learners. Thirty-one high school students participated in this study; 15 students were at B1 level and 16 students were at B2 level according to the Common European Framework of Reference for Languages (CEFR). Students’ written compositions were used as the main data for this study. The measurement was based on the lexical frequency profile (LFP) of the compositions. The results showed that the lexical diversity index of students at the higher level was greater than that of students at the lower level. In addition, based on the calculation per LFP category, the two groups shared a similar pattern of lexical diversity: the most varied vocabulary used in their writing fell into the second most common 1000-word list, followed by vocabulary in the "not in the lists" category and the AWL, respectively. The first most common 1000-word category contained the least varied words used by the learners. In terms of lexical sophistication, the percentage of advanced vocabulary used by the less proficient learners was slightly larger than that used by the more proficient learners. However, no significant difference was found between the two groups of learners in terms of lexical diversity or lexical sophistication.


INTRODUCTION
It has long been accepted that vocabulary plays a major role in second language learning because of its importance for communicative competence and the acquisition process (Schmitt, 2000). It provides a base for learners to perform all the skills needed in a language, whether receptive or productive. With vocabulary, learners can express their ideas and understand information in the target language precisely. Conversely, such activities become much more challenging when learners do not possess enough knowledge of the words of the target language.
With regard to language output, Laufer (1995) points out that vocabulary is often a factor that differentiates L2 learners from native speakers, or that distinguishes language levels among L2 learners themselves. It is rarely disputed that the main difference between L2 learners and native speakers is the amount of vocabulary they use in language production, either oral or written. Most L2 learners use a relatively limited range of vocabulary compared to native speakers, who have a much wider range. Among L2 learners themselves, vocabulary knowledge often determines the level of language proficiency. The development of vocabulary is regarded as a marker of language progress and an approximation towards the native speaker's lexical system (Laufer, 1995). This means that as the range of vocabulary expands, L2 proficiency tends to improve.
Interest in evaluating the vocabulary of second language learners has been increasing in recent years. As one of the knowledge areas in language, vocabulary is often considered a benchmark of how well an L2 learner performs in the acquisition of a second language. In this respect, Nation (2007) emphasises the importance of investigating the way learners use vocabulary in order to gain insight into their language knowledge. Similarly, Laufer (1995) argues that this kind of investigation is needed to see "a gradual increase in the number of words in the learner's lexicon" (p. 265).
In writing in particular, several measures have been developed to evaluate L2 learners' productive vocabulary use. Lexical diversity/variation (LV) and lexical sophistication (LS) are two of the measures used to assess vocabulary knowledge by looking at L2 learners' written production. Lexical diversity primarily assesses how varied the vocabulary used is, whereas lexical sophistication deals with the proportion of advanced vocabulary employed by learners in their writing. Grobe (1981) points out the importance of word diversity in L2 writing. According to him, in most second language learning contexts, teachers generally perceive good writing as closely associated with lexical diversity. That is to say, a text written by an L2 learner will normally receive a high grade and be considered good when it contains a greater variety of words and is built on sound grammatical structure. By the same token, Astika (1993) suggests that advanced vocabulary is one of the aspects of vocabulary proficiency. He proposes that lexical proficiency could be the best indicator of the overall quality of L2 writing. The study carried out by Laufer and Nation (1995) implicitly provides support for Grobe (1981) and Astika (1993). When investigating the lexical richness of EFL learners based on their written production, Laufer and Nation (1995) found a positive correlation between the quality of writing produced by EFL learners and those two lexical features. Their study reported that more proficient learners, who produce better quality writing, generally use more advanced words and more diverse vocabulary in their written production.
Lexical knowledge has become an interesting area of study in the field of second language acquisition. Some researchers have addressed this topic to investigate whether students' knowledge of words has a positive impact on their performance in second language production. For instance, Siskova (2012) found a strong relationship between lexical richness and the quality of students' writing among Czech EFL learners. Similarly, Staehr (2008) investigated the correlation between the vocabulary size of Danish EFL students and several English language skills: listening, reading and writing. He concluded that vocabulary size is strongly associated with the students' language proficiency. Although the investigation of lexical knowledge has gained popularity in several countries, only a limited amount of such research has been carried out in Indonesia. The most recent study was conducted by Djiwandono (2016) in the context of tertiary education; it compares the lexical richness of academic papers written by Indonesian EFL lecturers and university students. In addition, to the best of my knowledge, there has not yet been substantial research measuring lexical knowledge through the language output of Indonesian EFL learners at secondary school level. Therefore, the present study aims to bridge this gap. Investigating lexical diversity and lexical sophistication in the written language output of Indonesian EFL learners is appealing because vocabulary is closely linked to L2 writing performance (Kwon, 2009). Also, for Indonesian EFL learners in particular, Setyowati and Sukmawan (2016) report that learners find writing interesting but, at the same time, more difficult than other language skills. Thus, apart from the anxiety that may occur during the process of writing, it would be interesting to see how the lexical knowledge of Indonesian EFL learners, particularly in terms of lexical diversity and lexical sophistication, is reflected in their written language output.
The subjects of the present study are Indonesian EFL learners at a high school who come from two different proficiency levels, i.e., B1 and B2 according to the Common European Framework of Reference for Languages (CEFR). The present study has the following objectives: (1) to describe the typical lexical diversity of students at B1 and B2 level; (2) to describe the typical lexical sophistication of students at B1 and B2 level; (3) to find out whether there is a significant difference between the two groups of learners in terms of lexical diversity and lexical sophistication.
Before moving on, it is useful to review the related literature and the definitions of some key concepts. Vocabulary is commonly regarded as an important component of second language acquisition that contributes to both learners' receptive and productive skills. Alqahtani (2015) views vocabulary knowledge as an important tool for L2 learners to establish successful communicative skills in the second language. In addition, several researchers have recognised that the acquisition of vocabulary is essential for language use (Laufer & Nation, 1999; Read, 2000; Gu, 2003). An increase in learners' vocabulary has a crucial impact on language learning progress (Linse & Nunan, 2006). Schmitt (2000) points out that vocabulary is the basis of communicative competence and provides a foundation for learners to comprehend information as well as to produce discourse for communicative purposes. Azodi, Karimi and Vaezi (2014) assert that a lack of vocabulary will hinder L2 learners from understanding normal texts or utterances. The same problem will also occur when they come to productive skills such as writing or speaking. By contrast, Schmitt (2010) posits that by knowing a sufficient number of words, L2 learners can use the language properly. He suggests that the number of words necessary to enable L2 learners to communicate depends on their learning goals. In other words, if one wishes to achieve native-like competence, one presumably needs a vocabulary similar in size to that of a native speaker.
Some previous research has revealed the impact of vocabulary knowledge on the language skills development of second language learners (Staehr, 2008; Wise, Sevcik, Morris, Lovett, & Wolf, 2007). Alderson (2005) conducted a comprehensive study of the relationship between vocabulary and language skills using a test called DIALANG. He compared scores on various vocabulary tests with the scores from other language components of the DIALANG test (reading, listening, writing and grammar) and uncovered strong relationships among them. His results show that the checklist test and vocabulary test correlate with reading at .64, with listening in the range of .61-.65, with grammar at .64 and with writing in the range of .70-.79.
With regard to vocabulary and writing, Engber (1993) reports that a holistic measure of writing quality correlates significantly with lexical variation, both including errors (a correlation of .43) and without errors (.57). She also suggests that it is important to help and encourage learners to bring their knowledge of words into active use in writing. Within the same area, Arnaud (1984) investigated the correlation between lexical variation and productive translation performance. He found that the two variables were related, with a correlation of .36. Therefore, it can be concluded that vocabulary mastery and language skills, whether receptive or productive, are closely intertwined. Knowledge of words builds a foundation for learners to develop their ability to use the language well. In other words, an extensive vocabulary gives L2 learners more opportunity to comprehend information in the target language and to use the structures and functions of the language properly for the sake of comprehensible communication. In addition, as Nation (2001) asserts, vocabulary knowledge and language use have a complementary relation. As noted, sufficient vocabulary enables learners to use the language. Conversely, language use will in turn tend to increase vocabulary knowledge.
Measuring lexical knowledge has become a major object of research in applied linguistics as a way to assess the vocabulary development of L2 learners. Various measures have been developed to investigate learners' lexical knowledge. These measures mostly focus on learners' vocabulary acquisition and on the level of lexical proficiency of L2 learners, compared with an external reference point (Van Gijsel, Speelman, & Geeraerts, 2005). With regard to lexical production, the measures primarily assess learners' vocabulary use as reflected in oral or written text (Kojima & Yamashita, 2014). Daller and Xue (2007) argue that the words used in spoken or written texts represent learners' vocabulary knowledge. Investigating productive lexical knowledge gives information on learners' use of vocabulary, such as their choice of words, whether they rely on highly frequent words or choose infrequent vocabulary, and whether or not they use structure and function words in appropriate proportions, all of which provides useful insights into their lexical resources (Milton, 2009). Vermeer (2004) believes that learners with a large vocabulary are more likely to use rare words than those with a smaller vocabulary, and thus a valid measure of lexical richness can function as a pointer to vocabulary size. One option is a discrete test such as the Productive Vocabulary Level Test (PVLT) (Nation, 1990; Laufer & Nation, 1999), which is often criticised for not really being able to extrapolate learners' knowledge of the productive lexicon and for having some validity issues (Kojima & Yamashita, 2014). Another way to measure L2 students' lexical richness is to look at their language output and assess it in terms of descriptions of the productive lexicon, such as lexical diversity/lexical variation (LV) and lexical sophistication (LS).
Copyright © 2018, IJAL, EISSN 2502-6747
The term lexical diversity is often used as an equivalent of lexical variety (Laufer & Nation, 1995) and lexical richness (Johansson, 2009; Daller, Van Hout & Treffers-Daller, 2003), although some researchers distinguish between lexical diversity and lexical richness (e.g. Malvern, Richards, Chipere, & Durán, 2004). It is a measure of how varied the words or vocabulary produced by learners in a text are. Laufer and Nation (1995, p. 310) define lexical diversity as "the ratio in per cent between the different words in the text and the total number of running words". According to Johansson (2009), lexical diversity depends on the variety of vocabulary in a text. In other words, in the production of language, the speaker or writer has to use a large number of different words with little or no repetition for the utterances or writing to be counted as highly lexically diverse. To measure lexical diversity, the type-token ratio (TTR) (Lieven, 1978; Bates, Bretherton, & Snyder, 1991) has been commonly employed in various investigations. It is calculated by dividing the number of different words (types) by the total number of words in a text (tokens). Meanwhile, lexical sophistication (LS) or "rareness" (Read, 2000, p. 203) refers to the proportion of "advanced" words in the text (Laufer & Nation, 1995). It shows the percentage of sophisticated or advanced vocabulary produced by learners (Lindqvist, Gudmundson, & Bardel, 2013). However, there is still no exact definition of what is meant by an "advanced" or "sophisticated" word, as there are various opinions regarding this term (Kyle & Crossley, 2015). Therefore, in assessing lexical sophistication, the classification of words labelled "advanced" depends on the researcher's definition, which makes it quite subjective. Arnaud (1984) and Linnarud (1986), for instance, define sophisticated words with reference to the official list of vocabulary for English language teaching in their countries. They assume that sophisticated vocabulary consists of those words that students were not expected to know well at their level in the education system. Likewise, Laufer (1990) considers the vocabulary in the University Word List (UWL; Nation, 1990) to be advanced for her students in Israel. On the other hand, Kyle and Crossley (2015) emphasise the frequency of lexical items: they assert that words that are rarely used are generally considered sophisticated and often take learners longer to process than high-frequency words.
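The type-token ratio described above can be sketched in a few lines of Python. This is only an illustration: the tokenizer (a simple regular expression over lowercased text) and the function names are assumptions made for the example, not the procedure used by any particular tool.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    The pattern keeps internal apostrophes and hyphens (e.g. "don't",
    "well-known"); this tokenization rule is an assumption for the sketch.
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def type_token_ratio(text):
    """Return distinct word forms (types) divided by running words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "The cat sat on the mat and the dog sat on the rug"
print(round(type_token_ratio(sample), 3))
```

The sample sentence has 13 tokens but only 8 types, because "the", "sat" and "on" are repeated, so the more a writer repeats words, the further the ratio falls below 1.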

METHOD
Participants
The subjects of the present study were 31 learners of English as a foreign language. They were aged between 15 and 16. At the time of data collection, they were enrolled as second-year students at a high school, Pribadi Bilingual Boarding School, situated in Bandung, Indonesia. The participants came from two different proficiency levels: 15 students belonged to level B1 and 16 students were at level B2 according to the Common European Framework of Reference for Languages (CEFR). The levelling was determined by the school at the beginning of the academic year through a standardised placement test. All of the students had been learning English at the school for nearly two years and were taught by mostly the same teachers with similar teaching approaches.

Data Collection
Students' written compositions were used as the main source of data in the present study. The method of data collection followed the approach employed by Laufer and Nation (1995) in investigating students' vocabulary size and its use in written production. The participants were asked to write two compositions on different topics during English lesson time within a period of one week. The reason for this short interval between the compositions was to keep the learners' language level stable and free from significant change (Laufer, 1995). The participants were given one hour to complete each composition. The topics for the compositions were general and did not require expert knowledge of specific subject matter (see appendix). Each composition had to be around 300 words long (with +10% tolerance), as Laufer and Nation (1995) reported that lexical profiles are more consistent in essays of 200 words or more than in those of fewer than 200 words.

Data Processing
To analyse the data and to measure lexical diversity as well as lexical sophistication, the present study used a computer program called RANGE (https://www.victoria.ac.nz/lals/resources/range), which provides the lexical frequency profile (LFP) of each composition written by the learners. The first step was entering the data into the computer. All compositions were retyped and saved in .txt format so that they could be read by the program. The compositions were treated as follows. All proper nouns were omitted since they do not belong to the second language lexicon. Words that were clearly used incorrectly were also removed; Laufer and Nation (1995) argue that a word which is misused cannot be regarded as part of the participants' productive lexicon. On the other hand, if a word was used correctly but misspelled, it was corrected and still considered part of the students' productive lexicon. Compound words and verb-particle constructions were typed either hyphenated or separated, following the dictionary. After all compositions had been entered, they were processed using RANGE to obtain their LFP. This process was straightforward and did not take long, since the program can accommodate up to 32 different texts at the same time. Once finished, the program displayed a table showing, for each composition, the total number of tokens, types and word families. It also classified the words in the compositions into four word-frequency lists: the first 1000 most frequent words, the second 1000 most frequent words, the University Word List (UWL) and the not-in-the-lists words.
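The four-band classification just described can be illustrated with a minimal Python sketch. It is not the RANGE program: the three word sets below are tiny, hypothetical stand-ins for the real base lists, the function and variable names are invented for the example, and words are matched by exact form, whereas the real lists group words into families.

```python
from collections import Counter

# Toy word lists for illustration only (hypothetical, not the real base lists).
FIRST_1000 = {"the", "students", "to", "many", "city"}
SECOND_1000 = {"crowded", "improve"}
UWL = {"migrate", "regulation"}
BANDS = ("first_1000", "second_1000", "uwl", "not_in_lists")


def band_of(word):
    """Return the frequency band a word form belongs to."""
    if word in FIRST_1000:
        return "first_1000"
    if word in SECOND_1000:
        return "second_1000"
    if word in UWL:
        return "uwl"
    return "not_in_lists"


def lexical_frequency_profile(tokens):
    """Count tokens and types per band and the share of tokens in each band."""
    token_counts = Counter(band_of(word) for word in tokens)
    type_counts = Counter(band_of(word) for word in set(tokens))
    total = len(tokens)
    profile = {}
    for band in BANDS:
        pct = 100 * token_counts[band] / total if total else 0.0
        profile[band] = {
            "tokens": token_counts[band],
            "types": type_counts[band],
            "token_pct": pct,
        }
    return profile


tokens = "many students migrate to the crowded city to improve the skyline".split()
profile = lexical_frequency_profile(tokens)
for band in BANDS:
    print(band, profile[band])
```

In this toy sentence, "to" and "the" each occur twice, so the first band holds seven tokens but only five types, which is the same token/type distinction that the profile table reports for each composition.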

Data Analysis
There were two types of analysis conducted in the present study: collective analysis and separate analysis. The collective analysis aimed to find the general lexical frequency profile (LFP) of each group of learners by putting together all compositions of each group and analysing them using RANGE. The separate analysis, on the other hand, processed each composition individually using RANGE to find the LFP of each piece of writing. The data produced by the program were then entered into Microsoft Excel sheets to be classified and used for further analysis. Lexical diversity was measured using the type-token ratio (TTR), whereas lexical sophistication was measured using the proportion of advanced vocabulary in the text. The words belonging to the University Word List (UWL) category and the "not-in-the-lists" category were regarded as sophisticated or advanced considering their rareness (Read, 2000) and low frequency of occurrence.

FINDINGS
A total of 62 written compositions were collected from the students, comprising 18848 words (tokens). All of the compositions were entered into the computer and analysed using RANGE to find their lexical frequency profile (LFP). As mentioned earlier, the LFP provides information about the analysed texts in the form of the total number of tokens, types and word families, and categorises them into four frequency bands: the first 1000 most frequent words (word list one), the second 1000 most frequent words (word list two), the university/academic words (UWL/word list three) and not-in-the-lists words.
Table 1 shows the result of the collective analysis of the compositions written by the learners at B1 and B2 level. A total of 30 compositions written by 15 students at B1 level were analysed, resulting in 8478 tokens. Here, a token is any occurrence of a word form, so a word is counted every time it appears in the text. Among these 8478 tokens, the majority belong to word list one (the first 1000 most frequent words), which accounts for 6999 words (82.6%), followed by "not-in-the-lists", word list three (UWL) and word list two, which account for 612 words (7.2%), 444 words (5.2%) and 423 words (5.0%) respectively. In terms of types, 614 out of 967 total word types belong to the first 1000-word list, which makes up 63.5% of the total types. The types in "not-in-the-lists" number 147, or 15.2% of the total, followed by types in the second 1000-word list and the UWL, which make up 12.1% and 9.2% respectively. Unlike a token, a type is a distinct word form counted only once regardless of how many times it appears in the text.
With regard to learners at B2 level, 32 compositions written by 16 learners were analysed. As illustrated in the table, there are 10370 tokens, 1232 types and 693 word families in total. Out of 10370 tokens, 8729 words (84.2%) belong to word list one, 516 words (5.0%) are in word list two and 473 words (4.6%) are in word list three. In addition, 655 words (6.3%) do not belong to any of the lists. A similar distribution holds for the types: 762 (61.9% of the total types) belong to word list one, 163 (13.2%) belong to word list two, and the types in word list three and "not-in-the-lists" number 115 (9.3%) and 192 (15.6%) respectively.
For further analysis, a normality test was conducted in SPSS on the LFP results obtained from RANGE to make sure that the data were normally distributed. After that, a paired t-test was run on the two compositions of each group to ensure that they were stable and not significantly different, so that a reliable result could be obtained.
According to the results of the normality test using the Shapiro-Wilk procedure, all data from both groups of learners are normally distributed: the significance level (p-value) of each composition is greater than 0.05. The data can therefore be used for further analysis, i.e., a paired sample t-test to find out whether there is a significant difference between composition 1 and composition 2 within each group of learners. The results of the paired sample t-test indicate no significant difference between composition 1 and composition 2 at B1 level (t = -0.426, p = 0.676 > 0.05). Similarly, composition 1 in group 2 (B2 level) is not significantly different from composition 2 (t = -0.246, p = 0.809 > 0.05). Therefore, it can be concluded that the compositions obtained from the students are reliable enough to be used as the source of data for the present study, since they remain stable and show no prominent discrepancy.
The average TTR of the participants' writing in each group is shown in Table 2. Overall, students at B2 level produced more diverse vocabulary than those at the lower level, although the difference between the two groups is small. A ratio between 0 and 1 was used as the indicator (Mackiewicz, 2016), i.e., the closer the result is to 1, the greater the lexical diversity of the vocabulary in the compositions. As can be seen, the learners at the higher level used more varied vocabulary in their compositions, with an average TTR of 0.45, greater than the average of 0.43 for the B1 group. To obtain a more detailed result, the TTR of the participants' compositions was compared according to the LFP result in terms of the first most common 1000-word list, the second most common 1000-word list and the University Word List (UWL). Comparing the learners' written compositions against the first two word lists makes it possible to determine the percentage of words used by each group that most experts would consider necessary for daily interaction in English, and how diverse those words are. In this respect, the first and the second most common 1000-word lists are determined by reference to the General Service List (GSL) of English words (West & West, 1953), which is a list of the 2000 most useful word families for English learners. Nation and Kyongho (1995, p. 35) define a general service vocabulary like the GSL as follows:
General service vocabulary consists of words that are of high frequency in most uses of the language. It is the essential common core. It includes the most useful function words, like the, of, be, because and could, content words like stop, agree, person, wide, and hardly. General service words occur frequently across a wide range of text.
On the other hand, comparing the learners' compositions against the UWL/AWL gives the percentage of words from both groups that are considered useful for academia, since this list is determined by reference to the Academic Word List (AWL) (Coxhead, 2000). The word list contains 3000 words from 570 headwords that are normally used in tertiary education, such as "comprehensive", "demonstrate" and "indicate", and it is often used as a reference to prepare students for college and academic life. Table 3 shows the overall results of comparing the compositions of learners at B1 and B2 level against these lists. For the first 1000 most common word list, using the scale between 0 and 1 that Mackiewicz (2016) suggests, both groups of learners used very little variation of vocabulary in their writing: the index is only 0.087 for both levels, which is much nearer to 0 than to 1. In other words, most of the learners tended to use the same common words repeatedly, as there were only a small number of types even though the learners produced a fairly large number of tokens. The proportion of types to tokens for the second 1000 word list shows a different result. In general, the lexical diversity of the students' vocabulary in the second 1000 word list is greater than that in the first 1000 word list. Here, the lexical diversity index of students at B2 level is 0.315, greater than that of B1 level students at 0.277.
As for academic words, although only small percentages of the tokens in the compositions of students at B1 and B2 level fall into the AWL (5.2% and 4.6% respectively), the share of types that belong to this list is relatively ample in both groups, i.e., 9.2% for the B1 group and 9.3% for the B2 group. Examples of topic-related words used by learners that fall into the AWL are "migrate", "assignment" and "regulation". In terms of lexical diversity, the TTR index indicates that the students employ sufficient variation of vocabulary considered useful in academic contexts. As with the previous word lists, students with a higher English proficiency level seem to have slightly more variation in their use of academic words, with a TTR index of 0.243, greater than the 0.200 of the lower level students. These findings, in general, suggest that some words related to the topics given in the students' writing have application in other academic contexts. The words produced by learners that belong to the AWL are not necessarily common or easy words of English, but as suggested by several experts (Coxhead, 2000; Mackiewicz, 2016), they are important for academic success.
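The per-band diversity indices reported above appear to be the number of types divided by the number of tokens within each frequency band. That reading is an inference rather than something the text states, so the short Python check below should be taken as a sketch. It uses only the type and token counts given in the collective analysis; the B1 type counts for the second 1000 list and the UWL are reported only as percentages and are therefore left out. The dictionary and function names are invented for the example.

```python
# (types, tokens) per frequency band, taken from the counts reported in the
# collective analysis. Bands without a reported type count are omitted.
reported_counts = {
    ("B1", "first 1000"): (614, 6999),
    ("B2", "first 1000"): (762, 8729),
    ("B2", "second 1000"): (163, 516),
    ("B2", "UWL/AWL"): (115, 473),
}


def band_ttr(types, tokens):
    """Type-token ratio restricted to one frequency band."""
    return types / tokens


for (group, band), (types, tokens) in reported_counts.items():
    print(f"{group} {band}: {band_ttr(types, tokens):.3f}")
```

Under this assumption the computed ratios agree with the reported indices (0.087, 0.315 and 0.243) to within about 0.001, which is consistent with rounding.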
In terms of lexical sophistication (LS), the classification of words as advanced or sophisticated in the present study was based on their rareness and low frequency of occurrence in normal texts (Read, 2000). Therefore, using the LFP result, all words produced by learners that belong to the academic word list and "not-in-the-lists" were regarded as advanced lexical items. A previous study that used the same methodology, i.e., the beyond 2000 or condensed profile (Laufer, 1995), found this approach to be valid and reliable as a means of calculating the lexical sophistication of productive vocabulary in written text. Simple descriptive statistics were used to compute this measure. Table 4 summarises the average lexical sophistication calculated for both groups of learners based on the reference point mentioned previously.
As can be seen from Table 4, the overall proportion of lexical sophistication of students at level B1 is slightly higher than that of students at level B2. It accounts for 6.39% of the total vocabulary produced in the text, whereas the percentage of advanced words of students at B2 level is 6.36%. It should be noted that this calculation was based on the total occurrences of sophisticated vocabulary (tokens) in each text, regardless of how many types appear. To obtain more detailed information on the lexical sophistication of each group of learners, besides calculating the overall percentage of advanced vocabulary, the proportion of lexical sophistication in terms of words that belong to the academic word list and the "not in the lists" category was also measured. In addition, the number of sophisticated words in composition 1 and composition 2 was measured and compared. The result for students at B1 level can be seen in Figure 1. It shows that the advanced vocabulary in students' writing is mostly made up of words in the "not in the lists" category, at 56.36% of the total advanced tokens in the text, whereas the lexical items that belong to the academic word list contribute 43.64%.
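A token-based lexical sophistication measure of this kind, together with the split of advanced tokens between the two advanced categories, might be computed as in the following Python sketch. The token counts are hypothetical values for a single 300-word composition, not data from the study, and the function names are invented for the example.

```python
def lexical_sophistication(band_tokens):
    """Percentage of all tokens that fall in the UWL or not-in-the-lists bands."""
    total = sum(band_tokens.values())
    advanced = band_tokens["uwl"] + band_tokens["not_in_lists"]
    return 100 * advanced / total if total else 0.0


def advanced_shares(band_tokens):
    """Split the advanced tokens into UWL and not-in-the-lists percentages."""
    advanced = band_tokens["uwl"] + band_tokens["not_in_lists"]
    if advanced == 0:
        return {"uwl": 0.0, "not_in_lists": 0.0}
    return {
        "uwl": 100 * band_tokens["uwl"] / advanced,
        "not_in_lists": 100 * band_tokens["not_in_lists"] / advanced,
    }


# Hypothetical token counts for one composition (illustration only).
composition = {"first_1000": 255, "second_1000": 25, "uwl": 9, "not_in_lists": 11}

print(round(lexical_sophistication(composition), 2))
print(advanced_shares(composition))
```

Because the measure counts tokens, a single rare word repeated many times raises the percentage just as much as several different rare words would.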

Figure 1. Sophisticated words in composition among B1 students
With regard to lexical sophistication in each composition, the average proportion of advanced vocabulary used by students in the first composition is smaller than that in the second composition. The average percentage of sophisticated words produced by learners in the first composition is 6.36% of the total tokens in the text (SD = 1.402). In the second composition, around 6.50% (SD = 1.698) of the total words used belong to sophisticated vocabulary.
A paired-sample t-test was conducted to examine further whether the difference in the average proportion of advanced vocabulary between the two compositions was significant. As shown in Table 5, the result indicates that there was no statistically significant difference between the number of advanced words used by students in the first composition and the number produced in the second composition (t = -0.246, df = 14, p = 0.789 > 0.05). In other words, the results suggest that there was no meaningful change in the proportion of advanced words learners used across the two compositions.
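As a rough illustration of how a paired-sample t statistic of this kind is obtained, the following Python sketch computes it from per-learner differences. The five pairs of percentages are invented for illustration only and are not the study's data; the function name `paired_t` is likewise hypothetical.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # One difference per learner: first composition minus second composition.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical percentages of advanced tokens for five learners.
composition_1 = [6.0, 5.5, 7.0, 6.5, 6.0]
composition_2 = [6.5, 5.0, 7.5, 6.5, 7.0]

t, df = paired_t(composition_1, composition_2)
print(round(t, 3), df)
```

With this ordering of the two lists, a negative t simply means that the second composition has the higher mean, which is the direction reported for both groups of learners. Converting t to a p value requires the t distribution with df degrees of freedom, which a statistics package would normally supply.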
A similar calculation was conducted for students at B2 level regarding the proportion of lexical sophistication in terms of words in the academic word list and those that belong to "not in the lists". In addition, the difference in the level of sophisticated vocabulary used in the two compositions was also measured (see Figure 2). The result indicates that, out of all the advanced vocabulary produced by learners, most fell into the "not in the lists" category, which accounts for 59.48%, whereas the words that fall into the academic word list made up 40.52% of the total advanced tokens in students' writing.
In terms of advanced words used by learners in each composition, it was found that the proportion of lexical sophistication in the second composition was greater than in the first composition. Of the total tokens produced by the learners in the first composition, 6.23% belong to advanced vocabulary (SD = 1.710). In the second composition, meanwhile, the average proportion of advanced words produced by learners increases to 6.48% (SD = 1.725) of the total tokens. However, a paired t-test showed that there was no statistically significant difference between the advanced words used in the first and the second composition (t = -0.426, df = 15, p = 0.676 > 0.05).
In addition, an independent-sample t-test was conducted to find out whether there was any significant difference between the two groups in terms of lexical diversity and lexical sophistication. Regarding lexical diversity, the result of the statistical calculation indicated that there was no significant difference between the TTR scores of learners at B1 level (M = 42.95, SD = 5.97) and learners at B2 level (M = 44.1, SD = 5.01); t(60) = -.824, p = .413, which is greater than .05. A similar result was found with regard to lexical sophistication, in which no significant difference was found between the two groups in terms of the average proportion of advanced words used; t(60) = .083, p = .935. The results of this calculation suggest that, in the present study, the level of English language proficiency of learners does not substantially affect their ability to produce compositions with a higher percentage of diverse and advanced vocabulary.
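The lexical diversity comparison above can be approximately reproduced from the reported means and standard deviations alone. The Python sketch below assumes an equal-variance (pooled) t-test and group sizes of 30 and 32 texts; both are assumptions, the latter inferred from the reported 60 degrees of freedom and from 15 and 16 learners each writing two compositions. The function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, computed from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Reported TTR summary statistics; n = 30 and n = 32 texts are assumed.
t, df = pooled_t(42.95, 5.97, 30, 44.1, 5.01, 32)
print(round(t, 2), df)
```

Under these assumptions the statistic comes out at about -0.82 with 60 degrees of freedom, which is in line with the value reported above and suggests that compositions, rather than learners, were the unit of analysis in this test.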

DISCUSSION
Although assessing lexical diversity using the TTR procedure is often criticised for its dependence on text length (Malvern & Richards, 1997; McCarthy & Jarvis, 2010), the findings of this research might still give surface-level information about learners' linguistic performance that cannot be measured by means of a vocabulary test. One of the key findings in this study is that the comparison of the TTR index of compositions written by B1 level students and B2 level students shows a moderate difference between the two groups, in which B2 students generally produced writing with more diverse vocabulary (TTR index B2 = 0.45, B1 = 0.42). More specifically, if we look at the comparison of the TTR index across the four word categories in LFP, the result shows that students with better proficiency, again, produced more diverse words in three out of four categories, i.e., the 2nd 1000 words (TTR = 0.315), academic words (TTR = 0.243) and "not in the lists" (TTR = 0.294). Meanwhile, for the first 1000 most frequent words category, the two groups show an identical result (TTR B1/B2 = 0.083). Another interesting point is that, although students at the higher level generally show more lexical diversity in their compositions, the two groups of learners share similar patterns of lexical diversity index, in which the most varied vocabulary used in their writing falls into the second most common 1000 wordlist, followed by vocabulary that belongs to the "not in the lists" category, AWL and the first common 1000 words, respectively. This particular result, in general, indicates that the dimension of vocabulary size plays a role in the production of second language output. Learners at the higher level seem to take advantage of the adequate amount of vocabulary they possess to generate ideas, and to develop and present them in their writing (Raimes, 1985). As the basic dimension of lexical competence, vocabulary size often distinguishes learners with good L2 proficiency from those with low proficiency (Laufer, 1995). Meara (1996) argues that possessing good word knowledge makes a crucial contribution to almost all aspects of second language acquisition, including the enhancement of receptive and productive skills. Regarding productive skills in particular, the dimension of vocabulary size is often linked with another dimension called "organisation" (Meara, 1996), which relates to the ability of learners to manage the words they have in mind when producing language in either written or spoken form. This dimension of organisation concerns the structured, connected lexical network that makes up a learner's mental lexicon (Gyllstad, 2013).
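The dependence of TTR on text length noted at the start of this discussion can be illustrated with a minimal Python sketch. The sample sentence and the function name are hypothetical, and tokens are split on whitespace only, which is a simplification of what profiling software does.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of running words."""
    if not tokens:
        raise ValueError("cannot compute TTR of an empty text")
    return len(set(tokens)) / len(tokens)


short_text = "my city is big and my school is small".split()
long_text = short_text * 3  # same vocabulary, three times the length

print(round(type_token_ratio(short_text), 3))  # 7 types over 9 tokens
print(round(type_token_ratio(long_text), 3))   # 7 types over 27 tokens
```

The vocabulary is identical in both texts, yet the longer one receives a much lower index, which is why TTR comparisons are most informative when the compositions are of similar length.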
The result of the comparison of the TTR index between B1 students' writing and B2 students' writing in this study is also partially consistent with some previous investigations. For instance, a study conducted by Engber (1993) reported a positive correlation between lexical diversity and the quality of written production. The use of diverse vocabulary results from better word knowledge (vocabulary size) in the students' minds, and it correlates positively with the quality of writing in the second language (Kwon, 2009). Therefore, learners with a good vocabulary size are more likely to produce written texts with varied word choices and good grammatical structures than learners who lack this dimension.
Regarding lexical sophistication, one of the key findings is that learners at the lower level, surprisingly, use a higher percentage of advanced words than students at the higher level, based on the calculation per composition (separate analysis). This is so even though the collective analysis using the general LFP result, as mentioned previously, indicates that B2 level students use somewhat more words that belong to word list three and the "not in the lists" category. This result is interesting given that, in terms of lexical diversity, as discussed previously, learners with higher proficiency tend to use relatively more diverse vocabulary in their writing, whereas for the production of sophisticated words the result is the opposite. This implies that the ability to produce written text with a higher lexical diversity index does not guarantee that students will produce a composition with a larger percentage of lexical sophistication. Nor does the level of second language proficiency appear to be a factor affecting it. This makes the result inconsistent with the study carried out by Laufer and Nation (1995), which investigated the lexical richness of learners of English as a foreign language. It also contradicts Siskova's (2012) finding on lexical richness in narrative texts written by Czech EFL learners. In that investigation, she found a positive correlation between lexical diversity and lexical sophistication, in which students with a higher lexical diversity index produced more sophisticated words in their narrative texts than students with a low lexical diversity index. However, it should also be noted that, apart from the influence of language proficiency level and the ability to produce writing with diverse vocabulary mentioned by previous researchers (Laufer & Nation, 1995; Siskova, 2012), other factors can affect students' performance in the production of sophisticated lexical items, such as the quality of input in the teaching situation and learners' knowledge of other languages (Bardel, Gudmundson, & Lindqvist, 2012). The first factor relates to the pedagogical aspect, whereas the latter concerns the cognitive aspect of learners and their ability to recognise the semantic relations between words (Amer, 2002), such as their knowledge of cognates or false friends.
Another finding of the present study suggests that the level of language proficiency does not contribute significantly to the ability of students to produce a written text with diverse and sophisticated vocabulary. The difference between the two groups of learners in mean lexical diversity index and lexical sophistication is not meaningful, as the compositions written by students at B1 level and B2 level contain a similar proportion of diverse and advanced vocabulary. There are several possible explanations for this result. One is that learners from both groups might have found the topics given for the compositions familiar, since the topics were close to their lives. Students might have benefited from prior knowledge related to the topics. Lee and Anderson (2007) argue that background knowledge plays a crucial role in second language learning, with regard to both receptive and productive skills. When learners have prior knowledge of the subject under discussion and are familiar with it, they can recall and elaborate on that topic more easily. For writing in particular, Tedick (1988) argues that familiarity with the topic stimulates learners to improve the quality of their writing performance. Similarly, Long (1990) suggested that topic familiarity has a positive impact on learners' production practice. In his study, Long (1990) found that L2 learners perform significantly better in summary tasks when the topics given are familiar to them. This finding is corroborated by Hamp-Lyons and Prochnow (1990), who investigated the effect of topics and task types on writers' performance. They found that topic type was a crucial factor affecting the final product of a writer. When L2 learners were given an opportunity to respond to a topic that they knew and were familiar with, they tended to produce longer texts of better quality.

CONCLUSION
The present study aimed to measure the lexical diversity and lexical sophistication of productive vocabulary in the written discourse of Indonesian EFL learners and to find out whether there is any significant difference in these two lexical features between two groups of EFL learners with different proficiency levels. The subjects of this study were students at levels B1 and B2 according to CEFR.
To sum up, this research finds that learners at the two levels show similar patterns of lexical diversity and lexical sophistication, since the result of the calculation indicates that there is no meaningful difference in these two lexical features between the two groups. In terms of lexical diversity, one of the key findings is that learners at the higher level generally employ more diverse vocabulary in their written production than those at the lower level, although the gap is not statistically significant. Also, based on the calculation per LFP category, it was found that the two groups share similar patterns of lexical diversity index, in which the most varied vocabulary used in their writing falls into the second most common 1000 wordlist, followed by vocabulary that belongs to the "not in the lists" category and AWL, respectively. The first common 1000 words category, in turn, contains the least varied words used by the learners. In terms of lexical sophistication, based on the calculation of advanced words per composition, it was found that the percentage of advanced vocabulary used by less proficient learners is slightly larger than the percentage of advanced vocabulary used by more proficient learners. The result also reveals that the majority of advanced words used by learners at both levels are from the "not in the lists" category rather than from the academic word list.
However, it should be acknowledged that the current study has some limitations. First, this study does not give a broad range of insights into the lexical diversity and lexical sophistication of Indonesian EFL learners in writing production, as its scope is limited to particular subjects and it uses a relatively small number of texts, which makes it insufficient for generalising the results. To obtain a more comprehensive result, another study within the same scope should be conducted in the future with a larger number of participants and texts. Second, the fact that lexical diversity and lexical sophistication were measured using only one method also calls for caution. Such lexical features can be measured with different approaches that might not necessarily yield the same results.

Table 1. Collective analysis of composition among B1 and B2 learners

Table 3. The overall results of composition comparison between groups

Table 4. The average of lexical sophistication in composition between groups

Table 5. Paired sample test of composition between groups