LEMMATIZING TEXTBOOK CORPUS FOR LEARNER DICTIONARY OF BASIC VOCABULARY

This study lemmatizes the vocabulary used in textbooks in order to provide information for dictionary entries that are relevant to learners. Although dictionaries for learners are commercially and electronically available, they may not be relevant to the learners' needs for vocabulary learning in class, or may not have been designed for specific pedagogical purposes. To bridge this gap, the vocabulary used in three textbook series was analyzed using the vocabulary profiler available in Lextutor.com to identify the lemmas at different frequency levels. The analysis shows that the total number of lemmas in the textbooks was approximately 58% of the total number of the most frequently used lemmas in the New General Service List (Browne, Culligan, & Phillips, 2013). Further exploration of the lexicogrammatical environment in the textbook corpus revealed lexical items that behave differently with regard to function and meaning. These findings should provide lexicographers or teachers with useful information about word entries for a relevant learner dictionary to be used in the classroom.

Every teacher and student will agree that having a sufficient stock of vocabulary is essential to language learning. It has been widely acknowledged that words and phrases are basic components of language in communication. There are many strategies that teachers can apply to teach words and phrases to students. One of them is to use a dictionary. A large proportion of the vocabulary used in a textbook is usually basic vocabulary. Basic vocabulary is "a list of the most important high-frequency words useful for second language learners of English…" (Browne et al., 2013). This list provides information about the first 1000 most frequently used words (K-1). It also contains the second and third 1000 most frequently used words (K-2 and K-3). These K-1, K-2, and K-3 words should be understood by students early in English language learning. Schmitt (2000) suggests that at early stages of learning, students should learn about 1000-2000 high-frequency words. A textbook used in schools usually has glosses of vocabulary derived either from individual chapters or from the entire book. While glosses are useful for quick reference to word meanings, the selection may have been based on the intuitions of the textbook writer rather than on frequency.
Learning vocabulary in the classroom generally involves the use of a dictionary as the primary source for definitions and examples of the words that learners meet in the textbooks. It is common practice for students to use pocket or electronic dictionaries. These dictionaries may or may not be helpful or efficient for classroom learning because they do not always offer example sentences that are relevant to the students' needs. Although this kind of dictionary is convenient, it may not have been designed with appropriate pedagogical purposes, and it may not promote students' vocabulary knowledge or motivation for learning. Commercial dictionaries are generally designed for a general audience. They are not designed explicitly to help a specific group of students understand the vocabulary in the textbooks being used in schools. The dictionary that students need is one that they can use to address their real needs for word meaning during English classes, that is, one covering the words that are used in the textbooks.
This study attempts to fill this gap by lemmatizing the vocabulary in the student textbooks for dictionary entries of basic vocabulary, that is, the most frequently used vocabulary. Such a dictionary can serve as a handy learning tool that is useful and relevant to students and teachers because it directly addresses students' learning needs in the classroom, that is, to master the material being learned or to pass the tests. This idea may seem to deprive learners of the broader aim of learning English for communicative purposes, but we cannot expect students to develop communicative skills without sufficient vocabulary knowledge.
Despite teachers' qualifications in TEFL or TESOL and frequent teacher training to develop teaching skills, the English proficiency of our students may not be satisfactory. This condition appears to be pervasive in EFL contexts such as those in Indonesia (Muhson, 2014), Saudi Arabia (Nezami, 2012), China (Lin, 2002), and Iran (Kheirzadeh & Tavakoli, 2012), where studies show that EFL learners have difficulties in reading comprehension because they lack sufficient vocabulary knowledge. This condition may be due to learners' needs not being met in language learning, where vocabulary acquisition requires constant exposure to input, as frequently described in many studies. Nation (2001), for example, indicates that frequent exposure to new words is necessary for students to understand meanings, and learners should be taught how to learn and use new words (Azman, Bhooth, & Ismail, 2013; Bhooth, Azman, & Ismail, 2015).
My observation during the teaching practicum shows that the practice of teachers at the participating schools places a high emphasis on translation into Indonesian and memorization of new words in isolation. Many of the class activities during vocabulary learning are decontextualized, promoting rote learning without clear teaching of vocabulary learning skills or strategies. Vocabulary acquisition requires learners to use appropriate learning strategies, especially at the early stages of learning. Therefore, vocabulary learning strategies need to be introduced to the students, and such strategies can be taught (Nemati, 2008; Tsai & Chang, 2009).
The significance of vocabulary in language learning has been widely recognized and discussed extensively in the literature. Vocabulary is a central factor in language learning and of great importance for language learning skills (Mirzaii, 2012; Nation, 2001; Sadeghi & Nobakht, 2014). To be successful in learning, students should have adequate reading skills, and vocabulary knowledge is regarded as an essential factor in reading (Chen, 2011). According to Laufer (1996, p. 20), "no text comprehension is possible either in one's native language or a foreign language without understanding the text's vocabulary." This statement implies that students will not be able to speak, listen, write, or read appropriately without substantial knowledge of vocabulary (Nation, 2009), and vocabulary knowledge is generally considered a good predictor of language proficiency in learning a new language (Staehr, 2008). Considering the importance of vocabulary knowledge, teachers should teach students appropriate vocabulary learning strategies to help them understand written or spoken texts.
One strategy for learning vocabulary that is frequently used by students is the use of dictionaries. This strategy has recently been a focus of research, such as the studies by Tsai and Chang (2009) and Tran (2011), who explored EFL teachers' perceptions of vocabulary acquisition and instruction and identified their students' use of vocabulary learning strategies. The findings revealed that most of the participants used a monolingual dictionary in their learning. The participants believed that dictionaries play significant roles in language learning, and dictionary use was ranked the most frequently used strategy among eight learning strategy categories as reported by the students. The study by Ta'amneh (2015) with 306 ninth grade students with an average age of 14 in Saudi Arabia shows that the use of a dictionary could facilitate the learning of new words that are crucial to understanding. This finding indicates the need to teach students to use an appropriate dictionary properly, one in which they can read sample sentences that illustrate all the senses of a word. Exposure to many sample sentences can help students become aware of the slight differences that may exist in meaning, connotation, or usage between words.
For vocabulary learning, there are two categories of dictionaries: monolingual and bilingual. However, the issue as to which dictionary is the most effective for learning is still debated. Nation (2008), for example, maintains that bilingual dictionaries offer advantages for faster L2 vocabulary learning because the L1 equivalents are provided. A similar conclusion is reached by Folse (2006) and by Lotto and de Groot (1998), who state that students will have a better level of word retention if L1 translations are provided. On the other hand, Chan (2004) and White (1997) consider bilingual dictionaries limited because they contain rigid or imprecise L1 translations that may not help learners develop lexical awareness.
Using the corpus from prescribed textbooks to design a tailor-made bilingual dictionary may provide an answer to vocabulary learning difficulties, especially at the beginning stages of learning, because students are given an L1 translation of high-frequency words. The studies by Laufer and Hadar (1997), Wu (2005), Marin-Marin (2005), and Amirian and Heshmatifar (2013) show that the students in Taiwan, Mexico, and Iran used and benefited from bilingual dictionaries. This finding may apply to Indonesian contexts where English is a foreign language. These studies point to the need for a bilingual dictionary that provides L1 equivalents and sufficient exposure to how specific vocabulary items are used in different sentences and contexts. A bilingual dictionary will provide students with example sentences containing new words and their translation in L1. Studies by Nation (2001, 2017) and Tsai and Chang (2009) indicate that learners acquire vocabulary much more efficiently using bilingual dictionaries. If students are trained to use an appropriate dictionary correctly, they will be able to read and understand sample sentences that illustrate the shades of meaning of words through multiple exposures, a necessary condition for vocabulary acquisition.
Dictionary use has been found to enhance students' vocabulary knowledge. McAlpine and Myles (2003) clarified that regardless of the type of dictionary, the primary purpose of using it is to help learners improve their vocabulary size and increase their awareness of common grammar errors. Incidental vocabulary learning may take place during listening and reading activities with the aid of dictionaries, which is especially helpful to less proficient learners. Luppescu and Day (1993) studied the potential benefits of using bilingual dictionaries appropriately to improve students' vocabulary knowledge, since such use can have a substantial positive impact on vocabulary learning and reading development.
The use of concordances for vocabulary learning has been gaining popularity in recent years. Studies on the use of concordances (Al-Mahbashi, Noor, & Amir, 2015; Cobb, 1997; Poole, 2012) have provided information on how learners could benefit from concordance output to enhance their vocabulary knowledge. Concordances can help learners infer meanings and acquire productive vocabulary through multiple usages of vocabulary in authentic contexts. Concordance output can be manipulated to create motivating materials and activities for vocabulary learning that can enhance learners' lexical competence and promote students' autonomy. Some studies on the use of concordancers (Kaur & Hegelheimer, 2005; Poole, 2012; Schmitt, 2000) show that students learned vocabulary inductively, exhibited improvement in understanding word meanings, and were able to transfer their vocabulary knowledge when reading new texts. It appears that learning vocabulary through concordances can lead students to acquire different senses of word meanings and apply this acquired knowledge in reading new texts. Concordances, as summarized by Nation (2001), provide learners with vocabulary in real contexts with rich information not only about word meaning but also about a variety of grammatical features that challenge learners to construct generalizations and patterns of word usage. The significant role of context in concordance-based vocabulary learning is well recognized (Poole, 2012) since it exposes students to authentic usages of vocabulary in a variety of meaningful contexts. Lin and Huang (2008) state that contexts in concordances offer students meaning-inferencing activities that are considered to be more efficient than the meaning-given activities commonly found in vocabulary look-up activities using pocket dictionaries.
Considering the need for a more relevant dictionary of basic vocabulary, the urgency of acquiring high-frequency words, and easy access to a user-friendly online tool (Lextutor.com), this small-scale study attempts to lemmatize the vocabulary items in some English textbooks currently used in Junior High Schools in Indonesia. The textbooks have been recommended by the Indonesian Ministry of Education. The lemmatized vocabulary in this study can be used as a reference to create a dictionary of high-frequency words that are appropriate and relevant to the students at this level, with sample sentences selected from the textbooks.

METHOD
This study was conducted at a private university in Central Java, Indonesia. It used a documentary method. Documentary methods are the techniques used to categorize, investigate, interpret, and identify written documents, whether in the private or public domain, such as personal papers, commercial records, state archives, communications, or legislation (Payne & Payne, 2004). In this research, the textbooks were considered as documents, and the vocabulary in the textbooks was taken as data.
The samples of this research were three series of English textbooks for Junior High Schools that have been recommended and documented on the Ministry of Education website (http://bse.kemdikbud.go.id/). The textbooks are presented in Table 1. Note that the vocabulary has been screened for the research.
Each of the sample textbooks was downloaded and converted into Word document files. Then, the words were "screened" to remove proper names, numbers, phonetic transcripts, illustrations, and Indonesian words. Thus, only the function and content words necessary for analysis were retained. Finally, the cleaned texts were fed into Lextutor with the following steps.
The first step was data analysis using the Vocabulary Profiler to produce word frequencies (K-1, K-2, K-3) for each textbook. The output shows a list of vocabulary items with different frequencies for each textbook. Since the word frequency lists contain words together with their frequency of occurrence (e.g., about_[3]), it was necessary to strip the frequency counts to obtain only the lexical items. Thus, this step produced K-1, K-2, and K-3 word lists for each book.
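The extraction of lexical items from the frequency output can be illustrated with a minimal sketch. It assumes that each token in the output has the form word_[count], as in the about_[3] example; the sample string, the pattern, and the function name extract_items are hypothetical and are not part of Lextutor.

```python
import re

# Hypothetical profiler output: each token is a word followed by its
# frequency in square brackets, modeled on the "about_[3]" example.
raw_output = "about_[3] above_[1] across_[2] act_[5]"

# Capture the word and its count separately.
TOKEN = re.compile(r"([A-Za-z'-]+)_\[(\d+)\]")

def extract_items(text):
    """Return (word, frequency) pairs found in the profiler output."""
    return [(word.lower(), int(count)) for word, count in TOKEN.findall(text)]

pairs = extract_items(raw_output)
words = [word for word, _ in pairs]  # lexical items only, counts dropped
print(words)
```

Keeping the counts in a separate structure, rather than discarding them, would leave them available for later frequency-informed decisions about dictionary entries.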
The next step in the analysis was to run Text Compare in Lextutor to compare the lexical items in K-1, K-2, and K-3 across the books. The result of this comparison was lists of words that were shared between, as well as unique to, the word lists being compared. The shared and the unique word lists were combined to make up all the K-1, K-2, and K-3 words used in all books. However, the combined word lists (K-1, K-2, and K-3) contained lexical items that were not organized. Therefore, the Excel program was used to arrange the words alphabetically for the dictionary entries.
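The logic of this comparison step can be sketched with set operations. This is only an illustration of the idea, not the procedure implemented by Text Compare or Excel; the two short word lists and the function names compare_lists and combine are hypothetical.

```python
def compare_lists(list_a, list_b):
    """Split two word lists into shared words and words unique to each list."""
    a, b = set(list_a), set(list_b)
    return a & b, a - b, b - a

def combine(*word_lists):
    """Merge any number of word lists, drop duplicates, and sort alphabetically."""
    combined = set()
    for word_list in word_lists:
        combined.update(word.lower() for word in word_list)
    return sorted(combined)

# Hypothetical K-1 word lists from two books.
book_1 = ["about", "go", "take", "family"]
book_2 = ["take", "about", "school", "have"]

shared, only_1, only_2 = compare_lists(book_1, book_2)
entries = combine(book_1, book_2)
print(entries)
```

Because the shared and unique lists together cover every word in both inputs, combining them gives the same result as taking the union of the original lists.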

FINDINGS AND DISCUSSIONS
The steps above produced a word list of 1645 lemmas running to many pages. A truncated list of the lemmas is presented in Table 2. Ta'amneh (2015) states that multiple exposures to new words with their L1 equivalents will facilitate learning because this provision is necessary and efficient (Nation, 2001, 2017; Tsai & Chang, 2009). With a lemmatized word list such as the one above in place, examples of word usage from the textbook corpus can be selected using the concordance facility in the Sketchengine, an online corpus tool available at https://the.sketchengine.co.uk (see Figure 1). The concordance output in Figure 1 illustrates the usages of get in different contexts. The selection of sample sentences can be based on the lexicogrammatical environment of get so as to provide sufficient exposure to its meanings.
Figure 1. The concordance output of get produced by the Sketchengine corpus tool.
Here are a few example sentences with get, derived from the concordance lines, that can be included to illustrate its meanings in different grammatical contexts.
a. How do I get to the post office?
b. I will go in to get some food.
c. Get me a piece of chalk, please.
d. Send a note to her, get well soon.
e. Let's get busy.
f. Say thank you when you get help from someone.
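A keyword-in-context display of this kind can be approximated with a short sketch. It is not the Sketchengine implementation; it only assumes a list of sentences and a fixed window of three words on each side of the keyword. The function name kwic and the window size are hypothetical.

```python
import re

# The six example sentences with "get" from the textbook corpus.
sentences = [
    "How do I get to the post office?",
    "I will go in to get some food.",
    "Get me a piece of chalk, please.",
    "Send a note to her, get well soon.",
    "Let's get busy.",
    "Say thank you when you get help from someone.",
]

def kwic(sentences, keyword, width=3):
    """Return (left, node, right) tuples with up to `width` words of context."""
    lines = []
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z']+", sentence)
        for i, token in enumerate(tokens):
            if token.lower() == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, token, right))
    return lines

lines = kwic(sentences, "get")
for left, node, right in lines:
    print(f"{left:>20}  {node}  {right}")
```

Aligning the keyword in a central column makes the words immediately to its right (to, some, me, well, busy, help) easy to compare across lines.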
These examples show that get has different meanings in Indonesian in different grammatical contexts, and these meanings should be supplied in the entries. In Table 3, the number of lemmas in the sample textbooks has been tabulated, showing the word frequency group, the lemma count of each group, and its percentage.
As displayed in Table 3, K-1 accounts for 40% of the lemmas (headwords) (658 words), K-2 for 43% (708 words), and K-3 for 17% (279 words). In total, the number of lemmas used in the textbooks is 1645, which is approximately 58% of the total number of K-1, K-2, and K-3 words in the New General Service List (http://www.newgeneralservicelist.org/). This number of lemmas does not include those that fall under the Academic Word category because this group of words is generally used in academic texts and may not be very useful for students at the junior high school level.
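The percentages for the three frequency groups follow directly from the lemma counts reported above, as the short calculation below illustrates. The variable names are hypothetical, and the shares are rounded to whole percentages as in the text.

```python
# Lemma counts per frequency level, as reported in the text.
lemma_counts = {"K-1": 658, "K-2": 708, "K-3": 279}

total = sum(lemma_counts.values())

# Share of each level in the textbook lemma list, rounded to whole percent.
shares = {level: round(100 * count / total)
          for level, count in lemma_counts.items()}

print(total)
print(shares)
```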
The 1645 lemmas used in the textbooks currently used in junior high schools in Indonesia should be acquired by the students either through explicit teaching of vocabulary or through the use of a dictionary. As mentioned earlier, commercially available dictionaries may or may not be convenient or relevant for learning vocabulary in class. Therefore, a tailor-made dictionary based on the list of vocabulary above should be able to meet the students' needs for learning English using the textbooks mandated by the Indonesian Ministry of Education.
The vocabulary items in Table 2 have been organized alphabetically for each frequency level. Since there are three levels (K-1, K-2, and K-3), three volumes of the dictionary could be created. It should be noted that the number of words at the K-3 level is not large enough to make up a volume of sufficient size. To make up for its small number of entries, it is necessary to include words categorized in the AWL (Academic Word List) group. Although the words in the AWL category are commonly used in academic publications, the students should have at least receptive knowledge of the meanings of those words because they are used in the textbooks. Understanding these lower-frequency words (AWL group) will improve the students' understanding of the texts in the textbooks. As pointed out by Nation (2001), frequent exposure is needed for students to understand word meanings and use them productively. Frequent exposure to sample sentences that use a particular word will help students become aware of differences in word meaning or connotation (Ta'amneh, 2015).
The lemmatized lexical items in this study were explored to see how a particular word works or functions in relation to other words in its immediate context using an online tool called Sketchengine (https://the.sketchengine.co.uk). With this tool, the function of a word or its meaning can be inferred from its lexicogrammatical environment. For example, in this study, the word have behaves differently in different environments in the textbook corpus and carries different meanings, as indicated in the following sentences.
a. They are tall and have dark hair.
   have = to own or possess
b. You always have breakfast every morning.
   have breakfast = to eat breakfast
c. She will have a birthday party on Wednesday.
   have a party = to arrange a party
d. We have to swim to the island.
   have to swim = must swim
e. Have you eaten the food?
   have = a question for a completed act
f. I have to take an English course.
   have to take a course = join a course
Another interesting example in the textbook corpus is the word take. It is used in the following phrases with different meanings:
• take care of = memelihara. How do you take care of plants?
• take a bath = mandi. Take a bath and then have your dinner.
• take a walk = jalan-jalan. You are much excited and have decided to take a walk.
• take a rest = istirahat. What you need is just take a good rest and drink a lot of fresh water.
• take turns = giliran. In your group, you will take turns making a puzzle.
• take part = ambil bagian. The government also takes part to increase the function of post offices.
• take a breath = ambil nafas. If smoke is around you don't be panic. Take short breaths and crawl.
• take a medicine = minum obat. You need to take medicine soon.
• take a picture = memotret. The man wants to take a picture of a bird.
• take actions = mengambil tindakan. Get the reader to take action.
• take notes = mencatat. Remember to take notes of, for example, the animal appearance.
• take place = terjadi. When did the story take place?
• take me home = mengantar. Could you take me home? I have a flat tire.
• don't take it badly = jangan terlalu dipikirkan. Don't take it badly. Don't blame yourself. I know how you must be feeling.
• it takes longer = lebih lama. It will take longer to write by hand.
These phrases are useful entries for the dictionary because take carries different meanings when it combines with other words to form collocations.
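Candidate collocations of this kind can be surfaced by counting the word that follows take in the sample sentences. The sketch below is a simplified illustration and not the word sketch function of the Sketchengine; the set of word forms, the list of skipped articles, and the function name right_collocates are assumptions.

```python
import re
from collections import Counter

# A few of the sample sentences with "take" quoted above.
sentences = [
    "How do you take care of plants?",
    "Take a bath and then have your dinner.",
    "Remember to take notes of, for example, the animal appearance.",
    "When did the story take place?",
    "The government also takes part to increase the function of post offices.",
    "You need to take medicine soon.",
]

# Assumption: only these two word forms of the lemma are searched for.
FORMS = {"take", "takes"}

def right_collocates(sentences, forms, skip=("a", "an", "the")):
    """Count the first word after the node, skipping articles."""
    counts = Counter()
    for sentence in sentences:
        tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", sentence)]
        for i, token in enumerate(tokens):
            if token in forms:
                j = i + 1
                while j < len(tokens) and tokens[j] in skip:
                    j += 1
                if j < len(tokens):
                    counts[tokens[j]] += 1
    return counts

counts = right_collocates(sentences, FORMS)
print(counts.most_common())
```

On a full textbook corpus, the most frequent right-hand collocates would suggest which phrases with take deserve priority as dictionary entries.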
Further observations of word sketch output in the Sketchengine found that take appears in various grammatical environments in the textbook corpus as seen below.
a. Objects of take: 158 cases
b. Subjects of take: 58 cases
c. Pronominal subjects of take: 43 cases
d. Pronominal objects of take: 35 cases
e. Modifiers of take: 29 cases
f. Prepositional phrases: 9 cases
g. Particles after take: 4 cases
h. Particles after take with objects: 3 cases
i. Infinitive objects of take: 3 cases
j. Adjectives after take: 2 cases
These statistics provide helpful information about word entries that need to be prioritized for the dictionary, and the selection of word entries should be based on frequency-informed decisions. A search for the meaning of take in the Oxford Basic English Dictionary (2016, p. 394) found different entries for take: take after somebody, take something away, take something down, take off, take over, take up. While these entries are useful for word knowledge, they may not be needed by the students to understand the textbooks they use in class. Students need to know the meanings of take as it is used in the examples above. Further word searches in the textbook corpus would likely reveal more cases of word usage that may not be found in published dictionaries.
It should also be useful to include fixed expressions in the dictionary. Further exploration of the textbook corpus for this study shows that there are many fixed expressions that can be used as dictionary entries. The following examples are a few among many fixed expressions used in the textbook corpus along with their frequency of occurrences from high to low:
• Excuse me (63
These are useful expressions as dictionary entries, and learners can learn the expressions as small chunks of English, especially for speaking skill development.
Further investigation of the textbook corpus found that the sense of a word may differ "according to the mode of production, that is, whether a text is spoken or written" (Anderson & Corbett, 2009, p. 63), or according to the part of speech that follows. Here is a selection of four concordance lines retrieved from the textbook corpus. Note that the language used in the textbooks may not reflect authenticity in the real sense, but it may resemble real use of English.
1a. It really helps to plan and remind us.
1b. Choose words that really describe your room.
2a. I'm really sorry; I have to visit my mom today.
2b. I really regret rushing off the house.
It appears that sentences 1a and 1b differ in degree of affect from sentences 2a and 2b because of the words following really: action verbs in 1a and 1b, and a "feeling" adjective and verb in 2a and 2b respectively. In Indonesian, the meaning of really in the two sets of sentences may differ in the intensity of feeling conveyed by the words following really.

CONCLUSION
This study explored a corpus of English textbooks for use in Indonesian junior high schools to provide information for dictionary entries of high-frequency words. The analysis indicates that the lemmatized word list covers about 58% of the high-frequency vocabulary in the New General Service List. The inclusion of the lemmas in this study in a dictionary for students at this level could be useful for classroom learning with the textbooks because the dictionary entries and sample sentences can be selected from the textbooks being used. However, this amount of word coverage may not be sufficient to equip learners with the necessary knowledge of high-frequency words at the junior high school level.
In addition to the need to provide the L1 meanings of individual words, the corpus analysis in this study has also pointed to the need to include specific words such as have and take that may carry different meanings when they collocate with other words. As described by Anderson and Corbett (2009), knowledge of collocation can help learners infer subtle differences of meaning and usage which may not be evident from the meanings of the individual words that make up the collocations.
Set expressions or pragmatic routines should also be useful as dictionary entries. By using the Sketchengine corpus tool, this study identified some routines in the corpus, such as Nice to meet you or Would you like to…, that students need to learn and use to develop their sociopragmatic competence. Many more examples of such routines can be explored in the corpus for dictionary entries. With easy access to online software programs for language learning, English educators will have the necessary tools to alleviate the teaching and learning burden and make language learning in class more enjoyable.
It is worth noting that this study did not cover all English textbooks recommended by the Ministry of Education.The corpus would have been larger if additional textbooks had been included in the samples.Although the textbooks were sampled from those currently being used in Indonesia, a similar study could be carried out with other textbooks used in other EFL settings to provide more information for dictionary entries of basic vocabulary.

Table 1. Textbooks downloaded and their respective vocabulary size