IMPLEMENTING ASSESSMENT FOR LEARNING IN THE CHINESE EFL CONTEXT: AN EXPLORATORY ACTION RESEARCH STUDY

This article reports on a two-year action research project in which Assessment for Learning (AfL) was implemented in a tertiary foreign language classroom in China. It seeks answers to three research questions. First, to what extent can AfL impact on learner autonomy? Second, to what extent can AfL effectively improve learners' proficiency in the target language? Third, what factors may influence the implementation of AfL? The qualitative data elicited from interviews with learners, triangulated with quantitative data from questionnaires, revealed that AfL is a rather effective way of promoting learner autonomy. In addition, quantitative data from pre- and post-tests lend support to the hypothesis that AfL has overall beneficial effects on learners' language proficiency, though the effects differ across sub-groups, i.e. female vs. male, and Shanghainese vs. non-Shanghainese. This study also pinpoints certain factors that may be conducive to, or may constrain, the implementation of AfL in the Chinese EFL context.


INTRODUCTION
Contextualized in education, assessment is "the measurement of the ability of a person or the quality or success of a teaching course, etc." (Richards et al., 1992, pp. 35-36). Despite the pivotal role of assessment, for most stakeholders in foreign language education the notion of assessment or testing may quite naturally evoke negative emotions owing to the anxiety and fear caused by standardized tests (see MacIntyre & Gardner, 1989; Young, 1991).
As summarized by Black et al. (2003), the problems of traditional assessment lie in three areas: effective learning, negative impact and the managerial role of assessment. Distinct from traditional assessment, Assessment for Learning (AfL) is defined as "the process of seeking and interpreting evidence for use by learners and their teachers, to identify where the learners are in their learning, where they need to go to and how best to go there" (Assessment Reform Group, 2002, pp. 2-3). Central to these defining characteristics of AfL is the notion that students are actively involved in gathering information and feedback that help them understand their learning processes, and are given a say in pedagogical decision-making (Berry, 2008).
The bifurcation between assessment for and of learning is largely influenced by distinct conceptions of the nature of learning. Formative assessment is underpinned by the neo-behaviorist model of mastery learning (Bloom, 1971; Hasting & Madaus, 1971), which stresses the learning process rather than requiring students to master specific learning objectives.
Theoretically viable as it may sound, we admittedly still know little about the longitudinal implementation of AfL in the Chinese EFL context. In mainland China, assessment reforms have always been a main focus of educational reforms at various stages of education, as shown by the issue and dissemination of English Curriculum for Basic Education (The Ministry of Education, 2012), College English Curriculum Requirements for Non-English Majors (The Higher Education Division of the Ministry of Education, 2007) and College English Curriculum Requirements for English Majors (College Foreign Language Teaching Steering Committee, 2000). However, a noticeable disparity is found between the guidelines issued by the educational authority and the ways they are implemented at the school level (Berry, 2011, p. 54).
A review of the extant literature reveals a dearth of empirical studies on the effectiveness of implementing AfL in Chinese tertiary EFL classrooms. In addition, it remains an open question what changes occur in learners' motivational, affective and strategic factors over the process and what factors contribute to such changes, if any. It is this research gap that motivates the present study.

Research questions
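Against the aforementioned research background, this study seeks answers to the following three research questions. First, to what extent can AfL impact on learner autonomy? Second, to what extent can AfL effectively improve learners' proficiency in the target language? Third, what factors may influence the implementation of AfL in the Chinese EFL context?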

Research design
This study employed the action research method and was designed to follow the cyclic process of action research (Kemmis and McTaggart, 2005; Nunan, 1992) as illustrated in Figure 1. It started with a preliminary investigation intended to identify the problems under investigation, on the basis of which the research questions were formulated. The researchers then worked out Action Agenda I and implemented it in teaching practice over two semesters. Towards the end of this phase came the evaluation, and the ensuing reflections led to the formulation of Action Agenda II. Thereafter, an identical procedure was followed for another two semesters.

Research setting and participants
Isabella (pseudonym, one of the researchers), an EFL teacher based in Shanghai, has been teaching tertiary level students for nine years. On commencing her teaching at a new institution in the spring of 2010, she was confronted with a situation that was hard to manage. Among the 44 students in the class (7 males and 37 females, aged between 17 and 20), 25 originated from cities outside Shanghai, where students' English proficiency is normally lower than that of Shanghainese students. At the outset of the first academic year, a college-wide placement test was administered to all the freshmen. After analyzing her students' test scores with an independent samples t-test, she found a statistically significant difference in total scores between the two groups of students, t(41.94) = 4.55, p < .001. With regard to the 6 subsections of the test, i.e., listening, vocabulary and grammar, cloze, reading, translation and writing, independent samples t-tests also found statistically significant differences in listening, t(38.84) = 5.12, p < .001; reading, t(41.33) = 2.15, p < .05; and translation, t(42.00) = 4.11, p < .001. There were therefore good reasons to consider that the students from outside Shanghai were less proficient in English than their counterparts from Shanghai.
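The fractional degrees of freedom reported above (e.g. 41.94) are what one would expect if the t-tests were run without assuming equal variances, although the article does not say so explicitly. The following is a minimal sketch of such a test (Welch's t-test) under that assumption. The function name `welch_t_test` and the score lists are hypothetical illustrations and are not the study's data.

```python
import math
from statistics import mean, variance


def welch_t_test(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical placement-test totals, NOT the study's data.
shanghai = [78, 82, 75, 88, 91, 79, 84, 80]
non_shanghai = [65, 70, 58, 74, 69, 61, 72, 66, 63, 68]

t_stat, dof = welch_t_test(shanghai, non_shanghai)
print(f"t({dof:.2f}) = {t_stat:.2f}")
```

The p-value would then be read from a t distribution with `dof` degrees of freedom, which a statistics package such as SPSS reports directly.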
Moreover, a few initial contacts with the students left Isabella with the impression that they might lack clear and appropriate learning goals, and that their beliefs, confidence, motivation and learning strategies were unlikely to be at an advantageous level. Most of them, if not all, were teacher-dependent and relied on classroom-based language learning. Realizing these problems, she found that some changes were necessary before she could implement a teaching syllabus that values learner empowerment and autonomy. She therefore decided to conduct an action research project.
As there was a rapport between Isabella and her students, they responded enthusiastically to this longitudinal action research project. Out of concern for research ethics, all the interviews in this study were conducted with the participants' consent. The question of "how informed is the informed consent" is intrinsic to classroom-based inquiries in general. Nevertheless, the authors agree with Smith (1990) and Zeni (1998) that field research differs from experimental studies, and thus the relationships between the researcher (teacher) and the participants are rather 'covenants' of mutual trust (Smith, 1990), and the research is an inherent part of how students were taught. In this study, the students were therefore knowingly involved and actively participated in the two-year research.

Preliminary investigation
In order to transform the existing problems into operational research variables, Questionnaire A was constructed to measure learner autonomy. The questionnaire items were selected from an item pool formulated through a focus group brainstorming session attended by 3 teachers and 5 researchers. The items were also informed by oft-cited studies (Horwitz, 1986; Gardner, 1985), and studies on Chinese EFL students (Wen and Johnson, 1997; Gao et al., 2007) were paid close attention to. The questionnaire used a five-point Likert scale of agreement, was written in Chinese, and was administered as the orientation week came to an end. Before it was administered, it was first piloted on a group of 10 students for content validation and then piloted on two classes of Year 1 students who studied the same major as the target participants, yielding a Cronbach's alpha coefficient of 0.79. Following factor analysis, the content areas of the questionnaire comprise learner motivation (5 items), general beliefs about English learning (3 items), and learning strategies (4 items). What is presented here is the English version, co-translated by both authors of this article. The collected data were analyzed with SPSS (V.17.0). In this study, subsequent questionnaires (Questionnaire B and Questionnaire C) were designed and administered following the same procedure. Table 1 shows the means in Questionnaire A.
Table 1 Questionnaire A
No.  Item  Mean
1  I study English in order to pass exams such as CET 4, CET 6, etc.  3.72
2  I study English in order to find a good job.  3.65
3  I study English because I love the language and the culture of English-speaking countries.  2.47
4  I am interested in English learning.  3.16
5  I am confident in English learning.  2.85
6  I believe learners' learning is an essential factor for the success of language learning.  3.13
7  I believe the teachers' teaching is an essential factor for the success of language learning.  3.69
8  I think high exam scores mean successful language learning.  4.37
9  I have an English learning plan for this semester.  3.08
10  I have learned some effective English learning skills and can use them well.  2.84
11  I often spend some time (at least 1-2 hours on average per day) learning English outside my English classes.  2.53
12  I often turn to peers or teachers when confronted with difficulties in my study.  2.46
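The reliability coefficient reported for the pilot (Cronbach's alpha = 0.79) is computed from the item variances and the variance of respondents' total scores. The sketch below shows the standard formula. The function name `cronbach_alpha` and the response matrix are hypothetical and do not reproduce the study's pilot data, which were analyzed in SPSS.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents x columns of items."""
    k = len(responses[0])  # number of items
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's summed score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical five-point Likert responses (5 respondents x 4 items).
pilot = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
]
print(round(cronbach_alpha(pilot), 2))
```

Values closer to 1 indicate that the items vary together; a coefficient around 0.7 or above is conventionally treated as acceptable for questionnaires of this kind.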
A structured interview (Interview A) informed by the data elicited with Questionnaire A was subsequently conducted with six students, who were randomly selected by their student IDs. The interviews took place either in a classroom or at Isabella's office, and were embedded in casual chats to mitigate the observer's paradox (Labov, 1972). The interview sessions lasted around 30 to 40 minutes for each student. In addition, considering that English might impede the students from expressing themselves freely, the interviews were conducted in their native language, i.e. standard Mandarin, and were audio-recorded with the participants' consent (the same procedure applies to Interview B and Interview C in this study). The interview data were then transcribed verbatim by one researcher and checked by the other, and were sent back to the participants for member checking (Brown and Rogers, 2003). Revisions were made following their responses, before the two authors co-translated them into English. The two authors analyzed the interview data inductively, following a procedure of reducing original data, free coding, and pattern coding (Ellis & Barkhuizen, 2005). The interview outline is as follows: (

FINDINGS

Learner motivation
In Questionnaire A, Items 1-5 are about learners' motivation, including instrumental motivation and integrative motivation (see Gardner & Lambert, 1972; Kumaravadivelu, 2006). Compared with Item 3 (mean = 2.47), which was intended for integrative motivation, the means of Items 1 (3.72) and 2 (3.65) are relatively higher, revealing that the students had higher instrumental motivation than integrative motivation. Furthermore, Items 1-2 suggest that students were likely to pursue realistic and exam-oriented goals for English learning. This is reinforced by the interview data, as 4 out of 6 students claimed they were eager to pass CET-4, which would entitle them to waive all the English courses, as stipulated in the institutional academic guidelines.
As claimed by Alderson and Wall (1993), tests are powerful determiners of what happens in classrooms. The investigation here reveals that the students lacked enthusiasm and far-sighted motivation in language learning, and there emerged possible signs of negative washback effects (Messick, 1996; Taylor, 2005) of large-scale standardized exams on learners' motivation. Items 4 and 5 show a somewhat satisfactory level of interest in English learning among the group of students, but they did not display much confidence. During the interview, when talking about their past learning experience, three students recalled the discouraging and frustrating experience of being severely criticized by their instructors in high school.

Learner beliefs
As far as learners' beliefs are concerned, the learners had a moderately high sense of responsibility in learning, as shown by the mean of Item 6 (3.13). Meanwhile, learners tended to depend more on teachers than on themselves, as revealed by Item 7 (mean = 3.69). The result for Item 8 (mean = 4.37) further supports the findings from Item 1: English learning, as the students perceived it, was largely coloured by exams. It was therefore deemed necessary to take measures to rekindle learners' interest, rebuild their confidence, and transform their teacher-reliant and exam-driven beliefs.

Learning strategies
Studies in the field of autonomous language learning have provided ample evidence to support the assumption that learner autonomy is positively correlated with learners' skill in using learning strategies (e.g. Oxford & Nyikos, 1989). Items 9-12 were intended for learners' metacognitive strategies, cognitive strategies and social-cultural strategies respectively, as classified by O'Malley and Chamot (1990). As can be seen in Table 1, among the three types of strategies, the best reported performance was in metacognitive strategies, or more specifically, planning strategies. Although the questionnaire data suggest that students commonly claimed to have a study plan to follow (mean = 3.08), when asked to elaborate at the interview, five out of six students could not present any practically feasible learning plan. The same situation arose when cognitive strategies were tackled in the questionnaire (Item 10) and at the interview (Interview Question 4). The result for Item 11 (mean = 2.53) indicates that the average amount of time the students spent learning English outside the classroom did not exceed 1-2 hours per day. They were also only moderately likely to consult peers or teachers when confronted with difficulties in their English learning.

Identified problems and hypotheses
The above analysis in the preliminary investigation showed room for improvement in learners' motivation, beliefs and learning strategies, all of which were turned into operationalized hypotheses. As traditional language classrooms are generally teacher-centered with PPP (Presentation, Practice, Production) as the dominant pedagogy, which leaves limited space for interaction and the development of learner autonomy (Davies & Pearse, 2000; Harmer, 2007), the first hypothesis of this action research is that AfL could be an effective way to promote learner autonomy. The second is that AfL is effective in improving learners' proficiency in the target language. Meanwhile, this action research also seeks indications of the practical constraints on, and the factors contributing to, implementing AfL in Chinese tertiary FLT classrooms.

Actions
Based on the identified problems and hypotheses above, two cycles of planned actions were designed and further implemented in two academic years (from September 2010 to June 2012).

Reactive autonomy promotion: An action agenda (I)
The action agenda in the first academic year (September 2010 through June 2011) is illustrated in Appendix 1. It was designed and implemented for transitional and developmental purposes. By "transitional", it is meant that this session was supposed to bridge the gap between pre-tertiary and tertiary learning. The quizzes and post-quiz conferences closely resemble what the students had experienced in high school. The developmental purpose of this session was to promote learners' reactive autonomy, defined by Littlewood (1999) as the kind of autonomy which "does not create its own direction but, once a direction has been initiated, enables learners to organize their resources autonomously in order to reach their goal" (p. 75). In this session, the teacher had more control over designing, assigning and assessing the tasks, as demonstrated in the specifications of the individual oral presentation (news report and free topic speech).
At the end of the first academic year, all the students in the class sat for CET-4, and 79.55% of them passed and obtained the certificates. This was a marvelous achievement for most of them because their performance was far beyond the average level. According to a school-based regulation issued by the Office of Academic Affairs, students who have passed CET-4 by the end of the second semester may be exempted from attending English classes. Somewhat surprisingly, almost all the students decided to take the English course even though a large number of them could have been exempted. At the beginning of the second academic year (September 2011), an informal group interview (Interview B) was conducted with the same 6 students as in the preliminary investigation to probe into the reasons behind their decision. Interview B was conducted in line with the ensuing questions.
Most of the students said they were greatly encouraged by the exciting success in the exam, thus brimming with enthusiasm and ambition to continue with English learning. The findings from Interview B revealed that it was time to promote proactive autonomy for the students.

Proactive autonomy promotion: An action agenda (II)
After evaluating the feedback on and effects of Action Agenda I, Isabella designed and implemented the second cycle of the action plan, in which the students were given more freedom to negotiate the rubrics, and more cooperative learning was required for task completion. Two of the three assessment tasks at this stage involved peer interaction and cooperation, and the mini-TED talk was assumed to be cognitively more challenging than the free topic speech in the first year (Richards, 2015). The action plan for the second academic year is illustrated in Appendix 2.

Evaluation of the actions
The effects of the actions in this study were evaluated by questionnaires, interviews and language proficiency tests.

Evaluation with Questionnaire B
The output from another questionnaire (Questionnaire B) tapping into learner motivation, beliefs, and learning strategies is illustrated in Table 2. As one male student dropped out for military service, the total number of respondents to Questionnaire B declined to 43.
The results for Questionnaire B illustrate the changes in students' non-cognitive factors after the two action research agendas were implemented. Item 1 is a general question about learners' self-evaluation of the attainment of study goals. The mean value (3.95) shows that participants were highly likely to consider their learning objectives attained after two years' practice of ) and Item 3 (3.49-2.85), relating to learners' interest and confidence, were also improved. Items 4-7 address learner beliefs.
The results for Item 4 (4.15-3.13) and Item 5 (2.84-3.69), where each pair gives the Questionnaire B mean followed by the mean of the corresponding Questionnaire A item, indicate a shift from a teacher-dependent to a self-reliant conception, and the data for Item 6 (2.62-3.72) and Item 7 (2.21-4.37) show students' departure from traditional exam-driven beliefs. Moreover, Item 8 (3.87-2.84) and Item 10 (3.67-2.46) reveal learners' development in employing learning strategies, cognitive and social-cultural strategies in particular. The result for Item 9 (3.79-2.53) shows that students were prone to invest more time in English learning.
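To make the direction and size of each reported change easier to see, the pairs quoted in this section can be turned into simple differences. The sketch below assumes that the first figure in each pair is the Questionnaire B mean and the second is the mean of the corresponding Questionnaire A item, which is how the figures line up with Table 1. The variable names are illustrative only.

```python
# (Questionnaire B mean, Questionnaire A mean) as quoted in the text.
reported = {
    "Item 3": (3.49, 2.85),
    "Item 4": (4.15, 3.13),
    "Item 5": (2.84, 3.69),
    "Item 6": (2.62, 3.72),
    "Item 7": (2.21, 4.37),
    "Item 8": (3.87, 2.84),
    "Item 9": (3.79, 2.53),
    "Item 10": (3.67, 2.46),
}

# Positive values mean stronger agreement after the intervention.
changes = {item: round(b - a, 2) for item, (b, a) in reported.items()}
for item, delta in changes.items():
    print(f"{item}: {delta:+.2f}")
```

Under this reading, Items 5-7 show decreases, which is the desired direction for statements expressing teacher dependence and exam-driven beliefs.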

Evaluation with interviews
Nine students, three each at high, middle and low levels of language proficiency, were interviewed to obtain more in-depth information on learners' feedback on AfL. The students were classified into the three proficiency levels based on the ranking of their scores in CET 4 and CET 6. The structured Interview C outlined below was conducted.
(1) What activity impressed you most in your college English classes?
(2) Why is it so impressive? What have you learned from it?
(3) Do you have any future plan of English learning? If yes, what is it?
The responses to the first two questions related to learners' cognitive and socio-affective processes in preparing and completing the AfL tasks. The third question was intended to probe possible changes in learners' motivation and meta-cognitive strategies. The interview data show that learners worked cognitively by searching for information, reading intensively and selectively, writing and revising essays, designing and making PPT slides, etc. Meanwhile, learners' socio-affective factors were activated by their interaction with peers and advice from the teacher. This is evident in Interview C, where key words about the interactive learning process, such as "collaboratively searched for information", "worked with group members" and "collected learning materials for group members", as well as words concerning learners' affective factors, including "sense of fulfillment", "confident", "the beauty of reading" and "courageous", were salient.
At the inception of this action research, it was identified that learners' motivation was mainly instrumental, which, as previous research found (see Dörnyei, 1990), would promote learning only with external incentives. Learners' responses to the third question show that most learners gradually developed integrative motivation over this period of learning. Five out of nine interviewees stated that passing exams was just one of their future learning objectives. Eight interviewees also contended that the exam certificate was only a byproduct of their language learning. As far as meta-cognitive strategies are concerned, a change was also found in comparison with the previous interview: five students could articulate more sophisticated learning plans with concrete and feasible procedures.

Evaluation with pre-test and post-test for triangulation
To lend more support to the second hypothesis, this study explored the potential difference between the scores of the pre-test (T1) and those of the post-test (T2) to illustrate learners' change in language proficiency. T1 is a college-wide placement test, while T2 is an achievement test taken at the end of the second academic year. Both tests were constructed and marked cooperatively by 4 teachers to ensure reliability. The results are displayed in Table 3. Class 1 (C1), the participant group, is the class that Isabella taught and with which she conducted the action research study. Class 2 (C2), also Isabella's class, had a similar demographic distribution and obtained similar results in the pre-test. It can be seen in Table 3 that there was no significant difference between the two classes (p = .906 > .05) before the intervention, whereas C1 performed significantly better (p = .021 < .05) than C2 in T2 towards the end of the project.
To illustrate how the intervention impacted different groups of students, effect sizes were also calculated using Cohen's d (see Cohen, 1992; Hattie, 2012). The results show a moderate positive impact of AfL on learners' language proficiency (d1 = 0.25, d2 = 0.50). With respect to sub-groups of participants, the effect sizes were much larger for female students (d1 = 0.28, d2 = 0.57) than for male students (d1 = 0.01, d2 = -0.08), which indicates that in the present study AfL is likely to have been more effective in improving female students' language proficiency and had little observable influence on male students. Moreover, the effect sizes are markedly larger for non-Shanghainese (d1 = 0.76, d2 = 0.60) than for Shanghainese students (d1 = -0.50, d2 = 0.33), suggesting that AfL is more effective in mediating foreign language learning for those who were initially less proficient.
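For readers less familiar with effect sizes, Cohen's d expresses a difference between two means in units of their pooled standard deviation. The sketch below shows one common way to compute it for two independent groups. The surviving text does not specify which comparisons d1 and d2 refer to, so the function name `cohens_d` and the score lists are hypothetical illustrations rather than a reproduction of the study's analysis.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation of two groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical post-test scores, NOT the study's data.
class_1 = [72, 80, 68, 85, 77, 74, 81, 79]
class_2 = [70, 75, 66, 78, 73, 69, 76, 71]
print(f"d = {cohens_d(class_1, class_2):.2f}")
```

By Cohen's rough benchmarks, values near 0.2, 0.5 and 0.8 are usually described as small, medium and large effects respectively.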

Reflections
The data from a third questionnaire (Questionnaire C) were concerned with learners' feedback on AfL, which would help the teacher to reflect upon the present action research. Given the descriptive statistics in Table 4, the teacher's reflections focused on three aspects.
Second, the learners maintained that they benefited more from the teacher's feedback than from their peers' feedback. Their agreement with Item 10 (3.65) is lower than that with Item 11 (3.98), though both are above 3.0 (agree slightly). The implications might be twofold: (1) AfL takes effect as students are engaged in a collaborative learning process where they "learn from and support each other" (Carless, 2011, p. 165); (2) in addition to the findings of previous studies (e.g. Connor & Asenavage, 1994; Nelson & Carson, 1998), learners still perceived peer feedback to be of limited benefit even after two years of in-depth and sustained implementation of AfL.
Third, learners reportedly showed more interest in cooperative learning than in individual learning. Item 13 (3.40) and Item 14 (3.86) indicate that students preferred cooperative assessment tasks to tasks involving individual work. Item 15 (3.98) and Item 16 (3.88) reveal that learners perceived that they benefited both from the planning process of a task and from peers' task performance. A possible reason is that there was a sense of shared responsibility among the learners, so that they worked with more mutual sympathy, commitment and inclination towards task completion (Jacobs et al., 2006).

CONCLUSIONS
This study found that in a Chinese FLT classroom, AfL is effective in promoting learner autonomy and improving learners' proficiency in the target language. As with most action research studies, this study is vulnerable to criticism of its external reliability, since it may be argued that it was the teacher's enthusiasm that took effect. Plausible as this may sound, action research and AfL are mutually compatible, for in doing both a language teacher integrates problem solving with academic research, with the common goal of bringing about change and enhancing learning.
However, findings from this study have implications for implementing AfL not only in the Chinese EFL context but also in other EFL scenarios. First, as expounded earlier, the differences across learners are obvious from the outset of certain phases of academic study. Language instructors are therefore urged to lay more emphasis on negotiating with learners how assessment can enhance individual learners' motivation. On top of that, teachers are advised to draw learners' attention to the appropriate use of learning contracts, by which both learners and teachers should abide. Second, it is highly desirable for language instructors to be aware of the importance of providing timely feedback in AfL, so that learners can be informed immediately of the most suitable ways to improve; delayed feedback may otherwise impede learners' improvement. Last, to ensure the successful and sustainable implementation of AfL in a Chinese EFL setting, teacher learning and development are essential before and while conducting this assessment innovation in classroom teaching. In many EFL contexts where formative assessment seems to have proliferated, effort is still needed for teachers to make the shift from Assessment of Learning (AoL) to AfL. To do so, teachers should be equipped with adequate knowledge and professional competence, as part of their assessment literacy, to conduct AfL.

Table 2
Questionnaire B

Table 3
Comparing pre- and post-test and two classes in post-test

Table 4
Learners' feedback on AfL