Development of the STEM-Pedagogical Content Knowledge Scale for Pre-service Teachers: Validity and Reliability Study

This study aims to develop a valid and reliable scale to determine pre-service teachers' STEM-Pedagogical Content Knowledge (STEM-PCK) levels. The study was conducted in the 2018-2019 academic year with 322 pre-service teachers in Turkey. Exploratory Sequential Design, one of the mixed-method typologies, was applied. The scale was submitted to four experts for evaluation to determine its content and face validity. Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) were performed to determine the scale's construct validity. As a result of EFA, the scale had five factors: STEM Pedagogical Knowledge, Pedagogical Knowledge, Engineering Pedagogical Knowledge, Mathematics Pedagogical Knowledge, and Science Pedagogical Knowledge. As a result of CFA, the values χ2/df = 2.71, RMSEA = .07, RMR = .04, SRMR = .07, NFI = .94, NNFI = .96, CFI = .96, IFI = .96, and RFI = .94 were obtained, and the factor structure was determined to be suitable. To determine the reliability of the scale, internal consistency and test-retest reliability analyses were conducted. The internal consistency reliability value of the scale was found to be .98. The final form of the STEM-PCK scale is a 5-point Likert-type scale with 57 items and five factors.


INTRODUCTION
Science, Technology, Engineering, and Mathematics (STEM) has been a growing area since President Barack Obama brought it into the United States national dialogue in 2011 (Epstein & Miller, 2011). STEM research has focused mainly on preparing students for STEM-related careers and improving STEM literacy for all students. Studies on STEM have shown that teachers' effectiveness has a substantial effect on students' achievement in STEM subjects (Knezek, Christensen & Tyler-Wood, 2015; York, 2018). In addition, Wahono, Lin, & Chang (2020) explained that STEM enactment positively affects Asian students' learning outcomes in academic learning achievement, higher-order thinking skills, and motivation.
Science teachers should know how to combine many different types of knowledge from the science disciplines for effective science instruction in STEM areas (Schmidt & Fulton, 2015). Each science discipline requires important scientific content knowledge and subject-specific pedagogy (York, 2018). Teachers' knowledge of a particular subject matter, which includes Pedagogical Content Knowledge (PCK) and Content Knowledge (CK), is therefore a vital component of teacher education programs (Kleickmann et al., 2013). STEM-related research showed that pre-service teachers are not experts in STEM subjects because of inadequate content knowledge and low self-confidence to teach STEM (Epstein & Miller, 2011; York, 2018). Therefore, pre-service teachers need a more robust STEM education (Srisawasdi, 2012).
Most teacher education programs include content courses in chemistry, biology, physics, astronomy, technology, and earth sciences, as well as science teaching methods courses. However, York (2018) argued that theoretical knowledge of different disciplines is not enough to understand how to integrate and teach them in actual classroom instruction. Therefore, teacher education programs must prepare pre-service teachers to teach specific content knowledge at the level of their students in terms of pedagogy and inquiry. Teacher education programs should also support teachers' PCK for STEM (Srikoom, Faikhamta & Hanuscin, 2018). Srikoom, Faikhamta & Hanuscin (2018) explained that the components of PCK for STEM include how teachers comprehend the place of STEM in the science education curriculum and educational materials and how teachers conceptualize STEM education. Integrating STEM into science education and teaching practice is a complex, multi-dimensional process in which dimensions such as Scientific Knowledge, Mathematical Knowledge, Technological Knowledge, Engineering Knowledge, Content Knowledge, and Pedagogical Knowledge become important (Sarkim, 2020).
Pedagogical Content Knowledge (PCK) was first proposed by Shulman in 1986. Shulman explained three categories of content knowledge: subject-matter (content) knowledge, subject-matter pedagogical knowledge, and curricular knowledge (Kind, 2009). Magnusson, Krajcik, & Borko (1999) explained that PCK requires "…teacher's understanding of how to help students understand the specific subject matter. It includes knowledge of how particular subject matter topics, problems, and issues can be organized, represented, and adapted to the diverse interests and abilities of learners, and then presented for instruction." (p. 96). PCK is unique to each subject, context, and teacher because teachers have different knowledge about teaching and about how to teach the subject matter to their students (Park, Suh & Seo, 2018).
Studies showed that qualified science instruction significantly affects students' attitudes and career choices toward science-related fields (Knezek, Christensen & Tyler-Wood, 2015). Developing teachers' STEM pedagogical knowledge and proficiency with STEM-specific pedagogical strategies would increase their self-confidence to teach STEM and, in turn, would positively affect their students' career choices in STEM-related fields (York, 2018).
In recent years, studies on STEM education (Corlu, Capraro, & Corlu, 2015; Derin, Aydın & Kirkiç, 2017) have focused mainly on teachers and pre-service teachers. El-Deghaidy & Mansour (2015) argued that to promote effective STEM education, we need to identify the content knowledge and pedagogical content knowledge that teachers need to enact STEM education in class. Therefore, this study aims to develop a valid and reliable scale that can determine the Pedagogical Content Knowledge of pre-service teachers about what, when, why, and how they teach STEM subjects, and thereby to contribute to the field.

Research Model
The study aimed to develop a valid and reliable scale to determine teacher candidates' Science, Technology, Engineering, and Mathematics-Pedagogical Content Knowledge (STEM-PCK) levels. To achieve this aim, the study used Exploratory Sequential Design, one of the mixed-method research typologies, in which qualitative and quantitative research methods are used together (Tashakkori & Creswell, 2007). In the Exploratory Sequential Design, after the qualitative data are collected and analyzed, the findings are tested with quantitative methods (Creswell & Clark, 2011). The scheme of the research design is shown in Figure 1.

Scale Development Process
In this study, the initial scale development steps explained by Seçer (2015) were used. These stages are: (1) Determining the need; (2) Literature search and creating an item pool; (3) Taking expert opinion; (4) First form of the scale; (5) Applying the pilot study for item selection; (6) Determining the sample group; (7) Performing statistical analysis for the application of the rearranged scale and item selection; (8) Creating the final form of the scale.

Determining the need
When the literature is examined, it is seen that scale development studies in the fields of PCK and TPCK (Technological Pedagogical Content Knowledge) for teachers and pre-service teachers have been conducted frequently in recent years (Graham et al., 2009; Schmidt et al., 2009; Landry, 2010; Marks, 1990; Sun & Strobel, 2014). However, only a limited number of studies were found that measure the Pedagogical Content Knowledge of teachers and pre-service teachers towards STEM (Rigelman, 2014; Yildirim & Sahin-Topalcengiz, 2019).

Literature research and creating an item pool
While creating the scale items to determine pre-service teachers' STEM-PCK levels, the theoretical structure was first examined through a wide-ranging literature review on PCK and TPCK. Following the literature review, an item pool was prepared. To ensure content and face validity, the item pool was submitted for evaluation to three field experts working in science education departments at three different universities in Turkey, and to a Turkish language expert who teaches Turkish in a public secondary school in Turkey, who examined it for compliance with spelling rules (Table 1).
The experts were asked to indicate in Table 2 their answers regarding the eligibility of the items for the scale. They were asked to mark the "Appropriate" section of the table if an item was appropriate and the "Not Appropriate" section if it was not, and to write their explanations in the "Explanation" section. Finally, the experts were informed that they could add newly suggested items to the table.

First form of the scale
In line with the experts' opinions, two items (the 6th and 53rd items) were excluded from the item pool because they were stated not to address the feature intended to be measured. The items excluded from the scale were Item 6: "I have sufficient opportunities to work on each of the STEM fields." and Item 53: "I can apply/teach a lesson plan that includes (combines) Science, Technology, Engineering and Mathematics subjects." The expert feedback also indicated that some items were not clearly expressed or understood, that some items could not adequately measure the behavior they were intended to measure, and that some items did not focus on that behavior. Accordingly, items 2, 4, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 44, 54, 58, 75, and 79 were rearranged. Examples of rearranged items: Item 2, written as "I know the importance of STEM in science education", was rearranged as "I know the importance of STEM in education." Item 58, written as "While evaluating the STEM teaching process, I can develop measurement/evaluation tools appropriate to the subject.", was rearranged as "I can prepare the appropriate STEM measurement tools for the evaluation process." The draft scale form with 85 items was made ready for the pilot application after the Turkish language expert checked the scale in terms of spelling and language.

Applying the pilot study for item selection
A pilot study was carried out before the general application to determine whether there were any unnoticed problems of expression or format in the form and items of the draft scale. The researchers carried out the pilot application with a group of 15 senior pre-service science teachers. According to the application results, the necessary corrections were made to the items, and it was determined that 20 minutes are required to complete the scale.

Determining the sample group
The study population consists of pre-service teachers in the 1st, 2nd, 3rd, and 4th grades who were studying in Turkey during the 2018-2019 academic year. The sample of the study consisted of 322 teacher candidates at different grade levels (1st, 2nd, 3rd, and 4th grades) from six (6) teaching departments in seven (7) different universities in Turkey, selected using a purposeful sampling method.
The draft form created after the pilot study was prepared as an online scale. The link to the STEM-PCK draft scale form was sent by e-mail to the pre-service teachers in the sample group, who could then access and fill out the online form on demand. In total, 322 teacher candidates completed the scale. Of these teacher candidates, 274 (85.1%) are female and 48 (14.9%) are male. Information about the universities and departments of the teacher candidates who constitute the study group is given in Table 3.

Content validity
Within the scope of content validity, three field experts were consulted for the quantitative and qualitative evaluation of the items in the scale. Following the field experts' assessment, the items they found appropriate, found unsuitable, or suggested for revision were examined. Intercoder reliability among the experts was calculated with the formula of Miles & Huberman (1994), Reliability = number of agreements / (number of agreements + disagreements) x 100, and this ratio was determined as .87 (87%).
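The agreement formula can be illustrated with a minimal Python sketch. The function name and the counts below are hypothetical and chosen only so that the ratio matches the reported .87; the study does not report the underlying numbers of agreements and disagreements.

```python
def intercoder_reliability(agreements: int, disagreements: int) -> float:
    """Miles & Huberman (1994) agreement ratio, returned as a proportion in [0, 1]."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one coded decision is required")
    return agreements / total


# Hypothetical counts (not taken from the study): 87 agreements, 13 disagreements.
ratio = intercoder_reliability(87, 13)
print(f"{ratio:.2f} ({ratio * 100:.0f}%)")
```

Multiplying the proportion by 100 gives the percentage form of the same ratio.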

Face validity
For the face validity of the scale, the scale was evaluated by three experts, and the relevant items were removed from the scale according to their feedback. After the necessary corrections, an explanation section was prepared on the front page of the scale stating the purpose for which the scale will be used, what it aims to measure, and how many items it consists of. Later, the scale was applied to pre-service teachers within the scope of a pilot study in order to identify any incomprehensible items in the scale.

Construct validity (Factor analysis)
Factor analysis was carried out to reveal the implicit structure of the measuring tool and to determine the subdimensions of the scale by bringing together the related items. In scale development studies, Exploratory Factor Analysis (EFA) is performed to reveal the implicit structure of the scale, while Confirmatory Factor Analysis (CFA) is performed to verify this implicit structure.
Exploratory factor analysis (EFA). Exploratory Factor Analysis (EFA) was carried out to determine the sub-dimensions formed by the related items in the scale and the relationship between them. In scale development, it is first determined whether the sample size to which the scale is applied is sufficient for factor analysis and whether multivariate normality is achieved (Buyukozturk, 2007). The Kaiser-Meyer-Olkin (KMO) test result was calculated as .94 (Table 4). According to Kaiser (1974), values between .00 and .49 are unacceptable, .50 and .59 are miserable, .60 and .69 are mediocre, .70 and .79 are middling, .80 and .89 are meritorious, and .90 and 1.00 are marvelous. The KMO test result therefore shows that the sample size is suitable for factor analysis of the available data, because the adequacy of the sample size increases as the value approaches 1.00 (Seçer, 2015). Also, the significance of Bartlett's test of sphericity indicates that the data set provides multivariate normality (χ2 = 14118.21, p < .05) (Seker & Gencdogan, 2014).
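Kaiser's (1974) verbal bands for the KMO value can be written as a small lookup. The sketch below is only an illustration of the bands quoted above; the function name is hypothetical, and values falling between two quoted bands (for example .495) are assigned by their lower bound, which is an assumption.

```python
def kaiser_kmo_label(kmo: float) -> str:
    """Return Kaiser's (1974) verbal label for a KMO value between 0 and 1."""
    if not 0.0 <= kmo <= 1.0:
        raise ValueError("KMO must lie between 0 and 1")
    # Lower bound of each band, checked from the highest band downward.
    bands = [
        (0.90, "marvelous"),
        (0.80, "meritorious"),
        (0.70, "middling"),
        (0.60, "mediocre"),
        (0.50, "miserable"),
    ]
    for lower_bound, label in bands:
        if kmo >= lower_bound:
            return label
    return "unacceptable"


print(kaiser_kmo_label(0.94))  # the KMO value reported in Table 4
```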
The first form of the STEM-PCK scale, created after the pilot study, contains 85 items. As a result of the factor analysis, five factors were found with an eigenvalue greater than one, together explaining 60.38% of the total variance. The first factor explains 22.52%, the second factor 12.44%, the third factor 9.39%, the fourth factor 8.60%, and the fifth factor 7.43% (Table 5). When the scree plot (Figure 2) was examined, five breaks with an eigenvalue greater than 1 were observed. According to these results, the scale was evaluated as having five factors.
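The arithmetic behind the total explained variance and the eigenvalue-greater-than-one rule can be sketched as follows. The per-factor percentages are the ones reported above; the eigenvalue list is hypothetical and serves only to illustrate the retention rule, since the eigenvalues themselves are not given in the text.

```python
from itertools import accumulate


def retained_factors(eigenvalues, threshold=1.0):
    """Count the factors whose eigenvalue exceeds the threshold."""
    return sum(1 for value in eigenvalues if value > threshold)


# Percentages of variance explained by the five factors, as reported above.
explained = [22.52, 12.44, 9.39, 8.60, 7.43]
cumulative = [round(total, 2) for total in accumulate(explained)]
print(cumulative[-1])  # total variance explained by the five factors

# Hypothetical eigenvalues, for illustration only.
example_eigenvalues = [19.1, 10.6, 8.0, 7.3, 6.3, 0.9, 0.8]
print(retained_factors(example_eigenvalues))
```

The cumulative total reproduces the 60.38% reported for the five-factor solution.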
At the end of the analysis, the resulting component matrix was examined using a minimum factor loading of .32 for each item. Since the sub-dimensions of the measurement tool were thought to be unrelated to each other, Varimax rotation, one of the orthogonal rotation techniques, was performed. Items with a difference of less than .10 between their factor loadings were evaluated as overlapping items, and as a result of the analysis, items 1, 2, 3, 4, 5, 6, 7, 9, 12, … were excluded from the scale. The rotated component matrix is given in Table 6. When the data obtained as a result of EFA are evaluated, the STEM-PCK scale, the first version of which had 85 items, revealed an implicit structure consisting of 57 items and five sub-dimensions (Table 7).
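The two item-screening rules described above (a minimum loading of .32 and a gap of at least .10 between loadings) can be sketched in Python. This is a minimal illustration under stated assumptions: the gap is taken between an item's two highest absolute loadings, the function name is hypothetical, and the loadings are invented examples rather than values from Table 6.

```python
def screen_item(loadings, min_loading=0.32, min_gap=0.10):
    """Classify one item from its loadings on all factors.

    Assumption: "overlapping" means the two highest absolute loadings
    differ by less than min_gap.
    """
    ranked = sorted((abs(value) for value in loadings), reverse=True)
    if ranked[0] < min_loading:
        return "low loading"
    if len(ranked) > 1 and ranked[0] - ranked[1] < min_gap:
        return "overlapping"
    return "keep"


# Invented loadings on five factors, for illustration only.
print(screen_item([0.71, 0.12, 0.05, 0.08, 0.10]))
print(screen_item([0.48, 0.43, 0.10, 0.02, 0.05]))
print(screen_item([0.25, 0.10, 0.05, 0.02, 0.01]))
```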
Confirmatory Factor Analysis (CFA). Confirmatory Factor Analysis (CFA) is performed to test and verify the sub-dimensions determined by EFA (Seçer, 2015). The χ2/df value of the model obtained from the CFA is less than 3, so it provides an acceptable fit. In addition, the NNFI (Non-Normed Fit Index), IFI (Incremental Fit Index), RMSEA (Root Mean Square Error of Approximation), and SRMR (Standardized Root Mean Square Residual) values meet the criteria of good fit, while the NFI (Normed Fit Index), RFI (Relative Fit Index), CFI (Comparative Fit Index), and RMR (Root Mean Square Residual) values were evaluated as being within the acceptable fit range. These values show that the model tested for the five-factor structure of the STEM-PCK scale fits well (Table 8). The data in Table 8 were interpreted according to the fit criteria of Seçer (2015). The t values of the model obtained from the STEM-PCK scale as a result of CFA are shown in Figure 3. Since all of the values are higher than 1.96, all paths in the model were determined to be significant. Therefore, it was concluded that no item was incompatible with the other items and that the items had excellent validity (Figure 3). The standardized solutions of the CFA are shown in Figure 4, where each item was checked for a factor loading of at least .30. The path diagram in Figure 4 shows that the factor loadings of the scale are at the desired level, between .32 and .73. Therefore, it was concluded that the standardized factor loadings reflect the effect of each factor on the variance of its variables.
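Two of the checks described above, the χ2/df ratio below 3 and t values above 1.96, can be sketched as simple comparisons. The function names and the t values are hypothetical; the cut-offs for the other fit indices follow Seçer (2015) and are not reproduced here because they are not stated in the text.

```python
def chi_square_ratio_acceptable(ratio, cutoff=3.0):
    """True when the chi-square/df ratio is below the cutoff used in the text."""
    return ratio < cutoff


def nonsignificant_paths(t_values, critical=1.96):
    """Return the names of paths whose |t| does not exceed the critical value."""
    return [name for name, t in t_values.items() if abs(t) <= critical]


print(chi_square_ratio_acceptable(2.71))  # ratio reported for the model

# Hypothetical t values for three paths, for illustration only.
example_t = {"item_10": 9.84, "item_11": 12.30, "item_13": 7.05}
print(nonsignificant_paths(example_t))
```

An empty list from the second function corresponds to the situation reported above, in which every path is significant.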

Reliability Studies
To determine the reliability of the STEM-PCK scale, the internal consistency coefficient was calculated, and test-retest reliability analysis was performed. The test-retest method applies a scale to the same subject group twice under the same conditions and within a specific time interval. The correlation coefficient of the measurement values obtained from the two applications is the reliability coefficient of the scale. For the test-retest reliability study, the scale was applied to 34 pre-service teachers twice with an interval of 21 days. The Cronbach alpha for the total STEM-PCK scale was .98. The coefficients for the factors ranged from .89 for Science Pedagogical Knowledge to .97 for STEM Pedagogical Knowledge.
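Both reliability coefficients can be computed with a few lines of standard-library Python. The sketch below uses the usual formulas (Cronbach's alpha from item and total-score variances, and the Pearson correlation between two administrations); the responses are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(responses[0])
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson_r(first, second):
    """Pearson correlation between two administrations of the same scale."""
    mean_first, mean_second = mean(first), mean(second)
    covariance = sum((a - mean_first) * (b - mean_second)
                     for a, b in zip(first, second))
    spread = sqrt(sum((a - mean_first) ** 2 for a in first)
                  * sum((b - mean_second) ** 2 for b in second))
    return covariance / spread


# Hypothetical 5-point Likert responses: four respondents, three items.
example_responses = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
print(round(cronbach_alpha(example_responses), 2))

# Hypothetical total scores from a test and a retest of the same respondents.
print(round(pearson_r([13, 7, 14, 9], [12, 8, 14, 10]), 2))
```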

Creating the Final Form of the Scale
A Cronbach Alpha reliability coefficient of .70 and above is considered sufficient for the reliability of test scores (Buyukozturk, 2007). Based on this, it can be said that the results to be obtained from the scale are highly reliable. Correlations between the factors were also calculated, and the results are given in Table 11. When Table 11 is examined, it is seen that there is no significant correlation between F2 (Pedagogical Knowledge) and F3 (Engineering Pedagogical Knowledge) (r = .29, p > .05). The lowest correlation is between F4 (Mathematics Pedagogical Knowledge) and F2 (Pedagogical Knowledge), and the highest correlation is between F3 (Engineering Pedagogical Knowledge) and F4 (Mathematics Pedagogical Knowledge).

Discussion
In our age, developments in information and technology bring different needs with them. Pre-service teachers are expected to have knowledge about science, technology, and engineering, to be able to use the knowledge they have learned and integrate it with other subjects, and to support their technology knowledge with content knowledge and field-specific pedagogical method knowledge (Mishra & Koehler, 2006). For this purpose, studies have focused on developing scales on STEM knowledge (Corlu, Capraro, & Corlu, 2015; Derin, Aydın & Kirkiç, 2017) and pedagogical content knowledge (Graham et al., 2009; Schmidt et al., 2009; Landry, 2010; Marks, 1990; Sun & Strobel, 2014). For this reason, integrating STEM into the training of teacher candidates has become an academically valuable issue. Accordingly, the study aimed to develop a valid and reliable scale to determine pre-service teachers' STEM-Pedagogical Content Knowledge (STEM-PCK) levels. To this end, a validity and reliability study of the draft scale consisting of 85 items was carried out with 322 teacher candidates at different grade levels (1st, 2nd, 3rd, and 4th grades) from six (6) teaching departments in seven (7) different universities in Turkey.
Among the scale types, the Likert-type scale is the most widely preferred because it is practical, increases the grading level, and gives measurement results on an equal-interval scale (Tezbasaran, 2008). The items of the STEM-PCK scale were prepared as 5-point Likert type. As a result of the EFA, the KMO value for the STEM-PCK scale was determined to be .94, and Bartlett's test result was significant. These results showed that the sample size was suitable for factor analysis and that the data set provided multivariate normality. As a result of the varimax rotation, the implicit structure of the STEM-PCK scale, consisting of 57 items and five subdimensions, was revealed. The subdimensions of the STEM-PCK scale are STEM Pedagogical Knowledge, Pedagogical Knowledge, Engineering Pedagogical Knowledge, Mathematics Pedagogical Knowledge, and Science Pedagogical Knowledge. Yildirim & Sahin-Topalcengiz (2019) developed a STEMPCK scale consisting of six factors. That scale has only one subdimension, Pedagogical Knowledge, in common with the STEM-PCK scale. This study aimed to measure pre-service teachers' pedagogical knowledge in the areas of Science, Mathematics, Engineering, and STEM, while Yildirim & Sahin-Topalcengiz's (2019) STEMPCK scale uses only one dimension to measure the pedagogical knowledge level of pre-service teachers.
The 5-factor structure of the scale was confirmed by CFA (χ2/df = 2.71, RMSEA = .07, RMR = .04, SRMR = .07, NFI = .94, NNFI = .96, CFI = .96, IFI = .96, RFI = .94). Internal consistency and test-retest reliability analyses were conducted to determine the reliability of the scale. As a result, the Cronbach's Alpha (α) internal consistency reliability value of the scale was .98, and the test-retest reliability value was .97. Correlations between the five factors of the STEM-PCK scale were calculated, and it was determined that the highest correlation was between Engineering Pedagogical Knowledge and Mathematics Pedagogical Knowledge.

CONCLUSION
It can be said that the scale developed through the validity and reliability analyses, a 5-point Likert-type scale consisting of 57 items and five factors, is a valid and reliable measurement tool for determining the STEM-Pedagogical Content Knowledge (STEM-PCK) levels of pre-service teachers. The developed scale is expected to help measure the Pedagogical Content Knowledge toward STEM education, which is spreading rapidly all over the world, of teacher candidates educated in national and international settings.