Horizontal Equating of Science Test Forms Using Generalized Partial Credit Model (GPCM) in Secondary Education

Fajar Nur Cahyani, Samsul Hadi, H. Haryanto, Heri Retnawati

Abstract


Ensuring fair comparisons between different test forms is a central concern in educational assessment. This study examines the horizontal equating of two forms of a science academic test using the Generalized Partial Credit Model (GPCM), which is suited to items scored in more than two ordered categories. The scales of the two forms were aligned with the Mean–Sigma method, using anchor items shared by both forms. Item parameters and student ability estimates remained consistent after the transformation. Residual and approximation measures indicated good model fit, although limited index values in the structural comparisons suggest that further refinement may be needed. The threshold distributions became more stable after equating, and graphical analyses confirmed that item characteristics and measurement information were preserved. The approach succeeded in aligning the measurement scales without distorting the original ability estimates. The findings support the development of fairer assessment systems that uphold validity, reliability, and comparability in science education.
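As an illustration of the Mean–Sigma step described above, the following minimal Python sketch derives the linking constants from anchor-item thresholds and applies them to GPCM item parameters and abilities. It is not the authors' implementation: the function names and all numeric values are hypothetical, the abstract does not state which software was used, and the textbook form of the method is assumed (slope A as the ratio of the anchor-threshold standard deviations, intercept B from the anchor-threshold means).

```python
import math
from statistics import mean, pstdev


def mean_sigma_constants(anchor_b_from, anchor_b_to):
    """Slope A and intercept B mapping the 'from' scale onto the 'to' scale.

    Both lists hold threshold estimates for the same anchor items,
    one list per test form (hypothetical inputs).
    """
    # The ratio is the same for population or sample standard deviations.
    A = pstdev(anchor_b_to) / pstdev(anchor_b_from)
    B = mean(anchor_b_to) - A * mean(anchor_b_from)
    return A, B


def rescale_item(a, thresholds, A, B):
    """Transform one GPCM item: a* = a / A, b* = A * b + B."""
    return a / A, [A * b + B for b in thresholds]


def rescale_ability(theta, A, B):
    """Transform an ability estimate: theta* = A * theta + B."""
    return A * theta + B


def gpcm_probs(theta, a, thresholds):
    """GPCM category probabilities for one item at ability theta."""
    # Category k has logit sum_{v<=k} a * (theta - b_v); category 0 has logit 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)  # subtract the maximum for numerical stability
    expo = [math.exp(z - top) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical anchor thresholds estimated separately on each form.
anchor_form_x = [-1.0, 0.0, 1.0]
anchor_form_y = [-1.5, 0.5, 2.5]
A, B = mean_sigma_constants(anchor_form_x, anchor_form_y)

# Hypothetical form X item and examinee, moved onto the form Y scale.
a_x, b_x, theta_x = 1.2, [-0.5, 0.8], 0.3
a_y, b_y = rescale_item(a_x, b_x, A, B)
theta_y = rescale_ability(theta_x, A, B)

before = gpcm_probs(theta_x, a_x, b_x)
after = gpcm_probs(theta_y, a_y, b_y)
print(before)
print(after)
```

The two printed probability vectors agree up to floating-point rounding, because a linear rescaling of the ability and threshold scales with the reciprocal rescaling of the discrimination leaves every term a(θ − b) unchanged. This is the sense in which a linking of this kind aligns the scales without altering what the model predicts for each examinee.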

Keywords


Equating; Generalized Partial Credit Model (GPCM); Item Response Theory (IRT); Mean–Sigma; Polytomous Items






DOI: https://doi.org/10.17509/ijert.v5i3.88050



Copyright (c) 2025 Universitas Pendidikan Indonesia (UPI)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Educational Research and Technology (IJERT) is published by Universitas Pendidikan Indonesia (UPI)