Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
The Effect of Scrambling Test Item on Students’ Performance and Difficulty Level of MCQs Test in a College of Medicine, KKU
Ismail Satti, Bahaeldin Hassan, Abdulaziz Alamri, Muhammad Abid Khan, Ayyub Patel
Creative Education (CE) , 2019, DOI: 10.4236/ce.2019.108130
Background: Multiple choice question (MCQ) tests are a commonly used assessment tool in medical schools; ours are delivered to students in four versions (A, B, C and D) to avoid cheating. The aim of this study was to investigate the effect of scrambling test items on students’ performance and on the difficulty level of each version, so as to decide whether to continue randomizing test items or to keep them in a fixed order. Methods: A prospective, cross-sectional study was conducted; the participants were 5th-year undergraduate medical students during their major course in obstetrics and gynecology. Three tests with randomized items were delivered to the students. After correction, the marks obtained by the candidates and the difficulty index of each version were entered into the Statistical Package for Social Sciences (SPSS) version 20, and the four versions were compared through analysis of variance (ANOVA). A p-value of <0.05 was considered statistically significant. Results: No significant difference was found in the mean difficulty index across versions in each test, and there were no statistically significant differences when the mean student scores of version A were compared with those of the other versions (B, C and D) using ANOVA (F = 1.14, p = 0.42), (F = 0.75, p = 0.69) and (F = 1.29, p
The Role of Different Cognitive Components in the Prediction of the Figural Reasoning Test's Item Difficulty

LI Zhong-Quan, WANG Li, ZHANG Hou-Can, ZHOU Ren-Lai
Acta Psychologica Sinica (心理学报) , 2011,
Abstract: Figural reasoning tests (as represented by Raven's tests) are widely applied as effective measures of fluid intelligence in recruitment and personnel selection. However, several studies have revealed that those tests are not appropriate anymore due to high item exposure rates. Computerized automatic item generation (AIG) has gradually been recognized as a promising technique in handling item exposure. Understanding sources of item variation constitutes the initial stage of Computerized AIG, that is, searching for t...
Identifying predictors of physics item difficulty: A linear regression approach
Vanes Mesic,Hasnija Muratovic
Physical Review Special Topics. Physics Education Research , 2011,
Abstract: Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. First, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created.
It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge structures. Identified predictors point out the fundamental cognitive dimensions of student physics achievement at the end of compulsory education in Bosnia and Herzegovina, whose level of development influenced the test results within the conducted assessments.
Item difficulty of multiple choice tests dependant on different item response formats – An experiment in fundamental research on psychological assessment
Psychology Science , 2007,
Abstract: Multiple choice response formats are problematical as an item is often scored as solved simply because the test-taker is a lucky guesser. Instead of applying pertinent IRT models which take guessing effects into account, a pragmatic approach of re-conceptualizing multiple choice response formats to reduce the chance of lucky guessing is considered. This paper compares the free response format with two different multiple choice formats. A common multiple choice format with a single correct response option and five distractors (“1 of 6”) is used, as well as a multiple choice format with five response options, of which any number of the five is correct and the item is only scored as mastered if all the correct response options and none of the wrong ones are marked (“x of 5”). An experiment was designed, using pairs of items with exactly the same content but different response formats. 173 test-takers were randomly assigned to two test booklets of 150 items altogether. Rasch model analyses adduced a fitting item pool, after the deletion of 39 items. The resulting item difficulty parameters were used for the comparison of the different formats. The multiple choice format “1 of 6” differs significantly from “x of 5”, with a relative effect of 1.63, while the multiple choice format “x of 5” does not significantly differ from the free response format. Therefore, the lower degree of difficulty of items with the “1 of 6” multiple choice format is an indicator of relevant guessing effects. In contrast the “x of 5” multiple choice format can be seen as an appropriate substitute for free response format.
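To make the guessing argument in the abstract above concrete, the chance of a purely blind guess being scored as correct can be worked out for the two multiple choice formats. This is only an illustrative sketch: it assumes that a blind guesser picks uniformly at random, and that in the “x of 5” format every marking pattern over the five options (including marking none) is equally likely. Neither assumption is stated in the abstract.

```python
from fractions import Fraction

# "1 of 6": one correct option among six; a blind guesser picks one at random.
p_one_of_six = Fraction(1, 6)

# "x of 5": each of the five options is either marked or not, giving 2**5
# possible marking patterns, only one of which is scored as mastered.
# Assumption (not from the abstract): all patterns are equally likely,
# including the empty one.
p_x_of_five = Fraction(1, 2 ** 5)

print(f"1 of 6: {float(p_one_of_six):.3f}")
print(f"x of 5: {float(p_x_of_five):.3f}")
print(f"ratio:  {float(p_one_of_six / p_x_of_five):.2f}")
```

Under these assumptions a lucky guess is roughly five times more likely in the “1 of 6” format, which is in line with the direction of the effect the abstract reports.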
Test item response time and the response likelihood
Srdjan Verbic,Boris Tomic
Physics , 2009,
Abstract: Test takers do not give equally reliable responses. They adopt different response strategies and do not make the same effort to solve the problem and answer the question correctly. The consequences of differential test-taker behavior are numerous: the test item parameters could be biased, differential item functioning might emerge for certain subgroups, estimation of a test taker's ability might have greater error, etc. All these consequences become more prominent in low-stakes tests, where test takers' motivation is further decreased. We analyzed a computer-based test in Physics and tried to find and describe the relationship between the item response time and the item response likelihood. We found that the magnitude of this relationship depends on the item difficulty parameter. We also noticed that boys who respond faster give, on average, responses with greater likelihood than boys who respond slower. The same trend was not detected for girls.
Effect of Test Item Analysis of Summative Exams on Quality of Test Designing
Shaban M, Ramezani Badr F
Hayat Journal of Faculty of Nursing & Midwifery , 2007,
Abstract: Background & Aim: Item analysis is a process in which both test items (questions) and students' answers are examined in order to assess the quality and quantity of the items and of the test as a whole. The purpose of this study was to investigate the effect of analysis of multiple choice test items of summative exams on the quality of test design by faculty members of Tehran Nursing and Midwifery School. Methods & Materials: A quasi-experimental method (pre-test and post-test) without a control group was used in this study. After a pilot study, 33 nursing faculty members of the school of nursing and midwifery at Tehran University of Medical Sciences were chosen through census sampling. Then one exam that each of them had designed in the second semester (83-84) was chosen to be analyzed. The analysis results were reported to the faculty members. Then the tests they designed for the next semester were analyzed again. The analysis was carried out using a checklist which included item structure, whole structure of exam, content validity, levels of thinking skills reflected in questions, and criteria for holding an exam. Moreover, for the quantitative analysis of questions, item difficulty and discrimination index were calculated. Item distracter analysis was examined by calculating the percentage of examinees who selected incorrect alternatives. Integrated t-test, Pearson and Spearman correlation coefficients, and Fisher's exact test were used for the statistical analysis. Results: 1056 questions before presenting the feedback and 803 questions at the end were analyzed and then the results were compared.
According to the results, there was a significant difference between before and after the intervention in the variables item structure (P<0.001), levels of thinking skills (P<0.05), and item distracter analysis (P<0.001), while there was no significant difference in item difficulty, discrimination index, whole structure of exam, content validity, or the criteria that should be considered when holding an exam. However, the Pearson correlation coefficient showed that the correlations of age (r=-0.535, P=0.004) and years of service (r=-0.546, P=0.003) with the difficulty index were statistically significant. Conclusion: The results emphasized that item analysis, providing feedback to the faculty members, and offering educational booklets to assist them were effective means of improving some qualitative and quantitative item analysis measures.
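The item difficulty and discrimination index mentioned in the abstract above are usually computed from scored responses in the classical way. The following is a minimal sketch under stated assumptions: the function names, the sample data, and the 27% upper/lower group convention are illustrative choices, not details taken from the study.

```python
def item_difficulty(responses):
    """Proportion of examinees who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(responses) / len(responses)


def discrimination_index(responses, total_scores, fraction=0.27):
    """Difference in proportion correct between the top and bottom scoring groups.

    The 27% group size is a common convention and an assumption here.
    """
    n = len(responses)
    k = max(1, round(n * fraction))
    # Rank examinees from lowest to highest total test score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = [responses[i] for i in order[:k]]
    upper = [responses[i] for i in order[-k:]]
    return sum(upper) / k - sum(lower) / k


# Hypothetical data: one item's scored responses and the examinees' total scores.
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
totals = [95, 90, 85, 80, 70, 60, 55, 50, 40, 30]

print(item_difficulty(item))
print(round(discrimination_index(item, totals), 3))
```

A difficulty value near 0 or 1 flags a very hard or very easy item, and a discrimination index near zero or below flags an item that does not separate stronger from weaker examinees.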
Differential Item Functioning: Implications for Test Validation
Mohammad Salehi,Alireza Tayebi
Journal of Language Teaching and Research , 2012, DOI: 10.4304/jltr.3.1.84-92
Abstract: This paper attempts to recapitulate the concept of validity, namely construct validity (i.e., its definition, its approaches, and its role in language testing and assessment). The validation process is then elaborated on and shown to be an integral part of the process of making tests, namely English language proficiency tests. Then come the related concepts of test fairness and test bias, the sources of bias (e.g., gender, field of study, age, nationality and L1, background knowledge, etc.), and their contributions and threats to the validity of tests in general and of high-stakes tests of English language proficiency in particular. Moreover, different approaches to investigating the validity of tests are reviewed. Differential Item Functioning (DIF), one of the methods for investigating the validity of tests, is also explained, along with a description of its different detection methods and approaches and their advantages and disadvantages, concluding that logistic regression (LR) is among the best methods to date.
Designing item pools to optimize the functioning of a computerized adaptive test
Mark D. Reckase
Psychological Test and Assessment Modeling , 2010,
Abstract: Computerized adaptive testing (CAT) is a testing procedure that can result in improved precision for a specified test length or reduced test length with no loss of precision. However, these attractive psychometric features of CATs are only achieved if appropriate test items are available for administration. This set of test items is commonly called an “item pool.” This paper discusses the optimal characteristics for an item pool that will lead to the desired properties for a CAT. Then, a procedure is described for designing the statistical characteristics of the item parameters for an optimal item pool within an item response theory framework. Because true optimality is impractical, methods for achieving practical approximations to optimality are described. The results of this approach are shown for an operational testing program including comparisons to the results from the item pool currently used in that testing program.
Using IRT in Determining Test Item Prone to Guessing
A.D.E. Obinne
World Journal of Education , 2012, DOI: 10.5430/wje.v2n1p91
Abstract: The 3-parameter model of Item Response Theory gives the probability of an individual (examinee) responding correctly to an item without being sure of all the facts. That is known as guessing. Guessing could be a strategy employed by examinees to earn more marks. The way an item is constructed could expose the item to guessing by the examinee. A study on comparison of the Psychometric properties of the Biology Examinations conducted by West African Examination Council and National Examination Council, in the year 2000, identified items that were prone to guessing.
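For readers unfamiliar with the 3-parameter model referred to above, its standard logistic form can be sketched as follows. The parameter names (a for discrimination, b for difficulty, c for the guessing or lower asymptote) follow common textbook usage, and the example values are hypothetical rather than taken from the study.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3-parameter logistic model.

    theta: examinee ability, a: discrimination, b: difficulty,
    c: guessing parameter (probability of success for very low ability).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: four options, so a blind guess succeeds about 25% of the time.
print(p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.25))
print(round(p_correct_3pl(theta=-10.0, a=1.0, b=0.0, c=0.25), 3))
```

An item with a large estimated c is one where even low-ability examinees succeed often, which is what “prone to guessing” means in this framework.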
Differential item functioning in the figure classification test
E. van Zyl,D. Visser
South African Journal of Industrial Psychology , 1998, DOI: 10.4102/sajip.v24i2.650
Abstract: The elimination of unfair discrimination and cultural bias of any kind is a contentious workplace issue in contemporary South Africa. To ensure fairness in testing, psychometric instruments are subjected to empirical investigations for the detection of possible bias that could lead to selection decisions constituting unfair discrimination. This study was conducted to explore the possible existence of differential item functioning (DIF), or potential bias, in the Figure Classification Test (A121) by means of the Mantel-Haenszel chi-square technique. The sample consisted of 498 men at a production company in the Western Cape. Although statistical analysis revealed significant differences between the mean test scores of three racial groups on the test, very few items were identified as having statistically significant DIF. The possibility is discussed that, despite the presence of some DIF, the differences between the means may not be due to the measuring instrument itself being biased, but rather to extraneous sources of variation, such as the unequal education and socio-economic backgrounds of the racial groups. It was concluded that there is very little evidence of item bias in the test.
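The Mantel-Haenszel approach named in the abstract above compares two groups on one item after matching examinees on total score. The sketch below computes only the Mantel-Haenszel common odds ratio, not the chi-square statistic the study reports; the function name, the tuple layout, and the counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across score strata for a single item.

    Each stratum is a tuple (a, b, c, d):
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical counts for one item in two total-score strata.
no_dif = [(10, 10, 5, 5), (20, 5, 8, 2)]
print(mantel_haenszel_odds_ratio(no_dif))
```

A value close to 1 suggests that, among examinees of similar overall score, the two groups have similar odds of answering the item correctly; values far from 1 point to possible DIF.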

Copyright © 2008-2017 Open Access Library. All rights reserved.