You are on page 1of 15

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.

4, August 2012

Modeling the Response to Questionnaire in Software Development: Multinomial or quasimultinomial?


Madhumita Singha Neogi1, Soubhik Chakraborty2*, Vandana Bhattacherjee3

Department of Information Technology, Xavier Institute of Social Service, Ranchi-834001, India 2 Department. of Applied Mathematics, Birla Institute of Technology, Mesra, Ranchi-835215, India 3 Department of Computer Science & Engineering, Birla Institute of Technology, Lalpur, Ranchi-834001, India *corresponding authors email: soubhikc@yahoo.co.in (S. Chakraborty)

ABSTRACT
In the present paper, an attempt is made to describe the response of questionnaire by a multinomial model. The questionnaire was designed for the undergraduate students to know whether students had agility implicitly without knowing agile software development model. In order to instill agile software development practices among students, data were collected about the effectiveness of few parameters in students group. There are 116 respondents and six questions, each with possible answers yes, neutral or no. While the individual responses can be assumed to be independent of one another, it is not clear whether the probabilities of yes, neutral, or no are fixed for every individual for a particular question. This is therefore tested, for each question separately, using Chi-Square test of goodness of fit which confirms the multinomial model. However, the probabilities of the possible responses are found to be varying from question to question.

KEY WORDS: Statistical modeling, multinomial distribution, probability, Chi-square goodness of fit,
quasi-multinomial distribution, software development.

1. INTRODUCTION
One of the strengths of statistics lies in modeling. Although all statistical models are subjective, it should be remembered that they arise from data that is objective--or nearly so--and often, at their center, stand some very elegant mathematical theorems--of course, free from bias. There are three fundamental steps in statistical modeling: first, fit a model to capture some phenomenon of the world around us; second, make an intelligent guess of the model parameters; and third, verify the goodness of the fit. We all know that statistics can be broadly divided into two categories: descriptive and inferential. In statistical modeling, both are involved in that we first describe a pattern (through modeling) and then infer about its validity (Klemens, 2008).
DOI : 10.5121/ijcsa.2012.2405 47

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

Perhaps the most important change in software development methodology in the last 15 years has been the introduction of the word agile (Abbas et. al., 2010). Our ongoing research is also in this area. As any area matures there is a need to understand its various components, parameters and relations. In order to instill agile software development practices among students group we decided to collect data about the effectiveness of few parameters in students group. Before we teach them agile method of software development we observe few characteristics they already have, implicitly, such as they like to work with team mate, they think working in pairs improves quality because the codes are verified by two persons, they would like to interact with friends more to improve the quality of the program, they would like to write the code and be satisfied with by executing it properly. They hesitate to design the problem and instead they used to jump to coding and testing. They also go for test case design though not in a very systematic manner. Students performance in class tests and practical exams shows their competence towards the subject. All these characteristics of the students towards program development lead them to agile software development practice. As we know the manifesto for agile software development has given emphasis on following four issues: individuals and interactions over processes and tools, working software over comprehensive documentation, customer collaboration over contract negotiation, responding to change over following a plan (Pressman, 2005). To prove this an attempt is made to describe the response of questionnaire by a multinomial model. There are 116 respondents and six questions. For each question, the possible answers can be yes, no or neutral. While the response one individual provides for every question can be assumed to be independent of that provided by another individual, it is not clear whether the probabilities of yes, no or neutral are fixed for every individual for a particular question. If they are fixed, then multinomial model holds. Otherwise the model is quasi-multinomial. While verifying the goodness of fit of the multinomial model, it is this aspect which is tested for each of the six questions separately using Chi-Square goodness of fit test. The organization of rest of the paper is as follows. In section 2.1, we give the genesis of multinomial model briefly. In section 2.2 we give the nature of the problem related to software engineering that generated the data. The data itself is given in the appendix. The experimental results are given in section 3.This is followed by a discussion in section 4. Section 5 is reserved for conclusion.

2.1 Multinomial distribution


Consider n independent trials being performed. In each trial the result can be any one of k mutually exclusive and exhaustive outcomes e1, e2ek, with respective probabilities p1, p2pk. These probabilities are assumed fixed from trial to trial. Under this set up, the probability that out of n trials performed, e1 occurs x1 times, e2 occurs x2 times .ek occurs xk times is given by the well known multinomial law. {n!/(x1! x2!...xk!)}p1x1p2x2pkxk where each xi is a whole number in the range 0 to n subject to the obvious restriction on the xis, namely, x1+x2+..+xk=n. E(xi)=n(pi) and Var(xi)=n(pi)(1-pi), i=1, 2.k. Cov (xi, xj) = -n(pi)(pj) One can then calculate the correlation coefficient between xi and xj as Cov(xi, xj)/[Var(xi)*Var(xj)]. See also Gupta and Kapoor (2010). If the probabilities vary we shall get a quasi multinomial distribution.
48

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

2.2 Genesis of the problem


There are many research papers related to agile software development whether it has been implemented in organization or student projects. However we never come across any paper trying to modeling a questionnaire response into multinomial distribution in order to prove agility in students group. There have been many promising studies of the use of agile methods (Dyba and Dingshor, (2009)). Qualitative research methods were developed in the social sciences to enable researchers to study and social and culture phenomena and are designed to help researchers understand people and the social and cultural contexts within which they live (Dinzin and Lincoln, (2011)). Full consideration of human behavior within software development environments, and that of understanding the complexities of human behavior requires that researchers go beyond the limitations of quantification and statistical analysis (Dyba et. al. (2011)). In another paper Bailey et al. (2003), conventional techniques in psychology for attitude measurement are used to construct scales for measuring software engineering students attitudes toward the code inspection process. Eliasson et al. (2006) have shown statistically viewing the data set concerning prior knowledge supports the obvious assumption that, in the pre-survey, experienced students have higher faith in their ability than inexperienced students concerning programming, computers and problem-solving. In another study Lucas Layman has shown how collaborative work can increase student understanding through collaborative learning and how it improves student performance in computer science courses (Layman, 2006). There are many papers which experiments with students to show there are significant improvements in students performance. Experiments carefully designed, conducted, and most importantly carefully integrated with the course increase the students final performance. It also increases their motivation, team spirit and interest in empirical software engineering (Staron, (2007)), (Maher, (2009)). Dick et al. (2001) have shown improvements that can be achieved in the learning of software engineering practices when a proactive process of measurement and intervention based around adult learning principles is utilized. A study was carried out earlier to assess the pattern, which is similar to agile type methodologies in software development procedures of student groups (Bhattacherjee et al., (2007), (2009b). In our study, the n independent trials correspond to the n independent responses to a particular question, n=116. The possible responses yes, no and neutral correspond to the events e1, e2 and e3 (k=3 here) with corresponding probabilities p1, p2 and p3. These probabilities are obviously different in general for different questions. But are they fixed for a particular question for every individual or do they vary from individual to individual? The questionnaire is shown in table 1.

49

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

Table1: The six questions asked to students of software development

3. Experimental results
We split the 116 responses into two halves of first 58 responses and next 58 responses respectively. Our null hypothesis H0 is that the overall probabilities of yes, no and neutral for all the 116 responses are maintained in the two halves as well. This would be tested against the alternative H1 that these probabilities are different in the two halves at 5% level of significance. The null hypothesis will be rejected if the relative proportion (a measure of probability) of yes, no and neutral are found to be significantly different in the two halves, to be tested by Chi-square for each question separately. Our experimental results are given below (Tables 2.1-2.4, 3.1-3.4, 4.14.4, 5.1-5.4, 6.1-6.4 and 7.1-7.4 corresponding to data for questions 1-6 respectively):Q1. First half (1-58) Expected Value of yes/neutral/no out of 58=Overall yes/neutral/nox58=[No. of (yes/neutral/no) out of 116/116]x58 Relative frequency of

Writing E for expected frequency and O for observed frequency, and abbreviating yes, neutral and no by 1, 2 and 3, we have E(yes)=88X58/116=44 E(neutral)=1X58/116=0.5 E(no)=27X58/116=13.5

Table 2.1: Table of observed and expected frequencies for Q1 (first half)

Table 2.2 Calculation of Chi-Square for Q1 (first half)

50

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

Q1. Second half (59-116) Table 2.3 Table of observed and expected frequencies for Q1 (second half)

Table 2.4 Calculation of Chi-Square for Q1 (second half)

Q2. First half (1-58) E(yes)=90X58/116=45 E(neutral)=22X58/116=11 E(no)=4X58/116=2

Table 3.1 Table of observed and expected frequencies for Q2 (first half)

Table 3.2 Calculation of Chi-Square for Q2 (first half)

Q2. Second half (59-116) Table 3.3 Table of observed and expected frequencies for Q2 (second half)

51

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

Table 3.4 Calculation of Chi-Square for Q2 (second half)

Q3. First half (1-58) E(yes)=102X58/116=51 E(neutral)=9X58/116=4.5 E(no)=5X58/116=2.5 Table 4.1 Table of observed and expected frequencies for Q3 (first half)

Table 4.2 Calculation of Chi-Square for Q3 (first half)

Q3 Second half Table 4.3 Table of observed and expected frequencies for Q3 (second half)

Table 4.4 Calculation of Chi-Square for Q3 (second half)

Q4. First half (1-58) E(yes)=99X58/116=49.5 E(neutral)=10X58/116=5 E(no)=7X58/116=3.5

52

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

Table 5.1 Table of observed and expected frequencies for Q4 (first half)

Table 5.2 Calculation of Chi-Square for Q4 (first half)

Q4. Second half (59-116) Table 5.3 Table of observed and expected frequencies for Q4 (second half)

Table 5.4 Calculation of Chi-Square for Q4 (second half)

Q5. First half (1-58) E(yes0=95X58/116=47.5 E(neutral)=16X58/116=8 E(no)=5X58/116=2.5

Table 6.1 Table of observed and expected frequencies for Q5 (first half)

Table 6.2 Calculation of Chi-Square for Q5 (first half)

53

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

Q5. Second half Table 6.3 Table of observed and expected frequencies for Q5 (second half)

Table 6.4 Calculation of Chi-Square for Q5 (second half)

Q6. First half (1-58) E(yes)=100X58/116=50 E(neutral)=13X58/116=6.5 E(no)=5X58/116=1.5 Table 7.1 Table of observed and expected frequencies for Q6 (first half)

Table 7.2 Calculation of Chi-Square for Q6 (first half)

Q6. Second half (59-116) Table 7.3 Table of observed and expected frequencies for Q6 (second half)

Table 7.4 Calculation of Chi-Square for Q6 (second half)


54

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

Chi-Square (table) value at 5% level and 1 d.f.=3.841

4. Discussion
All the calculated Chi-Square values are less than the table value at 5% level and 1 degree of freedom and hence are insignificant. So we shall accept the null hypothesis at 5% level, i.e. 95% confidence coefficient and consequently the multinomial model holds for each of the six questions. However, although the nature of the probability model is identical for each question, the probabilities of the responses, which are the essential parameters characterizing the model, are different for each question. The following table (table 8) is of interest where the function P() is the probability function, S.D. is the standard deviation:Table 8: Table of probabilities of yes, neutral and no for questions Q1-Q6

It is also remarkable that the Chi-Square values in the two halves are same for each question. This has happened due to very few classes and the pooling of classes to ensure no theoretical cell frequency is less than 5. Chi-Square is a continuous distribution and it cannot maintain its character of continuity if any theoretical cell frequency is small. After pooling we are left with two classes. Due to the linear restriction that the sum of observed frequencies should equal the sum of expected frequencies, we lose on degree of freedom. So our Chi-Square has finally, after pooling of classes, 2-1=1 degree of freedom only. For further literature on Chi-Square, see Gupta and Kapoor (2010).

5. Conclusion
We have successfully modeled the response to questionnaire by multinomial distribution. ChiSquare goodness of fit confirms the validity of the model for each question separately. However, the probabilities of different responses which are important parameters in the multinomial model vary from question to question. As a final comment, we sincerely feel that to help our students grow into efficient software developers, in addition to good analysts and managers; we must instill into them the best methodologies for software development where statistical model building will have a definite role to play. Many researchers have also observed that software engineering as a subject is of paramount importance for computer science students and to improve their participation we can add on values through various aspects including Statistics.

Appendix
55

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

Data of responses for questionnaire:Q1 1 1 1 2 1 1 1 1 3 1 1 1 3 1 1 1 1 1 3 1 1 1 Q2 1 3 1 2 1 1 2 1 1 1 1 1 1 1 1 2 1 1 3 2 2 1 Q3 1 1 1 1 1 2 1 1 1 1 2 1 1 3 1 3 3 1 1 1 1 1 Q4 3 3 3 3 3 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 Q5 1 1 1 1 1 1 1 1 3 1 2 1 1 1 1 1 1 2 2 1 2 1 Q6 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 1 3 1 1 1


56

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

3 3 3 3 1 1 1 3 1 1 1 1 1 1 1 3 3 1 1 3 1 3 1

1 2 1 1 3 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1

1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1

1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1

2 1 3 2 1 1 1 1 1 1 3 1 3 1 1 1 1 1 1 1 1 1 1

1 1 2 2 1 2 1 2 1 1 1 1 1 1 3 1 1 1 1 1 1 1 2

57

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

1 1 1 1 1 1 1 1 3 1 1 1 3 3 1 1 1 1 1 1 1 1 1

2 1 1 1 1 1 2 1 1 1 2 1 1 1 3 1 2 1 2 2 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1

1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1

1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

58

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

1 3 1 1 1 3 1 1 3 1 1 1 1 1 3 3 3 1 1 1 1 1 1

1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1

1 1 1 1 2 1 1 1 2 1 1 1 1 3 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 2 1 1 1 1 2 1 1 1 3 1 1 1 1 1 1 2 1 1

1 1 1 1 2 1 1 1 1 1 1 2 1 1 3 2 1 1 1 1 1 1 1

59

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

1 1 3 3 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 3 3

1 1 1 1 1 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1

3 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 1 2 1

2 1 1 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1

60

International Journal on Computational Sciences & Applications (IJCSA) Vo2, No.4, August 2012

1 1

1 1

1 1

1 1

1 1

1 1

References
[1] [2] [2] [3] [4] B. Klemens, Modeling with data: tools and techniques for scientific computing, Princeton University Press, Princeton, NJ, 2008. S. C. Gupta and V. K. Kapoor, Fundamentals of Mathematical Statistics, Sultan Chand and Sons, New Delhi, reprint 2010. T. Dyba and T. Dingsoyr, What do we know about Agile software development, IEEE Computer Society, 2009, pp 6-9. T. Dyba, R. Prikladnicki, K. Ronkko, C. Seaman, J. Sillito, Qualitative research in software engineering, Empir. Software Engg. 16, pp 425-429, Springer, 2011 D. Bailey, T. Conn, B. Hanks, and L. Werner, Can We Influence Students Attitudes About Inspections? Can We Measure a Change in Attitude? Proceedings of the 16th Conference on Software Engineering Education and Training (CSEET03), 2003. J. Eliasson, L. K. Westin, and M. Nordstrm, Investigating students' confidence in programming and problem solving, 36th ASEE/IEEE Frontiers in Education Conference, 2006 L. Layman, Changing Students Perceptions: An Analysis of the Supplementary Benefits of Collaborative Software Development, Proceedings of the 19th Conference on Software Engineering Education & Training (CSEET06), 2006. M. Dick, M. Postema and J. Miller, Improving Student Performance in Software Engineering Practice. Proceedings of the 14th Conference on Software Engineering Education and Training (CSEET.01), 2001. M. Staron, Using Experiments in Software Engineering as an Auxiliary Tool for Teaching A Qualitative Evaluation from the Perspective of Students Learning Process, 29th International Conference on Software Engineering (ICSE'07), 2007. P. Maher, Weaving Agile Software Development Techniques into a Traditional Computer Science Curriculum, Sixth International Conference on Information Technology: New Generations, IEEE Computer Society, pp 1687-1688, 2009. V. Bhattacherjee, M. S. Neogi and R. Mahanti, Software Development Patterns in University Setting: A Case Study, in: Proceedings of the national seminar on Recent Advances on Information Technology, February, ISM Dhanbad, pp 40-43, 2007 V. Bhattacherjee, M. S. Neogi and R. Mahanti, Software Development Patterns Of Students: An Experience, National Journal of System and Information Technology, June, Vol. 2 (1), pp. 15-21, ISSN: 0974-3308, 2009b N. Abbas, A. M. Gravell, G. B. Wills, Using Factor analysis to generate clusters of agile practices. IEEE Computer Society, pp 11-16, 2010. R. S. Pressman, 2005. Software Engineering A Practitioners Approach, Sixth Edition, Tata McGraw Hill, International Edition.

[5] [6]

[7]

[8]

[9]

[10]

[11]

[12] [13]

61

You might also like