You are on page 1of 7

CS2110 Assignment 1 Basic Java and Object-Oriented Programming,

Summer 2008
Due Friday June 27, 2008, 10:00PM

0 Introduction
0.1 Goals
This assignment will help you refresh your knowledge of Java and/or learn Javas syntax. It is roughly a
comprehensive review of CS100. In addition it will help you practice using some concepts we will discuss in
the first few days of class, such as recursion.

0.2 Instructions
Be sure to read the entire assignment before answering the questions! Do all the tasks in the following
sections. In later assignments, we will not necessarily repeat the blitz of instructions in Section 0.

0.3 No Partners! (this time) and Academic Integrity


In general, assignments will allow you to work a partner. However, because we need to ensure that your Java
and programming skills suffice, you must work alone on this assignment. You must abide by the academic
integrity standards of CS 2110, the Computer Science department, and Cornell University. Links to these
academic integrity policies are provided on the course website.

0.4 Submission
Follow the Submission Requirements on the course website. The last section of this assignment summarizes
the files that you need to submit on CMS.

0.5 Grading
All code that you submit must run without warnings or errors. Otherwise, you will receive a zero for that
portion of the assignment. The assignment will be graded out of 100 points and is worth 10% of your course
grade.

0.6 Advice
Start early! For a Java refresher and help with using its command-line environment, refer to the bootcamp
notes (available on the website). You may also read the first three chapters of the textbook for a Java review.
Remember to use good programming style. Review the style guidelines set out on the course website.
In particular, use meaningful variable and method names. Add comments to clarify your code, and include
a comment at the top of the source file that includes your name, NetId, the date, and a brief description
of your solution. Correct solutions that are short, simple, and efficient are superior to submissions that are
unnecessarily long, complex, or slow.

1 Communication
Purpose: To get connected
Points: 2

Review the information about Usenet on the course website. In your favorite newsreader, subscribe to
cornell.class.cs2110 and cornell.class.cs2110.talk. Post a message in cornell.class.cs2110.talk.

1/7
CS2110 Assignment 1, Summer 2008 100 points 2/7

In the subject line, enter only A1:<your Cornell NetId> . In the message body write something witty. (If
youre not in a witty mood, writing Hello world! will suffice.)
Recall that the newsgroup cornell.class.cs2110.talk is meant for test posts and casual conversation
on topics related to 2110. The newsgroup cornell.class.cs2110 is the actual course newsgroup that we
reserve for technical questions regarding course material.
You may ignore this until the newsgroup starts working again. When it works it will be announced in
class and on the website.

2 Course website and policies


Purpose: To become familiar with the class website and course policies
Points: 8

Review the entire course website and answer the following questions.

1. How long do you think it will take to do this assignment? How long did it actually take?
2. What version of Java should you use?
3. What are the dates and times of the CS2110 exams?
4. What is the procedure to submit a regrade for an assignment?
5. Suppose Alice, a CS2110 student, writes some of the solutions for a particular assignment with a
partner named Bob. Then, the two students divorce because of irreconcilable differences and wish to
work alone or perhaps with other partners for the remainder of the current assignment. According
to the 2110 rules of academic integrity on the course website, may Alice and Bob divorce? May they
divorce for the remaining assignments?
6. Suppose that about two days before an assignment deadline, another student who is not your partner
pleads with you to share some of your code for the assignment. According to the course website, under
what conditions may you show your groups code to him or her?
7. Where is the CS2110 consulting office?
8. Where can copies of the optional textbooks be found?

Submit your answers in a plain text file called problem2.txt. If we cannot open the file with a text editor,
you will receive no credit. If you do not know what plain text is, refer to the submission format requirements
on the website.

3 Computing grades
Purpose: Basic Java syntax, keyboard input, error checking and exceptions, course grading policies
Points: 40

Imagine the future. Specifically, six weeks into the future. Youve just completed all of the CS 2110
coursework and youre anxious to know your overall course score and letter grade. Write a program called
GradeCalc that asks for your scores on the individual 2110 coursework items and outputs your overall course
score. The program should also output the minimum letter grade that you are guaranteed by that course
score.
The program must be user-friendly. It should ask the user to input scores for assignments, quizzes, and
exams. Part of this problem is to get practice writing programs that are robust to user input errors, so you
should check user input for errors, such as a score that is too high or too low (one of many possible input
errors). If an error is detected, the program should print a helpful message and allow the user to try again.
CS2110 Assignment 1, Summer 2008 100 points 3/7

You should begin this assignment by reviewing the course grading policies set out on the website. For
the purpose of this assignment, assume that six quizzes are given but that the lowest quiz score is dropped.
You can assume that scores on individual coursework items are integral (i.e. no fractional points are given).
Also assume that extra credit is not given. For homework assignments, the program should ask whether the
homework was submitted on-time, during the late submission period, or after the late submission period,
and adjust the grade accordingly.
Here is a sample of what an interactive session might look like:

Enter grade for homework #1: 85


Was homework #1 submitted on time? y
Enter grade for homework #2: 90
Was homework #2 submitted on time? y
Enter grade for homework #3: 100
Was homework #3 submitted on time? n
Was homework #3 submitted within 24 hours of the deadline? y
Enter grade for homework #4: 89
Was homework #4 submitted on time? y
Enter grade for homework #5: 90
Was homework #5 submitted on time? n
Was homework #5 submitted within 24 hours of the deadline? n
Enter grade for prelim: 950
Not a legal entry: please enter an integer between 0 and 100
Enter grade for prelim: 95
Enter grade for final: 99
Enter grade for quiz #1: 10
Enter grade for quiz #2: 6
Enter grade for quiz #3: 0
Enter grade for quiz #4: 9
Enter grade for quiz #5: 8
Enter grade for quiz #6: 9

-----------------
Your final course score: 81.8
Your letter grade will be at least: B

4 Farmer Munsons Soybeans


Purpose: Recursion, file input, command-line parameters, permutation tests
Points: 50

Due to the increasing global demand for soybeans this year, Farmer Munson decided to plant three fields
of soybeans. Unfortunately, the cost of chemical fertilizers is twice as much this year as last year due to the
increasing price of natural gas. Ever resourceful, Farmer Munson decided to try using some organic fertilizer
being sold by his friend Old MacDonald. (The fertilizer comes from MacDonalds chickens; the farmers have
a colorful name for it that we wont repeat here.)
Farmer Munson planted 1 field using chicken fertilizer and 1 field using chemical fertilizer (Munson
found some leftovers at the back of his storage shed). After hearing about this natural experiment, Professor
Cornellia from Cornells Agriculture School convinced Munson to plant the last field without fertilizer to get
an idea of crop yield without fertilizer. In return, she agreed to pay Munson the profit difference between
field 3 and field 1.
Last week Cornellia went to the fields and measured some heights of the fledgling soybean plants in each
of the fields to see what, if any, growth differences there were. Here are a few of the measurements:

chemical chicken nofert


CS2110 Assignment 1, Summer 2008 100 points 4/7

6 6 3
12 7 6
11 14 3
9

Is the chemical fertilizer better than the organic fertilizer? How strong is the evidence that fertilizer (of
either kind) produces taller soybean plants (vs plants without fertilizer)?
Cornellia wants to support her conclusions with tests of statistical significance. Unfortunately, she does
not know anything about how soybean heights are distributed, and she only measured a few plants (it started
to rain). She asks you to help her test her hypotheses using a permutation test.
For this problem you will write a program that tests a hypothesis using a permutation test (details below).
To help you, we have prepared an incomplete solution which you should use as the basis for your work. You
can download the incomplete solution through CMS. There are detailed comments in these files that will
hopefully clarify anything that is confusing.
Your task is to add code to the incomplete solution in order to make it work properly. The instructions in
the next two sections will help you do this. You may add code (including new methods and/or new member
variables), but you must not change the public interface of any classes. Exception: you may add a main()
method to a class to do unit testing of the class independent of the rest of the program.

4.1 Background: Permutation Tests for Hypothesis Testing


Often in science we want to know if changing a condition (e.g. fertilizer type) produces a change in a
quantity we care about (e.g. height of soybean plants). Naturally, we measure the quantity under the
different conditions. Typically there are some random effects that influence our measurements under all
conditions. As a result, there is always a possibility that differences in the quantity are due solely to random
variations, and not due to the changing condition. Basically, perceived differences can happen just because
we were unlucky in our measurement samples.
Before concluding, for example, that chemical fertilizer produces taller plants than organic fertilizer, we
need to rule out the possibillity that fertilizer type has no effect (called the null hypothesis and noted H0 ).
This involves a hypothesis test, which proceeds kind of like a proof-by-contradiction. Assume that the null
hypothesis is true. Under this assumption, compute the probability of observing the collected data. If the
probability is (very) low, then we can reject H0 with some confidence and conclude that the alternative
hypothesis (e.g. chemical > organic) is most likely true.
One kind of hypothesis test is a permutation test. The details for the permutation test are a little
long, but the basic idea is fairly simple. We are going to enumerate all possible permutations of assigning
measurements to one sample or another. Given this list, we will apply some criteria to each permutation to
see if the criteria is true or not. How often the criteria is true is used to measure the statistical significance
of a hypothesis.
Let s1 be measurements in sample 1, and s2 be measurements in sample 2 (e.g. chemical vs. organic).
Also let n1 and n2 be the number of measurements in each sample. Under the null hypothesis, s1 and s2
have the same distribution, and the assignment of measurements to each group is random.
Suppose we summarize the measurements in each group by taking the mean (average) of each group.
We can then look at the difference in the means to see how different the groups are. For example we can
compute
diff(chemical, organic) = mean(chemical) - mean(organic)
A large positive number would suggest that chemical fertilizer is bigger than organic. For this problem, the
difference of two means is the test statistic we will use for the hypothesis test.
Let obsDiff = diff(s1, s2) where s1 and s2 are the observed data samples. Assuming the null hy-
pothesis is true, how likely are we to get such a large obsDiff? Is obsDiff so large and so unlikely that
we should reject the null hypothesis? With access to fast computers, we can answer this directly when n is
small. We will just generate all the possible ways the measurements can be split into groups of size n1 and
n2 . For each of these random assignments (call the random groups t1 and t2 ), we can compute randDiff =
diff(t1, t2) and compare randDiff with obsDiff. The fraction of the time that randDiff >= obsDiff
CS2110 Assignment 1, Summer 2008 100 points 5/7

is the probability of seeing such a large test statistic value when the group assignments are random. This
probability is often called the p-value.
If the p-value is small, we can reject the null hypothesis as being very unlikely. For example, if the p-value
is 0.05, then the difference is real at the 95% confidence level.

4.2 Implementing Permutation Tests


The partial solution given to you has two classes: PermTest and Labeling. Class PermTest (in PermTest.java)
is responsible for implementing the permutation test logic, reading the measurements from files, and provid-
ing a command line interface for running permutation tests. Labeling (in Labeling.java) is a helper class
that represents which group a measurement is assigned to. The rest of this section highlights the parts you
need to implement.

4.2.1 Tracking Sample Members


There will be lots of shuffling measurements into different samples as part of the permutation test. To make
this easier, implement the methods in Labeling found in the file Labeling.java. These objects track which
items (aka measurements) are assigned to which samples, and which items are not yet assigned to any group.
It might be helpful to add a main() method to make sure your implementation works.

4.2.2 Reading in the Measurement Samples


Next, implement reading in the measurement samples. More precisely, implement PermTest.readSample()
and add code in PermTest.main() to call readSample. Chapter 2 in the course textbook might be helpful
if you need a reference for how to do File I/O.

4.2.3 The Test Statistic


We need some statistic, or summary, of how a sample t1 differs from the other sample t2 . For this assignment,
we will summarize each sample by its mean (average), and look at mean(t1 ) mean(t2 ) to quantify the
differences between the samples. Implement PermTest.sampleDifference() to meet your needs. This was
called diff() above.

4.2.4 The Permutation Test


We recommend you start by implementing PermTest.nChooseR(), since it can serve as a check that you are
generating the right number of random assignments. For this assignment we require that you implement the
function using recursion. There is helpful information about this in Wednesdays slides.
Finally, implement PermTest.computeSignificance() and PermTest.recListCombinations(). We
have left pseudo-code for computeSignificance() in the starter file so that the details of the hypoth-
esis test do not distract you too much from focusing on writing recListCombinations(). Note that
recListCombinations() must be implemented using recursion.

4.3 Sample Data


You can find Cornellias data samples in the files chemical.txt, chicken.txt, and nofert.txt. There are also
smaller samples (collected by a lazy grad student) in chemical.small.txt and chicken.small.txt. We strongly
recommend you use these for testing your code while you are in development. Using the -verbose option
on PermTest may also be useful.

4.4 Example Output


Here is the results of a permutation test of whether the chemical fertilizer is giving better results than the
chicken fertilizer (using the small samples):
CS2110 Assignment 1, Summer 2008 100 points 6/7

\$java -cp . PermTest -verbose chemical.small.txt chicken.small.txt


No. | 6.0 12.0 11.0 6.0 7.0 14.0 | diff(s1,s2)
--------------------------------------------
0 | 1 1 1 2 2 2 | 0.6666666666666661
1 | 1 1 2 1 2 2 | -2.666666666666666
2 | 1 1 2 2 1 2 | -2.0
3 | 1 1 2 2 2 1 | 2.666666666666666
4 | 1 2 1 1 2 2 | -3.333333333333333
5 | 1 2 1 2 1 2 | -2.666666666666666
6 | 1 2 1 2 2 1 | 2.0
7 | 1 2 2 1 1 2 | -6.000000000000001
8 | 1 2 2 1 2 1 | -1.333333333333334
9 | 1 2 2 2 1 1 | -0.6666666666666661
10 | 2 1 1 1 2 2 | 0.6666666666666661
11 | 2 1 1 2 1 2 | 1.333333333333334
12 | 2 1 1 2 2 1 | 6.000000000000001
13 | 2 1 2 1 1 2 | -2.0
14 | 2 1 2 1 2 1 | 2.666666666666666
15 | 2 1 2 2 1 1 | 3.333333333333333
16 | 2 2 1 1 1 2 | -2.666666666666666
17 | 2 2 1 1 2 1 | 2.0
18 | 2 2 1 2 1 1 | 2.666666666666666
19 | 2 2 2 1 1 1 | -0.6666666666666661
--------------------------------------------
Randomly assigning data to samples shows >= sample differences than
observed samples in 10/20 random permutations.
--------------------------------------------
pvalue: 0.5

5 What to submit
Submit the following files:
1. problem1.txt
2. GradeCalc.java
3. Labeling.java
4. PermTest.java

Extra Credit, Extra Fun!


If you have finished everything above and are craving for some more programming, here are a couple extra
credit items you can try your hand at. We will award extra credit based on how well they are done. However,
if the answers to the main part of the homework are not well done, we will not award very many extra credit
points at all.
Make sure you keep a pristine backup copy of the solutions before you start.

5.1 Non-Recursive Permutation Tests


Write a variant of PermTest that does not use recursion. Hint: you probably need a data structure like a
list or stack, that we will cover in class shortly.
CS2110 Assignment 1, Summer 2008 100 points 7/7

5.2 Alternative Permutation Test Statistics


In question 4 we used the difference between two sample means as a summary of how much better sample 1
was than sample 2. What other reasonable summary statistics can you think of? Justify your alternative(s)
and extend PermTest to use the alternative(s) based on command line options.

5.3 Computing Grade Ranges


Part way through the summer you have half your grades and are wondering what is the range of grades
you could receive, based on worst and best performance on the rest of the assignments and exams. Write
a variant of GradeCalc that computes the range of the users final grade. To make it interesting, the user
should be able to enter a range of values for the unknown future evaluationsoften you have a rough idea
of how well you will do in the future based on past grades, comfortableness with the material, etc.

You might also like