Natural Language Processing: COMS4705
Kathy McKeown, Fall 2009
There are three parts to this homework. In the first part, you are to write a Context Free Grammar (CFG) using the NLTK toolkit. A readme for downloading and using the toolkit can be found at:
1 Context Free Grammar (60 points)
Write a context free grammar in NLTK to handle the (slightly modified) story of Where the Wild Things Are, a story by Maurice Sendak which is about to be released as a major motion picture. You will find the story in
You will be graded in part on whether your rules are syntactically justifiable. You should attempt to make your rules general where possible (i.e., don’t make new rules for each and every new string of words you see; a rule should ideally cover several phrases that you see in the input).
You should hand in a file containing your grammar, documented so that it describes why you selected the grammar rules that you did. For example, you may justify using a particular set of rules because they handle multiple constructions or because they follow a rule that you found in C. 12.
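As a rough illustration of what a documented grammar file might look like, the sketch below defines a tiny CFG and parses one sentence with a chart parser. It assumes a recent NLTK 3 release, where `nltk.CFG.fromstring` and `nltk.ChartParser` are available. The rules, the vocabulary, and the test sentence are invented for the example and are not taken from the story or from any expected solution.

```python
import nltk

# Toy grammar for illustration only. Lines that start with '#' inside the
# grammar string are skipped by NLTK, so they can hold the justification
# for each group of rules.
GRAMMAR_TEXT = """
# A declarative clause is a subject NP followed by a VP.
S -> NP VP
# One recursive rule covers any number of PP modifiers on a noun phrase.
NP -> Det N | NP PP
# Intransitive and transitive verbs; PP modifiers attach recursively.
VP -> V | V NP | VP PP
PP -> P NP
Det -> 'the' | 'a'
N -> 'boy' | 'boat' | 'forest'
V -> 'saw' | 'sailed'
P -> 'in' | 'to'
"""

grammar = nltk.CFG.fromstring(GRAMMAR_TEXT)
parser = nltk.ChartParser(grammar)


def parse_all(sentence):
    """Return every parse tree the grammar licenses for a whitespace-tokenized sentence."""
    return list(parser.parse(sentence.split()))


trees = parse_all("the boy saw a boat in the forest")
for tree in trees:
    print(tree)
```

Because the PP can attach under either `NP -> NP PP` or `VP -> VP PP`, this sentence receives more than one tree, which is the kind of ambiguity a general rule set tends to introduce.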
Choices about the particular rules used and the resulting parse are adequately justified. Points will be deducted for parse trees that do not capture a good structure of the language according to your justification. (25 points)
2 Stanford Parser (20 points)
[10 points] Suppose you use the output of your parser on Where the Wild Things Are as a Treebank. Show how you would compute the probabilities, and what they are, for the rules for VP and the rules for NP.
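One common way to do this is maximum likelihood estimation: the probability of a rule A -> beta is Count(A -> beta) divided by the total count of rules that expand A. The sketch below applies that estimate to a small, made-up treebank. The nested-tuple tree encoding and the three toy trees are assumptions for illustration and are not output from any real parser.

```python
from collections import Counter
from fractions import Fraction


def productions(tree):
    """Yield (lhs, rhs) for every node of a tree written as (label, child, ...).

    Children are either nested tuples (subtrees) or plain strings (words).
    """
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def rule_probabilities(trees):
    """Relative-frequency estimate: Count(A -> beta) / Count(A)."""
    rule_counts = Counter(p for t in trees for p in productions(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: Fraction(n, lhs_counts[rule[0]]) for rule, n in rule_counts.items()}


# Hypothetical three-tree "treebank" (invented for this example).
def np(det, noun):
    return ("NP", ("Det", det), ("N", noun))


toy_treebank = [
    ("S", np("the", "boy"), ("VP", ("V", "sailed"))),
    ("S", np("the", "boy"), ("VP", ("V", "saw"), np("a", "boat"))),
    ("S",
     ("NP", np("the", "boat"), ("PP", ("P", "in"), np("the", "forest"))),
     ("VP", ("V", "sailed"))),
]

probs = rule_probabilities(toy_treebank)
for (lhs, rhs), p in sorted(probs.items()):
    if lhs in ("NP", "VP"):
        print(f"{lhs} -> {' '.join(rhs)}: {p}")
```

The probabilities of all rules sharing a left-hand side sum to 1, which is a quick sanity check on the counts.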
[10 points] You will find that your CFG often generates multiple parses for a sentence. This can happen when the attachment of a PP is ambiguous or when the scope of a conjunction is ambiguous (and for other constructions too). Consider the following cases: sentence 18, where “of all wild things” could modify “king” or “made”; sentence 6, where “to bed” could be an argument of the verb or a modifier of the verb; sentence 10, where “for Max” could modify either “boat” or “tumbled by”; and sentence 12, where “to the land” could modify “day” or “sailed off” and “of the wild things” could modify “land,” “day,” or “sailed off”. You cannot use your parser output in this case to compute probabilities to do disambiguation. Why not? You will be given access to the Penn Treebank in our class account. Describe what you would need to count, how your rules would be formulated, and how you would use them in a probabilistic lexicalized framework in order to do disambiguation in these cases (note: a rough description will do; you do not need to describe a probabilistic version of CKY). Show the counts for disambiguating whether “to a land” attaches to “sailed off” or “day”.
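One simple way to frame the lexicalized comparison is to estimate, for each candidate head word, how often a PP with the given preposition attaches to it, and then pick the head with the larger relative frequency. The sketch below shows only that comparison. All counts are made up for illustration and are not Penn Treebank figures, the function names are hypothetical, and smoothing for unseen head/preposition pairs is deliberately left out.

```python
from collections import Counter


def attachment_probability(head, prep, head_counts, attach_counts):
    """Estimate P(PP headed by `prep` attaches | head) as c(head, prep) / c(head)."""
    total = head_counts[head]
    if total == 0:
        return 0.0  # unseen head word; a real system would smooth or back off
    return attach_counts[(head, prep)] / total


def choose_attachment(verb, noun, prep, head_counts, attach_counts):
    """Compare verb attachment against noun attachment for one preposition."""
    p_verb = attachment_probability(verb, prep, head_counts, attach_counts)
    p_noun = attachment_probability(noun, prep, head_counts, attach_counts)
    decision = "verb" if p_verb >= p_noun else "noun"
    return decision, p_verb, p_noun


# HYPOTHETICAL counts: how often each head word occurs, and how often a
# "to" PP was annotated as attaching to it.
head_counts = Counter({"sailed": 20, "day": 50})
attach_counts = Counter({("sailed", "to"): 8, ("day", "to"): 1})

decision, p_verb, p_noun = choose_attachment(
    "sailed", "day", "to", head_counts, attach_counts
)
print(decision, p_verb, p_noun)
```

With real treebank counts the same comparison could be extended to condition on the head noun inside the PP as well, at the cost of sparser counts.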


