LX700: CHILDES Lab


This lab is due October 14.

To do this lab you will need access to the CHILDES data and the CLAN analysis program. Both can be downloaded freely. CLAN is available for Mac and Windows computers.

On this page, you will find information on:

Getting CLAN.
Getting Nina's transcripts from CHILDES.
Starting CLAN.
Entering commands in CLAN.
Your lab assignment.
Notes on the combo command.

(The design of this lab assignment is due to Martha McGinnis, U. Calgary)

To get CLAN:

Go to the CHILDES page.
Click on "CLAN programs".
Download the appropriate version for your computer and install it.
- In previous years I have provided more detailed instructions, but they are getting somewhat out of date. If you have trouble getting it installed, you can email me for help. If you don't mind instructions that are slightly out of date, you may also want to look at the CHILDES class demo page from the Spring 2002 LX 700 course, or the CHILDES lab from the Fall 2002 LX 865 course.

To get Nina's transcripts:

Go to the CHILDES page.
Click on "CHILDES database"
Click on "English-USA" (not the "tagged" one, the regular one)
Download "suppes.zip" and un-zip it so that you have a "suppes" folder.
The "suppes" folder should contain 52 files, starting with "nina01.cha".

Starting CLAN:

To start CLAN on the Mac, double-click on the CLAN icon. On Windows, look under Programs (assuming you chose to install it there).
The main CLAN control window has four buttons on it, each followed by a directory name.
- "Working" is where it looks for your transcript files. Click "Working", then find and select the "suppes" folder with the Nina files in it.
- "Output" is where the files CLAN produces will be saved. Click "Output" and choose a folder for your saved files (or the Desktop).
- "Lib" and "Mor lib" you should just leave alone. By default, they should be the folder in which the CLAN application resides.

Entering commands in CLAN:

Commands are typed into the box at the bottom of the window with the four buttons.
The different CLAN commands can be viewed by clicking on the CLAN button above the box. The only commands we will use in this lab are mlu, freq, and combo. You can just type these in, or you can select them from the list that comes up when you press the CLAN button.
An example command to find the MLU for each of Nina's transcripts would be:
```
mlu +t*CHI nina* > mlu-nina.txt
```
The way to read a CLAN command is like this:
- mlu: this is the command. We will also use freq and combo.
- +t*CHI: this restricts the search to just the child utterances.
- nina*: this looks in all files that start with "nina" (so, nina01.cha, nina02.cha, ...)
- > mlu-nina.txt: saves the result in a file called mlu-nina.txt.
You may want to refer to the notes on combo at the end of this page as well.
NOTE! THE SPACE BAR WILL HURT YOU. CLAN is very particular about where spaces are in the commands you enter. You cannot put a space between +t and *CHI, or between +w and 2, or between +s and @searchfile.txt. If you do, combo will not work. Put spaces between options (as shown above) but no spaces within options. You also cannot include spaces in either your output filename or your input filename. It is also wise to avoid punctuation marks in your filenames, particularly : (colon), / (forward slash), \ (backward slash), and " (quotation mark). Even though both Mac and Windows computers allow you to create files like "Nina's early pronouns", CLAN won't know how to deal with them. It is also a good idea to end your filename with ".txt"; this will help ensure that your computer knows what kind of file it is.
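
The spacing rules above can be sketched in Python. This is only an illustration, assuming a hypothetical helper `build_command` that joins the pieces of a command into text you could paste into the CLAN command box; it does not run CLAN, and `is_safe_filename` checks only the characters named in the note above.

```python
# Characters the note above warns against in CLAN filenames, plus the space.
UNSAFE_CHARS = set(' :/\\"')


def is_safe_filename(name: str) -> bool:
    """Return True if name is non-empty and has none of the risky characters."""
    return name != "" and not (set(name) & UNSAFE_CHARS)


def build_command(program: str, speaker: str, files: str, output: str) -> str:
    """Assemble a command line shaped like the mlu example in this lab.

    Hypothetical helper: it only joins strings and never runs CLAN.
    """
    if not is_safe_filename(output):
        raise ValueError(f"unsafe output filename: {output!r}")
    # "+t" and the speaker tier are glued together with no space in between.
    tier_option = "+t*" + speaker
    # Single spaces go between options, never inside one.
    return " ".join([program, tier_option, files, ">", output])


print(build_command("mlu", "CHI", "nina*", "mlu-nina.txt"))
# prints: mlu +t*CHI nina* > mlu-nina.txt
```

A name like "Nina's early pronouns" is rejected by `is_safe_filename` because it contains spaces.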

Your assignment:

The lab assignment comes in six parts. Where things are highlighted in RED, these are things you should be handing in.

Use CLAN to determine MLU.
Record Nina's age and MLU for each file.
Use CLAN to determine word frequencies.
Use CLAN to look at subject drop in a small sample of two files.
Use CLAN to study Nina's use of subject drop in wh-questions.
Discuss the comparison with Valian's (1991) results.

Part 1: Use CLAN to determine MLU.

Use the mlu command to determine the MLU for Nina's transcripts.

You can do this with the following CLAN command, which will save the results in a file called "mlu-nina.txt" in the Output directory.

mlu +t*CHI nina* > mlu-nina.txt

Part 2: Record Nina's age and MLU for each file 01-19.

Open the file "mlu-nina.txt" and print it out. Printing at least 2-up is recommended, since there's a lot of wasted space.

Open each transcript file from 01 through 19. (Note: there is no file nina08.cha.)

At the top of each file, Nina's age in that file is recorded. Write down Nina's age for the transcript next to the MLU for that transcript on your printout.

Part 3: Use CLAN to determine word frequencies

For two representative samples, we will use CLAN to determine the frequency with which each word in the transcript appears. To do this, we use the freq command. It works very much like the mlu command described above. We will run freq on nina10.cha and nina19.cha, and you can use the following commands to do this.

freq +t*CHI nina10.cha > freq-nina10.txt
freq +t*CHI nina19.cha > freq-nina19.txt

After having done this, you will have two lists of words and numbers (one from file 10, one from file 19). We will look at each list and pick, from each file, one of the regular verbs that occurs most often.

I found that eat and see seemed to be equally popular verbs in the nina10.cha file. Somewhat arbitrarily, we'll look at eat (see is complicated by the fact that it often occurs as "See?", which properly lacks a subject). I discounted have because it can be an auxiliary (and auxiliaries behave differently), also an unnecessary complication for what we are trying to do.

In the nina19.cha file, I picked get as the verb to look at. It's a common verb, not as popular as irregular go, but go is involved in some auxiliary uses like have. Want would be a reasonable verb to pick as well, but it isn't even as interesting to look at as get.

Part 4: Use CLAN to look at subject drop in a small sample of two files

Part 4a. Search the transcripts for the examples.

Having picked a common verb from each file, what we're going to do is look at each time the verb is used in the transcript and count how often it appears with a subject.

To do this (you may want to look at the combo notes), use the following CLAN commands.

combo +t*CHI -w2 +s"eat*" nina10.cha > selected-nina10.txt
combo +t*CHI -w2 +s"get*" nina19.cha > selected-nina19.txt

Make sure you know why it does what it does.

This will give you two files (selected-nina10.txt and selected-nina19.txt), which contain the child utterances containing the verbs you've picked and the two lines preceding each.

Part 4b. Count up the totals.

Now, go through each example and decide which of the following categories it falls under. Be sure to read the "exclusion" criteria carefully.

X. Excluded. The utterance is (a) a repetition of an immediately preceding utterance (either by the child or the adults), (b) incomprehensible, (c) part of a rote-learned expression (e.g., "...how I wonder what you are"), (d) an imperative.
O. Overt subject. The verb has an overt subject.
N. Null subject. The verb should have had a subject but the subject is missing.
F. Fragment. These look a lot like null subjects, but if a child answers a question like "what are you doing?" with "Eating sandwiches", it isn't accurate to call that a null subject utterance. However, in response to "What were the monkeys eating?", "Eating a balloon" should count as a null subject (not as a fragment), since this is not a well-formed fragment in adult speech.

Part 4c. Describe what you found

Create a 2 x 3 table of results (2 rows and 3 columns) like the one below. Fill in the overt and null subject numbers for each file. In the third column, add together the total number of overt and null subjects (the sum of the first two columns), and then divide the number of overt subjects by the result (and then multiply by 100).

	null subjects	overt subjects	Percentage with overt subject
nina10.cha; eat	N for nina10.cha	O for nina10.cha	100 * O/(N+O)
nina19.cha; get	N for nina19.cha	O for nina19.cha	100 * O/(N+O)
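
The third-column arithmetic can be sketched in a few lines of Python. The function name `overt_percentage` is hypothetical, and the counts in the example are made up purely to show the calculation; they are not Nina's data.

```python
def overt_percentage(null_count: int, overt_count: int) -> float:
    """Return 100 * O / (N + O), the percentage of utterances with an overt subject."""
    total = null_count + overt_count
    if total == 0:
        raise ValueError("no null or overt subject utterances to count")
    return 100 * overt_count / total


# Invented counts for illustration only: 5 null subjects, 15 overt subjects.
print(overt_percentage(null_count=5, overt_count=15))  # prints 75.0
```

Utterances you marked as excluded (X) or as fragments (F) are left out of both counts, so they do not affect the percentage.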

Describe your results. Does the percentage of dropped subjects decrease as Nina gets older?

Part 5: Use CLAN to study Nina's use of subject drop in wh-questions

Search Nina's transcripts 01 through 19 for occurrences of the following wh-words: who, what, where, when, how, why, whose, which. You should create two output files, one for transcripts 01 through 09, and one for transcripts 10 through 19.

I'll leave you on your own to figure out how to do this. Consult the combo notes at the bottom of this page for some tips. It might be easier to count things if you print them out, but you might consider printing 2-up (or even 4-up) to save paper.

Also: be sure your search will find not only what but what's and what'll and so forth. If you don't, you'll find that you have almost no instances in the first output file. There is an easy way to do this.

If Nina is repeating something she or someone else just said, don't count that utterance.

Go through your two output files in detail. For each output file, tally up and record how many utterances fall into each of the following four classes:

A. impossible to tell whether wh-word is the subject or not (e.g. one-word utterance)
B. wh-word is the subject
C. wh-word is not the subject, and the subject is dropped
D. wh-word is not the subject, and the subject is overt

Create a 2 x 3 table of results (2 rows and 3 columns) like the one below. Let the first row represent Nina's early transcripts (01-09) and the second row represent her later transcripts (10-19). This works just like the table from before. Let the first column represent the number of utterances in class C for each set of transcripts, and the second column represent the number of utterances in class D for each set of transcripts.

	non-subject wh-word, null subject	non-subject wh-word, overt subject	Percentage of non-subject wh-questions with dropped subject
Early transcripts (01-09)
Later transcripts (10-19)

For the third column of your table, calculate the percentage of these (non-subject wh-word) utterances that have a dropped subject, by adding the class C and class D amounts for each set of transcripts together, then dividing the class C amount by the result and multiplying by 100 (that is, 100 * C / (C+D)). Put the resulting percentage of dropped subjects for each set of transcripts in the third column.
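
If you keep your class labels in a list as you go, the tally and the percentage 100 * C / (C+D) can be sketched as follows. The function name `dropped_percentage` is hypothetical and the labels in the example are invented for illustration; they are not Nina's data.

```python
from collections import Counter


def dropped_percentage(labels):
    """Return 100 * C / (C + D) for a list of class labels "A" through "D"."""
    counts = Counter(labels)
    dropped, overt = counts["C"], counts["D"]
    if dropped + overt == 0:
        raise ValueError("no class C or class D utterances to count")
    # Classes A and B are tallied by Counter but play no part in the percentage.
    return 100 * dropped / (dropped + overt)


# Invented labels, one per utterance, for illustration only.
example_labels = ["A", "B", "C", "D", "D", "D", "C", "B"]
print(dropped_percentage(example_labels))  # prints 40.0
```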

Describe your results. Does the percentage of dropped subjects decrease as Nina gets older?

Part 6: Discuss the comparison with Valian's (1991) results.

Consider the tables below, from O'Grady (1997), based on data from Valian (1991).

Valian (1991) reports on percentages of dropped subjects in general, not just in wh-questions. Describe first how your results on subject drop for eat and get (in part 4) compare with Valian's results (shown below). Pay particular attention to the group of children whose age and/or MLU matches the transcript you are looking at. Did you find more or less what Valian found?

Now, let's compare the overall subject drop rate with what you found to be Nina's rate of subject drop in wh-questions (in part 5). Are subjects dropped more often or less often in wh-questions? Does this comparison support the hypothesis that Topic Drop accounts for some cases of subject drop in child English? Respond, and explain your answer.

Table 1.	English-speaking children in Valian's study (based on Valian 1991:38)
Group	No. of children	Age range	MLU
I	5	1;10 - 2;2	1.53 - 1.99
II	5	2;3 - 2;8	2.24 - 2.76
III	8	2;3 - 2;6	3.07 - 3.72
IV	3	2;6 - 2;8	4.12 - 4.38

Table 2.	Proportion of utterances containing a subject (based on Valian 1991:44-45)
Group	Mean	Range
I	69%	55 - 82%
II	89%	84 - 94%
III	93%	87 - 99%
IV	95%	92 - 97%
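
To decide which group to compare a transcript with, you could match on MLU using the ranges in Table 1. The sketch below is one possible way to do that, not a prescribed method: `nearest_group` and `age_to_months` are hypothetical helpers, and picking the closest range when an MLU falls between two groups is an assumption of this sketch.

```python
# MLU ranges copied from Table 1 above.
VALIAN_GROUPS = {
    "I": (1.53, 1.99),
    "II": (2.24, 2.76),
    "III": (3.07, 3.72),
    "IV": (4.12, 4.38),
}


def nearest_group(mlu: float) -> str:
    """Return the group whose MLU range contains mlu, or else the closest one."""
    def distance(bounds):
        low, high = bounds
        if low <= mlu <= high:
            return 0.0
        return min(abs(mlu - low), abs(mlu - high))

    return min(VALIAN_GROUPS, key=lambda group: distance(VALIAN_GROUPS[group]))


def age_to_months(age: str) -> int:
    """Convert a years;months age such as "2;3" to months.

    A days part after a period (as in "1;11.16") is ignored.
    """
    years, rest = age.split(";")
    months = rest.split(".")[0]
    return 12 * int(years) + int(months)


print(nearest_group(2.5))    # prints II
print(age_to_months("2;3"))  # prints 27
```

Because the age ranges of groups II, III, and IV overlap, MLU is the easier of the two measures to match on; converting ages to months mainly helps when you line transcripts up against the age ranges by hand.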

O'Grady, William (1997). Syntactic Development. Chicago: University of Chicago Press.

Valian, Virginia (1991). Syntactic subjects in the early speech of American and Italian children. Cognition 40:21-81.

Comments on combo:

CLAN includes a relatively powerful searching tool called combo. I will outline a couple of points here, although you should probably refer to the CLAN manual for more information.

An example of the combo command is given below:

combo +t*CHI +w2 -w2 +s"what^my" nina* > whatmy.txt

This command says:

combo: the command
+t*CHI: restrict attention to the lines uttered by the child
+w2: show me the line you find and 2 lines after it.
-w2: show me the line you find and two lines before it.
+s"what^my": search for "what" followed directly by "my"
nina*: search all of the files in the Working directory that begin with "nina".
> whatmy.txt: Save the results in a file called "whatmy.txt" in the Output directory.

This will look for "what" immediately followed by "my" in any of the nina files, returning something like this:

*** File "Moxie:CLAN:suppes:nina19.cha": line 254.
  *CHI: I want to play with you here .
  *CHI: look what my got .
  *CHI: look (1)what (1)my got .
  *MOT: I see what you got .
  *MOT: what did you get ?

You can see that we used the "^" character in the search string. This character means "immediately followed by", so what we searched for was "what" immediately followed by "my". In these search strings there are several other special characters that you can use.

x^y
- Finds x immediately followed by y. x and y are full words
*
- Finds anything
_
- Finds any one character
x+y
- Finds x or y
!x
- Finds anything except x

You can combine these in various ways to get useful effects. A couple of common things you might use are:

x^*^y
- Finds x eventually followed by y (unlike with x^y, y does not need to immediately follow x). Literally this means, search for x, immediately followed by anything, immediately followed by y.
*ing
- Finds anything that ends in ing. For example, verbs like swimming. Of course it will also get some irrelevant things like thing, boring, etc.
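
To get a feel for what a word-level wildcard such as *ing will and will not catch, here is a small Python sketch. It only approximates the behaviour described above using Python's `fnmatch` module; it is not how combo is implemented, and details such as case handling may differ in CLAN. The helper name `matches` is hypothetical.

```python
from fnmatch import fnmatchcase


def matches(word: str, pattern: str) -> bool:
    """Approximate a combo-style word pattern with fnmatch.

    "*" stands for anything; combo's "_" (any one character) is mapped
    to fnmatch's "?".
    """
    return fnmatchcase(word, pattern.replace("_", "?"))


words = ["swimming", "thing", "boring", "swim", "sing"]
print([word for word in words if matches(word, "*ing")])
# prints: ['swimming', 'thing', 'boring', 'sing']
```

As the output shows, the pattern picks up non-verbs like thing along with real -ing verbs, so the results still need to be checked by hand.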

Some example combo commands are:

```
combo +t*CHI +w2 -w2 +s"the^*^!grey^*^(dog+cat)" nina*
```
- This will search for "the" followed eventually ("^*^" means "followed by anything followed by...") by something other than "grey" ("!grey" means "not grey"), followed eventually by either "dog" or "cat" ("dog+cat" means either "dog" or "cat"). It will not find "the grey cat" but it will find "the black cat", "the big red dog", etc.

```
combo +t*CHI +w2 -w2 +s"my^*^*ing" nina*
```
- This will search for all instances of "my" followed eventually by something that ends in "ing".

Instead of typing in the thing you are searching for each time, you can also use a "search" file. This is a text file that contains the things you want to search for. An example search file might look like this (searching for first person pronouns).

I
I'*
me
me'*
my
my'*

If you save this file as "search-1pron.txt" in your Working directory, then you could do the search with the following combo command, where the @ tells combo to look in your file for the list of things to search for.

combo +t*CHI +w2 -w2 +s@search-1pron.txt nina* > pron1-nina.txt
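
If you would rather generate a search file than type it by hand, a sketch like the following writes one pattern per line as plain text. The helper name `write_search_file` is hypothetical, and the example writes into a temporary folder only so that it can be run anywhere; in practice you would save the file in your Working directory.

```python
import tempfile
from pathlib import Path


def write_search_file(path, patterns):
    """Write one search pattern per line to a plain text file."""
    Path(path).write_text("\n".join(patterns) + "\n", encoding="utf-8")


# The first person pronoun patterns from the example search file above.
first_person = ["I", "I'*", "me", "me'*", "my", "my'*"]

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "search-1pron.txt"
    write_search_file(target, first_person)
    print(target.read_text(encoding="utf-8"))
```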