Last Updated June 18, 1999
Some Basic Statistics on Style
-- A Brief Essay

     The aversion of most English teachers to statistics has both puzzled and disappointed me. From some of these teachers I have gotten the impression that they believe -- or at least want to believe -- that math (and thus statistics) is part of that other, "scientific" culture, and that they are not only members of the artistic, "humanitarian" culture, but that they also scorn the scientific (and mathematical) and are proud of that scorn. I'll never forget a posting on the NCTE-Talk listserver in which an English teacher, discussing standardized tests for teachers, boasted about her inability to calculate the area of a room. She proudly defended her ignorance by claiming that she saw no reason why an English teacher should have to learn that stuff. This is not the place for me to deal with her particular ignorance and stupidity (ignorance, for not knowing; stupidity, for being proud of the ignorance). I simply want to suggest that her attitude, although perhaps extreme, is typical of too many English teachers. The real problem is that such ignorance also leaves one unaware of the effects of that ignorance.
     Thus, in the 1970s the English profession was carried away by statistical studies that claimed to prove that the teaching of grammar is ineffective, even harmful. If English teachers thought as much as they claim to think; if English teachers read the studies themselves, instead of summaries of the studies; and if more English teachers had even a moderate understanding of math and statistics, then the statistical studies would not have overwhelmed the profession and would not have led to an NCTE resolution against the teaching of grammar. But overwhelmed and resolved they were. Another part of the problem, unfortunately, is that most English teachers were not taught grammar and do not know how to teach it. Thus the acclaim for the statistical studies may well have been acclaim for anything that would relieve them of a frustration with which they did not know how to deal.
     The damage done by these studies goes well beyond anything imagined by the general public -- which still assumes that students are being taught grammar in school. Just this month (June 1999), I received an e-mail from a beleaguered teacher who was very happy to learn about the existence of ATEG. She was desperately seeking something that would support her teaching grammar, for, as she noted, within her school system, even within her English department, the teaching of grammar is frowned on. As editor of Syntax in the Schools (now the official newsletter of ATEG), I have received many such messages over the last fifteen years. Many English teachers are proud of their ignorance of mathematics, but they have bowed down to statistical studies which have severely damaged the teaching of English.
     If the problem were only one of general attitudes at the professional level, it would be bad, but not as bad as it actually is. It is worse because many English teachers make comments to students that are based on unfounded statistical assumptions. Many years ago, the problem of one student reinforced my interest in the statistical analysis of sentences. It was around the time of the "oil crisis," and the student was a retired government employee who wanted to write a book about what he knew and believed about the crisis. He was taking an advanced essay course with me because, he said, he wanted to improve his writing ability so he could write his book. The class met once a week, and after three or four weeks, I began to talk to him after class. From everything that I could see, his writing was just fine -- he didn't need the course -- he should start writing his book. Over the course of several such meetings, I continually probed for the source of his belief that his writing skills were weak. I both showed him and assured him that he had an excellent sense of thesis, of organization, of topic sentences and paragraph structure. What did he see? What made him think that his writing was weak? What was undermining his confidence?
     Finally, after three or four weeks of such discussions, he noted that one of his teachers had told him that his sentences were too long. That was the problem. Well, this was my Advanced Essay course, not my grammar course. In my Advanced Essay course, I didn't discuss sentence length at all because I knew that most high school graduates cannot identify main clauses and thus cannot make many meaningful calculations on their own writing. But as soon as he said this, we took two or three passages of his writing, counted the words, counted the main clauses, and calculated the words per main clause. He averaged twenty-one, which is respectably within the range (19 to 21) of most professional writers. Indeed, a small study that I had done of the writing of the researchers indicated that they averaged 26. There was absolutely nothing wrong with the length of his sentences. And I pointed out to him that this was not my subjective opinion -- it was based on the research of Hunt, O'Donnell, Loban, etc. From the time he was in middle or high school, all through his professional career, and into his retirement, this poor gentleman had believed his teacher -- and believed that there was something wrong with his writing.
     Giving this man confidence in his writing was rewarding, but it was maddening that it had been stolen in the first place. And stolen by whom? An English teacher! Based on what? Probably on her own subjective sense of how long is too long. But we can use the work of Hunt, O'Donnell, and Loban to make some educated guesses about what really happened. These researchers have shown that the number of words per main clause (a better measure than sentence length, but close to it) NATURALLY increases with age (and thus grade level). But we need to remember that these studies are based on class averages -- some students within a class write, on average, longer main clauses; others, shorter. We also need to remember that not all of our teachers are as intelligent, or as educated, as we sometimes assume. Now why would a teacher tell a student that his sentences were too long? Probably because they seemed too long to her. But it is entirely possible, in this case even probable, that she had the writing ability of the average fourth grader (8.02 Words/MC according to Loban) whereas he was writing at the level of the average eighth grader (10.37 Words/MC). Or perhaps she was at the level of the average eighth grader (10.37), whereas he was with the tenth graders (11.79). What happened, in other words, is that she probably set up her lack of syntactic maturity and her ignorance as a standard for him, thereby crippling his writing for most of his life.
     If you think that the preceding supposition is improbable, talk to a few educated parents about the comments and corrections that their children's English teachers put on papers. You will hear many a tale not just of misspellings, but also of grammatical errors in the teachers' comments and of things marked wrong when there is nothing wrong with them. How many English teachers are still telling students not to begin a sentence with "but"? And how many teachers are still telling their students, entirely subjectively, that the students' sentences are "too long" or "too short"? And our educational system, of course, has conditioned students -- and their parents -- not to ask intelligent questions. Told that his or her sentences are "too long," what student -- or what parent -- will face the teacher and demand "too long based on what?" There is a reason behind the adamant opposition of most teachers to standardized testing. Testing of students would lead to testing of teachers -- and teachers would no longer be able to make the subjective, unfounded, often wrong comments and corrections that they now do.

     The KISS Approach offers solutions for at least parts of the problem. As I suggest in the KISS Curriculum, starting in about seventh grade, every year students should analyze -- syntactically and statistically -- at least one short passage of their own writing. The students can work in small groups to check their analysis and their counting. Each student's statistical results can be handed in, and a simple spreadsheet can be used to calculate class averages. These averages should then be reported to the class so that students can see for themselves where they stand in relation to the class in matters of sentence and main clause length and the frequency of constructions such as subordinate clauses. Such knowledge is power. With it, subjective statements by teachers become meaningless. My experience, moreover, has been that such projects not only interest students, but they also change students' attitudes about grammar. Within the context of the group work and class averages, the student who really is writing too many short simple sentences sees for himself that his sentences are way shorter and simpler than those of his classmates. Seeing this, any instruction which clearly helps him get closer to the class average is no longer a meaningless, boring English grammar exercise. Whereas he had hated such instruction, he now seeks it.
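     For teachers who would rather not set up a spreadsheet, the class-average step can be sketched in a few lines of Python. This is only a rough illustration, not part of the KISS materials: the student labels, the numbers, and the function names (class_averages, compare_to_class) are all made up for the example, and the per-student figures are assumed to have been counted by hand as described above.

```python
from statistics import mean

# Hypothetical hand-counted results handed in by three students.
# words_per_mc = words per main clause; sc_per_mc = subordinate clauses per main clause.
class_results = {
    "Student A": {"words_per_mc": 7.5, "sc_per_mc": 0.1},
    "Student B": {"words_per_mc": 10.5, "sc_per_mc": 0.3},
    "Student C": {"words_per_mc": 12.0, "sc_per_mc": 0.5},
}

MEASURES = ("words_per_mc", "sc_per_mc")


def class_averages(results):
    """Return the class average for each measure."""
    return {m: mean(r[m] for r in results.values()) for m in MEASURES}


def compare_to_class(results):
    """Return, for each student, the difference from the class average."""
    averages = class_averages(results)
    return {
        name: {m: r[m] - averages[m] for m in MEASURES}
        for name, r in results.items()
    }


if __name__ == "__main__":
    averages = class_averages(class_results)
    print("Class averages:", averages)
    for name, diffs in compare_to_class(class_results).items():
        print(name, {m: round(d, 2) for m, d in diffs.items()})
```

     A negative difference simply tells a student that his or her main clauses are, on average, shorter than the class norm; it is a point of reference, not a grade.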

     Because words per main clause and subordinate clauses per main clause are two of the basic measurements of syntactic maturity, I have included this information for each exercise in the Answer Keys for Level Three. To indicate the difference between words per main clause and words per sentence, I have also included statistics for sentences:
 

Some Basic Statistics on Style
# of Sentences:     11    Words per Sentence:          16.7
# of Main Clauses:  10    Words per Main Clause:       15.0
# of Sub Clauses:    7    Sub Clauses per Main Clause:  0.7

The difference between words per main clause and words per sentence simply results from the fact that some sentences consist of more than one main clause. Historically, it is interesting to note that when educators were looking for some way to measure sentence maturity, many researchers attempted to count words per sentence. But the third and fourth graders messed everything up. They write very long sentences consisting of several main clauses compounded with "and." Thus, instead of a clear upward trend in their graphs, these researchers ended up with graphs that showed sharp decreases from third to fourth to fifth grades and then a slow upward trend.
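     The arithmetic behind these measures, and the reason words per sentence can mislead, can be sketched in Python. The two passages below are invented for illustration (they are not data from Hunt, O'Donnell, or Loban), the counts are assumed to have been made by hand, and the names PassageCounts and style_statistics are my own.

```python
from dataclasses import dataclass


@dataclass
class PassageCounts:
    """Hand-counted totals for one passage (values below are hypothetical)."""
    words: int
    sentences: int
    main_clauses: int
    subordinate_clauses: int


def style_statistics(c: PassageCounts) -> dict:
    """Compute the three basic measures from hand-counted totals."""
    return {
        "words_per_sentence": c.words / c.sentences,
        "words_per_main_clause": c.words / c.main_clauses,
        "sub_clauses_per_main_clause": c.subordinate_clauses / c.main_clauses,
    }


# A made-up young writer: one long sentence of four short main clauses
# compounded with "and", and no subordinate clauses.
young = PassageCounts(words=24, sentences=1, main_clauses=4, subordinate_clauses=0)

# A made-up older writer: the same 24 words in two sentences, each with
# one main clause, and one subordinate clause in all.
older = PassageCounts(words=24, sentences=2, main_clauses=2, subordinate_clauses=1)

if __name__ == "__main__":
    for label, counts in (("young", young), ("older", older)):
        print(label, style_statistics(counts))
```

     In this invented case, words per sentence falls from 24 to 12 as the writer matures, while words per main clause rises from 6 to 12. That is the pattern that spoiled the words-per-sentence graphs.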
     Hunt's major discovery was that if he counted words per main clause, instead of words per sentence, he eliminated the downward trend and ended up with graphs that suggested relatively regular growth. Unfortunately, he used the term "T-Unit" (for minimally Terminable Unit). Had he used the term "main clause," which is what a T-Unit is, many more teachers would probably have been interested in his work. To my knowledge, Hunt showed THAT words per main clause is a fundamental, valid measurement of syntactic maturity, but neither he nor O'Donnell nor Loban ever directly addressed the question of WHY.
     The KISS Psycholinguistic Model of How the Brain Processes Language provides an answer -- all the words in a main clause are chunked together in short-term memory (STM). At the end of a main clause -- for both the reader AND THE WRITER -- the main clause in STM is dumped to long-term memory, and STM is cleared for the next main clause. Syntactic maturity therefore basically measures the ability to handle an increasing number of words simultaneously in STM. For more on statistical analysis of syntax and on natural syntactic development, please visit my research area, "Cobweb Corner." Please also remember that both words per main clause and subordinate clauses per main clause are BASIC measures of syntactic maturity. Some constructions which definitely reflect maturity (such as gerundives, a topic of Level Four, and appositives, a topic of Level Five) DECREASE the number of words per main clause and the number of subordinate clauses per main clause.

     I would love to have help with statistical research on syntactic maturity -- and on style. I realize, of course, that not everyone is interested in grammar. And I realize that, thanks to our current educational system, many English teachers cannot identify clauses and thus, obviously, they can't count them. But what bothers me most, perhaps, is the aversion to statistics. English, it is implied, is a performance art, not a mathematical science. Well, baseball is also a performance art. But listen to any baseball game and what do you hear? Aaahh's, ooohh's, boo's and statistics. Is there any baseball fan who would sincerely say that we should eliminate the statistics? Is there any baseball fan who is really not interested in who has the most home runs, the highest batting average, or the lowest ERA? Is there anyone who would say that such knowledge is harmful, or that it detracts from the game?
     A few teachers who are familiar with my work object to the statistics because they see them as competitive. Well, that's true and it's not true. Unlike the situation in sports, the students' objective in syntax is not to write, on average, more words and/or subordinate clauses per main clause than anyone else does. Writers on the high end of the scale are in danger of writing clauses that are so long and complex that most people will simply have trouble reading them. Syntactic statistics are, in other words, more normative than competitive. They are more like medical statistics. I want to keep my blood pressure in the normal range. If it is too high or too low, I want to know so that I can do something about it.  For students at the lower end of the syntactic maturity scale, that does make the situation somewhat competitive, but it is a competitiveness based on a natural, self-generated, and in this case desirable wish to get closer to the norm. And once students have that wish, they are much more likely to pay attention to what we are trying to teach them. 
      Don't students have the right to know? Much has been made in the English profession during the last couple of decades about students' right to their own language. It's an admirable right -- but it's meaningless, or even harmful, if it continues to be based, as it has been, on ignorance.



[for educational use only]