Christopher Tribble
Associate Lecturer, King's College,
London University
Visiting research fellow, SLALS, Reading University
3. Keywords: extending the analysis
4. Interpreting keyword lists: an example
5. Conclusion - making a connection with the classroom
Pedagogically useful accounts of the linguistic contrast between written genres are of potential value to language teachers and apprentice writers (Bhatia V K 1993, Flowerdew J 1993, Stubbs M 1995, Tribble C 1997). In this paper, I shall show how I have been able to develop such accounts - using as an exemplar genre project proposals submitted to the European Union's PHARE Programme (the PP Corpus).
One approach to the elaboration of a linguistic account of linguistic variation across genres has been presented in Biber (1988) and Biber D & E Finegan (1989). They have used a multi-variate, multi-functional methodology to typify texts in terms of similarity / dissimilarity in relation to a reference corpus, identifying sets of linguistic features associated with six factors or "text dimensions" which can be used to account for the textual contrasts which arise from contrasting production conditions (e.g. spoken / written). They have also demonstrated how three of these factors can be used to differentiate between written texts which contrast along an informal-conversational / formal-written axis. These are:
Factor 1: Involved versus Informational Production - "a dimension marking high informational density and exact informational content versus affective, interactional and generalised content. Two separate communicative parameters seem to be involved here: (1) the primary purpose of the writer / speaker: informational versus interactive, affective and involved; and (2) the production circumstances: those circumstances characterised by careful editing possibilities, enabling precision in lexical choice and in integrated textual structure, versus circumstances dictated by real-time constraints, resulting in generalised lexical choice and a generally fragmented presentation of information." (Biber D 1988:108)
Factor 3: Explicit versus Situation Dependent Reference - "corresponds most closely to the distinction between endophoric and exophoric reference (Halliday and Hasan 1976)" (Biber D 1988:110). Typically, texts with high exophoric reference are associated with spoken, unplanned, or markedly informal discourse.
Factor 5: Abstract versus Non-abstract Information - ".... marks informational discourse that is abstract, technical and formal, versus other types of discourse...." (Biber D 1988:112-113)
Although Biber's and Biber & Finegan's approach has its limitations (Altenberg B 1989, Nakamura J 1993), it nevertheless offers a reasonably robust basis for the linguistic specification of a genre, and an analysis based on Biber 1988 can provide pedagogically useful insights into a new genre. Take, for example, this randomly selected text extract from the PP Corpus:
"Since August 1990 our staff have been working with colleagues in both Republics to assist them to establish and develop their Employment Services. We have designed and delivered a range of training courses (including counselling skills, management skills, labour market information gathering, assessment analysis and usage) to various groups of personnel. We have also delivered courses to Czech and Slovak staff trainers in training techniques and skills and participative training methods and provided them with modules of training to cascade to colleagues." (Text BK-11, PP Corpus)
My own study (Tribble C, forthcoming) of this corpus using the Biber framework offers an account of where the PP corpus is situated on an "oral / literate" stylistic cline in relation to other genres in Biber's "enhanced" LOB Corpus (LOB+). Using an analysis of statistically prominent features in the PP Corpus, it has been possible to describe how this set of texts differs from other, comparable written texts. These text features can be seen as either positively marked - i.e. there is a preponderance of a feature in the research corpus when compared with the reference corpus; or negatively marked - i.e. a feature is strikingly absent in the research corpus. The most marked individual contrasts between PP and LOB+ centred on:
| POSITIVE | NEGATIVE |
|---|---|
| attributive adjectives | adverbs total |
| nominalisations | third person pronouns |
| phrasal co-ordination | private verbs |
| predictive modals | |

Table 1 – prominent linguistic features of PPs
This sort of analysis can be pedagogically useful, because it provides a teacher with a reasoned basis for drawing learners' attention to linguistic features specific to the target text. By analysing how the wording of the text has contributed to its peculiar style, and appreciating how these linguistic features contribute to the construction of the reader-writer relationship, and then learning how to exploit such textual features in their own writing, students are best positioned to begin to develop written styles appropriate to their own circumstances. Working with the example given above, and assuming that learners are already familiar with the main features which differentiate between texts in terms of their relative "spokenness / writtenness - informality / formality" (e.g. predominance of nouns over verbs, abstract or impersonal grammatical subjects-themes etc.), a teacher could ask students to undertake an analysis which might give the following additional information about, and exemplification of, three positively differentiating features of PPs:
phrasal co-ordination (marked thus below)
nominalisations - often used as attributive pre-modifiers of nouns (marked thus below)
attributive adjectives (marked thus below)
Since August 1990 our staff have been working with colleagues in both Republics to assist them to establish and develop their Employment Services. We have designed and delivered a range of training courses (including counselling skills, management skills, labour market information gathering, assessment analysis and usage) to various groups of personnel. We have also delivered courses to Czech and Slovak staff trainers in training techniques and skills and participative training methods and provided them with modules of training to cascade to colleagues.
From this analysis, combined with the analysis of comparable passages from across the genre, learners would be able to comment e.g. that:
there is a marked degree of phrasal co-ordination in the passage, and that this is associated with a rhetorically motivated emphasis of the competence and experience of the bidding organisation
nominalisations figure largely in the text. In this extract they are predominantly in noun + noun combinations, and their use creates associations between the authoritative, objective and economic discourse of science and administration and the language of the proposal
although attributive adjectives are not a marked feature of this particular passage, the use of participative is interesting as it is a buzzword which would not be used outside training and development cultures of a certain politically correct colour
A pedagogic procedure for writing instruction which focuses on these kinds of linguistic features in the target texts can move from the identification of the features, through to an account of the impact that they have on the reader, and thence to practical exercises in text transformation and text editing (working with the texts of other learners or their own texts). Such an approach can have a major impact on apprentice writers' capacity to identify salient stylistic features of texts in which they are interested and to address inadequacies in their own written performance (Flowerdew J, 1993; Tribble C, 1997).
However, such a stylistic analysis can only take students so far. The strength of a study based on the syntactic and grammatical categories used in Biber 1988 is that it helps students to become aware of the meaning potentials available to the expert writer in their field. In doing so, it can also give them opportunities to develop a capacity to offer informed, critical reviews of their own production and that of colleagues. What the Biber framework will not do is give students a direct insight into the ways in which expert writers draw on (often highly patterned and conventional) lexical resources when they are developing texts for specific content domains. Additionally, from the point of view of language teachers, the Biber approach has several practical disadvantages. The first, and most important, is that it requires a POS marked up version of the research corpus, and such a resource is not generally available to classroom teachers. The second is that (despite the published algorithms) the Biber study is very difficult to reproduce, and, therefore, to use reliably in new studies. One reason for this is that the original text analysis software and markup scheme are no longer available; another is that the original POS coding markup is not as comprehensive or reliable as contemporary corpus POS markup systems (e.g. the CLAWS7 tag set).
However, despite these limitations, Biber's 1988 results can be used as a starting point for new studies in combination with other tools and conceptual frameworks - notably Hoey's recent work on semantic prosody and lexical analysis (Hoey M 1997a, 1997b), Scott's notion of keywords (Scott 1997a, Scott 1997b), and his software suite WordSmith Tools (Scott 1996). In the rest of this paper I shall show how, building on the linguistic insights offered by Biber and Finegan 1989, it is possible to identify criterially significant differences between an un-POS marked corpus of exemplars of a specific genre and a reference corpus. Although it is not comprehensive, such an account has pedagogic value, and also has the virtue of being something that can be undertaken by teachers and students with access to "entry" level PC technology. When taken alongside insights from Biber 1988, and later work by Biber and Finegan into linguistic variation between texts along a "spoken" to "written" cline, such an approach provides the basis for a pedagogically useful linguistic specification of new (and thereby difficult) texts.
Keywords: extending the analysis
Hoey has proposed a practical framework for the analysis of vocabulary in context (1997a, 1997b), offering the following set of questions as a basis for further work:
"There are a number of questions that we need to ask of any set of concordance lines. Many of them are questions which we are used to routinely asking, but there is still some value in articulating them:
What lexical patterns is the word part of?
Does the word regularly associate with particular other meanings?
What structure(s) does it appear in?
Is there any correlation between the word's uses / meanings and the structures in which it participates?
Is the word associated with (any positions in any) textual organisation?" (Hoey M 1997a:1)
These questions constitute the starting point for a comprehensive complementary analytic framework to go alongside Biber 1988. They are, however, only a starting point as, although they provide a basis for the study of words in texts in general, they do not provide a rational means for deciding which words to study. What we now require is a way of taking such a first step.
I have outlined elsewhere a simple pedagogic methodology for exploiting electronic texts (Tribble C & G Jones 1997:36), proposing that the most effective starting point for understanding the overall orientation of a text or text collection is a frequency sorted wordlist. Frequency sorted lists of this sort have long been tools for lexicographers and linguists - "Anyone studying a text is likely to need to know how often each different word-form occurs in it." (Sinclair J, 1991:30) - and they can provide insights into where a text is 'coming from'. However, while the frequency wordlist represents a major step forward in text analysis, it raises almost as many questions as it answers. Although the lexical items in a wordlist appear to provide a starting point for a study of the research corpus, many of the other words in the lists are much more difficult to come to grips with, even though they may have a significant contribution to make to our understanding of a text. A frequency sorted list cannot provide a way of identifying the words which "matter" in the texts we are studying, and we need a list of these words in order to start using Hoey's 'Five Questions' as a framework for further analysis (see PP frequency list in Table 2 below).
This is where WordSmith Tools comes in. By adopting a radically different approach to deciding which words might be revealing of a text's or text collection's orientation, Scott M R 1997a does provide a means for choosing the words to focus on in a genre we find interesting. Unlike Williams 1976 and Stubbs 1996, Scott starts from the position that texts are central categories for linguistic study. For Scott (Scott M R, 1997:234), keywords are key in relation to a whole text, and are identified by making comparisons between one text or collection of texts, and other, larger text collections. Although taking the text as a central category for analysis presents problems if one wishes to use any of the large, publicly available corpora, which for reasons of copyright or principle are usually composed of text fragments (Sinclair 1987; Garside R, Leech G & G Sampson, 1987), it has immediate and significant advantages for anyone with an interest in genres and the whole area of language in social context. By developing the Keyword program in WordSmith Tools (Scott M R, 1996), Scott has provided an adequate and robust means for identifying statistically prominent words in a text (or collection of texts).
WordSmith Tools finds keywords in the following way:
frequency sorted wordlists are generated for a reference corpus (a collection that is larger than the individual text or collection of texts which will be studied), and for the research text or texts.
each word in the research text is compared with its equivalent in the reference text and the program makes judgement as to whether or not there is a statistically significant difference between the frequencies of the word in the different corpora. The statistical test evaluates the difference between counts per token and total words in each text and can be based either on a chi-square test for outstandingness or on a log-likelihood procedure (Scott MR 1996, WordSmith Keywords Help File)
the wordlist for the research corpus is reordered in terms of the keyness of each word.
Unlike a frequency sorted wordlist in which the counts of each word type are absolute for a given text or set of texts, a WordSmith Tools keyword list is the result of a comparative process in which a specific corpus is analysed in relation to a larger reference corpus. Keyword lists contain two main categories:
positive keywords are those which are unusually frequent in the target corpus in comparison with the reference corpus
negative keywords are unusually infrequent in the target corpus.
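The comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not WordSmith Tools' actual implementation: the crude tokeniser, the function names (`tokenize`, `log_likelihood`, `keywords`), the 3.84 cut-off and the toy corpora are all assumptions made for the example. It uses a two-term log-likelihood statistic (one of the two tests mentioned above) and attaches a sign to the score so that positive keywords come out above zero and negative keywords below.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a text and split it into word tokens (a deliberately crude tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word occurring a times in a research corpus
    of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the research corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(research_tokens, reference_tokens, threshold=3.84):
    """Return (word, signed keyness) pairs, most positive first.

    The score is positive when the word is relatively more frequent in the
    research corpus and negative when it is relatively less frequent.
    threshold=3.84 is the chi-square critical value for p < 0.05 with 1 d.f.;
    it is an assumed default, not a WordSmith setting.
    """
    research = Counter(research_tokens)    # step 1: wordlist for the research corpus
    reference = Counter(reference_tokens)  # step 1: wordlist for the reference corpus
    c, d = len(research_tokens), len(reference_tokens)
    scored = []
    # Step 2: compare each research-corpus word with its reference counterpart.
    for word, a in research.items():
        b = reference[word]  # 0 if the word never occurs in the reference corpus
        g2 = log_likelihood(a, b, c, d)
        if g2 >= threshold:
            sign = 1 if a / c > b / d else -1
            scored.append((word, sign * g2))
    # Step 3: reorder the list by keyness.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy corpora, invented purely for illustration.
research = tokenize("training project training staff the project training " * 20)
reference = tokenize("the cat sat on the mat and the dog saw the staff " * 200)
result = keywords(research, reference)
positive = [word for word, keyness in result if keyness > 0]
negative = [word for word, keyness in result if keyness < 0]
print(positive)
print(negative)
```

In this toy case "training" heads the positive list because it never occurs in the reference text, while "the" is a negative keyword: it is present in the research text, but much less often than the reference text would lead one to expect.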
An edited list of the top 30 positive keywords in PP is given in Table 2 - PP/BNC - PP Frequency, along with the top 30 words from the original PP frequency list:
| PP/BNC keywords top 30 | PP frequency top 30 |
|---|---|
| X = non-lexical item | * = lexical item |
| TRAINING | the |
| PROJECT | and |
| DEVELOPMENT | of |
| MANAGEMENT | in |
| PROGRAMME | to |
| AND X 1 | a |
| ASSISTANCE | for |
| (CONSULTANT NAME) | will |
| BUSINESS | training 1* |
| CONSULTANTS | be |
| EDUCATION | with |
| ENVIRONMENTAL | is |
| EU | project 2* |
| EXPERIENCE | development 3* |
| IMPLEMENTATION | on |
| OF X 2 | this |
| PHARE | management 4* |
| PHASE | programme 5* |
| PMU | as |
| POLAND | by |
| POLISH | we |
| PROGRAMMES | has |
| PROJECTS | that |
| (CONSULTANT NAME) | are |
| SKILLS | team 6* |
| STAFF | which |
| SUPPORT | staff 7* |
| TEAM | have |
| TECHNICAL | business 8* |
| WILL X 3 | an |

Table 2 – PP/BNC - PP Frequency
The first feature you notice when comparing a keyword list with a frequency list for the same data is that they are usually very different. Thus while PP Frequency only contains 8 lexical items, PP/BNC contains 27. Similarly, while the top 5 keywords are all nouns (three of them being nominalisations), the top five in PP Frequency are the definite article, a conjunction and prepositions.
An exciting implication of this capacity of WordSmith Tools to identify positive and negative keywords is that the program can not only be used to identify the "aboutness" of texts: it also has the potential to provide important stylistic information about them - and to do this automatically. The procedure derived from Biber 1988 to typify texts is lengthy and complex, and requires carefully marked-up corpora. WordSmith Tools, by contrast, appears to be able to give a rough profiling of texts on the basis of positive and negative keywords. An analysis of the top ten positive keywords and the five negative keywords that WordSmith Keywords identified from the Romantic Fiction (RomFict) set in the LOB corpus illustrates this potential.
POSITIVE Keywords: RomFict

| N | WORD | FREQ. |
|---|---|---|
| 1 | SHE | 566 |
| 2 | HER | 559 |
| 3 | I | 656 |
| 4 | HE | 575 |
| 5 | YOU | 512 |
| 6 | N'T | 266 |
| 7 | HAD | 373 |
| 8 | HIM | 180 |
| 9 | WAS | 530 |
| 10 | NIGEL | 45 |

Table 3 – RomFict: positive keywords
Unlike in PP, the top keywords in RomFict are personal pronouns, the past forms had and was, the negative particle n't and, last but not least, Nigel. First and second person pronouns are associated with Biber's text dimension #1 "Involved versus Informational Production" - a high factor score indicating emphasis on relationship building rather than on factual information - a typical feature of spoken language use. Third person pronouns, synthetic negation and past tense verbs are all associated with the Biber text dimension #2 "Narrative versus Non-narrative Concerns". Although Biber & Finegan 1989a do not make use of this dimension in their classification of texts on a literate / oral cline, it is relevant here given the large amounts of dialogue or reported speech which occur in RomFict. With a sufficiently extensive basis for comparison, it is probable that evidence from the positive keyword counts would indicate that, along two text dimensions, RomFict is significantly unlike many other written texts - sharing more features with spoken communication.
The findings we can obtain from negative keywords in RomFict are in some senses even more revealing, and contain two surprises - the and of.
Negative Keywords: RomFict

| N | WORD | FREQ. (RomFict) | RomFict % | FREQ. (BNC) | BNC % | KEYNESS |
|---|---|---|---|---|---|---|
| 48. | IN | 394 | 1.27 | 21,184 | 1.96 | 85.2 |
| 49. | BY | 52 | 0.17 | 5,908 | 0.55 | 110.2 |
| 50. | IS | 120 | 0.39 | 9,954 | 0.92 | 121.1 |
| 51. | OF | 533 | 1.72 | 32,656 | 3.02 | 206.6 |
| 52. | THE | 1,258 | 4.06 | 67,075 | 6.21 | 271.0 |

Table 4 - RomFict: negative keywords
When compared with BNC written core, the most negative (i.e. the relatively most prominently infrequent) keyword in RomFict is the. As the definite article is usually the most frequent word in any general text corpus, it is strongly counter-intuitive to find it occupying this position in the keyword list for a genre. One possible explanation for why this should be is that there is a significantly smaller proportion of nouns in RomFict than in BNC. We have already seen from our discussion of positive keywords that RomFict appears to be more "oral" than "literate" (and this is supported by the position that it occupies in the results of the Biber study where it is consistently positioned alongside spoken or informal epistolary texts). If, therefore, RomFict is more "oral" than many other kinds of writing, we would expect it to have a relatively low proportion of common nouns (Halliday 1989) - and this, allied with large numbers of proper nouns and personal pronouns, could well result in a relatively low frequency of definite articles.
Such a view is supported by making a keyword list for the 1 million word Spoken Component of Core BNC referenced against the 95 million word Guardian corpus. In this list of 1,286 keywords, the is also the most negative keyword (interestingly, the does not appear at all in the keyword list resulting from a comparison of the Written Component of Core BNC with the same corpus). The most negative keywords in this instance are source, Guardian (!!), date, page and pounds.
The other surprising negative keyword is of. We already have an indication that RomFict is different from general text populations from a frequency sorted wordlist derived from the corpus (Table 5 - RomFict: frequency). Rather than occupying the second or third position, as is the case in general collections of written texts, in this instance of is #9 and only represents 1.72% of the total words in the corpus.
| N | Word | Freq. | % |
|---|---|---|---|
| 1 | THE | 1,258 | 4.06 |
| 2 | TO | 927 | 2.99 |
| 3 | AND | 805 | 2.60 |
| 4 | I | 656 | 2.12 |
| 5 | A | 633 | 2.04 |
| 6 | HE | 575 | 1.86 |
| 7 | SHE | 566 | 1.83 |
| 8 | HER | 559 | 1.80 |
| 9 | OF | 533 | 1.72 |
| 10 | WAS | 530 | 1.71 |

Table 5 - RomFict: frequency
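A frequency sorted list with percentage shares, of the kind shown in Table 5, can be produced along the following lines. This is only a sketch: the function name `frequency_list`, the crude tokeniser and the sample sentence are illustrative assumptions, and a real concordancer makes its own tokenisation choices, so its counts will not match this code exactly.

```python
import re
from collections import Counter


def frequency_list(text, top=10):
    """Return (rank, word, count, percent of all tokens) rows, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    rows = []
    for rank, (word, count) in enumerate(Counter(tokens).most_common(top), start=1):
        rows.append((rank, word.upper(), count, round(100 * count / total, 2)))
    return rows


# An invented sentence, used only to show the shape of the output.
sample = "She said that he had left the house before the end of the day."
rows = frequency_list(sample, top=3)
for row in rows:
    print(row)
```

The percentage column is what allows a comparison across corpora of very different sizes, such as the share of of in RomFict and in a general corpus.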
According to Sinclair, of is typically more than 2% of the words in a corpus "regardless of the kind of text involved" (Sinclair J, 1991:84), so there is something odd going on in RomFict (1.72%). Again, the relatively low frequency could be to do with the large amounts of dialogue or indirect speech which occur in romantic fiction. Two random extracts from the original text of RomFict and the PP corpus demonstrate how this contrast might arise. Postmodifying uses are marked thus: of; other uses are marked of.
"There were few passengers on the plane and Gavin was quickly through the customs. 'Gay! Gavin!' The girl and her luggage had disappeared and they were alone together. The porter brought Gavin's bag out to the taxi. 'Just a moment, darling,' Gavin pressed her hand and smiled. 'I want to check up on the flights back.' Gay went out to the waiting taxi, and then found that in the excitement of meeting Gavin she had left her sun-glasses on the veranda. She went quickly back to fetch them. Gavin and the girl who had got off the plane with him were talking. He was writing something in his pocket-book, with a sick feeling of despair Gay knew that of course it was her address. Gavin joined her and at once dispelled her fears. 'That little bit you saw me talking to, her father is a big land agent, she says that he sometimes has farms for lease... you know that's what I want, Gay, a farm and you!'"
169 words: LOB Romantic Fiction Collection
"This first of these examples is particularly significant as it is an almost exactly parallel project to the one proposed for the xxx. The Latvian Development Agency was established with assistance from xxx. Over the last two years, we have supported the planning and development of its main activities, attracting inward investment and encouraging export development, we have assisted its promotional programme, undertaken a series of action-orientated industrial sector studies and provided a programme of training activities for key staff. The scale, terms of reference and work undertaken by xxx in Latvia are very similar to those proposed for xxx, in a country at a similar stage of economic, social and political development and undergoing the same fundamental transition from a centrally planned command economy to an open market economy. Xxx therefore has current and directly comparable experience to bring to this project in Bulgaria."
145 words. SQ-BULG.FMT PP Corpus (xxx indicates a proper noun which has been substituted to maintain confidentiality.)
In the RomFict extract there are only three instances of of and of these one (of course) is an idiom. All instances occur outside the dialogue and are associated with the elaboration of the description of the emotions of the actors (excitement of meeting, sick feeling of despair). In the PP there are 6 instances in a slightly shorter passage. All of these are postmodifying qualifications of relatively general superordinate terms - first / development* / series / programme* / terms / stage - two of which (marked *) are keywords in PP. It is also of interest that of is itself a positive keyword in the PP corpus. Sinclair 1991 argues that of can be thought of as a partitive or quantifier rather than a preposition (Sinclair J, 1991:87). In this role it is frequently used in N1 + of + N2 patterns, where N2 is best interpreted as the headword of the nominal group as it is the "principal reference point to the physical world" (ibid:87). What we are possibly seeing in the contrasting frequencies of occurrence of of in RomFict and PP is a contrast between texts in which there are proportionately fewer nouns - and where there is a low level of need to elaborate their meanings; and informationally dense texts which use such meaning elaboration in the noun phrase in achieving their effect. Again, further confirmation is provided by the fact that of is the second most negative keyword in the Spoken BNC list (referenced against Scott's Guardian corpus).
This brief review of major positive and negative keywords in RomFict gives an indication of the WordSmith program's potential value in genre analysis. While Biber's approach to genre differentiation remains the most comprehensive available (although in need of further refinement), it would be unrealistic to assume that a teacher or student would undertake a similar study in order to come to grips with an unfamiliar genre. WordSmith Tools, by contrast, does appear to offer teachers and students a way into a text or text collection. It also seems to have the potential to help teachers and learners start to come to conclusions about how lexis, grammar and communicative purpose are interacting in specific genres in a way which no other easily available software can do. While these results are not definitive, they do indicate that there will be a benefit in using the Keyword program in the analysis of spoken / written differences and the stylistic typification of specific genres.
Interpreting keyword lists: an example
There will not be sufficient space here to offer a full analysis of keywords in the PP Corpus using Hoey's 5 questions. I feel, however, there will be value in presenting a partial study of a single lexical item as this gives an insight into the kinds of understanding of a genre which can be gained by viewing lexis from such a point of view. The item I will consider - experience - is derived from a list of what Scott calls key-keywords, that is, the keywords which occur in all, or most, of the texts which make up the corpus you are studying (Scott M, 1997a:237).
| N | WORD | TEXTS (OF 14) |
|---|---|---|
| 1 | TRAINING | 14 |
| 2 | PROJECT | 14 |
| 3 | MANAGEMENT | 12 |
| 4 | DEVELOPMENT | 12 |
| 5 | PROGRAMME | 12 |
| 6 | EXPERIENCE | 11 |
| 7 | IMPLEMENTATION | 11 |
Table 6 - PP key-keywords
Table 6 contains the 7 most prominent key-keywords in PP. Training and project occur in all 14 texts, followed by the others in the sequence given. I have selected experience because it was the least predictable.
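The idea of a key-keyword can be sketched as a simple count of the number of texts in which a word is key. The sketch below assumes that a set of keywords has already been computed separately for each text (by whatever keyness procedure is in use); the function name `key_keywords`, the `min_texts` parameter and the three toy keyword sets are hypothetical and are not drawn from the PP Corpus.

```python
from collections import Counter


def key_keywords(keywords_per_text, min_texts=2):
    """Count, for each word, the number of texts in which it is a keyword.

    keywords_per_text: one collection of keywords for every text in the corpus.
    Returns (word, number_of_texts) pairs, widest coverage first.
    """
    coverage = Counter()
    for text_keywords in keywords_per_text:
        coverage.update(set(text_keywords))  # count each word once per text
    rows = [(word, n) for word, n in coverage.items() if n >= min_texts]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Toy data: keyword sets for three imaginary proposals.
per_text = [
    {"training", "project", "experience"},
    {"training", "project", "poland"},
    {"training", "experience", "phare"},
]
print(key_keywords(per_text))
# [('training', 3), ('experience', 2), ('project', 2)]
```

Words that are key in only one text (here "poland" and "phare") drop out, which is what makes the remaining list a description of the genre rather than of any single document.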
In this brief example I will only make use of the second of Hoey's 5 questions: Does the word regularly associate with particular other meanings? Here Hoey is referring to the idea of semantic prosody introduced in Louw B 1993. He extends Louw's notion in line with Stubbs 1996 - who shows how cause and happen are associated with negative events - and with Campanelli and Channell's (1994) study of train as a, which is seen as having a semantic prosody of occupation. In this analysis semantic prosody not only entails positive or negative associations but can also entail categories like occupation or negative events.
While accepting the usefulness of the notion of semantic prosody as outlined by Louw and Stubbs, Hoey argues that it should be extended so that it not only covers broad categories such as 'unpleasant' or 'positive' but also includes more specific prosodies such as e.g. "occupation". Hoey justifies this by drawing attention to a phrase such as train as a which not only occurs with common collocates such as teacher, nurse, or lawyer, but is also found in rare combinations e.g. sathin, boxing second, kamikaze pilot (Hoey 1997a:2). These latter would never be thrown up as obvious collocates of train as a, but they are, nevertheless, strongly implicated in a semantic prosody: PROFESSION.
If Hoey's second question is asked of a word as it is used in general, we gain insights into what we will call its global semantic prosody. Hoey 1997a gives the example of a study of consequence (in its meaning of 'effect' as opposed to 'importance') in a large general corpus. Hoey identifies four semantic prosodies for consequence:
the logic of underlying processes - 56% (inevitable, inexorable, likely, probable …)
the badness of an outcome - 15% (dire, appalling, regrettable …)
the seriousness of an outcome - 11% (important, decisive …)
the expectedness or otherwise of an outcome - 9% (unintended, odd …)
(Hoey M 1997a:3)
A similar study by Stubbs (Stubbs M, 1995:247) identified a predominantly negative semantic prosody for the verb CAUSE (e.g. cause an accident, cancer, death, pain etc.).
For the lexicographer or student of lexis these insights are important. For our present purpose, however, it may be more helpful to ask this question in relation to the local semantic prosody of the word under scrutiny (although this may be a component of the global semantic prosody of the word). I have discussed elsewhere (Tribble C, forthcoming) an example of the positive semantic prosody of international in PPs, and proposed that words in certain genres may establish local semantic prosodies which only occur in these genres, or analogues of these genres. So, for the purposes of this study, I will refine Hoey's second question to read "Does the word regularly associate with other particular meanings in this context?" I am not assuming that all keywords in a text will have specific local semantic prosodies, but I am proposing that this is an aspect of language use worth considering as it will constitute important local knowledge for writers in a specific genre. What I have found interesting in the case of experience is that there do appear to be identifiable differences between the meanings with which experience is associated in PP and its meaning in a general population of texts.
The local semantic prosody of experience in PP results from the predominant associations it takes on in this environment - and these associations are common to all the proposals in the PP corpus as experience is a key-keyword. The typical left and right contexts of experience are:
| Preceded by | # | % | Preceded by | # | % |
|---|---|---|---|---|---|
| and | 38 | 9.77 | EU | 7 | 1.80 |
| has | 24 | 6.17 | have | 7 | 1.80 |
| considerable | 23 | 5.91 | their | 7 | 1.80 |
| Our | 21 | 5.40 | This | 6 | 1.54 |
| years | 17 | 4.37 | with | 6 | 1.54 |
| - | 16 | 4.11 | direct | 5 | 1.29 |
| of | 16 | 4.11 | substantial | 5 | 1.29 |
| . | 15 | 3.86 | ) | 4 | 1.03 |
| extensive | 15 | 3.86 | broad | 4 | 1.03 |
| International | 14 | 3.60 | on | 4 | 1.03 |
| the | 12 | 3.08 | relevant | 4 | 1.03 |
| practical | 11 | 2.83 | consultancy | 3 | 0.77 |
| wide | 11 | 2.83 | depth | 3 | 0.77 |
| Project | 10 | 2.57 | His | 3 | 0.77 |
| - | 7 | 1.80 | professional | 3 | 0.77 |

Table 7 - PP experience left context (82.53% of all instances)
| Followed by | # | % | Followed by | # | % |
|---|---|---|---|---|---|
| of | 148 | 38.05 | is | 5 | 1.29 |
| in | 94 | 24.16 | from | 4 | 1.03 |
| and | 20 | 5.14 | consultant name | 3 | 0.77 |
| elsewhere | 12 | 3.08 | both | 2 | 0.51 |
| to | 10 | 2.57 | for | 2 | 0.51 |
| . | 9 | 2.31 | includes | 2 | 0.51 |
| , | 7 | 1.80 | over | 2 | 0.51 |
| RELEVANT | 7 | 1.80 | which | 2 | 0.51 |
| as | 6 | 1.54 | with | 2 | 0.51 |
Table 8 - PP experience right context (86.6% of all instances)
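Left and right context counts of the kind reported in Tables 7 and 8 can be approximated with a few lines of code. What follows is a minimal sketch, not the procedure actually used for this study: the function names (`tokenize`, `context_table`) and the sample sentences are invented for illustration, tokens are lower-cased (the tables keep forms such as Our and RELEVANT distinct), and percentages are calculated over all instances of the node word, which is consistent with the published figures (38 instances of and at 9.77% implies a total of about 389).

```python
import re
from collections import Counter


def tokenize(text):
    # Keep words and single punctuation marks as separate tokens,
    # because the context tables also count items such as "." and ",".
    return re.findall(r"\w+|[^\w\s]", text)


def context_table(tokens, node, offset):
    """Count the tokens found `offset` positions away from each instance of `node`.

    offset=-1 gives the immediate left context, offset=+1 the immediate right.
    Returns (rows, total): rows are (token, count, percent) tuples sorted by
    descending count; total is the number of instances of the node word.
    """
    node = node.lower()
    hits = [i for i, tok in enumerate(tokens) if tok.lower() == node]
    total = len(hits)
    if total == 0:
        return [], 0
    counts = Counter()
    for i in hits:
        j = i + offset
        if 0 <= j < len(tokens):
            counts[tokens[j].lower()] += 1
    rows = [(tok, n, round(100 * n / total, 2)) for tok, n in counts.most_common()]
    return rows, total


# Invented sentences for illustration only; these are not PP corpus data.
sample = (
    "Our firm has considerable experience of management training. "
    "The team has knowledge and experience in SME development. "
    "We draw on extensive experience of project design, and experience in training."
)
tokens = tokenize(sample)
left_rows, total = context_table(tokens, "experience", -1)
right_rows, _ = context_table(tokens, "experience", +1)

for tok, n, pct in left_rows:
    print(f"{tok:15} {n:3} {pct:6.2f}")
```

Running the same function with a wider offset (for example -2 or +2) gives a rough view of the broader setting, although a concordancer remains the more convenient tool for reading the lines themselves.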
The overall picture of experience in the context of PP is that it is:
frequently linked to another noun with and
frequently associated with the verb have
frequently qualified by a noun or adjective (considerable, extensive, international, practical, relevant, wide, EU, project, training, working, years) which emphasises the superior quality of the experience
frequently followed by further specification of the kinds of experience in question (postmodifying prepositional phrase introduced by of or in - 62.2% of all right context words)
Taking these immediate contexts and a review of the broader settings in which experience occurs in PPs, an interpretation of experience in this context can be summarised by means of a COBUILD dictionary style definition (my apologies to professional lexicographers!!):
DEFINITION: Experience is a form of professional capital which can be used to warrant opinions or recommendations and establish the authority of one consulting or management agency over and above that of others.
EXAMPLE: company name has gathered extensive knowledge and experience in transferring and adapting …
…The wider knowledge and experience of management training…
…Our input will be to draw on successful experience elsewhere and help local staff…
…These examples of project experience demonstrate not only a strong…
…DHV has accumulated the knowledge and experience to assist institutions…
Such a definition stands in contrast (over and above the quality of defining style) with those provided by the COBUILD dictionary itself:
1. Experience is knowledge or skill in a particular job which you have gained because you have worked at the job for a long time.
EXAMPLE: I had no military experience...
...in my experience as a teacher...
...experience of working with children...
He was senior to me in experience...
2. Experience is the state or process of feeling something or being affected by it.
EXAMPLE: The experience of colour is wholly subjective...
...the experience of fear.
3. Experience is all the events, knowledge, and feelings that make up an individual's life or the character of a society.
EXAMPLE: Everyone learns best from his own experience...
...speaking from personal experience.
4. An experience is something that happens to you or something that you do, especially something important that affects you.
EXAMPLE: The funeral was a painful experience...
...my later experiences in the village.
(Sinclair et al, 1987)
While COBUILD Definition 1 does in some senses include the PP meaning of the word, it has not been designed to accommodate the specific local semantic prosody which experience gains in the PP environment. This is not a criticism of the dictionary; rather, it is a comment on the way in which a particular environment (co-texts, readership) colours the meaning of a word. The 1 million word spoken subset in BNC Core offers 100 instances of experience which do conform, more or less, to the senses offered in the COBUILD definitions. Examples are given in the table below.
| Definition 1. | Definition 3. |
| Definition 2. | Definition 4. |
Table 9 - BNC Spoken Corpus - Experience: semantic prosodies other than "professional"
The spoken corpus also provides examples of the sense in which I feel experience is used in PP.
Table 10 - experience: BNC spoken corpus data
Significantly, these 'professional experience' uses occur in three particular environments:
non-phrasal coordination - already commented on as significant in the formal suasive wording of PPs (and paired with knowledge - a strong collocate of experience in PPs)
as the object of lexical verb to have + following preposition in - again, a feature of experience in PPs
qualified by considerable - once again, a strong collocation in PPs
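The three environments listed above can be checked mechanically against concordance lines. The sketch below is only a rough approximation under stated assumptions: the pattern names and the `classify` helper are hypothetical, the regular expressions are much cruder than a proper grammatical analysis (coordination is reduced to the single string knowledge and experience, and the have pattern tolerates up to three intervening words), and only the first test line is taken from the PP examples quoted earlier; the second is invented.

```python
import re

# Hypothetical, simplified surface patterns for the three environments.
PATTERNS = {
    "coordinated with knowledge": re.compile(
        r"\bknowledge and experience\b", re.IGNORECASE
    ),
    "have + experience + in": re.compile(
        r"\b(?:has|have|had|having)\b(?:\s+\w+){0,3}?\s+experience\s+in\b",
        re.IGNORECASE,
    ),
    "qualified by considerable": re.compile(
        r"\bconsiderable experience\b", re.IGNORECASE
    ),
}


def classify(line):
    """Return the names of all environments whose pattern matches the line."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(line)]


lines = [
    # Quoted from the PP examples earlier in this section.
    "DHV has accumulated the knowledge and experience to assist institutions",
    # Invented for illustration.
    "We have considerable experience in training",
]
for line in lines:
    print(classify(line), "<-", line)
```

A line can fall into more than one environment, which is why `classify` returns a list rather than a single label.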
The contrast between experience in PPs and in Spoken BNC Core is now clear: while Spoken BNC Core contains most meanings offered by the COBUILD lexicographers, PP contains no instances of COBUILD Meanings 2, 3 and 4, and the two isolated instances of Meaning 1 are exceptions which prove the rule. In example 1, "work experience" is part of the consultant's earlier profile, but he has since taken on a much broader set of professional interests. In example 2 "work experience" is what the professional provides for other people…
His education, and earlier work experience, was an industrial chemist, but since 1988 his main interest has been in SME development.
… selection; implications in training, work experience and placement; sexual harassment … (BNC Spoken Corpus data)
What we see in operation in PP is a local semantic prosody which is specific to this genre, or to PPs and other analogous genres. Experience has not been given a new, technical meaning in PPs. Rather, a local semantic prosody (implicit in the COBUILD 1 definition) which may be unique to PPs - though intuitively I do not consider this to be the case - has been similarly exploited by writers in three different organisations as part of the suasive rhetoric of the proposals.
Conclusion - making a connection with the classroom
In Tribble 1997 I proposed that writers require 4 kinds of knowledge in order to be able to respond adequately to a specific writing task. These are:
| content knowledge | knowledge of the concepts involved in the subject area |
| writing process knowledge | knowledge of the most appropriate way of preparing for a specific writing task |
| context knowledge | knowledge of the social context in which the text will be read, and co-texts related to the writing task in hand |
| language knowledge | knowledge of those aspects of the language system necessary for the completion of the task |
(Tribble C 1997: 43)
In this paper I have shown how it is possible to use a corpus of instances of a specific genre to provide learners with access to aspects of both language knowledge and, as a result of further analysis, context knowledge. I have also shown that although a POS-marked corpus will provide the fullest account of the linguistic characteristics of a genre, an analysis of keywords also offers a powerful means of establishing which words (and phrases) matter in a collection of examples of a genre. Once these words have been identified, Hoey's five questions provide a systematic basis for teachers and learners to investigate this lexis, and in so doing reduce the strangeness of a genre and its associated texts, and thereby reduce the difficulty of writing into the new genre.
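For teachers who wish to try a keyword analysis without specialist software, the underlying comparison of a genre corpus with a reference corpus can be sketched briefly. This is an illustration under assumptions rather than an account of how the keyword lists in this paper were produced: the log-likelihood statistic is one commonly used keyness measure and is swapped in here for convenience, the function names (`log_likelihood`, `keywords`) and the `min_freq` threshold are hypothetical, and the two token lists are toy data, not the PP corpus or BNC Core.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.

    a: frequency of the word in the study corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score


def keywords(study_tokens, reference_tokens, min_freq=2):
    """Return positive keywords as (word, study_freq, ref_freq, score) rows,
    sorted by descending score."""
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    c = sum(study.values())
    d = sum(reference.values())
    rows = []
    for word, a in study.items():
        if a < min_freq:
            continue
        b = reference.get(word, 0)
        # Keep only words that are relatively more frequent in the study corpus.
        if a / c > b / d:
            rows.append((word, a, b, round(log_likelihood(a, b, c, d), 2)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Toy data for illustration only.
study = "experience of training and experience in management and experience".split()
reference = "the experience of fear and the colour of the sky and the sea".split()
result = keywords(study, reference)
for row in result:
    print(row)
```

With corpora of realistic size the resulting list would then be filtered by a significance threshold before any interpretation of individual words is attempted.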
Altenberg B (1989) Studia Linguistica 43 (2), pp.167-174.
Bhatia V K (1993) Analysing Genre: Language use in professional settings Longman Harlow
Biber D (1988) Variation across Speech and Writing Cambridge University Press Cambridge
Biber D & E Finegan (1989) "Drift and the evolution of English style: a history of three genres" Language 65: 487-51
Flowerdew J (1993) "An educational or process approach to the teaching of professional genres" ELTJ 47/4:305-316 Oxford University Press Oxford
Hoey M (1997) "From concordance to text structure: new uses for computer corpora" in Melia J & Lewandowska B (eds) Proceedings of PALC 97 Łódź University Press Łódź
Louw B (1993) "The diagnostic potential of semantic prosodies" in Baker M, Francis G & E Tognini-Bonelli (1993) Text and technology: in honour of John Sinclair, pp.157-176
Nakamura J (1993) "Statistical methods and large corpora: a new tool for describing text types" in Baker M, Francis G & E Tognini-Bonelli (eds) Text and technology John Benjamins Philadelphia / Amsterdam
Scott M R (1996) WordSmith OUP Oxford
Scott M R (1997a) "PC Analysis of key words - and key key words" System 25/2:233-245
Scott M R (1997b) "The right word in the right place: key word associates in two languages" AAA Arbeiten aus Anglistik und Amerikanistik 22/2:239-252
Stubbs M (1995) "Corpus evidence for norms of lexical collocation" in Cook G & Seidlhofer G (eds) Principle and Practice in Applied Linguistics: studies in honour of HG Widdowson OUP Oxford
Stubbs M (1996) Text and Corpus Analysis Blackwell Oxford
Tribble C (1997) Writing OUP Oxford
Tribble C & G Jones (1997) Concordancing in the classroom new edition Athelstan Houston Tx
Tribble C (forthcoming) Writing Difficult Texts unpublished PhD Thesis Department of Linguistics and Modern English Language Lancaster University
Williams R (1976) Keywords Fontana London