
Welcome to the Nocando Project

Corpus description

Corpus collection

The NOCANDO Corpus is a corpus of spoken narrative texts. It was created by recording free, picture-based narrations by native speakers of five languages: Catalan, Italian, Spanish, English, and German.

Speakers

The participants were mostly students at the Universitat Pompeu Fabra in Barcelona. A smaller number came from different working environments.

Catalan and Spanish speakers were undergraduate or graduate students, with a mean age of 22 for Catalan (range 18 to 30) and 20 for Spanish (range 17 to 29). All were from Catalonia except one Catalan speaker from the Comunidad Valenciana and one Spanish speaker from Castilla y León.

Italian, English and German speakers were mostly recently arrived Erasmus students, with a mean age of 29 for Italian (range 20 to 56), 27 for English (range 20 to 41), and 34 for German (range 22 to 67). Italian speakers came from different parts of Italy, English speakers from the United States and the UK, and German speakers from different parts of Germany.

Methodology

Speakers were asked to tell a story by following the pictures in three textless books:

  • Mayer, M. (1973). Frog on his own. New York: Dial Books, Penguin.
  • Mayer, M. (1974). Frog goes to dinner. New York: Puffin Books, Penguin.
  • Mayer, M. and Mayer, M. (1975). One frog too many. New York: Dial Books, Penguin.

For 40 speakers, recordings were made in an acoustically isolated room provided by the Universitat Pompeu Fabra. For the remaining 28 speakers, recordings were made with a digital recorder in a regular room at the Universitat Pompeu Fabra.

The three books were given in a random order for each speaker. The speaker could browse the book before starting the narration.

  • Total number of speakers: 68
  • Total number of narrations: 222
  • Total duration: ca. 16 hours (2 to 9 minutes per narration)

Table: Quantitative information on each language represented in the NOCANDO corpus.

                 Catalan     Italian     Spanish     German      English
Speakers         19          16          13          9           11
Recording time   4:02:43 h   4:04:32 h   2:35:20 h   2:09:13 h   2:32:20 h
Word count       37,555 w    27,392 w    25,077 w    15,944 w    21,970 w (estimated)
Segment count    5,856 seg   4,306 seg   3,801 seg   2,154 seg   3,140 seg (estimated)
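
The per-language figures in the table can be combined into totals and rough speaking rates. The following is a minimal sketch, not part of the corpus tooling: the dictionary layout and the helper names `to_seconds` and `words_per_minute` are assumptions made for illustration, and the numbers are copied from the table (the English word count is marked as estimated in the source).

```python
# Figures copied from the table above. Layout and names are illustrative only.
CORPUS = {
    "Catalan": {"speakers": 19, "time": "4:02:43", "words": 37555},
    "Italian": {"speakers": 16, "time": "4:04:32", "words": 27392},
    "Spanish": {"speakers": 13, "time": "2:35:20", "words": 25077},
    "German":  {"speakers": 9,  "time": "2:09:13", "words": 15944},
    "English": {"speakers": 11, "time": "2:32:20", "words": 21970},  # estimated
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def words_per_minute(language: str) -> float:
    """Average words per minute of recording for one language."""
    row = CORPUS[language]
    return row["words"] / (to_seconds(row["time"]) / 60)


# Totals across the five languages.
total_speakers = sum(row["speakers"] for row in CORPUS.values())
total_seconds = sum(to_seconds(row["time"]) for row in CORPUS.values())

if __name__ == "__main__":
    print(f"Speakers: {total_speakers}")
    print(f"Recording time: {total_seconds / 3600:.1f} h")
    for language in CORPUS:
        print(f"{language}: {words_per_minute(language):.0f} words/min")
```

Note that the rates include pauses and any non-narrative time in the recordings, so they describe the recordings as a whole rather than articulation speed.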

Please cite the corpus as:

Brunetti, L., S. Bott, J. Costa and E. Vallduví (2011). 'A multilingual annotated corpus for the study of Information Structure', in Konopka et al. (eds.), Grammar & Corpora 2009, 3rd International Conference, Mannheim, 22-24 Sept. 2009, Gunter Narr Verlag.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Copyright © 2025 GLiF - All rights reserved