

===================================================
FestCat: Speech Synthesis in Catalan using Festival

  http://www.talp.upc.edu/festcat
  Antonio Bonafonte
  TALP Research Center
  Barcelona, November 2007

===================================================

1. WHAT? 
2. WHO?
3. TERMS AND CONDITIONS
4. REQUIREMENTS
5. INSTALLATION
6. EXECUTION
7. THANKS

===================================================


1. WHAT?  
"Festival parla català"

The FestCat package consists of a library providing
analysis of Catalan text, and the data to extend  
Festival so that it can speak Catalan.

This project was originally developed by
the TALP Research Center, Universitat Politècnica de Catalunya, Barcelona.

http://www.talp.upc.edu/festcat

Basically, there are two components:

(1) Linguistic data and code to extend Festival for Catalan.
    Dictionaries, tokenizer, LTS (letter-to-sound) rules, POS tagger data, etc.

    This includes two folders:
    dicts/upc (basically dictionaries)
    upc_catalan (basically code)

(2) Voices: speaker dependent data. 
    There is one folder for each voice
    voices/catalan/upc_ca_'speaker-name'

    Several voices have already been developed.
    Check the web page to get the latest downloads.

2. WHO?

Most of the code and data were developed specifically for this project
by the TALP Research Center at UPC:

	www.talp.upc.edu/festcat
	www.talp.upc.edu
	www.upc.edu

The dictionaries are a significant exception.

The main source for building the dictionaries is the Catalan lexicon
provided by the FreeLing project, which is also developed by the TALP
Research Center and others. Please visit the FreeLing web site for more
information:
	http://garraf.epsevg.upc.es/freeling/

The lexicon has been enriched in the following ways:
 - Phonetic transcriptions have been generated automatically using
   the TALP phonetic transcription toolkit.

 - New word forms have been added using frequent words found in our corpus
   and words found in our 'speech' data to ensure better coverage when
   designing the voices.


3. TERMS AND CONDITIONS
   All the code and linguistic resources are provided under the 
   LGPL license (see the COPYING file).


4. REQUIREMENTS
   You need a working Festival system.
   Check your Linux distribution's packages or the Festival home page:
   http://www.cstr.ed.ac.uk/projects/festival/

   We have been working with version 2.1 (November 2010).
   To check which version you have installed, run:
   $ festival --version

5. INSTALLATION

   We have developed several Catalan voices.
   All of them share a common, language-specific library.
   Therefore, you need the 'base' package plus
   the specific voices you are interested in.

   You just need to copy several folders into the Festival datadir.
   To find this directory, run:
   $ festival -b '(print datadir)'
   If this directory is not defined, you should use the 'libdir'
   directory:
   $ festival -b '(print libdir)'
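
   The lookup above can be scripted. The following is only a sketch: the
   helper names strip_quotes and festival_var and the FESTIVAL_DATA variable
   are hypothetical, and it assumes that Festival prints the value wrapped
   in double quotes (stripping is harmless if it does not).

```shell
# Remove the double quotes that Festival is assumed to print around strings.
strip_quotes() {
    printf '%s\n' "$1" | tr -d '"'
}

# Print the value of a Festival variable; print an empty line if Festival
# is not installed or the variable is not defined.
festival_var() {
    command -v festival >/dev/null 2>&1 || return 0
    strip_quotes "$(festival -b "(print $1)" 2>/dev/null)"
}

# Prefer datadir; fall back to libdir when datadir is not defined.
FESTIVAL_DATA=$(festival_var datadir)
[ -n "$FESTIVAL_DATA" ] || FESTIVAL_DATA=$(festival_var libdir)
echo "Festival data directory: ${FESTIVAL_DATA:-not found}"
```

   If the script reports "not found", run the two commands above by hand
   to see the error message that Festival prints.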


   * COMMON PACKAGE *
   Download the file upc_ca_base.tgz and extract the files:
      $ tar -zxf upc_ca_base.tgz
   Copy the extracted files to the Festival 'datadir' as follows:

   a) Dictionaries: 
      Copy the folder dicts/upc to 'datadir'/dicts/upc

   b) Catalan tokenizer, tagger, etc. 
      Copy the folder upc_catalan to 'datadir'/upc_catalan

   c) If you want Festival to understand the --language
      option and to export the Catalan speakers to other applications,
      you need to update the languages.scm file to add Catalan. 
      We provide this file:
         languages.scm => 'datadir'/languages.scm
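
   Steps a) to c) can be sketched as a small shell function. This is an
   illustration, not an official installer: install_base is a hypothetical
   name, the first argument is assumed to be the directory where
   upc_ca_base.tgz was extracted, and the backup of an existing
   languages.scm is an extra precaution that is not part of the steps above.

```shell
# install_base SRC DATADIR
# Copy the extracted base package from SRC into the Festival DATADIR.
install_base() {
    src=$1
    datadir=$2

    # a) Dictionaries: dicts/upc => DATADIR/dicts/upc
    mkdir -p "$datadir/dicts" || return 1
    cp -R "$src/dicts/upc" "$datadir/dicts/" || return 1

    # b) Tokenizer, tagger, etc.: upc_catalan => DATADIR/upc_catalan
    cp -R "$src/upc_catalan" "$datadir/" || return 1

    # c) languages.scm: keep a copy of any existing file before replacing it.
    if [ -f "$datadir/languages.scm" ]; then
        cp "$datadir/languages.scm" "$datadir/languages.scm.orig" || return 1
    fi
    cp "$src/languages.scm" "$datadir/languages.scm"
}

# Example call (the path is only an example; use your own datadir, and note
# that writing to a system directory usually requires root privileges):
#   install_base . /usr/share/festival
```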


   * VOICE SPECIFIC PACKAGES *

   Download the file for each voice (check the web page for updates,
   http://www.talp.upc.edu/festcat ) and extract its contents.
   Example:
      $ tar -zxf upc_ca_ona_hts.tgz
   
   d) Copy each Catalan voice (e.g. upc_ca_ona_hts) into the voices
      directory. Example:
      upc_ca_ona_hts => 'datadir'/voices/catalan/upc_ca_ona_hts
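
   Step d) can be sketched in the same way. The function name install_voice
   is hypothetical, and the voice directory is assumed to be passed without
   a trailing slash, so that the folder itself (not only its contents) is
   copied.

```shell
# install_voice VOICE_DIR DATADIR
# Copy one extracted voice folder into DATADIR/voices/catalan/.
install_voice() {
    voice_dir=$1
    datadir=$2
    mkdir -p "$datadir/voices/catalan" || return 1
    cp -R "$voice_dir" "$datadir/voices/catalan/"
}

# Example call (the datadir path is only an example):
#   install_voice upc_ca_ona_hts /usr/share/festival
```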

6. EXECUTION

   Several front-ends can be used with Festival, such as
   gnopernicus or Emacspeak. Here we only cover the direct use of
   Festival.

   WARNING WARNING WARNING !!!
     Festival expects ISO-8859-15 encoding. Make sure that your terminal and
     files use this encoding. If your system uses UTF-8 (as many distributions
     do today), you need to convert a file before Festival reads it.
     Some front-ends, such as gnopernicus, do the conversion for you.

     You can use the "Save As" option in gedit, or use a conversion program
     such as iconv:
     $ iconv -f utf8 -t ISO-8859-15//TRANSLIT myfile_utf8.text > myfile_latin1.text

   !!!
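
   If you do not know which encoding a file uses, the conversion can be
   made conditional. The sketch below is a heuristic, not a reliable
   detector: the function name to_latin9 is hypothetical, and any file that
   is not valid UTF-8 is simply assumed to be in ISO-8859-15 already.

```shell
# to_latin9 INPUT OUTPUT
# Write an ISO-8859-15 version of INPUT to OUTPUT.
to_latin9() {
    in=$1
    out=$2
    if iconv -f UTF-8 -t UTF-8 "$in" >/dev/null 2>&1; then
        # INPUT is valid UTF-8 (plain ASCII also passes this check and is
        # left unchanged by the conversion).
        iconv -f UTF-8 -t ISO-8859-15//TRANSLIT "$in" > "$out"
    else
        # INPUT is not valid UTF-8: assume it is already ISO-8859-15.
        cp "$in" "$out"
    fi
}

# Example call:
#   to_latin9 myfile.text myfile_latin1.text
```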


   * A quick test:
     $ echo "Bon dia, Catalunya" | festival --tts --language catalan

   * You can also run Festival interactively:
     $ festival
     (language_catalan)
     (intro-catalan)
     (SayText "Bon dia, Catalunya.")
     (SayText "Bona nit.")
     (exit)

     To choose a specific speaker, enter the voice selection command
     instead of the language selection command. You can also use it
     at any time to change the speaker:

     (voice_upc_ca_ona_hts)
     (SayText "I tu, qui ets?")
     (voice_upc_ca_pau_hts)
     (SayText "Jo sóc, el que tu ets, i si et faig mal, em faig mal a mi mateix.")
     (voice_upc_ca_ona_hts)
     (SayText "Que maco. Això és de l'assemblea dels infants, oi?")
     (exit)

     Or, to read a text file, for example "bon_dia.txt":

     $ echo "Bon dia, Catalunya." > bon_dia.txt
     $ festival
     (language_catalan)
     (tts_file "bon_dia.txt")
     (exit)

   * Or use the text2wave script to create a .wav file:
     $ text2wave -o bondia.wav -eval '(language_catalan)' bon_dia.txt

     If you want to specify the speaker:
     $ text2wave -o bondia.wav -eval '(voice_upc_ca_ona_hts)' bon_dia.txt
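
     To convert several text files in one go, the text2wave call can be
     wrapped in a loop. This is a sketch only: the helper names wav_name
     and synthesize_all are hypothetical, the input files are assumed to
     end in .txt, and they must already be in ISO-8859-15 encoding.

```shell
# Derive the output name: bon_dia.txt -> bon_dia.wav
wav_name() {
    printf '%s\n' "${1%.txt}.wav"
}

# synthesize_all SELECT_COMMAND FILE...
# Run text2wave on each FILE, using SELECT_COMMAND to pick the language
# or the speaker, e.g. '(language_catalan)' or '(voice_upc_ca_ona_hts)'.
synthesize_all() {
    select_cmd=$1
    shift
    for f in "$@"; do
        text2wave -o "$(wav_name "$f")" -eval "$select_cmd" "$f" || return 1
    done
}

# Example call:
#   synthesize_all '(voice_upc_ca_ona_hts)' bon_dia.txt bona_nit.txt
```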


7. THANKS
   This work has been supported by the Catalan Government
   www.gencat.net

   The project was promoted by several departments of the Catalan Government:
    - Departament d'Educació
    - Secretaria de Telecomunicacions i Societat de la Informació 
      del Departament de Presidència. 

   and by the following units of the Universitat Politècnica de Catalunya (UPC):

    - TALP Research Centre
    - Càtedra d'Accessibilitat
    - Càtedra de Programari Lliure


   Read the THANKS file to see the list of people who have
   contributed to this project.
