By Luis Marcelino, Instituto Politécnico de Leiria
On Monday, August 11th 2014, 11:00
By João Oliveira
On Tuesday, August 12th 2014, 11:00
By Pedro Manha and Ricardo Filipe, Centro Interdisciplinar de Documentação Linguística e Social
On Wednesday, August 13th 2014, 11:00
In any team working on a project, especially one whose members are spread across one or more countries, a couple of problems arise, for instance sharing the work done by each member and discussing the state and evolution of the project.
This workshop aims to give attendees a basic knowledge of a couple of tools that can help mitigate these problems. These tools are:
After this workshop, attendees will find it easier to start contributing to any project that uses these tools and to create projects of their own.
On Thursday, August 14th 2014, 11:00
Normalization of dialects and variants is an interesting area of NLP. It is used for information retrieval on historical and dialectal texts, for preprocessing dialectal variants (before processing these corpora with standard NLP tools), for speech-to-text conversion, and so on.
There are two main approaches: rule-based and data-driven. Finite-state technology is adequate for both approaches.
In the first approach, a grammar is written based on a linguistic description of the changes between the variant and the standard (or pivot) language. Toolkits for describing phonological/morphological changes are used for this; we will use foma.
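As a rough illustration of the rule-based idea, the following sketch is written in Python rather than in foma: each rule is a regular-expression rewrite applied in a fixed order, which loosely mirrors composing a cascade of obligatory rewrite rules in a finite-state toolkit. The rules and the example word are hypothetical, invented only to show the mechanism; they do not describe any real dialect.

```python
import re

# Hypothetical ordered rewrite rules (variant -> standard), for illustration only.
# Each rule sees the output of the previous one, roughly like a composed cascade
# of rewrite rules in a finite-state toolkit.
RULES = [
    (r"^h", ""),              # drop a word-initial h
    (r"ll", "l"),             # simplify a double l
    (r"([aeiou])x", r"\1s"),  # x after a vowel becomes s
]


def normalize(word, rules=RULES):
    """Apply each rewrite rule to the word, in order."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


print(normalize("hallax"))  # alas
```

Rule order matters: a rule can create or destroy the context of a later rule, which is why a grammar of this kind is written as an ordered cascade.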
The data-driven approach is based on a parallel list of words (pairs of equivalent words in the variant and the standard language). The toolkit learns from this list and generalizes the changes. The noisy-channel model is very popular for this kind of task. We will use Phonetisaurus for this.
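A toy version of the noisy-channel idea can be sketched in Python: choose the standard word w that maximizes P(w) · P(variant | w). Here P(w) is a unigram count over the known standard words, and P(variant | w) is estimated from the character edits (substitutions, insertions, deletions) read off a minimum-edit alignment of each training pair, with add-one smoothing. The class name, the word pairs and the extra lexicon are hypothetical. Phonetisaurus uses its own WFST-based models, so this only illustrates the principle, not that toolkit's method.

```python
import math
from collections import Counter


def align(src, tgt):
    """Minimum-edit (Levenshtein) alignment of src onto tgt as (src_char, tgt_char)
    pairs; '' marks a deletion (c, '') or an insertion ('', c)."""
    n, m = len(src), len(tgt)
    # dp[i][j] = edit distance between src[:i] and tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = int(src[i - 1] != tgt[j - 1])
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + sub)
    # Trace back one optimal path, preferring match/substitution.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + int(src[i - 1] != tgt[j - 1]):
            pairs.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((src[i - 1], ""))
            i -= 1
        else:
            pairs.append(("", tgt[j - 1]))
            j -= 1
    return pairs[::-1]


class NoisyChannelNormalizer:
    """Hypothetical toy normalizer: argmax_w P(w) * P(variant | w)."""

    def __init__(self, pairs, lexicon=()):
        # pairs: (variant, standard) word pairs; lexicon: extra standard words.
        self.ops = Counter()  # counts of (standard_char, variant_char) edits
        self.src = Counter()  # counts of each standard-side symbol
        for variant, standard in pairs:
            for s, v in align(standard, variant):
                self.ops[(s, v)] += 1
                self.src[s] += 1
        chars = {c for pair in pairs for word in pair for c in word}
        self.vocab = len(chars) + 1  # output symbols, plus '' for a deletion
        # Unigram "language model" over standard words.
        self.lm = Counter(standard for _, standard in pairs)
        self.lm.update(lexicon)
        self.total = sum(self.lm.values())

    def channel_logprob(self, variant, standard):
        # log P(variant | standard) along one best alignment,
        # with add-one smoothing of each edit probability P(v | s).
        return sum(
            math.log((self.ops[(s, v)] + 1) / (self.src[s] + self.vocab))
            for s, v in align(standard, variant)
        )

    def normalize(self, variant):
        # Pick the known standard word maximizing log P(w) + log P(variant | w).
        return max(
            self.lm,
            key=lambda w: math.log(self.lm[w] / self.total)
            + self.channel_logprob(variant, w),
        )


pairs = [("kasa", "casa"), ("kama", "cama"), ("kara", "cara")]
model = NoisyChannelNormalizer(pairs, lexicon=["cosa", "cola"])
print(model.normalize("kosa"))  # cosa
```

Because the c→k substitution is learned from the pairs, the model normalizes `kosa` to `cosa` even though that pair never appears in the training list. Candidates are limited to known standard words, which keeps the search trivial in this sketch.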
Basic material: slides_inaki.zip
Rule-based approach: foma (http://code.google.com/p/foma/)
Data-driven approach: Phonetisaurus (http://code.google.com/p/phonetisaurus/)
Development of a simple normalization tool for a language or dialect (ideally, students will propose a real problem for which a word list or a formal description is available).