吳實錄

Annals of Wu

漢藏緬語々言研究ㄟ博客
a sinotibetoburman linguistics blog
2014-09-12

A Move To Better Transcriptions And Offline Support phonemica - development

We're at a point in our lives where we need to make some hard decisions about Phonemica. It's not sustainable to keep it going the way it is. Expenses are out of pocket, and we're just not able to keep funneling earnings from our day jobs into the project when there are so many other bills that also need paying. Since we're not ready to give up on the project (and I probably won't ever be at that point), one of the things that absolutely has to happen is better automation. We need to give users more control over the data, and make things so ridiculously easy to use that our minuscule but awesome team doesn't need to be in there manually processing everything. A huge step toward that end is the new transcript system, which I've written about previously.

Another important step is to get more linguists involved. Not necessarily in consultant roles or working on other people's recordings. Instead, we'd like to see more people using Phonemica as a platform for sharing their data with the world. While we understand the desire to keep things under wraps during the publishing process, we're strong believers in open data and making things available. We would also love to see people fine-tuning transcriptions and improving on the data.

One way we're doing that is by offering exports of the transcript data for use with software like ELAN. Starting now, transcripts from entries can be downloaded as .eaf transcript files for use with ELAN. These are generated on the fly, so the file will reflect the transcript as it is at the moment you download it.
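For anyone who would rather poke at a downloaded file in a script than open it in ELAN, here is a minimal sketch of reading an .eaf file in Python. An .eaf file is plain XML, with time slots listed under TIME_ORDER and the annotations grouped into tiers. The tier name "transcript" and the sample contents below are made up for illustration; the tiers in an actual Phonemica export may be named and organised differently, so treat this as a starting point rather than a description of our export format.

```python
import xml.etree.ElementTree as ET


def read_eaf_segments(eaf_xml):
    """Return a list of (tier_id, start_ms, end_ms, text) tuples from EAF XML text."""
    root = ET.fromstring(eaf_xml)

    # Map each time slot ID to its offset in milliseconds.
    # Slots without a TIME_VALUE are unaligned and are skipped here.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }

    segments = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            # start/end are None when the referenced slot is unaligned.
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            segments.append((tier_id, start, end, text))
    return segments


# Hypothetical, heavily trimmed sample; a real export carries more metadata.
SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcript" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>example utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""

if __name__ == "__main__":
    for tier_id, start, end, text in read_eaf_segments(SAMPLE):
        print(tier_id, start, end, text)
```

To use it on a real download, read the file's contents into a string and pass that to read_eaf_segments in place of SAMPLE.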

In addition to this, I'm nearly done coding an .eaf importer. It will be made available for people who wish to work on transcripts on their own computers and then upload the files to Phonemica to be converted into an online entry. To integrate this properly, we need to establish better entry ownership so that people can lock and hide their own submissions while they work on the data offline. This is something we want to do anyway, both to give people more control and so that they can finish preparing an entry before making it public.

The new transcript system is going up tonight. The .eaf downloads are already there, but a little bit hidden at the bottom of the entry pages. With a little luck the .eaf importer will be done some time next week.

These are all things we wanted to have from the start but never had the resources for. Now that the site is up and stable and has the data and users to test new features with, we can start to integrate ideas that until now had never seen the light of day. There's much more we want to include in the future, and we will keep working to get it up.

In the meantime, we think these last few changes will make a real, positive difference to the way you use the site.

    About

    A semi-academic linguistics blog about Sinotibetan, previously focused primarily on Wú, a Sinitic language spoken in the Yangtze Delta region. Topics now include historical linguistics, documentation, language rights, sociolinguistics and learning materials, as well as acting as the dev blog for Phonemica from time to time.

    I'm a linguist based in Asia, working on documentation and historical development of Sinotibetan. In addition to academic research, I'm heavily involved in Phonemica, an organisation that promotes crowd-sourced preservation of local languages.

    I'm currently in the field, so getting in touch isn't easy. However you can try to email me at the following address and I'll respond as soon as I'm able:

    yhilan.ko@gmail.com
    © 2009-2017