Thursday, January 07, 2010

LingMA update, 1-7-2010

So, as I mentioned a few weeks ago, I reset my deadline for the first cut of my project software to December 31. Recall that this software would:

  1. parse input segmentation file, determine lexical items
  2. invoke subject field detection module
  3. parse input termbases for lexical item matches

Item 1 is a simple parsing task. Easy enough to implement, but only doable once I have input files to parse in the proper format. This is a task I neglected to itemize in my initial project plan, which is unfortunate because it will take me some time. I plan to work on that over the next week, so let's say January 15 as a target date by which I'll have at least 2 or 3 input segmentation files.

Item 2 is now unnecessary, as my committee chair suggested I avoid the need to integrate a subject field detector by having the user specify the subject field at processing time. One open question in my mind is how we'll represent the subject field: is it simply free text, which is much less useful, or is it drawn from a specific ontological system? If the latter, what ontology should we use? This seemingly simple subtask suddenly looks complex again.

Item 3 is the simplest subtask in this first incarnation of the software: all we'll need is a good XML/XPath parser to do the lookup. However, as with the segfiles, writing the parser isn't enough if there's no input data to parse. So after producing some sample segfiles with identified target terms, I'll need to hand-create a TBX file with enough entries to be useful. Setting January 22nd for this deadline is likely overoptimistic. Even January 29th is probably pushing it, but that's the date I'm going to set.
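
To make the Item 3 lookup a bit more concrete, here is a minimal sketch of what it might look like, assuming Python's standard xml.etree.ElementTree module and a tiny hand-made TBX-style sample. The element layout (termEntry / langSet / tig / term, with a subjectField descrip), the find_entries helper, and the sample entries are all illustrative assumptions, not the project's actual data or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-made TBX-style sample; a real termbase would be read from a file.
SAMPLE_TBX = """<martif type="TBX" xml:lang="en">
  <text>
    <body>
      <termEntry id="c1">
        <descrip type="subjectField">computing</descrip>
        <langSet xml:lang="en">
          <tig><term>hard drive</term></tig>
        </langSet>
        <langSet xml:lang="es">
          <tig><term>disco duro</term></tig>
        </langSet>
      </termEntry>
      <termEntry id="c2">
        <descrip type="subjectField">finance</descrip>
        <langSet xml:lang="en">
          <tig><term>bond</term></tig>
        </langSet>
      </termEntry>
    </body>
  </text>
</martif>"""


def find_entries(root, lexical_item, subject_field=None):
    """Return the ids of termEntry elements that list lexical_item as a term.

    If subject_field is given, only entries whose subjectField descrip
    matches it exactly are returned (free-text comparison, an assumption).
    """
    matches = []
    for entry in root.iter("termEntry"):
        # Collect every term in the entry, across all language sets.
        terms = [t.text.strip() for t in entry.iter("term") if t.text]
        if lexical_item not in terms:
            continue
        if subject_field is not None:
            # Limited XPath predicate support in ElementTree is enough here.
            fields = [d.text for d in entry.findall("descrip[@type='subjectField']")]
            if subject_field not in fields:
                continue
        matches.append(entry.get("id"))
    return matches


root = ET.fromstring(SAMPLE_TBX)
print(find_entries(root, "hard drive"))
print(find_entries(root, "bond", subject_field="finance"))
```

The optional subject_field argument is one way the user-specified subject field from Item 2 could narrow the matches. Here it is compared as plain text, which sidesteps the ontology question rather than answering it.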

I just got an email from the BYU Linguistics dept secretary, saying that in order to graduate this semester I'd have to have my final thesis / project writeup delivered to my committee by the first week of February. At this stage in the process, I think it's pretty obvious that I won't be ready to defend by then. Soooooo... Now shooting for Summer term.
