Sunday, May 6, 2018

Automatic writing with Deep Learning: Preface

Quite many machine and deep learning problems are directed at building a mapping function of roughly the following form:

Input X ---> Output Y,


X is some sort of an object: an email text, an image, a document; 

Y is either a single class label from a finite set of labels, like spam / no spam, detected object or a cluster name for this document or some number, like salary in the next month or stock price.

While such tasks can be daunting to solve (like sentiment analysis or predicting stock prices in realtime) they require rather clear steps to achieve good levels of mapping accuracy. Again, I'm not discussing situations with lack of training data to cover the modelled phenomenon or poor feature selection.

In contrast, somewhat less straightforward areas of AI are the tasks that present you with a challenge of predicting as fuzzy structures as words, sentences or complete texts. What are the examples? Machine translation for one, natural language generation for another. One may argue, that transcribing audio to text is also such type of mapping, but I'd argue it is not. Audio is a "wave" and the speech detection is an okay solved task (with state of the art above 90% of accuracy), however such an algorithm does not capture the meaning of the produced text,  except for where it is necessary to do the disambiguation of what was said. Again, I have to make it clear, that audio->text problem is not at all easy with its own intricacies, like handling speaker self corrections, noise and so on.

Lately, the task of writing texts with a machine (e.g. here) caught my eye on twitter. Previously, papers from Google on writing poetry or other text producing software were giving me creepy feelings. I somehow undermined the role of such algorithms in the space of natural language processing and language understanding and saw only diminishing value of such systems to users. Again, any challenging tasks might be solved and even bring value to solving other challenging tasks. But who would use an automatic poetry writing system? Why would somebody, I thought, use these systems -- just for fun? My practical mind battled against such "fun" algorithms. Again, making an AI/NLProc system capable of producing anything sensible is hard. Take the task of sentiment analysis, where it is quite unclear what the agreement between experts is, not to mention non-experts.

I think this post has poured enough of text onto the heads of my readers. I will use this post as a self-motivating mechanism to continue the research with systems producing text. My target is to complete the neural network training on the text from my Master thesis and show you some examples for your judgement of the usefulness of such systems.

Saturday, May 5, 2018

AI for lip reading

It is exciting to push your imagination for where else can you apply AI, machine learning and most certainly -- deep learning, that is so popular these days. I came across this question on quora that provoked me to think a bit how would one go about training a neural network to lip read. I don't actually know what made me answer this question more: that found myself in an unusual context sitting on an Angularjs meetup at Google offices in New York City (after work, usual level tired) or the question itself. Whatever the reason, here is my answer:


I would probably first start with formalizing what is lip reading process from a human understandable algorithm point of view. May be it is worth to talk to a professional, like a spy or something. Obviously you need training data. Understanding, what is lip reading from the algorithm perspective will affect on what data you need.

    1. To read a word of several syllables you’d need a sequence of anchor lip positions, that represent syllables. Or probably vowels / consonants. See, I don’t know, which one is best. But you’d need to start with the lowest level possible out of which you can compose larger sequences, like letters -> syllables -> words. Let’s call these states.
    2. A particular lip posture (is that the right word?) will most probably map to ambiguous states.
    3. Now the interesting part is how to resolve the ambiguities. Number 2 produces several options. Out of these you can produce a multitude of words that we can call candidates.
    4. Then you need to score candidates based on some local context information. Here it turns into a natural language understanding.
    5. I'd start with seq2seq.

    Tuesday, January 16, 2018

    New Luke on JavaFX

    Hello and Happy New Year to my readers!

    I'm happy to announce release of completely reimplemented Luke -- using JavaFX technology.  Luke is the toolbox for analyzing and maintaining your Lucene / Solr / Elasticsearch index on low level. 

    The implementation was contributed by Tomoko Uchida, who also did the honors of releasing it.

    The excitement of this release is supported by the fact, that in this version Luke becomes fully compliant with ALv2 license! And it gets very close to be contributed to Lucene project. At this point we need lots of testing to make sure JavaFX version is on par with the original thinlet based one.

    Here is how load index screen looks like in new JavaFX luke:

    After navigating to the Solr 7.1 index and pressing OK, here is what luke shows:

    I have loaded an index of Finnish wikipedia with 1,069,778 documents, and luke tells me that the index does not have deletions and was not optimized. Let's go ahead and optimize it:

    Notice, that on this dialogue you can request only expunging of deleted docs, without merging (the costly part for large indices). After optimization's complete, you'll have a full log of actions in front of you to confirm the operation was successful:

    You could also opt for checking the health of your index via Tools -> Check index menu item:

    Let's move to the Search tab. It has changed slightly in that search box has moved to the right, while search settings and other knobs were moved to the left.

    Thinlet version:

    JavaFX version:

    It is more intuitive UI now in terms of access to various tools like Analyzer, Similarity (now with access to parameters of new BM25 ranking model, that became default in Lucene and default in luke) and More Like This. There is a new Sort sub-tab that lets you choose a primary and secondary field to sort on. Collectors tab however is gone: please let us know, if you used it for some task -- would love to learn.

    Moving on to the Analysis tab, I'd like to draw your attention towards really cool functionality of loading custom jars with your implementation of a character filter, tokenizer or token filter to form your custom analyzer. Test these right in the luke UI without the need to reload shards in your Solr / Elasticsearch installation:

    Last, but not least is Logs tab. Essentially you should have been missing it for as long as luke exists: getting a handle of what's happening behind the scenes during an error case or a normal operation.

    In addition, this version of Luke supports the recently released Lucene 7.2.0.