I was writing a post and it turned out a bit too long, so I figured an article would suit better.
Last year we won the IBM challenge at the TechCrunch hackathon in Paris during the Viva Technology event. Quick recap below and a cameo of me doing some explaining. (After a 6-hour road trip and no sleep for over 32 hours, can you tell?)
But I still owed you a video of our winning concept! We built an app to translate sign language to (spoken) text and integrated text-based emotion recognition on the IBM Watson Platform, using the Vision API:
Pretty cool if you ask me 🙂 Keep in mind, none of us were actual sign language users. We just gave it our best shot at learning overnight and invented our own sign for the nonexistent word ‘hackathon’.
Non-data science enthusiasts may stop reading now. It’s gonna get a bit more technical.
After the hackathon and Watson trial, I built another PoC using the open-source tool DarkFlow. (After I was able to ambush my cousins for more training data during a family meet-up.)
“This Hackathon is awesome!”
It’s just four words, and it still took over an hour on a GeForce GTX 1080 Ti, with 100 pictures per class, to get it working on out-of-sample yours truly. You can imagine the problem grows rapidly when you need to identify and distinguish more signs/gestures. And to string words into sentences you’d need to interpret context, because it turns out that in sign language not all verbs can be inflected. But for a limited vocabulary, this concept would definitely work. And the network itself can still be tweaked as well, of course:
Between working for my start-up Wavy and my current client, it’s a shame I don’t have any time to develop it further. But if anyone else wants to give it a try: DarkFlow lets data scientists train their own real-time object detection models using a TensorFlow implementation of the famous Darknet framework. Some of you will have seen Darknet without even knowing it: it’s the framework behind the real-time object detection videos made with YOLO.
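For anyone who wants to pick this up, here is a rough sketch of what training a small four-class detector looks like with DarkFlow's documented `flow` CLI. The config name, paths, and labels below are placeholders for your own setup, not what I used; DarkFlow expects Pascal VOC-style XML annotations alongside your images.

```shell
# Copy an existing config and adapt it for 4 classes:
# in the [region] layer set classes=4, and in the last [convolutional]
# layer set filters = num * (classes + 5) = 5 * (4 + 5) = 45.
cp cfg/tiny-yolo-voc.cfg cfg/tiny-yolo-voc-4c.cfg

# List the four labels, one per line, in labels.txt, e.g.:
#   this
#   hackathon
#   is
#   awesome

# Fine-tune from pre-trained weights on your annotated images.
flow --model cfg/tiny-yolo-voc-4c.cfg --load bin/tiny-yolo-voc.weights \
     --train --annotation train/annotations --dataset train/images \
     --gpu 1.0

# Run the latest checkpoint live on your webcam.
flow --model cfg/tiny-yolo-voc-4c.cfg --load -1 --demo camera --threshold 0.5
```

With only a handful of classes and ~100 images each, fine-tuning a tiny YOLO variant like this is exactly the regime where an hour or so on a single GPU gets you something usable.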