A while back I took the plunge and started to look into programming for Alexa and Google Home devices. Here’s what I discovered.
Firstly, both Google and Alexa appear to work in much the same way. Forget all those ideas you had about big super-computers in the background analysing everything you say and then trying to match it to what you want to happen. What I discovered is that it works rather differently from what you might expect. That picture is close, but there’s much more human input involved.
For example, consider this command: “Alexa, turn on the upstairs light”
For this to work, Alexa needs to understand the intention (or intent), which in this case is to turn a device on. It also needs to know which device, if there is more than one to choose from. So we might have multiple lights, but only one of these has been called the ‘upstairs’ light. If so, Alexa will be able to complete this action. If not, she will revert to a back-up action, which will be something like “I’m sorry but I don’t understand”. This is called a fallback and is equivalent to the else part of an if-then-else structure.
So, programmatically it looks a bit like this:
If intent = "turn on" and device = "upstairs light" then
    turn on the device called "upstairs light"
Else
    respond with the default fallback response
End If
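For anyone who prefers real code, here is a minimal Python sketch of that same if/else idea. It is only an illustration: the `devices` dictionary, the `handle` function and the response wording are all names I have made up, not part of the Alexa or Google APIs.

```python
# Hypothetical device registry: device name -> whether it is switched on.
# The names and states are made up for illustration.
devices = {"upstairs light": False, "kitchen light": False}

# Default response used when nothing matches (the "else" branch).
FALLBACK = "I'm sorry but I don't understand"


def handle(intent, device):
    """Carry out the action if the intent and device are both known, else fall back."""
    if intent == "turn on" and device in devices:
        devices[device] = True  # stand-in for actually switching the light on
        return f"OK, the {device} is on"
    return FALLBACK


print(handle("turn on", "upstairs light"))  # known device, so the action runs
print(handle("turn on", "garage light"))    # unknown device, so we get the fallback
```

The second call falls through to the fallback because no device has been given the name "garage light".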
Further to this, it’s left to the coder to think up all the possible combinations of speech which could be used to trigger this action. When we set up the intent block, we include as many phrases as we can think of which would trigger it. So the human programmer specifies the word combinations a person might use, and the machine just needs to match any one of them to perform the desired action.
So the magic is gone. Now you know that those little round microphones simply convert what you say into text strings. That is a great achievement, but it’s not quite the same as giving you access to a super-intelligent computer the size of a city that can work out the meaning of life, the universe and everything in the spare cycles between turning your bathroom light on and off.
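To make the phrase-matching idea concrete, here is a rough Python sketch. It assumes the simplest possible approach, an exact comparison against phrases typed in by the programmer; that is my simplification for illustration and not a description of how either platform is implemented internally. The names `TRAINING_PHRASES`, `KNOWN_DEVICES` and `match_intent` are hypothetical.

```python
# Hypothetical training phrases, written by the programmer for each intent.
# "{device}" marks where the device name is expected to appear.
TRAINING_PHRASES = {
    "turn_on": ["turn on the {device}", "switch on the {device}", "{device} on"],
    "turn_off": ["turn off the {device}", "switch off the {device}", "{device} off"],
}

# Hypothetical list of device names the user has set up.
KNOWN_DEVICES = ["upstairs light", "bathroom light"]


def match_intent(utterance):
    """Return (intent, device) for the first phrase that matches, or None for the fallback."""
    text = utterance.lower().strip()
    for intent, phrases in TRAINING_PHRASES.items():
        for phrase in phrases:
            for device in KNOWN_DEVICES:
                # Exact comparison: an assumption made to keep the sketch simple.
                if text == phrase.format(device=device):
                    return intent, device
    return None  # nothing matched, so the caller should use the fallback response


print(match_intent("Turn on the upstairs light"))
print(match_intent("What is the meaning of life"))
```

Any wording that is not in the list returns `None`, which is why it pays to include as many phrases as you can think of.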
Still, it’s a great weekend project and you will get quick results from trying this out, so why not have a go? I would recommend using the Google ecosystem to start with. I only got so far with the Amazon tutorial before it started asking for credit card numbers and implying that I might want a plethora of AWS services, without any explanation of why, or of exactly how I might trigger charges before the end of my free 12-month trial. Apparently I would be fine so long as I didn’t break any of the poorly described rules. I decided to cancel at that point, and since there wasn’t a cancel button I just closed the page without submitting, so it will be interesting to see what happens. If you want to try it though, here’s the article that got me this far:
The Google experience was much better. I followed a link which took me to their dialogflow.com site. It had a “sign up for free” message which allowed me to log in with my Google account (on future visits you just need to click on the ‘go-to-console’ link at the top-right of the page). My first app didn’t take long to put together, and so far I’ve heard nothing about needing to pay for virtual servers or other infrastructure. That is great, and it makes this useful to organisations like Wavemaker, where it can be taught to young coders who won’t even have their own credit cards. So kudos to Google for this. Anyway, here’s a great link I used to get started:
Have a go, and leave a comment if you discover anything amazing.