March 25, 2020
Technology

Working Better at RevUnit: Getting Started with Alexa

At RevUnit, integrating emerging technologies like Alexa into value-added workflows is part of who we are. We like creating new solutions that solve problems and want to take a moment to show you an example of how we do this.

To illustrate how we identify and solve problems, let me pull you into our kitchen. The kitchen has an exterior door that we usually keep locked, which means meaningful conversations get interrupted whenever someone has to get up and open it for guests. Fortunately, we like seeing how emerging technologies can solve problems like this, so I am going to walk you through building an Alexa skill that engages the door locks so the door can be opened by verbal command.

First off, a disclaimer: the purpose of this post is to show how you can do something meaningful (and fun) with Alexa, not how to build a bulletproof, secure, voice-commanded door system. Now let's get started!

Kisi Door Locks

Kisi makes some really cool door locks that are typically engaged from your smartphone. Kisi adds value because it makes granular access control really easy. Moreover, it has an attractive REST API that lets you integrate new functionality on a per-user basis by simply authenticating and then unlocking the door.

To authenticate, you need your email and password. The API returns a token that Kisi calls the secret, which is used in subsequent requests. Using urllib2 in the Python 2.7 runtime, you can authenticate like this:

   import urllib2
   import json

   req = urllib2.Request(
       'https://api.getkisi.com/users/sign_in',
       json.dumps({
           'user': {
               'email': '<email>',
               'password': '<password>'
           }
       }),
       {
           'Accept': 'application/json',
           'Content-Type': 'application/json'
       }
   )
   f = urllib2.urlopen(req)
   secret = json.loads(f.read())['secret']

Now that we have the secret token, we can turn around and make the request to unlock the door. Of course, you have to know the door_id, which you can find either through the API or in their web application. Just send an unlock request like this:

   req = urllib2.Request(
       'https://api.getkisi.com/locks/<door_id>/unlock',
       json.dumps({}),
       {
           'Accept': 'application/json',
           'Content-Type': 'application/json',
           'X-Login-Secret': secret
       }
   )
   urllib2.urlopen(req)

That's it. Your door unlocks. Click.
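
As an aside, if you'd rather look up the door_id programmatically than in Kisi's web application, a minimal sketch like this should work, assuming Kisi's /locks endpoint lists the locks your account can access:

   req = urllib2.Request(
       'https://api.getkisi.com/locks',
       None,
       {
           'Accept': 'application/json',
           'X-Login-Secret': secret
       }
   )
   # Each lock includes its id and a human-readable name
   for lock in json.loads(urllib2.urlopen(req).read()):
       print lock['id'], lock['name']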

AWS Lambda

If you haven't used Amazon's Lambda service before, you are missing out. It's cool. Instead of renting hardware, virtualized hardware, or services notionally related to virtualized hardware, you rent execution environments that are ideal for event-driven systems, with granularity measured in milliseconds. Our door lock problem is precisely where this paradigm shines, and what sweetens the deal further is that Amazon makes routing an event from one of their other services, like Alexa, to Lambda really easy and secure by default.

Go ahead and fire up your AWS console at https://console.aws.amazon.com and select Lambda. In the configuration designer, click the Alexa Skills Kit option; this tells Lambda you want to route Alexa events to this function. We will circle back to this, because you will need the skill's Application Id from the Alexa Skills Kit, which we will only know after we set the skill up. Amazon CloudWatch Logs are automatically associated with the output of the Lambda function, so you can easily review anything that would normally go to stdout or stderr in a traditional runtime. Now just pick the runtime and the function entry point, which is the file name plus the function name: for example, if your code lives in lambda_function.py and the entry function is open_door, the handler is lambda_function.open_door.

The declaration of a Python Lambda function looks like this:

   def open_door(event, context):

The event and context are passed in from Amazon's event handling system. To make configuration simpler, environment variables can be set in the Lambda console, and you can access them in your code as usual:

   import os
   email = os.environ.get('email')

Since we aren't using event and context in this example, we can just drop the code from above that unlocks the door into our function. We do need to return a response so that Alexa knows what to say upon completion; this is just a dict that contains some simple control fields and SSML, which is markup for speech synthesis. Here is an example:

   return {
       'version': '1.0',
       'response': {
           'outputSpeech': {
               'type': 'SSML',
               'ssml': '<speak>Door unlocked</speak>'
           }
       }
   }
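
Putting the pieces together, here is a minimal sketch of the complete Lambda function, assuming the email, password, and door_id environment variables have been set in the Lambda configuration:

   import os
   import json
   import urllib2

   KISI_API = 'https://api.getkisi.com'
   JSON_HEADERS = {
       'Accept': 'application/json',
       'Content-Type': 'application/json'
   }

   def open_door(event, context):
       # Pull credentials and the target door from the Lambda environment
       email = os.environ.get('email')
       password = os.environ.get('password')
       door_id = os.environ.get('door_id')

       # Authenticate with Kisi to get the secret token
       req = urllib2.Request(
           KISI_API + '/users/sign_in',
           json.dumps({'user': {'email': email, 'password': password}}),
           JSON_HEADERS
       )
       secret = json.loads(urllib2.urlopen(req).read())['secret']

       # Unlock the configured door
       headers = JSON_HEADERS.copy()
       headers['X-Login-Secret'] = secret
       req = urllib2.Request(
           KISI_API + '/locks/%s/unlock' % door_id,
           json.dumps({}),
           headers
       )
       urllib2.urlopen(req)

       # Tell Alexa what to say on completion
       return {
           'version': '1.0',
           'response': {
               'outputSpeech': {
                   'type': 'SSML',
                   'ssml': '<speak>Door unlocked</speak>'
               }
           }
       }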

The only thing left is to note the Lambda ARN that is at the top right of the Lambda console. This is the identifier that we need to tell Alexa about so that she knows where to send requests.

For those who prefer the command line, all of this can also be done using the AWS CLI's Lambda commands.

Alexa and her Skills

Alexa skills are built around the paradigm of opening an application and engaging in zero or more subsequent verbal interactions. The difficult general problem of understanding spoken language is recast as a sequence of commands from which known tokens can be extracted and acted on. The vernacular Alexa uses to describe this is utterances and slots. Utterances are samples of how humans might make a simple statement. For example:

   "Open the door"
   "Open this door"
   "Open that door"

We need to enumerate the reasonable variations that might be encountered in practice so that the Alexa model can identify the intent of an utterance across the space of realistic human speech patterns. A slot is an element of an utterance whose possible values are enumerated and can be extracted. For example:

   "This is a door"
   "This is a table"

In this case, we can define a slot like this:

   "This is a {thing}"

Where thing takes on the enumerated values door or table. Clever, isn't it? A much harder problem is made tractable by breaking it into smaller sub-problems and improving detection quality through enumeration. This also hints at the limitations of the technology: if an utterance is not bite-sized and pattern-based, it doesn't fit the current paradigm well.
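
To make this concrete, here is a minimal sketch of how a slot value surfaces in the event your Lambda function receives; the intent name ThingIntent and the slot name thing are hypothetical names for this example:

   def handle_thing(event, context):
       request = event['request']
       if request['type'] == 'IntentRequest' and request['intent']['name'] == 'ThingIntent':
           # The matched slot value arrives nested under the intent, e.g. 'door' or 'table'
           thing = request['intent']['slots']['thing'].get('value')
           text = 'You said this is a %s' % thing
       else:
           text = 'Try saying: this is a door'
       return {
           'version': '1.0',
           'response': {
               'outputSpeech': {'type': 'PlainText', 'text': text}
           }
       }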

To create a skill, go to the Amazon Developer Console at https://developer.amazon.com and click Alexa Skills Kit under Alexa. From there, choose to add a new skill.

This brings you to the Skill Information form. You will need to name the skill and set the invocation name. There are some restrictions on the invocation name, but it can be mostly whatever you want. When you launch a basic Alexa skill, you speak to Alexa in terms of opening a skill:

   "Alexa, open ___"

The ___ part of the utterance is what constitutes the invocation name. Alexa allows you to pass information to the skill in the invocation, so an utterance like this is also valid:

   "Alexa, ask the weather skill about today's weather"

In this case, Alexa understands that weather skill is the invocation name for the skill, and today's weather is an option for a known slot, a more granular detail that will be passed to the skill handler. The built-in logic for launching a skill is actually quite complex (and limiting), and the two examples above barely scratch the surface of it. Regardless, we aren't dwelling on the more complex invocations here, just simple command and control.

As it turns out, Amazon is watching out for our security, so invocation names actually cannot contain the word door. To get around this, just use a synonym like doorway:

   "Alexa, open that doorway"

You will note that there is an Application Id on the Skill Information form. This is what you need to copy-paste over to the Alexa Skills Kit trigger configuration in Lambda. You can select No for all of the options under Global Fields.

Next, you want to visit the Interaction Model for the skill. You don't actually need to do anything for this particular skill except build the model (the button at the top), since it doesn't define new intents or slots. It is well worth your while, however, to look at how intents are defined using utterances and how slots are set up, as that is the bread and butter of more complex interactions.

Remember: using the Alexa Skills Kit, you build intents, potentially with slots, that map to user utterances. These are handed off to your Lambda code, accessible in the event argument of your function, and your Lambda logic tells Alexa what to do next. Simple, but you can see where the complexity lies: making sure you have adequate utterances and slots defined, particularly for more involved interactions.
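
As a rough sketch of that handoff, a handler that dispatches on the request type in the event might look like this; the speak helper is hypothetical, just to keep the responses tidy:

   def speak(text):
       # Hypothetical helper wrapping text in a minimal Alexa response
       return {
           'version': '1.0',
           'response': {
               'outputSpeech': {'type': 'PlainText', 'text': text}
           }
       }

   def lambda_handler(event, context):
       # Alexa sends a LaunchRequest when the skill is opened without an intent,
       # and an IntentRequest when an utterance matches one of your intents
       request = event['request']
       if request['type'] == 'LaunchRequest':
           return speak('Door unlocked')
       elif request['type'] == 'IntentRequest':
           return speak('Handling ' + request['intent']['name'])
       return speak('I did not understand that')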

The main Configuration decision for the skill is whether to use a Lambda ARN or an external HTTPS service. In our case, you just copy-paste the Lambda ARN in here. One of the benefits of using Lambda is that security and access control are taken care of for you by the exchange of identifiers under your account. You can select No for the geographical region endpoints and account linking. There is no need to select anything in the Permissions category.

If you have everything set up correctly so far, then you can test it using the Test screen. This allows you to hear what Alexa would sound like for various phrases, but more importantly, it allows you to type in an utterance and make sure that the response from your Lambda function behaves as you expect. This is very basic invocation testing, just a dummy light to make sure everything is connected and working.

There is nothing you need to do with Publishing Information or Privacy & Compliance; those sections are only relevant to publishing a public skill. At this point, your personal device will engage your new skill, provided everything is set up properly and associated with the same account. There are three things you can do with your newly created skill:

  1. Use it as a private custom skill with your account only
  2. Put some polish on it and submit it for review as a public skill
  3. Use it as a skill in the context of Alexa for Business

Publishing a public skill means filling out the Publishing Information and Privacy & Compliance screens. Alexa for Business lets you manage devices and skills for a fee.

What's Next?

If you have followed along so far and happen to have a similar problem with your door locks, you should be able to say:

   "Alexa, open that doorway"

The door should open! Click.

More importantly, if you haven't looked into working with Alexa before, you should now have a better appreciation for what we can and can't do with it. We live in an exciting time of rapid advancement in the underlying technology. While the technology imposes limits, a wide range of interactions are possible with a little creativity in problem solving.


WRITTEN BY
Colin Shaw
Director of Machine Learning, Colin comes from an analytical background in computational math, physics, and programming. He was one of the first people to graduate from Udacity's self-driving car program and has code that runs on a Lincoln MKZ. At RevUnit, Colin helps identify and implement solutions that help our customers work better.
