Those who are a little apprehensive about the inevitable robotic uprising may want to skip this one.
Amazon has just announced three new AI services for AWS users at this year’s re:Invent conference. The services range from intelligent conversational user interfaces to smart image recognition and human-like text-to-speech, and they all allow AWS customers to engage their users in a way that is straight out of science fiction.
People excited about the next big thing tech will bring us need look no further than something we’ve been doing since the dawn of mankind: talking. To this point, we’ve been poking, prodding, and swiping at our devices to get them to do our bidding. Speech has started to surface here and there, but the limited power of our devices has made it more frustrating than useful. The cloud changes things, and does so in a huge way. This brings us to the first of Amazon’s new AI services, Lex.
Amazon Lex gives you the tools needed to create conversational interfaces with voice and text through automatic speech recognition and natural language understanding. To put it much less technically, Lex allows your interface to understand speech the way an actual person does. By deriving the meaning behind your words, rather than forcing you to adhere to a particular syntax, Lex makes voice and text interaction as simple as talking to a human being. It also allows layered conversations that can reference earlier turns, just like an actual human agent, so you don’t have to keep repeating yourself.
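If you want a feel for what that looks like in practice, here’s a minimal sketch of calling a published Lex bot from Python with boto3. The bot name, alias, and utterance are hypothetical placeholders, not part of Amazon’s announcement.

```python
import boto3

# Runtime client for sending user utterances to a published Lex bot
lex = boto3.client("lex-runtime", region_name="us-east-1")

# Hypothetical bot "BookTrip" published under the alias "prod"
response = lex.post_text(
    botName="BookTrip",
    botAlias="prod",
    userId="demo-user-42",  # any stable identifier for this conversation
    inputText="I need a hotel in Chicago this weekend",
)

# Lex returns the intent it recognized, any slots it has filled so far,
# and a prompt to keep the layered conversation going.
print(response["intentName"], response["dialogState"])
print(response.get("slots"))
print(response.get("message"))
```

Because the `userId` ties successive calls together, the bot can keep asking follow-up questions until it has everything it needs, which is exactly the “don’t repeat yourself” behavior described above.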
Twilio, the huge global company behind the automated messages from services such as Uber, eBay, WhatsApp, eHarmony, and many others, had nothing but praise for the new service. “Developers and businesses use Twilio to build apps that can communicate with customers in virtually every corner of the world,” said Benjamin Stein, Director of Messaging Products, Twilio. “Amazon Lex will provide developers with an easy-to-use modular architecture and comprehensive APIs to enable building and deploying conversational bots on mobile platforms. We look forward to seeing what our customers build using Twilio and Amazon Lex.”
Another new service announced at re:Invent was Amazon Polly. Where Lex lets machines understand our language the way humans do, Polly lets them talk back to us in turn. The benefits are far-reaching. Whether in mobile apps, devices, appliances, or other interfaces, Polly lets developers submit text that is then streamed back as audio. It offers 47 convincingly human-sounding voices across 24 languages, in both male and female varieties and with a range of accents. The speed, consistency, and quality of Polly are unparalleled, and pricing, as with most Amazon Cloud services, is based solely on the amount of text converted.
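Here’s a rough sketch, again using boto3, of what handing Polly a snippet of text and getting speech back might look like; the voice and output file chosen here are just illustrative.

```python
import boto3

polly = boto3.client("polly", region_name="us-east-1")

# Convert a short passage of text into an MP3 stream.
# "Joanna" is one of Polly's US English voices; any supported VoiceId works.
response = polly.synthesize_speech(
    Text="Those who are apprehensive about the robotic uprising may want to skip this one.",
    OutputFormat="mp3",
    VoiceId="Joanna",
)

# The audio comes back as a streaming body; write it out to a playable file.
with open("speech.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```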
As a widely respected, reliable, and far-reaching organization, The Washington Post needs no introduction. Publishing more than 1,200 stories a day, the Post finds Polly of particular value. “We’ve long been interested in providing audio versions of our stories, but have found that existing text-to-speech solutions are not cost-effective for the speech quality they offer,” said Joseph Price, Senior Product Manager, The Washington Post. “With the arrival of Amazon Polly and its high-quality voices, we look forward to offering readers more rich and versatile ways to experience our content.”
The last of the three AI services launched at this year’s massive re:Invent was Amazon Rekognition. It lets developers create applications with strikingly effective image recognition capabilities through deep learning. By looking at borders, colors, shapes, and other characteristics of a given image, Rekognition can tell you whether you’re looking at a boat, a cat, a forest, or an ice cream sundae. One of the more interesting applications shown off at the launch was a travel app integration that suggested destination photos the user might be interested in based on a series of questions, without relying on previously embedded tags or image names.
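For the curious, a minimal sketch of asking Rekognition to label a photo with boto3 might look like this; the S3 bucket and file name are hypothetical.

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# Ask Rekognition to label a photo stored in S3 (bucket and key are made up here).
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-travel-photos", "Name": "sundae.jpg"}},
    MaxLabels=10,
    MinConfidence=75,
)

# Each label comes back with a confidence score, e.g. "Ice Cream: 93.4".
for label in response["Labels"]:
    print(f"{label['Name']}: {label['Confidence']:.1f}")
```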
It also has value for facial search and recognition. SmugMug, whose customers store billions of photos daily, has an obvious use for Rekognition. Rather than having users spend their time searching and tagging photos manually, Rekognition does all of that quickly and accurately, leaving them free to focus on sharing and reliving the memories rather than slogging through endless thumbnails and folders.
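A sketch of how that kind of facial search could work against a Rekognition face collection, with hypothetical bucket, collection, and image names, might look something like this:

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# Create a face collection once, then index faces from uploaded photos into it.
rekognition.create_collection(CollectionId="family-albums")
rekognition.index_faces(
    CollectionId="family-albums",
    Image={"S3Object": {"Bucket": "my-photo-bucket", "Name": "albums/beach-2016.jpg"}},
    ExternalImageId="beach-2016",
)

# Later, find every indexed photo that contains the person in a query image.
matches = rekognition.search_faces_by_image(
    CollectionId="family-albums",
    Image={"S3Object": {"Bucket": "my-photo-bucket", "Name": "query/grandma.jpg"}},
    FaceMatchThreshold=90,
)
for match in matches["FaceMatches"]:
    print(match["Face"]["ExternalImageId"], match["Similarity"])
```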
In addition to these services, AWS recently announced it is investing significantly in MXNet, an open source distributed deep learning framework, initially developed by Carnegie Mellon University and other top universities, by contributing code and improving the developer experience. MXNet will enable machine learning scientists to build scalable deep learning models that can significantly reduce the training time for their applications. For more information on AWS support for MXNet, visit: http://www.allthingsdistributed.com/2016/11/mxnet-default-framework-deep-learning-aws.html.
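For a taste of what building a model in MXNet looks like, here is a minimal sketch of its classic symbolic API, assuming the mxnet Python package is installed; the layer sizes are arbitrary.

```python
import mxnet as mx

# Tiny symbolic network: input -> 128 hidden units -> 10-way softmax.
data = mx.sym.Variable("data")
fc1 = mx.sym.FullyConnected(data=data, num_hidden=128, name="fc1")
act1 = mx.sym.Activation(data=fc1, act_type="relu", name="relu1")
fc2 = mx.sym.FullyConnected(data=act1, num_hidden=10, name="fc2")
net = mx.sym.SoftmaxOutput(data=fc2, name="softmax")

# The same symbol can be bound to a CPU or to multiple GPUs, which is where
# MXNet's scalable, distributed training story comes in.
mod = mx.mod.Module(symbol=net, context=mx.cpu())
```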
Whether or not we’ll be fighting our toasters for control over our homes remains to be seen.