Intelligent Smart Home Assistant (SHA)
This project builds upon some of our previous projects, which added a text interface to our Home Control System (HCS) and voice-controlled home automation to our smart home. It also builds upon our smart home query interface, which exposes all elements of our smart home.
As well as integrating these capabilities, so that they can be considered part of our Home Control System (HCS), this project adds:
- Awareness of every single object modelled in our smart home, including all object attributes and current values. Knowledge of these elements is not programmed but extracted from the same XML configuration files that our Home Control System (HCS) uses.
- Full context and learning (memory) across numerous requests and the ability to undo previous requests.
- A fully personalised experience for all family members and the ability to handle multiple concurrent interactions. The SHA also supports requests from guests and unidentified people.
- A rich context that includes the time, the person holding the dialogue, the zone and location, the object being discussed, the object type, the specific attribute(s) and whether a question has been posed or a command made.
- The ability to track the location of family members and infer further things such as direction of travel, mode of transport, estimated time of arrival, etc. This includes knowing which family members are at home and the zone they are in.
- Understanding of time and date references. The SHA understands the concept of yesterday, tomorrow, later today, this afternoon, etc. as well as specified dates or days of the week.
- The SHA has a calendar and a memory of events and requests, which it can maintain and query.
- Previous work in this space was based upon a simple request-response mechanism. This project aims to use and extend the interactions, to enable information to be pro-actively provided that was not explicitly asked for.
- Taking this pro-active concept further, we also plan to add serendipity to the SHA interactions and have it provide personalised information of interest that is not necessarily relevant to the current dialogue.
As part of this project we have also given our smart home an identity and a name.
Put simply, this project aims to provide an interface to our smart home that has a user experience very similar to Apple's Siri (Speech Interpretation and Recognition Interface) personal assistant. This includes both speech (spoken voice) interaction (using speech recognition) and text-based (typed) interaction. Because the scope of this SHA is limited to our smart home only, we feel our objectives are realistic and achievable.
Many people focus on speech recognition when considering natural language interfaces. This is because voice control of things is intuitive and impressive (when it works). It is also very convenient in situations where your hands are being used to do other things, and it is useful in improving accessibility. Perhaps the biggest advantage though is that no learning is required. All of the family can use our smart home assistant, without any form of training.
The reason we have not focussed on speech recognition alone is that it doesn't work well in all environments. Background noise is a problem and sometimes you need the accuracy and confidence of text input, to ensure you will be understood. Natural language text input also has the advantage of providing privacy both in the request and the response. Other significant advantages are that our text-based interfaces provide both user identity and authentication. They also provide better remote access, when outside of our smart home.
Most voice control systems that run on mobile devices still require you to press a button on your device before you can speak and be recognised. There are some that listen for trigger commands and provide complete hands-free operation, but they typically require bespoke and expensive hardware.
A generic voice controlled interface to your smart home is only as secure as the physical access to the hardware. Typically, anyone can control your home, once they have access to the interface.
Many systems are also slow to respond, but we have taken an approach that is extremely responsive.
The SHA extracts knowledge from the XML configuration files used by our Home Control System (HCS) and can therefore reload them automatically (on detecting file changes), to update its knowledge of our home, its configuration and objects within it. It creates a model of our home and all the components within it and uses this to answer requests. It therefore has an understanding of all zones, sensors, devices, etc. (all objects) and their attributes.
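As a rough illustration of extracting objects from XML rather than programming them in, the sketch below parses a configuration document with the standard Java DOM parser. The element name `object` and the attributes `name`, `type` and `zone` are assumptions made for this example; the real HCS configuration schema is not described here. Reloading on file changes is not shown, but could be built on something like `java.nio.file.WatchService`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical loader: the element and attribute names are assumed, not the real HCS schema. */
class HomeModelLoader {

    /** Returns a "zone/name (type)" description for every <object> element, in document order. */
    static List<String> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("object");
            List<String> objects = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                objects.add(e.getAttribute("zone") + "/" + e.getAttribute("name")
                        + " (" + e.getAttribute("type") + ")");
            }
            return objects;
        } catch (Exception ex) {
            // Parser configuration, I/O and syntax errors are all reported the same way here.
            throw new IllegalArgumentException("Could not parse configuration", ex);
        }
    }
}
```

Because the model is derived from the configuration, adding a new device to the XML is enough for the assistant to know about it after the next reload.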
Our approach is based around extracting all of the context information from each new request. Any new context is added to previously held context information or used to update context elements. This context has a limited lifetime and is lost after a defined period of time. It will also be reset by certain requests. We maintain a history of each (successful) request and its associated context, so that we can 'undo' previous requests.
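The context handling described above could be sketched along the following lines. This is a minimal illustration, not the actual SHA code: the class name, the use of simple string key/value pairs for context elements and the lifetime passed to the constructor are all assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical dialogue context with a limited lifetime and an undo history. */
class DialogueContext {
    private final Duration lifetime;                       // assumed expiry period
    private final Map<String, String> elements = new HashMap<>();
    private final Deque<Map<String, String>> history = new ArrayDeque<>();
    private Instant lastUpdated;

    DialogueContext(Duration lifetime) {
        this.lifetime = lifetime;
    }

    /** Merges context extracted from a new request into whatever context is still valid. */
    void merge(Map<String, String> extracted, Instant now) {
        if (lastUpdated != null && Duration.between(lastUpdated, now).compareTo(lifetime) > 0) {
            elements.clear();                              // stale context is dropped
        }
        elements.putAll(extracted);                        // new elements added, existing ones updated
        lastUpdated = now;
    }

    /** Records a snapshot of the context of a successful request so it can be undone later. */
    void recordSuccess() {
        history.push(new HashMap<>(elements));
    }

    /** Returns the context of the most recent successful request, or null if there is none. */
    Map<String, String> undo() {
        return history.poll();
    }

    String get(String key) {
        return elements.get(key);
    }
}
```

With this shape, a follow-up request only needs to supply the element that changed (for example a new time reference), while everything else is carried over from the earlier request until the context expires.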
All requests and their associated context are logged, so that we can see and analyse all requests and learn from the on-going user interaction.
Once we have extracted the new context information, we then process it to determine the response and actions required.
Our SHA is programmed to respond like a real person and use simple but courteous language in a realistic (normal) fashion. It doesn't call me 'sir' or use long-winded sentences to convey small pieces of information. Acknowledgments are concise and simple, e.g. "Done". It will use greetings such as 'Good morning', 'Good afternoon' and 'Good evening' when appropriate, but only at the first interaction in each of these time periods.
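The "greet once per time period" rule could be sketched as follows. The period boundaries (12:00 and 18:00) and the class and method names are assumptions for illustration only.

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical greeter that greets each person once per morning, afternoon and evening. */
class Greeter {
    // Remembers the last date and period in which each person was greeted.
    private final Map<String, String> lastGreeted = new HashMap<>();

    /** Maps a time to a period; the 12:00 and 18:00 boundaries are assumed. */
    static String periodOf(LocalDateTime when) {
        int hour = when.getHour();
        if (hour < 12) {
            return "morning";
        }
        if (hour < 18) {
            return "afternoon";
        }
        return "evening";
    }

    /** Returns a greeting only for the first interaction in the current period. */
    Optional<String> greetingFor(String person, LocalDateTime when) {
        String period = periodOf(when);
        String key = when.toLocalDate() + "/" + period;
        if (key.equals(lastGreeted.get(person))) {
            return Optional.empty();                       // already greeted in this period
        }
        lastGreeted.put(person, key);
        return Optional.of("Good " + period);
    }
}
```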
We envisage two interfaces in order to access the SHA:
- A physical device roughly the size of a wall switch faceplate with a single 'press to talk' button and microphone. This may be fixed or portable.
- A Smartphone app with a single 'press to talk' button and text input area. Responses via the app will be spoken out or in text form, depending on how the request was made. Speech and text can be used interchangeably throughout a session.
Context - Not everything needs rich context. Some simple commands and keywords are quickly and easily understood, without any complex processing. When these are identified it is better to action them directly and ensure rapid response.
Calendar - With an identity and calendar, the SHA can keep track of certain requests over time. This could be things like holidays, etc. The calendar itself is another interface and way to interact with our Home Control System (HCS) directly.
Objects - The SHA understands every single object in our home as defined in the XML configuration.
Object Type - Every object has an associated type, e.g. light, door, etc. The SHA understands object types and can use them to ask questions and clarify which specific object is in context if need be.
People - Our whole Home Control System (HCS) and SHA share a common model and understanding of people within our smart home (and outside of it). This includes current zone, location, presence, devices, etc. The SHA can infer which of the known people are in context or the target of any queries and requests.
Performance - Our current system separates out the 'intelligence' into a Java class that has application across all interfaces. It can return a meaningful result with no noticeable time delay.
Pro-active - If the SHA is going to pro-actively provide information that was not explicitly asked for, then this needs to be done in such a way that it doesn't become an annoyance to the user.
Serendipity - This is a further extension to the concept of providing pro-active information. It has even more scope and likelihood of being annoying if implemented badly.
Scenes - We don't have a lot of scenes, but our SHA knows about those that do exist and understands requests to run them.
Scheduled - Some events cannot be actioned immediately and need to be scheduled and checked until they occur. An example of this would be 'Let me know when <person> gets home'.
Speech Recognition - Accurate speech recognition is required to ensure this interface doesn't frustrate, but the speech recognition and speech announcements are just the transport. The real intelligence is independent of both. This independence enables us to support both speech and typed interactions.
Targets - The SHA understands 'targets'. These are explicit and inferred values that the interaction is aiming for. This includes things like 'open', 'closed', 'on', 'off', 'brighter', 'cooler', etc. As you can see, these are both absolute and relative and are validated within the context of the object or object type under discussion.
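Validating a target against the object type in context could look something like the sketch below. The mapping of object types to permitted targets is invented for illustration; in the SHA itself this knowledge would presumably come from the object model rather than a hard-coded table.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical validator: the object types and their permitted targets are assumed. */
class TargetValidator {
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "light", Set.of("on", "off", "brighter", "dimmer"),
            "door", Set.of("open", "closed"),
            "heating", Set.of("on", "off", "warmer", "cooler"));

    // Relative targets adjust the current value; all others are absolute states.
    private static final Set<String> RELATIVE = Set.of("brighter", "dimmer", "warmer", "cooler");

    /** True if the target makes sense for the given object type. */
    static boolean isValid(String objectType, String target) {
        return ALLOWED.getOrDefault(objectType, Set.of()).contains(target);
    }

    /** True if the target is relative to the current value rather than an absolute state. */
    static boolean isRelative(String target) {
        return RELATIVE.contains(target);
    }
}
```

A request such as "make the door brighter" would fail validation here, which gives the assistant a natural point at which to ask a clarifying question.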
Time - Our SHA has an awareness of the current time and relative time. It understands the concepts of yesterday, tomorrow, this afternoon and more fuzzy concepts such as later. It also understands absolute dates and times. We plan to add more advanced relative date handling, such as 'next week', 'next Sunday' and 'in 2 weeks' time'.
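Resolving simple date references with `java.time` might be sketched as below. The phrase list is only a subset, and treating a bare day name as "the next such day, or today if it matches" is an assumption; phrases that are not recognised (including the planned 'next week') return an empty result.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical resolver that turns a date phrase into a concrete date. */
class DateReference {

    static Optional<LocalDate> resolve(String phrase, LocalDate today) {
        String key = phrase.trim().toLowerCase(Locale.ROOT);
        switch (key) {
            case "yesterday":
                return Optional.of(today.minusDays(1));
            case "today":
            case "later today":
            case "this afternoon":
                return Optional.of(today);
            case "tomorrow":
                return Optional.of(today.plusDays(1));
            default:
                try {
                    // A bare day name such as "sunday" (assumed to mean the next one, or today).
                    DayOfWeek day = DayOfWeek.valueOf(key.toUpperCase(Locale.ROOT));
                    return Optional.of(today.with(TemporalAdjusters.nextOrSame(day)));
                } catch (IllegalArgumentException e) {
                    return Optional.empty();               // phrase not understood
                }
        }
    }
}
```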
User Identity - In most instances, the interface itself provides the identity of the person making the request. The SHA will handle requests from shared interfaces, where this identity information is not available and then must decide if it needs to ask for this information, in order to complete the request. Quite often, the identity of the person making a request simply does not matter.
Zones - Our SHA understands and recognises zones. It can determine which zone the person currently interacting is within. It also models the whole house as a zone and concepts such as 'outside' and 'external'.
The following videos have inspired us to undertake this project:
A Day Made of Glass... Made possible by Corning. (2011)
Fibaro - Your home, Your Imagination
Jarvis v0.1 (Artificial Intelligence based Operating System)
Jarvis project website
Temperature - A simple request such as "What's the temperature?" would assume it related to the current zone (room) within our home. Recognised words such as 'external' or 'outside' would move the zone context to be 'Global' and a different result would be returned.
Weather - A question about the weather will result in our SHA assuming the question is about today's weather, unless the query identifies something in the request that implies a different time context. Following up the query with a statement such as "and tomorrow?", would simply change the time context of the original request and result in a suitable response.
Who - A simple question such as, "Is Rob at home?" shows the level of integration required between devices, services and internal sensors. In this instance 'Rob' is the person context and 'home' is the location context.
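The zone handling in the temperature example above could be sketched as follows. The class name and the zone name 'Kitchen' used in the usage note are hypothetical; the keywords 'external' and 'outside' and the 'Global' zone come from the example itself.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical resolver that picks the zone a request refers to. */
class ZoneResolver {
    static final String GLOBAL = "Global";
    private static final Set<String> GLOBAL_WORDS = Set.of("external", "outside");

    /** Defaults to the requester's current zone unless a recognised keyword moves it to Global. */
    static String resolve(String request, String currentZone) {
        // Split on anything that is not a lower-case letter, so punctuation is ignored.
        for (String word : request.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (GLOBAL_WORDS.contains(word)) {
                return GLOBAL;
            }
        }
        return currentZone;
    }
}
```

For someone currently in the 'Kitchen' zone, "What's the temperature?" resolves to 'Kitchen', while "What's the temperature outside?" resolves to 'Global'.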
Examples of providing information proactively include weather warnings in the morning (e.g. when the temperature is below 1°C), messages on the answer machine (PSTN fixed phone line), missed callers at the front door, etc.
The plan is to have some initial demos here showing the text-based interface soon. This will be followed soon after with speech interaction demos. The text interaction is currently performed using Google Hangouts. This provides the identity for our smart home and manages the trust relationships with our smart home. It also provides the ability for our smart home to identify who the interaction is with, so the experience is truly personalised.
Progress / Summary
This project is progressing much quicker than we expected or planned, mainly because we are coding late into the night!
We are now using a text-based version of this interface on a daily basis, to see how well it works and improve the intelligence and modelling aspects. This project has replaced the previous query interface behind our XMPP interface. We have also developed a desktop client to test out the more complex algorithms and behaviour.
We will be working on this project through 2014. As our smart home grows in scope and features, our SHA will grow with it.
It amuses my colleagues at work that I have a dialogue with my home, but it is only through regular dialogue that I can learn and improve our SHA.
- We would like to add speaker verification (voice authentication).
News & Articles
- Command Fusion - Voice Control in Home Automation: Fad or Future? (18th December 2013)