Comparisons of input modalities and methods

Comparisons of Computer Input Modalities and Methods
Yoshiharu Sato, http://yo-sato.com/

Input methods
• Mechanical Movement
• Audio
• Gaze
• Brain
• Multimodal Fusion

Mechanical Movement
• Advantages
• Easy to control
• Disadvantages
• Speed is limited by the mechanical movement.
• Hand/Finger methods
• Body Gestures
• Muscle sensing

Hand/Finger methods
• Advantages
• Disadvantages
• Artifact (input device) must be within reach of the user.
It does not suit remote-control or mobile
scenarios.
• Hands are busy typing characters or holding a device, and
eyes are busy looking at the keyboard or touch panel. It
does not suit mobile scenarios.
• Keyboard
• Handwriting

Keyboard
• Advantages
• Keys map directly to characters, so recognition accuracy is less of a
problem than with handwriting or voice recognition (note: East Asian
ideogram input does involve recognition). This is one of the reasons
why the keyboard is hard to beat as the mainstream input
means.
• Keys can represent any function, and the input language is rich.
• The device is cheap.
• It requires less computation than the other methods.
• Disadvantages
• Keyboard input operation is not natural.
• Text input relies on knowledge of key positions (“memory in
the world”, Norman 1988).
• Hardware keyboard
• Software keyboard

Hardware keyboard
• Advantages
• Keys are fixed, so users can rely on
knowledge in the world [Norman, 1988], and the keyboard is
easy to operate.
• Disadvantages
• Keys are fixed and limited, so the functions bound to a key
are sometimes overloaded, and the modes this introduces
confuse users.

Software keyboard
• Advantages
• Keys are configurable by software, so there is no need to
overload keys.
• The touch language is richer than key presses.
• Disadvantages
• Touch is less accurate than a mouse, and requires more effort
in correction than the hardware keyboard.
• The keyboard occludes screen real estate and distracts the user’s
thinking.
• Keys are part of the touch monitor, which is hard to use under
extreme lighting conditions.
• Keys are configurable by software, so users need to look for
key positions, and a new key layout requires re-learning.

Handwriting
• State-of-art
• Online handwriting technology was established in the 1990s.
• A typical recognition engine normalizes the input data (e.g.,
baseline, slant/slope), then performs segmentation, feature
extraction, and classification (dynamic programming, neural
network, or similar, plus a language model).
• In the 1990s, commercial engines had about a 10% character error
rate for isolated characters in boxes and a 20% character error rate
for run-on mode. They were close to practical accuracy.
• Handwriting is already integrated into most retail devices such
as PCs and smartphones.
• Offline handwriting technology has not yet reached
practical use.
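The recognition stages listed above (normalization, feature extraction, classification by dynamic programming) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the three toy templates, and the use of raw normalized pen points as features with dynamic time warping (DTW) as the dynamic-programming classifier. Segmentation and the language model are omitted, and this is not the method of any particular commercial engine.

```python
import math

def normalize(stroke):
    """Translate a stroke to its centroid and scale it to unit size."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / scale, (y - cy) / scale) for x, y in stroke]

def dtw_distance(a, b):
    """Dynamic-programming (DTW) alignment cost between two point sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def classify(stroke, templates):
    """Return the label of the template with the lowest DTW cost to the stroke."""
    query = normalize(stroke)
    return min(templates, key=lambda label: dtw_distance(query, normalize(templates[label])))

# Hypothetical single-stroke templates, given as (x, y) pen samples.
TEMPLATES = {
    "-": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "|": [(0, 0), (0, 1), (0, 2), (0, 3)],
    "/": [(0, 0), (1, 1), (2, 2), (3, 3)],
}

# A slightly noisy horizontal stroke written at a different position and size.
print(classify([(10, 5), (14, 5.2), (18, 4.9), (22, 5)], TEMPLATES))
```

Because the stroke is normalized before matching, the same templates apply regardless of where on the tablet or how large the user writes.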

Handwriting (Cont’d)
• Advantages
• Ink is the character, so it is direct and intuitive. Humans have
been familiar with it for a long time.
• The pen can play the role of a mouse.
• Silent. It can protect privacy.
• Not subject to environmental noise.
• Disadvantages
• Ink needs conversion to character codes, recognition cannot
be 100% accurate, and recognition results require corrections.
• Hands-busy.
• The finger movement needed to write a character is complex and
time-consuming.
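Character error rates such as those quoted for handwriting engines are commonly computed as the edit distance between the recognized text and the reference text, divided by the reference length. The sketch below assumes that convention. The slides do not say how their figures were measured, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of an extra character
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]

def character_error_rate(ref, hyp):
    """Edit distance normalized by the length of the reference text."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)

# One substituted character in an 11-character reference.
print(character_error_rate("handwriting", "handwritimg"))
```

Each counted error is also one correction the user has to make, which is why the error rate bears directly on the correction cost listed under Disadvantages.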

Body Gestures
• State-of-art
• Body gestures are recognized by computer vision or
motion sensors.
• Microsoft Kinect
• Leap Motion
• NTT DoCoMo “UbiButton”
• “Ring”
• Shiseido
• There is research on sensing tongue gestures with magnetic
sensors.

Body Gestures (Cont’d)
• Advantages
• Gesture language can be richer than mouse/keys and touch
by virtue of 3D.
• It does not occlude screen real-estate.
• Disadvantages
• Computer vision is subject to occlusion and lighting conditions.
• 3D freehand pointing precision may be lower than that with a
2D surface.
• Freehand gestures involve more muscles than
keyboard/mouse interaction, and large/frequent arm
movements cause fatigue over time.
• It is socially awkward. Gesturing at a machine in a crowded
environment looks strange.

Muscle sensing
• State-of-art
• A forearm muscle-sensing band using EMG (electromyography)
can classify finger movements.
• There is no commercial system yet for commanding
computers.
• There are several EMG vendors, and a low-end device
costs less than $1,000.

Muscle sensing (Cont’d)
• Advantages
• Muscle activity can be sensed unobtrusively, without
an artifact within reach of the user.
• It allows hands-free operation.
• It doesn’t require observable interaction that can be
socially awkward. It protects privacy.
• It is not subject to environmental interference, unlike voice
recognition or computer vision.
• Fatigue free.
• Disadvantages
• It is limited by mechanical movement speed.
• An input language must be designed.

Audio
• Advantages
• Speaking is direct, intuitive, and natural. Humans have been familiar
with it for a long time, and people do not have to learn to speak, so
consumers do not perceive a speech interface as an input task.
• Hands-free and eyes-free, so it suits mobile scenarios.
• Speaking is 5 times faster than writing/typing.
• Disadvantages
• Voice needs conversion to character codes, which requires recognition
and corrections.
• There is a segmentation problem: distinguishing conversation, commands,
and text to be recognized.
• Voice recognition
• Silent speech recognition
• Lip reading

Voice recognition
• State-of-art
• Voice recognition technology has been investigated since the
1960s and was established in the 1990s.
• Voice recognition is already in practical use in call
centers, medical jobs, and other time-critical jobs where
documentation is required. Remote hands-free control by
speech in a car is also in practical use. Remote control of
home equipment is also starting up.
• There has been research on using speech as the primary method and
another method for confirmation, selection, or correction. One
study showed double the productivity of T9. Another study
combined speech with gaze and Dasher, and gained twice the
productivity of Dasher alone.

Voice recognition (Cont’d)
• Advantages
• Voice can communicate emotions.
• Disadvantages
• It is subject to environmental noise. Recognition
accuracy drops drastically, by 20-50%, in noisy
environments. Accuracy also degrades with natural
spontaneous interaction and diverse speakers.
• It is socially awkward in two ways:
• Speaking is loud and disturbs others.
• It does not keep privacy. It does not suit crowded
environments.
• See http://yoshiharusato.wordpress.com/2014/05/29/why-speech-recognition-do-not-work/.

Silent speech recognition
• State-of-art
• Research on non-voiced speech recognition has emerged
recently. Alternatives to the air microphone include throat
microphones, surface EMG (electromyography),
ultrasound imaging of the tongue and lips, and a type of
stethoscope microphone.
• There is no commercial system yet.

Silent speech recognition (Cont’d)
• Advantages
• Silent speech solves the most critical defects of voiced
speech recognition.
• It is robust against environmental noise.
• It protects privacy.
• Disadvantages
• The practicality of the technology is yet to be proved.
• The quality of body-conducted speech is degraded compared
with normal speech.
• NAM (non-audible murmur) recognition cannot capture pitch (e.g., the tones of Chinese).

Lip reading
• State-of-art
• Lip reading is approached through pattern recognition by
computer vision, or through muscle-movement recognition by EMG
(electromyography). The computer vision approach is still at the
level of limited vocabulary (Takeshi Saitoh, 2009). Word
recognition rate is about 80-90%. The EMG approach can
distinguish only vowels.
• According to (Rosenblum, 2010), human lip-reading experts
can read tongue positions, air flows, and tones by observing
subtle movements of the chin, cheeks, and face. Theoretically, the
technology should be able to overcome the current
limitations.
• There are a number of studies that use lip reading to
supplement speech recognition, or combine it with keys.
• There is no commercial system yet.

Lip reading (Cont’d)
• Advantages
• Lip reading solves the most critical defects of voiced
speech recognition.
• It is robust against environmental noise.
• It protects privacy.
• Disadvantages
• Lip reading is not yet mature as a standalone
technology.
• The computer vision approach is subject to occlusion and
lighting conditions.

Gaze
• State-of-art
• It is approached by computer vision. There are already some
commercial systems. Most commercial systems measure the
Point-Of-Regard by the “corneal-reflection and pupil-center” method
with an infrared camera. There are a number of vendors. Gaze
tracking is applied in a digital camera called “Iris” to sense focus.
• There are remote-sensor and head-mounted types. A head-mounted
eye tracker offers higher accuracy and
simplified geometry, and is robust against head movements.
• Current eye-tracking systems achieve an accuracy of 0.5 degrees
(equivalent to a region of approximately 15 pixels on a 17” display
with a resolution of 1024x768 pixels viewed from a distance of 70 cm).
• There have been a number of studies of eye-typing for disabled
people. They use a software keyboard or Dasher with gaze. One
study applied gaze tracking to replace candidate selection
in a document authoring scenario, having observed that more than half
the time was spent looking for and selecting the right choice from
the candidate list with a traditional IME.
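The conversion from angular accuracy to on-screen pixels can be checked with a short calculation. This is a sketch under stated assumptions: the full 17-inch diagonal is taken as the viewable area, pixels are assumed square, and the function name is hypothetical.

```python
import math

def visual_angle_to_pixels(angle_deg, distance_cm, diagonal_in, res_x, res_y):
    """Convert a visual angle into an on-screen extent in pixels."""
    # Physical screen width from the diagonal, assuming square pixels.
    diagonal_cm = diagonal_in * 2.54
    width_cm = diagonal_cm * res_x / math.hypot(res_x, res_y)
    cm_per_pixel = width_cm / res_x
    # Extent on the screen subtended by the angle at the viewing distance.
    extent_cm = distance_cm * math.tan(math.radians(angle_deg))
    return extent_cm / cm_per_pixel

pixels = visual_angle_to_pixels(0.5, 70, 17, 1024, 768)
print(round(pixels, 1))
```

Under these assumptions the result is about 18 pixels, the same order as the approximately 15 pixels quoted. The exact figure depends on details the slide does not state, such as the viewable screen area. Either way, the region is comparable to the size of a small on-screen target, which limits how finely gaze alone can point.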

Gaze (Cont’d)
• Advantages
• Eye gaze moves quicker than hand/finger/body. Simple target selection and cursor
positioning operations were performed approximately twice as fast with an eye tracker
as with any of the conventional cursor positioning devices. When all is performing well,
eye gaze interaction can give a subjective feeling of a highly responsive system, almost as
though the system is executing the user’s intentions before he or she expresses them
(Karn, 2003).
• The eyes can move without fatigue.
• The time required to move the eye is not related to the distance to be moved, unlike most
other input.
• Operating the eye requires no training or particular coordination for normal users.
• Disadvantages
• It is difficult to interpret the Point-Of-Regard without another means of control.
Moving one’s eyes is often an almost subconscious act, and eye movement is always “on”.
This is called the “Midas Touch” problem (Karn, 2003).
• Dwell time (which hampers speed and is fatiguing), “gaze-and-touch”, or eye gestures have been used to solve this.
• Eyes basically provide only positional information.
• Computer vision is subject to occlusion and lighting conditions.
• It requires calibration before use.
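Dwell time, mentioned above as one workaround for the “Midas Touch” problem, can be sketched as a small state machine: a target is selected only after the gaze has rested on it for a threshold duration. The class name, the 0.5-second threshold, and the sample data are assumptions for illustration, not values from the slides.

```python
class DwellSelector:
    """Fire a selection once the gaze has stayed on one target long enough."""

    def __init__(self, dwell_seconds=0.5):  # threshold is an assumed value
        self.dwell_seconds = dwell_seconds
        self._target = None
        self._since = 0.0
        self._fired = False

    def update(self, target, timestamp):
        """Feed the currently gazed target; return it once when the dwell elapses."""
        if target != self._target:
            # Gaze moved to a new target (or off all targets): restart the timer.
            self._target, self._since, self._fired = target, timestamp, False
            return None
        if (target is not None and not self._fired
                and timestamp - self._since >= self.dwell_seconds):
            self._fired = True  # select only once per continuous dwell
            return target
        return None

# Hypothetical gaze samples: (time in seconds, key under the Point-Of-Regard).
samples = [(0.0, "A"), (0.2, "A"), (0.3, "B"), (0.5, "B"), (0.9, "B"), (1.2, "B")]
selector = DwellSelector(dwell_seconds=0.5)
events = [selector.update(key, t) for t, key in samples]
print(events)
```

The brief glance at "A" selects nothing, while the longer fixation on "B" selects it once. The trade-off noted in the list is visible here: every selection costs at least the dwell threshold.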

Brain
• State-of-art
• The brain-machine interface may someday replace any human-computer interaction.
But it is not certain when brain-machine interfaces will be able to deal with text
or symbol sequences.
• It uses
• expensive high-end sensors such as
• fMRI (functional magnetic resonance imaging), which images brain blood patterns,
• or MEG (magnetoencephalography),
• or low-end sensors such as
• NIRS (near-infrared spectroscopy)
• or EEG (electroencephalography).
• MSR showed that off-the-shelf EEG ($1500) can classify several brain states [Tan,
2005]. Hitachi offers “Kokorogatari” (2005), which tells Yes/No by NIRS. Honda
Research succeeded in 2006 in distinguishing the 3 symbols ‘paper, stone and
scissors’ by fMRI. Honda Research also showed in 2009 the robot ASIMO moving
its arm and foot as commanded by an EEG & NIRS system.
• There are ventures that offer some solutions: NeuroSky, Inc, BrainGate, and
Emotiv Systems.

Brain (Cont’d)
• Advantages
• Eye-free, Hand-free.
• Disadvantages
• The technology is not yet mature. EEG requires intense
focus at present.

Multimodal Fusion
• Advantages
• Users are free to choose a modality. This
contributes to reliability (error correction).
• Can support more users.
• Modality fusion usually outperforms unimodal
recognition.
• Disadvantages
• Processing (either early fusion or late fusion) could
become more complex than unimodal methods.
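Late fusion, one of the two processing styles named above, combines the outputs of separate per-modality recognizers instead of merging their raw features. The sketch below uses a weighted sum of candidate scores. The function name, weights, and scores are illustrative assumptions, not measured values.

```python
def late_fusion(scores_by_modality, weights):
    """Combine per-modality candidate scores by weighted sum, then normalize."""
    fused = {}
    for modality, scores in scores_by_modality.items():
        weight = weights.get(modality, 0.0)
        for candidate, score in scores.items():
            fused[candidate] = fused.get(candidate, 0.0) + weight * score
    total = sum(fused.values())
    if total == 0:
        raise ValueError("no modality contributed a non-zero score")
    return {candidate: score / total for candidate, score in fused.items()}

# Hypothetical recognizer outputs: the speech recognizer slightly prefers "nat",
# while the lip reader clearly sees the closed lips of "mat".
scores = {
    "speech": {"mat": 0.45, "nat": 0.55},
    "lips": {"mat": 0.80, "nat": 0.20},
}
fused = late_fusion(scores, weights={"speech": 0.6, "lips": 0.4})
print(max(fused, key=fused.get))
```

In this made-up case the speech recognizer alone would output "nat", and the second modality corrects it. This is the error-correction benefit listed under Advantages, obtained at the cost of running and weighting two recognizers.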

Input methods
• Mechanical Movement – slow but reliable
• Audio – fast for text input
• Gaze – fast for pointing
• Brain
• Multimodal Fusion

Summary
• Silent Speech (including Lip Reading) is the preferred
technology for text input.
• Gaze is a fast pointing method and provides
information about the user’s intention.
• Finger dexterity is reliable for controlling and commanding
machines.
