[Scummvm-tracker] [ScummVM :: Bugs] #11919: TTS/screenreader support

Sun Oct 25 09:39:24 UTC 2020

#11919: TTS/screenreader support
----------------------------+-----------------------
  Reporter:  ObjectInSpace  |      Owner:  (none)
      Type:  defect         |     Status:  new
  Priority:  normal         |  Component:  --Other--
Resolution:                 |   Keywords:
      Game:                 |
----------------------------+-----------------------
Description changed by ObjectInSpace:

Old description:

> ScummVM should provide the ability for in-game text to be read via
> synthesized speech. Apart from the convenience of playing interactive
> fiction without having to look at a screen, this will enable all games
> using this engine to be enjoyed by the estimated 289 million people
> around the world with vision loss, some of whom enjoy adventure games.
> Several Z-code interpreters support TTS, as does RetroArch, so it should
> be feasible for this project also.
>
> Windows has an open-source library available called Tolk which should do
> most of the heavy lifting itself: https://github.com/dkager/tolk/
> There is another called Universal Speech which appears to do a similar
> thing, but I don't think it has been updated as recently:
> https://github.com/qtnc/UniversalSpeech
> Both of these libraries support JAWS and NVDA, which are the most popular
> screenreaders. They also offer SAPI speech for universal compatibility.
>
> Microsoft also provides a text-to-speech API via the Xbox SDK. This is
> specific to Narrator, which is also included on Windows. Similarly, for
> macOS, iOS and Android, support for their screenreaders is provided via
> native accessibility APIs.
>
> Ideally, a player of an IF game should be able to hear the response to
> their input read back to them. Graphical adventure games should have the
> name of the currently selected object or action read aloud, along with
> any text as it is displayed on screen, such as conversation prompts.
>
> Scenario 1: player of Zork 1 types "open mailbox", hears back "the
> mailbox is now open. There's a leaflet inside."
>
> Scenario 2: Day of the Tentacle player moves the cursor over the clock,
> hears "grandfather clock." Player presses o, hears "open."
>
> Scenario 3: player of Grim Fandango decides to talk to Carla, hears: "1.
> Busy night? 2. What's the shuttle waiting for? 3. Can I try out your
> metal detector?"
>
> There are a few different ways to achieve this. RetroArch uses optical
> character recognition (OCR), which converts the text from screenshots
> into a machine-readable format via pattern-matching algorithms. Another
> project called SoniFight essentially reverse-engineers certain games to
> read the text from its memory address
> (https://github.com/FedUni/SoniFight). However, I feel that these are both
> sort of hackish. The real solution should be to find when and where those
> strings are referred to in the game and then have them be exposed to that
> platform's assistive technology via ScummVM.

New description:

 ScummVM should provide the ability for in-game text to be read via
 synthesized speech. Apart from the convenience of playing interactive
 fiction without having to look at a screen, this will enable all games
 using this engine to be enjoyed by the estimated 289 million people around
 the world with vision loss, some of whom enjoy adventure games. Several
 Z-code interpreters support TTS, as does RetroArch, so it should be
 feasible for this project also.

 Windows has an open-source library available called Tolk which should do
 most of the heavy lifting itself: https://github.com/dkager/tolk/
 There is another called Universal Speech which appears to do a similar
 thing, but I don't think it has been updated as recently:
 https://github.com/qtnc/UniversalSpeech
 Both of these libraries support JAWS and NVDA, which are the most popular
 screenreaders. They also offer SAPI speech for universal compatibility.

 Microsoft also provides a text-to-speech API via the Xbox SDK. This is
 specific to Narrator, which is also included on Windows. Similarly, for
 macOS, iOS and Android, support for their screenreaders is provided via
 native accessibility APIs.

 Ideally, a player of an IF game should be able to hear the response to
 their input read back to them. Graphical adventure games should have the
 name of the currently selected object or action read aloud, along with any
 text as it is displayed on screen, such as conversation prompts.

 Scenario 1: player of Zork 1 types "open mailbox", hears back "the mailbox
 is now open. There's a leaflet inside."

 Scenario 2: Day of the Tentacle player moves the cursor over the clock,
 hears "grandfather clock." Player presses o, hears "open."

 Scenario 3: player of Grim Fandango decides to talk to Carla, hears: "1.
 Busy night? 2. What's the shuttle waiting for? 3. Can I try out your metal
 detector?"
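
To make Scenario 3 concrete, here is a minimal sketch of how a list of
conversation choices could be turned into one numbered string for speech.
The function name formatChoices is hypothetical and not part of ScummVM;
a real engine would pull the choices from its own dialogue structures.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: join dialogue choices into a single utterance,
// numbering them from 1 so the player knows which key selects which line.
std::string formatChoices(const std::vector<std::string> &choices) {
    std::string out;
    for (std::size_t i = 0; i < choices.size(); ++i) {
        if (i > 0)
            out += ' '; // single space between numbered entries
        out += std::to_string(i + 1) + ". " + choices[i];
    }
    return out;
}
```

Passing the three Grim Fandango lines above would produce the spoken text
shown in Scenario 3.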

 There are a few different ways to achieve this. RetroArch uses optical
 character recognition (OCR), which converts the text from screenshots into
 a machine-readable format via pattern-matching algorithms. Another
 project called SoniFight essentially reverse-engineers certain games to
 read the text from its memory address
 (https://github.com/FedUni/SoniFight). However, I feel that these are both
 sort of hackish. Since these strings exist in the game already, ScummVM
 should ideally be able to send them directly to each platform's assistive
 technology via the applicable APIs.
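
As a rough illustration of the "send the strings directly" idea, the sketch
below separates a platform-neutral speech interface from the hook an engine
would call when it displays text. All names here (SpeechBackend,
RecordingBackend, TextAnnouncer) are hypothetical and not taken from
ScummVM. A Windows backend might wrap a library such as Tolk, and other
platforms their native accessibility APIs, but that part is assumed and not
shown.

```cpp
#include <string>
#include <vector>

// Hypothetical interface: one implementation per platform.
class SpeechBackend {
public:
    virtual ~SpeechBackend() = default;
    // Speak `text`; if `interrupt` is true, drop anything still queued.
    virtual void say(const std::string &text, bool interrupt) = 0;
};

// Stand-in backend that records utterances instead of speaking them,
// so the sketch runs without any screenreader installed.
class RecordingBackend : public SpeechBackend {
public:
    void say(const std::string &text, bool interrupt) override {
        if (interrupt)
            spoken.clear();
        spoken.push_back(text);
    }
    std::vector<std::string> spoken;
};

// Hypothetical hook an engine would call wherever it draws text.
class TextAnnouncer {
public:
    explicit TextAnnouncer(SpeechBackend &backend) : _backend(backend) {}

    void onTextDisplayed(const std::string &text, bool interrupt = false) {
        // Skip empty text and immediate repeats, e.g. a hover label that
        // is redrawn every frame while the cursor stays on one object.
        if (text.empty() || text == _last)
            return;
        _last = text;
        _backend.say(text, interrupt);
    }

private:
    SpeechBackend &_backend;
    std::string _last;
};
```

The design choice here is that engines only ever talk to TextAnnouncer, so
adding a new platform means writing one more SpeechBackend and nothing else.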

-- 
Ticket URL: <https://bugs.scummvm.org/ticket/11919#comment:2>
ScummVM :: Bugs <https://bugs.scummvm.org>
ScummVM