[Scummvm-devel] README

Mon Feb 9 01:36:04 CET 2004

On 08.02.2004 at 18:51, Marcus Comstedt wrote:

>
> Max Horn <max at quendi.de> writes:
>
>> Well, all this still leaves my question open:
>>
>> How do you output useable PLAIN TEXT from a (La)TeX source?
>
>
> A working, but slightly hackish, way to do it is
>
> latex2html -no_navigation -info 0 -no_subdir foo.tex
> lynx -dump foo.html > foo.txt
>

And if you add "-split 0", it'll put everything into a single HTML 
file, yeah. I ended up trying this:
   latex2html -no_navigation -split 0 -info 0 -dir html -show_section_numbers readme.tex

At least to me, the latex2html output already doesn't look that great.
And when I then use lynx to dump it into a text file, it's far, far
worse than our manually formatted README. Granted, I always expected
some loss compared to the hand-tuned plain text layout in there.

I also tried html2text (from 
http://userpage.fu-berlin.de/~mbayer/tools/html2text.html). In many 
cases the output was better; but in some places, lynx "wins". I used 
this command for it:
   html2text -o readme.txt -nobs -style pretty readme.html

In case you wonder what I am talking about, here are some sensitive spots:
* "7.5 Using MP3 files for CD audio": Look at the example command (and 
whether it sticks to the paragraph before it or is nicely distinct)
* "1 About": Are paragraphs separated by a blank line, or do they all
stick together?
* "2.1 Reporting Bugs": look at the text "Please include the following 
information": does it stick to the paragraph before it, or the table 
after it, or is it separated from both with blank lines?
* "5.1 Command Line Options": The list of command options is unreadable 
in the lynx output, but fine with the other tools
* In the same section, the Examples are much better in html2text than 
in all the others
* The credits are quite sensitive (esp. the 'headlines' of the 
subsections)
* html2text does funny things with the underline style 
("This_is_underlined_text")
* the slight indention lynx uses everywhere makes it a bit easier to 
visually navigate to certain sections
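
One way to check spots like these without paging through every file is
to print a few lines of context around a heading in each generated text
file. This is only a sketch: the function name show_spot and the output
file names are made up for illustration, and the -A option of grep is a
GNU/BSD extension rather than plain POSIX.

```shell
#!/bin/sh
# show_spot: print a heading and the lines following it from each given
# text file, so the spacing around it can be compared across converters.
# Usage (hypothetical file names):
#   show_spot "Reporting Bugs" readme-lynx.txt readme-w3m.txt
show_spot() {
    heading=$1
    shift
    for f in "$@"; do
        echo "== $f =="
        # -n: prefix line numbers, -A 6: six lines of trailing context
        grep -n -A 6 -- "$heading" "$f" || echo "(heading not found)"
    done
}
```

Running it once per section title from the list above gives a quick
side-by-side view of where paragraphs stick together.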

Furthermore I tried elinks and w3m. It seems elinks produces output 
which is strictly better than lynx. With w3m, I noticed that it 
produces "bad" lists (a blank line between each list item).
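
To repeat this comparison, the same HTML file can be dumped with every
converter that happens to be installed. The following is a rough sketch,
not the exact procedure used for this mail: the function name dump_all
and the output naming scheme are invented, and the html2text options are
the ones from the command above (other programs that are also called
html2text may not accept them).

```shell
#!/bin/sh
# dump_all: render one HTML file to text with every converter that is
# installed, writing BASE-TOOL.txt for each so the results can be diffed.
# Usage: dump_all readme.html
dump_all() {
    html=$1
    base=${html%.html}
    for tool in lynx elinks w3m; do
        if command -v "$tool" >/dev/null 2>&1; then
            # all three browsers print the rendered page to stdout with -dump
            "$tool" -dump "$html" > "$base-$tool.txt"
        else
            echo "skipping $tool (not installed)" >&2
        fi
    done
    if command -v html2text >/dev/null 2>&1; then
        # same options as the html2text command quoted in this mail
        html2text -o "$base-html2text.txt" -nobs -style pretty "$html"
    else
        echo "skipping html2text (not installed)" >&2
    fi
}
```

The resulting files can then be compared pairwise with diff, or simply
opened next to each other.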

The next thought was that maybe some improvement could be achieved by
using another converter from LaTeX to HTML. The only one I know is
"hevea" (http://pauillac.inria.fr/hevea), which is written in OCaml.
The first thing to notice: it's *much* faster than latex2html. I am
talking about an order of magnitude at least.

Running the hevea-generated HTML through w3m, elinks and html2text gives
results which are pretty good, I think. Definitely better than
converting the latex2html output. Forget about lynx, though, it's still
giving crap output.

Hevea actually has a text output mode! Very nice, since it does fancy
things like "ASCII underlining", e.g.:

   2.1  Reporting Bugs
   ===================

At first some of the tables looked *really* bad, though. In the credits:

     Hannes      Readme Conversion
     Niederhause
     n

However, that turned out to be caused by the fixed (4cm) table column
width. We could either change that back, or maybe there is a switch to
make hevea ignore it. Anyway, after reverting "p{4cm}" back to "l" in
said tables, it worked fine. Still, I have some gripes with this
direct text mode.
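
If we want to keep the fixed-width columns for the PDF, the column
change could be applied to a temporary copy just for the text output.
A minimal sketch, assuming the tables spell the column exactly as
"p{4cm}"; the function name relax_columns and the file name
readme-text.tex are made up.

```shell
#!/bin/sh
# relax_columns: read LaTeX on stdin and replace every fixed-width
# "p{4cm}" column specifier with a plain left-aligned "l" column.
relax_columns() {
    # in a basic regular expression, "{" and "}" match themselves
    sed 's/p{4cm}/l/g'
}

# Usage (requires hevea; file names are illustrative):
#   relax_columns < readme.tex > readme-text.tex
#   hevea -text readme-text.tex    # writes readme-text.txt
```

This leaves readme.tex untouched, at the cost of one extra generated
file.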

Hevea also allows embedding HTML commands in the LaTeX source for
custom formatting, which I think would be useful.

Summary: To me, the only adequate plain text output was generated by
hevea + elinks/w3m/html2text. To let you draw your own conclusions, I
had included the generated files as an attachment to this mail. [I had
to remove the attachment; apparently, the first time I sent this mail,
12 hours ago, it got filtered out because of it.]
The HTML output of hevea and latex2html is OK, too. As a result, I
don't see any advantage in writing the docs in HTML (converting them to
text would still have the same problem described here, but making nice
PDF output would be harder, and we would lose all the structural
information which LaTeX has about a text).

Cheers,

Max