RTF 2 HTML Lite Converter Readme
Author
The RTF 2 HTML Lite Converter, version 0.2 Alpha.
Copyright © 2001, 2002, 2003, 2004, 2008 Sergey A. Galin
Author's Homepage: http://sageshome.net
SourceForge.Net Project Page: https://sourceforge.net/projects/rtf2html-lite/
Licensing Information
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation,
either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
Supported Platforms
The program is written in standard C++ and can be easily ported to almost
any modern operating system. Current sources may be compiled on most U**X systems
including Linux, under MS-DOS (in Borland C++) and
Windows (MS Visual C++ or MinGW, console application or DLL).
What Is It?
RTF 2 HTML Lite is an application that converts RTF documents to HTML :)
Its key features are:
- Very efficient and compact standards-compliant
Strict HTML 4 + CSS2 code (can be easily modified to output
XML/XHTML).
- The resulting HTML code is several times more compact than the output of MS Word's HTML
filter for the same RTF file.
- Conversion aimed at preserving document appearance: the output HTML looks very close to the original
RTF (see below for the list of unimplemented
features). The program even supports some features unavailable in most
other filters, such as invisible font, capitalization, long spaces, and hard page
breaks (for printing).
- Information tags, unsupported by HTML, are converted into HTML comments
(<!-- ... -->).
- High speed and reliability (e.g. buffer overflow protection).
Note: the current version is Alpha and has not been thoroughly tested,
so there may be problems when parsing non-MS Word files.
- Portable to virtually any OS.
- Independent from any third-party code, including STD.
- Very low memory usage (compact code and effective parser).
- Well-written and compact source code means that the application can be
easily supported and improved.
- Can be used as a part of another application (e.g. when built as Windows DLL),
including both open-source and proprietary software.
Please read the LGPL license for more information.
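The conversion of information tags into HTML comments, mentioned in the feature list above, can be sketched as follows. This is only an illustration, not the converter's actual code: the function name toHtmlComment is hypothetical, and separating consecutive hyphens is an assumption based on the HTML 4 rule that "--" must not appear inside a comment.

```cpp
#include <string>

// Hypothetical helper (not part of the RTF 2 HTML Lite sources): wraps the
// text of an information tag into an HTML comment.
std::string toHtmlComment(const std::string& text)
{
    std::string safe;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        // "--" would end an HTML 4 comment early, so a space is inserted
        // between two consecutive hyphens.
        if (text[i] == '-' && !safe.empty() && safe[safe.size() - 1] == '-')
            safe += ' ';
        safe += text[i];
    }
    return "<!-- " + safe + " -->";
}
```

For example, toHtmlComment("a--b") yields a comment whose body is "a- -b", so the comment cannot be closed prematurely.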
Forms Of Distribution & Installing
- 1. Source code.
- You should know what to do with that or you don't need that ;-)
- 2. Linux command-line executable (in ELF format).
- Unpack the archive and, in the 'linux' subdirectory, run:
$ chmod 755 rtf2html
$ cp rtf2html /usr/bin
- Your system must have the following shared libraries installed:
libc.so.6
ld-linux.so.2
- 3. Windows DLL + headers and sample programs in Visual Basic and C++ (MinGW/Win32).
- You should know what to do with a DLL, otherwise you just don't need it :)
Copy r2h.dll to your Windows system directory, e.g.
C:\Windows\System (Windows 9x) or C:\WINNT\System32
(Windows NT/2000/XP etc.), and see the sample programs.
- After installing the DLL, you can use the GUI program in Visual Basic (located in
the win32dll\vb-demo directory of the distribution package) to convert
files.
- 4. Windows command-line application.
- Copy rtf2html.exe to any directory listed in the search
path, e.g. C:\Windows.
- 5. DOS command-line executable.
- Copy rtf2html.exe to any directory listed in the search
path, e.g. C:\DOS.
You may wonder why I compile it in so many versions. The answer is:
1) for self-training purposes; 2) just for fun; and 3) you never know what
you will need some day :)
Version 0.X Command-Line Arguments
Usage:
rtf2html [<input RTF file> <output HTML file> [<image output directory>]]
It's pretty self-explanatory. With no arguments, the program prints build,
copyright and usage information. Without the third argument, it outputs
images to
<output HTML file's directory>/<output HTML file name>.files/
(similar to what Internet Explorer and Mozilla do when saving
an HTML document with images).
Note: the DOS version always requires the image output
directory parameter, since filename.htm.files will never fit
in the DOS filename format ("8.3").
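The default image directory rule and the DOS restriction can be illustrated with a small sketch. Both function names (defaultImageDir, fitsDos83) are hypothetical and are not taken from the program's sources. The 8.3 check is simplified: it looks only at the lengths of the name parts and ignores the characters DOS forbids.

```cpp
#include <string>

// Hypothetical helper: builds the default image directory by appending
// ".files/" to the output HTML path, which places it in
// <output HTML file's directory>/<output HTML file name>.files/
std::string defaultImageDir(const std::string& outputHtmlPath)
{
    return outputHtmlPath + ".files/";
}

// Hypothetical, simplified check of the DOS "8.3" format for a single
// file or directory name (no path): up to 8 characters, optionally
// followed by one dot and up to 3 more characters.
bool fitsDos83(const std::string& name)
{
    const std::string::size_type dot = name.find('.');
    if (dot == std::string::npos)
        return !name.empty() && name.size() <= 8;
    const std::string base = name.substr(0, dot);
    const std::string ext = name.substr(dot + 1);
    return !base.empty() && base.size() <= 8
        && ext.size() <= 3
        && ext.find('.') == std::string::npos;  // a second dot is not allowed
}
```

Under these assumptions, fitsDos83("filename.htm.files") is false, which is why the DOS build cannot fall back to the default directory name and needs the third argument.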
Partially Supported RTF Features
List support is limited. Lists are not converted to
real HTML lists (OL, UL, DL). However, in most cases they look exactly as
they should, since most RTF editors (e.g. MS Word) add plain formatting
tags for each list element to the RTF.
Some list markers (bullets) are not converted.
Bullet symbols that can be represented as ASCII characters are handled well.
Markers based on the Wingdings font (often used in MS Word) work fine in
Internet Explorer but may or may not work in other browsers.
Unimplemented RTF Features (To Do)
- Better support for OpenOffice Writer's RTF and Linux word processors,
including buggy ones.
- "Table tag not closing" bug with OpenOffice's RTF.
- Prevent RTF produced by careless RTF writers from generating redundant style references like f3 c4 f3 c4 f3 c4.
- Always add HTML <TITLE> and META tags into header.
- Paragraph borders.
- Table cell size control.
- Table borders.
- Page Headers and Footers (not sure if needed at all).
- Support for hyperlinks (Microsoft-specific tags and/or automatic URL detection).
- HREF/SRC URL-encoding (usually not needed).
- Header (<H1>, <H2>...) tags (header formatting already works OK, but
HTML header tags are not used).
Other Bugs / To Do's
- Improved DLL API: ability to read from a memory buffer and write output into a memory buffer.
- Converting BMP images to PNG. (Can be done for GCC versions via GD.)
- Converting vector images (WMF) to raster. Requires portable WMF rendering library.
History
- Version 0.2:
- OpenOffice's tag for background color added.
- Text indent tag added.
- Fixed crashes with bad color indexing in OpenOffice RTF.
- UNICODE RTF support. UNICODE characters are converted to their HTML UNICODE representation,
not recoded. If an ANSI representation of the symbol is also present, the ANSI version is used and
the UNICODE one is ignored. There is also a new flag (r2hPreferUnicode) telling the converter
to use UNICODE even if an ANSI version is present.
- Version 0.1 Alpha.
- This was the first version released.
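The UNICODE handling described for version 0.2 could look roughly like the sketch below. It is not the converter's actual code. It assumes that "HTML UNICODE representation" means a decimal numeric character reference (&#N;). The function names and the preferUnicode parameter are hypothetical; the parameter only stands in for the r2hPreferUnicode flag, whose real interface is not described here. The ANSI branch is simplified and does no HTML escaping.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: turns the numeric value of an RTF \uN control word
// into a decimal HTML character reference. RTF writes N as a signed 16-bit
// number, so values above 32767 arrive as negative numbers and are shifted
// back by 65536.
std::string htmlNumericRef(int rtfUnicodeValue)
{
    long codePoint = rtfUnicodeValue;
    if (codePoint < 0)
        codePoint += 65536;
    std::ostringstream out;
    out << "&#" << codePoint << ';';
    return out.str();
}

// Hypothetical selection logic: the ANSI character wins when it exists,
// unless the caller asks for UNICODE (the role of r2hPreferUnicode).
std::string emitSymbol(int rtfUnicodeValue, bool hasAnsi, char ansiChar,
                       bool preferUnicode)
{
    if (hasAnsi && !preferUnicode)
        return std::string(1, ansiChar);
    return htmlNumericRef(rtfUnicodeValue);
}
```

With these assumptions, emitSymbol(8364, true, 'E', false) returns the ANSI character, while the same call with preferUnicode set to true returns "&#8364;".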
Unimplementable RTF Features
The RTF tags listed below cannot be converted to HTML/CSS because the corresponding
features are inapplicable to continuous media, cannot be handled during conversion,
or are simply not supported by HTML 4. The contents of these tags are not output, even as
comments.
vern000 edmins000 paperw paperh paperw000 paperh000 cols facingp gutter000 deftab000
*\nextfile *\template makeback defformat revision margmiror titlepg outl shad expnd000
ulw uld pgnx pgny pgndec pgnucrm pgnlcrm pgnucltr pgnlcltr pgnstart
Section columns are also not supported by HTML (use tables instead).