Last modified: December 16, 2014

RTF 2 HTML Lite Converter Readme

Author

The RTF 2 HTML Lite Converter, version 0.2 Alpha.
Copyright © 2001, 2002, 2003, 2004, 2008 Sergey A. Galin
Author's Homepage: http://sageshome.net
SourceForge.Net Project Page: https://sourceforge.net/projects/rtf2html-lite/

Licensing Information

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Supported Platforms

The program is written in standard C++ and can be easily ported to almost any modern operating system. The current sources compile on most U**X systems including Linux, under MS-DOS (with Borland C++), and under Windows (MS Visual C++ or MinGW, as a console application or a DLL).

What Is It?

RTF 2 HTML Lite is an application to convert RTF documents to HTML :) Its key features are:

  • Very efficient and compact standard-compliant Strict HTML 4 + CSS2 code (can be easily modified to output XML/XHTML).
  • The resulting HTML code is several times more compact than the output of MS Word's HTML filter for the same RTF file.
  • Conversion aims at precise document appearance: the output HTML looks very close to the original RTF (see below for the list of unimplemented features). The program even supports some features unavailable in most other filters, such as invisible font, capitalization, long spaces, and hard page breaks (for printing).
  • Information tags, unsupported by HTML, are converted into HTML comments (<!-- ... -->).
  • High speed and reliability (e.g. buffer overflow protection).
    Note: the current version is Alpha and has not been thoroughly tested, so problems are possible when parsing non-MS Word files.
  • Portable to virtually any OS.
  • Independent from any third-party code, including STD.
  • Very low memory usage (compact code and effective parser).
  • Well-written and compact source code means that the application can be easily supported and improved.
  • Can be used as a part of another application (e.g. when built as Windows DLL), including both open-source and proprietary software. Please read the LGPL license for more information.
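
To illustrate the comment conversion mentioned in the list above, here is a minimal C++ sketch of wrapping unsupported information text in an HTML comment. The function name htmlComment is hypothetical and is not taken from the rtf2html sources. How the real converter escapes comment text is not described in this readme, so the hyphen handling below is an assumption, based on the HTML 4 rule that "--" should not appear inside a comment.

```cpp
#include <string>

// Hypothetical helper, not part of rtf2html: wraps text in an HTML comment.
// Assumption: runs of hyphens are broken up with spaces, because "--"
// inside the comment body would end the comment early in HTML 4.
std::string htmlComment(const std::string& text) {
    std::string safe;
    for (char c : text) {
        // Insert a space between two adjacent hyphens.
        if (c == '-' && !safe.empty() && safe.back() == '-')
            safe += ' ';
        safe += c;
    }
    return "<!-- " + safe + " -->";
}
```

The sketch uses std::string for brevity, although the converter itself avoids such dependencies.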

Forms Of Distribution & Installing

1. Source code.
You should know what to do with that or you don't need that ;-)
2. Linux command-line executable (in ELF format).
Unpack the archive and, in the 'linux' subdirectory, run:
$ chmod 755 rtf2html
$ cp rtf2html /usr/bin
Your system must have the following shared libraries installed:
libc.so.6
ld-linux.so.2
3. Windows DLL + headers and sample programs in Visual Basic and C++ (MinGW/Win32).
You should know what to do with a DLL, otherwise you just don't need it :) Copy r2h.dll to your Windows system directory, e.g. C:\Windows\System (Windows 9x) or C:\WINNT\System32 (Windows NT/2000/XP etc.), and see the sample programs.
After installing the DLL, you can use the GUI program written in Visual Basic (located in the win32dll\vb-demo directory of the distribution package) to convert files.
4. Windows command-line application.
Copy rtf2html.exe to any directory in the search path, e.g. C:\Windows.
5. DOS command-line executable.
Copy rtf2html.exe to any directory in the search path, e.g. C:\DOS.

You may wonder why I compile it in so many versions. The answer is: 1) for self-training purposes; 2) just for fun; and 3) you never know what you will need some day :)

Version 0.X Command-Line Arguments

Usage: rtf2html [<input RTF file> <output HTML file> [<image output directory>]]
It's pretty self-explanatory. With no arguments, the program prints build, copyright and usage information. Without the third argument, it outputs images to
<output HTML file's directory>/<output HTML file name>.files/
(similar to what Internet Explorer and Mozilla do when saving an HTML document with images).

Note: the DOS version always requires the image output directory parameter, since filename.htm.files does not fit the DOS filename format ("8.3").
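
The default image directory rule and the DOS limitation above can be sketched in a few lines of C++. Both function names (defaultImageDir, fitsDos83) are hypothetical and do not come from the rtf2html sources. The 8.3 check is a simplified assumption that looks only at the lengths of the name and the extension, not at which characters DOS allows.

```cpp
#include <string>

// Hypothetical helper: default image directory for a given output HTML path,
// following the rule "<output HTML file name>.files/" described above.
std::string defaultImageDir(const std::string& htmlPath) {
    return htmlPath + ".files/";
}

// Hypothetical helper: true if a bare file name (no directory part) fits
// the DOS "8.3" pattern: 1 to 8 characters, then at most one dot followed
// by up to 3 characters. Only lengths are checked (simplifying assumption).
bool fitsDos83(const std::string& name) {
    const std::string::size_type dot = name.find('.');
    if (dot == std::string::npos)
        return !name.empty() && name.size() <= 8;
    const std::string base = name.substr(0, dot);
    const std::string ext = name.substr(dot + 1);
    return !base.empty() && base.size() <= 8 &&
           ext.size() <= 3 && ext.find('.') == std::string::npos;
}
```

With these definitions, fitsDos83("filename.htm.files") is false (the name contains a second dot and the part after the first dot is too long), which is why the DOS build cannot fall back to the default directory.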

Partially Supported RTF Features

List support is limited. Lists are not converted to real HTML lists (OL, UL, DL). But in most cases they look exactly as they should, since most RTF editors (e.g. MS Word) add plain formatting tags for each list element to the RTF.
Some list markers (bullets) are not converted. Bullet symbols which can be represented as ASCII characters are handled well. Markers based on the Wingdings font (often used in MS Word) work fine in Internet Explorer but may or may not work in other browsers.

Unimplemented RTF Features (To Do)

  • Better support for OpenOffice Writer's RTF and Linux word processors, including buggy ones.
    • "Table tag not closing" bug with OpenOffice's RTF.
    • Prevent code from stupid RTF writers to generate redundant style references like f3 c4 f3 c4 f3 c4.
  • Always add HTML <TITLE>; and META tags into header.
  • Paragraph borders.
  • Table cell size control.
  • Table borders.
  • Page Headers and Footers (not sure if needed at all).
  • Support for hyperlinks (Microsoft-specific tags and/or automatic URL detection).
  • HREF/SRC URL-encoding (usually not needed).
  • Heading (<H1>, <H2>...) tags (heading formatting already works OK, but HTML heading tags are not used).

Other Bugs / To Do's

  • Improved DLL API, ability to read from memory buffer and write output into memory buffer.
  • Converting BMP images to PNG. (Can be done for GCC versions via GD.)
  • Converting vector images (WMF) to raster. Requires portable WMF rendering library.

History

Version 0.2:
OpenOffice's tag for background color added.
Text indent tag added.
Fixed crashes with bad color indexing in OpenOffice RTF.
UNICODE RTF support. UNICODE characters are converted to their HTML UNICODE representation, not recoded. If there is also an ANSI representation of the symbol, the ANSI version is used and the UNICODE one is ignored. There is also a new flag (r2hPreferUnicode) telling the converter to use UNICODE even if an ANSI version is present.
Version 0.1 Alpha.
This was the first version released.
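
The choice between ANSI and UNICODE output described under Version 0.2 can be sketched in C++ as follows. This is only an illustration: the function names are hypothetical, the preferUnicode parameter merely mirrors the meaning of the r2hPreferUnicode flag, and the assumption that "HTML UNICODE representation" means a decimal numeric character reference (&#NNNN;) is not confirmed by this readme.

```cpp
#include <string>

// Hypothetical helper: decimal HTML numeric character reference
// for a UNICODE code point (assumed output form).
std::string htmlNumericRef(unsigned long codePoint) {
    return "&#" + std::to_string(codePoint) + ";";
}

// Hypothetical helper: pick the text to emit for one character.
// ansiText is the ANSI fallback found in the RTF (empty if absent);
// preferUnicode plays the role the readme gives to r2hPreferUnicode.
std::string emitChar(unsigned long codePoint,
                     const std::string& ansiText,
                     bool preferUnicode) {
    if (!ansiText.empty() && !preferUnicode)
        return ansiText;               // ANSI used, UNICODE ignored
    return htmlNumericRef(codePoint);  // UNICODE representation
}
```

For example, a character with code point 8364 and no ANSI fallback would be emitted as the reference for that code point, while the same character with a fallback and preferUnicode set to false would be emitted as the fallback text.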

Unimplementable RTF Features

The RTF tags listed below cannot be converted to HTML/CSS because the corresponding features are inapplicable to continuous media, cannot be handled during conversion, or are simply not supported by HTML 4. The contents of these tags are not output, even as comments.

vern000
edmins000
paperw
paperh
paperw000
paperh000
cols
facingp
gutter000
deftab000
*\nextfile
*\template
makeback
defformat
revision
margmiror
titlepg
outl
shad
expnd000
ulw
uld
pgnx
pgny
pgndec
pgnucrm
pgnlcrm
pgnucltr
pgnlcltr
pgnstart

Section columns are also not supported by HTML (use tables instead).


© Sergey A. Galin, 2002
© Sergey A. Galin, 1998-2017, http://sageshome.net/oss/rtf2html.php