Setext

Basic Information About Setext

Setext is a simple text formatting language originally designed for plain email and USENET exchanges. It has gradually been displaced by HTML; however, it remains an extremely simple and useful way of presenting text in a structured fashion.

Programs

  • setext2html.pl – original script from BSDi.com for setext-to-HTML conversion
  • setext2latex.pl – my own Perl setext-to-LaTeX converter
    (Revised 16 Sep 2007; v 1.13)
  • setext2html.pl – my own Perl setext-to-HTML converter
    (Revised 9 Sep 2007; v 1.9)

Tag Reference

Name Setext Pattern Example Displayed As / Comments
(a) subject-tt Subject: …[From: … & Date: … ]
Subject: Re: SGML vs. Setext
From: user1@example.com
Date: 10 Apr 2001
Displayed literally with a minimal number of headers.
Mail/USENET headers. These bits of information relate primarily to mail/news and can provide a third level of hierarchy.
Note: my setext2latex parser requires that the headers be at the start of the line and recognizes only “Subject”, “From”, and “Date”, translating them to the \title, \author, and \date of the LaTeX title page, respectively.
(b) title-tt “Title
=====”
This is a long title
====================
Displayed in a user-selected style for titles. A distinct title, identified by its “=” underline; maximum one per setext. Must start at the beginning of the line. The first title-tt, subhead-tt, or subject-tt found scanning the file from top to bottom becomes the LaTeX \title. Therefore a subject-tt should come before a title-tt, which should come before a subhead-tt. The parser raises no error on multiple title-tt’s or multiple subject-tt’s after the first one.
(c) subhead-tt “Subhead
-------”
Subheading One
--------------
Displayed in a user-selected style for subheadings. A distinct subheading, identified by its “-” underline; zero or more per setext. Must start at the beginning of the line. See the note under title-tt about handling.
(d) indent-tt 66-char lines indented by 2 spaces
  First paragraph..
  more of paragraph.

  Next paragraph...
Lines are displayed unindented and unfolded (most parsers tolerate longer lines). This is the primary body text, which is currently generally plain and unindented in email, etc.
(e) bold-tt **[multi]word**
This is **very important**...
Displayed in a user-selected style, preferably bold. One or more bold words; generally *word* or **word** in email.
(f) italic-tt ~word~
This is an ~italic~ word.
Displayed in a user-selected style, preferably italics. A single italicized word; it is unclear why a multi-word form is not available. The multi-word form ~first~second~third~ is supported by setext2latex.
(g) underline-tt [_multi]_word_
This is _underlined_text_.
Displayed in a user-selected style, preferably with underlining, except in browsers where underlining denotes hot links. One or more underlined words.
(h) hot-tt [multi_]word_
This is a hot_word_.
Used in conjunction with href-tt to make footnotes or hyperlinks. The setext.pl script provided on the home page makes the hot-tt a hyperlink to the corresponding href-tt. In my LaTeX converter, the href-tt becomes a footnote placed just after the hot-tt. Hypertextual; one or more words.
(i) include-tt >[space][text]
> This is quoted text...
> ...more...
Displayed in a user-selected style, preferably monospaced with the leading “>”. The normal text-quoting style of news/mail user agents.
(j) bullet-tt *[space][text]
* Item 1 that is...
  ...really long
* Item 2
Displayed in bullet or list format.
It is ambiguous whether “*” has to be in the leftmost non-space position or in the absolute first position; actual practice suggests the first non-space position. Handling of run-on lines within bullets is also ambiguous; I resolve it by allowing lines to wrap according to the 2-character indent rule.
(k) quote-tt `[typo tags from (a)-(p)]`
`here's some _underlined_text_ to show literally`
Displayed literally, i.e. as if the enclosed typotags were not there. The “`” marks could probably be omitted in display.
Mostly useful for presenting material about setext in setext.
The implementation is ambiguous: setext2latex handles `` as a literal `. Otherwise it treats everything from the first ` to the next ` as a literal string, even across multiple lines.
(l) href-tt ^.. _hot_word URL
^.. _hot_word http://www.this url
Not realized directly, only through the matching hot-tt. The URL could also be text for a footnote.
Modified in version 1.11 to allow wrapping onto multiple lines that start with “^.. ”. [hypertext link def]
(m) note-tt ^.. _hot_word Note:("*")
^.. _hot_word Note:("Here's an error")
Generates an error. It is unclear why this is a typotag at all, or how it is meant to be used. One possibility would be to use it for footnotes while href-tt is used for hyperlinks. [hypertext note def]
(n) twobuck-tt $$[at end of line]
This is the end of this setext. $$
[start parsing a new setext within this file]
Used to mark the end of the first (or only) setext in a file. Generally appears at the end of the file since most files include only a single setext.
(o) supress-tt ^..[space][not dot]
.. This won't show up.
[not shown] Not generally used in email, etc., at present.
(p) twodot-tt ^..[alone on line]
..
[noted; not shown] logical end of text
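The inline typotags in the table can be handled with a handful of regular expressions. These are bold-tt, italic-tt, underline-tt, and hot-tt with its href-tt, plus quote-tt protecting literal text. The following Python sketch is not the code of setext2latex.pl.

It rests on several assumptions:
- Output uses the LaTeX commands \textbf, \emph, \underline, and \footnote.
- An href-tt continuation line starts with “.. ” but not “.. _”.
- LaTeX escaping of special characters is omitted.

All function names are hypothetical.

```python
import re

# Inline typotags.  The LaTeX output commands are assumptions chosen for
# illustration; LaTeX escaping of special characters is omitted.
BOLD_RE = re.compile(r"\*\*(\S(?:.*?\S)?)\*\*")                         # **[multi]word**
ITALIC_RE = re.compile(r"~([^\s~]+(?:~[^\s~]+)*)~")                     # ~word~, ~a~b~c~
UNDERLINE_RE = re.compile(r"(?<!\S)_([^\s_]+(?:_[^\s_]+)*)_")           # _[multi_]word_
HOT_RE = re.compile(r"(?<!\w)([A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)_(?!\w)")  # [multi_]word_
HREF_RE = re.compile(r"^\.\. _(\S+)\s+(.*)$")                           # .. _hot_word URL


def parse_hrefs(lines):
    """Map each href-tt key to its target.  Lines that start with '.. ' but
    not '.. _' are assumed to continue the previous target."""
    hrefs, last = {}, None
    for line in lines:
        match = HREF_RE.match(line)
        if match:
            last = match.group(1)
            hrefs[last] = match.group(2).strip()
        elif last is not None and line.startswith(".. "):
            hrefs[last] += " " + line[3:].strip()
        else:
            last = None
    return hrefs


def split_literals(text):
    """Split text into (is_literal, piece) pairs for quote-tt: '``' stands for
    a literal backtick; otherwise everything from one '`' to the next, even
    across lines, is literal."""
    pieces, buf, literal, i = [], [], False, 0
    while i < len(text):
        if text.startswith("``", i):
            buf.append("`")
            i += 2
        elif text[i] == "`":
            if buf:
                pieces.append((literal, "".join(buf)))
            buf, literal = [], not literal
            i += 1
        else:
            buf.append(text[i])
            i += 1
    if buf:  # flush the rest; it stays literal if a quote-tt was left open
        pieces.append((literal, "".join(buf)))
    return pieces


def convert_segment(text, hrefs):
    """Convert the inline typotags in one non-literal piece of text."""
    text = BOLD_RE.sub(lambda m: "\\textbf{%s}" % m.group(1), text)
    text = ITALIC_RE.sub(lambda m: "\\emph{%s}" % m.group(1).replace("~", " "), text)
    text = UNDERLINE_RE.sub(
        lambda m: "\\underline{%s}" % m.group(1).replace("_", " "), text)

    def hot(m):
        key = m.group(1)
        if key not in hrefs:  # no matching href-tt: leave the text alone
            return m.group(0)
        # The href-tt target becomes a footnote right after the hot word(s).
        return "%s\\footnote{%s}" % (key.replace("_", " "), hrefs[key])

    return HOT_RE.sub(hot, text)


def setext_inline_to_latex(text):
    """Collect href-tt definitions, drop '..' lines (href-tt, suppress-tt,
    twodot-tt), and convert everything outside quote-tt spans."""
    lines = text.splitlines()
    hrefs = parse_hrefs(lines)
    body = "\n".join(
        line for line in lines
        if not (line.startswith(".. ") or line.rstrip() == "..")
    )
    return "".join(
        piece if literal else convert_segment(piece, hrefs)
        for literal, piece in split_literals(body)
    )


if __name__ == "__main__":
    sample = "This is a hot_word_.\n.. _hot_word http://www.example.com/"
    print(setext_inline_to_latex(sample))
    # prints: This is a hot word\footnote{http://www.example.com/}.
```

Splitting off quote-tt spans before any conversion is one way to get the "displayed literally" behavior. With this reading of the multi-line href-tt form, though, a suppress-tt line placed directly after an href-tt would be absorbed into the footnote text.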

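The line-oriented typotags can be recognized one line at a time. This Python sketch uses hypothetical names and is not taken from any of the scripts listed above. It rests on the following rules:
- href-tt is checked before suppress-tt, because both begin with “.. ”.
- A line that starts in column 0 is treated as a title-tt or subhead-tt when the next line consists only of “=” or “-”.
- That classification is then used to apply the \title rule described under title-tt.

The exact regular expressions are assumptions.

```python
import re

# Line-level typotags, checked in this order.  href-tt must come before
# suppress-tt because both begin with '.. '.  The regexes are assumptions.
LINE_RULES = [
    ("subject-tt", re.compile(r"^(Subject|From|Date): ")),
    ("twodot-tt", re.compile(r"^\.\.\s*$")),
    ("href-tt", re.compile(r"^\.\. _\S+\s")),
    ("suppress-tt", re.compile(r"^\.\. [^.]")),
    ("include-tt", re.compile(r"^>")),
    ("bullet-tt", re.compile(r"^\s*\* ")),    # '*' at first non-space position
    ("twobuck-tt", re.compile(r"\$\$\s*$")),  # '$$' at end of line
]


def underline_char(line):
    """Return '=' or '-' if the line consists only of that character."""
    stripped = line.rstrip()
    if stripped and set(stripped) in ({"="}, {"-"}):
        return stripped[0]
    return None


def classify(lines):
    """Yield (tag, line) for each line.  A title-tt or subhead-tt is a line that
    starts in column 0 and is followed by a line of only '=' or '-'; the
    underline itself is consumed and not yielded."""
    skip_next = False
    for i, line in enumerate(lines):
        if skip_next:
            skip_next = False
            continue
        under = underline_char(lines[i + 1]) if i + 1 < len(lines) else None
        if line[:1].strip() and under:
            yield ("title-tt" if under == "=" else "subhead-tt"), line
            skip_next = True
            continue
        for tag, pattern in LINE_RULES:
            if pattern.search(line):
                yield tag, line
                break
        else:
            # Remaining lines are body text; indent-tt if indented by 2 spaces.
            yield ("indent-tt" if line.startswith("  ") else "text"), line


def latex_title(lines):
    """Mirror the rule described under title-tt: the first subject-tt
    (Subject: header), title-tt, or subhead-tt from the top becomes \\title."""
    for tag, line in classify(lines):
        if tag == "subject-tt" and line.startswith("Subject: "):
            return line[len("Subject: "):].strip()
        if tag in ("title-tt", "subhead-tt"):
            return line.strip()
    return None


SAMPLE = [
    "Subject: Re: SGML vs. Setext",
    "From: user1@example.com",
    "This is a long title",
    "====================",
    "Subheading One",
    "--------------",
    "  First paragraph..",
    "> This is quoted text...",
    "* Item 1 that is...",
    ".. _hot_word http://www.example.com/",
    ".. This won't show up.",
    "This is the end of this setext. $$",
    "..",
]

if __name__ == "__main__":
    for tag, line in classify(SAMPLE):
        print(f"{tag:12} {line}")
    print("LaTeX title:", latex_title(SAMPLE))
```

Because the heading check runs first, a “Subject:” line followed by a row of “=” would be read as a title-tt. A fuller parser would need to decide that precedence explicitly.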
Comments

The authoritative reference is no longer available at BSDI, and the existing table of typotags lacks concrete examples. My examples attempt to correspond to the usage in the setext.pl script from BSDI, usage by others (e.g. TidBITS), and my own usage. The specification is somewhat inadequate for describing behavior, e.g. the requirement that title-tt and subhead-tt be at the start of the line. Similarly, from an implementation standpoint, the existing setext parsers I have encountered do not properly handle multiple layers of include-tt, or even typotags within include-tt.
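One way to handle nested include-tt is to strip a single “>” level from each run of quoted lines and recurse. Each level then becomes plain setext again, and the ordinary typotag handling can be applied inside it. Below is a minimal Python sketch, assuming a “>” may be followed by an optional space; the function name is hypothetical.

```python
import re

# One level of include-tt: a leading '>' plus an optional space (assumed).
QUOTE_RE = re.compile(r">\s?")


def nest_includes(lines):
    """Turn include-tt lines into a nested list: each run of lines starting
    with '>' becomes a sub-list with one '>' level removed and is processed
    recursively, so '> > text' ends up two levels deep."""
    result, run = [], []
    for line in list(lines) + [None]:  # None is a sentinel that flushes the last run
        match = QUOTE_RE.match(line) if line is not None else None
        if match:
            run.append(line[match.end():])
            continue
        if run:
            result.append(nest_includes(run))
            run = []
        if line is not None:
            result.append(line)
    return result


if __name__ == "__main__":
    print(nest_includes(["Reply text", "> quoted", "> > older **quote**", "after"]))
```

Each nested list holds ordinary lines, so a converter can run its usual inline handling (e.g. for bold-tt) on every level rather than treating the quoted block as opaque.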

Note: only one instance of the element (c) (or, in its absence, (b)) is absolutely required for a text to be considered a valid setext.

All the elements except (c) are in effect optional, i.e. not necessary for a setext to be declared as such. Element (a) deals with setexts that arrive via email and end up being parsed (processed) as unedited mailbox files. Fully employed, (a), (b), and (c) make it possible to distribute “multisetexts”, i.e. setexts with one additional level of logical structure (more than one setext per message; more than one message in a mailbox). If such a file is viewed as a multisetext, it results in a 3-level outline: mail subjects become top-level chapters, setext titles denote subchapters (topics), and subheads denote yet finer threads within these (still a notch ABOVE mere “paragraphs of text”).
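The validity rule and the 3-level multisetext outline can be sketched in a few lines of Python. This is an illustration under assumptions, not a reference implementation:
- Only “Subject:” lines at the start of a line count as mail-subject chapters.
- Titles and subheads are recognized by a following line made up only of “=” or “-”.

The function names are hypothetical.

```python
import re

SUBJECT_RE = re.compile(r"^Subject:\s*(.*)$")


def heading_kind(lines, i):
    """Return 'title' or 'subhead' if lines[i] starts in column 0 and the next
    line consists only of '=' (title-tt) or '-' (subhead-tt)."""
    if i + 1 >= len(lines) or not lines[i][:1].strip():
        return None
    under = lines[i + 1].rstrip()
    if under and set(under) == {"="}:
        return "title"
    if under and set(under) == {"-"}:
        return "subhead"
    return None


def is_valid_setext(lines):
    """A subhead-tt, or failing that a title-tt, is the only required element."""
    return any(heading_kind(lines, i) for i in range(len(lines)))


def outline(lines):
    """Build the 3-level multisetext outline as (level, text) pairs:
    1 = mail subject, 2 = setext title, 3 = subhead."""
    levels = {"title": 2, "subhead": 3}
    result = []
    for i, line in enumerate(lines):
        match = SUBJECT_RE.match(line)
        kind = heading_kind(lines, i)
        if match:
            result.append((1, match.group(1).strip()))
        elif kind:
            result.append((levels[kind], line.strip()))
    return result


MAILBOX = [
    "Subject: Setext news",
    "From: user1@example.com",
    "",
    "This is a long title",
    "====================",
    "Subheading One",
    "--------------",
    "  Body text...",
    "Subject: Second message",
    "Another title",
    "=============",
]

if __name__ == "__main__":
    for level, text in outline(MAILBOX):
        print("  " * (level - 1) + text)
    print("valid setext:", is_valid_setext(MAILBOX))
```

Keeping the outline as flat (level, text) pairs leaves the choice of presentation open. For example, a LaTeX converter could map the levels to chapter, section, and subsection.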