Ndf
From nformation
Nathan's Document Format
Introduction
- There is a clear need for a simple, universally accessible data format for portable content.
- Care should be taken to keep things SIMPLE, intuitive, and accessible to humans.
Comments on the nature of documents
Looking at human communication, we can see that it is ordered and progressive. For example, text is made of characters, which combine into words, which form sentences. Sentences can be used on their own or grouped into paragraphs. Paragraphs are grouped into chapters, then sections. The content of any book could be represented as a nested list.
What about the pages and the formatting? These are transitory artifacts of the limitations and dimensions of the medium. For example, if a book is reprinted, the publisher can change the fonts, the page size, or the page numbers, and we still think of it as the same book. You can drastically reformat something, and as long as the underlying structure remains unchanged, the content is still recognized as the same. Conversely, if the content is changed while the formatting stays the same, we recognize that this is not the same book at all.
The molecular unit of this format is the sentence or line. It was chosen as a middle way between larger blocks (paragraphs and up) and individual words or characters. Sentences are, ideally, clear and self-contained statements structured to communicate ideas. Using them as the core of the format helps support and reinforce an emphasis on clarity and conciseness.
What makes this different
- Unlike HTML, this is not a presentation system.
- Formatting and layout information should be left up to the renderer for interpretation.
- Content such as English-language sentences can be interpreted by renderers for plain text, HTML, and speech without complication.
Why 3...
"First shalt thou take out the Holy Pin, then shalt thou count to three, no more, no less. Three shall be the number thou shalt count, and the number of the counting shall be three. Four shalt thou not count, neither count thou two, excepting that thou then proceed to three. Five is right out. Once the number three, being the third number, be reached, then lobbest thou thy Holy Hand Grenade of Antioch towards thy foe, who being naughty in my sight, shall snuff it." Amen.
Format Description
The format is based on lines of text. The fields are delimited by tab characters. There are three fields, in the following order, explained in more detail below:
1. Name / Number
2. Datatype
3. Field of data
So each line has the form [name]\t[datatype]\t[field]\n, where \t is a tab character and \n is a newline.
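As an illustration only, here is a minimal Python sketch of reading lines of this form. The names `Record`, `parse_line`, and `parse_document`, and the `text` datatype in the usage lines, are hypothetical; the sketch assumes every line has exactly three fields and treats anything else as an error.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One NDF line: the address, the datatype, and the data field."""
    name: str
    datatype: str
    field: str


def parse_line(line: str) -> Record:
    """Split a single line into its three tab-delimited fields."""
    # Every line ends with \n; drop it before splitting.
    if line.endswith("\n"):
        line = line[:-1]
    parts = line.split("\t")
    # Exactly three fields are expected; a stray tab in the data would break this.
    if len(parts) != 3:
        raise ValueError(f"expected 3 tab-delimited fields, got {len(parts)}: {line!r}")
    return Record(*parts)


def parse_document(text: str) -> list[Record]:
    """Parse every line of a document, skipping empty lines."""
    return [parse_line(line) for line in text.split("\n") if line]


# Usage with a hypothetical "text" datatype:
doc = "1\ttext\tHello, world.\n2\ttext\tA second sentence.\n"
for record in parse_document(doc):
    print(record.name, record.datatype, record.field)
```

Rejecting lines that do not split into exactly three parts keeps the reader strict, which makes malformed data visible early rather than silently folding extra tabs into the data field.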
1. Name / Number
This field is a value representing the address of an object.
There are many different ways this could be done, and the exact method is not presently defined. Whatever method is used should be intuitive and intended to be compatible with others.
The simplest form is to use whole integers, addressing each object in sequence using the digits 0-9.
This also allows for deeper structure to be represented.
- For example, an object may hold sub-objects, with each layer of nesting marked by a period (dot), as in 2.6.23.
- Another method would be to use ISO 8601 dates and times, or Unix timestamps.
Remember to keep it simple.
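To connect the dotted addresses with the earlier idea of a book as a nested list, here is a small Python sketch. The functions `flatten` and `address_key` are hypothetical, and numbering each level from 1 is an assumption; the description does not fix a starting value.

```python
from typing import Union

# A node is either a sentence (a string) or a list of further nodes,
# e.g. a chapter holding paragraphs that hold sentences.
Node = Union[str, list]


def flatten(nodes: list[Node], prefix: tuple[int, ...] = ()) -> list[tuple[str, str]]:
    """Give every sentence in a nested list a dotted address such as 2.6.23."""
    records = []
    # Numbering starts at 1 at each level; this starting value is an assumption.
    for index, node in enumerate(nodes, start=1):
        path = prefix + (index,)
        if isinstance(node, list):
            records.extend(flatten(node, path))  # descend one nesting layer
        else:
            records.append((".".join(str(n) for n in path), node))
    return records


def address_key(address: str) -> tuple[int, ...]:
    """Sort key comparing dotted addresses numerically, so 2.10 comes after 2.9."""
    return tuple(int(part) for part in address.split("."))


book = [
    ["First sentence.", "Second sentence."],  # paragraph 1
    ["Another paragraph."],                   # paragraph 2
]
print(flatten(book))
# [('1.1', 'First sentence.'), ('1.2', 'Second sentence.'), ('2.1', 'Another paragraph.')]
print(sorted(["2.10", "2.9", "10.1"], key=address_key))
# ['2.9', '2.10', '10.1']
```

Comparing addresses component by component as integers matters: a plain string sort would place "2.10" before "2.9".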
2. Datatype
This is the context, content, or class type of the data field. Its length is currently unrestricted, but care should be taken to keep things simple and accessible.
- It is recommended to use case-insensitive combinations of the letters a-z.
- Apologies to those using other character sets, but no: that gets too complicated too fast.
- Use . (a period) to delimit subtypes.
- "special" characters should be avoided.
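One way to apply these recommendations is to normalize each datatype to lower case and reject anything outside a-z and the period. The Python sketch below assumes a strict grammar (letters separated by single periods) inferred from the list above; `normalize_datatype`, `subtypes`, and the example datatype `text.sentence` are hypothetical.

```python
import re

# Letters a-z, with single periods separating subtypes, e.g. "text.sentence".
# This exact grammar is an assumption based on the recommendations above.
_DATATYPE = re.compile(r"[a-z]+(\.[a-z]+)*")


def normalize_datatype(datatype: str) -> str:
    """Lower-case a datatype (they are case-insensitive) and check its characters."""
    # Check for ASCII first, so non-ASCII letters cannot lower-case into a-z.
    normalized = datatype.lower()
    if not datatype.isascii() or not _DATATYPE.fullmatch(normalized):
        raise ValueError(f"datatype should use only letters a-z and '.': {datatype!r}")
    return normalized


def subtypes(datatype: str) -> list[str]:
    """Split a datatype into its period-delimited parts, in the order written."""
    return normalize_datatype(datatype).split(".")


print(normalize_datatype("Text.Sentence"))  # text.sentence
print(subtypes("text.sentence"))            # ['text', 'sentence']
```

Normalizing on read means "Text" and "text" are treated as the same datatype, which matches the recommendation that datatypes be case-insensitive.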
3. Field of Data
This is the actual data. If the datatype is a function, then this is its parameter(s).
- This should not include "special" characters.
- If you want to put binary or special data in, consider using something like base64 encoding.
- Remember this field ends with a newline (\n).
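A minimal Python sketch of writing lines follows. It refuses tabs and newlines in any field, since either would break the line structure, and stores raw bytes with base64 as suggested above. The helper names and the `image.png` datatype are hypothetical; `base64.b64encode` and `base64.b64decode` are from the Python standard library.

```python
import base64


def format_line(name: str, datatype: str, field: str) -> str:
    """Build one line, refusing values that would break the line structure."""
    for value in (name, datatype, field):
        # A tab would create an extra field; a newline would end the line early.
        if "\t" in value or "\n" in value:
            raise ValueError(f"tab or newline not allowed in a field: {value!r}")
    return f"{name}\t{datatype}\t{field}\n"


def format_binary(name: str, datatype: str, data: bytes) -> str:
    """Store raw bytes in the data field as base64 text."""
    return format_line(name, datatype, base64.b64encode(data).decode("ascii"))


def read_binary(line: str) -> bytes:
    """Recover the bytes from a line written by format_binary."""
    field = line.rstrip("\n").split("\t")[2]
    return base64.b64decode(field)


line = format_binary("3", "image.png", b"\x89PNG")
print(repr(line))         # '3\timage.png\tiVBORw==\n'
print(read_binary(line))  # b'\x89PNG'
```

Base64 output uses only letters, digits, "+", "/", and "=", so the encoded data can never contain a tab or newline.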