Why "flat is better than nested"?

Fri Dec 22 10:22:05 EST 2017

On Sat, 23 Dec 2017 01:48 am, andrewpate08 at gmail.com wrote:

> On Monday, October 25, 2010 at 11:07:42 AM UTC+1, kj wrote:
>> In "The Zen of Python", one of the "maxims" is "flat is better than
>> nested"?  Why?  Can anyone give me a concrete example that illustrates
>> this point?
>> 
>> TIA!
>> 
>> ~kj
>> 
>> PS: My question should not be construed as a defense for "nested".
>> I have no particular preference for either flat or nested; it all
>> depends on the situation; I would have asked the same question if
>> the maxim had been "nested is better than flat".
> 
> I think there is a point where flat stops working. One of the products I
> work on has a 40-50 field datastructure - it is hard to work with and find
> the appropriate fields in. So I would say structured is better than flat but
> simple is better than too structured.

Do you realise you are responding to a message more than seven years old?

In any case, structured is not the opposite of flat. Nested is the opposite of
flat, and unstructured is the opposite of structured. The two concepts are
independent of each other: any data structure can be:

- flat and structured;
- flat and unstructured;
- nested and structured;
- nested and unstructured.
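As a rough illustration, here is one way each of the four combinations might
look in Python. The `Track` name and the sample values are made up for this
sketch; the point is only how the same piece of information is reached in each
case.

```python
from collections import namedtuple

# Flat and structured: one level deep, every field has a name.
Track = namedtuple("Track", ["title", "artist", "number"])
flat_structured = Track("Example Song", "Some Artist", 7)

# Flat and unstructured: one level deep, fields known only by position.
flat_unstructured = ("Example Song", "Some Artist", 7)

# Nested and structured: containers inside containers, named keys at each level.
nested_structured = {
    "album": "Example Album",
    "tracks": [{"title": "Example Song", "artist": "Some Artist", "number": 7}],
}

# Nested and unstructured: lists inside lists, meaning only by position.
nested_unstructured = ["Example Album", [["Example Song", "Some Artist", 7]]]

# The same title, fetched four different ways.
print(flat_structured.title)
print(flat_unstructured[0])
print(nested_structured["tracks"][0]["title"])
print(nested_unstructured[1][0][0])
```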

Data can even be semi-structured; for example, mp3 files have some structured
metadata (title, artist, track number, comment, etc.), while data in the
comment field itself is unstructured (free-form) text.

Here is an example of iterating over a flat collection of numbers:

values = [1, 2, 3, 4, 5, 6, 7, 8]
for num in values:
    print(num)

Here is the same thing, using a nested collection of numbers:

values = [1, [2, [3, [4, [5, [6, [7, [8, []]]]]]]]]
while values:
    num, values = values
    print(num)

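In Python, the first is much more efficient than the second: the flat list is a
single object you can loop over directly, while the nested version needs a
separate two-element list for every number and has to unpack one level on each
pass through the loop. It is also easier to write, easier to read, and harder
for the programmer to get wrong.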
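The difference shows up even more with random access. A minimal sketch follows;
the helper names `to_nested`, `nested_get` and `to_flat` are hypothetical,
written just for this illustration. Indexing the flat list is a single
operation, while the nested form has to be walked one level at a time, so the
cost grows with the position you ask for.

```python
def to_nested(flat):
    """Build the nested [head, rest] form from a flat list."""
    nested = []
    for item in reversed(flat):
        nested = [item, nested]  # one new two-element list per item
    return nested

def nested_get(nested, index):
    """Fetch the item at `index` by peeling off one level per step."""
    for _ in range(index):
        nested = nested[1]
    return nested[0]

def to_flat(nested):
    """Walk the nested form back into an ordinary list."""
    flat = []
    while nested:
        head, nested = nested
        flat.append(head)
    return flat

values = [1, 2, 3, 4, 5, 6, 7, 8]
nested = to_nested(values)

print(values[5])                  # 6, in one indexing operation
print(nested_get(nested, 5))      # 6, but only after walking five levels
print(to_flat(nested) == values)  # True
```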

A list or a dict is flat; a binary tree is nested.
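To make that contrast concrete, here is a minimal binary search tree built from
nested tuples. Python has no built-in tree type, so the `(left, value, right)`
layout and the `insert`/`in_order` names are assumptions for this sketch. Even
just listing the values in order forces a recursive walk, where the flat
version is an ordinary sorted list.

```python
def insert(tree, value):
    """Return a new tree with `value` added; None stands for an empty tree."""
    if tree is None:
        return (None, value, None)
    left, node_value, right = tree
    if value < node_value:
        return (insert(left, value), node_value, right)
    return (left, node_value, insert(right, value))

def in_order(tree):
    """Yield the values in sorted order by recursing into each subtree."""
    if tree is not None:
        left, value, right = tree
        yield from in_order(left)
        yield value
        yield from in_order(right)

tree = None
for n in [5, 3, 8, 1, 4, 7, 9]:
    tree = insert(tree, n)

print(list(in_order(tree)))  # [1, 3, 4, 5, 7, 8, 9]

# The flat alternative: keep a sorted list and loop over it directly.
flat = sorted([5, 3, 8, 1, 4, 7, 9])
print(flat)                  # [1, 3, 4, 5, 7, 8, 9]
```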

The terms "structured" and "unstructured" data can have two related but
separate meanings. One distinguishes between data with and without a
pre-defined organization:

- structured data: XML, JSON, YAML, databases, etc;

- unstructured data: books, the body of emails, arbitrary web pages, etc.
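With structured data in this first sense, the organization is agreed in
advance, so you can go straight to the field you want. A small sketch using
the standard `json` module; the document contents are made up for the example.

```python
import json

# The layout is known ahead of time, so each field is reached by name,
# however deeply it happens to be nested.
document = '{"album": "Example Album", "tracks": [{"title": "Example Song", "number": 7}]}'
data = json.loads(document)

print(data["tracks"][0]["title"])   # Example Song
print(data["tracks"][0]["number"])  # 7
```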

Dealing with unstructured data in this sense often means coming up with some
sort of heuristic or "best guess" for picking out the useful information
(say, based on regexes), then writing a scraper to pull apart the data source
looking for what you want.
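A toy version of such a heuristic, using the `re` module. The sample email and
the rule "the score is the first number after the word 'score'" are invented
for illustration, and `guess_score` is a hypothetical name; real scrapers need
far more defensive rules than this.

```python
import re

def guess_score(text):
    """Hypothetical heuristic: the score is the first number after "score"."""
    match = re.search(r"score\D*?(\d+)", text, re.IGNORECASE)
    return int(match.group(1)) if match else None

email_body = """Hi all,
The final score was 42 points, up from 37 last week.
Cheers"""

print(guess_score(email_body))                          # 42
print(guess_score("The score, as of the 3rd, was 42"))  # 3 (a wrong guess)
print(guess_score("No numbers here"))                   # None
```

The second call shows how fragile a best guess can be: nothing in free-form
text tells the scraper which number is the one you meant.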

The meaning you seem to be using is:

- structured data has named fields (e.g. C struct, Pascal record, 
  object with named attributes, Python named tuple, CSV file with 
  descriptive column headers);

- unstructured data does not name the fields (e.g. a plain tuple,
  CSV file without column headers) and you have to infer the 
  meaning of each field from out-of-band knowledge (say, you read
  the documentation to find out that "column 7" is the score).
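Here is that second distinction in Python, using the standard `csv` module and
`collections.namedtuple`. The file contents and the `Result` type are made up
for the example; note that Python indexes from zero, so the third column is
`row[2]`.

```python
import csv
import io
from collections import namedtuple

# Structured: a CSV file whose header row names each column.
with_headers = io.StringIO("name,team,score\nAlice,Red,42\nBob,Blue,37\n")
scores_by_name = [row["score"] for row in csv.DictReader(with_headers)]

# Unstructured: the same rows with no header. Nothing in the file says which
# column is the score; we rely on outside knowledge that it is index 2.
without_headers = io.StringIO("Alice,Red,42\nBob,Blue,37\n")
scores_by_position = [row[2] for row in csv.reader(without_headers)]

print(scores_by_name)      # ['42', '37'] (the csv module yields strings)
print(scores_by_position)  # ['42', '37']

# A named tuple attaches names to what would otherwise be a plain tuple.
Result = namedtuple("Result", ["name", "team", "score"])
plain = ("Alice", "Red", 42)
named = Result(*plain)
print(plain[2], named.score)  # 42 42
```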

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.