Old Pie in the Sky

Around 2010 the author tried to see how far he could go with his concept of a framework to capture, store, organize, and operate upon any visible page on the web. Some of those ideas are stored here.

Here are some articles in this section:

1 - Zen Project #1: A Web Site and a New Technology to Help People Scale the Web

This page was first posted on tomelam.blogspot.com around 2010 and has been lightly edited.

Ascending and Descending, M. C. Escher, 1960

In this, the first in a series of articles on Zen, I describe a large and general problem people have as users of the web. Then I describe a set of technologies I am working on, which I call Zen, that could significantly reduce the problem; they are more fully described at Mashweb.Club and at my older website, tomelam.com. On average, each of us consumes three times as much information as we did in 1960 and checks 40 web sites a day, according to a 2010 article on National Public Radio’s web site. New York Times articles tell us that this makes us impatient, forgetful, anxious, shallow, and unfocused. Information is our crack cocaine, and we should have been in rehab long ago. An information fix is even called a hit! (See below.)

Part of the problem is the difficulty in finding what we seek—even when we know it is on the web. When we search for something on the web, we hope that the process will be linear and finite, but we get lost in a strange loop, a tangled hierarchy of searching, scanning, surfing, and subscribing that we try to climb down like a staircase. In our search for digital nirvana, employing bookmarks, tags, annotations, hyperlinks, search, indices, rankings, ratings, and subscriptions, we find ourselves skipping up and down between various levels of metadata and web resources, like the monks trudging endlessly on the paradoxical, looped staircase in Escher’s lithograph “Ascending and Descending,” shown above. On our way we collect scores of bookmarks and piles of paper notes, and get a headache and short temper, too.

Let us try to outline the stages or hierarchies of the most general search:

  1. Search for an answer to a question using a general purpose or specialized “search engine,” typically by entering keywords, phrases, and parameters (e.g., date, tags, domain, URL, links-to, similar-to). Items that match the query criteria are called hits. Hits are presented in lists or in hierarchical outlines and are typically ordered by criteria such as relevance or paid placement. Searching is optional if we can replace the list or outline of hits with a list or outline of resources from categories in a web directory. Examples of directories of the whole web were the now-defunct Yahoo! Directory and DMOZ. (See Wikipedia’s entries on Yahoo! Directory and DMOZ and a successor to DMOZ.) Many specialized directories of web resources also exist.
  2. Examine brief descriptions, summaries, or snippets of the items in the list or outline, if available.
  3. Open the links to the relevant-looking hits—in new windows or tabs if possible. Evaluate these linked-to resources for fitness to our purpose.

This linear procedure can often miss highly specialized information. Sometimes this happens because we weren’t able to choose the best search query at the outset: we didn’t have enough information to choose well. Sometimes we might not even know that information pertaining to our problem is available. And sometimes we really need information from the “deep web,” i.e., data available only behind a form-oriented user interface, such as an airline fare search interface or a phone directory search interface. The “deep web” is generally not indexable by the crawlers (“spiders”) that feed web search engines. A better alternative to this hit-or-miss search technique is “semantic search.” (When I first wrote this article in 2010, it was sometimes called “idea search,” but the source I referred to then is now gone, not even archived on Archive.org.) The following articles provide a start at learning about semantic search:

  1. SearchEngineJournal.com’s article “Semantic Search: What It Is & Why It Matters for SEO Today”.
  2. SearchEngineWatch.com’s article “The beginner’s guide to semantic search: Examples and tools”.
  3. Wikipedia.org’s article “Semantic search”.
  4. OnCrawl.com’s article “What is semantic SEO?”.
  5. Google’s article Google Knowledge Graph Search API.
  6. Stewart, Scott, and Zelevinsky’s article “Idea Navigation: Structured Browsing for Unstructured Text”.
  7. Adam Westerski’s book Semantic Technologies in Idea Management Systems: A Model for Interoperability, Linking and Filtering.

We often resort to trial-and-error variations of our search, but this is usually frustrating. A better approach is to refine each level of the hierarchy described above by applying the whole search procedure to that level. That is, we seek information resources, meta-resources (e.g., search engines), and methods of search. A method of search can involve subscribing to a forum, wiki, mailing list, or social bookmarking service (like Delicious, being slowly revived and archived here; Digg; Reddit; or Pinboard), or to general social media (like Twitter and Facebook), so that we can make new friends and acquaintances (or communicate with old friends) and get help from them. Another method is to subscribe to pertinent blogs to find out which experts are sharing helpful information on our topic. (Try Feedly or read about Google Reader. Also read about Google Blog Search, or try Bloglovin, QuiteRSS, or some of the sites in this moreofit list.) The “helpful information” could be the final solution we are seeking, or a set of new keywords we hadn’t considered or known, or some other forum, wiki, blog, mailing list, chat room, or other resource. The results on topic-specific forums, wikis, blogs, mailing lists, and chat rooms are pre-filtered by design, avoiding the completely irrelevant hits that can swamp the relevant ones in a general-purpose whole-web search engine. There are many other strategies for refining the levels of the search hierarchy.

Other problems arise in our research on the web and in remembering what we find: too many web pages and no adequate means of keeping track of them. If we open a window for each web page that looks pertinent to our search, we soon overburden our web browser or our operating system. The browser will crash, or, in the case of Google Chrome, tab panes will crash. Web browsing will slow to a desperate crawl. (These problems with Google Chrome, the most-used web browser, have existed for a decade or more. Chrome has some of the best extension and developer support, so it is hard to replace.) If we bookmark our web pages, we irretrievably lose time navigating the browser’s bookmark manager, and in any case might not be able to find the bookmarks later when we need them.

So what’s the answer, short of a very heavyweight “idea browser” and possibly much better natural language processing—even artificial sentience—that is not within our reach? I propose we make it possible to write the web as easily as desktop text documents can be written, and to implement some of the original ideas about hypertext like transclusion, so one single web page becomes our notebook and scrapbook. The reader might think of Evernote, Google Notebook, or Xanadu, but the kernel of the technology I’m working on focuses on organizing and tuning the user’s view of and interaction with the web and on being universally usable, globally, by anyone with a reasonably up-to-date web browser, without browser plug-ins. On top of the kernel, many web-server-assisted capabilities can be built.

Zen will allow the user to get close to the process of refining his searches, so that he can keep meta-resources, final results, and records of how he got results at his fingertips, under his control, programmed via a simple visual interface. He should be able to program the Internet by dragging and flowing links, text, data in microformats, and media from any web page into a web page he controls and saves to a web server, and via simple programming of his own interface to the web. He should be able to use his links as bookmarks. He should be able to organize the links, text, and media in folders, tab panes, accordion panes, and other web widgets and lazy-load them so that time is not taken when he re-opens his page at a later date. Only the open pages or other widgets that contain the links, text, data, and media should be loaded. (That’s how a lazy-loaded pane works.) Zen will even allow lazy-loaded widgets to be put into other lazy-loaded widgets. Many of the web-server-assisted services mentioned above and below can easily be codified and automated so that they can be included as widgets on a web page.
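
The lazy-loading idea described above can be sketched in a few lines. This is only an illustration, not Zen's actual code: the class name and the loader callback are assumptions, and a real pane would fetch content over the network and render it into the page.

```javascript
// Minimal sketch of a lazy-loaded pane: content is produced only the
// first time the pane is opened, then cached for later re-opens.
class LazyPane {
  constructor(loadContent) {
    this.loadContent = loadContent; // callback that fetches/builds the pane's content
    this.content = null;
    this.loads = 0;                 // how many times the loader actually ran
  }
  open() {
    if (this.content === null) {    // load on first open only
      this.content = this.loadContent();
      this.loads++;
    }
    return this.content;
  }
}
// Panes can nest: a pane's loader may itself construct more LazyPanes,
// so inner widgets stay unloaded until their own pane is opened.
```

Because the loader runs only on first open, re-opening a saved page costs nothing for panes the user never looks at, which is the behavior the paragraph above describes.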

The user will be able to share these pages over the web with other people very easily, in many ways. The special web server will serve his web page and will use its built-in proxy capabilities to enable the user’s web page to be composed of content from multiple web sites. Such a composition is called a mashup. The JavaScript inside the user’s web page will enable many kinds of mashups to be created even without help from the special web server. These JavaScript-enabled mashups will use certain kinds of mashable content from multiple web sites. Microformats, mentioned above, can assist the mashups in “digesting” and pouring content from disparate places into the page. For example, much social media, including the present blog, can be accessed via web feeds in the RSS or Atom format. This allows the media to be formatted by a news aggregator for human consumption or inserted into a web page.
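
As a hedged illustration of pouring feed content into a page, the sketch below pulls entry titles out of an Atom feed. A real implementation would use a proper XML parser (e.g., DOMParser in the browser); the regular expression here only keeps the sketch short, and the function name is made up.

```javascript
// Extract entry titles from an Atom feed so they can be inserted into
// a page. Illustrative only: regexes are not a robust XML parser.
function atomEntryTitles(feedXml) {
  const titles = [];
  const entryRe = /<entry>[\s\S]*?<title>([\s\S]*?)<\/title>[\s\S]*?<\/entry>/g;
  let m;
  while ((m = entryRe.exec(feedXml)) !== null) {
    titles.push(m[1].trim());
  }
  return titles;
}
```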

Various features of Zen will allow development of a web page to proceed smoothly. Someday maybe the web server will enable the user’s web navigation to be recorded automatically. Variations of the user’s web page will be as easily produced as Git branches, because a Git-like “filesystem” on the web server will record the components of the web page analogously to Git commits. The system must be easy to understand without much training.
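
The Git-like "filesystem" idea can be sketched as a content-addressed store in which a branch is just a named pointer to a commit, so producing a variation of a page is as cheap as copying a pointer. Everything here is an assumption for illustration: the class and method names are invented, and a real system would use a cryptographic hash such as SHA-1 rather than the toy hash below.

```javascript
// Tiny stand-in for a real content hash (a real system would use SHA-1
// or SHA-256). djb2-style string hash, illustrative only.
function hashOf(str) {
  let h = 5381;
  for (let i = 0; i < str.length; i++) h = ((h * 33) ^ str.charCodeAt(i)) >>> 0;
  return h.toString(16);
}

// Git-like store for page components: each commit records the page's
// component list and its parent; branches are named pointers.
class PageStore {
  constructor() {
    this.objects = new Map();  // hash -> { components, parent }
    this.branches = new Map(); // branch name -> hash
  }
  commit(branch, components) {
    const parent = this.branches.get(branch) || null;
    const hash = hashOf(JSON.stringify({ components, parent }));
    this.objects.set(hash, { components, parent });
    this.branches.set(branch, hash);
    return hash;
  }
  head(branch) {
    const hash = this.branches.get(branch);
    return hash ? this.objects.get(hash).components : null;
  }
  fork(from, to) { // branching is just copying a pointer
    this.branches.set(to, this.branches.get(from));
  }
}
```

Because commits share their unchanged components by reference, many variations of one page stay cheap to store, which is the analogy to Git branches drawn above.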

I am beginning to program Zen to enable this writable web. I plan eventually to create a web site offering registrations for people to create their own portals to the web. These will be real portals: most of the media content in the web pages will be hot-linked, not copied. For web sites that disallow hot-linking, the special web proxy will be used to work around the limitation by imitating web browsers that such web sites expect to deal with. I am hoping that many people will be interested in the technology, and I am seeking collaborations from companies, investors, technology marketing experts, and programmers.
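
The hot-linking workaround mentioned above amounts to the proxy presenting request headers the origin site expects. The sketch below shows only that header-building step; the function name and User-Agent string are assumptions, and any real deployment should respect the target sites' terms of service.

```javascript
// Build the outgoing headers a hot-link-friendly proxy might send when
// fetching a resource for inclusion in a user's page. Illustrative only.
function proxyHeaders(targetUrl) {
  const origin = new URL(targetUrl).origin;
  return {
    // Pretend the request came from the target's own site, since many
    // sites block hot-linking by checking the Referer header.
    "Referer": origin + "/",
    // A browser-like User-Agent (an assumed example value).
    "User-Agent": "Mozilla/5.0 (compatible; ZenProxy/0.1)",
  };
}
```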

2 - Zen: First Principles

This is the second in a series of articles on Zen, which began with the article “Zen Project #1: A Web Site and a New Technology to Help People Scale the Web”. The article was first posted on tomelam.blogspot.com around 2010 and has been edited.

Zen has ambitious goals. The key to its success will lie in its remaining true to its unique set of first principles:

  1. Zen will be global, i.e., it will exist on the Web.

  2. Zen will be a zero-install program, i.e., it will provide its basic features without browser plug-ins or extensions.

  3. Zen will provide graphical user interfaces for adding graphical widgets and HTML elements to a Zen web page. Of course, if too many graphical widget frameworks—or the wrong combination of frameworks—are used in a single web page, trouble ensues. Zen will not in general impose any limitations upon the way the widgets and frameworks can be combined, so there should be a very safe and easy-to-use version of Zen for general use by all users.

  4. (This principle is the most ambitious, and might never be realized, but please read to the end of its description.) Zen will “engulf the Web”: it will provide a user-programmable widget that embeds a “web browser” in a Zen web page. Again, even non-technical users will be able to “program” this web browser to do such things as make any web page dissectible, editable, rearrangeable, and mashable. It will be possible to embed multiple copies of the “browser widget” in a web page. The user will be able to use Zen to copy or move widgets and components from the embedded web page into the rest of the Zen web page. This principle of “engulfing the Web” is very difficult to remain true to because it implies that web pages will be analyzed by a special web server with proxy capabilities. The special web server will also allow any page to be hot-linked. [March 15, 2012: As of now, clicking on the “any page to be hot-linked” link only leads to the top of the blog post. Please skip down to the last paragraph of the post. Something about Blogger.com seems to have prevented a link target inside a post from being accessed.] The special web server will also emulate referral links, thereby allowing a web document that can only be accessed through a referral link (not directly via URL) to be accessed from any Zen-enabled web page. Although it would be difficult to create an application true to this principle (the author came close to cloning web pages, but the code was very slow), a web browser extension like iFrame Allow can allow a similar sort of encapsulation. Such extensions should be used very carefully because they can lead to bleeding of user data from one website to another.

  5. Zen provides many functions like filter and sort that even non-technical users can apply to data sources just by dragging widgets around in the web page. It will be possible to chain many of the functions together to provide more possibilities for data manipulation.
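
The chaining in principle 5 can be sketched as a small pipeline in which each operation returns a new pipeline, so steps compose in any order, much as widgets would be dragged into sequence. The class and method names are illustrative assumptions, not Zen's actual API.

```javascript
// Chainable data operations: each step returns a fresh Pipeline, so
// filter, sort, and projection compose freely. Illustrative only.
class Pipeline {
  constructor(items) { this.items = items; }
  filter(pred) { return new Pipeline(this.items.filter(pred)); }
  sortBy(key) {
    return new Pipeline([...this.items].sort((a, b) =>
      a[key] < b[key] ? -1 : a[key] > b[key] ? 1 : 0));
  }
  pluck(key) { return new Pipeline(this.items.map(x => x[key])); }
  value() { return this.items; }
}

// Example: keep high-scoring hits, order them by title, take the titles.
const hits = [
  { title: "B", score: 2 },
  { title: "A", score: 9 },
  { title: "C", score: 5 },
];
const titles = new Pipeline(hits)
  .filter(h => h.score > 3)
  .sortBy("title")
  .pluck("title")
  .value(); // → ["A", "C"]
```

Because every step is a pure transformation on the previous result, a visual interface could represent each one as a widget and the chain as their left-to-right arrangement.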

  6. A user will be able to capture a sequence of interactions with his Zen web page, such as moving the mouse pointer over a particular kind of widget (the technicalities of what “kind” means here are not very important in this discussion), clicking on a widget or web page component, typing a single character on the keyboard, entering a string of characters in a box, etc. The range of interaction events that can be captured extends from the lowest-level interactions with a web page (characters and mouse interactions, including clicks on submit buttons) through any kind of widget interaction. The user can generalize the captured sequence (“parameterize” it) so that it can be applied to a class of situations. He can store it and attach it to a menu or button so that it can be invoked by choosing the menu item or clicking the button. Thereafter, when the user of the page containing the parameterized sequence interacts with it, his interactions are circumscribed: out-of-sequence input is optionally treated as an error, and he can be gently guided to follow the sequence. The capabilities of this Zen sequence capture are somewhat elaborated from the basics described here.
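
Capture and parameterization as described in principle 6 can be sketched with events as plain data: a recorded literal value is replaced by a named slot, and replay fills the slot from arguments. All names here are invented for illustration; a real implementation would record genuine DOM events.

```javascript
// Replace a recorded literal value with a named slot, turning a
// captured sequence into a reusable, parameterized macro.
function parameterize(sequence, literal, slotName) {
  return sequence.map(ev =>
    ev.value === literal ? { ...ev, value: { slot: slotName } } : ev);
}

// Replay a (possibly parameterized) sequence, resolving slots from
// args and handing each concrete event to a perform callback.
function replay(sequence, args, perform) {
  for (const ev of sequence) {
    const value = ev.value && ev.value.slot ? args[ev.value.slot] : ev.value;
    perform({ ...ev, value });
  }
}

// Example: a captured search interaction, generalized over the query.
const captured = [
  { type: "click", target: "#search-box", value: null },
  { type: "type", target: "#search-box", value: "escher" },
  { type: "click", target: "#submit", value: null },
];
const macro = parameterize(captured, "escher", "query");
```

Replaying `macro` with `{ query: "hypertext" }` performs the same three steps with the new query, which is the "class of situations" generalization described above.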

  7. A user will be able to create persistent copies of his Zen web pages. These copies will be saved on the Zen web server.

  8. Zen will facilitate a user creating web pages and web applications that do not include the Zen library. After Zen removes itself from these pages and applications, they will depend only upon HTML, one or more already-well-accepted JavaScript libraries, and a minimum of JavaScript “glue.” In other words, Zen will export such pages and applications. Some part of the Zen library will be required to enable all features of captured sequences, however.

  9. Zen can be augmented with features that break its first principles, but the result will not be Zen.

  10. As an exception to First Principle #1, Zen can be injected into any web page, via web browser add-ons or extensions, to allow the page to be redesigned and mashed up with other content and to gain Zen features. However, only content managed by cooperative JavaScript libraries will be fully compatible with Zen. A cooperative library must facilitate the tracking of every widget and every “DOM element” that the library adds to or subtracts from the web page. (DOM elements are the “atoms” or structural components of a web page. Tables, divisions of a page, paragraphs, lists, and list items, among other things, are represented by DOM elements.) Furthermore, a cooperative library must facilitate the tracking of actions that it can perform in response to the user’s interaction with the web page or in response to data received from the web server. (Such prepared actions are called “event handlers.” They respond to interaction such as mouse clicking and key pressing.) The word facilitate is a bit ambiguous: it can mean “make it possible” or “make it easy.” So far as Zen is concerned, a cooperative library should only have to make possible the things just mentioned, but in its first iterations, Zen might only be fully compatible with libraries that make those things easy.
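
The "cooperative library" contract in principle 10 can be sketched as a registry that a library calls whenever it adds or removes an element or attaches an event handler. The class and method names are assumptions made for this sketch, not a defined Zen API.

```javascript
// Registry a cooperative library reports into, so Zen can track every
// element and event handler the library manages. Illustrative only.
class ZenRegistry {
  constructor() {
    this.elements = new Set(); // ids of tracked DOM elements
    this.handlers = [];        // attached event handlers, as data
  }
  elementAdded(id) { this.elements.add(id); }
  elementRemoved(id) { this.elements.delete(id); }
  handlerAttached(id, eventType) {
    this.handlers.push({ id, eventType });
  }
}
// In a browser, a MutationObserver could feed elementAdded/elementRemoved
// automatically even for uncooperative libraries, but handler tracking
// still needs the library's explicit help.
```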

I might add more first principles to this list later.

I am still seeking collaborations from companies, investors, technology marketing experts, and programmers.