Oleg Andreev

Month
Filter by post type
All posts

Text
Photo
Quote
Link
Chat
Audio
Video
Ask

October 2010

Splay tree

“All normal operations on a binary search tree are combined with one basic operation, called splaying. Splaying the tree for a certain element rearranges the tree so that the element is placed at the root of the tree.

A top-down algorithm can combine the search and the tree reorganization into a single phase.”

http://en.wikipedia.org/wiki/Splay_tree

The splay tree modifies itself every time it is searched becoming more and more efficiently organized over time.

Oct 31, 20103 notes
A copy-pastable code

Everyday as a software developer you have to invent some abstractions. Simply speaking, you have to decide where to put the new code. After you decide this, you write more code and repeat the process. Sometimes the earlier decisions need to be changed and you refactor the existing code. Now you decide where to put the old code.

I really need a hint. The OOP folks teach us to model the real world. Just look at the problem domain, they say, and you will see where the things belong. It works great until you hit some system-specific pure abstractions and there is no natural metaphor to help you.

Try another approach. Since the initial question is where to put the code, and the refactoring is about moving the code around, why not to make the code itself easily movable? How about making the code copy-paste friendly?

The first idea which comes to your mind is to wrap it in the object. Yes, it might solve the problem. But at what cost? Creating an object means defining the interface (class, protocol, whatever) which creates another entity in the program and eats a part of your brain. Not always a good idea when you are already stuck finding out where’s the best place for just ten lines of code.

When you are trying to solve a problem, do not hurry creating another one. Relax, put the code somewhere where it is easier to move from and make it depend on the nearby code as little as possible. Usually you do so by putting the dependent data in some local variables. You can later transform them into function arguments or object properties.

When you make the code movable, you can (sic!) move it around and isolate more and more over time. Maybe 5 minutes later you will discover you don’t need it at all. Or that it should be simplified and moved in a function. Or that it should have more functionality and become an object. Or that it should be split in two different tasks. All of these questions become much easier to answer when you keep the code simple, stupid, light and isolated just enough. Just enough to copy and paste it.

Oct 30, 20105 notes
A history of concurrent software design

Early approaches to concurrency

When machines were big and slow, there was no concurrency in software. Machines got faster and people figured out how to make multiple processes running together. Concurrent processes proved being extremely useful and the idea was brought further to the per-process threads. Concurrency was useful because it powered graphical interactive applications and networking systems. And those were becoming more and more popular and more advanced.

For some tasks concurrent processes and threads presented very difficult challenges. The threads participate in a preemptive multitasking, that is the system where the threads are forced-switched by the kernel every N milliseconds. At the same time, the threads has a shared access to the files, system interfaces and in-process memory. The threads do not know when they are about to be switched by the system, which makes it difficult to safely retain and release control over the shared resources. As a partial solution, different sorts of locks where invented to make multi-threaded programs safe, but those didn’t make the work any easier.

A typical code in a multi-threaded environment:

prepareData();
lock(firstResource);
startFirstOperation();
unlock(firstResource);
prepareMoreData();
lock(secondResource);
startSecondOperation();
unlock(secondResource);
finish();


Modern concurrency

Next approach to concurrency was based on a realization that the problem of shared resources lays in the very definition of the “shared”. What if you create a resource with a strictly ordered access to it? Sounds counter-intuitive: how can this be concurrent? Turns out, if you design the interface like a message box (that is only one process reads it and nobody blocks waiting for a response), you may build many of such resources and they will work concurrently and safely. This idea was implemented in many sorts of interfaces: unix sockets, higher-level message queues and application event loops. Finally, it found its way into the programming languages.

Probably, the most wide-spread programming language today, JavaScript, features function objects that capture the execution state for later execution. This greatly simplifies writing highly concurrent networking programs. In fact, a typical JavaScript program runs on a single thread, and yet it can control many concurrent processes.

Mac OS X 10.6 (Snow Leopard) features built-in global thread management mechanism and language-level blocks making writing concurrent programs as easily as in JavaScript, but taking advantage of any amount of available processing cores and threads. It is called Grand Central Dispatch (GCD) and what it does is perfectly described by a “message box” metaphor. For every shared resource you wish to access in concurrent and non-blocking way, you assign a single queue. You access a resource in a block which sits in the queue. When the block is executed, it will have an exclusive access to the resource without blocking anybody else. To access another resource with the results of the execution, you will have to post another block to another queue. The same design is possible without blocks (or “closures”), but it turned to be more tedious and limiting and resulting in less concurrent, slower or unstable programs.

The modern concurrent code looks like that:

prepareData();
startFirstOperation(^{
  prepareMoreData();
  startSecondOperation(^{
    finish();
  })
})

Every call with a block starts some task in the other thread or at a later time. The block-based API has two major benefits: the block has access to the lexically local data and executes in a proper thread. That is it eliminates the need for explicit locks or moving and storing the local data explicitly just for making it available in a proper thread.

Think of it this way: every block of code inside the curly brackets is executed in parallel with the code it was created in.


Future directions

The upcoming generation of software already is or will be written this way. But block-based approach still isn’t perfect. You have to manage queues and blocks explicitly. Some experimental languages and systems already have a transparent support for “continuations”: that is the code looks linear, in a blocking fashion, but the process jumps between different contexts and never blocks any threads:

prepareData();
startFirstOperation();
prepareMoreData();
startSecondOperation();
finish();

This is much more natural and looks like a naïve approach which we started with and fixed with the locks. However, to make it work concurrently we have to learn GCD and take it to the next level.

When you start some operation which operates on a different resource and can take some time, instead of wrapping the rest of your code within a block, you put the current procedure in a paused state and let the other procedure to resume it later.

Imagine that instead of the discrete blocks of code, the kernel manages continuously executed routines. These routines look very much like threads with an important exception: each routine gives up the execution voluntary. This is called cooperative multitasking and such routines are called coroutines (рус. сопрограммы). Still, though, each routine can be assigned to a thread just like a block or be rescheduled from one thread to another on demand. So we retain the advantage of the multi-processing systems.

Example: you have a web application which does many operations with shared resources: reads/writes to a database, communicates with another application over the network, read/writes to the disk and finally streams some data to the client. All the operations should usually be ordered for each request, but you don’t want to make thread wait each time you have some relatively long-running operation. Also, it is not efficient to run multiple preemptive threads: there is a cost of switching the threads and you get all sorts of troubles with random race conditions. GCD and blocks help for the most part, but if you use them to make every single operation on a shared resource, you will get an enormously deep nested code. Remember: even writing to a log means accessing a shared file system which better be asynchronous.


15 years later

Today, a lot of trivial operations like writing to a disk or accessing a local database do not deserve asynchronous interfaces. They seem fast enough and you still can drop more threads or CPU to make some things faster. However, the coroutines will make even these trivial tasks asynchronous and sometimes a little bit faster. So why is that important anyway?

The coroutines are important because every shared resource will get its independent, isolated coroutine. That means, every resource will have not only private data and private functionality, but also a private right for execution. The whole resource will be encapsulated as good as any networking server. The file system, every file, every socket, external device, every process and every component of an application will have a coroutine and a complete control on when to execute and not execute. This will mean that there is no need for a shared memory and a central processor. The whole RAM+CPU tandem can be replaced with a GPU-like system with hundreds of tiny processors with private memory banks. The memory access will become much faster and the kernel will not need to waste energy switching threads and processes.

A single design change which makes programming easier will make a shift to much-much more efficient architecture possible. It won’t be just faster, it will be efficient: while the servers could be 100 times more productive, the personal devices could be 10 times faster while consuming 10 times less energy.


30 years later

By the time operating systems will support coroutines and a truly multi-processor architecture, new applications will emerge with capabilities we can only dream about. Things like data mining, massive graphics processing and machine learning work mostly in the huge data centers. Twenty years later this will be ubiquitous just like a 3D games on the phones today. These task will require more memory space. Finally, the common storage memory will be merged with RAM and processor and processing of huge amounts of data will become much more efficient.

Given such a great advance in technology, humanity will define its unpredictably unique way to educate and entertain itself. As we get closer to that time, it will become more clear what is going to be next.

Oct 29, 2010
Two kinds of charts

There are two very different kinds of information visualizations. And I don’t have pies and bars in mind.

The first kind is for presenting the knowledge. You have already discovered some interesting facts and now need to present them clearly to the reader. Your task is to design a most appropriate form for the data, that the knowledge will become obvious. (Of course, you should not be dishonest and lie to a reader by making perspective distortions or shifting the zero point.) Sometimes you may even drop the figures and just show the data geometrically. The charts should have little noise: no lines, no ticks, no labels. Curves can be smooth and some nice 3D effects can be applied. Present as little data as necessary. Prefer geometric objects to tables.

The second kind is for discovering the knowledge. You have raw data and no particular idea about what could be so interesting about it. In such case you need a format which lets you discover the knowledge. Comparing to the first kind of visualization, here you might want to have more additional information, most probably an interactive table or graph to play with. Add some additional hints: the mesh, absolute and relative values, confidence intervals etc. Of course, this form should be much more honest and complete than the presentation of a first kind. No special effects, no smooth curves. Prefer tables and simple charts to fancy geometric objects.

When presenting a data, first thing to do is to decide what kind of problem do you solve. If you present a raw data, make it easy to work with it and find the knowledge. If you have already found a knowledge, present it in the most clear form.

Oct 28, 2010
Printed book idioms to be avoided on screen

1. Breaking an article into multiple pages.

Page is a physical limitation of a paper medium. Sometimes the text does not fit and you have to drop the last paragraph on another page. On the screen you have plenty of vertical space and there is no excuse to cut the reader’s context.


2. A lot of iPad newspaper apps simulate multi-column layout. They shouldn’t.

The purpose of a multi-column layout is to make articles’ layout more flexible on a big newspaper page. On a wide page you can fit a couple of articles and an ad. But the screen is not that wide.

Narrow columns also require small font size, which is a problem on a display of a resolution under 300 dpi.

Narrow columns require manually-tuned hyphenation and sometimes font width-adjustment. It is a requirement for the books as well, but a more narrow column looks even worse. Unfortunately, it is not the case for the digital media today.

If the column does not fit the screen, you constantly have to scroll down and up when reading a page: down when finishing the first column and up to proceed to the next one.

You can scroll and zoom the page on screen. If you make a single scrollable and zoomable column, you don’t need to provide font size control or worry about how much of content is visible. The reader can choose the more comfortable size of the page for herself.


3. A lot of people use footnotes on the web. This is horrendous: you have to leave the current line and scroll down. And even if you scroll by clicking a footnote number, you then have to scroll back. And even if you have a link from the footnote back (like in a wikipedia article), browser doesn’t scroll exactly to the position at which you were before.

On the screen you have a plenty of vertical space. And if you don’t use multiple columns (which you should not), you have some space on the side. That means, you may put some notes in the block of smaller font right below the paragraph, or on the side.

Summary

Do not break articles in pages. Do not break text in the columns. Make text column scrollable and zoomable. Make the footnotes immediately under the paragraph, or put them on the side.

Oct 25, 20103 notes
Mac OS X Lion predictions

There are some predictions or wishlists floating in the tubes regarding an anticipated update to Mac OS X. Some of them are more probable, some less and some are just plain crazy. Let me give you my predictions and some commentary.

1. The next cat name is likely to be “Lion”. This is based entirely on a single picture from the invitation picture and also is the least interesting prediction. I don’t think it is going to be the “last” release in any sense.

2. The merge with iOS. First, Mac OS X already has some UI features borrowed from iOS: navigation buttons in Dock stacks, iPhoto and iTunes. There will be more of them. Maybe scrollview will be updated with more flat scrollbars, maybe some bouncing will appear (and if so, it will be off by default for the existing software).
No way there will be a touch-controllable UI for the existing applications. The apps are not designed at all for the multi-touch and the size of the finger. Even if Windows 7 supports this, there’s no reason for Apple to follow the same path. However, taking in account the dual-mode touch screen patent, it seems more probable that Mac OS X might be transformed into iOS device on demand. But Apple does not favor dual-mode UIs: this just creates confusion for users and developers. The Front Row is a rare example of a second UI mode (transforming Mac into a focused media player). But the iOS is considered more or less a full-feature environment with far reacher user interface then the Front Row and at least as rich as Mac OS X. It is very unlikely that the iMac or MacBook will have two personalities which complete with each other and cooperate badly producing a huge confusion.

So believing in a strong movement towards touch UI everywhere, we may expect not a dual-mode, but per-window fusion of the iOS apps into Mac OS X. This has it’s issues also: still the file sharing is not as smooth as what we expect on a desktop OS, the iPad screen in portrait mode does not fit in the MacBook screen. And again, if you can touch and drag the iOS window, why not to touch and drag other windows? And if you can touch and drag all the windows, why not touch all the buttons? And the screen should be oriented horizontally just as keyboard or trackpad today. This is not easy to solve.

So the UIKit multi-touch will eventually show up in some version of Mac OS X, but it is not as easy as some may believe. The less improbable prediction: the Mac OS X will have a very conservative, slow introduction of touchscreen with emphasized limitations to minimize confusion as much as possible.

3. AppStore for Mac OS X. This is a really good idea in pure form, but once again has some conceptual difficulties. Apple will not lock the Mac OS X as they did with iOS, so it will compete with other distribution channels and may be forced to lower their 30% cut. At the very same time they would have to retain approval process to filter out crappy software they will sell. Developers who are not happy with the commission and the approval process, will go distribute the apps on their own. But this is very hard to debate because there’s still no third party app store for Mac, so the place seems vacant. Or it is vacant because no one could build a viable store business yet. Anyway, the Apple is the most likely company to succeed at this, and if executed perfectly, it will attract a lot of developers and make themselves and Apple much more money, and drag the Mac even further in the market share race.

4. Resolution independence (making UI 1.25-1.5 times bigger). Mac OS X team works on resolution independence for more that 4 years already. And still, on Snow Leopard the implementation is buggy and far from being anywhere close to “beta” status. The conceptual problem here is that this technology is aimed at scales of 1.25 and 1.5, not 2.0 like on iPhone. And this is not as simple as multiplying everything by two. I guess the displays with 2x higher resolution (for MacBooks at least) will become affordable before the 1.25 scale will be fixed for all the shipping apps.
Oh, do not forget that the “retina display” approach does not make things bigger for people with poor vision, it makes them sharper. The sharper text somewhat easier to see, but not as easy as 1.5x bigger one. Apple may realize that system-wide smooth resolution scaling is not worth tinkering with and full screen zoom is just enough for solving vision problems. My bet is on retina displays and old resolution independence framework being put on the shelf.

5. Finder improvements. Some folks dream about tabbed Finder. The problem is that file system is hard enough already. Adding tabs just complicates the look of the Finder and makes file system even scarier. Even if the tabs find their way into Finder, they will be disabled by default. Just like the tabs in the earlier versions of Safari were disabled.

What would be really cool is a merge of the Dock Stacks with the Quick Look and a merge of Quick Look with other apps. This is a pure speculation. Have you noticed how easy it is to jump through the folders in the Dock Stack? Buttons are big and once you find the file you want, the window disappears automatically. The Quick Look also disappears easily. Finder on the other hand, creates clutter: you have too many individual Finder windows all over the desktop. The tabs do not remove the clutter, they just organize it. Maybe what we need is not organizing it manually, but having something like a “recent folders” list and jumping through them using Quick Look.

How many times you’ve started a movie in Quick Look and played it way too long to forget that it is not a stand-alone player? And then you do something with Finder and the movie disappears! Take a look at the iCal: if you open an event, the popover window will appear with details. This window behaves much like the Quick Look: do something else and it disappears. But if you move is a little bit, it will transform into a stand-alone panel which will stick on the screen until you close it. The same idea can apply to Quick Look. It will be super-useful to transform a folder preview into a Finder window, a movie preview into a Quick Time window etc.


6. iChat with FaceTime, iCal like in iPad, iLife, iWork updates: this all is possible. The question is the timing: maybe not all of that will be tomorrow, but only some. I don’t expect super-cool features here, but more like an evolution and improvements.


7. Macs won’t support blu-ray drives. I haven’t heard about blu-ray from any of people I know. Those who really need it may buy an external drive.


8. There won’t be NTFS mounts or built-in VM for Windows. Not because there is a fight with Microsoft. Apple simply doesn’t have time for the features most people don’t need. BootCamp was an important thing in 2006 to bring more customers. Nowadays Apple does not mention “switching” anymore. There is already a plenty of ways to communicate with Windows, both built-in and supplied by third parties.


9. Mac OS X distributed as a free software update. Recently Apple has lobbied an accounting rules change to be able to distribute free updates of iOS for non-subsidized products like iPod touch and iPad. This makes the platform more vibrant and much more devices stay up-to-date. Making Mac OS X update free, Apple can accelerate adoption of their technologies and bring better and more exciting applications to the Mac.

Edit: forgot to add that a lot of goodies from UIKit, MapKit, EventKit etc. might well be ported to the Mac APIs. The NSTableView might learn about recyclable views from UITableView.

Oct 19, 20102 notes
#predictions
Next page →
20152016
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
201420152016
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
201320142015
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
201220132014
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
201120122013
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
201020112012
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
200920102011
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
200820092010
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
200720082009
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
200620072008
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
200520062007
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
200420052006
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
200320042005
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
200220032004
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
200120022003
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
200020012002
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
199920002001
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
199819992000
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
199719981999
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
199619971998
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
199519961997
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
199419951996
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
199319941995
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
199219931994
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
199119921993
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
199019911992
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
198919901991
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
198819891990
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
198719881989
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
198619871988
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December
19861987
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November
  • December