“That is what happens when a conventional generational garbage collector is used for a program whose object lifetimes resemble the radioactive decay model. The collector simply assumes that young objects have a shorter life expectancy than old objects, and concentrates its effort on collecting the generations that contain the most recently allocated objects. In the radioactive decay model, these are the objects that have had the least amount of time in which to decay, so the generations in which they reside contain an unusually low percentage of garbage. For the radioactive decay model, therefore, a conventional generational collector will perform worse than a similar non-generational collector.”
Paper by William D. Clinger and Lars T. Hansen (PostScript)
The major feature of a dynamic language is interactivity. With Smalltalk you may run the program and inspect/change it at runtime. This implies some GUI for VM with built-in crappy text editor: you don’t edit files, you edit objects now.
This does not sound very comfortable for many reasons. First, you would always want to have a “canonical” state of your application which is not affected by runtime mutations: that is, plain text files stored under some version control. Next, you would like to use a different text editor or GUI and it is much simpler to achieve when you operate with plain files instead of fancy VM/language-specific API.
How do we combine the interactivity of Smalltalk with text file editing? Let’s take the purest OO language ever designed: Io.
1. Each file contains an expression.
2. The only way to load the file is to evaluate it in context of some object: object doFile("file.io"). The return value would be a result of the expression in the file.
3. We may have a convention that some files return a prototype object: the object which is used as a prototype for other objects created in runtime.
4. To load “prototype object” we use a special loader object which would track the file-to-object mapping: Article := Loader load("article.io")
5. Loader monitors the filesystem and when some file is changed, it loads it into another object and replaces the prototype with that object: Article become(load("article.io"))
6. At that point, all articles in the system suddenly have another version of Article proto.
You have to follow some safety rules. For instance, proto’s descendants should not modify the proto and rely on such modifications.
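The same reloading scheme can be sketched outside Io. Below is a minimal Ruby analogue, under stated assumptions: Ruby has no become, so the sketch hides each loaded object behind a small forwarding proxy and swaps the proxy's target. The names ProtoProxy, Loader, __swap__ and refresh are hypothetical, and refresh has to be called by hand (or from a file watcher) instead of monitoring the filesystem itself.

```ruby
# Forwards every message to the current version of the loaded object.
class ProtoProxy < BasicObject
  def initialize(target)
    @target = target
  end

  # Replace the object behind the proxy (a stand-in for Io's `become`).
  def __swap__(target)
    @target = target
  end

  def method_missing(name, *args, &block)
    @target.__send__(name, *args, &block)
  end
end

# Tracks the file-to-object mapping and re-evaluates changed files.
class Loader
  def initialize
    @entries = {} # path => { proxy:, mtime: }
  end

  # Evaluates the file once and returns a stable proxy for its result.
  def load(path)
    entry = @entries[path]
    return entry[:proxy] if entry

    proxy = ProtoProxy.new(evaluate(path))
    @entries[path] = { proxy: proxy, mtime: File.mtime(path) }
    proxy
  end

  # Re-evaluates every file whose modification time has changed.
  def refresh
    @entries.each do |path, entry|
      mtime = File.mtime(path)
      next if mtime == entry[:mtime]

      entry[:proxy].__swap__(evaluate(path))
      entry[:mtime] = mtime
    end
  end

  private

  # Each file contains a single expression; its value is the loaded object.
  def evaluate(path)
    eval(File.read(path), TOPLEVEL_BINDING, path)
  end
end
```

Unlike Io, where existing articles immediately see the new proto, a Ruby object created before the reload keeps the class it was built from; only code that goes through the proxy picks up the new version.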
Of course, this method still does not allow you to change/inspect any object in the system. For this to work you may put a breakpoint message somewhere and use a debugger after the proto is reloaded and VM stepped on that message. Or wire some Smalltalk-like GUI to your app.
Simple proto-based reloading helps development a lot and, in contrast to class-loading methods with a full app reload, works faster and covers the full range of source code, including all libraries. The Rails dependency system does not reload gems, but does a pretty good job with constant reloading. All the usual Ruby/Rails issues with global data still apply.
Now, working with Objective-C where nil eats messages, I realized that the code is more elegant, but it takes a significant amount of time to debug it. You create if/else branches and breakpoints to trace the nil, then you fix the bug which causes it and erase the conditional code. You get your elegant code back and wait for another issue to arise later.
“Essentially, Haas Unica came about as a result of analysing the original version of Helvetica, its variants (as they were in 1980) and similar faces and seeking to improve them - to produce the ultimate archetypal sans serif face. A single face to unite them all, if you like. ”
See also: From Helvetica to Haas Unica (flickr set)
The paper discusses how thread-oriented programming is more efficient (in terms of performance and development cost) than event-oriented.
My personal observation is that cooperative multitasking (based on coroutines or fibers) requires less code, and code that is easier to read, compared to evented callback-based code.
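A minimal sketch of what this looks like with Ruby fibers, assuming a trivial round-robin scheduler (the method name run_round_robin is made up for illustration). Each task is written as a plain loop and yields where a callback-based version would have to return and register a continuation.

```ruby
# Runs one cooperative task per name; each task logs `steps` entries and
# hands control back to the scheduler after every entry.
def run_round_robin(names, steps)
  log = []
  tasks = names.map do |name|
    Fiber.new do
      steps.times do |i|
        log << "#{name}#{i}"
        Fiber.yield # cooperative switch point
      end
      :done # value of the final resume
    end
  end

  # Resume every task in turn; drop the ones that have finished.
  tasks.reject! { |task| task.resume == :done } until tasks.empty?
  log
end

p run_round_robin(%w[a b], 2) # ["a0", "b0", "a1", "b1"]
```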
The Objective C syntax is poisoned with nested square brackets:
[[[Class alloc] initWithApple:a andOrange:o] autorelease];
First, let’s move the opening bracket after the name of the receiver:
Class[alloc][initWithApple:a andOrange:o][autorelease];
You may agree that this is much easier to write now. However, at this point we lose compatibility with ANSI C (think buffer[index]).
Let’s omit brackets for messages without arguments and use a space as a delimiter:
Class alloc [initWithApple:a andOrange:o] autorelease;
At this point we may get back compatibility with ANSI C by making the grammar non-context-free (the parser should recognize that a[b:c] cannot be an index operation).
You can implement exactly that syntax in Io using the standard language features.
Stylesheet and javascript URLs and content should be controlled by application code. Putting static files into public folder is so nineties.
Before starting work on a distinct feature, you create a branch:
$ git checkout -b myfeature
You write code, create fast commits, merge in master, rewrite code etc.
$ git checkout master
$ git merge myfeature --squash
Now you have merged all the changes into the working tree, but have not committed them to the master branch (because of the --squash option).
You may git add some files to produce nice commits as described in the previous article.
These rules are designed for an easy code review using “git log -p”. This command shows the history of commits with patches.
1. Commit message should include task reference number (# of ticket/case in bug tracker, url of wiki etc.). If there’s no reference number, then the ticket must be really trivial or include refactoring only.
2. Commit represents an atomic working patch. No “WIP” commits with undefined behavior are allowed. In your private branches you can do whatever you want, but when merging to master, you must aggregate commits in a set of working patches. If you don’t do that, the single feature would be spread among 30 commits with arbitrary code being written and erased between the start and the end.
3. Commit should be small. You should split a big commit into a few independent ones. Safer commits should come first. Good example: you have fixed some performance issue. First, commit a benchmark which shows the previous performance, then commit the updated code. This helps to test the previous code using the newer benchmark without manipulating code by hand.
Rule 2 tells you not to pollute master branch with tons of WIP commits and rule 3 tells you to squash WIP commits wisely: do not put everything in a huge patch.
It is much easier to follow these rules when you look at what others do with the code using git log each time you pull updates.
“There are two basic type of method: ones that return an object other than self, and ones that cause an effect. […]
As a general philosophy, it’s better to try and make your methods one type or the other. Try to avoid methods that both do something and return a meaningful object. Sometimes it will be unavoidable, but less often than you might expect.”
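A small Ruby sketch of the split described in the quote, using a hypothetical Counter class: value only answers a question, while increment only causes an effect and returns self.

```ruby
class Counter
  # Query: returns an object other than self and changes nothing.
  attr_reader :value

  def initialize
    @value = 0
  end

  # Command: causes an effect and returns self, so calls can be chained.
  def increment
    @value += 1
    self
  end
end

counter = Counter.new
counter.increment.increment
puts counter.value # 2
```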
“But Apple require that this app be paid, not free, in order for us to offer In App Purchase. So lets look at that again, the same user downloads the app for $0.99 assuming it’s a one time payment, then launches the app to find that he only gets 30 days of service for the $0.99 he just paid. Furious he leave one star reviews all over the place even though we went to great lengths in the iTunes description to spell out the exact nature of the subscription and costs (but no one actually ever reads that stuff).”
While searching for “tell, don’t ask” I came across an interesting Wikipedia article.
Is there a CS book teaching us how to write big complex programs?
- components identified by URI (“RESTful partials”)
- precise invalidation on data update (no timeout-based silliness)
- easy to extend, test and debug
The biggest advantage of dynamic languages is interactivity. With dynamic language you can open any part of the running system, change something and see how it behaves under these particular conditions, immediately. This dramatically improves design cycles, completely eliminates compile lags and helps to debug efficiently.
Smalltalk/Self guys got it more than 30 years ago.
It is a pity to see how current Ruby/Python/JavaScript/etc. frameworks are less interactive than C++/Java within some modern IDE (like Visual Studio).
If the dynamic VM is a move forward, then the next step is highly interactive tools. Everything else is just the same old story.
See also real life benefits of dynamic languages at stackoverflow.
Based on hash table vs. message-receivers and activatable slot, not value.
1. Every slot is activated on direct access. Non-activatable slot access raises exception.
2. x := y creates getter method(getSlot("_x")) and setter method(v, setSlot("_x", v); v).
3. x = y is parsed as x=(y) (i.e. message x= with argument y).
4. No ::= operator.
5. Method definition: obj setSlot("add", method(x, self + x))
5.1. Method definition macro: obj def add(x, self + x) (could be implemented in Io itself)
Pros:
- cleaner setters and hooks for setters;
- smaller syntax;
- uniform message dispatch: each message is processed by a method;
- safety: no need to use getSlot("x") for method arguments when activatable value could be passed (relevant to any abstract algorithms).
Cons:
- performance hit since local variable access should perform double hash table lookup; this could be optimized by storing hidden variables (_x) in a plain array.
What do you think?
1. Use divs with float:left/position:absolute and negative margins ONLY for the global page layout.
2. For inner modular things like “thumbnail with centered image and centered caption” NEVER EVER use layout tricks mentioned above. Always make sure the module does not require specific outer tags and styles. This is generally possible using tables.
Reasoning: when the smart object is inserted into unprepared environment (div or table) it is nearly impossible to put it into correct position since it has lost its height which should stretch outer container.
This is an important addition to the previous article.
Yesterday I stated that every incremental development process suffers from increasing module coupling by definition. Smaller steps give you flexibility to turn around a current point in the development process, but not to jump out of it.
In the previous article I completely missed the first statement and started talking about “refactoring 2.0”. In fact, once you have reached the first N lines of code in your project you should start each new feature from scratch (literally: create a new folder, git init, etc.). This action could be considered a small jump out of the current environment towards the latest requirements.
When you start building something side by side with the existing environment, you are forced to define some minimal API for the existing code to communicate with the new feature. This could be an object-oriented API, a config file or a network protocol. Maybe you would need to refactor existing code in order to provide such an API. As a result you would produce two less coupled modules, which will give you more flexibility as the project gets bigger.
An observation: a smaller module is easier to fit into a reasonable API. Complexity grows exponentially with respect to code size.
2. Immutable state is something I don’t care for, so it worries me that referring to map, filter, etc as “functional programming” may give people the impression that they have to swallow this immutable state business in order to use these things.
The Danger of Equating Map and Filter with Functional Programming
Inspired by You Can’t Get There From Here c2.com article
Every incremental development process suffers from increasing module coupling by definition. Smaller steps give you flexibility to turn around a current point in a development process, but not to jump out of it. With incremental process you are reaching local optimum: the best solution for the problem you are not solving today. But this is not the real issue (at least, you can sell it to someone else). The issue is that you can’t move incrementally from the local optimum due to high coupling. The only way out is to take independent components which are suitable for the new task, jump out of the current point and set up new process based on these components. Efficiency of this jump is measured in total relevance of all these components.
In other words, we need some insurance that some critical amount of investment (1 month, $100K etc.) is not thrown away as a whole. To achieve this we should keep the work split into small distinct pieces, each of an acceptably low cost.
It is usually recommended to refactor the code in order to extract abstract entities and generalize their API. However, it looks like a stupid game in the same playground: a single project directory tree with 1000 files in it.
Let’s take a look at the search tree balancing principle: each node should have some optimal number of children. If it has too many children, we have to perform a linear search within the node. If it has too few, we have to perform a linear search through a linked list instead of a tree.
Our asset is the code. Working with the code efficiently requires keeping it in good shape. This could mean the following:
- N lines of code per method
- M public methods per class/module (+ M private)
- F modules/files per folder
- L levels of folders per library/dependency
- D libraries/dependency per product/another library.
Each figure is average. You can have 10*N lines method as long as there are ten N/10 line methods. The ultimate goal is to have maximum L*F*M*N lines of code per program (as well as M*N lines per class).
Figures could be something like that: N=7, M=7, F=17, L=3, D=7.
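As a quick worked example, the limits implied by these sample figures can be computed directly from the two formulas given above (M*N lines per class, L*F*M*N lines per program). The helper name code_budget is made up for illustration.

```ruby
# Applies the formulas from the text: M*N per class, L*F*M*N per program.
def code_budget(n:, m:, f:, l:)
  {
    lines_per_class: m * n,
    lines_per_program: l * f * m * n
  }
end

budget = code_budget(n: 7, m: 7, f: 17, l: 3)
puts budget[:lines_per_class]   # 49
puts budget[:lines_per_program] # 2499
```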
The idea is to limit the amount of code you work with. If you do so you would be pushed to extract least coupled parts out of the project, therefore making them more valuable individually and giving more focus to the essentials.
This implies a slightly different mindset compared to traditional refactoring. You do not look for a way to restructure the program just to make it cleaner: you look for a way to keep as little code as possible by extracting the least relevant code into separate external modules.
“Another example of the inefficiency of large organizations. Individuals have little to gain in successfull projects (the company won’t make them rich), but much to lose in unsuccessful ones (they could loose their job). So the rational decision is to avoid risk, as it is not balanced by return.”
See also Python paradox
“It looks like what they’re doing are stackless user level threads and this means they don’t play nice with C calls as they don’t allow calling into C and then back into the language (as they can’t save the C stack). This may not sound like a problem until one considers how almost all C library bindings involve callbacks (xml parsers, graphics, audio, media processing, networking, etc).”
Caution: the post is 3 years old.
The most interesting one is ooc: another attempt to add objects, inheritance and improve packaging in C. via Steve Dekorte
“An event to bring together bright folks working on unfinished or recently finished programming languages. Even relatively young languages like Scala would not qualify; this event is all about the sharpest part of the cutting edge.”
We assume you had Leopard with standard Ruby shipped with OS, tons of macports and rubygems already installed. Then you install Snow Leopard on top of it (not clean install).
The problem is that standard 10.6 dynamic libraries all went 64 bit and could not be linked with 32 bit code. This could be fixed by rebuilding/reinstalling all the macports and rubygems.
1. Install the latest Xcode shipped with Snow Leopard.
2. The “port” command will fail with a message about an incompatible Tcl architecture. The proper version of MacPorts is 1.8, which is not released yet. You can obtain it from SVN trunk and build it by hand (./configure && make && sudo make install). I have also added the trunk version of ports to sources.conf (see the link above for instructions) to be sure that I have the latest SL-compatible ports in the list. Maybe this was a wrong assumption, but it worked for me just fine.
3. Remove all ports:
$ sudo port -f uninstall installed
4. Update rubygems to 1.3.1 at least (please google for instructions).
5. Remove vendor gems (gem uninstall refuses to remove them and fails to do batch remove):
$ gem list | cut -d" " -f1 > installed_gems
$ sudo mv /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8 /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8.bak
$ sudo mkdir /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8
Note: the installed_gems file contains a list of all installed gems so that you can cat it and pipe it to xargs to install all the gems back.
6. Uninstall all gems
$ sudo gem list | cut -d" " -f1 | xargs sudo gem uninstall -aIx
7. Make the rubygems use 64 bit architecture.
$ cat installed_gems | xargs sudo env ARCHFLAGS="-Os -arch x86_64 -fno-common" gem install --no-ri --no-rdoc
Note: it is NOT i686 (as I thought it should be), it is x86_64 instead.
Here are convenient aliases for “sudo gem install” for both architectures:
alias sgi32="sudo env ARCHFLAGS=\"-Os -arch i386 -fno-common\" gem install --no-ri --no-rdoc"
alias sgi64="sudo env ARCHFLAGS=\"-Os -arch x86_64 -fno-common\" gem install --no-ri --no-rdoc"
You may also put alias sgi="sgi64" in your .bash_login on snow leopard.
Now we all can proceed doing productive work. \o/
References:
1. http://www.nabble.com/-MacPorts—19446:-openssl-fails-to-compile-on-x86_64-td23247203.html
2. http://jaredonline.posterous.com/got-mysql-to-work-with-rails-in-mac-os-106-sn
3. http://cho.hapgoods.com/wordpress/?p=158
“Rule 1. The sum of a signed value and an unsigned value of the same size is an unsigned value.”
“If developers are allowed to write "to the hardware," the result is a broken platform where the vendor can’t move forward without breaking the apps.”
Apple added closures support to C, C++ and Objective C with lightweight threading and multicore balancing.
x = ^{ printf("hello world\n"); }
Snow Leopard is a rare case where an upgrade brings valuable performance improvements and bug fixes rather than incredible new features.
“Writing a Mandelbrot set calculator in a high level language is like trying to run the Indy 500 in a bus. While it might be amusing for a car magazine to test circuit times for various busses on a race course, it really tells potential bus buyers very little about which bus they should buy as the problem of cost efficiently maximizing transportation throughput on normal roads involves different tradeoffs than the problem of no-expense-spared maximization of transportation speed on an extremely windy road.”
W3C box model (width specifies pure content area width) represents bottom-up philosophy where content specifies the area for itself. Parent elements should adapt to it.
IE6 box model (width specifies content area with borders and paddings) represents top-down philosophy where designer specifies available spaces for nested elements.
In fact, content designers don’t care about the container width and html designers think better in terms of IE6 box model. Thanks to stupid standards, we all have to make additional calculations in our heads when working with styleshits.
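The "additional calculations" are simple but easy to get wrong. A small Ruby sketch with made-up numbers (width 200px, padding 10px, border 1px on each side) shows the same declaration under both models; the Box struct and its method names are hypothetical.

```ruby
Box = Struct.new(:width, :padding, :border) do
  # W3C model: `width` is the content area, so padding and border
  # are added on both sides to get the space the element occupies.
  def w3c_outer_width
    width + 2 * (padding + border)
  end

  # IE6 model: `width` already includes padding and border,
  # so the content area is what remains after subtracting them.
  def ie6_content_width
    width - 2 * (padding + border)
  end
end

box = Box.new(200, 10, 1)
puts box.w3c_outer_width   # 222
puts box.ie6_content_width # 178
```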
According to previous note on interfaces in dynamically typed languages, it would be great if the API could specify type expectations even easier than a to_type message send to each argument in method body.
class Person
  def initialize(name.to_s, birthday.to_date = DEFAULT_DATE)
    ...
  end
end
Should be interpreted as:
class Person
  def initialize(*args)
    # birthday has a default, so one or two arguments are accepted
    (1..2).include?(args.size) or raise ArgumentError
    name = args[0].to_s
    birthday = (args[1] || DEFAULT_DATE).to_date
    ...
  end
end
This idea can be applied to other languages as well.
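For reference, here is a runnable version of the expanded form in today's Ruby, as a sketch only. The value of DEFAULT_DATE is an arbitrary placeholder, and the example relies on the standard date library, which defines to_date on Date and Time (but not on String).

```ruby
require "date"

class Person
  DEFAULT_DATE = Date.new(1970, 1, 1) # placeholder default, not from the post

  attr_reader :name, :birthday

  def initialize(*args)
    # One required argument (name) and one optional argument (birthday).
    (1..2).cover?(args.size) or
      raise ArgumentError, "expected 1..2 arguments, got #{args.size}"

    @name     = args[0].to_s                          # cast to String
    @birthday = (args[1] || DEFAULT_DATE).to_date     # cast to Date
  end
end

person = Person.new(:alice, Time.new(2000, 1, 2))
puts person.name     # alice
puts person.birthday # 2000-01-02
```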
Why’s that? Consider Dostoevskiy’s “Crime and Punishment”: it is just 1 Mbyte. And what size is your codebase?
Object interface is a set of messages with defined behavior the object should respond to.
In statically typed languages, interface is required by type declaration. In dynamically typed languages this is done by telling object of unknown type to cast itself to the desired type.
# User's code expects object responding to #to_page
# when cast to page, we expect proper #render and #size behavior
class SomeController
  def process(object)
    page = object.to_page
    page.render
    page.size
  end
end
# 1. Page class with #to_page interface returning self
class Page
  def to_page
    self # return self since it is already a page
  end
  # page public api
  def render
  end
  def size
  end
end
# 2. Non-page class with #to_page interface
class AtomicBomb
  def to_page
    # return a relevant article
    Page.wikipedia_article("Nuclear weapon")
  end
end
# 3. Class without #to_page, but with #render method
# fails with "object does not respond to #to_page" exception
# and does not cause undesirable side effects
class Foo
  def render
    # very specific nasty rendering method
  end
end
# 4. Class without #render
# fails with "object does not respond to #to_page" exception
# rather than with less descriptive "object does not respond to #render"
class Baz
end
This technique minimizes duck typing collisions by reducing the number of exposed methods to a single “to_{unique_type_name}” method. It also protects you from inventing obtrusive method names with type prefixes such as “page_render” or “page_size_in_characters” (see example above).
The rule of thumb:
1. When API consists of more than one method, introduce #to_my_type method
2. Whenever you receive an object from an unknown source (e.g. defined in a different file), use explicit type casting with #to_some_type method.
Note: never ever make others ask about the kind of an object using #respond_to?. This method should be used for legacy code and indicates possible duck typing issues.
Also, #is_a?(SomeAbstractInterfaceModule) is considered badly designed compared to #to_* methods.
class BlankSlate
  class <<self; alias __undef_method undef_method; end
  alias __instance_eval instance_eval
  ancestors.inject([]){|m,a| m + a.methods }.uniq.
    each { |m| (__undef_method(m) rescue nil) unless m =~ /^__/ }
end
class MyProxy < BlankSlate; end
Note 1: ancestors.inject{…} ensures that all Kernel methods like :p are properly removed.
Note 2: alias __instance_eval could be safely removed if you don’t need this method.
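On Ruby 1.9 and later, much of this can be had from the built-in BasicObject class, which defines only a handful of methods. A minimal sketch of a forwarding proxy built on it follows; the MyProxy body is an illustration, not a drop-in replacement for the BlankSlate above.

```ruby
class MyProxy < BasicObject
  def initialize(target)
    @target = target
  end

  # Nearly every message lands here because BasicObject defines so little.
  def method_missing(name, *args, &block)
    @target.__send__(name, *args, &block)
  end
end

proxy = MyProxy.new([3, 1, 2])
p proxy.sort  # [1, 2, 3]
p proxy.class # Array (forwarded: BasicObject has no #class)
```

Note that BasicObject still defines ==, !, !=, equal?, instance_eval, instance_exec, __send__ and __id__, so those are answered by the proxy itself and not forwarded.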
“Giving every piece of data a fixed identity, is radically different from the relational model which deals only with sets of values and leaves the notion of identity up to the application. Working with identities as a first-class notion is essential if schema is to be flexible. Long before we can agree on the exact shape of the data used to represent a person or a building, we can agree that individual people or buildings exist and that they have certain obvious attributes that we might want to record: height, address, builder, etc.”
JavaScript transactional memory extension
“The name, if you’re wondering, comes from the simplest sequence of operations which will thoroughly mix the bits of a value - "x *= m; x = rotate_left(x,r);" - multiply and rotate. Repeat that about 15 times using ‘good’ values of m and r, and x will end up pseudo-randomized”
Austin Appleby.
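The quoted sequence is easy to try out. Below is a sketch in Ruby on 32-bit values; the default m and r are arbitrary placeholders chosen for illustration (an odd multiplier and a rotation of 13 bits), not the 'good' values the quote refers to, and the function names are made up.

```ruby
MASK32 = 0xFFFFFFFF

# Rotates a 32-bit value left by r bits (0 < r < 32).
def rotate_left32(x, r)
  ((x << r) | (x >> (32 - r))) & MASK32
end

# Repeats "x *= m; x = rotate_left(x, r)" on a 32-bit value.
# m and r are placeholder constants, not tuned ones.
def mix32(x, m: 0x9E3779B1, r: 13, rounds: 15)
  rounds.times do
    x = (x * m) & MASK32 # multiply, keeping the low 32 bits
    x = rotate_left32(x, r)
  end
  x
end

puts mix32(1).to_s(16)
puts mix32(2).to_s(16)
```

Both steps are reversible when m is odd, so distinct inputs always give distinct outputs; zero, however, maps to zero.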