Automated Memoization!
June 9th, 2007
I saw an interesting post on something that allows you to annotate Java fields for caching. That’s shiny and all, but I like to live in the Ruby world (though annotations in Java are finding some very neat uses).
In case you came hunting for Part III of the Rails tutorial, don’t worry—it’s coming, but this caught my attention first.
Now, in the Java world, this is the pattern demonstrated:
public int getArea(){
if (area == null)
area = new Integer(getAreaCalculated());
return area;
}
In the Ruby world, we have a more idiomatic way of expressing it:
def area
@area ||= calculate_area
end
The ||= operator serves as a shortcut for:
@area = (@area || calculate_area)
It’s basically the self-referencing form of the || operator. It’s important to know how the || operator works when trying to figure out what’s going on. In Java, if I type "test" || "hello", I get a compilation error. Why? Because the || operator is only defined for booleans. We’re trying to use it for Strings above. In Ruby, however, this is not the case. In Ruby, much like in many C derivatives, the || operator evaluates to the first non-false value, from left to right. Remember, too, that Ruby considers nil and false equivalent. So what would happen if we wrote "test" || "hello" in Ruby? We would get "test". What about nil || "hello"? We get "hello".
So let’s look a little closer at the ||= operator, then. When we do what we did above, we’re basically saying `put @area into @area, unless @area evaluates to false, in which case put the results of calculate_area into @area’. In the Java article, this is called caching; in the Ruby world, it seems to more commonly be referred to as memoization, which also seems the more appropriate term.
A Flexible Memoization Solution for Ruby
Devising an API
The aforementioned article mentions a mechanism for annotating Java methods as cached by marking the class as @Reloadable and the method as @Cached. We’re going to do something similar, but with Ruby. Our first purpose will be to memoize any given method. The syntax I’m going for is1:
memoize_method :my_method
I also want to simplify the creation of readers for certain instance variables that we want to memoize, but that are based on initialization code. For example, in a scraping script I wrote for getting downloads for a specific section off of the Sakai CLE, I have this bit of code:
# Provides the configuration items for this user.
def user_config
@config ||= (read_cached_config || read_config)
end
In this case, I either get the configuration from a separate (file-based) cache or I read it explicitly from the website. This kind of double-caching is simple enough that we probably don’t want to write an entirely separate method for it, and it’s returned from a method with a different name than the caching methods to boot. Thus our solution of memoize_method isn’t as elegant. So on top of that, let’s add another syntax:
memoized_reader :user_config { read_cached_config || read_config }
This lets us create an attribute reader just like attribute_reader does, except this one will be memoized, and will use the passed block for the initialization code. Golden! In this same general vein, though, comes the use case of memoizing an attribute with only one method as the initializer. With the above syntax we would do:
memoized_reader :user_config { read_config }
But we don’t really need a full-on block for that, so let’s switch the syntax up for that:
memoized_reader :user_config, :read_config
But here what we’re really doing is a generic pattern of memoizing the result of one method and making it available as another one. So maybe we can use:
memoize_method :read_config, :as => :user_config
Where by default it memoizes under the same name, and, if you want to, you can provide a different name2.
Building the Background Syntax
Now, this provides us with two methods for our API. But are they so different? The only clear difference between memoized_reader and memoize_method is that memoized_reader will create an instance variable of the same name you pass it, whereas memoize_method will do… Something else. We don’t really know what; it’ll store the results in some nebulous place that we don’t really need to know about. But let’s unify syntaxes a little. Right now, we pass memoize_method a method name and, optionally, an alias for it. But it doesn’t take a block. memoize_reader, on the other hand, will take a block, but it won’t take an alternate method name under which to return the memoized variable.
In the background, hidden from a prying developer’s view, why don’t we introduce a unifying syntax? It’ll be a little uglier, but more flexible, and will allow the other two to exist on top of it. Here’s an idea for the syntax:
memoize :as => :user_config { read_config }
memoize :read_config, :as => :user_config
memoize :read_config, :as => :user_config, :in_variable => :@config
memoize :as => :config, :in_variable => :@config { read_cached_config || read_config }
Why use this syntax? It unifies the syntaxes for both of the other methods. Why not use it as the main syntax? It’s not as clean and shiny. This is that bastard syntax we don’t want people to know we came up with because they might kill us with sticks, you know?
So now, we’ve got a full planned syntax. Time to write code!
Spec’ing it Out!
That’s right. Our first step? Write some specs! Let’s BDD it up! We’ll use RSpec to write some specs for what we’re expecting these fine methods to do:
require 'automated_memoization'
describe AutomatedMemoization, 'when memoizing my_method' do
before(:all) do
class TestClassA
include AutomatedMemoization
def my_method; Time.now; end
memoize :my_method
end
end
before(:each) do
@test = TestClassA.new
end
it 'should alias my_method to my_method_without_memoization' do
lambda { @test.method(:my_method_without_memoization) }.should_not raise_error(NameError)
end
it 'should call the memoized method once and only once' do
@test.should_receive(:my_method_without_memoization).once.and_return(Time.now)
3.times { @test.my_method}
end
it 'should return the same object after the first call to my_method' do
orig_value = @test.my_method
@test.my_method.should equal(orig_value) # equal checks for object identity!
end
end
At its base, we require that when memoizing a method we get a certain alias and that alias is only called once (since the results should be memoized). We also require that calls past the first one always return the same object. We test this with should equal(), which checks for object identity—i.e., that they are exactly the same object, not just equivalent ones. Because we check for identity, it isn’t really necessary to use the time (which would be useful if we were just checking for equality), but hey, whatever.
Basic Memoiziation
Let’s get ourselves some basic memoiziation, then:
module AutomatedMemoization
# Include methods in ClassMethods as (surprise!) class methods.
def self.included(mod)
class << mod; include ClassMethods; end
end
# Make sure we have a memoization hash when we need it.
def __memoize_hash__
@__memoized__ ||= {}
end
module ClassMethods
def memoize(method)
old_method_name = "#{method}_without_memoization".to_sym
new_method_name = "#{method}_with_memoization".to_sym
self.class_eval do
define_method(new_method_name) do |*args|
self.__memoize_hash__[method] ||= self.send(old_method_name, *args)
end
alias_method(old_method_name, method) # alias out the current method
alias_method(method, new_method_name) # alias in the new method
end
end
end
end
There’s a bit of Ruby magic going on here, so let’s think about what’s happening. First of all, we’ll be sticking our code into a module. We can mix that module in wherever we need memoization, or, if we really want to, we can mix it into Module to get it everywhere. Inside this module, we define included. This method gets called whenever this module is included somewhere else. Because inclusion by default includes all methods as instance methods, and we want these as class methods, we use the class << mod syntax to get the class’s metaclass and include the class methods there, effectively making them class methods.
Inside the class methods, we have the memoize method. This method currently just takes one parameter, the name of the method we’re going to memoize. Then, we generate the name we’re going to be renaming that method to and the name we’ll be using for the new method. Finally, we do an instance_eval on self. self in this case will be the class, so a class_eval does basically the same thing as reopening the class for modification. First, we create a new method, using our new method name, and make it automatically add an entry to an @__memoized__ hash. Notice also that we take an arbitrary number of parameters and pass them on to the original method—this is because we don’t actually know whether the method we’re memoizing takes parameters or not. If it does, great! If not, then it’ll still work fine. You’ll notice that we use a memoizing method to create that hash the first time we need it. We call the instance variable and its corresponding method with surrounding __s largely to avoid name clashes, as well as to emphasize that they shouldn’t really be accessed outside of what we’re doing. Then, we alias the method to what we’ll be using as its `old’ name, and alias the new method to the method’s original name3. Note also that we use alias_method instead of alias. This is because alias, if given the two variables we gave it, would interpret those as method names, since it’s an actual Ruby keyword and a Ruby statement. alias_method takes two strings or symbols and deals with those as such.
Now, our new method should take care of all the magical memoization! Our next step is to add the :as parameter.
Spec’ing and Coding :as
Here’s a spec (assuming it’s already got TestClass as above):
describe AutomatedMemoization, 'when memoizing my_method as my_memoized_method' do
before(:all) do
class TestClassB
include AutomatedMemoization
def my_method; Time.now; end
memoize :my_method, :as => :my_memoized_method
end
end
before(:each) do
@test = TestClassB.new
end
it 'should keep my_method' do
lambda { @test.method(:my_method) }.should_not raise_error(NameError)
end
it 'should create my_memoized_method' do
lambda { @test.method(:my_memoized_method) }.should_not raise_error(NameError)
end
it 'should call the memoized method once and only once' do
@test.should_receive(:my_method).once.and_return(Time.now)
3.times { @test.my_memoized_method}
end
it 'should return the same object after the first call to my_memoized_method' do
orig_value = @test.my_memoized_method
@test.my_memoized_method.should equal(orig_value) # equal checks for object identity!
end
end
Here, we’re checking a few things. First of all, we’re checking that my_method isn’t aliased in this case (no need, since there won’t be any name clashes). Second of all, we’re checking that my_memoized_method is at least created. Third, we’re checking that the memoized method is called once (as before), and that we get the same object every time from my_memoized_method.
Now let’s see the code that makes this happen (I’ll replicate the above code with relevant changes made, but only in the memoize method):
module AutomatedMemoization
module ClassMethods
def memoize(method, options = {})
options = { :as => method }.merge(options)
if (options[:as] == method)
old_method_name = "#{method}_without_memoization".to_sym
new_method_name = "#{method}_with_memoization".to_sym
generate_memo_method(old_method_name, new_method_name)
self.class_eval do
alias_method(old_method_name, method) # alias out the current method
alias_method(method, new_method_name) # alias in the new method
end
else
generate_memo_method(method, options[:as])
end
end
# Generates a memoizing method for +method+ named +memoized_method+.
def generate_memo_method(method, memoized_method)
self.class_eval do
define_method(memoized_method) do |*args|
self.__memoize_hash__[method] ||= self.send(method, *args)
end
end
end
end
end
Now we’ve added a helper method that generates a memoizing method by a certain name that memoizes a method of a different name. `Ah-hah!’ you might be thinking. `There’s a difference here! This code creates a different entry in the __memoized__ hash than the other one does!’ That’s very true. When we use memoize without providing an :at parameter, we use the old_method_name for the entry in the hash, whereas before we were using the original method name. Not to worry, however. That’s an implementation detail, and no one should be relying directly on our code besides ourselves anyway.
So what’s the next option we need to support? :in_variable, which makes us store the results of the method in a specific instance variable.
Spec’ing and Coding :in_variable
First, let’s spec out this functionality4:
describe AutomatedMemoization, 'when memoizing my_method into an instance variable' do
before(:all) do
class TestClassC
include AutomatedMemoization
def my_method; Time.now; end
memoize :my_method, :in_variable => :@my_var
end
end
before(:each) do
@test = TestClassC.new
end
it 'should alias my_method to my_method_without_memoization' do
lambda { @test.method(:my_method_without_memoization) }.should_not raise_error(NameError)
end
it 'should call the memoized method once and only once' do
@test.should_receive(:my_method_without_memoization).once.and_return(Time.now)
3.times { @test.my_method}
end
it 'should return the same object after the first call to my_method' do
orig_value = @test.my_method
@test.my_method.should equal(orig_value) # equal checks for object identity!
end
it 'should store the object returned after the first call to my_method ' +
'in the @my_var instance variable' do
orig_value = @test.my_method
@test.instance_variable_get(:@my_var).should == orig_value
end
end
Now, let’s code this baby out:
def memoize(method, options = {})
options = { :as => method, :in_variable => nil }.merge(options)
if (options[:as] == method)
old_method_name = "#{method}_without_memoization".to_sym
new_method_name = "#{method}_with_memoization".to_sym
generate_memo_method(old_method_name, new_method_name, options[:in_variable])
self.class_eval do
alias_method(old_method_name, method) # alias out the current method
alias_method(method, new_method_name) # alias in the new method
end
else
generate_memo_method(method, options[:as], options[:in_variable])
end
end
# Generates a memoizing method for +method+ named +memoized_method+.
# If non-nil, destination_var is the instance variable within which the memoized
# value will be stored.
def generate_memo_method(method, memoized_method, destination_var)
if destination_var.nil?
self.class_eval do
define_method(memoized_method) do |*args|
self.__memoize_hash__[method] ||= self.send(method, *args)
end
end
else
self.class_eval do
define_method(memoized_method) do |*args|
self.instance_variable_get(destination_var) or
self.instance_variable_set(destination_var, self.send(method, *args))
end
end
end
end
So, admittedly, this isn’t the shiniest looking code. See that big if statement in generate_memo_method where both of them do a class_eval? It’d be nicer to stick that if statement inside the class_eval, right? Maybe we’ll go ahead and fix some of this in the next round… Notice also that we don’t use the ||= syntax with the instance variable setting version, because there’s no way to do it - we have no =-based accessor. Instead, we use i.e., just about everything will run before it gets checked, without needing extra parentheses.instance_variable_set and instance_variable_get to return either it or the appropriate value after we set it. The or operator will run everything to its left and then everything to its right if the code to its left evaluated to false or nil. In particular, the difference between or and || is that or has very low precedence -
Another alternative to the above approach is to pass a block containing the self.send bit to a setter method which handles the logic of whether to use the instance variable or the memoize_hash. Again, though, if we did this, we would get the check for whether destination_var was the one to use every time we ran the method.
Now that we’ve got that done, all we’ve got left is accepting the block instead of a method! We’re in the home stretch!
Spec’ing and Coding the Block Acceptor
So let’s write some specs for the system when it takes a block:
describe AutomatedMemoization, 'when memoizing a block as my_memoized_block' do
before(:all) do
class TestClassD
include AutomatedMemoization
end
block = Proc.new { Time.now }
TestClassD.class_eval do
memoize(:as => :my_memoized_block, &block)
end
@block = block
end
before(:each) do
@test = TestClassD.new
end
it 'should create my_memoized_block' do
lambda { @test.method(:my_memoized_block) }.should_not raise_error(NameError)
end
it 'should call the memoized block once and only once' do
@block.should_receive(:call).once.and_return(Time.now)
3.times { @test.my_memoized_block }
end
it 'should return the same object after the first call to my_memoized_block' do
orig_value = @test.my_memoized_block
@test.my_memoized_block.should equal(orig_value) # equal checks for object identity!
end
end
As usual, we check that the block is only called once, that it returns the same object after the first call, and that my_memoized_block is created.
Now, let’s code it up!
module AutomatedMemoization
module ClassMethods
def memoize(method = nil, options = {}, &block)
options = method and method = nil if method.is_a?(Hash)
options = { :as => method, :in_variable => nil }.merge(options)
raise ArgumentError, 'If you pass a block, you must pass the :as option.' \
if options[:as].nil? && method.nil?
if (options[:as] == method)
old_method_name = "#{method}_without_memoization".to_sym
new_method_name = "#{method}_with_memoization".to_sym
generate_memo_method(old_method_name, new_method_name,
options[:in_variable], block)
self.class_eval do
alias_method(old_method_name, method) # alias out the current method
alias_method(method, new_method_name) # alias in the new method
end
else
generate_memo_method(method, options[:as], options[:in_variable], block)
end
end
# Delegates to either +generate_sending_memo_method+ or to
# +generate_calling_memo_method+ to generate an appropriate memoizing
# method.
def generate_memo_method(method, memoized_method, destination_var, block)
self.class_eval do
if block
define_method(memoized_method,
&generate_calling_memo_body(method, memoized_method,
destination_var, block))
else
define_method(memoized_method,
&generate_sending_memo_body(method, memoized_method,
destination_var, block))
end
end
end
def generate_calling_memo_body(method, memoized_method, destination_var,
block)
if destination_var.nil?
Proc.new do |*args|
self.__memoize_hash__[block] ||= block.call(*args)
end
else
Proc.new do |*args|
self.instance_variable_get(destination_var) or
self.instance_variable_set(destination_var, block.call(*args))
end
end
end
def generate_sending_memo_body(method, memoized_method, destination_var,
block)
if destination_var.nil?
Proc.new do |*args|
self.__memoize_hash__[method] ||= self.send(method, *args)
end
else
Proc.new do |*args|
self.instance_variable_get(destination_var) or
self.instance_variable_set(destination_var, self.send(method, *args))
end
end
end
end
end
Ah! Code explosion! What happened? Well, we did a lot of refactoring there to try and minimze the amount of repetition in our code. First of all, we refactored the decision of whether to send a signal to the instance or call a block into two methods that return a proc. We pass that proc to define_method so that the defined method has the appropriate body. We’ve also nested the decision logic inside the class_eval in generate_memo_method so as not to have to repeat class_eval. The rest of the code is largely the same, except that we also raise an ArgumentError if we’re given a block without an :as parameter, since in that situation we wouldn’t have a name for our method. We also now index into our memoization hash using the block as our key when we’re passed a block (instead of a method name). Remember that you can use any object as a key for a hash! As with the last time we changed the index, it’s largely irrelevant, since that is an implementation detail5.
The Public API
Phew! So that’s the implementation of memoize. From there, implementing our other two methods is trivial; I won’t write specs for these largely because this article is terribly long already:
module AutomatedMemoization
module ClassMethods
# Modifies the method specified by +method_name+ to memoize its results.
# Available options:
# [:as] Specifies a name for the memoizing method. Passing +cool_method+,
# for example, would memoize +method_name+ to +cool_method+. Calling
# +method_name+ would get the unmemoized version.
# [:in_variable] Specifies an instance variable to store the use memoized value
# into. Should include the initial @.
def memoize_method(method_name, options = {})
memoize method_name, options
end
# Creates a reader for the variable specified by +var+ that memoizes its results.
# Takes a block to provide an initial value for the variable, or the name of a method
# whose return value will provide an initial value. +var+ should not include the initial
# @, since +memoized_reader+ is made to work like +attr_reader+.
def memoized_reader(var, init_method = nil, &block)
instance_var = "@#{var}".to_sym
if init_method.nil?
memoize :as => var, :in_variable => instance_var, &block
else
memoize init_method, :as => var, :in_variable => instance_var
end
end
end
end
And there we are! A nice, shiny, flexible memoization solution for our Ruby classes.
For the final files (with the exception of the last two methods), hit automated_memoization.rb and automated_memoization_spec.rb.
The idea recently came up to create one of these whereby the results of a method would be memoized according to the parameters to a method, so that different parameters can store different results. But this article is really really long already, so maybe another time.
1 This syntax has the caveat that it needs to occur after the method has already been defined.
2 Some discussion yielded the alternate syntax of memoize_method :read_config, :user_config, as not requiring the additional hash manipulation and being similar to alias_method (though reversing the argument order, which we both agreed was wrong-way-around in alias_method). I decided to go with the :as syntax because I personally think it looks better. YMMV. Feel free to write a simple thin wrapper over it to `fix’ it.
3 Rails’s alias_method_chain wraps this two-step pattern up into one method call.
4 You may be thinking that it might be a good idea to factor a lot of the common code between this spec and the first one out into a shared behavior. Another approach might even be to factor the memoization without naming, memoization with naming, and memoization to an instance variable out into separate shared behaviors, and combine those into the various tests. You’d probably be right, but bear with me for the purposes of this article.
5 It is possible that blocks/methods have a worse hash function than do Strings, and this may slow down the implementation; nonetheless, at a high level, the difference is minimal.
7 Responses to “Automated Memoization!”
Sorry, comments are closed for this article.
June 12th, 2007 at 03:55 AM
good! but i cannt understand it. haha 哈哈
June 13th, 2007 at 09:28 AM
Well that’s unfortunate. Is there anything about it that’s particularly unclear? It did end up a little more complex than I originally expected, due to all of the varying forms I introduced. But that was mostly my fault :-P
June 13th, 2007 at 08:02 PM
no,you are very good.
something,i an a biginer ,so it is my fault.
haha,i will study hard to understand it .
i am a Chinese ,so my english is very poor , can you understand me?
_
Have a good day , thanks for the reply. :-P
June 14th, 2007 at 10:14 AM
Yeah, I understand you fine :-) Let me know if there's anything I can do to help you understand the above code. Like I said, it's a little more complicated than I originally anticipated, and it uses a lot of Ruby magic to achieve what it does :-)
June 14th, 2007 at 10:07 PM
hehe,my hero . after 24hours ,i find i must learn ruby.
the truth, i have not study it at all.Tanks a lot for the replies. _^
A problem ,your blog is not well on Windows . but it is well on Firefox.
_^
here is the screemshots:
http://img.photo.163.com/OKwCKyf8mJZWmqbJUek-5A==/172544160723794413.gif
http://img.photo.163.com/z8fPicz5xzlgY9WCngL9XQ==/181269885001821611.gif
June 15th, 2007 at 12:09 AM
Oh dear. I’m not terribly surprised, of course. I’ll try and fix that soon.
June 15th, 2007 at 05:42 AM
_