webdevjeff.us

Web Developer Jeff George

Taking up collections

Introducing arrays and hashes in Ruby

Oct. 3, 2015

In the Ruby programming language, two main data types carry the bulk of the load when it comes to handling collections: arrays and hashes. Understanding how arrays and hashes are alike, and how they are different, is critical to programming successfully in Ruby.

Arrays are for lists

An array is a list of values arranged in a sequential order. The easiest way to create an array is to declare a variable, and set it equal to the contents of the list we intend to save. It is common convention to give arrays variable names that are plural, to suggest that the variable represents a collection of items. Here, we make an array of our favorite pies:

  • pies = [ "pumpkin", "apple", "pecan" ]

There are other ways to create an array, such as by calling Array.new or by using the Array() method, but those techniques are beyond the scope of this post.

Since arrays store values in a specific sequence, each value is identified by an integer index number which names its place in the sequence. In typical computer-science fashion, array indexes begin counting at 0, so the first item in the array has the index 0, the second item has the index 1, and so on. To access a value in an array, we use the name of the array, followed by the value's index in square brackets. Here, we want the second item in our pies array, so we use the index [1].

  • $ pies[1]
  • => "apple"
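Here is a small sketch that stretches index access a little further. Negative indexes and out-of-range lookups aren't mentioned above; they are standard Ruby behavior, added here purely for illustration.

```ruby
pies = ["pumpkin", "apple", "pecan"]

first_pie = pies[0]    # index 0 is the first item
last_pie  = pies[-1]   # negative indexes count back from the end
no_such   = pies[10]   # an index past the end returns nil instead of raising an error

puts first_pie
puts last_pie
p no_such
```

Getting nil back for a missing index is convenient, but it also means a typo in an index won't announce itself with an error.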

Calling puts with an array as its argument will print each member of the array on a separate line, like this:

  • $ puts pies
  • pumpkin
  • apple
  • pecan
  • => nil

(Recall that puts always returns the value nil.)

We can add another item to the end of our array using the push method. The item to be added is passed as the argument to push, and the parentheses around that argument are optional. Continuing with our pies array:

  • $ pies.push( "cherry" )
  • => ["pumpkin", "apple", "pecan", "cherry"]
  • $ pies.push "mud"
  • => ["pumpkin", "apple", "pecan", "cherry", "mud"]

You can also add items to an array with the "shovel" operator, <<. The shovel operator can be chained to add several items on a single line, as well. We'll start a new array of integers for this example.

  • $ numbers = [1, 2, 3]
  • => [1, 2, 3]
  • $ numbers << 4
  • => [1, 2, 3, 4]
  • $ numbers << 5 << 6 << 7
  • => [1, 2, 3, 4, 5, 6, 7]
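As a quick side-by-side sketch of the two ways to append: push can take several arguments at once, while << takes exactly one item but returns the array, which is what makes chaining possible. The extra pie name is made up for the example.

```ruby
pies = ["pumpkin", "apple"]

# push accepts more than one argument and returns the array itself
pies.push("pecan", "cherry")

# << takes a single item and also returns the array, which is why it chains
pies << "mud" << "key lime"

p pies
```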

To remove the last item from an array, you use the opposite of push, which is pop. This method permanently removes the last item, and returns its value.

  • $ new_numbers = [10, 20, 30]
  • => [10, 20, 30]
  • $ new_numbers.pop
  • => 30
  • $ new_numbers
  • => [10, 20]  # array contents after pop

If you want to remove the first item in the array instead, you can use shift, which is so named because it removes the first item from the array, and shifts the remaining items down one index. The opposite of shift is unshift, which takes a value as an argument (again, parentheses optional), and inserts that value into the first position of the array, while "unshifting" the remaining elements up one index. shift returns the item removed from the array, while unshift returns the entire array, including the newly-added item in the first position.

  • $ other_numbers = [25, 50, 75]
  • $ other_numbers.shift
  • => 25
  • $ other_numbers
  • => [50, 75]  # array contents after shift
  • $ other_numbers.unshift(45)
  • => [45, 50, 75]  # array contents after unshift

There are literally dozens of other methods that can be used to manipulate arrays, adding items to them, taking items away, sorting them, re-ordering them, finding specific values or ranges within them, etc. You can find a complete listing of array methods in the official Ruby documentation, and a tutorial covering Ruby arrays at sitepoint.com.

You may have noticed that in all my example arrays so far, every item in each array has been of the same data type. Our pies array consisted entirely of strings, while numbers was all integers. In fact, Ruby places no restriction on the types of items in an array. You can mix and match as many data types in a single array as you please. The following is a perfectly legal array in Ruby:

  • $ peyton_manning = [18, "QB", 65.4, ["Colts", "Broncos"]]
  • => [18, "QB", 65.4, ["Colts", "Broncos"]]

This array contains an integer, a string, a float, and even another array, and it's all perfectly legal. But just because you can do something, doesn't mean you necessarily should do it. The items in this array all describe NFL quarterback Peyton Manning, but in order for them to be useful, we have to memorize which index in the array points to which piece of information. For example, to find out what Manning's pass-completion percentage is, we have to know to ask for peyton_manning[2]. You'd think there'd be a better way to store collections of related data of differing types, and there is. Ruby calls it a hash.

Making a hash of things

Like an array, a Ruby hash is a data type storing a collection of items, and like arrays, hashes can contain data of any and all types, in any combination. The difference between an array and a hash is in how you access that data.

Remember that an array is an ordered set of values, and you access the values you want using a numeric index. Since Ruby 1.9, a hash also remembers the order in which its key-value pairs were inserted (in earlier Ruby versions, hashes weren't ordered), but you don't access its values using indexes. Instead, data in a hash is stored as key-value pairs. That is, for every value in the hash, there is a key that lets you access it. For example, if I created a hash to keep track of my pets, I might store within it the key :dog, accessing the value "Moose". Hashes are commonly created in a manner very similar to how we set up arrays, using a structure called a "hash literal". A hash literal creating my full pets hash might look something like this:

  • jeffs_pets = {
  • :dog => "Moose",
  • :turtle => "Max",
  • :fish => ["Larry", "Curly", "Moesha"]
  • }

For readability, I typed each key-value pair in the hash on its own line, but I could have legally crammed them all onto a single line. Note that while we used square brackets to contain our array, hashes are held in curly brackets. For each key-value pair in the hash, the key is listed first, followed by the "hash rocket" operator consisting of an equals sign and a greater-than sign, and then the value. Values can be of any type; in this hash, the first two values are strings, but the third value is an array, which is how we assign more than one value to a single key.
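As a quick check that layout doesn't matter, this sketch builds the same hash three ways. The third form, with the colon after the key, is a shorthand available in Ruby 1.9 and later for symbol keys; it isn't used elsewhere in this post and is shown only for comparison.

```ruby
multi_line = {
  :dog => "Moose",
  :turtle => "Max",
  :fish => ["Larry", "Curly", "Moesha"]
}

single_line = { :dog => "Moose", :turtle => "Max", :fish => ["Larry", "Curly", "Moesha"] }

# Shorthand for symbol keys (Ruby 1.9 and later)
shorthand = { dog: "Moose", turtle: "Max", fish: ["Larry", "Curly", "Moesha"] }

puts multi_line == single_line
puts multi_line == shorthand
```

Both comparisons print true, since all three literals produce hashes with the same keys and values.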

Technically, keys can be of any type as well, but hashes are usually constructed using symbols as keys. (Symbols are a special type of object, similar to strings, but with some special properties we don't need to delve into here.) Symbols can be recognized because they always begin with a colon, as in :turtle. The syntax to access the values within a hash is similar to that used to get to the values within an array, but in place of the array's numeric indexes, we use the hash's keys inside the square brackets. Let's check out the names of my pets:

  • jeffs_pets[:dog]
  • => "Moose"
  • jeffs_pets[:fish]
  • => ["Larry", "Curly", "Moesha"]
  • jeffs_pets[:cat]
  • => nil

So, entering jeffs_pets[:dog] returned my dog's name, Moose, and jeffs_pets[:fish] returned the full array containing the names of all my fish. When we asked for my cat's name, with jeffs_pets[:cat], Ruby shrugged and said "nil," which is perfect, since nil, zilch, nada, bupkis, is exactly how many cats I own. In fact, unless the hash was set up with a different default value, asking for a key that is not included in the hash will return nil every time. Incidentally, if we want to get to the name of my second fish, we'd just line up two square-bracketed indexes behind the hash name, first the hash key and then the numeric index within the array, like this:

  • jeffs_pets[:fish][1]
  • => "Curly"
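To tie this back to the quarterback array from earlier, here is a rough sketch of the same data stored as a hash. The key names are my own invention for illustration. It also shows two standard Hash methods not covered above, key? and fetch, which help tell a missing key apart from a stored nil.

```ruby
# Key names below are hypothetical, chosen only for this example
peyton_manning = {
  :number => 18,
  :position => "QB",
  :completion_pct => 65.4,
  :teams => ["Colts", "Broncos"]
}

completion  = peyton_manning[:completion_pct]   # no need to remember index 2
second_team = peyton_manning[:teams][1]         # hash key first, then array index

missing    = peyton_manning[:height]                   # nil, because the key is absent
has_height = peyton_manning.key?(:height)              # false
height     = peyton_manning.fetch(:height, "unknown")  # fetch returns the fallback given

puts completion
puts second_team
puts height
```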

Although Ruby does remember the order of the key-value pairs in a hash, programmers for the most part don't care. Because of this, we don't need a bunch of different methods for accessing, inserting, or removing items in the hash according to their position in the sequence. Adding a new key-value pair is a lot like declaring and setting the value of a variable: you name it, then set it equal to the value.

  • jeffs_pets[:chicken] = "Colonel"
  • => "Colonel"
  • jeffs_pets
  • => {:dog=>"Moose", :turtle=>"Max", :fish=>["Larry", "Curly", "Moesha"], :chicken=>"Colonel"}

To remove a key-value pair from a hash, you use the delete method, with the name of the key as the argument. This will return the value of the deleted key. Poor Max...

  • jeffs_pets.delete(:turtle)
  • => "Max"
  • jeffs_pets
  • => {:dog=>"Moose", :fish=>["Larry", "Curly", "Moesha"], :chicken=>"Colonel"}
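Here is a short sketch of adding, overwriting, and deleting in one place. The replacement dog name is made up, and deleting a key that isn't there is an extra case not shown above.

```ruby
jeffs_pets = { :dog => "Moose", :turtle => "Max" }

jeffs_pets[:chicken] = "Colonel"       # new key: adds a pair
jeffs_pets[:dog] = "Bullwinkle"        # existing key: replaces the value (hypothetical name)

removed   = jeffs_pets.delete(:turtle) # returns the deleted value, "Max"
not_there = jeffs_pets.delete(:cat)    # key is absent, so delete returns nil

p jeffs_pets
```

Note that assigning to an existing key silently replaces the old value, so a hash never holds two pairs with the same key.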

As is the case for arrays, Ruby offers dozens of methods for manipulating hashes. You can read about hash methods in the Ruby docs, or get a more newb-friendly tutorial on hashes at sitepoint.com.

Iteration nation

One of the main reasons we use collections like arrays and hashes to store related data is so that we can work with the entire collection...collectively. We often want to search through a collection for specific bits of information, or perform the same action on every single item of the set. Because we've organized our collections into arrays and hashes, Ruby lets us accomplish these things easily, using looping and iteration.

When we loop through an array, or iterate over an array or a hash, we are systematically applying the same bit of code to each item in the collection, exactly one time each. Because arrays are accessed by numeric indexes, we have more flexibility in how we iterate over them than we do with hashes. For starters, we can use simple loops, limited by counters based on the array.size, to do something to every item in an array. (Did you notice I slipped a new method, size, in there? It returns an integer value equal to the number of items in the array.) Here, we'll use an until loop to print out all the members of a new array, beatles:

  • $ beatles = [ "John", "Paul", "George", "Ringo" ]
  • $ counter = 0
  • $ until counter == beatles.size
  • $ puts beatles[counter]
  • $ counter += 1
  • $ end
  • John
  • Paul
  • George
  • Ringo
  • => nil

In this example, our counter started at zero, and went up by 1 with each iteration of the loop. By the time the program had printed "Ringo," the counter had counted up to four. Since the counter now equaled the size of the array, the "until" condition was fulfilled, and the loop terminated, having printed the name of each member of the Beatles to the terminal.
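The same counter-driven idea can be written with while, which keeps going as long as its condition is true rather than until it becomes true. This is only a sketch; the printed array is an addition of mine so the result can be inspected afterward.

```ruby
beatles = ["John", "Paul", "George", "Ringo"]
printed = []

counter = 0
while counter < beatles.size
  puts beatles[counter]
  printed << beatles[counter]
  counter += 1   # without this line the loop would never end
end
```

Either way, the increment inside the loop body is what eventually makes the loop stop.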

Because we don't use numeric indexes to access the values in hashes, we can't use counters to loop through them with simple tools like while and until. Fortunately, we have the each method, an iterator that does much the same thing, and doesn't care that it doesn't have indexes to work with. Let's store a bit more information about the Beatles in a hash, then use each to print it back out.

  • beatles = {
  • :lead_vocals => "John",
  • :bass => "Paul",
  • :lead_guitar => "George",
  • :drums => "Ringo"
  • }
  • beatles.each { |key, value|
  • puts value + " played " + key.to_s + "."
  • }
  • John played lead_vocals.
  • Paul played bass.
  • George played lead_guitar.
  • Ringo played drums.
  • => {:lead_vocals=>"John", :bass=>"Paul", :lead_guitar=>"George", :drums=>"Ringo"}

Sure, the sentences are clunky, but you get the idea. Conveniently, arrays have an each method as well. Though what's going on under the hood for the array version of each is a bit different than it is for the hash version, the syntax is just about the same. Since Ruby lets us put our each block on a single line if it's simple enough, we'll try that here:

  • beatles = [ "John", "Paul", "George", "Ringo" ]
  • beatles.each { |item| puts item }
  • John
  • Paul
  • George
  • Ringo
  • => ["John", "Paul", "George", "Ringo"]
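If you do want the position while iterating over an array, each_with_index hands the block both the item and its index. The sketch below pairs it with a slightly tidier hash loop; the roles hash and the output formatting are my own, shown for illustration only.

```ruby
beatles = ["John", "Paul", "George", "Ringo"]

# each_with_index yields the item and its index, so no manual counter is needed
numbered = []
beatles.each_with_index do |name, index|
  numbered << "#{index + 1}. #{name}"
end
puts numbered

roles = { :lead_vocals => "John", :drums => "Ringo" }

# each on a hash yields the key and the value of every pair
sentences = []
roles.each do |role, name|
  sentences << "#{name}: #{role.to_s.tr('_', ' ')}"
end
puts sentences
```

The do...end form used here is equivalent to the curly-brace blocks above and tends to read better when a block spans several lines.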

If you want to learn more about Ruby loops and iterators, read Alan Skorkin's short-but-sweet introduction to the topic, A Wealth of Ruby Loops and Iterators.

The right tool for the job

You've got two different kinds of collections in Ruby, arrays and hashes, each with its own strengths. So how do you choose which one to use in any given situation? You have to think about what kind of data you're planning to store, and what you plan to do with it.

If you've got a list made up of lots of examples of the same kind of thing, such as moves on a chessboard, flavors of ice cream, or your weight recorded every morning for a month, you'll probably want an array. Storing this sort of data in an array makes it easy to put in alphanumeric order, perform statistical calculations on (such as finding the min, max, median, average, etc.), eliminate duplications, and add more values to. If the order of the items in your collection matters, and especially if you anticipate reorganizing that order, then an array is definitely your best bet.
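As a rough sketch of those operations, here are a few built-in array methods applied to some made-up morning weights. The median line assumes an odd number of items; an even-length list would need to average the two middle values.

```ruby
weights = [182.5, 181.0, 183.5, 181.0, 180.5]   # made-up sample data

sorted  = weights.sort                          # [180.5, 181.0, 181.0, 182.5, 183.5]
lowest  = weights.min                           # 180.5
highest = weights.max                           # 183.5
average = weights.inject(:+) / weights.size     # sum of the items divided by the count
median  = sorted[sorted.size / 2]               # middle item of an odd-length sorted array
unique  = weights.uniq                          # duplicates removed, original order kept

weights << 180.0                                # adding another value is a one-liner

p sorted, lowest, highest, average, median, unique
```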

On the other hand, if your data includes many different kinds of information about a single topic, you probably want a hash. A hash is well-suited to storing customer data collected from a form, for example, which might include strings (first and last name, address), customer number and order number (integers), and products ordered (array). Storing this information in a hash will allow you to attach a label, in the form of a hash key, to each piece of information, such as :first_name, :last_name, :street_address, etc. You can then use the hash keys to retrieve the exact value you need, such as the customer's zip code, instead of having to iterate through the collection searching for it, as you would have to do if your customer information were stored in an array.
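A customer record along those lines might look something like this sketch. Every key name and value here is invented for the example.

```ruby
# All field names and values are hypothetical
customer = {
  :first_name => "Ada",
  :last_name => "Lovelace",
  :street_address => "12 Example Street",
  :zip_code => "12345",
  :customer_number => 1001,
  :products_ordered => ["teapot", "kettle"]
}

zip       = customer[:zip_code]   # direct lookup by key, no searching required
full_name = "#{customer[:first_name]} #{customer[:last_name]}"

puts full_name
puts zip
```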

Arrays and hashes become even more powerful, however, when you remember that they can both hold arrays and hashes as values, along with other types of objects. For example, you might store the information about each customer in a hash, and then store all of those hashes in an array of customers! Working together in this way, there's almost no collection of data that you can't store with Ruby's arrays and hashes.
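Here is a minimal sketch of that array-of-hashes idea, with invented customer data. It looks up one value by array index and hash key, then iterates over the whole list to build a small summary hash.

```ruby
# Customer data is made up for illustration
customers = [
  { :first_name => "Ada",   :zip_code => "12345", :products_ordered => ["teapot"] },
  { :first_name => "Grace", :zip_code => "67890", :products_ordered => ["kettle", "mug"] }
]

second_zip = customers[1][:zip_code]   # array index first, then hash key

# Build a hash mapping each customer's name to the number of products ordered
order_counts = {}
customers.each do |customer|
  order_counts[customer[:first_name]] = customer[:products_ordered].size
end

puts second_zip
p order_counts
```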