Welcome to Riak. Riak is a distributed, decentralized data storage system. In the wiki, you will find the “quick start” directions for setting up and using Riak.
There are 10 keys in the result, as expected. However, when I use some of these keys in a KV query, Riak doesn't find the item. This is not caused by the key being removed by another process: if I repeat both the index query and the KV query an hour later, the results are the same. What might be the reason for such behavior?
Tool for migrating data from one or more buckets in a Riak K/V store into another Riak cluster by exporting all data from buckets to disk, and then allowing the user to load one or more of the dumped buckets back into another Riak K/V host or cluster.
This data migrator tool may be helpful in the following scenarios:
The app works by performing a streaming List Keys operation on one or more Riak buckets, and then issuing GETs for the resulting keys (parallelized as much as possible) and storing the Riak objects (in Protocol Buffer format) on disk. The key listing alone involves multiple iterations through the entire Riak keyspace, and is not intended for frequent usage on a live cluster. On the import side, the app reads the exported Riak objects from files on disk, and issues PUTs to the target cluster.
Do NOT use for regular data backup on a live cluster. Instead, see the Backing Up Riak documentation for recommended best practices. Again, the reason for this admonition is: if new data is written to a live cluster after the export operation is started, that new data is not guaranteed to be included in the export.
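The export/import flow, and the reason writes that arrive mid-export can be missed, can be sketched in plain Ruby. This is a toy model under stated assumptions, not the tool's code: the two clusters are in-memory Hashes, the bucket and key names are hypothetical, and Marshal stands in for the Protocol Buffer format the tool actually writes.

```ruby
require 'tmpdir'

# Hypothetical stand-ins for two clusters: bucket name => { key => value }.
source = { 'Flights' => { 'AA100' => 'on time', 'BA200' => 'delayed' } }
target = Hash.new { |hash, bucket| hash[bucket] = {} }

Dir.mktmpdir do |dir|
  # Export: list the keys once, then fetch each value and write it to disk.
  source.each do |bucket, objects|
    keys = objects.keys                       # snapshot of the key listing
    objects['ZZ999'] = 'written mid-export'   # a write that arrives after the listing
    dump = keys.map { |key| [key, objects[key]] }.to_h
    File.binwrite(File.join(dir, "#{bucket}.dump"), Marshal.dump(dump))
  end

  # Import: read each dump file back and write its objects to the target.
  Dir.children(dir).each do |name|
    bucket = File.basename(name, '.dump')
    objects = Marshal.load(File.binread(File.join(dir, name)))
    objects.each { |key, value| target[bucket][key] = value }
  end
end
```

In this model the key written after the listing exists in the source but never reaches the target, which is the situation the warning above describes.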
To transfer data from one Riak cluster to another:
- Make sure that the app.config files are the same in both clusters. That is, settings such as default quorum values, multi-backend and MDC replication settings should be the same before starting export/import operations.
- Transfer the bucket properties, using the -d -t options to export from one cluster and -l -t to import into the target cluster. Note: Do this only if you set custom properties such as non-default quorum values, pre- or post-commit hooks, or set up Search indexing on a bucket.
- Export the contents of the buckets, using the -d option, to files on disk (the objects will be stored in the binary ProtoBuf format).
- Load the exported data into the target cluster, using the -l option.
You can download the ready-to-run jar file at: http://ps-tools.data.riakcs.net:8080/riak-data-migrator-0.2.4-bin.tar.gz
After downloading, unzip/untar it, and it's ready to run from its directory.
Make sure Apache Maven is installed.
Build the Riak Data Migrator itself, using Maven. First, fork this project, and git clone your fork.
Usage:
java -jar riak-data-migrator-0.2.4.jar [options]
Options:
Dump (the contents of) all buckets from Riak:
java -jar riak-data-migrator-0.2.4.jar -d -r /var/riak_export -a -h 127.0.0.1 -p 8087 -H 8098
Load all buckets previously dumped back into Riak:
java -jar riak-data-migrator-0.2.4.jar -l -r /var/riak-export -a -h 127.0.0.1 -p 8087 -H 8098
Dump (the contents of) buckets listed in a line delimited file from a Riak cluster:
Export only the bucket settings from a bucket named 'Flights':
java -jar riak-data-migrator-0.2.4.jar -d -t -r /var/riak-export -b Flights -h 127.0.0.1 -p 8087 -H 8098
Load bucket settings for a bucket named 'Flights':
java -jar riak-data-migrator-0.2.4.jar -l -t -r /var/riak-export -b Flights -h 127.0.0.1 -p 8087 -H 8098
Copy all buckets from one Riak host to another:
java -jar riak-data-migrator-0.2.4.jar -copy -r /var/riak_export -a -h 127.0.0.1 -p 8087 --copyhost 192.168.1.100 --copypbport 8087
- This app depends on the key listing operation in the Riak client, which is slow on a good day.
- The Riak memory backend bucket listing operation tends to time out if any significant amount of data exists. In this case, you have to explicitly specify the buckets you want to dump, using the -f option to specify a line-delimited list of buckets in a file.
0.2.4
- Verbose status output is now default
- Added option to turn off verbose output
- Logging of final status
0.2.3
- Changed internal message passing between threads from Riak Objects to Events for Dump, Load and Copy operations but not Delete.
- Added the capability to transfer data directly between clusters
- Added the capability to copy a single bucket into a new bucket for the Load or Copy operations.
- Changed log level for retry attempts (but not max retries reached) to warn vs error.
0.2.2
- Changed message passing for Dump partially to Events
- Added logic to count the number of value not founds (ie 404s) when reading
- Added summary output for value not founds
< 0.2.1 Ancient History
At the end of this guide, you should be familiar with:
This and the other guides in this wiki assume you have Riak installed locally. If you don't have Riak already, please read and follow how to install Riak and then come back to this guide. If you haven't yet installed the client library, please do so before starting this guide.
This guide also assumes you know how to connect the client to Riak. All examples assume a local variable of client which is an instance of Riak::Client and points at some Riak node you want to work with.
Riak is a near-pure key-value store, which means that you have no tables, collections, or databases. Each key stands on its own, independent of all the others. Keys are namespaced in Buckets, which also encapsulate properties which are common among all keys in the namespace (for example, the replication factor n_val). If you conceived of Riak as a big distributed Hash, it might look like this:
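A minimal sketch of that mental model in plain Ruby, assuming hypothetical bucket names, keys and values (nothing here talks to Riak):

```ruby
# Riak pictured as one big Hash: bucket name => { key => value }.
# All names and values below are hypothetical.
riak = {
  'guides' => {
    'key-value-operations' => 'How to store and fetch values',
    'secondary-indexes'    => 'How to query by index'
  },
  'images' => {
    'logo.png' => 'raw PNG bytes would go here'
  }
}

# Looking up a value means naming the bucket first, then the key.
value = riak['guides']['key-value-operations']

# Keys in different buckets never collide, even with identical names.
riak['drafts'] = { 'key-value-operations' => 'An unrelated draft' }
```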
What's missing from that picture:
- Each value has metadata that you can manipulate, as well as the raw value.
Enough with the exposition, let's look at some data!
Since our keys are all grouped into buckets, in the Ruby client, we get a Riak::Bucket object before doing any key-value operations. Here's how to get one:
This gives a Riak::Bucket object with the name 'guides' that is linked to the Riak::Client instance.
'But wait', you say, 'doesn't that bucket need to exist in Riak already? How do we know which bucket to request?' Buckets are virtual namespaces as we mentioned above; they have no schema, no manifest other than any properties you set on them (see below) and so you don't need to explicitly create them. In fact, the above code doesn't even talk to Riak! Generally, we pick bucket names that have meaning to our application, or are chosen for us automatically by another framework like Ripple or Risky.
If you are just starting out and don't know which buckets have data in them, you can use the 'list buckets' feature. Note that this will give you a warning about its usage with a backtrace. You shouldn't run this operation in your application.
Looks like we don't have any buckets stored. Why? We haven't stored any data yet! Riak gets the list of buckets by examining all the keys for unique bucket names.
You can also list the keys that are in the bucket to know what's there. Again, this is another operation that is for experimentation only and has horrible performance in production. Don't do it.
You can 'stream' keys through a block (where the block will be passed an Array of keys as the server sends them in chunks), which is slightly more efficient for large key lists, but we'll skip that for now. Check out the API docs for more information.
Earlier we alluded to bucket properties. If you want to grab the properties from a bucket, call the props method (which is also aliased to properties).
There are a lot of things in this Hash that we don't need to care about. The most commonly-used properties are detailed on the Riak wiki. Let's set the replication factor, n_val.
A number of the most common properties are exposed directly on the Bucket object, as shown above. Note that you can pass an incomplete Hash of properties, and only the properties that are part of the Hash will be changed.
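To make the partial-update behaviour concrete, here is a sketch in plain Ruby that models it with Hash#merge. It illustrates the semantics described above and is not the client's implementation; the property values are hypothetical.

```ruby
# Hypothetical current bucket properties, modelled as a plain Hash.
current_props = { 'n_val' => 3, 'allow_mult' => false, 'last_write_wins' => false }

# An incomplete Hash: only the properties we want to change.
changes = { 'n_val' => 5 }

# Merging leaves every property not named in `changes` untouched.
updated_props = current_props.merge(changes)
```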
The other bucket property we might care about is allow_mult, which allows your application to detect and resolve conflicting writes. It is also exposed directly:
Now let's fetch a key from our bucket:
Depending on which protocol and backend you chose when connecting, you'll get an exception:
This means that the object does not exist (if you rescue the Riak::FailedRequest exception, its not_found? method will return true, telling you that the error represents a missing key). If you want to avoid the error for a key you're not sure about, you can check for its existence explicitly:
If you don't care whether the key exists yet or not, but want to start working with the value so you can store it, use get_or_new:
This gives us a new Riak::RObject (http://rdoc.info/gems/riak-client/Riak/RObject) to work with, which is a container for the value. In Riak's terminology, the combination of bucket, key, metadata and value is called an 'object' -- please do not confuse this with Ruby's concept of an Object. All 'Riak objects' are wrapped by the Riak::RObject class. Since this is a new object, the client assumes we want to store Ruby data as JSON in Riak and so sets the content-type for us to 'application/json', which we can see in the inspect output. The default value of the object is nil. Let's set the data to something useful:
Now we can persist that object to Riak using the store method.
If we list the keys again, we can see that the key is now part of the bucket (this time we use the bucket accessor on the object instead of going from the client object):
Now let's fetch our object again:
Assuming we're done with the object, we can delete it:
Note: Deleting an RObject will freeze the object, making modifications to it impossible.
We mentioned before that every value in Riak also has metadata, and the Riak::RObject lets you manipulate it. The only one we've really seen so far is the content type metadata, so let's examine that more closely.
For the sake of interoperability and ease of working with your data, Riak requires every value to have a content-type. Let's look at our previous object's content type:
Under the covers, the Ruby client will automatically convert that to and from JSON when storing and retrieving the value. If we wanted to serialize our Ruby data as a different type, we can just change the content-type:
Now our object will be serialized to YAML. The Ruby client automatically supports JSON, YAML, Marshal, and plain-text serialization. (If you want to add your own, check out the Serializers guide.)
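To see what these formats do to the same piece of Ruby data, here is a sketch using only Ruby's standard library. It does not use the client's serializer machinery, and the sample data is hypothetical; it only shows that each format round-trips a simple Hash.

```ruby
require 'json'
require 'yaml'

# Hypothetical Ruby data we might assign to an object's data.
data = { 'title' => 'Key-Value Operations', 'tags' => %w[riak ruby] }

# The same data serialized three ways.
serialized = {
  json:    JSON.generate(data),
  yaml:    YAML.dump(data),
  marshal: Marshal.dump(data)
}

# Each format round-trips back to an equal Ruby Hash.
from_json    = JSON.parse(serialized[:json])
from_yaml    = YAML.load(serialized[:yaml])
from_marshal = Marshal.load(serialized[:marshal])
```

JSON and YAML are readable by other languages, while Marshal output is Ruby-specific, which is worth weighing if other systems read the same keys.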
But what if the data we want to store is not a Ruby data type, but some binary chunk of information that comes from another system? Not to worry, you can bypass serializers altogether using the raw_data accessors. Let's say I want to store a PNG image that I have on my desktop. I could do it like so:
When the client doesn't know how to deserialize the content type, it will simply display the byte size on inspection. Now here's a fun part: since I just stored an image, I can open it with my browser:
You can also specify a bunch of free-form metadata on an RObject using the meta accessor, which is simply a Hash. For example, if we wanted to credit the PNG image we stored above to a specific person, we could add that and it would not affect the value of the object:
Now the next time we fetch the object, we'll get back that metadata too:
The values come back as Arrays because HTTP allows multiple values per header, and user metadata is sent as HTTP headers.
The Vector clock is Riak's means of internal accounting; that is, tracking different versions of your data and automatically updating them where appropriate. You don't usually need to worry about the vector clock, but it is accessible on the RObject as well:
That vector clock will automatically be threaded through any operations you perform directly on the RObject (like store, delete, and reload) so that you don't have to worry about it.
Especially if you're using the HTTP interface, the last_modified and etag are useful. When reloading your object, they will be used to prevent full fetches when the object hasn't changed in Riak. They can also be used as a form of optimistic concurrency control (with very weak guarantees, mind you) by setting the prevent_stale_writes flag:
Riak prevented the stale write by sending a 412 Precondition Failed response over HTTP.
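The conditional-write idea can be sketched without a Riak node. The following is a toy in-memory model, not the client's code: TinyStore, StaleWrite and the digest-based ETag are hypothetical names invented for this illustration. A write carries the ETag the writer last saw and is rejected if the stored value has changed since.

```ruby
require 'digest/md5'

# Raised when a write is based on an outdated copy (stands in for HTTP 412).
class StaleWrite < StandardError; end

# Toy single-value store that tags every value with an ETag.
class TinyStore
  attr_reader :value, :etag

  def initialize(value)
    assign(value)
  end

  # Store new_value only if the caller's ETag still matches the current one.
  def put(new_value, if_match:)
    raise StaleWrite, '412 Precondition Failed' unless if_match == etag

    assign(new_value)
  end

  private

  # Replace the value and recompute its ETag from the content.
  def assign(value)
    @value = value
    @etag  = Digest::MD5.hexdigest(value)
  end
end

store = TinyStore.new('v1')
alice_etag = store.etag   # Alice reads the value
bob_etag   = store.etag   # Bob reads the same version

store.put('v2 by Alice', if_match: alice_etag)   # accepted: ETag still matches

begin
  store.put('v2 by Bob', if_match: bob_etag)     # Bob's copy is now stale
  bob_rejected = false
rescue StaleWrite
  bob_rejected = true
end
```

This model runs in a single process, so it cannot show why the guarantees are weak in a real cluster; it only illustrates the compare-then-write check.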
You can also access the Secondary Indexes and Links directly from the RObject, but we won't cover those here.
Congratulations, you finished the 'Key-Value Operations' guide! After this guide, you can go beyond into more advanced querying methods, or take advantage of extended features of the client. Secondary Indexes are a very popular feature, and as all good Rubyists have thorough test suites, the Test Server is also a good next step.