The Ultimate Guide to Traffic in Valhalla – Part 1: Historical Traffic
Valhalla is one one of the few Open Source Routing engines with first class support for historical and live traffic, as well as incidents. The key difficulty with supporting these, is that there is no unified standard in how this information is provided: every vendor offers products differently and so Valhalla’s approach is to facilitate this functionality in a way that is vendor independent: no matter where your data comes from and how it looks like, you should be able to feed it into Valhalla given the interfaces provided. Using these interfaces correctly though requires some knowledge around some Valhalla lingo, especially around its tiled structure and edge and tile identifiers. This can make the developer experience quite tedious. That’s why the public repo has been bombarded with questions about traffic ingestion, and that’s why, after working with many organisations on these topics, I’ve decided to write down some of my experience in a series of articles. There are some small examples and code snippets scattered around GitHub, but I feel like a solid walk through is in order to try to explain the steps necessary to get any traffic data into Valhalla.
Types of Traffic Information
Generally, Valhalla supports two types of traffic information that has different impact on path finding: historical traffic (referred to as “predicted” in Valhalla lingo), and live traffic. Historical traffic describes typical speeds for a given edge at different times of the day/week, while live traffic always represents the traffic condition per edge “right now”.
Digression #1: Tiles and Identifiers
Valhalla stores all of its data in hierarchical tiles, commonly referred to as graph tiles. Think of them as tiles in tiled pyramids in web mapping. The Valhalla documentation goes into more detail on this here and here, but here are the most important things you need to know when thinking about traffic:
- a graph tile is represented as a single file
- a graph tile contains all nodes and edges whose (start) coordinates lie inside the tile boundary
- the relative path to a file inside the tile directory contains its identifier
Valhalla uses the same identifier system for tiles, nodes and edges. This way, given an edge or node ID, we know exactly where in the tiled graph we have to look for it. Lets look at an example:
00000000 00000000 00000000 01100011 11000010 10010010
--- 3 bits: level
- -------- -------- ----- 22 bits: tile
------ -------- ------- 21 bits: local
Here we have a 48 bit representation of something called a GraphId
, which identifies the hierarchical level, the tile inside that level and the entity inside that tile (i.e. a node or an edge; this part is 0 for graph ids identifying a tile). In practice, these ids are stored as 64-bit integers, so those contain 16 + 2 spare bits.
The example id points to my hometown’s tile with the id 817234 on level 2. This is represented as the following file path inside the tile directory: 2/000/817/234.gph
. The 0 padded directory beneath the level directory exists because on level 2, there is more than a million tile IDs, so the tile identifier can be a 7-digit number.
Inside the tiles, edges and nodes are stored in contiguous blocks, and their position inside those blocks are their local identifier within the tile. So given the full Graph ID 2/817234/1423
(denoted as <level>/<tile>/<local>
), we know the edge will be at the 1423rd position in tile 817234 on level 2.
All this is very practical when trying to find an edge or a node given a full Graph ID: Valhalla immediately knows where to look for it! There are more advantages to this system, but lets not lose our focus.
Part 1: Historical Traffic
or “predicted”, in Valhalla
Now that we’ve got the difficult parts out of the way, lets look at the easiest type of traffic, at least from the end user perspective.
Valhalla supports two types of historical traffic: freeflow/constrained and predicted. The first one is simple: they are simply two integers denoting the typical speed at an edge outside/during rush hour (in kilometers per hour). By default, Valhalla applies the constrained speed during 7AM - 7PM local time, otherwise it applies the freeflow speed. This is quite limited, which is why you can feed more detailed speed information for a whole week in 5 minute intervals. Storing 2016 8-bit integers for millions of edges is not an option though, which is why Valhalla expects them as a Base64 encoded string of DCT-II compressed coefficients. Discrete Cosine Transformation is in itself a very interesting topic (it’s used for example in JPEG and many other lossy image formats), but I won’t bore you with any more details (for anyone interested, Mike Pound from the University of Nottingham does a tremendous job explaining it in more detail here). The good news is: if you’re using C++, you don’t have to worry about the details, you can just use Valhalla’s implementation of the encoder. If you’re using another language, you can either look for a third party library, or simply port the code over from C++, it’s no wizardry.
The input Valhalla expects is a directory containing CSV files, one file for each graph tile. The relative path to each tile in the directory must contain its identifier, analogous to the tile path inside the graph tiles directory:
$ tree tiles
tiles
├── 0
│ └── 002
│ ├── 100.csv
│ └── 101.csv
├── 1
│ ├── 033
│ │ ├── 605.csv
│ │ ├── 962.csv
│ │ ├── 963.csv
│ │ ├── 964.csv
│ │ └── 965.csv
│ └── 034
│ ├── 322.csv
│ ├── 323.csv
│ ├── 324.csv
│ └── 325.csv
├── 2
│ └── 000
│ ├── 530
│ │ ├── 415.csv
│ │ └── 416.csv
....
Inside the CSV, each row represents speed information for a single edge (freeflow and constrained in kilometers per hour):
$ head /2/000/817/234.csv
edge_id,freeflow_speed,constrained_speed,historical_speeds
2/817234/1432,76,40,AAD/9gAU/+IAKP/OADz...
2/817234/1433,60,39,AAD/9gAU/+IAKP/OADz...
Matching to GraphIds
Now you might wonder how to match the traffic data to the right edge. And here is where the solution will differ based on your vendor. Most likely, you have traffic data per OSM way ID and want to map the OSM way ID to the edges Valhalla produced for it. If you’re using TomTom’s Orbis or any other product that directly comes as an OSM file containing the historical speed data, the answer is relatively simple: For each edge, Valhalla store its corresponding OSM way ID. You can obtain the mapping with valhalla_ways_to_edges
, which will write it to disk as a CSV, each row containing the OSM ID and a list of edge’s IDs and their direction relative to the way’s direction. Mind that here, instead of the string representation of GraphId, it will simply spit out edge IDs as full 64-bit integers, so 2/817234/24056
becomes 807191954066
, etc.
Digression #2: Shortcuts
Shortcuts are living proof that even things that aren’t real can harm you. On top of your regular, run-off-the-mill edges that correspond to the ways contained in the OSM data you fed into the tile build, Valhalla has “virtual” edges referred to as shortcuts: they contract multiple edges into one with the purpose to speed up graph traversal during path finding. They are completely hidden from the end user, because once a path is found, the shortcut edges it uses are turned back into the “real” edges they make up.
Since they are vital in path finding (graph traversal on longer routes will make great use of them), their speed information greatly influences the final routes. With regard to historical traffic, you have a couple of options in dealing with shortcut edges:
- Don’t. But be aware of the consequences: while the final route costing will be recalculated given only its “real” edges, the path itself might not represent the optimal path had historical speed information been considered. But then answer this question: why are you jumping through hoops to ingest this data into Valhalla in the first place?
- Deactivate shortcuts entirely: you can tell Valhalla to skip shortcut creation during the graph build. But keep in mind that even though they aren’t real, they do serve a real purpose, which is improving runtime performance. So expect heavy performance degradation.
- Bite the bullet and use the C++ library to figure out which edges make up which shortcuts, and compute averages for shortcuts yourself.
I heavily recommend option number 3 to get the most out of your (probably expensive?) historical traffic data. The reason Valhalla doesn’t do this for you is that computing averages from lossily compressed speeds is… well… lossy! It’s best to do this on the source speeds before compressing them. Also, a little bit of advertisement is in place here: I am the author of Valhalla Orbis Tools, a set of tools that help you make the most from TomTom’s Orbis data. If you’re interested, feel free to reach out at chrstn@bwnkl.de.
Putting It All Together
Once you have produced your final directory containing the CSV tiles, you can add the data to the graph by calling valhalla_add_predicted_speeds
. This would usually happen after the tile build has finished, but, importantly, before building the tarred tile extract. You can confirm that everything is working by starting valhalla_service
and calling /locate
on it with "verbose": true
. Per edge, it should give you a 2016 array of predicted speeds as well as the freeflow and constrained speeds. Huzzah!