Design of Zone Tariff Systems in Public Transportation

Given a public transportation system represented by its stops and direct connections between stops, we present OR models for two problems dealing with the prices for the customers. In the first, the fare problem, subsets of stops are already aggregated to zones and “good” tariffs have to be found in the existing zone system. Closed-form solutions for the fare problem are presented for three objective functions. The second problem, the zone problem, includes the design of the zones. We study this problem for one of the objectives as an example. It is NP-hard, and we therefore propose three heuristics, which prove to be very successful in the redesign of one of Germany’s transportation systems.


Tariff Systems in Public Transportation
In this paper, we deal with the design of tariff systems in public transportation. This complex real-world problem was brought to our attention by a regional public transportation company several years ago. While working on the design of a fair tariff system we developed a mathematical theory and a visualization tool to evaluate the effects of tariff changes. We present our studies and report on our practical experience when designing zone tariff systems.
When using a bus or a train a passenger usually pays for a trip. There are several possibilities for defining ticket prices in public transportation.
• In a distance tariff system, the trip price is dependent on its length. The longer the trip, the higher the fare. This system is generally considered to be fair. To determine the ticket prices one needs the distance between a pair of stations. This makes a distance tariff inconvenient for the public transportation company and for the customers.
• The simplest tariff system is the unit tariff. In this case all trips cost the same, independent of their length. A unit tariff is very easy to handle, but the public often does not accept that a short trip between two neighboring stations costs the same as a long trip through the whole system.
• A model between these two tariff systems is a zone tariff system. To establish a zone tariff, the area is divided into subregions (the tariff zones). The price for a trip in a zone tariff system is dependent only on the trip's starting and ending zones. If the price can be chosen arbitrarily for each pair of zones, we call the tariff system a zone tariff with arbitrary prices. An example of such a tariff system can, for instance, be found north of San Francisco (Figure 1). The prices are given in the form of a matrix (Table 1).
The most popular variant of a zone tariff system is the counting zone tariff system. To know his fare in this system a customer counts how many zones his trip will pass and reads off the price assigned to the number of crossed zones. The prices in this system are dependent on the starting and the ending zones of the trip, but trips passing the same number of zones must have the same price. Figure 2 shows a counting zone tariff system south of San Francisco; the corresponding price for a single one-way trip is in Table 2.
Because of their simplicity, zone tariff systems are very popular. In Germany, nearly all tariff associations already have or are introducing zone tariff systems. When a public transportation company wants to change its tariff system to a zone tariff system, it has to design the zones and establish new fares, such that the resulting tariff system is accepted by the customers and does not decrease the income of the company. The goal often is to design the zones in such a way that the new and old prices for most of the trips are as close as possible. This means that neither the public transportation company nor the customers will have major disadvantages when changing the current tariff system to a zone tariff system.
Another goal can be to design fair zones. In this case we do not consider the deviation from old prices, but the deviation from a reference price, that is, one that is considered to be fair such as the distance tariff. In this approach, the public transportation company needs to estimate its new income.
In spite of the importance of this zone design problem, there has been limited literature on corresponding operations research (OR) models. The only literature we are aware of deals with the zone design problem with arbitrary prices (Hamacher and Schöbel 1995; Schöbel 1994a, 1996; Babel and Kellerer 2001). For the zone design problem in which we count the number of zones, to the best of our knowledge, there is no literature dealing with suitable OR models.
Source. http://www.transitinfo.org.
In this paper, we present an optimization model for the latter problem. The remainder of this paper is organized as follows. In the next section, we present our model for the zone design problem with counting zones. In §3, we consider the fare problem and show how the fares for each number of passed zones can be calculated easily by closed-form formulas for three different objectives. In §4, we take one of these objectives to study the zone problem. We discuss complexity issues, develop bounds, and propose three solution algorithms for designing good zones. We discuss their numerical behavior in a real-world example in §5. Finally, we draw some conclusions in §6.

A Model for the Counting Zone Tariff
Let the station graph $G = (V, E)$ of the public transportation company be given, where $V$ refers to the set of stops and $E \subseteq V \times V$ represents the available direct connections without intermediate stops. Furthermore, let $d_{ij}$ be a reference price for traveling from station $i \in V$ to station $j \in V$. Here $d_{ij}$ can be the current ticket price of the public transportation company, or it can be a fair price such as a distance tariff.
If $L$ denotes the number of planned zones, the zone (planning) problem identifies a partition $P = \{V_1, \dots, V_L\}$ of $V$. In the fare (planning) problem, ticket prices $c(p)$, $p = 0, 1, 2, \dots$, are determined that depend only on the number of zones $p$ in the journey. Here, $c(p)$ is the price for crossing $p$ zone borders. In particular, $c(0)$ gives the fare for traveling within any zone, $c(1)$ is the price for crossing one zone border, i.e., for going from one zone to an adjacent one, and so on.
To evaluate some partition $P$ with a price vector $c$, we define for each pair of stations $i, j \in V$ the number $n_{ij}$ of crossed zone borders when traveling from station $i$ to station $j$. (Adding to the confusion, most public transportation companies count the number $\tilde{n}_{ij}$ of passed zones on the trip from station $i$ to station $j$, including both the starting and the ending zones, i.e., $\tilde{n}_{ij} = n_{ij} + 1$. We prefer our notation for the simplicity of our model.) The new ticket price for traveling from $i$ to $j$ is then given by
$$z_{ij} = c(n_{ij}).$$
Given the reference prices $d_{ij}$ for a trip between stations $i$ and $j$, the absolute deviation in ticket price is calculated by
$$|d_{ij} - z_{ij}|.$$
Let $w_{ij}$ be the number of customers traveling from station $i$ to station $j$ and let $W = \sum_{i,j \in V} w_{ij}$ be the sum of all customers of the public transportation company. The minimization of the following three objective functions is of interest:
• Maximum absolute deviation: $b_{\max} = \max_{i,j \in V} w_{ij} \, |d_{ij} - z_{ij}|$
• Average absolute deviation: $b_1 = \frac{1}{W} \sum_{i,j \in V} w_{ij} \, |d_{ij} - z_{ij}|$
• Average squared deviation: $b_2 = \frac{1}{W} \sum_{i,j \in V} w_{ij} \, (d_{ij} - z_{ij})^2$
All three objectives are considered to be good models by practitioners. The first objective function, $b_{\max}$ with identical weights, models the fact that the greatest deviation of ticket prices in the two different tariffs should be as small as possible. It gives a bound for the largest change in the ticket price for any customer. In the weighted case, $b_{\max}$ minimizes the maximum deviation in the revenue of the company over all possible trips. $b_1$ gives the average of all absolute deviations, and $b_2$ gives the average of all squared deviations in ticket prices. The objective function $b_2$ leads to a smaller percentage of strongly affected customers than $b_1$. Nevertheless, from our experience, $b_1$ is slightly better accepted by the practitioners than is $b_2$. It also should be mentioned that price increases and decreases are treated equally, such that the model reflects both the interests of customers and of transportation companies.
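The three objectives can be evaluated directly once reference prices, new prices, and customer counts are available. The following is a minimal sketch, assuming the weighted forms $\max w_{ij}|d_{ij} - z_{ij}|$, $\frac{1}{W}\sum w_{ij}|d_{ij} - z_{ij}|$, and $\frac{1}{W}\sum w_{ij}(d_{ij} - z_{ij})^2$; the function name `objectives` and the sample data are hypothetical and not taken from the paper.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]  # (origin station, destination station)


def objectives(d: Dict[Pair, float], z: Dict[Pair, float],
               w: Dict[Pair, float]) -> Tuple[float, float, float]:
    """Return (b_max, b_1, b_2) for reference prices d, new prices z, weights w."""
    total = sum(w[m] for m in d)                # W, the total number of customers
    dev = {m: abs(d[m] - z[m]) for m in d}      # absolute deviation per trip
    b_max = max(w[m] * dev[m] for m in d)       # maximum (weighted) deviation
    b_1 = sum(w[m] * dev[m] for m in d) / total         # average absolute deviation
    b_2 = sum(w[m] * dev[m] ** 2 for m in d) / total    # average squared deviation
    return b_max, b_1, b_2


# Hypothetical data: three trips, one customer each, a flat new price of 1.5.
d = {(1, 2): 1.0, (1, 3): 2.0, (2, 3): 1.0}
z = {m: 1.5 for m in d}
w = {m: 1.0 for m in d}
print(objectives(d, z, w))
```

Keeping the trip keys as station pairs makes it easy to restrict the evaluation to a subset of origin-destination pairs, for example those of a single company.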
We call two zones $V_k$, $V_l$ adjacent if there exist stops $i \in V_k$, $j \in V_l$ such that $(i, j) \in E$, i.e., with a direct connection in the station graph $G$. To obtain the numbers $n_{ij}$, a shortest path algorithm (e.g., Floyd 1962, Warshall 1962) can be used according to one of the following models.
Station Graph Model. We use the station graph $G = (V, E)$, but introduce new weights $u_{ij}$ for all $(i, j) \in E$, defined by
$$u_{ij} = \begin{cases} 0 & \text{if } i \text{ and } j \text{ are in the same zone,} \\ 1 & \text{if } i \text{ and } j \text{ are in adjacent zones.} \end{cases}$$
The length of a shortest path between two stops equals the minimum number of crossed zone borders. This approach will be needed later to update the zone distances in the greedy heuristic in §4.
Zone Graph Model. To reduce the size of the network we define the zone graph $G' = (P, E')$ whose node set $P$ is given by the zones, and $(V_k, V_l) \in E'$ if $V_k$ and $V_l$ are adjacent. All edges have weight 1. For $i \in V_k$ and $j \in V_l$ we get the minimum number of crossed zone borders $n_{ij}$ on a trip from $i$ to $j$ as the length of a shortest path from $V_k$ to $V_l$ in $G'$.
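The station graph model can be sketched as an all-pairs shortest path computation with 0/1 edge weights. The code below is an illustration only: the function name `crossed_borders`, the treatment of connections as undirected, and the five-station path network are assumptions made for the example.

```python
from itertools import product


def crossed_borders(stations, edges, zone_of):
    """Return n[i, j], the minimum number of zone borders crossed from i to j.

    stations: list of station labels
    edges:    direct connections, treated as undirected (an assumption)
    zone_of:  dict mapping each station to its zone label
    """
    inf = float("inf")
    n = {(i, j): (0 if i == j else inf) for i, j in product(stations, repeat=2)}
    for i, j in edges:
        # Weight 0 inside a zone, weight 1 between adjacent zones.
        n[i, j] = n[j, i] = 0 if zone_of[i] == zone_of[j] else 1
    # Floyd-Warshall: product() varies its first component slowest,
    # so k acts as the outer loop over intermediate stations.
    for k, i, j in product(stations, repeat=3):
        if n[i, k] + n[k, j] < n[i, j]:
            n[i, j] = n[i, k] + n[k, j]
    return n


# Hypothetical network: five stations on a line, zones {1, 2}, {3, 4}, {5}.
stations = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
zone_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
n = crossed_borders(stations, edges, zone_of)
print(n[1, 2], n[2, 3], n[1, 5])
```

For large networks the zone graph model is cheaper, since the same computation then runs on one node per zone instead of one node per station.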
The following example demonstrates the calculation of $b_{\max}$, $b_1$, and $b_2$. Let a station graph $G$ with a partition into three zones $V_1 = \{1, 2\}$, $V_2 = \{3, 4\}$, and $V_3 = \{5\}$ be given (see Figure 3).
Figure 3. A station network with five stations and three zones.
Suppose that $w_{ij} = 1$ for all $i, j \in V$ with $i \neq j$, i.e., $W = 20$. If we assume that the distance between any adjacent pair of nodes is 1, the matrix $(d_{ij})$ according to the distance tariff system can be […] The corresponding zone graph $G'$ consists of three nodes (see Figure 4). The number of crossed zone borders between stations $i$ and $j$ is then given by […] Suppose the new fares for crossing $p = 0$, 1, or 2 zone borders are given by […] The new ticket prices can be calculated as […] The deviations between the reference prices $d_{ij}$ and the new ticket prices $z_{ij}$ are […] and the objective values can be calculated as […]

Solution of the Fare Problem with Fixed Zones
In this section, we solve the fare problem with respect to a given zone partition. Our first result shows that a closed-form solution is possible for each of the three objectives $b_{\max}$, $b_1$, and $b_2$ introduced in §2.
Theorem 1. Let $P = \{V_1, \dots, V_L\}$ be a given zone partition and let $d_{ij}$ be given reference prices. To minimize $b_{\max}$, $b_1$, and $b_2$ we choose for all $p = 0, 1, \dots, L$ […] where $z^*(p)$ is defined as […]
Proof. Given the zone partition $P$ we have to find fares $c(p) \in \mathbb{R}$ for all $p = 0, 1, \dots$, minimizing $b_{\max}$, $b_1$, and $b_2$, respectively. Define $M_p = \{(i, j) : i, j \in V \text{ and } n_{ij} = p\}$ and $W_p = \sum_{m \in M_p} w_m$ as the sum of all weights belonging to pairs of stations in the set $M_p$. First, we note that each of the three objective functions can be separated into at most $L + 1$ independent subproblems, $K_{\max}(p)$, $K_1(p)$, and $K_2(p)$:
$$b_{\max} = \max_{p} K_{\max}(p), \qquad K_{\max}(p) = \max_{m \in M_p} w_m \, |d_m - c(p)|,$$
$$b_1 = \frac{1}{W} \sum_{p} K_1(p), \qquad K_1(p) = \sum_{m \in M_p} w_m \, |d_m - c(p)|,$$
$$b_2 = \frac{1}{W} \sum_{p} K_2(p), \qquad K_2(p) = \sum_{m \in M_p} w_m \, (d_m - c(p))^2.$$
Consequently, to minimize $b_{\max}$, $b_1$, and $b_2$ we determine the optimal fare $c(p)$ for $p = 0, 1, \dots, L$ separately, in each of the three objective functions.
• For $b_{\max}$: For all $p = 0, 1, \dots, L$, the problem of minimizing $K_{\max}(p)$ is well known from location theory when locating a point on a line such that the maximum distance to a given set of existing facilities on the same line is minimized. The proof for (1) can therefore be found in the location literature; see, e.g., Love et al. (1988), Hamacher (1995).
• For $b_1$: Note that $K_1(p)$ is a one-dimensional, piecewise linear, and convex function; its minimization is known in statistics (see, e.g., Hays 1981) and in location theory as the one-dimensional median problem (see, e.g., Hamacher 1995, Plastria 1995).
To demonstrate the result of Theorem 1 we continue the example of §2. The optimal values for the zone prices and the resulting values for the objective functions $b_{\max}$, $b_1$, and $b_2$ are listed in Table 3.
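The fare computation can be sketched per border count $p$ as three one-dimensional location problems: a weighted center for $b_{\max}$, a weighted median for $b_1$, and a weighted mean for $b_2$. The sketch below is not the paper's own formula listing. The weighted center uses the standard pairwise formula from location theory and assumes strictly positive weights; all function names and the sample data are hypothetical.

```python
from itertools import combinations


def weighted_center(d, w):
    """Point c minimizing max_m w[m] * |d[m] - c| (positive weights assumed)."""
    best_value, best_c = -1.0, d[0]
    for a, b in combinations(range(len(d)), 2):
        value = w[a] * w[b] * abs(d[a] - d[b]) / (w[a] + w[b])
        if value > best_value:  # the critical pair determines the center
            best_value = value
            best_c = (w[a] * d[a] + w[b] * d[b]) / (w[a] + w[b])
    return best_c


def weighted_median(d, w):
    """Smallest price at which the cumulated weight reaches half the total."""
    half, acc = sum(w) / 2, 0.0
    for price, weight in sorted(zip(d, w)):
        acc += weight
        if acc >= half:
            return price


def weighted_mean(d, w):
    return sum(wi * di for di, wi in zip(d, w)) / sum(w)


def optimal_fares(d, n, w, solver):
    """Group trips by border count n[m] and solve each group separately."""
    groups = {}
    for m in d:
        groups.setdefault(n[m], []).append(m)
    return {p: solver([d[m] for m in ms], [w[m] for m in ms])
            for p, ms in groups.items()}


# Hypothetical data: one trip inside a zone, two trips crossing one border.
d = {(1, 2): 1.0, (1, 3): 2.0, (2, 3): 4.0}
n = {(1, 2): 0, (1, 3): 1, (2, 3): 1}
w = {m: 1.0 for m in d}
for solver in (weighted_center, weighted_median, weighted_mean):
    print(solver.__name__, optimal_fares(d, n, w, solver))
```

Because the subproblems are independent, politically fixed fares for some values of $p$ can simply be substituted into the result without affecting the others.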

Corollary 1. Given a zone partition $P = \{V_1, \dots, V_L\}$ and reference prices $d_{ij}$, the optimal values of the objective functions are given as follows: […] where Var denotes the variance of the set.
In practice, restrictions are often given on the new fares; sometimes there are even politically desired fares that have to be realized for the number of zones in the journey. With the help of Corollary 1, one can easily calculate the increase of the objective functions when using such given fares instead of the optimal ones. In particular, Corollary 1 shows that for the objective function $b_{\max}$ the optimal fares $c^*_{\max}(p)$ are not needed to calculate the optimal objective value for a given zone partition. This will be needed in the next section when we optimize the zone partition with respect to $b_{\max}$. If, additionally, $b_{\max}$ is used in the unweighted case, i.e., with $w_{ij} = 1$ for all $i, j \in V$, we can further simplify Theorem 1 and Corollary 1.
Corollary 2. In the case of equal weights, the optimal fares $c^*_{\max}(p)$ and the corresponding objective value $b_{\max}$ are given by
$$c^*_{\max}(p) = \frac{1}{2} \Bigl( \max_{m \in M_p} d_m + \min_{m \in M_p} d_m \Bigr), \qquad b_{\max} = \max_{p} \frac{1}{2} \Bigl( \max_{m \in M_p} d_m - \min_{m \in M_p} d_m \Bigr).$$
Proof. We calculate $z^*(p)$ as […] and consequently, […] Using Corollary 1 and $z^*(p) = K_{\max}(p)$, the remaining parts follow immediately. Q.E.D.
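In the unweighted case the value of $b_{\max}$ for a given partition can be computed without determining any fares, since the best single price for a group of trips sits midway between the cheapest and the most expensive reference price in that group. The following sketch assumes this midrange form; the name `unweighted_b_max` and the data are hypothetical.

```python
def unweighted_b_max(d, n):
    """Half of the largest reference-price spread among trips with equal n[m]."""
    lo, hi = {}, {}
    for m, price in d.items():
        p = n[m]                                 # number of crossed zone borders
        lo[p] = min(lo.get(p, price), price)     # cheapest reference price for p
        hi[p] = max(hi.get(p, price), price)     # most expensive reference price for p
    return max((hi[p] - lo[p]) / 2 for p in lo)


# Hypothetical data: the two one-border trips have reference prices 2 and 4.
d = {(1, 2): 1.0, (1, 3): 2.0, (2, 3): 4.0}
n = {(1, 2): 0, (1, 3): 1, (2, 3): 1}
print(unweighted_b_max(d, n))
```

A single pass over the trips suffices, which matters when this value has to be recomputed for many candidate partitions.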
We remark that similar results can be derived for the zone design problem with arbitrary prices (see Hamacher and Schöbel 1995, Schöbel 1994b).

Finding Zone Partitions for the Maximum Deviation Problem
The consequence of the results of §3 is that we can concentrate on finding the zones, because the zone pricing follows easily from the choice of the objective function. We now focus our attention on the maximum deviation problem.
Unfortunately, this problem is NP-hard and therefore difficult to solve. A first observation deals with the monotonicity of the objective function dependent on the number of planned zones $L$. Whereas it is easy to see that for the zone design problem with arbitrary prices all three objectives are monotone in $L$, this is not true for the zone design problem with counting zones, as Figure 5 shows. The station network consists of eight nodes, and we assume that $w_{ij} = 1$ for all pairs of nodes $i, j$. The reference prices are given as weights between any two adjacent nodes, as shown in the figure. Between any other pair of nodes the reference prices are given as the sum of the weights along a shortest path connecting the nodes. For the (unweighted) maximum absolute deviation problem, Corollary 2 shows that any solution with $L = 5$ zones leads to a strictly higher objective value than the graphed solution with $L = 4$ and $b_{\max} = 1$. We will therefore fix $L$ in the following.
Theorem 2. The zone design problem with counting zones and objective function $b_{\max}$ is NP-hard for all fixed $L \geq 3$.
The proof of Theorem 2 is given in the appendix. Note also that the zone design problem with arbitrary prices is NP-hard (Babel and Kellerer 2001).
To motivate the heuristics of this section, we first present the following two observations for getting upper and lower bounds on the objective value b max .
Lemma 1. […]
Proof. For any zone partition $P$ and any integer $p$, we have by (5) that […] Hence, $b_{\max} = \max_{p = 1, \dots, L} K_{\max}(p)$ also satisfies this inequality. Q.E.D.
Lemma 2. Given a zone partition $P$, let $\mathrm{INT} \subseteq E$ be the set of edges with both end nodes within the same zone and $\mathrm{BET} = E \setminus \mathrm{INT}$. Then, we have […]
Proof. […]
Lemma 2 suggests a zone design in which edges with high weights are collected in BET and edges with small weights in INT, or vice versa. To be more specific, let Diam be the maximal diameter over all zones. Assuming that edge weights along a path are additive, we get, again using (5), […] yielding that the maximal diameter Diam should be small, and consequently edges with large weights should be in BET while edges with small weights should be in INT.
Following these considerations, we present three heuristics for solving the zone design problem with counting zones. As input data we need, for any of the following algorithms, a set of $n$ stations with reference prices $d_{ij}$ and a number $L$ of planned zones. The output is then given by a zone partition with $L$ zones.

Algorithms Based on Clustering Theory
The first algorithm is based on ideas from clustering theory, and in particular on the sequential agglomerative hierarchical nonoverlapping (SAHN) algorithms (see, e.g., Duran and Odell 1974). The idea is to start with $n$ zones, each of which contains a single station, and to combine in each step the two closest zones into a new one. Depending on the particular definition of the distance between two zones, different algorithms can be obtained. Two of them have been applied to the zone design problem: single linkage and complete linkage.
Algorithm 1.
Step 1. Start with a partition $P$ consisting of $n$ zones, each of which contains a single station. Let $d(V_i, V_j) = d_{ij}$ for all zones $V_i, V_j \in P$.
Step 2. Determine two zones $V_i, V_j \in P$ with minimal distance $d(V_i, V_j)$.
Step 3. Join $V_i$ and $V_j$ to a new zone $V_k$ and get a new partition $P$.
Step 4. Calculate the new distances for all $V \in P$:
$$d(V_k, V) = \tfrac{1}{2} d(V_i, V) + \tfrac{1}{2} d(V_j, V) + \tfrac{c}{2} \, |d(V_i, V) - d(V_j, V)|.$$
Step 5. If the number of planned zones is attained, then stop and output $P$; else go to Step 2.
The parameter $c$ in Step 4 determines the formula for calculating the distance between two zones. In the context of the zone design problem, we have used
• $c = -1$ for the single linkage algorithm, and
• $c = 1$ for the complete linkage algorithm.
The interpretation for single linkage is the following: The distance between two zones is defined as the smallest distance between elements of the zones; consequently in each step we join along a shortest edge.Note that in complete linkage the distance between two zones is defined as the maximum distance between their elements, and in each step complete linkage tries to minimize the maximum diameter of the zones.
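The clustering steps above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the distance update is written as $\tfrac12 a + \tfrac12 b + \tfrac{c}{2}|a - b|$, which equals $\min(a, b)$ for $c = -1$ and $\max(a, b)$ for $c = 1$, and the function name `sahn_zones` and the sample prices are hypothetical.

```python
from itertools import combinations


def sahn_zones(stations, d, L, c):
    """Agglomerative zone design; c = -1 single linkage, c = 1 complete linkage.

    d maps (stations[a], stations[b]) with a < b to the reference price.
    """
    # Step 1: every station is its own zone.
    zones = [frozenset([s]) for s in stations]
    dist = {frozenset([zones[a], zones[b]]): d[stations[a], stations[b]]
            for a, b in combinations(range(len(stations)), 2)}
    while len(zones) > L:  # Step 5: stop once L zones remain
        # Step 2: find the two closest zones.
        vi, vj = min(combinations(zones, 2), key=lambda pair: dist[frozenset(pair)])
        # Step 3: join them.
        vk = vi | vj
        rest = [v for v in zones if v not in (vi, vj)]
        # Step 4: update the distances from the new zone to all other zones.
        for v in rest:
            a = dist.pop(frozenset([vi, v]))
            b = dist.pop(frozenset([vj, v]))
            dist[frozenset([vk, v])] = 0.5 * (a + b) + 0.5 * c * abs(a - b)
        del dist[frozenset([vi, vj])]
        zones = rest + [vk]
    return zones


# Hypothetical reference prices: stations at positions 0, 1, 5, 7 on a line.
stations = [1, 2, 3, 4]
d = {(1, 2): 1.0, (1, 3): 5.0, (1, 4): 7.0, (2, 3): 4.0, (2, 4): 6.0, (3, 4): 2.0}
print(sahn_zones(stations, d, L=2, c=-1))
print(sahn_zones(stations, d, L=2, c=1))
```

Note that this version ignores the station graph and clusters on reference prices alone, so zones are not guaranteed to be connected in $G$.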

Greedy Approach
This approach is a variant of the SAHN algorithms discussed above, but with more emphasis on the specific structure of the zone design problem. Using the basics of Algorithm 1, we calculate for all edges $(i, j)$ of the current zone graph the objective value $b^{ij}_{\max}$ obtained when contracting $(i, j)$. We then contract the edge with the smallest increase in the objective function. This is rather time consuming, but, as we will show in the next section, leads to very good results in practice. The formulation of the greedy approach is the following:
Algorithm 2 (Zone Design by Greedy Approach).
Step 1. For all edges $(i, j) \in E$ with $n_{ij} = 1$: Contract $i$ and $j$ temporarily and calculate $b^{ij}_{\max}$.
Step 2. Contract the edge $(i_0, j_0)$ permanently, where $b^{i_0 j_0}_{\max} = \min_{(i,j)} b^{ij}_{\max}$, and let $n_{i_0 j_0} = 0$. If the graph has $L$ nodes, stop.
Step 3. For all $i, j \in V$, recalculate $n_{ij}$ as shortest distance, and go to Step 1.
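A compact sketch of the greedy approach is given below. It is not the authors' implementation: it assumes the unweighted objective (so that $b_{\max}$ is half of the largest price spread among trips with the same border count), treats connections as undirected, and recomputes all $n_{ij}$ from scratch for every trial contraction. All names and the sample data are hypothetical.

```python
from itertools import product


def border_counts(stations, edges, zone_of):
    """n_ij via Floyd-Warshall on the station graph with 0/1 edge weights."""
    inf = float("inf")
    n = {(i, j): (0 if i == j else inf) for i, j in product(stations, repeat=2)}
    for i, j in edges:
        n[i, j] = n[j, i] = 0 if zone_of[i] == zone_of[j] else 1
    for k, i, j in product(stations, repeat=3):  # k varies slowest
        n[i, j] = min(n[i, j], n[i, k] + n[k, j])
    return n


def b_max_unweighted(d, n):
    """Half of the largest price spread among trips with equal border count."""
    lo, hi = {}, {}
    for m, price in d.items():
        lo[n[m]] = min(lo.get(n[m], price), price)
        hi[n[m]] = max(hi.get(n[m], price), price)
    return max((hi[p] - lo[p]) / 2 for p in lo)


def greedy_zones(stations, edges, d, L):
    """Repeatedly merge the pair of adjacent zones that yields the smallest b_max."""
    zone_of = {s: s for s in stations}  # start with one zone per station
    while len(set(zone_of.values())) > L:
        # Step 1: each pair of adjacent zones is a contraction candidate.
        candidates = sorted({tuple(sorted((zone_of[i], zone_of[j])))
                             for i, j in edges if zone_of[i] != zone_of[j]})
        if not candidates:
            break  # no adjacent zones left to merge
        best_value, best_trial = None, None
        for a, b in candidates:
            trial = {s: (a if z == b else z) for s, z in zone_of.items()}
            value = b_max_unweighted(d, border_counts(stations, edges, trial))
            if best_value is None or value < best_value:
                best_value, best_trial = value, trial
        # Step 2: keep the best contraction. Step 3 is implicit, because
        # border_counts is recomputed for every trial partition.
        zone_of = best_trial
    return zone_of


# Hypothetical data: four stations on a line at positions 0, 1, 5, 7.
stations = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4)]
d = {(1, 2): 1.0, (1, 3): 5.0, (1, 4): 7.0, (2, 3): 4.0, (2, 4): 6.0, (3, 4): 2.0}
print(greedy_zones(stations, edges, d, L=2))
```

The full recomputation per trial makes the cost per merge cubic in the number of stations times the number of candidate edges, which is in line with the long running times reported for this heuristic; an incremental update of $n_{ij}$ would be the natural refinement.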

Spanning Tree Approach
The idea of the following heuristic is to determine a set of edges BET that contains mostly edges with high weights.
Algorithm 3.
Step 1. Find a maximum spanning tree T in the complete graph with edge weights d ij .
Step 2. Omit the L − 1 largest edges of T and get a forest with L components.
Step 3. Output: Zones are the connected components.
Note that in trees, the spanning tree approach is equivalent to the single linkage algorithm of clustering theory. In general graphs, it is always possible to find a spanning tree such that omitting its $L - 1$ largest edges leads to the same result as single linkage. However, if we start with a spanning tree with maximal weight (which performed best in practice), the spanning tree approach differs significantly from single linkage.
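The three steps of the spanning tree approach can be sketched with Kruskal's algorithm and a union-find structure. This is an illustration only; the function name `spanning_tree_zones`, the `maximum` flag (added so that a minimum spanning tree can be tried for comparison with single linkage), and the sample prices are assumptions.

```python
def spanning_tree_zones(stations, d, L, maximum=True):
    """Zones as components of a spanning tree minus its L - 1 largest edges.

    d maps station pairs (i, j) to reference prices on the complete graph.
    maximum=True builds a maximum spanning tree, maximum=False a minimum one.
    """
    parent = {s: s for s in stations}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 1: Kruskal's algorithm on the complete graph.
    tree = []
    for (i, j), weight in sorted(d.items(), key=lambda kv: kv[1], reverse=maximum):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((weight, i, j))

    # Step 2: omit the L - 1 largest tree edges.
    tree.sort()
    kept = tree[:len(tree) - (L - 1)]

    # Step 3: the zones are the connected components of the remaining forest.
    parent = {s: s for s in stations}
    for _, i, j in kept:
        parent[find(i)] = find(j)
    zones = {}
    for s in stations:
        zones.setdefault(find(s), set()).add(s)
    return list(zones.values())


# Hypothetical reference prices: stations at positions 0, 1, 5, 7 on a line.
stations = [1, 2, 3, 4]
d = {(1, 2): 1.0, (1, 3): 5.0, (1, 4): 7.0, (2, 3): 4.0, (2, 4): 6.0, (3, 4): 2.0}
print(spanning_tree_zones(stations, d, L=2, maximum=False))
print(spanning_tree_zones(stations, d, L=2, maximum=True))
```

On this toy instance the two tree choices produce different partitions, which illustrates how strongly the result depends on the spanning tree that is selected in Step 1.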

Practical Experiences in Saarland, Germany
As an example of the practical value of our approach, we consider the situation in the state of Saarland, Germany. Currently, there are six public transportation companies operating in the Saarland, each with its own tariff system.
• Four public transportation companies already use a counting zone tariff system, but their fares for crossing p zones and the structure of their zones are completely different, although they are partly operating in the same geographical region.
• The Deutsche Bahn (German Rail) still applies its distance tariff.
• There is also a public transportation company (serving Saarland's capital city, Saarbrücken) that uses a zone tariff with variable prices.
The traffic association of the Saarland is considering the introduction of one common counting zone tariff system that would be applied by all the public transportation companies. The public transportation network in Saarland consists of roughly 4,000 stations, where a preclustering into 600 minizones is given. The goal is to design about 100 zones and install a counting zone tariff system in such a way that the differences between the current and the new fares are as small as possible. It is also important that the new income of each of the public transportation companies not differ too much from its current income. Consequently, the reference prices in this application are the current prices for traveling. While the current fare structure is known and therefore relatively easy to get, it is usually hard to get realistic data about the customers' behavior. In our project in the Saarland this was solved by using the income data of each of the transportation companies and dividing the income with the help of available statistics over the origin-destination pairs used by the customers.
We tested our algorithms on the data described above. The results of Algorithms 1, 2, and 3 are shown in Figure 6. This figure shows the objective value $b_{\max}$ for any number of possible zones from 1 to 600. All objective values refer to a single-trip ticket for an adult, given in German Marks (DM). On the one hand, it turns out that in this practical application the greedy heuristic (Algorithm 2) is the clear winner in terms of the objective value: It generated the best results for any number of desired zones. On the other hand, the running time for Algorithm 2 for all possible numbers of zones, i.e., for $L = 1, \dots, 600$, was nearly two weeks altogether in our first implementation (on an AixJ90). The spanning tree approach (Algorithm 3) and the single linkage algorithm (Algorithm 1) both needed only a few hours, but the results are much less convincing regarding the objective value $b_{\max}$. For a small number $L$ of desired zones, single linkage did better than the spanning tree approach, whereas for a higher number of planned zones it was the other way around. This is due to the fact that the spanning tree approach starts with only one zone, while single linkage starts with 600 zones.
The heuristics have also been tested on a subset consisting of only 400 stations (or 54 minizones). In this smaller setting the running times of Algorithms 1 and 3 were within seconds, and Algorithm 2 needed only two minutes to obtain again the clearly best results. The results for nine zones are shown graphically in Figures 7, 8, and 9. Figure 7 shows a suggestion for a zone partition that is due to the political districts in this area of the Saarland. The objective value for this zone partition is $b_{\max} = 5.15$ DM, i.e., there exists a customer who will have a difference of 5.15 DM between his current fare and the new one. The result of the single linkage algorithm for nine zones is shown in Figure 8. Single linkage tends to form one large zone with several smaller zones surrounding it (see, e.g., Duran and Odell 1974). This behavior is also shown in Figure 8. The objective value of the graphed zone partition is 5.00 DM. The objective value in the spanning tree approach also was 5.00 DM for nine zones, but without these big differences in the zone sizes. The best results, however, were obtained by the greedy approach with an objective value of only 3.75 DM. The corresponding zone partition is shown in Figure 9.
Figure 10. Zone planning with the software WabPlan in the state of Sachsen-Anhalt. The lines show the (fictional) customers who will have a change of more than 5% in their ticket prices.
For evaluating tariff zones in more detail, we use the software package WabPlan (Schöbel and Schöbel 1999). A graphical front-end provides a detailed analysis of all trips for which the fare will increase or decrease dramatically (see Figure 10). Furthermore, the expected income for each of the transportation companies in each ticket category is compared with its current income.
For practical purposes, many special rules for using fare zones are common. Several of these rules have also been implemented in our algorithms and tested on the Saarland data.
Empty Zones. First, in most zone tariff systems empty zones are used to increase the fare on some special trips without affecting all other relations. This seems to make sense in practice and can easily be incorporated in the algorithms presented in §4. In this way, given reference prices can be approximated arbitrarily closely if the number $L$ of zones is large enough. In our model this means that the optimal objective value goes to zero for $b_{\max}$, $b_1$, and $b_2$ in this case.
Border Stations. To avoid injustice, stations can be located on zone borders, meaning that they belong to more than one zone, and the cheapest choice for determining the fare applies. Because the zone tariff system should be clear and understandable, we tried to avoid this in the Saarland. In most cases it turned out that border stations can be avoided without losing anything in the objective values by changing only the zone design.
Special Rules for Large Zones. Also, some zones might be so large that they have to be counted twice when crossing them, in which case a special fare structure would have to be implemented.
In the Saarland, the tariff system proposed by using our methods is now in the implementation process.

Conclusion
In this paper, we have presented OR models for the counting zone tariff problem. For fixed zones we have shown that closed-form solutions can be provided. In contrast, the zone problem, i.e., the design of zones, is NP-hard. Three heuristics have been proposed and compared with respect to their numerical behavior. The practical usefulness of the approach is shown by its actual implementation in the state of Saarland, Germany. Other German states are currently using our system to evaluate their tariff structures. Due to the importance of this problem, we hope that this contribution motivates further research.
Figure 4. Zone graph with three zones.

It is shown that the above problem is solved by the so-called weighted median of the set $\{d_m : m \in M_p\}$, i.e., by any real number $c = c^*_1(p)$ with $\sum_{m \in M_p : d_m < c} w_m \leq W_p / 2$ and $\sum_{m \in M_p : d_m > c} w_m \leq W_p / 2$.
• For $b_2$: Here we have to minimize $K_2(p)$, i.e., $\min \sum_{m \in M_p} w_m (d_m - c(p))^2$. Using the theorem of Steiner (see, e.g., Sarkadi and Vincze 1974) of statistics, we note that the weighted mean of the values in $\{d_m : m \in M_p\}$ is the unique optimal solution for $c(p)$. Q.E.D.

Figure 5. A station network where the objective value for $L = 4$ is better than the objective value for $L = 5$.

Figure 6. Comparison of the heuristics: $b_{\max}$ graphed for any number $L$ of planned zones.

Table 1. Fares (in U.S. dollars) for a one-way trip.

Table 2. Fares by number of zones in journey for one-way trips.