Opened 17 months ago
Closed 17 months ago
#11928 closed Bugs (fixed)
surveyor area method is inaccurate
Reported by: | Charles Karney <charles@…> | Owned by: | barendgehrels |
---|---|---|---|
Milestone: | Boost 1.61.0 | Component: | geometry |
Version: | Boost 1.58.0 | Severity: | Problem |
Keywords: | area | Cc: | charles.karney@… |
Description
The following code illustrates a problem with the "surveyor" strategy for computing planar areas. It computes the area of a square of area 2 and the same square translated by 10^{8} in x and y. bg::area incorrectly computes the area in the second case as 0.
#include <iostream> #include <boost/geometry.hpp> namespace bg = boost::geometry; typedef bg::model::point<double, 2, bg::cs::cartesian> pt; int main() { bg::model::polygon<pt, false, false> poly; // A simple square of area 2 bg::read_wkt("POLYGON((1 0,0 1,-1 0,0 -1))", poly); // Computes the correct area (= 2) std::cout << "area " << bg::area(poly) << "\n"; // The same square shifted by (10^8 10^8) bg::read_wkt("POLYGON((100000001 100000000, \ 100000000 100000001, \ 99999999 100000000, \ 100000000 99999999))", poly); // Should give 2 but instead gives 0 std::cout << "area " << bg::area(poly) << "\n"; }
Running this code gives
area 2 area 0
Discussion:
- Surely it's a bad idea to embed the algorithm name "surveyor" into the API. Computing 2d areas is really straightforward. Users shouldn't need to worry about the underlying method. Maybe "planar_area" is a better name.
- In any case "surveyor" is a rather silly name of the type beloved of Wikipedia editors. There's no evidence that surveyors actually used this method.
- It's more expensive (at 2*n multiplies and 2*n additions) than applying the trapezium rule (otherwise known as "doing the integral"), i.e., summing (x[i+1] - x[i]) * (y[i+1] + y[i]) / 2, which clocks in at n multiplies and 3*n additions.
- Finally, as this example illustrates, it's subject to catastrophic loss of accuracy, while the trapezium rule will give the exact result. (Kahan contrasts evaluating x*x - y*y compared with (x - y) * (x + y) with x = 1e15+1, y = 1e15-1. The former result is wrong by 1.5% while the latter is exact.)
Attachments (0)
Change History (4)
comment:1 Changed 17 months ago by awulkiew
comment:2 Changed 17 months ago by Charles Karney <charles@…>
Thanks for the reference to Abel Flint (a great name!). You will see on pp. 59-63 that he describes the trapezium rule and not the so-called surveyor's method! Most of his contemporaries who were mathematicians would have recognized this as such and not, therefore, have seen it as some special technique invented by a surveyor.
The analysis of why
sum x_{i} y_{i+1} - x_{i+1} y_{i}
is typically subject to worse round off errors than
sum (x_{i+1} - x_{i}) (y_{i+1} + y_{i})
goes as follows. Each formula contains
- products
- differences in forming each term in the sum
- differences in forming the sum itself
In the surveyors formula 2 follows 1 and so the mild round off errors from the products are sometimes catastrophically magnified. In the trapezium rule 2 precedes 1 and there is no round off error if x_{i+1} and x_{i} are within a factor of 2 of each other (which is a common situation).
Both formulas are subject to potentially dangerous round off with 3. Centering the numbers might ameliorate this problem. Alternatively, merely switching to the other trapezium rule involving (y_{i+1} - y_{i}) would help, e.g., in the case of UTM coordinates where often the northings are much larger than the eastings.
comment:3 Changed 17 months ago by Charles Karney <charles@…>
By the way, in GeographicLib, I use an Accumulator class to perform the summation. This effectively sums at double the normal precision.
comment:4 Changed 17 months ago by awulkiew
- Milestone changed from To Be Determined to Boost 1.61.0
- Resolution set to fixed
- Status changed from new to closed
I tweaked the formula as you suggested. Thanks.
The name of the formula seems to be older than Wikipedia [1][2].
I haven't measured it but I'd rather guess that nowadays the most expensive part in both cases is data access.
Furthermore I guess that in the case of the trapezium rule the result is also not precise in general when coordinates are big. So I'd rather fight with this issue by moving all of the coordinates closer to the origin of the coordinate system before passing them into the strategy, so doing it on the fly while calculating the area with any strategy. Similar technique was used before to improve the precision of centroid(). This solution should improve the precision of a function no matter what strategy was used.
With that said I'm not against changing the formula. But should trapezium rule be better in all cases or is there a reason why the old one could be left as it was?
[1] Flint, Abel. "A System of Geometry and Trigonometry: together with a treatise on surveying: teaching various ways of taking the survey of a field; also to protract the same and find the area. Likewise, Rectangular Surveying; Or, an accurate method of calculating the area of any field arithmetically, without the necessity of plotting it.", 1808
[2] Braden, Bart. "The surveyor’s area formula.", 1986