less_retarded_wiki

main page, file list (644), source, all in md+txt+html+pdf, commit RSS feed, report abuse, stats, random article, consoomer version

Distance

Distance is a measure of how far away from each other two points are. Most commonly distance refers to physical separation in space, e.g. as in distance of planets from the Sun, but more generally distance may refer to any kind of parameter space and in any number of dimensions, e.g. the distance of events in time measured in seconds (1D distance) or distance of two text strings as the amount of their dissimilarity (Levenshtein distance). Distances are very important in computer science and math as they allow us to do such things as clustering, path searching, physics simulations, various comparisons, sorting etc.

Distance is similar/related to length, the difference is that distance is computed between two points while length is the distance of one point from some implicit origin. I.e. distance is computed between two vectors while length is computed from just one vector.

There are many ways to define distance within given space. Most common and implicitly assumed distance is the Euclidean distance (basically the "straight line from point A to point B" whose length is computed with Euclidean Theorem), but other distances are possible, e.g. the taxicab distance (length of the kind of perpendicular path taxis take between points A and B in Manhattan, usually longer than straight line). Mathematically a space in which distances can be measured are called metric spaces, and a distance within such space can be any function dist (called a distance or metric function) that satisfies these axioms:

dist(p,p) = 0 (distance from identical point is zero)
dist(p,q) > 0 if p != q (distance between two distinct points is always positive)
dist(p,q) = dist(q,p) (symmetry, distance between two points is the same in both directions).
dist(a,c) <= dist(a,b) + dist(b,c) (triangle inequality)

Approximations

Computing Euclidean distance requires multiplication and most importantly square root which is usually a pretty slow operation, therefore many times we look for simpler approximations. Note that a possible approach here may also lead through computing the distance normally but using a fast approximation of the square root.

Two very basic and rough approximations of Euclidean distance, both in 2D and 3D, are taxicab (also Manhattan) and Chebyshev distances. Taxicab distance is an upper bound on Euclidean distance (i.e. it's always greater than or equal to Euclidean distance) and is computed by merely adding the absolute coordinate differences along each principal axis (dx, dy and dz) while Chebyshev, a lower bound on Euclidean distance, takes the maximum of them (NOTE: taking minimum isn't possible due to the definition which requires two distinct points to always have positive distance). In C (for generalization to 3D just add one coordinate of course):

int distTaxi(int x0, int y0, int x1, int y1)
{
  x0 = x1 > x0 ? x1 - x0 : x0 - x1; // dx
  y0 = y1 > y0 ? y1 - y0 : y0 - y1; // dy

  return x0 + y0;
}

int distCheb(int x0, int y0, int x1, int y1)
{
  x0 = x1 > x0 ? x1 - x0 : x0 - x1; // dx
  y0 = y1 > y0 ? y1 - y0 : y0 - y1; // dy

  return x0 > y0 ? x0 : y0;
}

Both of these distances approximate a circle in 2D with a square or a sphere in 3D with a cube, the difference is that taxicab is an upper estimate of the distance while Chebyshev is the lower estimate. For speed of execution (optimization) it may also be important that taxicab distance only uses the operation of addition while Chebyshev may result in branching (if) in the max function which is usually not good for performance.

Taking the average of taxicab and Chebyshev distances will slightly increases accuracy towards better approximating Euclidean distance -- in 2D a circle plotted by this new metric will be an 8 segment polygon and analogously in 3D a sphere will be a 24 sided polyhedron (taxicab or Chebyshev alone give a square in 2D and a cube in 3D). The average can be shown to be equal to the maximum coordinate difference plus a half of the minimum; here is an branchless, integer-only C function implementing the taxicab-Chebyshev average:

int dist8(int x0, int y0, int x1, int y1)
{
  x0 = x1 > x0 ? x1 - x0 : x0 - x1; // dx
  y0 = y1 > y0 ? y1 - y0 : y0 - y1; // dy
  x1 = x0 > y0;                     // use as helper
  return (x0 >> !x1) + (y0 >> x1);
}

And a picture for summary:

--------------------------------------------------------------------------------
                                                  0 1 2 3 4 5 6                 
 Euclidean                           _____        1 1 2 3 4 5 6                 
                                 _.''     ''._    2 2 3 4 4 5 6                 
  sqrt(                    B    /             \   3 3 4 4 5 6 7                 
    (B.x - A.x)^2 +     _,-+   /       C   d   \  4 4 4 5 6 6 7                F
    (B.y - A.y)^2) _,--'       (       +-------)  5 5 5 6 6 7 8              _-+
              _,--'            \               /  6 6 6 7 7 8 8          _,-'   
         _,--'                  \_           _/                      _,-'       
 A  _,--'                         '--_____--'                    _,-'           
 +-'                                                      E  _,-'               
                                                          +-'                   
--------------------------------------------------------------------------------
                                                  0 1 2 3 4 5 6                 
 Taxicab (Manhattan)                  _A_         1 2 3 4 5 6 7                 
                           B        _/   \_       2 3 4 5 6 7 8                 
                           +      _/       \_     3 4 5 6 7 8 9                F
    abs(B.x - A.x) +       |    _/     C   d \_   4 5 6 7 8 9 a           ,----+
    abx(B.y - A.y)         |   <_      +------->  5 6 7 8 9 a b         ,'
                           |     \_         _/    6 7 8 9 a b c       ,'
                           |       \_     _/                        ,'
 A                         |         \_ _/                        ,'
 +-------------------------'           V                  E     ,'
                                                          +----'
--------------------------------------------------------------------------------
                                                  0 1 2 3 4 5 6                 
 Chebyshev                      _______________   1 1 2 3 4 5 6                 
                           B   |               |  2 2 2 3 4 5 6                 
   max(                    +   |               |  3 3 3 3 4 5 6                F
     abs(B.x - A.x),           |       C   d   |  4 4 4 4 4 5 6               ,+
     abs(B.y - A.y))           |       +-------|  5 5 5 5 5 5 6             ,'  
                               |               |  6 6 6 6 6 6 6           ,'    
                               |               |                ,--------'      
 A                             |_______________|              ,'                
 +--------------------------                              E ,'                  
                                                          +'                    
--------------------------------------------------------------------------------
                                                  0 1 2 3 4 5 6                 
 Taxi-Chebyshev average               ___         1 1 2 3 4 5 6                 
                           B     _.-''   ''-._    2 2 3 4 5 6 7                 
   max(abs(B.x - A.x),     +    :             :   3 3 4 4 5 6 7                F
       abs(B.y - A.y)) +       :       C   d   :  4 4 5 5 6 7 8             __,+
   min(abs(B.x - A.x),         :       +-------:  5 5 6 6 7 7 8        ,..''
       abs(B.y - A.y)) / 2 .   :               :  6 6 7 7 8 8 9       /
                           |    :_           _:                      /
 A                         |      ''-.___.-''                       /
 +-------------------------'                                 __..'''
                                                          +''
--------------------------------------------------------------------------------

The four mentioned distance measures: the distance is the length of the path illustrated between points A and B; next are shown "circles" (sets of points with distance d from point C), tables of distances of grid cells from the top-left cell (rounded to nearest integer if needed) and "lines" (one diagonal of a rhombus in which points E and F are opposite to each other) drawn by respective measures.

{ The following is an approximation I came up with when working on tinyphysicsengine. While I measured the average and maximum error of the taxi/Chebyshev average in 3D at about 16% and 22% respectively, the following gave me 3% and 12% values. ~drummyfish }

Yet more accurate approximation of 3D Euclidean distance can be made with a 48 sided polyhedron. The principle is following: take absolute values of all three coordinate differences and order them by magnitude so that dx >= dy >= dz >= 0. This gets us into one of 48 possible slices of space (the other slices have the same shape, they just differ by ordering or signs of the coordinates but the distance in them is of course equal). In this slice we'll approximate the distance linearly, i.e. with a plane. We do this by simply computing the distance of our point from a plane that goes through origin and whose normal is approximately {0.8728,0.4364,0.2182} (it points in the direction that goes through the middle of space slice). The expression for the distance from this plane simplifies to simply 0.8728 * dx + 0.4364 * dy + 0.2182 * dz. The following is an integer-only implementation in C (note that the constants above have been converted to allow division by 1024 for possible optimization of division to a bit shift):

int32_t dist48(
  int32_t x0, int32_t y0, int32_t z0,
  int32_t x1, int32_t y1, int32_t z1)
{
  x0 = x1 > x0 ? x1 - x0 : x0 - x1; // dx
  y0 = y1 > y0 ? y1 - y0 : y0 - y1; // dy
  z0 = z1 > z0 ? z1 - z0 : z0 - z1; // dz
 
  if (x0 < y0) // order the coordinates
  {
    if (x0 < z0)
    {
      if (y0 < z0)
      { // x0 < y0 < z0
        int32_t t = x0; x0 = z0; z0 = t;
      }
      else
      { // x0 < z0 < y0
        int32_t t = x0; x0 = y0; y0 = t;
        t = z0; z0 = y0; y0 = t;
      }
    }
    else
    { // z0 < x0 < y0
      int32_t t = x0; x0 = y0; y0 = t;
    }
  }
  else
  {
    if (y0 < z0)
    {
      if (x0 < z0)
      { // y0 < x0 < z0
        int32_t t = y0; y0 = z0; z0 = t;
        t = x0; x0 = y0; y0 = t;
      }
      else
      { // y0 < z0 < x0
        int32_t t = y0; y0 = z0; z0 = t;
      }
    }
  }

  return (893 * x0 + 446 * y0 + 223 * z0) / 1024;
}

A similar approximation for 2D distance is (from a 1984 book Problem corner) this: sqrt(dx^2 + dy^2) ~= 0.96 * dx + 0.4 * dy for dx >= dy >= 0. The error is <= 4%. This can be optionally modified to use the closest power of 2 constants so that the function becomes much faster to compute, but the maximum error increases (seems to be about 11%). C code with fixed point follows (commented out line is the faster, less accurate version):

int dist2DApprox(int x0, int y0, int x1, int y1)
{
  x0 = x0 > x1 ? (x0 - x1) : (x1 - x0);
  y0 = y0 > y1 ? (y0 - y1) : (y1 - y0);

  if (x0 < y0)
  {
    x1 = x0; // swap
    x0 = y0;
    y0 = x1;
  }

  return (123 * x0 + 51 * y0) / 128; // max error = ~4%
  //return x0 + y0 / 2;              // faster, less accurate
}

TODO: this https://www.flipcode.com/archives/Fast_Approximate_Distance_Functions.shtml

less_retarded_wiki

Distance

Approximations

See Also