Quality Control for Spatial Big Data


Every day, we use and contribute to big data around the world, and approximately 80% of existing big data is spatial big data, which involves geographic locations, addresses and coordinates. We interact with spatial big data daily, even hourly, in many ways: searching for locations on Google Maps, using government transport applications to find the best travel options between places, and geo-tagging posts on social platforms.

However, uncertainties and errors arise when people use spatial data, such as inaccurately located buildings on maps.

In this paper, the uncertainties and errors of spatial big data, and ways to measure and reduce them, are discussed, using Google Navigation as the sample spatial big data application for explanation purposes.

Finding the fastest route from Tai Wai to Hong Kong Polytechnic University using Google Navigation

Google Navigation is an application created by Google in 2009 as an additional feature in Google Maps for easy route finding.

The distance between locations on Google Maps is calculated from geographic coordinates, and travelling time is estimated from typical travelling speeds, adjusted using local traffic conditions, government real-time traffic data and other users' location information. Because this information comes from different sources, errors and uncertainties are found in the application.
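Google does not publish its routing internals, but straight-line distance between two coordinates is conventionally computed with the haversine formula. Below is a minimal Python sketch of the idea; the coordinates for Tai Wai station and PolyU are approximate values assumed for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Tai Wai station to The Hong Kong Polytechnic University (approximate coordinates)
d = haversine_km(22.3726, 114.1786, 22.3043, 114.1798)
print(f"straight-line distance: {d:.1f} km")  # the road distance will be longer
```

An actual navigation engine routes along the road network rather than along great circles, so this figure is only a lower bound on the travelled distance.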

Errors and uncertainties of Spatial Big Data

There are three main types of error found in spatial big data: random error, systematic error and gross error.

To begin with, random errors are problems that are unavoidable and do not occur in every instance. For example, in Google Navigation, if multiple queries for the fastest route between the same two locations return slightly different distances each time, then random error exists in the distance calculation.

Figure: Route query for the same locations with a slightly different point selected compared with Figure 1 (the distance is slightly shorter here).

Random errors usually stem from observer and instrument imperfections. The differences might be caused by computational rounding or by imprecise locations being selected for the query, if the start and end points of the route are plotted manually each time. Random errors can also be caused by unexpected conditions such as traffic and road closures. They follow no pattern and can be induced by various factors, and they affect the accuracy of route-finding results: as mentioned above, a slightly deviated point plot changes the measured distance and travelling time. The magnitude of their impact is related to their size [7]. For instance, a combination of a point-plotting mistake, local traffic conditions and inaccurate coordinates would strongly affect the selection of the best route because of the compounded error.
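The statistical signature of random error, zero-mean scatter across repeated measurements, can be illustrated with a small simulation. The true distance and noise level below are hypothetical values chosen only for demonstration:

```python
import random
import statistics

random.seed(42)
true_distance_km = 8.4   # assumed true route length (hypothetical)
noise_sd_km = 0.05       # assumed spread from imprecise point plotting

# Repeat the same query many times with zero-mean random error added
queries = [true_distance_km + random.gauss(0, noise_sd_km) for _ in range(100)]

print(f"mean of queries: {statistics.mean(queries):.3f} km")   # close to true value
print(f"std of queries : {statistics.stdev(queries):.3f} km")  # size of random error
```

Because the error has no systematic component, averaging many queries recovers a value close to the truth, while the standard deviation quantifies the random error itself.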

The second type of error is systematic error, in which the same amount of difference is found in each measurement; in other words, a pattern of differences appears across all measurements. In the spatial context, most systematic errors are identified when the scale of the map is changed [3]. With reference to Google Navigation, it would be hard for the public to spot systematic errors, as people have no accurate reference measurement to compare against the application's results. However, systematic error should exist in the measurements, since coordinates measured from different satellites would differ.
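If a trusted reference measurement were available, a systematic error would show up as a consistent bias in the differences between the application's values and the reference. A minimal sketch with hypothetical paired distances:

```python
import statistics

# Hypothetical paired data: app-reported vs. surveyed reference distances (km)
app = [1.22, 2.48, 3.70, 5.03, 6.27]
ref = [1.20, 2.45, 3.69, 5.01, 6.26]

diffs = [a - r for a, r in zip(app, ref)]
bias = statistics.mean(diffs)      # systematic component (constant offset)
spread = statistics.stdev(diffs)   # random component around the bias

print(f"bias: {bias:.3f} km, spread: {spread:.3f} km")
# A bias well above the spread suggests a systematic error, e.g. a datum shift.
```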

Although small systematic errors exist in almost all measurements, whether caused by equipment or by coordinate transformation errors, their impact on big data applications is relatively small. Data with small systematic errors are sometimes even considered accurate. In Google Navigation, some systematic errors are believed to exist in the measurement and navigation features, but the information obtained from the application is mostly regarded as accurate.

The last type of error is gross error, in which measurements deviate significantly from normal values. Gross errors generally do not occur in normal measurements unless human or machine faults produce seriously wrong readings, which can cause systematic gross errors in which all measurements drift far from normal [4]. In Google Navigation there should be no gross errors, since the results of most queries are nearly accurate, but gross errors could appear where different countries use different coordinate systems.

Gross errors are much larger than random and systematic errors and thus greatly affect the accuracy and reliability of results. They can produce absurd measurements or wrong positioning on a map [9]. Gross errors should be eliminated from Google Navigation; if they were present, the measured distance to a destination would be far too long or too short, and the application might give drivers inaccurate directions, possibly resulting in road accidents.

Figure: Example of a gross error in Google Maps: significant displacement of a road network from its actual location.
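Gross errors are usually screened out as statistical outliers. The sketch below uses a robust median/MAD test, a standard outlier-detection technique rather than anything specific to Google Navigation, on hypothetical repeated queries:

```python
import statistics

# Hypothetical repeated distance queries (km); one gross error slipped in
measurements = [8.41, 8.39, 8.43, 8.40, 23.8, 8.42, 8.38]

# Median and median absolute deviation (MAD) are robust to the outlier itself,
# unlike the mean and standard deviation, which the outlier would inflate
med = statistics.median(measurements)
mad = statistics.median(abs(m - med) for m in measurements)

# Flag values far from the median (1.4826 * MAD approximates one std. dev.)
gross = [m for m in measurements if abs(m - med) > 3 * 1.4826 * mad]
print("suspected gross errors:", gross)  # -> [23.8]
```

A plain three-sigma rule on the mean would miss this outlier, because the outlier inflates the standard deviation it is tested against; robust statistics avoid that masking effect.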

Error is one of the attributes that contribute to the concept of uncertainty, together with accuracy, reliability, precision, quality and completeness [5]. Uncertainty is a major issue in spatial big data because it affects the determination of the true value of measurements. For example, when searching for the best route in Google Navigation, the suggested route might not actually be the fastest: it could include roads that are inaccessible at certain times, traffic conditions not reflected in Google Maps, or inaccurately plotted start and end points.

Uncertainties can originate from the natural characteristics of objects, from people's cognition of those objects and from measurement errors [9]. First, to measure accurately for routing purposes, all spatial features must be properly defined. However, because real-world objects are complex and irregularly shaped, it is difficult to define each object accurately. This type of uncertainty causes problems in big data applications, as estimates made from poorly defined objects lead to positioning and measurement errors [10]. In navigation, when we type in an address and search for the location, it is sometimes pinned on a neighbouring building because property boundaries are undefined. This changes the distance to the destination and can affect the accuracy of the application.

Secondly, owing to humans' limited knowledge and the still-developing state of GIS applications for spatial understanding of the real world, we understand only a small part of what the real world is, which causes uncertainty in object categorisation in spatial databases [10]. This uncertainty creates issues in using spatial data: if an object is classified into more than one category, it can cause incorrect route selection and wrong route measurements. One feature of Google Navigation is determining the best walking path. However, owing to inadequate knowledge and estimation from satellite images, tracks in vegetated areas are often mistaken for footpaths, which can give users incorrect directions.

Lastly, measurement errors are a type of uncertainty arising from data collected with different equipment, from equipment calibration and from variation in coordinate systems [11]. Because no measurement has an absolute answer, the uncertainty in obtaining the "right" measurement can present confusing information to spatial big data users. A typical example of measurement uncertainty is data collected from satellite images taken by different satellites. Several satellite systems are in orbit, such as GLONASS, Galileo and GPS, and the satellites in each system have different spatial resolutions. Different satellites therefore produce different measurements, and the uncertainty in reconciling them causes inaccuracy in distance measurement in spatial applications such as Google Navigation.
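A standard statistical way to reconcile measurements of unequal precision, not specific to any navigation product, is the inverse-variance weighted mean. A sketch with hypothetical distances and assumed standard deviations:

```python
# Hypothetical distance measurements (km) derived from three satellite
# systems, each paired with an assumed standard deviation (km)
measurements = [(8.42, 0.03), (8.47, 0.08), (8.40, 0.05)]

# Inverse-variance weighting: more precise measurements count for more
weights = [1 / sd**2 for _, sd in measurements]
combined = sum(w * m for (m, _), w in zip(measurements, weights)) / sum(weights)
combined_sd = (1 / sum(weights)) ** 0.5

print(f"combined estimate: {combined:.3f} ± {combined_sd:.3f} km")
```

The combined uncertainty is always smaller than that of the best single source, which is why fusing multiple positioning systems can improve accuracy.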

Ways to measure quantities of uncertainties and errors

There are several ways to quantify uncertainties and errors. To begin with, an entropy error model can be used to show the magnitude of positioning uncertainty. Entropy error models display the possible positional displacement of a point as a 3D curved-surface graph; the error region on the 2D plane can be an ellipse or a rectangle. The curved surface combines surface bands along the x, y and z axes of the coordinate system, and the entropy models of individual points can be combined into an entropy band model that displays the positional error of a line segment [12]. The model is useful for identifying the possible displacement of lines and points across different coordinate systems.
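As a rough illustration of the idea behind entropy error models, the sketch below computes the differential entropy of an assumed bivariate Gaussian positional error and the semi-axes of its standard error ellipse. The covariance values are invented for illustration, and this is a simplification of the full model in [12]:

```python
import math

# Assumed 2x2 covariance matrix of positional error, in metres^2:
# [[sxx, sxy], [sxy, syy]]
sxx, syy, sxy = 0.25, 0.16, 0.05

# Determinant of the covariance matrix
det = sxx * syy - sxy**2

# Differential entropy of a bivariate Gaussian error distribution:
# H = ln(2*pi*e) + 0.5 * ln|Sigma|; larger H means more positional uncertainty
H = math.log(2 * math.pi * math.e) + 0.5 * math.log(det)

# Semi-axes of the standard error ellipse are the square roots of the
# eigenvalues of the covariance matrix
tr, diff = sxx + syy, sxx - syy
lam1 = 0.5 * (tr + math.sqrt(diff**2 + 4 * sxy**2))
lam2 = 0.5 * (tr - math.sqrt(diff**2 + 4 * sxy**2))

print(f"entropy H = {H:.3f} nats")
print(f"error ellipse semi-axes: {math.sqrt(lam1):.3f} m, {math.sqrt(lam2):.3f} m")
```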

Another way to quantify uncertainty uses random sampling to identify attribute uncertainty [13]. Samples of measurements are collected from the spatial big database, and the quantity of uncertainty is measured by applying probability theory, which determines the likelihood of a random event occurring. As mentioned in the previous section, many forms of random error contribute to spatial data inaccuracy; this method can identify most of them and calculate their significance for data accuracy by estimating the impact of each factor on the measurements.
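A minimal sketch of the sampling idea, with an invented database in which the true error rate is known only to the simulation:

```python
import math
import random

random.seed(0)

# Hypothetical spatial database: 1 = attribute correct, 0 = attribute wrong
database = [1] * 9500 + [0] * 500  # 5% true error rate (unknown in practice)

# Draw a random sample and estimate the attribute error proportion
sample = random.sample(database, 400)
p_hat = sample.count(0) / len(sample)

# 95% normal-approximation confidence interval for the proportion
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / len(sample))
print(f"estimated error rate: {p_hat:.3f} ± {half_width:.3f}")
```

The confidence interval shrinks with the square root of the sample size, so quantifying uncertainty more precisely requires substantially larger samples.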

Besides the entropy model and random sampling, the buffer analysis error model is another way to quantify errors. A buffer is a zone drawn around a point, line, polyline or polygon object. Buffers are drawn separately around the feature defined by the measured coordinates and the feature defined by the true coordinates of the object. Where the two buffers overlap, areas representing the error of commission and the error of omission are identified, and the area of each zone is calculated to quantify the measurement error.
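With a geometry library such as Shapely (a third-party package assumed here), the buffer overlap computation can be sketched as follows; the line coordinates and buffer tolerance are hypothetical:

```python
from shapely.geometry import LineString  # third-party: pip install shapely

# "True" centreline of a road and its measured version (hypothetical coords)
true_line = LineString([(0, 0), (100, 0)])
measured_line = LineString([(0, 2), (100, 3)])

# Buffer each line by the chosen tolerance (same units as the coordinates)
tol = 5.0
true_zone = true_line.buffer(tol)
measured_zone = measured_line.buffer(tol)

# Commission: area claimed by the measurement but outside the true zone
commission = measured_zone.difference(true_zone).area
# Omission: true area that the measurement fails to cover
omission = true_zone.difference(measured_zone).area

print(f"error of commission: {commission:.1f}, error of omission: {omission:.1f}")
```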

The above-mentioned methods are some of the simple, basic ways to measure the quantity of errors and uncertainty, and they are still being improved to calculate the magnitude of errors and uncertainties in spatial big data more accurately.

Ways to reduce or control errors and uncertainties

To improve the accuracy of spatial data, there are several ways to perform quality control and minimise the uncertainty and errors in the data.

Firstly, since most positional data used in mapping and navigation are produced by measurement on satellite images, high-resolution satellite images are needed to reduce measurement error. Many satellites orbit the Earth, each equipped with cameras of different resolutions. Satellites with higher resolutions tend to offer better accuracy, and their imagery is processed to a higher level. Satellites with better ground sensors and camera calibration can determine ground control points more accurately, and thus provide better coordinate measurements for other objects on the ground.

Another method of reducing measurement error is coordinate system transformation. In two dimensions, the coordinate system of each satellite image differs, as each image is taken at a slightly different scale and angle. Different coordinate systems give different coordinate readings for the same point, and this variance creates errors when computing the correct distance between points. Coordinate system transformation can unify the coordinate systems of all the satellite images, gradually reducing the positional error of points.
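A common form of such a transformation is the four-parameter (Helmert) similarity transformation: a shift, a rotation and a uniform scale. The parameters and the test point below are assumed purely for illustration:

```python
import math

def helmert_2d(x, y, tx, ty, scale, rot_rad):
    """Four-parameter 2D similarity (Helmert) transformation."""
    xt = tx + scale * (x * math.cos(rot_rad) - y * math.sin(rot_rad))
    yt = ty + scale * (x * math.sin(rot_rad) + y * math.cos(rot_rad))
    return xt, yt

# Assumed parameters linking one image's coordinate system to another:
# 12 m easting shift, -5 m northing shift, slight scale and rotation
tx, ty = 12.0, -5.0
scale = 1.0002
rot = math.radians(0.01)

x, y = 836554.3, 818731.9   # hypothetical point in the source system
print(helmert_2d(x, y, tx, ty, scale, rot))
```

In practice the four parameters are estimated by least squares from control points identified in both coordinate systems, rather than assumed as here.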

Besides high-resolution satellite images and coordinate transformation, using a metadata database is another way to improve spatial accuracy [15]. Metadata contains information about the spatial data, including the time of data acquisition, the location of data capture, the coordinate system used and any distortion in the data-capturing process. All this information forms a metadata database [9], and the accumulation of data in the database helps fill gaps in the geographic information of applications that use spatial big data. This provides valuable adjustments to locational attributes and measurements and reduces errors in the use of spatial big data.
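A minimal sketch of what one record in such a metadata database might hold; the field names and values are illustrative, not drawn from any specific metadata standard:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SpatialMetadata:
    """Minimal metadata record for one spatial dataset (illustrative fields)."""
    dataset_id: str
    acquired_at: datetime     # time of data acquisition
    region: str               # location of data capture
    crs: str                  # coordinate reference system used
    sensor: str               # capturing instrument
    known_distortion: str     # documented distortion in the capture process

record = SpatialMetadata(
    dataset_id="hk-roads-2021-09",
    acquired_at=datetime(2021, 9, 14, 3, 20),
    region="Hong Kong SAR",
    crs="EPSG:2326 (Hong Kong 1980 Grid)",
    sensor="aerial survey camera",
    known_distortion="minor relief displacement near steep terrain",
)
print(record)
```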

Future Research Trends

More people around the world use spatial big data every day, but there is a lack of tools for analysing the quality of spatial big data and of the information in metadata databases [16]. Such tools are important for improving data quality because they can identify more sources of error and their impact on the use of the data. Research is therefore suggested on defining models to evaluate the quality of spatial big data, in terms of error quantity and effect on actual applications, and possibly on identifying solutions for reducing such errors.

Moreover, the size of spatial big data grows every day, and with very large datasets the effectiveness of spatial adjustment methods is uncertain. Existing spatial adjustment models are designed to perform transformations and error reductions quickly at the current scale of spatial data; whether the same models will remain effective on the larger and more complex datasets of the future is unknown. Research can be conducted on more advanced methods of controlling data quality.

Conclusion

Spatial big data has become increasingly popular, and many applications rely on it. However, inconsistencies and inaccuracies appear when using applications such as Google Navigation. This report identified three types of error in spatial data: random error, systematic error and gross error. Together with uncertainty in the natural state of objects, in human cognition of those objects and in measurement, these errors gradually lower spatial data quality.

To control and reduce errors in the data, their magnitude must first be identified so that the best solution can be chosen. Ways of quantifying uncertainty and error include entropy modelling, random sampling and buffer analysis error modelling. Beyond error modelling, a number of methods were introduced to reduce errors, including better satellite images, unified coordinate systems and the use of a metadata database. With advancing technology, the quality of spatial big data is expected to improve further in the future.

References

  1. A. Leszczynski and J. Crampton, ‘Introduction: Spatial Big Data and everyday life’, Big Data & Society, vol. 3, no. 2, p. 205395171666136, 2016. Available: 10.1177/2053951716661366.
  2. K. Thapa and J. Bossler, ‘Accuracy of Spatial Data Used in Geographic Information Systems’, Photogrammetric Engineering & Remote Sensing, vol. 58, no. 6, pp. 835-841, 1992.
  3. N. Chrisman and J. Girres, 'First, do no harm: Eliminating systematic error in analytical results of GIS applications', ISPRS – International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XL-2/W1, pp. 35-40, 2013. Available: 10.5194/isprsarchives-xl-2-w1-35-2013.
  4. C. Jordache and S. Narasimhan, Data Reconciliation and Gross Error Detection. Houston, Texas: Gulf Professional Publishing, 1999.
  5. A. Pang, ‘Visualizing Uncertainty in Geo-spatial Data’, In Proceedings of the Workshop on the Intersections between Geospatial Information and Information Technology, 2001. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.3823.
  6. D. Li, S. Wang and D. Li, Spatial Data Mining: Theory and Application. Berlin: Springer Nature, 2015, pp. 123-125.
  7. J. Han, Z. Liu and J. Kwon, ‘Investigating the Impact of Random and Systematic Errors on GPS Precise Point Positioning Ambiguity Resolution’, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, vol. 32, no. 3, pp. 233-244, 2014. Available: 10.7848/ksgpc.2014.32.3.233.
  8. I. Hughes and T. Hase, Measurements and Their Uncertainties: A Practical Guide to Modern Error Analysis. Oxford: OUP Oxford, 2014.
  9. W. Shi, Principles of modeling uncertainties in spatial data and spatial analyses. Boca Raton: CRC Press, 2010.
  10. R. Devillers, R. Jeansoulin and M. Goodchild, Fundamentals of Spatial Data Quality. Hoboken: John Wiley & Sons, 2010, pp. 43-59.
  11. J. Lisiecki and S. Kłysz, ‘Estimation of Measurement Uncertainty’, Research Works of Air Force Institute of Technology, vol. 22, no. 1, 2007. Available: 10.2478/v10041-008-0004-4.
  12. J. Gong and D. Li, 'Entropy-Based Models for Positional Uncertainty of Line Segments in GIS', Survey Review, vol. 43, no. 322, pp. 390-401, 2011. Available: 10.1179/003962611x13055561708786.
  13. S. Wang, W. Shi, H. Yuan and G. Chen, ‘Attribute Uncertainty in GIS Data’, Fuzzy Systems and Knowledge Discovery, pp. 614-623, 2005. Available: 10.1007/11540007_76.
  14. A. Elrahman and A. Shaker, 'Point- and line-based transformation models for high resolution satellite image rectification', Ph.D. thesis, The Hong Kong Polytechnic University, 2004.
  15. ESRI, ‘Metadata and GIS’, ESRI, Redlands, 2002.
  16. D. Li, J. Zhang and H. Wu, ‘Spatial data quality and beyond’, International Journal of Geographical Information Science, vol. 26, no. 12, pp. 2277-2290, 2012. Available: 10.1080/13658816.2012.719625.
