georeference.org
Statistical and non-statistical analysis of geographic information in information space
NVerge

305 post(s)
#07-Dec-07 03:03

Elsewhere, Dimitri wrote:

"...and should get started in a new thread. Let me suggest also, to maximise the content in that thread, that folks refrain from postings only of the form "oh sure, I already sent in a suggestion for more statistics" and instead please include the full text of the suggestion you have sent in.

If we do that we will all see two things: a) there have been very few detailed suggestions requesting specific statistics measures and frankly, darned few superficial ones either, and b) of the very few detailed suggestions related to new statistics functions desired there is not much commonality. That would lead to a good debate."

So here is the new thread. Its subject is statistical and non-statistical analysis of geographic information (in INFORMATION SPACE).

I have put information space (abbreviated to "i-space") in parentheses to distinguish it from working with geographic information in geographic space, which is what is usually done in Manifold now (although, with some imagination, people could already work with information in i-space using Manifold to some degree). I have also expanded the subject to include non-statistical analysis, because there is considerable overlap.

Should Manifold have statistical and non-statistical i-space capabilities?

The answer is a no-brainer. The ability to perform statistical and non-statistical analysis of geographic information in i-space, and to be able to work with the same information in geographic space, is a very powerful combination and greatly expands the capabilities of a GIS. i-space analysis of geographic information is a fundamental requirement. You cannot fully analyse, depict, interpret, understand, classify, make selections from and generally fully utilise geographic information without it.

Footnote:

What is information space, you may ask. Hopefully most people using this forum know what geographic space is. It is what we experience in the real world, where the position of an object is defined relative to three orthogonal spatial dimensions.

Information space is a virtual space, wherein objects are positioned according to the values of their properties (fields). Each field can (if we choose) be a dimension of i-space. Information space may have as many dimensions as the number of fields associated with the objects we may wish to work with in i-space. The field values an object has are the coordinates defining the position of the object in information space, just as geographic coordinates relative to three orthogonal geospatial axes can define the position of the object in geographic space.
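As a minimal sketch of this idea (Python, with invented field names and values used purely for illustration), each object's chosen field values can be treated as its coordinates in i-space, and the familiar Euclidean distance then measures how close two objects are in information space rather than on the ground:

```python
import math

# Hypothetical objects: each has a geographic position plus attribute fields.
objects = {
    "town_a": {"lon": -1.5, "lat": 53.8, "population": 12.0, "rainfall": 800.0},
    "town_b": {"lon": -0.1, "lat": 51.5, "population": 14.0, "rainfall": 600.0},
    "town_c": {"lon": -3.2, "lat": 55.9, "population": 11.0, "rainfall": 820.0},
}

# The fields chosen to act as the axes (dimensions) of information space.
I_SPACE_FIELDS = ("population", "rainfall")


def i_space_position(obj, fields=I_SPACE_FIELDS):
    """Coordinates of an object in i-space: its values for the chosen fields."""
    return tuple(obj[f] for f in fields)


def i_space_distance(p, q):
    """Euclidean distance between two i-space positions of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Geographic and i-space positions of the same object, side by side.
for name, obj in objects.items():
    print(name, (obj["lon"], obj["lat"]), i_space_position(obj))
```

In practice, fields measured in different units (people vs millimetres of rain) should usually be rescaled, for example to z-scores, before distances are compared. Otherwise the field with the largest numeric range dominates.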

Many will have drawn plots of various kinds. Perhaps a histogram of object value against the frequency of that value. This is a 1-D statistical plot of the distribution of one property across a set of objects, within information space. A plot of property A vs property B for a collection of objects is a 2-D non-statistical plot of object position in i-space. From it, it might be possible to identify clusters, indicating that there may be hitherto unknown subsets within the information which may have significance. If we format the points of this plot according to the number of objects having the same combination of values for A and B, this is a 2-D histogram within i-space. We can now gauge the importance of any information subsets that may be revealed. We might choose to select the points of one or both subsets and give each subset a name. This is the principle of classification, so important in the field of remote sensing, but it could easily be extended to drawing objects.
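The 2-D histogram and the naming of subsets described above can be sketched in a few lines. This is an illustrative Python sketch, not an existing Manifold feature. `histogram_2d` counts objects per (A, B) cell, optionally binning continuous values. `classify_boxes` is a hypothetical, deliberately simple rule that names any object falling inside a user-chosen rectangle of i-space.

```python
import math
from collections import Counter


def _key(value, width):
    """Exact value if width is None, otherwise the index of the bin containing value."""
    return value if width is None else math.floor(value / width)


def histogram_2d(records, field_a, field_b, width_a=None, width_b=None):
    """Count how many records share each (A, B) cell of a 2-D i-space.

    The count is what would drive the size or colour of each plotted point.
    """
    return Counter((_key(r[field_a], width_a), _key(r[field_b], width_b)) for r in records)


def classify_boxes(records, field_a, field_b, classes):
    """Label each record with the name of the first i-space box that contains it.

    classes: list of (name, (a_min, a_max), (b_min, b_max)), bounds inclusive.
    Records outside every box are labelled None.
    """
    labels = []
    for r in records:
        a, b = r[field_a], r[field_b]
        label = None
        for name, (a_lo, a_hi), (b_lo, b_hi) in classes:
            if a_lo <= a <= a_hi and b_lo <= b <= b_hi:
                label = name
                break
        labels.append(label)
    return labels
```

Real classifiers use statistical rules rather than hand-drawn boxes, but the workflow is the same: plot, spot a cluster, select it, and name it.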

Now link those plotted points, or the dataset as a whole, to a map showing the geographic positions of the objects and you realise what can be done.

pcardoso


1,452 post(s)
#06-Dec-07 16:57

"You cannot fully analyse, depict, interpret, understand, classify, make selections from and generally fully utilise geographic information without it." Absolutely.

So you expect to collect contributions on statistical needs for georeferenced information?

I doubt that new operations or analyses will be invented. Things (statistical tools) already exist elsewhere. By "new" I mean something that does not exist in any other solution. Regarding this, why don't we just look around? The effort should go into making things more intuitive. As an example, IDRISI is full of absolutely fantastic statistical tools. I feel the Manifold team is very reluctant to "mimic" some basic things. Image registration is the essential starting point for anything in image processing and for subsequent statistics on images. Manifold is still very far from reaching a minimal level of real statistical capability (some will say: we have the Surface tools, and they are great...).

I see a number of topics under Statistics and GIS that I consider fundamental:

1) Descriptive statistics - Landscape metrics

2) Inferential statistics - correlations, variance/covariance, Mantel test - least-squares regression - image processing (maximum likelihood and fuzzy classification algorithms)

Statistical measures are not usually very simple arithmetic. They only seem so when we don't care about using them correctly.

NVerge

305 post(s)
#07-Dec-07 08:04

"Things (statistical tools) already exist elsewhere."

Indeed they do, usually in very expensive applications (and not because there is anything intrinsically expensive about applications that allow information to be worked with in i-space). Or via unfriendly scripting languages, the statistical equivalent of GMT. For example, you could go out and buy ENVI at £5000 a pop in order to have the spectral angle mapper, a very powerful tool. But the concept behind the SA Mapper is no more complex than plotting information in i-space and using basic trigonometry to classify it (remote sensing measurements in this case, but it could equally be any kind of quantitative information, such as population or climate statistics).
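To illustrate how little trigonometry the spectral angle mapper needs, here is a minimal Python sketch of the underlying idea. It is not ENVI's implementation. Each pixel spectrum and each reference spectrum is treated as a vector in i-space, and the pixel is assigned to the reference with the smallest angle between them, provided that angle is below a chosen threshold. The reference names and threshold are hypothetical.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors in i-space."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    cos_theta = dot / (norm_p * norm_r)
    # Clamp to [-1, 1] so rounding error cannot push acos outside its domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def sam_classify(pixel, references, max_angle):
    """Name of the reference spectrum with the smallest angle, or None if all exceed max_angle."""
    best_name, best_angle = None, max_angle
    for name, ref in references.items():
        angle = spectral_angle(pixel, ref)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


references = {"water": [1.0, 2.0, 3.0], "veg": [3.0, 1.0, 0.0]}
print(sam_classify([2.0, 4.0, 6.0], references, max_angle=0.1))
```

The angle ignores vector length, so a uniformly brighter or darker version of the same spectrum gets the same angle. This is why the method is relatively insensitive to illumination differences. An all-zero vector has no defined angle and would need to be screened out first.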

"Regarding this why don't we just look around."

If you mean by this "why bother asking for it when other applications already provide it", then indeed people will look around. Those who do need such capabilities will go to, or remain with, competitor products from ESRI or others.

In fact it is IMV quite ridiculous that users should have to request this kind of capability in a GIS, it is so fundamental to geographic information analysis.

"So you expect to collect contributions to statistical needs in terms of georeferenced information."

People are welcome to toss ideas into the fray; I rather have my hands full with other things at the moment.

As I see it, there are four sides to this particular coin:

1) Statistical maths - basic functions
2) Display - creating drawings to depict information
3) Interactive selection via displayed plots
4) Analysis, classification and interpretation

A good first step would be to suggest what types of plots people might like to produce, perhaps using the geographic drawings and map object model (layers and formatting). The i-space equivalent of a geographic projection is, of course, going to be a combination of the dimensionality of the plot and the plot type. There are only about a dozen basic kinds of plot.

pcardoso


1,452 post(s)
#07-Dec-07 09:15

"In fact it is IMV quite ridiculous that users should have to request this kind of capability in a GIS, it is so fundamental to geographic information analysis"

Agreed. I've been using this argument since my very first day of working with Manifold.

"Indeed they do, usually in very expensive applications"

This is not true. An IDRISI Andes license is priced comparably to a Manifold Ultimate Edition license. OK, they are substantially different applications, but if Manifold aims to achieve even the minimum in raster analysis, some functions must be there.

tjhb

3,494 post(s)
#07-Dec-07 10:45

I'm not contradicting anyone here, just passing on something I found the other day by accident, in case it's useful for this sort of purpose.

Under the RecordSet object in the object model we currently have access to these statistical functions (taking a broad view of what "statistical" means):

Average() - Returns average value of given column.
Bottom() - Returns given number of bottom records for given column.
CenterCorrelation() - Returns central correlation of given pair of columns.
CenterCovariation() - Returns central covariation of given pair of columns.
CenterMoment() - Returns central moment of given order for given column.
Correlation() - Returns correlation of given pair of columns.
Covariation() - Returns covariation of given pair of columns.
Excess() - Returns excess of given column.
Maximum() - Returns RecordSet object with maximum values of given column.
Median() - Returns Record object with median value of given column.
Minimum() - Returns RecordSet object with minimum value of given column.
Moment() - Returns moment of given column.
Range() - Returns range of given column.
Skew() - Returns skew of given column.
Sort() - Returns the same set of records sorted by given column.
StandardDeviation() - Returns standard deviation of given column.
Sum() - Returns sum of given column.
Top() - Returns given number of top records for given column.
Typical() - Returns given number of typical records for given column.
Variance() - Returns variance of given column.

pcardoso


1,452 post(s)
#07-Dec-07 14:53

Any clue about how these parameters are calculated?

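Is Correlation() parametric (Pearson)? Does Skew() return skewness? Are StandardDeviation() and Variance() the sample versions (dividing by n - 1) or the population versions (dividing by n)?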
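These questions matter because the same function name can hide different formulas. This Python sketch computes both conventions for variance and standard deviation, together with the moment coefficient of skewness and excess kurtosis. It assumes, although the thread does not confirm this, that `Excess()` means excess kurtosis. Running a small known dataset through a function and comparing the result with these numbers is one way to find out which convention it uses.

```python
import math


def moments_summary(values):
    """Population and sample dispersion plus moment-based shape measures.

    Needs at least two values that are not all equal (m2 must be > 0).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # 4th central moment
    sample_var = m2 * n / (n - 1)                  # Bessel's correction
    return {
        "variance_population": m2,
        "variance_sample": sample_var,
        "std_population": math.sqrt(m2),
        "std_sample": math.sqrt(sample_var),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


print(moments_summary([2, 4, 4, 4, 5, 5, 7, 9]))
```

For the sample data, the population variance is 4.0 and the sample variance is 32/7 (about 4.571). Some packages apply further small-sample adjustments to skewness and kurtosis, so those figures can legitimately differ between tools too.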

tjhb

3,494 post(s)
#07-Dec-07 22:27

No clue whatsoever. The documentation is, well, isn't. But someone at Manifold has put some work in here already. Maybe this would be a good place to focus requests, so as to build on it.

pcardoso


1,452 post(s)
#08-Dec-07 08:21

Not surprisingly, we read from Manifold that "Statistical measures are usually very simple arithmetic". Everything must be absolutely clear when working with statistics, or we will end up with tools close to the Microsoft way of doing this, and everybody knows one should never rely on Microsoft's statistical functions.

If I hear from Manifold that they will put some effort into developing statistics, I'll do my best, to the extent of my knowledge, to collect reliable, well-documented information to contribute. Otherwise, with this philosophy of "stats is easy", forget it.

NVerge

305 post(s)
#08-Dec-07 12:39

"Otherwise, with this philosophy of "stats is easy", forget it."

Paulo,

Then we are relying on you to educate us :-)

pcardoso


1,452 post(s)
#08-Dec-07 16:05

No Nick, this is not the idea. I do not pretend to be wiser than anyone else. Honestly, there are a lot of members much more capable than me. I meant that I need to invest a lot of time, and it's hard to deeply understand statistics. When someone comes along and says it's very simple, I just don't take it seriously.

I'll not waste my time if I feel that it will be taken lightly.

NVerge

305 post(s)
#09-Dec-07 02:25

There is a great difference between putting forward suggestions for statistical capabilities and using these capabilities to perform a statistically rigorous analysis of some information set. Giving someone the tools to perform a job does not qualify them or make them competent to do the work.

When writing suggestions I often find the pragmatic approach easier. That is, it is easier not to confront the subject head-on, but first to describe what I wish to do, then to set out how this can be achieved.

Dimitri may be proven right in predicting that there may not be a consensus about what is required/desired. I expect this will almost certainly be the case if a purist approach is taken.

A good first step would be to come up with a list and examples of the kinds of charts we might wish to produce to depict information in i-space (or a combination of i-space and geographic space). Then, set out the options that could be used to customise the appearance of these charts, the formatting of the information drawings, etc.

KlausDE


4,545 post(s)
#09-Dec-07 03:18

Anyway, I vote for a detailed description of the implemented algorithms in the help, unlike what we're used to, because many statistical methods have a scope of validity: the input data must follow (or may deviate only within limits from) a normal distribution or similar. So for me the first steps towards statistical tools would be functions to test such preconditions. On the other hand, the output of many statistical methods is only complete with confidence limits. A built-in Chi² test and perhaps other methods would be needed. I'm not at all sure this really is the way to go for Manifold. Perhaps in some well-defined areas with a strong relation to geography, like Nick's satellite image analysis, we could define a standard procedure.
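As one concrete example of such a precondition test, this Python sketch runs a chi-square goodness-of-fit check of binned data against a normal distribution fitted to the same data. The caller chooses the bin edges, and the function only returns the statistic and the degrees of freedom. Comparing them with a chi-square critical value, from tables or a statistics library, is left to the user.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative probability of a normal distribution (math.erf handles +/-inf)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def chi_square_normality(values, edges):
    """Chi-square goodness-of-fit statistic against a normal fitted to `values`.

    edges: sorted interior bin boundaries. The first and last bins are open-ended,
    so every value is counted and the expected probabilities sum to 1.
    Returns (statistic, degrees_of_freedom).
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    bounds = [-math.inf, *edges, math.inf]
    observed = [0] * (len(bounds) - 1)
    for x in values:
        for i in range(len(observed)):
            if bounds[i] <= x < bounds[i + 1]:
                observed[i] += 1
                break
    statistic = 0.0
    for i, obs in enumerate(observed):
        p = normal_cdf(bounds[i + 1], mean, std) - normal_cdf(bounds[i], mean, std)
        expected = n * p
        statistic += (obs - expected) ** 2 / expected
    # k bins, minus 1 for the fixed total, minus 2 for the estimated mean and std.
    dof = len(observed) - 3
    return statistic, dof
```

A common rule of thumb is that every expected count should be at least about 5 for the chi-square approximation to hold. Bins should therefore be widened or merged when samples are small.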

NVerge

305 post(s)
#08-Dec-07 02:30

Here are the above functions (except histogram skew) for an image or arrayed dataset.

Attachments:
Correlation coefficient.pdf
Standard deviation.pdf
Standard variation.pdf
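For readers without the attachments, here is a short Python sketch of the same kind of measures for arrayed data. Each band (or table column) is flattened into a list of values in the same pixel order. The Pearson correlation coefficient is then computed for every pair, giving an upper-triangular correlation table. The sample (n - 1) standard deviation is a choice made here and is not necessarily the convention used in the attached PDFs.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sample_std(xs):
    """Standard deviation with the n - 1 (sample) denominator."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Upper-triangular correlation table (diagonal included) for name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b])
            for i, a in enumerate(names) for b in names[i:]}


# A 2 x 2 image band flattened row by row, compared with a second band.
band1 = [row_value for row in [[10, 20], [30, 40]] for row_value in row]
band2 = [12, 19, 33, 41]
print(correlation_table({"band1": band1, "band2": band2}))
```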

petzlux

982 post(s)
#07-Dec-07 09:06

Just my two cents:

I also would welcome the addition of more (geo)statistical analysis tools to Manifold, even though there is a lot that can already be done using some clever SQL queries.

But there is a need for more complete visualisation of data, following the approach of Geo-Da, to enable Exploratory Data Analysis (EDA). Two components are needed for even the most basic implementation of EDA techniques:

* graphs to be implemented that can display any data (1D, 2D, 3D, 4D?): (bi)histogram, box plot, scatter plot (with regression), conditional plots ...
* the ability to have full dynamic linking and brushing between table data, graphs and geographic display

Exploratory SPATIAL Data Analysis then supplements these techniques with relevant spatial autocorrelation analysis measures and techniques such as:
* Moran's I
* Moran scatterplot: univariate, bivariate, EB corrected
* LISA local Moran: univariate, bivariate, EB corrected
* Spatial Regression
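Global Moran's I, the first item above, is compact enough to sketch directly: I = (n / W) * sum_i sum_j w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. This Python version assumes binary contiguity weights (w_ij = 1 if areas i and j are neighbours, else 0), supplied as a hypothetical neighbour list. Real tools usually also offer row-standardised or distance-based weights and a significance test, which are omitted here.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary (0/1) weights.

    values: attribute value per area, indexed 0..n-1.
    neighbours: dict mapping area index i -> iterable of neighbour indices j.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    w_sum = 0  # W: total of all weights, i.e. number of (i, j) neighbour pairs
    for i, js in neighbours.items():
        for j in js:
            num += dev[i] * dev[j]
            w_sum += 1
    return (n / w_sum) * (num / denom)


# Four areas in a row: 0 - 1 - 2 - 3.
row_neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1, 1, 0, 0], row_neighbours))  # similar values side by side
print(morans_i([1, 0, 1, 0], row_neighbours))  # values alternate
```

With no spatial autocorrelation, the expected value is -1/(n - 1). Values well above it indicate that similar values cluster in space, and values well below it indicate checkerboard-like alternation. In the usage example, clustered values give 1/3 and alternating values give -1.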

You can also have a look at the Geo-Da Homepage and specifically the comprehensive Workbook that shows in detail the different EDA techniques.

Although I am no expert by any means in these techniques, I would find them useful and use them more often if they were readily available in my favourite GIS workhorse!


Check out the Manifold Wiki with SQL and scripting examples at http://www.manipedia.eu/

Spatial Knowledge, my personal blog.

pcardoso


1,452 post(s)
#07-Dec-07 09:10

Good points. A set of associated point pattern analysis tools, covering some basic spatial statistics, would be very valuable.

tskam
139 post(s)
#07-Dec-07 16:56

Hi petzlux,

I strongly support your suggestion of incorporating interactive analytical visualisation tools within the Manifold environment. This is an area of research within the geovisualisation community, but it has yet to be fully accepted by GIS vendors. ESRI has tried to incorporate some of this capability in ArcGIS 9. It would be very much appreciated if the Manifold development team (Dimitri, please!) could give the suggestion some priority. I am currently driving the idea of using GIS to enhance business intelligence. If Manifold can provide this capability, it will definitely help in promoting Manifold to the business community.

Besides GeoDa, there are several very useful examples to refer to:

GeoVista Studio especially ESTAT (http://www.geovista.psu.edu/ESTAT/index.html)

Mondrian (http://stats.math.uni-augsburg.de/Mondrian/)

MANET (http://stats.math.uni-augsburg.de/Manet/)

cdv (http://www.soi.city.ac.uk/~jad7/cdv/)

If Manifold is keen, I am more than happy to provide as much technical input as I can, since I have supervised PhD research on designing geovisualisation tools using Tcl/Tk, and one of my current research interests is integrating R spatial statistical and graphical functions within Manifold (unfortunately, the links are static).

NVerge

305 post(s)
#08-Dec-07 02:44

There are a lot of powerful methods developed for the classification of remote sensing data, that can be extended to quantitative information (numeric) generally.

A very good book on the subject is:

Richards, J.A. & Jia Xiuping (2006). Remote Sensing Digital Image Analysis: An Introduction. Springer.
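As a small illustration of carrying a remote sensing classifier over to general numeric attributes, here is a Python sketch of minimum-distance-to-means classification, one of the simplest supervised methods. Each class is summarised by the mean of its training vectors, and a new object goes to the class whose mean is nearest in i-space. The class names and training values are invented. Maximum likelihood classification refines this approach by also using each class's covariance.

```python
import math


def class_means(training):
    """training: class name -> list of feature vectors. Returns class name -> mean vector."""
    means = {}
    for name, vectors in training.items():
        dims = len(vectors[0])
        means[name] = [sum(v[k] for v in vectors) / len(vectors) for k in range(dims)]
    return means


def min_distance_classify(vector, means):
    """Assign vector to the class whose mean is nearest (Euclidean distance)."""
    return min(means, key=lambda name: math.dist(vector, means[name]))


means = class_means({
    "water": [[10, 5], [12, 6]],
    "forest": [[40, 60], [42, 58]],
})
print(min_distance_classify([15, 8], means))
```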

mulliken

159 post(s)
#07-Dec-07 06:34

This makes sense to me. As a tiny start on this, I would suggest a histogram tool. One was developed for 5.5 a while back and discussed in this thread: http://69.17.46.171/Site/Thread.aspx?id=1687&ti=632695395670000000, but the link got lost in translation. I have a copy, but it does not seem to run in 8.0. In any event, this tool should be more than an add-in.

It's true there are not a lot of types of plots, but the simple bar plot we have available needs many more options. A bar plot display of the histogram would work for me.

NVerge

305 post(s)
#08-Dec-07 02:39

There is already the capability to produce histograms in Manifold; see Charts in the Help. It is rather rudimentary, however, and is little more than a utility to produce a histogram and format its appearance.

mulliken

159 post(s)
#08-Dec-07 11:21

I could use a rudimentary version, but can't find it under Charts in Help. The only histograms I can find are for images and surfaces. Can you suggest a search that would turn it up? Thanks.

NVerge

305 post(s)
#08-Dec-07 12:33

http://www.manifold.net/doc/charts.htm

mdsumner


3,617 post(s)
#07-Dec-07 17:30

FWIW, and for anyone interested in using R, here are some ravings.

Qualification: I'm not really a user of "statistics" anymore, but I have a reasonable understanding of applied, frequentist statistics - basically linear models - from university days. These days I use a small subset of very specifically applied Bayesian stats using MCMC, and my "stats knowledge" is if anything, diminishing. Because of this I have a thoroughly undeserved reputation as "a stats guy", and I just want to put that into context.

But I know R pretty well, and it already can be tightly integrated into a workflow with Manifold. I don't tend to do it much as I don't need to, but there are numerous ways of very easily transferring data between them. Why would I want to do that? In R you can specify linear models terribly easily via formulas:

If "d" is my data (analogous to a table in Manifold):

reg <- lm(y ~ x, data = d)

fits a basic linear regression of y on x.

Formulas can specify any linear model you want, and on a similar foundation you'll find practically any statistical formulation in R somewhere.
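For reference, the simplest such formula, lm(y ~ x), fits y = intercept + slope * x by ordinary least squares. This plain Python sketch computes the same two coefficients from their closed-form expressions. R also reports standard errors, residuals and much more, and this sketch does not attempt any of that.

```python
def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope), the two coefficients lm(y ~ x) estimates.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


print(simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```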

Did someone say "spatial": http://cran.r-project.org/src/contrib/Views/Spatial.html

Oh yeah, and R has an enormous and spectacular capacity for plots. Check this out: R Screenshots

How do I get a Manifold table into R?

1. Copy and paste via the clipboard:

d <- read.delim("clipboard") ## will faithfully read data from a table/selection copied in Manifold

2. External text file

d <- read.csv("C:/temp/file.csv")

3. Straight from a table in a map file.

Using the function defined here, and the RODBC package

library(RODBC)

## open a connection to the file

ch <- odbcConnectManifold("C:/temp/world.map")  

## pass a Manifold query to the connection

d <- sqlQuery(ch, "SELECT [Longitude (I)] AS lon, [Latitude (I)] AS lat,
             [Capital] AS name FROM [Countries] WHERE [Area (I)] > 150;")

What other mechanisms are there? Package rgdal (very easily installed using install.packages("rgdal") ) can read directly from shapefiles, GeoTIFFs, and all the rest. Potentially you can use the ODBC driver in GDAL to read directly from .map files, certainly you could via a datasource, but IMO it's too much mucking around when files will do.

Here are the GDAL formats for raster and vector. Databases obviously provide a far better common storage method for both R and Manifold, but I've simply not had access to the equipment I need to explore that.

There are other routes, that I can't think of right now, as well.

As far as I'm concerned, there is no need to more tightly couple R and Manifold: you just need familiarity with both. The topic of Manifold providing more "statistics" support is another question. Tighter coupling? You can run R from scripts in Manifold, and that is probably the best way to automate things, but you need pre-defined R scripts, and you just spawn (from VBScript, C#, or even the ActiveX languages) RScript.exe, and you can transfer data via files.
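The file-based route described above can be sketched as follows. This Python version writes a table to CSV, builds an `Rscript` command line, and runs it. The R script name is hypothetical, and so is the convention that the script takes the input and output CSV paths as its two arguments. On the R side, such a script would read those paths with `commandArgs(trailingOnly = TRUE)`.

```python
import csv
import io
import subprocess
from pathlib import Path


def table_to_csv_text(header, rows):
    """Serialise a header plus rows of values as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def rscript_command(r_script, in_csv, out_csv, rscript="Rscript"):
    """Command line that runs r_script with the input and output CSV paths as arguments."""
    return [rscript, str(r_script), str(in_csv), str(out_csv)]


def run_r_on_table(header, rows, r_script, workdir, rscript="Rscript"):
    """Write the table, run R on it, and return the path of the CSV that R wrote.

    Assumes (hypothetically) that r_script looks something like:
        args <- commandArgs(trailingOnly = TRUE)
        d <- read.csv(args[1])
        result <- ...   # hypothetical analysis of d
        write.csv(result, args[2], row.names = FALSE)
    """
    workdir = Path(workdir)
    in_csv = workdir / "input.csv"
    out_csv = workdir / "output.csv"
    # newline="" lets the csv module's own line endings pass through unchanged.
    with open(in_csv, "w", newline="") as f:
        f.write(table_to_csv_text(header, rows))
    # check=True raises CalledProcessError if R exits with an error.
    subprocess.run(rscript_command(r_script, in_csv, out_csv, rscript), check=True)
    return out_csv
```

Keeping the CSV building and the command building in separate functions makes it easy to inspect exactly what will be handed to R before anything is run.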

Very tight coupling, at the C# level, can be done for the underlying C libraries in R, but it involves laborious and tedious explication of each and every variable via marshalling (that's my understanding, thanks to a good friend and some very minor tinkering). If tighter integration is to occur it would involve something along those lines, or a rewrite of R in .NET. I get the impression that looser coupling and powerful machines these days make either of those massive projects pretty moot. You can simply do most things with a bit of R and Manifold knowledge.

tskam
139 post(s)
#08-Dec-07 01:31

A group of my students has recently completed excellent work creating applications that integrate R spatial statistical functions within Manifold. The spatial statistical functions are kernel density estimation, the L function, the D function and the bivariate L function. With these applications, Manifold users can now perform spatial point pattern analysis within Manifold. I have enclosed examples of the interfaces. They are all created using Manifold forms and written in VBScript. The output can be either raster data (for kernel density estimation) or a JPG image.

Attachments:
GUI_Dfunction.jpg
GUI_KernelDensity.jpg
GUI_Lfunction.jpg
KDE.jpg
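For readers who want to see what the kernel density output represents, here is a minimal Python sketch of a Gaussian kernel density surface evaluated at raster cell centres. It is not the students' R-based implementation. The Gaussian kernel and the fixed bandwidth are assumptions, and in practice the choice of bandwidth affects the result far more than the kernel shape.

```python
import math


def kernel_density_grid(points, bandwidth, xmin, ymin, cell_size, ncols, nrows):
    """Gaussian kernel density estimate at the centre of each raster cell.

    Returns a list of nrows rows (row 0 nearest ymin), each holding ncols densities.
    The surface integrates to about 1 over a grid that covers the points generously.
    """
    h2 = bandwidth ** 2
    norm = 1.0 / (2.0 * math.pi * h2 * len(points))
    grid = []
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell_size
        row = []
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell_size
            total = 0.0
            for px, py in points:
                d2 = (cx - px) ** 2 + (cy - py) ** 2
                total += math.exp(-d2 / (2.0 * h2))
            row.append(total * norm)
        grid.append(row)
    return grid
```

Multiplying every cell by the number of points turns this probability density into an intensity surface (expected points per unit area). The triple loop is fine for a sketch, but it becomes slow for large grids and many points.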

petzlux

982 post(s)
#08-Dec-07 02:00

tskam,

That looks awesome!! Any chance of sharing the scripts ? I would be specifically interested in the Kernel Density functionality, as this interpolation function is sadly missing from Manifold !



mdsumner


3,617 post(s)
#08-Dec-07 16:04

Nice one. Do you use RScript?

petzlux

982 post(s)
#10-Mar-08 09:12

Just wanted to chase this up again, do you have any plans of making these tools available? I would be interested specifically in the Kernel Density tool. If you want to discuss with me offline, my email is p.weber@ucl.ac.uk



pcardoso


1,452 post(s)
#10-Mar-08 11:01

Yes, it would be nice to know how you got R running inside Manifold. Alternatively, are you calling R and using its outputs to recreate a Drawing/Surface?

NVerge

305 post(s)
#09-Dec-07 06:38

Perhaps the biggest obstacle to performing statistical data analysis, and especially when doing it on arrayed information sets and images, is the scale of the number crunching it requires.

A few simple examples to show this:

A typical Landsat 7 band has dimensions of the order of 8000x8000 pixels (or, if it has been resolution enhanced, of the order of 16000x16000 pixels). A simple pixel value frequency analysis requires processing 64 million pixels (256 million if enhanced) and determining how many there are of each value.
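One common way to do such a single-band count is one pass over the pixels, with one counter per possible value. Here is a minimal Python sketch for integer bands. The bit depth parameter is an assumption: 8 bits gives 256 counters.

```python
def value_frequencies(pixels, bit_depth=8):
    """Count how many pixels take each possible integer value 0 .. 2**bit_depth - 1."""
    counts = [0] * (1 << bit_depth)
    for v in pixels:
        counts[v] += 1
    return counts


counts = value_frequencies([0, 255, 255, 3])
print(counts[255])  # 2
```

The work grows linearly with the number of pixels. Pure Python is slow for tens of millions of pixels, so in practice the same loop would run in compiled code or a vectorised library, but the algorithm is unchanged.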

Now suppose we wish to perform a bivariate frequency analysis for two Landsat 7 bands. By this I mean we wish to know how many pixels there are with each unique combination of values. Landsat data is 8-bit, meaning a pixel can have an integer value between 0 and 255, so one of 256 possible values. For two bands that can each take N possible values, the number of possible value combinations is N × N = N², which quickly becomes a rather large number: for two 8-bit bands it is 256 × 256 = 65,536. So for a pan-sharpened Landsat 7 image, 256 million pixels would have to be sorted and the frequency of each pixel value combination calculated.
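The same one-pass counting idea extends to two co-registered bands. Each pixel's two values are combined into one index, a × N + b, and counted into a table of N × N cells (65,536 cells for 8-bit data). This minimal Python sketch assumes that both bands are supplied as equally long sequences of integers in the same pixel order.

```python
def bivariate_frequencies(band_a, band_b, bit_depth=8):
    """Count each (a, b) value pair for two co-registered integer bands.

    Uses a flat table of N * N counters (N = 2**bit_depth) indexed by a * N + b.
    """
    n_values = 1 << bit_depth
    counts = [0] * (n_values * n_values)
    for a, b in zip(band_a, band_b):
        counts[a * n_values + b] += 1
    return counts


def pair_count(counts, a, b, bit_depth=8):
    """Look up how many pixels have value a in band A and value b in band B."""
    return counts[a * (1 << bit_depth) + b]
```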

Now, some remote sensing data is 32-bit per channel, and most other arrayed data will be in 32-bit floating point form. N is therefore 2^32, which gives roughly 4.3 billion possible values for a pixel to have, and (2^32)² ≈ 1.8 × 10^19 possible value combinations for two such bands.

If one wanted to perform a bivariate or multivariate frequency analysis of 32-bit data fields it would probably take longer than our natural lives to complete.

To avoid such ridiculously large number crunches, there needs to be some form of sampling strategy to obtain a representative sample of field values when working with arrayed datasets or any dataset comprising a very large number of samples and/or fields.
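One simple strategy is to take a simple random sample of pixel positions and scale the counts back up to the full image. This Python sketch assumes a flat sequence of integer pixel values. Systematic sampling (every k-th pixel) and stratified sampling are common alternatives, and the sample size needed depends on how rare the values of interest are.

```python
import random


def sampled_frequencies(pixels, sample_size, bit_depth=8, seed=None):
    """Estimate full-image value frequencies from a simple random sample of pixels.

    Counts in the sample are scaled by len(pixels) / sample_size.
    """
    rng = random.Random(seed)
    # Sampling from a range avoids building a list of all pixel indices.
    indices = rng.sample(range(len(pixels)), sample_size)
    counts = [0] * (1 << bit_depth)
    for i in indices:
        counts[pixels[i]] += 1
    scale = len(pixels) / sample_size
    return [c * scale for c in counts]
```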

adaptagis

480 post(s)
#09-Dec-07 09:04

Let's focus first on the doable things that most of us need. As mentioned, the correlation tool is already listed somewhere but not working. So if we can put together a few analysis tool wishes for Christmas and send them all to Santa Claus and to the Manifold sales team, we can use our impact as a group on this; otherwise it will perish again in the sales department. Here are my suggestions:

- correlation analysis for multiple attributes: this would run through the table and provide a correlation table

attribute   a     b     c
a           1     0.3   0.5
b                 1     0.2

- the median in the table as an active column statement (as in Oracle)

- ideally a regression analysis (see GeoDa)

- a built-in K function

- further cluster analysis, such as the p function, r function, etc.
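The K function in that list, and the L function from the earlier student tools, can be sketched naively in Python. This version has no edge correction, so it underestimates K near the boundary of the study area. It also uses the n(n - 1) normalisation, whereas some references divide by n² instead. `area` is the study area in the same squared units as the coordinates.

```python
import math


def ripley_k(points, d, area):
    """Naive Ripley's K at distance d (no edge correction)."""
    n = len(points)
    count = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= d:
                count += 1
    return area * count / (n * (n - 1))


def ripley_l(points, d, area):
    """Besag's L transform: sqrt(K / pi), which is roughly d under spatial randomness."""
    return math.sqrt(ripley_k(points, d, area) / math.pi)


corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(ripley_k(corners, 1.0, 1.0), ripley_l(corners, 1.0, 1.0))
```

Under complete spatial randomness, K(d) is about πd² and L(d) is about d. An L(d) clearly above d suggests clustering at that distance, and one clearly below d suggests regular spacing.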

So let's group together and make a statement! I hope we can discuss the matter further at the meeting in Phoenix.

NVerge

305 post(s)
#10-Dec-07 02:56

"So if we can put together a few analysis tool wishes for Christmas and send them all to Santa Claus and to the Manifold sales team, we can use our impact as a group on this; otherwise it will perish again in the sales department."

There is very much more that needs to be done and provided than you suggest. i-space statistical and non-statistical analysis is a very large subject area.

The absolutely worst thing that could be done is to be hasty and superficial and to try and come up with suggestions off the top of your head. It needs a lot of researching to find out how things are done, best methods, and the hottest and latest ideas.

If users want versatile and comprehensive statistical and non-statistical i-space analytical, classification and depiction capabilities they should do the research and put together a well-thought through and comprehensive set of suggestions from the scientific principles and math right through to how it might be provided and controlled via the GUI. Consider it a one-time opportunity, to be done properly. I am at one with Paulo (Pcardoso) on this.

I suggest you aim to have a comprehensive ensemble of suggestions and background information ready to present to Manifold.net this time next year, for inclusion in release 10. From my own experience, don't underestimate how long this will take. If it takes longer to do, so be it. If a piece of work is not done well, better it is not done at all.

mdsumner


3,617 post(s)
#10-Dec-07 04:27

http://www.manifold.net/info/suggestions.shtml

gxdata

745 post(s)
#14-Oct-08 05:42

You use RScript.exe exclusively? Is R.NET of little interest?


~Ian Thomas
mdsumner


3,617 post(s)
#14-Oct-08 06:21

Hey we can't all be cutting edge - I've never heard of R.NET - just day dreamed about it. Have you used it?

Lately I've been using Put/GetSymbol() and Evaluate() in StatConnector from C#.

gxdata

745 post(s)
#14-Oct-08 07:14

No/yes, it was a buggy GUI wrapper when I tried it a while back. I will look it up in the next 14 minutes (significant time period) and post the URL, if you haven't seen the project.

That was easy - the R.NET project belongs to Mike Rykman, here.


gxdata

745 post(s)
#14-Oct-08 08:43

I see that there has been movement at the station (since I last messed about with R), for the word had passed around that R really needed a GUI. See here, collated by Philippe Grosjean.

The only project website I have visited (tonight) is Sciviews-R which last had activity in June 2008.


mdsumner


3,617 post(s)
#15-Oct-08 18:42

Thanks Ian, there are a whole bunch of GUIs for various projects. You can find them from the main R site under "Related projects". It's not really .NET is it? Unfortunately . . . but it's worth checking out to see how people approach this (I'm pretty sure R.NET just uses StatConnector basically)

Copyright (C) 2007-2008 Manifold.net. All rights reserved.