georeference.org
Subscribe to this thread
Home - General / All posts - Distance to Line
mtreglia21 post(s)
#09-Apr-09 08:12

Dear All,

I am trying to calculate/tabulate distance from polygons (wetlands), either from edge or centroid, to lines (streams) of different orders. My end goal is a table with a column with wetland and then a column for distance to nearest stream for every order. I can probably do SQL well enough to get the table I want as a result, but can't remember for the life of me how to calculate distance to a line (nearest part of the line).

Any suggestions?

Best,

Mike


Michael L. Treglia

Graduate Student

Wildlife and Fisheries Sciences

Texas A&M University

http://people.tamu.edu/~mlt35

tjhb

3,494 post(s)
#09-Apr-09 13:09

See the Spatial Extensions topic in the manual. Distance() and DistanceEarth() both do what you want. They measure between two metrics, from nearest location to nearest location.

Location here means any location, not necessarily a coordinate, along the boundary of a line or area, or inside an area, or of course a point.

If you especially want to measure distance to a centroid, you must first derive a centroid. If, for an area, you want to exclude measurement to contained locations you must derive the boundary.

mtreglia21 post(s)
#09-Apr-09 13:26

Thank you TJB.

I've been trying all day to get this to work and don't think I'm very far off, but I haven't been successful.I will note that I'm far from an expert on SQL and something could be very wrong in my code. (I've added a column "Order1" to "Wetland Centroids Inner Table". I want to find the minimum distance between all points in "Wetland Centroids Inner" and the nearest stream in "Streams_dvdc ARC Drawing" where order ("Strahler") =1 and put them in the table [Wetland Centroids Inner Table].[Order1].

With the following code I get this error: "Unknown column [Streams_drvdc ARC Drawing.STRAHLER].

UPDATE [Wetland Centroids Inner Table] SET [Order1]=

MIN(Distance([Streams_drvdc ARC Drawing].[id],[Wetland Centroids Inner].[id],

"m")) from [Streams_drvdc ARC Drawing],[Wetland Centroids Inner] where [Streams_drvdc ARC Drawing].[STRAHLER] = 1

Order by [Wetland Centroids Inner].[id]

I would appreciate any help you can offer on this

Best,

Mike


Michael L. Treglia

Graduate Student

Wildlife and Fisheries Sciences

Texas A&M University

http://people.tamu.edu/~mlt35

tjhb

3,494 post(s)
#09-Apr-09 13:33

Yes the SQL structure is not right. Try reformatting the code more logically (line breaks and indentation are free) and you will see the error better. Ask, is this an UPDATE query or a SELECT query? At the moment you have a mix.

Tim

RonHendrickson
190 post(s)
#09-Apr-09 14:18

Try updating the results of an inner join on the Wetlands Centroids Inner and the Streams .etc. tables. If you are still frustrated, you could attach a simplified map with the drawings and tables you have and perhaps someone can help you better.

mtreglia21 post(s)
#09-Apr-09 14:36

I have attached the .map file. I left the queries that I was trying to use (there was one Update and one Select query). Essentially I am trying to find the shortest distance between each wetland and the nearest stream of each order (Strahler) so I can determine whether there are strong associations between stream order and wetlands.

Again, I really appreciate your help. Any suggestions on how to get better with (spatial) SQL? I've mostly been trying to brute-force my way through it, but with very little success.

Thank you,

Mike

Attachments:
MLT_Wetlands.map


Michael L. Treglia

Graduate Student

Wildlife and Fisheries Sciences

Texas A&M University

http://people.tamu.edu/~mlt35

tjhb

3,494 post(s)
#09-Apr-09 14:44

Yes, keep trying to brute-force your way through it and don't give up. But also:

Always start from the inside out. (Search on "onion" on the forum.)

Format your code, neatly, and in the way you expect the SQL engine to parse it.

When something doesn't work as expected, never just say "oh well" and leave it alone. It's always worth working out your mistake. (It's much harder to learn from what goes right.)

Whenever you fix a mistake, change the formatting to reflect the correction. I.e. to reflect the way you now understand the SQL engine to read your code.

That way, over time, your formatting will become more and more closely aligned with how an SQL engine actually "thinks". That works, because how you format your code on the screen will also be how you visualise it in your head, i.e., the way you think.

[Edited a bit.]

tjhb

3,494 post(s)
#09-Apr-09 14:34

Here is the original code, reformatted.

UPDATE [Wetland Centroids Inner Table]

SET [Order1] =

    MIN(Distance([Streams_drvdc ARC Drawing].[id][Wetland Centroids Inner].[id]"m"))

FROM

    [Streams_drvdc ARC Drawing][Wetland Centroids Inner] 

WHERE [Streams_drvdc ARC Drawing].[STRAHLER] = 1

ORDER BY [Wetland Centroids Inner].[id]

The expression after the "=" is ungrammatical. Here it is alone:

MIN(Distance([Streams_drvdc ARC Drawing].[id], [Wetland Centroids Inner].[id], "m"))

FROM

    [Streams_drvdc ARC Drawing], [Wetland Centroids Inner] 

WHERE [Streams_drvdc ARC Drawing].[STRAHLER] = 1

ORDER BY [Wetland Centroids Inner].[id]

The "obvious" fix is to stick a SELECT in front of it and enclose it in brackets, to make a valid subquery.

... =

(SELECT MIN(Distance([Streams_drvdc ARC Drawing].[id], [Wetland Centroids Inner].[id], "m"))

FROM

    [Streams_drvdc ARC Drawing], [Wetland Centroids Inner] 

WHERE [Streams_drvdc ARC Drawing].[STRAHLER] = 1

ORDER BY [Wetland Centroids Inner].[id]

)

But that's not very helpful. On the one hand, a subquery can be used here, that is, after the "=" sign in a SET clause, provided it returns just one value. It's not always efficient to use a correlated subquery in this way, but in principle it will work. And this subquery will return just one value, so that's good.

What's not helpful is that it will return the same value every time it's called, i.e. for every row of the target table, [Wetland Centroids Inner Table]. That's clearly not what's wanted.

So. The first step when writing an UPDATE query (unless it has, say, one line) is always to write a SELECT statement that correctly returns the target values alongside the source values in a simple table.

mtreglia21 post(s)
#09-Apr-09 14:37

TJHB,

Thank you very much! I'll give that a try

Best,

Mike


Michael L. Treglia

Graduate Student

Wildlife and Fisheries Sciences

Texas A&M University

http://people.tamu.edu/~mlt35

tjhb

3,494 post(s)
#09-Apr-09 15:25

Essentially I am trying to find the shortest distance between each wetland and the nearest stream of each order (Strahler) so I can determine whether there are strong associations between stream order and wetlands.

You know exactly where you want to end up, and have put it into the clearest possible words, which is brilliant. So now imagine the result table. Here are the rows:

each wetland [one row per wetland]

and here are the columns

each Stahler order in streams [one column per order]

At present, all the orders in the streams table are listed in the "row" axis, i.e. in one column. So as a first step, you'll have to pivot them into the "column" axis, i.e. one column for each order.

How to do this?

You can use this query to list the different orders. These will become (interim) column names.

SELECT DISTINCT [Strahler] FROM [Streams_drvdc ARC Drawing]

Now you can use this list to pivot the original (same) table. For this the tools are a CROSS JOIN and a CASE expression.

(More to come.)

tjhb

3,494 post(s)
#09-Apr-09 16:44

Actually that was not quite the way to do it. With a CASE expression we need to name the result columns explicitly--1, 2, 3, etc., as below--so there is no real advantage in using a SELECT DISTINCT subquery with CASE.

Here are two almost equivalent ways to pivot the Strahler order values into separate columns.

The SELECT version is probably easier to understand, but you do need to name a fixed number of result columns.

The TRANSFORM version is harder to digest, but more flexible, because it will take account automatically of any number of classes. On the other hand, it can't be incorporated as a subquery. And in any case, we're going to have to use fixed column names when this result is used in the next layer outwards.

The result table of both queries is identical. (I've attached a screenshot.)

(1)

SELECT 

    [ID][Geom (I)],

    CASE [STRAHLER]

        WHEN 1 THEN [STRAHLER]

        ELSE NULL

    END AS [1],

    CASE [STRAHLER]

        WHEN 2 THEN [STRAHLER]

        ELSE NULL

    END AS [2],

    CASE [STRAHLER]

        WHEN 3 THEN [STRAHLER]

        ELSE NULL

    END AS [3],

    CASE [STRAHLER]

        WHEN 4 THEN [STRAHLER]

        ELSE NULL

    END AS [4],

    CASE [STRAHLER]

        WHEN 5 THEN [STRAHLER]

        ELSE NULL

    END AS [5],

    CASE [STRAHLER]

        WHEN 6 THEN [STRAHLER]

        ELSE NULL

    END AS [6]

FROM [Streams_drvdc ARC Drawing];

(2)

TRANSFORM 

    MIN([STRAHLER]-- Note 1

        SELECT [ID][Geom (I)] -- Note 2

        FROM [Streams_drvdc ARC Drawing]

        GROUP BY [ID][Geom (I)]

    PIVOT [STRAHLER] -- Note 3

;

-- Notes

-- 1. Where the aggregated expression is identical to the PIVOT expression, 

-- any of MIN(), MAX(), FIRST(), LAST(), AVG() can be used

-- 2. Where the returned columns match the grouping columns, a SELECT ... GROUP BY ... query 

-- is equivalent to a SELECT DISTINCT ... query on the same columns without grouping

-- 3. The expression used for "regrouping", i.e. to create column names

I've added notes to the TRANSFORM query, mostly for my benefit. They may or may not help others. If they don't, please ignore them. I find TRANSFORM queries difficult.

Attachments:
Pivot results.png

tjhb

3,494 post(s)
#09-Apr-09 16:50

It now occurs to me that it would be better (and simpler) to pivot at the end, i.e. after measuring the distances.

Here's the distance-measuring part.

SELECT

    [W].[ID]

    [S].[STRAHLER],

    MIN(Distance([W].[Geom (I)][S].[Geom (I)])) AS [D]

FROM 

    [Wetland Centroids Inner] AS [W]

    CROSS JOIN

    [Streams_drvdc ARC Drawing] AS [S]

GROUP BY [W].[ID][S].[STRAHLER]

tjhb

3,494 post(s)
#09-Apr-09 17:19

Now turn this into a TRANSFORM query, to pivot the results by Strahler order. (This is not so easy I think.)

TRANSFORM 

    MIN(Distance([W].[Geom (I)][S].[Geom (I)]))

    SELECT

        [W].[ID][W].[Geom (I)]

    FROM 

        [Wetland Centroids Inner] AS [W]

        CROSS JOIN

        [Streams_drvdc ARC Drawing] AS [S]

    GROUP BY [W].[ID][W].[Geom (I)]

PIVOT [STRAHLER];

[P.s. I have followed my own advice and changed the way I format a TRANSFORM query, because I understand how they work a bit better after writing this one.]

mtreglia21 post(s)
#09-Apr-09 20:39

TJHB,

Thank you very much for your time. I'll try this in just a little bit, and can't wait to see it work.

Best,

Mike


Michael L. Treglia

Graduate Student

Wildlife and Fisheries Sciences

Texas A&M University

http://people.tamu.edu/~mlt35

mtreglia21 post(s)
#10-Apr-09 07:59

It works perfect- thanks again!


Michael L. Treglia

Graduate Student

Wildlife and Fisheries Sciences

Texas A&M University

http://people.tamu.edu/~mlt35

tjhb

3,494 post(s)
#10-Apr-09 08:03

Here's a SELECT-only version of the same thing, i.e. to pivot the distance-measuring query by Strahler order. The number and name of the result columns must be specified explicitly here (the TRANSFORM version has the advantage on that score).

SELECT

    [ID][Geom (I)],

    FIRST(IIf([STRAHLER] = 1, [D]NULL)) AS [1],

    FIRST(IIf([STRAHLER] = 2, [D]NULL)) AS [2],

    FIRST(IIf([STRAHLER] = 3, [D]NULL)) AS [3],

    FIRST(IIf([STRAHLER] = 4, [D]NULL)) AS [4],

    FIRST(IIf([STRAHLER] = 5, [D]NULL)) AS [5],

    FIRST(IIf([STRAHLER] = 6, [D]NULL)) AS [6],

    FIRST(IIf([STRAHLER] = 7, [D]NULL)) AS [7]

FROM

    (SELECT

        [W].[ID][W].[Geom (I)],

        [S].[STRAHLER],

        MIN(Distance([W].[Geom (I)][S].[Geom (I)])) AS [D]

    FROM 

        [Wetland Centroids Inner] AS [W]

        CROSS JOIN

        [Streams_drvdc ARC Drawing] AS [S]

    GROUP BY [W].[ID][W].[Geom (I)][S].[STRAHLER]

    )

GROUP BY [ID][Geom (I)];

So far, then, either version produces the result table that was needed. Getting the results into the original table requires an UPDATE query. The SELECT query can be "wrapped" (using an extra join), or its result table can be included by reference. The result of the TRANSFORM query must be specified by reference.

Mike, I'm learning quite a bit too. There's been a certain amount of thinking out loud on my part, which I hope no one minds. Any questions about how this stuff operates would be welcome since they'd make me think differently about it.

mtreglia21 post(s)
#11-Apr-09 13:31

Thanks. Glad that my inexperience with SQL is helping you too :)

I see what you are doing for the most part so it is definitely helping me figure out what to do in future instances of this.

Best,

Mike


Michael L. Treglia

Graduate Student

Wildlife and Fisheries Sciences

Texas A&M University

http://people.tamu.edu/~mlt35

formako107 post(s)
#18-Sep-09 03:57

Hi! Sorry for the hijack, but I'm stuck with a similar, but clearly not so complex query.

I have two Drawings: [Lake_Polygon] and [Stream_Line]. I want to calculate the distance (meters) from each [Lake_Polygon] (edge) to the closest [Stream_Line]. Then I want this value to be added to the column "Distance". I don't need any other information to be added, just how far is the closest stream to a given lake.

SELECT

[W].[ID],

[S].[ID],

MIN(Distance([W].[Geom (I)],

[S].[Geom (I)])) AS [DISTANCE]

FROM

Lake_Polygon AS [W]

CROSS JOIN

Stream_Line AS [S]

GROUP BY

[W].[ID],

[S].[ID]

Now, I think the end result is a table with the distance from each node of the [Lake_Polygon] to each segment of the [Stream_Line]. Or am I totally wrong here? How do I need to change the query so that I will only get the shortest distances for each polygon?

formako107 post(s)
#18-Sep-09 06:18

OR..

If the resulting table has the following data (for example):

"LAKE_ID";"STREAM_ID";"DISTANCE(m)"

1;2;300

1;3;400

1;4;500

2;2;350

2;3;450

2;4;550

How, with SQL, do I SELECT only those LAKE_ID's and their DISTANCE(m), that have the shortest distance?

I.e.

1;2;300

2;2;350

Many thanks in advance!

tjhb

3,494 post(s)
#18-Sep-09 15:39

You're doing it exactly right—except also a bit wrong.

Distance() is the right tool for the job. It gives minimum distance, not just coordinate-to-coordinate, node-to-segment, but the distance between the nearest locations in or on the given objects. It's convenient and fast.

What you're doing wrong is in the grouping.

Go back to your purpose (clearly put, always a good sign): you want to calculate "the distance (meters) from each [Lake_Polygon] edge to the closest [Stream] line".

The key words are from each... and to the closest...

From each (or for each) shows the columns to be used for grouping. So this column, or these columns if more than one, go in the GROUP BY (and can be listed in the SELECT part of the statement).

To the closest (or some other expression that implies aggregation and comparison) shows the columns to be grouped. This column or these columns don't go in the GROUP BY, and as a corollary, can't be listed in the SELECT part of the statement.

So as you see, the short version for the present case is: take [S].[ID] out in both places and you're done.

formako107 post(s)
#20-Sep-09 23:21

Tim,

Thanks for taking the time to run through the query for me! Makes sense now, even to me. One additional question. How do I add this new column, Distance, to the table of the [Lake_Polygon] Drawing? Can it be done/included in the SQl query or do I need to join the field by other means in Manifold?

formako107 post(s)
#20-Sep-09 23:40

I wonder what's going on? Manifold seems to get stuck when I run this query:

SELECT

[W].[ID],

MIN(Distance([W].[Geom (I)],

[S].[Geom (I)])) AS [DISTANCE]

FROM

Lake_Polygon AS [W]

CROSS JOIN

Stream_Line AS [S]

GROUP BY

[W].[ID]

formako107 post(s)
#22-Sep-09 01:57

Any ideas how I could do this some other way?

tjhb

3,494 post(s)
#22-Sep-09 02:15

What do you mean by "Manifold seems to get stuck"? Does it hang, or just seem unlikely ever to finish, or what?

If you have a lot of objects then the problem is likely to be the CROSS JOIN. It would be better to turn it into a filtered join, for example using a search radius.

But to take this further some sample of the data, and an idea of its scale, would help.

formako107 post(s)
#22-Sep-09 02:33

Sure, sorry for being vague. It seems to unlikely ever to finish:

When running the query Joining Data goes OK, Handling 'Group By' goes to 99% and then Manifold just keeps flicking the table. I think it's doing something as it consumes CPU.

Lake_Polygon has 1000 features and Stream_Line 60 000 features. The geographical extent of these Drawings is approx. 1000km x 300km.

I hope this helps!

tjhb

3,494 post(s)
#22-Sep-09 03:16

Yes it does. What you need to work out (it can be very rough) is the minimum distance from any lake to the nearest stream. Factor in a safety margin, erring on the high side by some whole number factor. (Anything from 2 to 10, depending on how sure you are that you've checked enough samples. The sampling can also be done by code.)

formako107 post(s)
#22-Sep-09 03:23

The minimum distance from any lake to the nearest stream would be 0m as some of the streams "run through" a lake/origin from one.

tjhb

3,494 post(s)
#22-Sep-09 03:30

Ha! Quite right. I should have said the maximum distance from any lake to the nearest stream. Sorry about that.

formako107 post(s)
#22-Sep-09 03:41

Ok. :) The maximum distance from any lake to the nearest stream is, with the margin, 120 000m.

tjhb

3,494 post(s)
#22-Sep-09 03:54

Then try using an UPDATE query like this.

If it doesn't already exist, you first need to create the column [DISTANCE] in the [Lake_Polygon] drawing. Give it type Floating Point (double) unless you have a better idea.

UPDATE

    (SELECT

        [D].[DISTANCE][T].[DISTANCE] AS [D]

    FROM

        [Lake_Polygon] AS [D]

        INNER JOIN

        (SELECT

            [W].[ID],

            MIN(Distance([W].[Geom (I)][S].[Geom (I)])) AS [DISTANCE]

        FROM

            [Lake_Polygon] AS [W]

            LEFT JOIN

            [Stream_Line] AS [S]

            ON Distance([W].[Geom (I)][S].[Geom (I)]) <= 120000

        GROUP BY [W].[ID]

        ) AS [T]

        ON [D].[ID] = [T].[ID]

    )

SET [DISTANCE] = [D];

formako107 post(s)
#22-Sep-09 04:04

Tim, you're an absolute 100% gem! Running [DistanceToLine]: 78.938 sec and [DISTANCE] column was populated by the distance (meters) from each [Lake_Polygon] (edge) to the closest [Stream_Line]. Again, I'm very greatfull for your help and I hope this has helped other SQL-handicapped Manifold users (is there anyone else?) as well.

12 msec Copyright (C) 2007-2008 Manifold.net. All rights reserved.