geoinfo2223:groupb:start

Differences

This shows you the differences between two versions of the page.
geoinfo2223:groupb:start [2023/03/31 22:46] – [Georeferencing 5 stations from the scrape data] sahil001
geoinfo2223:groupb:start [2023/03/31 23:39] (current) – [Webscraping of Water gauge Stations from Emscher Genossenschaft Lippe Verband website] sahil001
Line 1: Line 1:
-======= Geo Informatics Final Project : Group B ======= +//Contributors:// Sindhya Babu, Kiara Meço, and Sahil Chande
-==== M-IE_2.02 Geoinformatics, WS2022/23 ==== +
-** Under supervision of Prof. Rolf Becker ** +
-===== Participants:=====+
  
-** 1- Sindhya Babu - 29928 **+======= Webscraping of Water gauge Stations from Emscher Genossenschaft Lippe Verband website =======
  
-** 2- Kiara Meço   - 32358 ** +{{ :geoinfo2223:groupb:cover.png?700 |}}
- +
-** 3- Sahil Chande.- 29927 **+
  
 ===== Introduction =====
  
-The Emscher Gennosenschaft Lippe Verband provides open raw data about the water level and discharge with daily updated values of the Emscher and Lippe area. The data is updated approximately every 15 minutes and two versions intranet and public versions are published. In our project we have used the open public data. +The Emscher Genossenschaft Lippe Verband provides open raw data on the water level and discharge of the Emscher and Lippe area, with daily updated values. The data is updated approximately every 15 minutes, and two versions are published: an intranet version (for registered users) and a public version. In our project, we have used the open public data.
  
 ===== Project aim =====
-The project aim is to scrape time varying data on Water level and discharge from website of Übersichtskarte Pegelstände Emscher Lippe continuously using python library beautiful soup and also geo pandas and save them to PostgreSQL database. Georeference five pegel stations on the map using QGIS along with plotting all stations precise location on the map. +The aim of the project is to continuously scrape time-varying water level and discharge data from the Übersichtskarte Pegelstände Emscher Lippe website using the Python libraries BeautifulSoup and GeoPandas, and to save the data to a PostgreSQL database. We then georeference five Pegel (water gauge) stations on the map using QGIS and plot the precise locations of the stations on the map.
 [[https://howis.eglv.de/pegel/html/uebersicht_internet.php]]
  
-===== Tools and packaged used =====+===== Tools and packages used =====
  
-  * **python**  +  * **Python**  
-      * For web scraping : BeautifulSoup, pandas, numpy and requests +      * For web scraping: BeautifulSoup, pandas, numpy, and requests. 
-      * For creation of geo data frame: geoPandas, pyproj, shapely.geometry. +      * For the creation of the geo data frame: GeoPandas, pyproj, shapely.geometry. 
-      * For data base connection to PostgreSQL: sqlalchemy, psycopg2+      * For database connection to PostgreSQL: sqlalchemy, psycopg2.
  
   * **PostgreSQL**
       * Database to store data and geometry
  
-  * **Pg Admin 4 and POSTGIS extension for PostgreSQL** +  * **pgAdmin 4 and PostGIS extension for PostgreSQL**
       * UI for easier operations with PostgreSQL
  
Line 34: Line 29:
       * Application used for plotting different graphs, maps and georeferencing the stations to their precise locations.
  
-===== One Time Scraping of Master Data of Gauges ("Pegelstammdaten") ===== +===== One-Time Scraping of Master Data of the gauges ("Pegelstammdaten") ===== 
-The Base data (Stammdaten in German) provided contains the information such as Station number (Pegelnummer), Water body if it is either Lippe or Emscher (Gewässer), River kilometer (Flusskilometer) in km, Level zero-point(Pegelnullpunkt), above sea-level in mNN , total Catchment area (Einzugsgebiet )in km², Easting (Rechtswert) and Northing (Hochwert) Gauss Krüger co-ordinates with Mean High Water level (MHW) in cm, Mean Lowest Water level (MNW) in cm and Medium Water level (MW) in cm for the periods from 2001 to 2010.  In addition, image of the Pegel Station and the map showing the location of the Pegel station is displayed.   +The master data (Stammdaten in German) contains information such as the station number (Pegelnummer), the water body (Gewässer), i.e. whether it is Lippe or Emscher, the river kilometer (Flusskilometer) in km, the gauge zero point (Pegelnullpunkt) above sea level in mNN, the total catchment area (Einzugsgebiet) in km², and the Easting (Rechtswert) and Northing (Hochwert) Gauss-Krüger coordinates, together with the Mean High Water level (MHW), Mean Low Water level (MNW) and Mean Water level (MW) in cm for the period from 2001 to 2010. In addition, an image of the Pegel station and a map showing its location are displayed.
-Firstly, we scrape the text displayed for the Pegel station and also the corresponding map for each station and store it locally. Below image shows an example of the Master data for Station KA Hamm. +First, we scrape the text displayed for each Pegel station, together with the corresponding map, and store both locally. The image below shows an example of the master data for station KA Hamm.
  
 {{:geoinfo2223:groupb:foto1.png?600|}}
Line 45: Line 40:
 [[https://howis.eglv.de/pegel/html/stammdaten_html/MO_StammdatenPegel.php?PIDVal=32]]
  
-To determine the above-mentioned mentioned values for all the Stations, we scrape the website using Python, beautifulSoup package. We loop over 200 PIDVal to get the master data of all the stations possible. +To determine the above-mentioned values for all the stations, we scrape the website using the Python package BeautifulSoup. We loop over 200 PIDVal values to get the master data of every available station. 
-To achieve this, the text stored under the html tags needs to be identified by inspecting the web page. Consider the example of Station KA Hamm, where it can be seen that the master data text is under <div id =” datacontainer” and <tr class=” normtext” html tags. The name of the station is contained in <div id =” popupcontenttitle” and the map image is however stored under the tag <div id =”mapcontainer”  and <img src=. This is extracted with beautiful soup and the same process is repeated for PIDVal from to 200. +To achieve this, the HTML tags that hold the text need to be identified by inspecting the web page. Consider the example of station KA Hamm: the master data text is stored under the //<div id="datacontainer">// and //<tr class="normtext">// tags, the name of the station is contained in //<div id="popupcontenttitle">//, and the map image is stored under //<div id="mapcontainer">// in the //src// attribute of an //<img>// tag. This is shown in Figure 1. 
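The tag lookup described above can be sketched as follows. This is only a minimal illustration and not the project's actual script: the inline HTML is a made-up stand-in that mimics the ids and classes named in the text, the values in it are placeholders, and the helper name parse_station_page is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up sample markup that mimics the ids/classes named in the text.
# The real page layout may differ; all values are placeholders.
SAMPLE_HTML = """
<div id="popupcontenttitle">KA Hamm</div>
<div id="datacontainer">
  <table>
    <tr class="normtext"><td>Pegelnummer</td><td>12345</td></tr>
    <tr class="normtext"><td>Gewässer</td><td>Lippe</td></tr>
  </table>
</div>
<div id="mapcontainer"><img src="maps/example.png"></div>
"""


def parse_station_page(html):
    """Return station name, key/value rows and map image path, or None."""
    soup = BeautifulSoup(html, "html.parser")

    title = soup.find("div", id="popupcontenttitle")
    container = soup.find("div", id="datacontainer")
    if title is None or container is None:
        return None  # e.g. a PIDVal for which no station page exists

    # Each master data row is assumed to hold a label cell and a value cell.
    values = {}
    for row in container.find_all("tr", class_="normtext"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 2:
            values[cells[0]] = cells[1]

    map_div = soup.find("div", id="mapcontainer")
    img = map_div.find("img") if map_div else None
    return {
        "station": title.get_text(strip=True),
        "values": values,
        "map_src": img.get("src") if img else None,
    }


result = parse_station_page(SAMPLE_HTML)
print(result)
```

In a full run, each page would first be downloaded (for example with requests) for one PIDVal and its HTML passed to this helper; returning None for pages without a station makes it easy to skip empty PIDVal values later.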
  
  
Line 55: Line 50:
 ** Figure 2: Inspecting source code to determine html tags to be extracted. **
  
-The data extracted for one station is showed below. The data frame contains two values ‘Station’ and ‘Station Values’. The Station Values column is then split to several columns and renamed and stored as a data frame. + 
 +The data extracted for one station is shown below. The data frame contains two columns, ‘Station’ and ‘Station Values’. The Station Values column is then split into several columns, which are renamed and stored as a new data frame. 
  
 {{:geoinfo2223:groupb:screenshot_2023-03-31_at_11.19.43_am.png?400|}}
Line 61: Line 57:
 ** Figure 3: Python code showing extracting text of station name and values for KA Hamm **
  
-After looping over, we found that several PIDVal contained no data. We drop these rows and now store the new data frame with non-null water stations. The new data frame contains 131 Stations and only 103 stations had geo-coordinates data available as shown below +After looping over all PIDVal values, we found that several contained no data. We drop these empty rows and store the result as a new data frame. The new data frame contains 131 stations, of which only 103 have geo-coordinate data available, as shown in Figure 4.
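A minimal sketch of this clean-up step with pandas is given below. The rows and column names are placeholders invented for illustration (they are not scraped values); only the idea of dropping empty PIDVal rows and counting stations with coordinates follows the text.

```python
import pandas as pd

# Placeholder rows standing in for the scraped results; None marks a
# PIDVal that returned no data. Column names are illustrative only.
raw = pd.DataFrame(
    {
        "Station": ["Station A", None, "Station B", "Station C"],
        "Rechtswert": [2600000.0, None, None, 2610000.0],
        "Hochwert": [5720000.0, None, None, 5725000.0],
    }
)

# Drop the PIDVal rows for which no station was found at all.
stations = raw.dropna(subset=["Station"]).reset_index(drop=True)

# Keep track of how many remaining stations carry both coordinates.
with_coords = stations.dropna(subset=["Rechtswert", "Hochwert"])

print(len(stations), "stations,", len(with_coords), "with coordinates")
stations.info()  # lists dtypes and non-null counts per column
```

The output of info() is the kind of summary shown in Figure 4: one line per column with its data type and the number of non-null entries.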
  
 {{:geoinfo2223:groupb:fig4_1.png?400|}}
Line 67: Line 63:
 {{:geoinfo2223:groupb:fig4_2.png?400|}}
  
-** Figure 4: Data frame showing the data types and number of non-null column values. **+** Figure 4: Data frame showing the data types and number of non-null column values. **
  
-The geo-coordinates values of Rechtswert and Hochwert stred in the above data frame is still of data type float63. Since we want our co-ordinates to be recognized as geographic location data, we use    geoPandas package in python, to convert pandas data frame to a geo data frame or gdf. Since a geo data frame requires a shapely object, we pass the columns containing Easting and Northing values i.e Rechtswert_(Gauss-Krüger), Hochwert_(Gauss-Krüger) respectively are into the function points_from_xy to transform it to shapely point. +The Rechtswert and Hochwert coordinate values stored in the above data frame are still of data type float64. Since we want our coordinates to be recognized as geographic location data, we use the GeoPandas package in Python to convert the pandas data frame to a geo data frame (gdf). Since a geo data frame requires shapely objects, we pass the columns containing the Easting and Northing values, i.e. Rechtswert_(Gauss-Krüger) and Hochwert_(Gauss-Krüger) respectively, into the function points_from_xy to transform them into shapely points.
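The conversion can be sketched as follows. The coordinates are placeholders rather than real station positions, and assigning EPSG:31466 directly when building the geo data frame is an assumption based on the CRS that is later selected in QGIS.

```python
import geopandas as gpd
import pandas as pd

# Placeholder Gauss-Krüger coordinates; these are not real stations.
df = pd.DataFrame(
    {
        "Station": ["Station A", "Station B"],
        "Rechtswert_(Gauss-Krüger)": [2600000.0, 2610000.0],
        "Hochwert_(Gauss-Krüger)": [5720000.0, 5725000.0],
    }
)

# points_from_xy takes x (Easting) first, then y (Northing).
geometry = gpd.points_from_xy(
    df["Rechtswert_(Gauss-Krüger)"], df["Hochwert_(Gauss-Krüger)"]
)

# Assumption: the CRS is set here so that later tools know the projection.
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:31466")
print(gdf.dtypes)
```

The float columns stay in the table unchanged; the new geometry column is what lets PostGIS and QGIS treat each row as a point.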
  
-The below figure shows an example of how geo data frame, gdf look like. +The figure below shows an example of what the geo data frame (gdf) looks like. 
  
 {{:geoinfo2223:groupb:fig5.png?400|}}
Line 77: Line 73:
 ** Figure 5: Geo data frame containing geometry column as shapely points **
  
-=====  Storing the water stations master data in PostgreSQL database =====+=====  Storing the master data of Water Stations in PostgreSQL database =====
  
-We create a data base env_db and a new schema named ‘eglv’ is created under the data base using super user env_master. Under this schema we create a table ‘eglv_stations’ and upload the geo data frame to the table ‘eglv_stations’. The connection to the PostGIS database from python is enabled by creating a connection engine using sqlalchemy package and we pass this connection engine to_postgis. With chucksize=100, 100 rows will be written at a time to the data base. +We create a database //env_db// and, under it, a new schema named //‘eglv’// using the superuser //env_master//. Under this schema, we create a table //‘eglv_stations’// and upload the geo data frame to it. The connection to the PostGIS database from Python is established by creating a connection engine with the sqlalchemy package, and we pass this engine to the function to_postgis. With chunksize=100, 100 rows are written to the database at a time, as shown in figures 6 and 7. Since the data frame contains only 131 rows, chunksize does not play a significant role here, unlike for much larger tables.
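The upload step might look roughly like the sketch below. Only the database, schema, table and user names come from the text; the host, port and password are placeholders, the helper name upload_stations is hypothetical, and replacing an existing table on re-runs is an assumption.

```python
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

# Connection details other than the user and database name are placeholders.
db_url = URL.create(
    drivername="postgresql+psycopg2",
    username="env_master",
    password="change-me",
    host="localhost",
    port=5432,
    database="env_db",
)


def upload_stations(gdf, url=db_url):
    """Write a GeoDataFrame to eglv.eglv_stations in chunks of 100 rows."""
    engine = create_engine(url)  # needs psycopg2 installed at this point
    gdf.to_postgis(
        name="eglv_stations",
        con=engine,
        schema="eglv",
        if_exists="replace",  # assumption: overwrite the table on re-runs
        index=False,
        chunksize=100,
    )
```

Calling upload_stations(gdf) requires a running PostgreSQL server with the PostGIS extension and an existing eglv schema; GeoPandas additionally relies on the GeoAlchemy2 package for to_postgis.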
  
 {{:geoinfo2223:groupb:screenshot_2023-03-31_at_11.37.34_am.png?400|}}
Line 90: Line 86:
  
 ** Figure 7: ‘eglv_stations’ table created under schema eglv shown in PgAdmin 4 **
 +
 +
 +Next, we run a SELECT query on the table ‘eglv_stations’ to retrieve all rows and check that all the data has been uploaded correctly. 
 +
  
 {{:geoinfo2223:groupb:fig7.png?400|}}
Line 97: Line 97:
 ===== Plotting the co-ordinates in QGIS =====
  
-In QGIS we select the EPSG: 31466 as Projected Coordinate Reference System (CRS) which is the DHDN / 3-degree Gauss-Kruger zone 2 corresponding to the co-ordinate system used by the Emscher Genossenschaft Lippe Verband. We first add PostGIS layer and connect to our data base. After successfully connecting to the data base by entering the super user credentials , we can see that the eglv schema and eglv_station is available, shown as in the below figure. +In QGIS, we select //EPSG: 31466// as the projected Coordinate Reference System (CRS). This is //DHDN / 3-degree Gauss-Kruger zone 2//, which corresponds to the coordinate system used by the Emscher Genossenschaft Lippe Verband. We first add a PostGIS layer and connect it to our database. After successfully connecting to the database by entering the superuser credentials, we can see that the eglv schema and the eglv_stations table are available, as shown in the figure below. 
  
 {{:geoinfo2223:groupb:ps8.png?400|}}
  
-After successful connection to Postgis. +After a successful connection to PostGIS.
  
-As a base layer, Topographische NRW DTK100 Farbe Map is added , also projected as EPSG: 31466 co-ordinate system as shown in the below figure. The inverted triangles indicate the location of the stations. +As a base layer, we add the WMS layer //NW Digitale Topographische Karten DTK100 Farbe//, also projected in the EPSG: 31466 coordinate system, as shown in the figure below. The inverted triangles indicate the locations of the stations. 
  
 {{:geoinfo2223:groupb:fig9.png?400|}}
  
-Here in the below figure we can see the zoomed out map with all stations with dark red dot with same map Topographische NRW DTK100 Farbe and also projected in EPSG: 31466 co-ordinate system. +The figure below shows the zoomed-out map with all stations marked as dark red dots, using the same Topographische NRW DTK100 Farbe map, also projected in the EPSG: 31466 coordinate system.
  
 {{:geoinfo2223:groupb:fig9_1.png?400|}}
Line 113: Line 113:
 ** Figure 9: The station locations plotted on NRW Topographische Karte Map in EPSG: 31466 CRS **
  
-Figure 10 shows the snippet of the location of few of the stations with the scale of 1 to 1 milliondark red dots are used to mark the station on NRW Topographische Karte Map. +Figure 10 shows a snippet with the locations of a few of the stations at a scale of 1:1000000. Dark red dots are used to mark the stations on the WMS layer.
  
 {{:geoinfo2223:groupb:fig10.png?400|}}
Line 119: Line 119:
 ** Figure 10: The station locations plotted on NRW Topographische Karte Map in EPSG: 31466 CRS on scale 1:1000000 **
  
-while plotting exact points on map it is also important to take the background map similar to one which we have for the refrencing. Here in the figure 11 below you can see first image as the selected QGIS map for plotting stations and the second image show the map which they have on the website. both the maps shows the location of station in KA Hamm. +When plotting exact points on the map, it is also important to choose a background map similar to the one used for referencing. In figure 11 below, the first image is the QGIS map selected for plotting the stations, and the second image shows the map used on the website. Both maps show the location of the KA Hamm station.
  
 {{:geoinfo2223:groupb:fig11.png?400|}}
Line 127: Line 127:
 ** Figure 11: Comparison between KA Hamm Station in QGIS Vs KA Hamm Station in Emscher Genossenschaft Lippe Verband web page. **
  
-In figure 12 we can see that all the stations which are listed on Emscher Genossenschaft Lippe Verband web page with coordinates data are shown below with custom made location marker in dark blue colour. +In figure 12, all the stations that are listed on the Emscher Genossenschaft Lippe Verband web page with coordinate data are shown with custom-made location markers in dark blue color.
  
 {{:geoinfo2223:groupb:fig12.png?400|}}
  
-** Figure 12: All stations which are listed on Emscher Genossenschaft Lippe Verband web page marked with custom symbol. **+** Figure 12: All stations which are listed on the Emscher Genossenschaft Lippe Verband web page marked with custom symbol. **
  
 ====== Periodic Web Scraping of 'Aktuelle Pegelstände für Emscher und Lippe' ======
Line 369: Line 369:
  
 [[https://github.com/SindhyaBabu/GeoInformatics-Final]]
- 
-contents of git repository 
- 
-1 -  Pegel Stations master data scraping 
- 
-2 -  Periodic data of water level and discharge of Pegel Stations 
- 
-3 -  Scraping of map images for all the stations 
- 
-4 -  Storing images to database  
- 
  
 ===== Additional References =====
geoinfo2223/groupb/start.1680295570.txt.gz · Last modified: 2023/03/31 22:46 by sahil001