geoinfo2223:groupb:start
Differences
This shows you the differences between two versions of the page.
geoinfo2223:groupb:start [2023/03/31 22:46] – [Link to Git hub (Jupyter Notebook):] sahil001 | geoinfo2223:groupb:start [2023/03/31 23:39] (current) – [Webscraping of Water gauge Stations from Emscher Genossenschaft Lippe Verband website] sahil001 | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ======= Geo Informatics Final Project : Group B ======= | + | // |
- | ==== M-IE_2.02 Geoinformatics, | + | |
- | ** Under supervision of : Prof. Rolf Becker ** | + | |
- | ===== Participants: | + | |
- | ** 1- Sindhya Babu - 29928 ** | + | ======= Webscraping of Water gauge Stations from Emscher Genossenschaft Lippe Verband website ======= |
- | ** 2- Kiara Meço - 32358 ** | + | {{ : |
- | + | ||
- | ** 3- Sahil Chande.- 29927 ** | + | |
===== Introduction ===== | ===== Introduction ===== | ||
- | The Emscher Gennosenschaft Lippe Verband provides open raw data about the water level and discharge with daily updated values of the Emscher and Lippe area. The data is updated approximately every 15 minutes and two versions intranet and public versions are published. In our project we have used the open public data. | + | The Emscher Genossenschaft Lippe Verband provides open raw data on the water level and discharge with daily updated values of the Emscher and Lippe area. The data is updated approximately every 15 minutes and two versions |
===== Project aim ===== | ===== Project aim ===== | ||
- | The project aim is to scrape time varying data on Water level and discharge from website of Übersichtskarte Pegelstände Emscher Lippe continuously using python library beautiful soup and also geo pandas and save them to PostgreSQL database. | + | The project aim is to continuously scrape time-varying water level and discharge data from the Übersichtskarte Pegelstände Emscher Lippe website using the Python libraries BeautifulSoup and GeoPandas, and to save the data to a PostgreSQL database. |
[[https:// | [[https:// | ||
- | ===== Tools and packaged | + | ===== Tools and packages |
- | * **python** | + | * **Python** |
- | * For web scraping : BeautifulSoup, | + | * For web-scraping: BeautifulSoup, |
- | * For creation of geo data frame: geoPandas, pyproj, shapely.geometry. | + | * For the creation of the geo data frame: GeoPandas, pyproj, shapely.geometry. |
- | * For data base connection to PostgreSQL: sqlalchemy, psycopg2 | + | * For database |
* **PostgreSQL** | * **PostgreSQL** | ||
* Database to store data and geometry | * Database to store data and geometry | ||
- | * **Pg Admin 4 and POSTGIS | + | * **Pg Admin 4 and PostGIS |
* UI for easier operations with PostgreSQL | * UI for easier operations with PostgreSQL | ||
Line 34: | Line 29: | ||
* Application used for plotting different graphs, maps and georeferencing the stations to their precise locations. | * Application used for plotting different graphs, maps and georeferencing the stations to their precise locations. | ||
- | ===== One Time Scraping of Master Data of Gauges | + | ===== One-Time Scraping of Master Data of the gauges |
- | The Base data (Stammdaten in German) provided contains | + | The Base data (Stammdaten in German) provided contains information such as Station number (Pegelnummer), |
- | Firstly, we scrape the text displayed for the Pegel station and also the corresponding map for each station and store it locally. | + | First, we scrape the text displayed for each gauge (Pegel) station, together with the corresponding map, and store both locally. |
{{: | {{: | ||
Line 45: | Line 40: | ||
[[https:// | [[https:// | ||
- | To determine the above-mentioned | + | To determine the above-mentioned values for all stations, we scrape the website using Python and the BeautifulSoup package. We loop over 200 PIDVal values to get the master data of every available station. |
- | To achieve this, the text stored under the html tags needs to be identified by inspecting the web page. Consider the example of Station KA Hamm, where it can be seen that the master data text is under <div id =” datacontainer” and <tr class=” normtext” html tags. The name of the station is contained in <div id =” popupcontenttitle” and the map image is however stored under the tag <div id =”mapcontainer” | + | To achieve this, the text stored under the HTML tags needs to be identified by inspecting the web page. Consider the example of Station KA Hamm, where it can be seen that the master data text is stored |
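The tag lookup described above can be sketched as follows. This is a minimal illustration, not the project's actual notebook code: the HTML string is a hypothetical, heavily simplified stand-in for one station page (the real layout and values may differ), and `parse_station` is a helper name introduced here only for the example. Only the tag names `datacontainer`, `normtext`, `popupcontenttitle` and `mapcontainer` come from the text.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for one station page.
# The numbers are placeholders, not real master data.
SAMPLE_HTML = """
<div id="popupcontenttitle">KA Hamm</div>
<div id="datacontainer">
  <table>
    <tr class="normtext"><td>Pegelnummer</td><td>00000</td></tr>
    <tr class="normtext"><td>Rechtswert</td><td>2600000</td></tr>
    <tr class="normtext"><td>Hochwert</td><td>5700000</td></tr>
  </table>
</div>
<div id="mapcontainer"><img src="map.png"></div>
"""


def parse_station(html):
    """Return (station name, list of master-data row texts), or None for an empty page."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("div", id="popupcontenttitle")
    container = soup.find("div", id="datacontainer")
    if title is None or container is None:
        return None  # this PIDVal has no station behind it
    rows = [
        tr.get_text(" ", strip=True)
        for tr in container.find_all("tr", class_="normtext")
    ]
    return title.get_text(strip=True), rows


name, rows = parse_station(SAMPLE_HTML)
print(name)
print(rows)
```

In a full run, the same function would presumably be called once per PIDVal on the downloaded page text, with `None` results marking the PIDVal values that carry no data.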
Line 55: | Line 50: | ||
** Figure 2: Inspecting source code to determine HTML tags to be extracted. ** | ** Figure 2: Inspecting source code to determine HTML tags to be extracted. ** | ||
- | The data extracted for one station is showed | + | |
+ | The data extracted for one station is shown below. The data frame contains two columns, ‘Station’ and ‘Station Values’. The ‘Station Values’ column is then split into several columns, which are renamed and stored as a new data frame. | ||
{{: | {{: | ||
Line 61: | Line 57: | ||
** Figure 3: Python code extracting the station name and values for KA Hamm ** | ** Figure 3: Python code extracting the station name and values for KA Hamm ** | ||
- | After looping over, we found that several PIDVal contained no data. We drop these rows and now store the new data frame with non-null | + | After looping over all PIDVal values, we found that several contained no data. We drop these empty rows and store the new data frame with non-null |
{{: | {{: | ||
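The splitting and clean-up steps described above can be sketched with pandas as follows. This is an assumption-laden illustration: the `;` separator, the lower-case column names and all values are invented for the example, and the real ‘Station Values’ text may need a different split rule.

```python
import pandas as pd

# Placeholder data: one row per PIDVal, None where the page was empty.
raw = pd.DataFrame(
    {
        "Station": ["KA Hamm", None, "Example Gauge"],
        "Station Values": ["00000;2600000;5700000", None, "00001;2610000;5710000"],
    }
)

# Drop the PIDVal rows that returned no data.
stations = raw.dropna(subset=["Station Values"]).reset_index(drop=True)

# Split the single text column into several named columns.
parts = stations["Station Values"].str.split(";", expand=True)
parts.columns = ["pegelnummer", "rechtswert", "hochwert"]

# Combine the station name with the split columns and make coordinates numeric.
clean = pd.concat([stations[["Station"]], parts], axis=1)
clean = clean.astype({"rechtswert": float, "hochwert": float})

print(clean.dtypes)
print(clean)
```

Keeping the station number as text avoids losing leading zeros, while the coordinates are converted to floats so they can later be turned into point geometries.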
Line 67: | Line 63: | ||
{{: | {{: | ||
- | ** Figure 4: Data frame showing the data types and number of non-null column values. ** | + | ** Figure 4: Data frame showing the data types and the number of non-null values per column. ** |
- | The geo-coordinates values of Rechtswert and Hochwert | + | The geo-coordinates values of Rechtswert and Hochwert |
- | The below figure shows an example of how geo data frame, gdf look like. | + | The figure below shows an example of what the geo data frame, gdf, looks like. |
{{: | {{: | ||
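A geo data frame of this kind can be built roughly as shown here. The sketch assumes that Rechtswert is the easting (x), Hochwert is the northing (y), and that both are given in EPSG:31466, the CRS this page names for the QGIS project. The station names and coordinates are placeholders.

```python
import geopandas as gpd
import pandas as pd

# Placeholder coordinates, not real station positions.
stations = pd.DataFrame(
    {
        "station": ["Station A", "Station B"],
        "rechtswert": [2600000.0, 2610000.0],
        "hochwert": [5700000.0, 5710000.0],
    }
)

# Build shapely Point geometries from easting/northing and attach the CRS.
gdf = gpd.GeoDataFrame(
    stations,
    geometry=gpd.points_from_xy(stations["rechtswert"], stations["hochwert"]),
    crs="EPSG:31466",
)

print(gdf.head())
print(gdf.crs)
```

Setting the CRS at construction time means later consumers such as PostGIS or QGIS receive the points already tagged with their reference system.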
Line 77: | Line 73: | ||
** Figure 5: Geo data frame containing geometry column as shapely points ** | ** Figure 5: Geo data frame containing geometry column as shapely points ** | ||
- | ===== Storing the water stations | + | ===== Storing the master data of Water Stations |
- | We create a data base env_db and a new schema named ‘eglv’ is created under the data base using super user env_master. Under this schema we create a table ‘eglv_stations’ and upload the geo data frame to the table ‘eglv_stations’. The connection to the PostGIS database from python is enabled by creating a connection engine using sqlalchemy package and we pass this connection engine to_postgis. With chucksize=100, | + | We create a database //env_db// and a new schema named //‘eglv’// is created under the database |
{{: | {{: | ||
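The upload step described above might look roughly like this. It is a sketch under stated assumptions: host, port, password and the `if_exists` policy are invented for illustration, `build_url` and `upload_stations` are hypothetical helper names, and actually calling `upload_stations` requires a reachable PostGIS database plus the psycopg2 and GeoAlchemy2 packages. Only the database, user, schema and table names and the chunk size of 100 come from the text.

```python
from sqlalchemy import create_engine
from sqlalchemy.engine import URL


def build_url(password, host="localhost", port=5432):
    """Connection URL for database env_db as user env_master (host and port are assumptions)."""
    return URL.create(
        "postgresql+psycopg2",
        username="env_master",
        password=password,
        host=host,
        port=port,
        database="env_db",
    )


def upload_stations(gdf, url):
    """Write the geo data frame to table eglv.eglv_stations in chunks of 100 rows."""
    engine = create_engine(url)
    gdf.to_postgis(
        "eglv_stations",
        engine,
        schema="eglv",
        if_exists="replace",  # assumption: overwrite the table on re-run
        index=False,
        chunksize=100,
    )


url = build_url(password="change-me")
# Print the URL with the password masked; no database connection is opened here.
print(url.render_as_string(hide_password=True))
```

Building the URL separately from the engine keeps credentials in one place and lets the upload function be reused for other tables or schemas.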
Line 90: | Line 86: | ||
** Figure 7: ‘eglv_stations’ table created under schema eglv shown in PgAdmin 4 ** | ** Figure 7: ‘eglv_stations’ table created under schema eglv shown in PgAdmin 4 ** | ||
+ | |||
+ | |||
+ | Next, we run a SELECT query on the table ‘eglv_stations’ to fetch all rows and check that the data has been uploaded correctly. | ||
+ | |||
{{: | {{: | ||
Line 97: | Line 97: | ||
===== Plotting the co-ordinates in QGIS ===== | ===== Plotting the co-ordinates in QGIS ===== | ||
- | In QGIS we select the EPSG: 31466 as Projected Coordinate Reference System (CRS) which is the DHDN / 3-degree Gauss-Kruger zone 2 corresponding to the co-ordinate system used by the Emscher Genossenschaft Lippe Verband. We first add PostGIS layer and connect to our data base. After successfully connecting to the data base by entering the super user credentials , we can see that the eglv schema and eglv_station | + | In QGIS we select //EPSG: 31466// as the Projected Coordinate Reference System (CRS), which is //DHDN / 3-degree Gauss-Kruger zone 2// and corresponds to the coordinate system used by the Emscher Genossenschaft Lippe Verband. We first add the PostGIS layer and connect |
{{: | {{: | ||
- | After successful connection to Postgis. | + | After a successful connection to Postgis. |
- | As a base layer, Topographische | + | As a base layer, |
{{: | {{: | ||
- | Here in the below figure we can see the zoomed out map with all stations with dark red dot with same map Topographische NRW DTK100 Farbe and also projected in EPSG: 31466 co-ordinate | + | The figure below shows the zoomed-out map with all stations marked as dark red dots, on the same Topographische NRW DTK100 Farbe base map and also projected in the EPSG: 31466 coordinate |
{{: | {{: | ||
Line 113: | Line 113: | ||
** Figure 9: The station locations plotted on NRW Topographische Karte Map in EPSG: 31466 CRS ** | ** Figure 9: The station locations plotted on NRW Topographische Karte Map in EPSG: 31466 CRS ** | ||
- | Figure 10 shows the snippet of the location of few of the stations with the scale of 1 to 1 million. dark red dots are used to mark the station on NRW Topographische Karte Map. | + | Figure 10 shows a close-up of the locations of a few stations at a scale of 1:1000000. Dark red dots mark the stations on the WMS layer. |
{{: | {{: | ||
Line 119: | Line 119: | ||
** Figure 10: The station locations plotted on NRW Topographische Karte Map in EPSG: 31466 CRS on scale 1:1000000 ** | ** Figure 10: The station locations plotted on NRW Topographische Karte Map in EPSG: 31466 CRS on scale 1:1000000 ** | ||
- | while plotting exact points on map it is also important to take the background map similar to one which we have for the refrencing. Here in the figure 11 below you can see first image as the selected QGIS map for plotting stations and the second image show the map which they have on the website. | + | When plotting exact points on the map, it is also important to use a background map similar to the one used for referencing. In Figure 11 below, the first image shows the QGIS map selected for plotting the stations and the second image shows the map from the website. |
{{: | {{: | ||
Line 127: | Line 127: | ||
** Figure 11: Comparison between KA Hamm Station in QGIS vs. KA Hamm Station in Emscher Genossenschaft Lippe Verband web page. ** | ** Figure 11: Comparison between KA Hamm Station in QGIS vs. KA Hamm Station in Emscher Genossenschaft Lippe Verband web page. ** | ||
- | In figure 12 we can see that all the stations | + | In figure 12 we can see that all the stations are listed on the Emscher Genossenschaft Lippe Verband web page with coordinates data shown below with custom-made location |
{{: | {{: | ||
- | ** Figure 12: All stations which are listed on Emscher Genossenschaft Lippe Verband web page marked with custom symbol. ** | + | ** Figure 12: All stations which are listed on the Emscher Genossenschaft Lippe Verband web page marked with a custom symbol. ** |
====== Periodic Web Scraping of ' | ====== Periodic Web Scraping of ' |
geoinfo2223/groupb/start.1680295595.txt.gz · Last modified: 2023/03/31 22:46 by sahil001