Hellenic Post Shipment Time

An analysis of scraped data, carried out for the course Business Intelligence / Knowledge Management.

Read the report on the scraping process (in Greek)

The scraper returned a 17 MB JSON dataset with shipment data, including dates, shipment statuses and locations, but no coordinates. To compute the time that passed between the first and the last status of a shipment, a second script was written to parse the dates and times. Records with zero days or more than 365 days in the "Days needed" column were discarded as outliers, and only locations that appeared in both the "From location" and "To location" columns were kept, in order to eliminate erroneous data and data that did not belong to post offices. The result was saved as a separate JSON file.
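A minimal Python sketch of the date parsing and filtering described above, not the original script. It assumes the scraped timestamps follow a hypothetical `%d/%m/%Y %H:%M` format and that each record is a dict using the column names "From location", "To location" and "Days needed". The function names `days_needed`, `clean` and `save` are illustrative, and applying the outlier filter before the location filter is also an assumption.

```python
import json
from datetime import datetime

# Assumption: timestamps look like "01/03/2019 10:00".
# The real format on the tracking pages may differ.
DATE_FORMAT = "%d/%m/%Y %H:%M"


def days_needed(status_dates):
    """Whole days elapsed between the earliest and latest status timestamps."""
    parsed = [datetime.strptime(d, DATE_FORMAT) for d in status_dates]
    return (max(parsed) - min(parsed)).days


def clean(records):
    """Drop outliers, then keep only locations seen as both origin and destination."""
    # Outliers: zero days, or more than a year in transit.
    kept = [r for r in records if 0 < r["Days needed"] <= 365]
    # Single pass: a location is valid if it appears in both columns.
    valid = {r["From location"] for r in kept} & {r["To location"] for r in kept}
    return [
        r for r in kept
        if r["From location"] in valid and r["To location"] in valid
    ]


def save(records, path):
    """Write the cleaned records to a separate JSON file."""
    # ensure_ascii=False keeps Greek location names readable in the file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

Because the location check is a single pass, a location that loses all of its records on one side after filtering would still survive; calling `clean` repeatedly until the result stops changing would tighten this.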


Explore the results using Tableau:


The data above could then be visualized as a graph of nodes, but a map with the exact locations gives a better impression of the number of shipments and their distribution across the various offices. This required the coordinates of every Hellenic Post office, and fortunately that data is available online. First, all office IDs and names were gathered by prefecture; they were then combined with the coordinates using Python and Scrapy.
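The combining step can be sketched as a join in plain Python. This is a hypothetical illustration, assuming the offices gathered by prefecture are dicts with keys "id", "name" and "prefecture", and that the coordinate source can be keyed by the same office ID; the actual Scrapy crawl and its join key are not shown in the project text. The function name `attach_coordinates` is made up for the example.

```python
from collections import defaultdict


def attach_coordinates(offices, coordinates):
    """Join office records to (latitude, longitude) pairs by office ID.

    offices: list of dicts with the hypothetical keys "id", "name", "prefecture".
    coordinates: dict mapping office ID -> (latitude, longitude).
    Returns (located, missing): offices with coordinates grouped by prefecture,
    and the IDs for which no coordinates were found.
    """
    located = defaultdict(list)
    missing = []
    for office in offices:
        coords = coordinates.get(office["id"])
        if coords is None:
            # Keep track of offices that cannot be placed on the map.
            missing.append(office["id"])
            continue
        lat, lon = coords
        located[office["prefecture"]].append({**office, "lat": lat, "lon": lon})
    return dict(located), missing
```

Returning the unmatched IDs separately makes it easy to inspect which offices would silently disappear from the map.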


A simple inspection of the data from the first scrape and from the Scrapy crawl showed that the official Hellenic Post office names and the names used by the tracking system did not follow the same format. Some lacked the full name of the area they belonged to, while others combined a street name with a city name. This was addressed by matching the names with the SequenceMatcher class of Python's difflib module. The matching still contained errors, but the obvious ones were omitted from the final visualization.
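A minimal sketch of fuzzy name matching with `difflib.SequenceMatcher`, under stated assumptions: names are compared after upper-casing and stripping Greek accents (an assumed normalization step), and a match is accepted only above a cutoff of 0.6, which mirrors the default `cutoff` of `difflib.get_close_matches` but is otherwise an arbitrary choice. The helpers `normalise` and `best_match` are hypothetical, not the project's code.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name):
    """Upper-case a name and strip accents such as the Greek tonos."""
    # Assumption: case and accents carry no meaning for matching.
    decomposed = unicodedata.normalize("NFD", name.upper())
    return "".join(c for c in decomposed if not unicodedata.combining(c)).strip()


def best_match(tracking_name, official_names, threshold=0.6):
    """Return (official_name, score) for the most similar official name.

    Returns (None, score) when even the best candidate scores below threshold,
    so doubtful matches can be reviewed instead of silently accepted.
    """
    target = normalise(tracking_name)
    best_name, best_score = None, 0.0
    for official in official_names:
        score = SequenceMatcher(None, target, normalise(official)).ratio()
        if score > best_score:
            best_name, best_score = official, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

A cutoff only filters out weak matches; as noted above, a similarity ratio can still pick the wrong office when two names are close, so the results need a manual check.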



Map visualization of shipment times


Explore and search the raw data:
