DESIGN PROCESS DATA STORAGE AND ORGANIZE DATA SCRAPING

Falentino Sembiring, Dian Permata Sari

Abstract


This study describes a web scraping process that retrieves URLs from similar sites and stores the URL data in daily, weekly, monthly, and annual databases, so that valid URLs are kept and invalid URLs are filtered out. Filtering is performed to simplify the subsequent processes that move the data into the database. The next step distinguishes URLs by their available content data, namely title, tags, and keywords, in the manner of SEO. The output of each step is stored in a data warehouse to build a URL data center, which is intended to serve as a data collection stage for big data. The scope of the problem is limited to designing a web crawler that searches for similar sites and stores the results in the database. From the database, the data is passed to the data warehouse; once there, it is processed and presented to the user through an interface, divided by classification.
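
The filtering and period-based storage steps described above might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the rule that a valid URL needs an http(s) scheme and a host, and the format of the daily, weekly, monthly, and annual keys are all hypothetical choices made for the example.

```python
from datetime import date
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Assumed validity rule: an http(s) scheme and a non-empty host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def filter_urls(urls):
    """Split candidate URLs into (valid, invalid); duplicate valid URLs are dropped."""
    valid, invalid, seen = [], [], set()
    for url in urls:
        url = url.strip()
        if not is_valid_url(url):
            invalid.append(url)
        elif url not in seen:
            seen.add(url)
            valid.append(url)
    return valid, invalid


def period_keys(day: date) -> dict:
    """Keys under which a URL collected on `day` would be filed (hypothetical format)."""
    iso_year, iso_week, _ = day.isocalendar()
    return {
        "daily": day.isoformat(),
        "weekly": f"{iso_year}-W{iso_week:02d}",
        "monthly": f"{day.year}-{day.month:02d}",
        "annual": str(day.year),
    }


if __name__ == "__main__":
    candidates = ["https://example.com/a", "not a url", "https://example.com/a"]
    valid, invalid = filter_urls(candidates)
    print(valid, invalid)
    print(period_keys(date(2019, 3, 15)))
```

Only the valid list would be passed on to the database, filed under each of the four period keys; the invalid list is kept separately so that rejected URLs can be inspected.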







Copyright (c) 2019 INTEGRATED (Journal of Information Technology and Vocational Education)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
