Navigating On The Web Of Data
Mining And Exploring The Data Landscape …
There are oceans of information out there, as well as segregated seas and webs. The kaleidoscope is an instrument that designers sometimes use to search for new patterns. Tooling for data acquisition and processing can be that kaleidoscope, helping us explore a colorful spectrum of perspectives that were once hidden from us and are now clearly in sight. Data acquisition is the first and most fundamental step, before any refinement for insights. How to turn data into information, and information into insights for well-informed decisions, is covered in the chapters of this book.
Friends often ask me how to obtain information for processing and analytics, so here are some notes that I can share.
Getting data from a page:
import requests
import sys
from bs4 import BeautifulSoup
import urllib3
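# Suppress the warnings urllib3 raises for HTTPS requests made without certificate verification
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# Route HTTP and HTTPS traffic through a local proxy listening on 127.0.0.1:8080
proxies = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}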
url_target = 'lab-01/lab-01.txt'
file_with_path = './sample_data/list.txt'
site_main = 'https://raw.githubusercontent.com/<user_ID>/main/'
def read_urls_from_file_to_list(file_with_path):
    with open(file_with_path) as f:
        list_of_urls = f.read().splitlines()
    while("" in list_of_urls)…
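The listing above is cut off, so here is a minimal, standalone sketch of the same idea: read a list file, drop the blank lines, and join each entry onto a base URL. It is only one possible way to do it, not the original code. The names `read_urls_from_file`, `build_full_urls` and `demo` are hypothetical, the base URL `https://example.com/data/main/` is a placeholder, and a list comprehension stands in for the `while` loop. It uses only the standard library and makes no network request.

```python
import os
import tempfile
from urllib.parse import urljoin


def read_urls_from_file(file_with_path):
    """Return the non-blank lines of a text file, stripped of surrounding whitespace."""
    with open(file_with_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    # Keep only lines that still contain something after stripping whitespace
    return [line.strip() for line in lines if line.strip()]


def build_full_urls(base_url, relative_paths):
    """Join each relative path onto base_url (base_url should end with '/')."""
    return [urljoin(base_url, path) for path in relative_paths]


def demo():
    """Write a throwaway list file, read it back, and build full URLs."""
    with tempfile.TemporaryDirectory() as tmp:
        list_file = os.path.join(tmp, "list.txt")
        with open(list_file, "w", encoding="utf-8") as f:
            # One blank line and one padded line, to show the clean-up
            f.write("lab-01/lab-01.txt\n\n  lab-02/lab-02.txt  \n")
        paths = read_urls_from_file(list_file)
    # Placeholder base URL (an assumption, not the real site)
    return build_full_urls("https://example.com/data/main/", paths)


if __name__ == "__main__":
    for url in demo():
        print(url)
```

The resulting list could then be passed, one URL at a time, to whatever fetching code you prefer, for example `requests.get` as imported in the listing above.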