Open the Jupyter Notebook in the starter code folder named part_1_mars_news.ipynb. You will work in this code as you follow the steps below to scrape the Mars News website.
Use automated browsing to visit the Mars news site. Inspect the page to identify which elements to scrape.
Create a Beautiful Soup object and use it to extract text elements from the website.
Extract the titles and preview text of the news articles that you scraped. Store the scraping results in Python data structures as follows:
Store each title-and-preview pair in a Python dictionary, and give each dictionary two keys: title and preview. An example is the following:
{'title': "NASA's MAVEN Observes Martian Light Show Caused by Major Solar Storm",
'preview': "For the first time in its eight years orbiting Mars, NASA’s MAVEN mission witnessed two different types of ultraviolet aurorae simultaneously, the result of solar storms that began on Aug. 27."}
Store all the dictionaries in a Python list.
Print the list in your notebook.
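The extraction and storage steps above can be sketched roughly as follows. This is only a minimal illustration: the HTML string stands in for the page source that the automated browser would return, and the tag and class names (list_text, content_title, article_teaser_body) are assumptions, so check them against what you see when you inspect the actual page. The headline and preview strings are placeholders, not real articles.

```python
from bs4 import BeautifulSoup

# Stand-in HTML. In the notebook this would be the page source returned by
# the automated browser after visiting the Mars news site.
# ASSUMPTION: the tag and class names below are hypothetical.
html = """
<div class="list_text">
  <div class="content_title">First headline</div>
  <div class="article_teaser_body">First preview text.</div>
</div>
<div class="list_text">
  <div class="content_title">Second headline</div>
  <div class="article_teaser_body">Second preview text.</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Build one dictionary per article and collect them in a list.
articles = []
for item in soup.find_all("div", class_="list_text"):
    title = item.find("div", class_="content_title")
    preview = item.find("div", class_="article_teaser_body")
    if title is None or preview is None:
        continue  # skip containers that lack either element
    articles.append({
        "title": title.get_text(strip=True),
        "preview": preview.get_text(strip=True),
    })

print(articles)
```

Skipping containers that lack a title or preview keeps one malformed entry from stopping the whole loop.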
Optionally, store the scraped data in a file (to ease sharing the data with others). To do so, export the scraped data to a JSON file.
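One possible way to do the optional JSON export is with Python's built-in json module. In this sketch, the articles list holds placeholder values and the file name mars_news.json is an assumption; use your own scraped list and whatever file name suits your project.

```python
import json

# Placeholder data standing in for the scraped list of dictionaries.
articles = [
    {"title": "Example title", "preview": "Example preview text."},
]

# ASSUMPTION: the output file name is hypothetical.
output_path = "mars_news.json"

# ensure_ascii=False keeps characters such as curly apostrophes readable.
with open(output_path, "w", encoding="utf-8") as f:
    json.dump(articles, f, indent=2, ensure_ascii=False)

# Read the file back to confirm that the data survived the round trip.
with open(output_path, encoding="utf-8") as f:
    reloaded = json.load(f)

print(reloaded == articles)
```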
Open the Jupyter Notebook in the starter code folder named part_2_mars_weather. You will work in this code as you follow the steps below to scrape and analyse Mars weather data.
Use automated browsing to visit the Mars Temperature Data Site. Inspect the page to identify which elements to scrape. Note that the URL is https://static.bc-edx.com/.
Create a Beautiful Soup object and use it to scrape the data in the HTML table. Note that this can also be achieved by using the Pandas read_html function. However, use Beautiful Soup here to continue sharpening your web scraping skills.
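A table scrape with Beautiful Soup might look something like the sketch below. The HTML string is a stand-in for the page source from the automated browser, and the two data rows are made-up placeholder values, not real readings. The code relies only on generic table, tr, th, and td tags; the real page may need more specific selectors.

```python
from bs4 import BeautifulSoup

# Stand-in HTML with placeholder values (not real Curiosity data).
html = """
<table>
  <tr>
    <th>id</th><th>terrestrial_date</th><th>sol</th><th>ls</th>
    <th>month</th><th>min_temp</th><th>pressure</th>
  </tr>
  <tr>
    <td>1</td><td>2020-01-01</td><td>10</td><td>155</td>
    <td>6</td><td>-75.0</td><td>739.0</td>
  </tr>
  <tr>
    <td>2</td><td>2020-01-02</td><td>11</td><td>156</td>
    <td>6</td><td>-76.0</td><td>740.0</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Column headings come from the <th> cells.
headers = [th.get_text(strip=True) for th in table.find_all("th")]

# Each data row becomes a list of cell strings; the header row has no <td>
# cells, so it produces an empty list and is skipped.
rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

print(headers)
print(rows)
```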
Assemble the scraped data into a Pandas DataFrame. The columns should have the same headings as the table on the website. Here’s an explanation of the column headings:
id: the identification number of a single transmission from the Curiosity rover
terrestrial_date: the date on Earth
sol: the number of elapsed sols (Martian days) since Curiosity landed on Mars
ls: the solar longitude
month: the Martian month
min_temp: the minimum temperature, in Celsius, of a single Martian day (sol)
pressure: the atmospheric pressure at Curiosity's location
Examine the data types that are currently associated with each column. If necessary, cast (or convert) the data to the appropriate datetime, int, or float data types.
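Assembling the DataFrame and converting the data types could be done along these lines. The rows here are placeholder values, and the choice of target type for each column is an assumption based on the column descriptions above; adjust it to what you find when you examine the real data.

```python
import pandas as pd

headers = ["id", "terrestrial_date", "sol", "ls", "month", "min_temp", "pressure"]

# Placeholder rows (not real readings). Scraped cells arrive as strings.
rows = [
    ["1", "2020-01-01", "10", "155", "6", "-75.0", "739.0"],
    ["2", "2020-01-02", "11", "156", "6", "-76.0", "740.0"],
]

df = pd.DataFrame(rows, columns=headers)
print(df.dtypes)  # inspect the types before converting

# Convert the date column to datetime.
df["terrestrial_date"] = pd.to_datetime(df["terrestrial_date"])

# ASSUMPTION: this column-to-type mapping is one reasonable choice.
df = df.astype({
    "id": "int64",
    "sol": "int64",
    "ls": "int64",
    "month": "int64",
    "min_temp": "float64",
    "pressure": "float64",
})

print(df.dtypes)  # confirm the conversion
```

Printing dtypes before and after makes it easy to confirm that each conversion took effect.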
Analyse your dataset by using Pandas functions to answer the following questions:
Export the DataFrame to a CSV file.
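The export can be as short as a single to_csv call. In this sketch, the DataFrame contains placeholder values and the file name mars_weather.csv is an assumption.

```python
import pandas as pd

# Placeholder DataFrame standing in for the cleaned weather data.
df = pd.DataFrame({
    "id": [1, 2],
    "terrestrial_date": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "sol": [10, 11],
    "ls": [155, 156],
    "month": [6, 6],
    "min_temp": [-75.0, -76.0],
    "pressure": [739.0, 740.0],
})

# ASSUMPTION: the output file name is hypothetical.
# index=False leaves out the DataFrame's row index, which is not part of the data.
df.to_csv("mars_weather.csv", index=False)

# Read the file back, parsing the date column, to check the export.
reloaded = pd.read_csv("mars_weather.csv", parse_dates=["terrestrial_date"])
print(reloaded.head())
```

CSV files do not store type information, which is why the date column is parsed again when the file is read back.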