Python offers many powerful, easy-to-use tools for scraping websites. One of the most useful is a module called Beautiful Soup. In this article we'll walk through a Beautiful Soup example: a simple 'web scraper' that gets data from a Yahoo Finance page about stock options.
It's all right if you don't know anything about stock options; the important thing is that the website has a table of information that we'd like to use in our program. First we need to get the HTML source for the page. Beautiful Soup won't download the content for us, but we can do that with the urllib module, one of the libraries that comes standard with Python. If you go to the page we opened with Python and use your browser's "view source" command, you'll see that it's a large, complicated HTML file.
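As a rough sketch of the download step, assuming Python 3, where the function lives in urllib.request: the helper name fetch_html and the sample_url are made up for illustration, and a data: URL stands in for the real page so the snippet runs without a network connection.

```python
from urllib.parse import quote
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Placeholder page: a data: URL so the sketch works offline.
# Swap in the address of the options page you want to scrape.
page = "<html><body><table></table></body></html>"
sample_url = "data:text/html;charset=utf-8," + quote(page)

html = fetch_html(sample_url)
print(html)
```

With a real address in place of sample_url, html would hold the same large, complicated source you see in the browser.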
It will be Python's job to simplify this and extract the useful data using the BeautifulSoup module. BeautifulSoup is an external module, so you'll have to install it.
If you haven't installed BeautifulSoup already, you can get it from the Beautiful Soup website. Now we can start trying to extract information from the page's HTML. We can see that the options have pretty unique-looking names in the "symbol" column, something like AAPLC. The symbols might be slightly different by the time you read this, but we can solve the problem by using BeautifulSoup to search the document for this unique string.
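Here is a minimal sketch of that search, assuming the current bs4 package and its find(string=...) argument. The table markup is invented for illustration and is not Yahoo's real HTML; only the symbol AAPLC comes from the text above.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the downloaded page.
html = """<table>
<tr><td>110.00</td><td>AAPLC</td><td>1.25</td></tr>
<tr><td>120.00</td><td>AAPLD</td><td>0.80</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# find(string=...) returns the first text node that matches exactly,
# or None when the symbol is not on the page.
match = soup.find(string="AAPLC")
print(repr(match))
```

If the search comes back as None, the symbol has probably changed, so pick a current one from the webpage.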
Let's search the soup variable for this particular option (you may have to substitute a different symbol; just get one from the webpage). On its own, the matching string isn't much use. However, BeautifulSoup returns things in a tree format, so we can find the context in which this text occurs by asking for its parent node. The result is still a little messy, but you can see that all of the data we need is there.
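The walk up the tree might look something like this. It is only a sketch on invented markup, assuming the bs4 package; in the real page the symbol may sit deeper in the tree, so you may need to ask for the parent more than once.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the downloaded page.
html = """<table>
<tr><td>110.00</td><td>AAPLC</td><td>1.25</td></tr>
<tr><td>120.00</td><td>AAPLD</td><td>0.80</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
match = soup.find(string="AAPLC")

cell = match.parent  # the <td> tag that contains the text
row = cell.parent    # the <tr> tag that contains the cell

print(cell)
print(row)
```

Printing row shows the tags along with the values, which is the "stuff in brackets" you can ignore for now.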
If you ignore all the stuff in brackets, you can see that this is just the data from one row. The code that collects every row is a little dense, so let's take it apart piece by piece. It is a list comprehension within a list comprehension. Let's look at the inner one first. It picks out a single element from each row of the table; we chose this element because it's unique in every table entry.
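To make the idea of one unique element per row concrete, here is a small sketch assuming the bs4 package. The class name "strike" and the table itself are hypothetical; the real page uses its own markup, so inspect the source to find a suitable attribute.

```python
from bs4 import BeautifulSoup

# Made-up markup: exactly one cell per row carries class="strike".
html = """<table>
<tr><td class="strike">110.00</td><td>AAPLC</td><td>1.25</td></tr>
<tr><td class="strike">120.00</td><td>AAPLD</td><td>0.80</td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# The attribute filter goes in a dictionary passed as attrs.
anchors = [
    cell.get_text()
    for cell in soup.find_all("td", attrs={"class": "strike"})
]
print(anchors)
```

Each entry in anchors corresponds to one row, which gives the outer step a reliable starting point inside every table entry.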
Another thing to note is that we have to wrap the attributes in a dictionary because class is one of Python's reserved words. From the table above, the search returns only that one element from each row. We need to go one level higher and then get the text from all of the child nodes of this node's parent. That's what the rest of the code does. This works, but you should be careful if this is code you plan to reuse frequently. If Yahoo changed the way they format their HTML, this could stop working. This is only a simple Beautiful Soup example, but it gives you an idea of what you can do with HTML and XML parsing in Python.
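Putting the two steps together, a nested list comprehension along these lines would collect every row. This is a sketch under assumptions rather than the original code: the function name extract_rows, the class "strike" and the markup are all invented, and it calls find_all on the parent instead of reading the raw child nodes so that whitespace between tags is skipped. The check at the end is one way to fail loudly if the page layout changes.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the downloaded page.
html = """<table>
<tr><td class="strike">110.00</td><td>AAPLC</td><td>1.25</td></tr>
<tr><td class="strike">120.00</td><td>AAPLD</td><td>0.80</td></tr>
</table>"""


def extract_rows(soup):
    """Return the text of every cell, one list per table row."""
    rows = [
        # Inner list: text of each cell in the anchor's parent row.
        [cell.get_text(strip=True) for cell in anchor.parent.find_all("td")]
        # Outer loop: one anchor cell per row.
        for anchor in soup.find_all("td", attrs={"class": "strike"})
    ]
    if not rows:
        # Nothing matched: the markup has probably changed.
        raise ValueError("no option rows found; has the page layout changed?")
    return rows


soup = BeautifulSoup(html, "html.parser")
options_table = extract_rows(soup)
for row in options_table:
    print(row)
```

Raising an error on an empty result is a design choice: a scraper that silently returns nothing is much harder to notice than one that stops with a clear message.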
The Beautiful Soup documentation covers a lot more tools for searching and validating HTML documents. Note that Yahoo tends to change their template every now and then, so this article most likely applies to an old version of their HTML structure.
Python Beautiful Soup Example: Yahoo Finance Scraper. Published Thursday 15th March; last updated Wednesday 14th August.