Scraping Google News with lxml and Python

I am trying to scrape Google News using Python and lxml. Everything works until I try to print the data of each div in a for loop; then the output is wrong. Here is my code:

# -*- coding: utf-8 -*-

from stem import Signal
from stem.control import Controller
from lxml import html
from lxml import cssselect
from lxml import etree
import requests

proxies = {
    'http' : 'http://127.0.0.1:8123'
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

url = "https://www.google.it/search?hl=en&tbm=nws&as_occt=any&tbs=cdr:1,cd_min:9/1/2014,cd_max:9/1/2014,sbd:1&as_nsrc=Daily%20Mail&start=0"

page = requests.get(url,proxies=proxies,headers=headers)
tree = html.fromstring(page.content)
results = tree.xpath('//div[@class="_cnc"]')

for div in results:
    print(div)

I get this output:

<Element div at 0x7f4154df9470>
<Element div at 0x7f4154df94c8>
<Element div at 0x7f4154df9520>
<Element div at 0x7f4154df9578>
<Element div at 0x7f4154df95d0>
<Element div at 0x7f4154df9628>
<Element div at 0x7f4154df9680>
<Element div at 0x7f4154df96d8>
<Element div at 0x7f4154df9730>
<Element div at 0x7f4154df9788>

From each div I would like to extract the title, href, and snippet, roughly like this:

....

for div in results:
    title = div.xpath('//a[@class="l _HId"]/text()')
    href = div.xpath('//a[@class="l _HId"]/@href')
    snippet = div.xpath('//div[@class="st"]/text()')
    #for example
    print(title)
....

When I print, I get the same full list of titles repeated on every iteration:

['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']

Does anyone know what is wrong with my code?


Asked by JJack_ on 05.03.2016


Answers (1)


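You are almost there: just prepend a dot to the inner XPath expressions so that they are evaluated relative to the current node. An expression that starts with // always searches from the document root, even when it is called on a div element, which is why every iteration returned all ten titles: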

for div in results:
    title = div.xpath('.//a[@class="l _HId"]/text()')
    href = div.xpath('.//a[@class="l _HId"]/@href')
    snippet = div.xpath('.//div[@class="st"]/text()')
    #for example
    print(title)
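
To see the difference in isolation, here is a minimal sketch that runs without a network request. The markup and the class names "result" and "title" are made up for illustration and are not Google's actual HTML.

```python
from lxml import html

# Hypothetical markup standing in for a results page; class names are invented.
SAMPLE = """
<html><body>
  <div class="result"><a class="title" href="/a">First</a></div>
  <div class="result"><a class="title" href="/b">Second</a></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)
results = tree.xpath('//div[@class="result"]')

# '//a' is an absolute path: it searches the whole document,
# no matter which div the query is called on.
absolute = [div.xpath('//a[@class="title"]/text()') for div in results]

# './/a' is a relative path: it searches only the descendants of the current div.
relative = [div.xpath('.//a[@class="title"]/text()') for div in results]

print(absolute)  # [['First', 'Second'], ['First', 'Second']]
print(relative)  # [['First'], ['Second']]
```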
Answered by alecxe on 05.03.2016
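
A follow-up note: xpath() always returns a list, even with relative paths, so each field still needs unpacking. The sketch below shows one possible way to build one record per result. The class names "_cnc", "l _HId" and "st" come from the question, while the sample markup and the helpers first() and extract() are hypothetical.

```python
from lxml import html

# Hypothetical markup; only the class names are taken from the question.
SAMPLE = """
<div>
  <div class="_cnc"><a class="l _HId" href="http://example.com/1">Title one</a><div class="st">Snippet <b>one</b></div></div>
  <div class="_cnc"><a class="l _HId" href="http://example.com/2">Title two</a></div>
</div>
"""


def first(values, default=None):
    """Return the first item of an XPath result list, or a default if it is empty."""
    return values[0] if values else default


def extract(div):
    """Collect title, href and snippet from a single result div."""
    return {
        "title": first(div.xpath('.//a[@class="l _HId"]/text()')),
        "href": first(div.xpath('.//a[@class="l _HId"]/@href')),
        # string(...) joins all text inside the first match, including nested
        # tags such as <b>, and yields '' when nothing matches.
        "snippet": div.xpath('string(.//div[@class="st"])').strip(),
    }


tree = html.fromstring(SAMPLE)
records = [extract(div) for div in tree.xpath('.//div[@class="_cnc"]')]

for record in records:
    print(record)
```

Using string() for the snippet is a design choice: text() would return only the text nodes that are direct children of the div and would miss words wrapped in nested tags.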