I'm trying to scrape several links that contain event information. I rotate my paid proxies and the user agents generated by the UserAgent library. Imperva, which requires a US IP address, is so sensitive that it does not even let my own browser open the event page when I use a free US proxy!
I asked this question in a Discord channel about scraping. Someone contacted me and said that it is possible to bypass Imperva, but he can't tell me how because he doesn't want me as a competitor in the ticket scraping market :(
In addition to user agents and proxies, I tried to mimic the request headers of a successful browser request, but that did not work. I only get 405s and 403s. I want to scrape the events section, but I could not even get a 200 response for any of the 27 links I have (I've added some of them below).
How do you think Imperva could be bypassed with Scrapy or Requests? You could also recommend an academic resource that I can study to improve my Scrapy skills.
Some of the links I'm trying to scrape:
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=
https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=
https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=
https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=
My spider code consists of a class that imports my proxies from a file and the spider itself. I add my proxy as a meta value, as shown in the Scrapy documentation. I use download delays:
import scrapy
from scrapy import Request
from random_user_agent.user_agent import UserAgent
import random
import pandas as pd


class ProxyFunctions:
    (...)


class AlexSpider(scrapy.Spider):
    name = 'alex'
    s = ProxyFunctions()
    s.prox_list_fixer()  # cleans up the txt file that holds the proxies and writes a new txt
    proxies = s.imp_proxies()

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.root = "https://partnercarrier.com"
        self.start_url = "https://partnercarrier.com/PA/"
        #self.initial_links = self.imp_links()  to be used once all links are added from the file
        user_agent_rotator = UserAgent(software_names=['chrome'], operating_systems=['windows', 'linux'])
        self.user_agents = user_agent_rotator.get_user_agents()
        #self.root_link = "https://www.google.com"
        self.UA_rand = random.choice(self.user_agents)['user_agent']  # User Agent set
        #self.UA_LIST = open("/home/draco/docs/scraping/scrapyyy/thomas/USER_AGENTS.txt","r")  # manual UA importation from text

    # picks a random proxy from the proxy list in the file
    def imp_randp(self, path="/home/draco/docs/scraping/scrapyyy/thomas/proxies.txt"):
        with open(path) as PROXIES:
            lines = PROXIES.readlines()
        return random.choice(lines).strip()

    # reads the links from the file
    def imp_links(self, path="/home/draco/docs/scraping/Selenium/inputs.csv"):
        x = pd.read_csv(path)
        links = x['Url']
        links = [i for i in links]
        return links

    def start_requests(self):
        print("INITIAL REQUEST")
        links = self.imp_links()
        for link in links:
            print(f"---INFO: Requesting page=> {link}")
            proxy = random.choice(self.proxies)
            #print("---INFO: Using proxy => ", proxy)
            # NOTE: this header dict is built but never passed to Request below
            h = {
                'User-Agent': random.choice(self.user_agents)['user_agent'],
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                'Accept-Encoding': 'gzip, deflate, br',
                'Accept-Language': 'tr-TR,tr;q=0.9,en-US;q=0.8,en;q=0.7',
                'Cache-Control': 'max-age=0',
                'Connection': 'keep-alive',
                'Host': link.split("/")[2],
                'Sec-Fetch-Dest': 'document',
                'Upgrade-Insecure-Requests': '1',
                'Sec-Fetch-Mode': 'navigate',
                'sec-ch-ua-platform': '"Linux"',
                'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
            }
            # NOTE: this body is the same for every link and is sent with a GET request
            b = 'groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode='
            yield Request(
                url=link,
                callback=self.parse_gen,
                headers={"user-agent": random.choice(self.user_agents)['user_agent']},
                meta={"proxy": proxy},
                body=b,
                dont_filter=True,
            )

    def parse_gen(self, response):
        print("---INFO: General parser opened. PARSER1")
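A note on the settings behind this spider: the "Overridden settings" in the terminal output list AUTOTHROTTLE_ENABLED and HTTPCACHE_ENABLED but no DOWNLOAD_DELAY, and by default Scrapy's HttpErrorMiddleware drops non-2xx responses before parse_gen can look at them. The fragment below is only a minimal sketch of settings that would make a debugging run easier to read. The setting names are real Scrapy settings, but the dict name DEBUG_SETTINGS and all the values are assumptions chosen for illustration, not something taken from the post.

```python
# Hypothetical debugging settings; the values are illustrative assumptions.
# A dict like this could be assigned to `custom_settings` on the spider class.
DEBUG_SETTINGS = {
    # Fetch every page fresh instead of replaying stored responses.
    "HTTPCACHE_ENABLED": False,
    # Let 403 and 405 responses reach the callback so their bodies can be inspected.
    "HTTPERROR_ALLOWED_CODES": [403, 405],
    # Fixed base delay (in seconds) between requests to the same site.
    "DOWNLOAD_DELAY": 2,
    # Keep AutoThrottle on, as in the original run.
    "AUTOTHROTTLE_ENABLED": True,
}

for name, value in sorted(DEBUG_SETTINGS.items()):
    print(f"{name} = {value!r}")
```

With 403 and 405 allowed through, parse_gen can check response.status and log response.text to see what kind of block page came back.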
My terminal output:
draco@draco:~/docs/scraping/scrapyyy/upwork$ scrapy crawl alex
https://umasstix.evenue.net
2022-03-20 20:23:01 [scrapy.utils.log] INFO: Scrapy 2.5.1 started (bot: upwork)
2022-03-20 20:23:01 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.8.10 (default, Nov 26 2021, 20:14:08) - [GCC 9.3.0], pyOpenSSL 22.0.0 (OpenSSL 1.1.1m 14 Dec 2021), cryptography 36.0.1, Platform Linux-5.13.0-35-generic-x86_64-with-glibc2.29
2022-03-20 20:23:01 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2022-03-20 20:23:01 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
'BOT_NAME': 'upwork',
'CONCURRENT_REQUESTS_PER_DOMAIN': 14,
'HTTPCACHE_ENABLED': True,
'NEWSPIDER_MODULE': 'upwork.spiders',
'SPIDER_MODULES': ['upwork.spiders']}
2022-03-20 20:23:01 [scrapy.extensions.telnet] INFO: Telnet Password: 7f185fdb1347847f
2022-03-20 20:23:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.throttle.AutoThrottle']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats',
'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-03-20 20:23:05 [scrapy.core.engine] INFO: Spider opened
2022-03-20 20:23:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-20 20:23:05 [scrapy.extensions.httpcache] DEBUG: Using filesystem cache storage in /home/draco/docs/scraping/scrapyyy/upwork/.scrapy/httpcache
2022-03-20 20:23:05 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
INITIAL REQUEST
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=> (referer: None) ['cached']
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
---INFO: General parser opened. PARSER1
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:06 [scrapy.core.engine] INFO: Closing spider (finished)
2022-03-20 20:23:06 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 15189,
'downloader/request_count': 27,
'downloader/request_method_count/GET': 27,
'downloader/response_bytes': 304575,
'downloader/response_count': 27,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/403': 16,
'downloader/response_status_count/405': 10,
'elapsed_time_seconds': 0.444587,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 3, 20, 17, 23, 6, 67887),
'httpcache/hit': 27,
'httperror/response_ignored_count': 26,
'httperror/response_ignored_status_count/403': 16,
'httperror/response_ignored_status_count/405': 10,
'log_count/DEBUG': 28,
'log_count/INFO': 36,
'memusage/max': 126562304,
'memusage/startup': 126562304,
'response_received_count': 27,
'scheduler/dequeued': 27,
'scheduler/dequeued/memory': 27,
'scheduler/enqueued': 27,
'scheduler/enqueued/memory': 27,
'start_time': datetime.datetime(2022, 3, 20, 17, 23, 5, 623300)}
2022-03-20 20:23:06 [scrapy.core.engine] INFO: Spider closed (finished)
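One detail in this output is worth checking before drawing conclusions: every "Crawled" line ends with ['cached'], the stats show 'httpcache/hit': 27, and the whole run took under half a second. With HTTPCACHE_ENABLED on, these responses appear to have been replayed from Scrapy's local cache rather than fetched again, so changes to proxies or headers would not show up in this run. The sketch below tallies such log lines with the standard library only; the function name summarize_crawl_log and the three-line sample are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches Scrapy's "Crawled (STATUS) <METHOD URL> ..." debug lines.
CRAWLED_RE = re.compile(r"Crawled \((\d{3})\) <(\w+) (\S+)>(.*)")


def summarize_crawl_log(lines):
    """Count responses per HTTP status and how many were served from cache."""
    statuses = Counter()
    cached = 0
    for line in lines:
        match = CRAWLED_RE.search(line)
        if match is None:
            continue  # not a "Crawled" line
        statuses[int(match.group(1))] += 1
        if "['cached']" in match.group(4):
            cached += 1
    return statuses, cached


# Three lines copied from the terminal output above.
sample = [
    "2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']",
    "2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=> (referer: None) ['cached']",
    "2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=> (referer: None) ['cached']",
]

statuses, cached = summarize_crawl_log(sample)
print(dict(statuses), "cached:", cached)
```

If the cached count equals the total, clearing the .scrapy/httpcache directory or disabling the cache is needed before the status codes say anything about the current proxies and headers.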
Solution to the problem
I bypass Imperva by using a real Chrome browser, with a browser extension to automate the process, and a USA mobile proxy.
Imperva checks the following: