Summary
I was searching for a Valorant API so I could collect and analyse my own match data. However, the developers do not provide personal API keys for this particular game.
There are some websites that publish Valorant data publicly, so I decided to scrape my own match data from the pages hosted on their web clients.
The volume of data surfaced on the website is limited compared to the number of games I have actually played; I was only able to pull 30 matches' worth of data.
However, it was worth learning about the overall web-scraping process and adding more capabilities to my toolkit.
Link to the project here: Valorant Web Scraping Project
Scrapy - Web Scraper
I used a Python library called Scrapy as my web-crawling tool, and the website being interrogated was https://dak.gg/valorant/en/profile/FilPill-EUW
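If it is not already available, Scrapy can be installed from PyPI (a standard setup, assuming a working Python environment):

pip install scrapy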
Initialising Scrapy Project
The command below creates all the starting files for building the web-crawler:
scrapy startproject val_scraper
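For reference, the command generates a project skeleton roughly like the one below (the exact files may vary slightly between Scrapy versions):

val_scraper/
    scrapy.cfg            # deploy configuration
    val_scraper/
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider and downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # spiders live here
            __init__.py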
Checking for 200 Response on Target Website
I initialised a scrapy shell with the command:
scrapy shell
Inside the shell, I made a request to the website to check that my scraper was permitted to access it:
fetch('https://dak.gg/valorant/en/profile/FilPill-EUW')
A successful connection returns a 200 response, whereas any 4xx response code indicates an error or a lack of permission to perform the request.
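The status code can also be read directly off the response object inside the shell; a minimal check looks something like this:

# After a successful fetch, the shell exposes the response object
response.status   # 200 on success; 4xx indicates an error or blocked access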
Identifying Target For Scraping
My initial approach involved attempting to parse the HTML classes within the divs; however, this did not work. Despite the data appearing inside the HTML tags in the browser, I could not access it with my scraper.
When I simulated the web scraper opening the website, it did not return any data at all, just an empty HTML page. Soon after, I realised that JavaScript was being used to load the data dynamically into the front end.
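For context, the sort of selector-based attempt that came back empty looked roughly like this (the class name below is a hypothetical placeholder, not the site's actual markup):

# Inside the scrapy shell: fetch the page, then try to select match rows from the raw HTML
fetch('https://dak.gg/valorant/en/profile/FilPill-EUW')
response.css('div.match-item').getall()   # hypothetical selector; returns [] because the data is injected by JavaScript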
Realising this, I changed my approach. In the browser's inspect-element view, I navigated to the XHR/Fetch network tab to see what requests the page was making, and I found an HTTPS request that returns my match data in JSON format.
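Before writing the spider, that endpoint can be sanity-checked from the scrapy shell; the sketch below uses the same account-specific URL that appears in the spider further down:

import json

# Fetch the JSON endpoint found in the XHR/Fetch tab
fetch('https://val.dakgg.io/api/v1/accounts/JPnyLxsiavseiYbL8xtmWSuFRHdupX43u_hVynD5YScr2_Y32Wt2v5K-NvxvfDRWTL67AHdVSmoLTg/matches')

# Decode the body and confirm the match data is present
data = json.loads(response.body)
list(data.keys())   # should include 'matches'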
The spider script shown below is what is used to scrape the match data from the page:
import scrapy
import json


class PlayerState(scrapy.Spider):
    name = 'playerState'
    start_urls = ['https://dak.gg/valorant/en/profile/FilPill-EUW']
    headers = {
        "accept": "application/json, text/plain, */*"
    }

    def parse(self, response):
        # Call the JSON endpoint discovered in the XHR/Fetch network tab
        url = 'https://val.dakgg.io/api/v1/accounts/JPnyLxsiavseiYbL8xtmWSuFRHdupX43u_hVynD5YScr2_Y32Wt2v5K-NvxvfDRWTL67AHdVSmoLTg/matches'
        request = scrapy.Request(url,
                                 callback=self.parse_api,
                                 headers=self.headers)
        yield request

    def parse_api(self, response):
        # Decode the JSON body and yield only the match data
        raw_data = response.body
        data = json.loads(raw_data)
        yield {
            'matches': data['matches']
        }
Running the Spider
To run the spider, execute the following command:
scrapy crawl playerState -O playerState.json
The playerState argument in the command refers to the name attribute defined in the spider class above.
The results are saved to a JSON file of my choosing.
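As a rough sketch of how that output file can be loaded afterwards (this assumes pandas is installed, and that the spider yielded a single item containing the 'matches' list, as in the code above):

import json
import pandas as pd

# The -O flag writes a JSON array of yielded items; here there is one item holding the 'matches' list
with open('playerState.json') as f:
    items = json.load(f)

matches = items[0]['matches']
df = pd.json_normalize(matches)   # flatten the nested match records into a DataFrame

print(len(df))          # number of matches pulled (30 in this case)
print(df.columns[:10])  # inspect the available fields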
Data Analysis
Since we are only pulling 30 matches, we are fairly limited in the data visualisations we can generate.
However, the goal was mainly to gain an understanding of the overall web-scraping process.
These are some of the visualisations made using Python:
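As an illustration of the kind of plot involved, here is a minimal sketch building on the DataFrame from the earlier loading snippet; the 'agent' column is a hypothetical placeholder, since the actual field names depend on the structure dak.gg returns:

import matplotlib.pyplot as plt

# Hypothetical field: count how often each agent was played across the scraped matches
df['agent'].value_counts().plot(kind='bar')
plt.title('Matches played per agent')
plt.tight_layout()
plt.show()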