Web Scraping hepsiburada Part-2: parallel & scalable analyzer with multiple windows

kommradHomer
3 min read · Mar 5, 2020

So, with some encouragement from a friend, a rough calculation of system requirements, and some curiosity, I decided to add scaling capacity to the code base by using multiple browser windows. You can find the new modules and the old ones here on GitHub, and part 1 here.

I added two new modules: parallel-starter and parallel-analyzer.

parallel-starter takes the same old output of the get-urls module as input and splits it into a number of input files, depending on the CONCURRENCY_LEVEL set. Then, using Python's threading module, it starts new parallel-analyzer threads, again according to CONCURRENCY_LEVEL. After starting the threads, it waits for all of them to finish and then combines their outputs into a single CSV file. I was expecting the thread management to be a little tricky, but the threading module really does well, at least in my simple scenario.
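To make the split / start / wait / combine flow concrete, here is a minimal sketch of the idea, not the actual code from the repo. The names split_urls, analyze_chunk and run_parallel, the part-N.csv and combined.csv file names, and the round-robin split are all my assumptions for illustration. The worker is only a placeholder that writes each URL and its length, where the real parallel-analyzer would drive its own browser window.

```python
import csv
import threading
from pathlib import Path

CONCURRENCY_LEVEL = 4  # number of browser windows / worker threads


def split_urls(urls, parts):
    """Deal the URLs round-robin into `parts` lists of near-equal size."""
    return [urls[i::parts] for i in range(parts)]


def analyze_chunk(chunk, out_path):
    """Hypothetical stand-in for parallel-analyzer: one thread, one output file.

    The real module would open a browser window and scrape each page;
    this placeholder only records the URL and its length.
    """
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for url in chunk:
            writer.writerow([url, len(url)])


def run_parallel(urls, work_dir, concurrency=CONCURRENCY_LEVEL):
    """Split the input, run one thread per chunk, then merge the outputs."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    chunks = split_urls(urls, concurrency)
    part_paths = [work_dir / f"part-{i}.csv" for i in range(concurrency)]

    threads = [
        threading.Thread(target=analyze_chunk, args=(chunk, path))
        for chunk, path in zip(chunks, part_paths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before merging

    # Combine the per-thread outputs into a single CSV file.
    combined = work_dir / "combined.csv"
    with open(combined, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in part_paths:
            with open(path, newline="", encoding="utf-8") as part:
                writer.writerows(csv.reader(part))
    return combined
```

Each thread writes to its own file, so the workers never share a file handle and no locking is needed. The merge only happens after every join() has returned.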

So how is the resource consumption? I can say that running 4 windows at once, and hence finishing the analysis of the whole list 4 times quicker, needs roughly 4 times the memory and CPU. Running multiple windows doesn't save much on the memory consumption of the browser processes. The whole memory footprint of the Python modules, chromedriver and Chrome seemed to tally up to 2.5GB after running for 5 minutes. You can see the release of memory in the free output, in megabytes.

Though, I stumbled upon something: one of the browser processes is using significantly less memory than the other 3. I don't know if it's just a misinterpretation on my part or by htop, or whether there is some underlying explanation.

You can find the new modules and the old ones here on GitHub. I think my next move will be using multiple tabs in one browser window, and I definitely expect savings on memory usage this time!

