How can I rewrite this with promises?

2024-01-21

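I'm building a content scraper for a T-shirt website.

The goal is to enter the site through a single hard-coded url: http://shirts4mike.com

From there, I find all the product pages for each T-shirt and create an object with its details, which is then added to an array.

Once the array is full of T-shirts, I process the array and log it to a CSV file.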
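The last step of that plan does not depend on timing and can be a plain synchronous function. The question never shows convertJson2Csv, so the following is only a minimal hypothetical stand-in (toCsv is an illustrative name), assuming each T-shirt object is flat, like the tshirtObject built in lastScraper:

```javascript
// Hypothetical stand-in for convertJson2Csv: turns an array of flat objects
// (such as the tshirtObject values) into CSV text, one row per object.
function toCsv(rows, columns) {
  // Wrap every field in double quotes and double any embedded quotes,
  // so commas and quotes inside titles don't break the columns.
  const escape = (value) => '"' + String(value ?? '').replace(/"/g, '""') + '"';
  const header = columns.map(escape).join(',');
  const lines = rows.map((row) => columns.map((col) => escape(row[col])).join(','));
  return [header, ...lines].join('\n');
}

// Illustrative input only; the values are made up.
const csv = toCsv([{ title: 'Logo "Classic", red', price: '$18' }], ['title', 'price']);
```

Missing fields become empty quoted cells, so every row keeps the same number of columns.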

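Right now I'm having some trouble with the timing of the requests/responses and the function calls.

How do I make sure the NEXT function is called at the right time? I know it isn't working because it's asynchronous.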
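The underlying issue is that request() returns immediately and invokes its callback later, so a call placed after it (like secondScrape() at the end of the first callback, after the .each() loop) runs before any of the inner requests have answered. A small self-contained illustration, using a hypothetical fakeRequest with the same callback shape so it runs without a network:

```javascript
// fakeRequest is a hypothetical stand-in for request(): same (error, response, html)
// callback shape, but it answers after a 10 ms timer instead of a network round trip.
function fakeRequest(url, callback) {
  setTimeout(() => callback(null, { statusCode: 200 }, '<html>' + url + '</html>'), 10);
}

const order = [];
fakeRequest('http://shirts4mike.com/', function (error, response, html) {
  order.push('callback'); // runs later, when the timer fires
});
order.push('next function'); // runs immediately, before the callback
// Right after this line, order is ['next function']; 'callback' is appended ~10 ms later.
```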

How can I call secondScrape, lastScraper and convertJson2Csv at the right time, so that the variables they use are not undefined?
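Promises are not strictly required for this. The classic callback-only technique is to count outstanding requests and trigger the next step from whichever callback finishes last. A sketch of that approach (fetchAll is an illustrative name; requestFn is anything with request's callback signature):

```javascript
// Fetches every url with a callback-style requestFn and calls done(results) exactly
// once, after the last response arrives, regardless of the order they come back in.
function fetchAll(urls, requestFn, done) {
  const results = new Array(urls.length);
  let pending = urls.length;
  if (pending === 0) {
    done(results); // nothing to wait for
    return;
  }
  urls.forEach(function (url, i) {
    requestFn(url, function (error, response, html) {
      // Store by index so results[i] always belongs to urls[i]; failures become null.
      results[i] = !error && response.statusCode === 200 ? html : null;
      pending -= 1;
      if (pending === 0) {
        done(results); // the last callback to finish starts the next step
      }
    });
  });
}
```

With this, the next scrape would be started from done instead of directly after the loop. The drawback is that every step needs its own counter, which is exactly the bookkeeping Promise.all does for you.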

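I tried using something like response.end(), but that doesn't work.

I'm assuming I need to use promises to make this work? And to keep it clean and readable?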
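Promises do make the flow easier to follow. The building block is a request that returns a promise instead of taking a callback. The sketch below is hand-written rather than using Node's util.promisify so that it can also reject on non-200 responses, mirroring the statusCode == 200 checks in the code below; promisifyRequest is an illustrative name:

```javascript
// Wraps a callback-style request(url, callback(error, response, body)) function
// so that calling it returns a Promise of the page's HTML.
function promisifyRequest(requestFn) {
  return function (url) {
    return new Promise(function (resolve, reject) {
      requestFn(url, function (error, response, html) {
        if (error) {
          reject(error); // network-level failure
        } else if (response.statusCode !== 200) {
          reject(new Error('HTTP ' + response.statusCode + ' for ' + url));
        } else {
          resolve(html);
        }
      });
    });
  };
}

// Usage with the question's module (assumes the request package is installed):
// const getHtml = promisifyRequest(require('request'));
// getHtml(url).then(function (html) { /* parse with cheerio */ });
```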

Any ideas? My code is below:

//Modules being used:
var cheerio = require('cheerio');
var request = require('request');
var moment = require('moment');

//hardcoded url
var url = 'http://shirts4mike.com/';

//url for tshirt pages
var urlSet = new Set();

var remainder;
var tshirtArray = []; //must be an array before lastScraper() can push into it


// Load front page of shirts4mike
request(url, function(error, response, html) {
    if(!error && response.statusCode == 200){
        var $ = cheerio.load(html);

        //iterate over links with 'shirt'
        $("a[href*=shirt]").each(function(){
            var a = $(this).attr('href');

            //create new link
            var scrapeLink = url + a;

            //for each new link, go in and find out if there is a submit button. 
            //If there, add it to the set
            request(scrapeLink, function(error,response, html){
                if(!error && response.statusCode == 200) {
                    var $ = cheerio.load(html);

                    //if page has a submit it must be a product page
                    if($('[type=submit]').length !== 0){

                        //add page to set
                        urlSet.add(scrapeLink);

                    } else if(remainder === undefined) {
                        //if not a product page, add it to remainder so another scrape can be performed.
                        remainder = scrapeLink;                     
                    }
                }
            });
        });     
    }
    //call second scrape for remainder
    secondScrape();
});


function secondScrape() {
    request(remainder, function(error, response, html) {
        if(!error && response.statusCode == 200){
            var $ = cheerio.load(html);

            $("a[href*=shirt]").each(function(){
                var a = $(this).attr('href');

                //create new link
                var scrapeLink = url + a;

                request(scrapeLink, function(error,response, html){
                    if(!error && response.statusCode == 200){

                        var $ = cheerio.load(html);

                        //collect remaining product pages and add to set
                        if($('[type=submit]').length !== 0){
                            urlSet.add(scrapeLink);
                        }
                    }
                });
            });     
        }
    });
    console.log(urlSet);
    //call lastScraper so we can grab data from the set (product pages)
    lastScraper();
};



function lastScraper(){
    //scrape set, product pages
    //a Set has no .length or [i] access, so iterate it with forEach
    urlSet.forEach(function(productUrl){
        request(productUrl, function(error, response, html){
            if(!error && response.statusCode == 200){
                var $ = cheerio.load(html);

                //grab data and store as variables
                var price = $('.price').text();
                var img = $('.shirt-picture').find("img").attr("src");
                var title = $('body').find(".shirt-details > h1").text().slice(4);

                var tshirtObject = {};
                //add values into tshirt object

                tshirtObject.price = price;
                tshirtObject.img = img;
                tshirtObject.title = title;
                tshirtObject.url = productUrl;
                tshirtObject.date = moment().format('MMMM Do YYYY, h:mm:ss a');

                //add the object into the array of tshirts
                tshirtArray.push(tshirtObject); 
            }
        });
    });
    //call function to iterate through tshirt objects in array in order to convert to JSON, then into CSV to be logged
    convertJson2Csv();
};
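Putting it together, here is one way the dependent steps could be chained with async/await and Promise.all, so that the second scrape starts only once remainder is known and the array is complete before anything is written. It is a sketch, not a drop-in replacement: loadPage is a hypothetical helper you would implement with a promise-returning request plus the cheerio selectors above, resolving to { links, isProduct, details }, and the timestamp uses new Date().toISOString() instead of moment.

```javascript
// A promise-based version of the question's control flow (a sketch).
// loadPage(url) is a hypothetical helper that must resolve to a parsed page:
//   { links: [hrefs containing "shirt"], isProduct: boolean, details: { price, img, title } }

// Load every linked page in parallel; resolves to [{ url, page }] once all have arrived.
function loadLinkedPages(baseUrl, page, loadPage) {
  return Promise.all(
    page.links.map(function (href) {
      const pageUrl = baseUrl + href;
      return loadPage(pageUrl).then(function (linked) {
        return { url: pageUrl, page: linked };
      });
    })
  );
}

// Replaces the first request() plus secondScrape(): resolves to the Set of product urls.
async function findProductUrls(baseUrl, loadPage) {
  const productUrls = new Set();
  const front = await loadPage(baseUrl);
  const firstPass = await loadLinkedPages(baseUrl, front, loadPage);

  let remainder;
  for (const { url, page } of firstPass) {
    if (page.isProduct) productUrls.add(url);
    else if (remainder === undefined) remainder = url; // first non-product page
  }

  // Only runs after the first pass has finished, so remainder is actually set.
  if (remainder !== undefined) {
    const remainderPage = await loadPage(remainder);
    const secondPass = await loadLinkedPages(baseUrl, remainderPage, loadPage);
    for (const { url, page } of secondPass) {
      if (page.isProduct) productUrls.add(url);
    }
  }
  return productUrls;
}

// Replaces lastScraper(): resolves to the complete array of T-shirt objects.
async function scrapeShirts(baseUrl, loadPage) {
  const productUrls = await findProductUrls(baseUrl, loadPage);
  return Promise.all(
    Array.from(productUrls, function (productUrl) {
      return loadPage(productUrl).then(function (page) {
        return Object.assign({}, page.details, {
          url: productUrl,
          date: new Date().toISOString(), // stands in for the moment() timestamp
        });
      });
    })
  );
}

// Usage sketch: write the CSV only after everything has resolved.
// scrapeShirts('http://shirts4mike.com/', loadPage)
//   .then(function (tshirtArray) { /* convert to CSV here */ })
//   .catch(function (err) { console.log(err); });
```

Because each function returns a promise, there are no shared globals like remainder or urlSet to be read too early, and a single .catch() at the end handles a failure in any step.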

There is an npm module called request-promise: https://www.npmjs.com/package/request-promise

Simply:

var rp = require("request-promise");

Wherever you make a request, you can switch to request-promise instead.

For example:

rp(url)
.then(function(value){
  //do whatever
})
.catch(function(err){
  console.log(err)
})
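Worth knowing today: both request and request-promise have since been deprecated. On Node 18 and later the built-in fetch already returns promises, so a small helper can play the same role as rp(url). getHtml is an illustrative name, and the fetchImpl parameter exists only so the sketch can be exercised without a network:

```javascript
// Promise-returning replacement for rp(url), built on the global fetch (Node 18+).
// Resolves with the page's HTML text; rejects on network errors or non-2xx responses.
async function getHtml(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error('HTTP ' + response.status + ' for ' + url);
  }
  return response.text();
}

// Same shape as the rp example above:
// getHtml('http://shirts4mike.com/')
//   .then(function (html) { /* do whatever */ })
//   .catch(function (err) { console.log(err); });
```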