Monday, June 17, 2019

Title: Leveraging C++ STL for Data Processing: A Deep Dive

The Standard Template Library (STL), now part of the C++ standard library, provides containers, algorithms, and iterators that make data processing code more efficient and readable. This blog post walks through two examples that use components such as map, priority_queue, tuple, and pair.


Example 1: Word Frequency Count Using STL

In this example, we use STL to count the frequency of words in a vector of strings and print them in descending order.

A map stores each word (string) together with its count (int), which gives a convenient way to keep track of the frequency of each word. We then push the entries into a priority_queue to order the words by frequency. A priority_queue is implemented as a heap whose top element is always the greatest, so popping it repeatedly yields the words from most to least frequent. Each heap entry is a tuple holding the count, the word, and a pair of both. Tuples compare element by element, so the count decides the order, the word breaks ties, and the pair is what gets copied into the result.
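Because std::pair also compares element by element, the third tuple member is not strictly needed. The following is a minimal sketch, not the post's own code, of the same idea with a heap of plain (count, word) pairs. The name wfreq_pairs is hypothetical and chosen only for this illustration.

```cpp
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the post): counts words, then orders them
// with a max-heap of (count, word) pairs instead of a tuple.
std::vector<std::pair<int, std::string>>
wfreq_pairs(const std::vector<std::string>& words)
{
    std::map<std::string, int> counts;
    for (const auto& w : words) ++counts[w];

    // std::pair compares by count first, then by word, so the heap keeps the
    // highest count on top (ties: the lexicographically larger word).
    std::priority_queue<std::pair<int, std::string>> pq;
    for (const auto& kv : counts) pq.emplace(kv.second, kv.first);

    std::vector<std::pair<int, std::string>> out;
    while (!pq.empty()) {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}
```

Called on the word list used in this example, it should give the same order as the tuple version: the count decides first, and equal counts fall back to comparing the words.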

The function wfreq(vector<string> &sv) takes a vector of strings as an argument, counts the frequency of each word using a map, and then pushes each count and word into a priority_queue. Finally, it returns a vector of pairs in which each pair holds the frequency first and the word second.


#include<queue> 
#include<vector> 
#include<string> 
#include<map> 
#include<iostream> 
#include<tuple> 
 
 
using namespace std; 
 
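vector<pair<int, string>> wfreq(vector<string> &sv)  
{ 
     vector<pair<int, string>> o; 
     map<string, int> m; 
     // tuple layout: (count, word, (count, word)); the heap orders by count first
     priority_queue<tuple<int, string, pair<int, string>>> pq; 
     // count each word; const auto & avoids copying every string
     for (const auto &x: sv)  m[x]++; 
     for (const auto &x: m) pq.push(make_tuple(x.second, x.first, make_pair(x.second, x.first))); 
     // pop from most to least frequent and keep only the (count, word) pair
     while(!pq.empty()) { 
        o.push_back(get<2>(pq.top())); pq.pop(); 
     } 
     return o; 
} 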
 
int main(void) 
{ 
    vector<string> sv = { "abc", "bca","xyz", "xyz", "abc", "bcc", "abc" }; 
    vector<pair<int, string>> o = wfreq(sv); 
    for(auto x: o) cout << x.first <<","<<  x.second << endl; 
    return 0; 
}

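Or simply, sort a vector of pairs instead of using a priority_queue:

#include<algorithm>   // for sort

vector<pair<int, string>> wfreq(vector<string> &sv)  
{ 
   map<string, int> mp;
   for (auto &x: sv) mp[x]++;
   // copy as (count, word) so the element type matches the return type
   vector<pair<int, string>> pv;
   for (auto &x: mp) pv.push_back(make_pair(x.second, x.first));
   sort(pv.begin(), pv.end(), 
        [](const pair<int, string> &a, const pair<int, string> &b)
        {     return a.first > b.first;            }   // higher count first
        );
    return pv;
}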
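The comparator in the sort call looks only at the count, and std::sort is not a stable sort, so words with equal counts may come out in any relative order. A minimal sketch of one way to make the result deterministic follows. The name wfreq_tiebreak and the choice of alphabetical order for ties are assumptions made for illustration, not part of the original code.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant (not from the post): higher count first,
// and alphabetical order among words with the same count.
std::vector<std::pair<int, std::string>>
wfreq_tiebreak(const std::vector<std::string>& words)
{
    std::map<std::string, int> counts;
    for (const auto& w : words) ++counts[w];

    std::vector<std::pair<int, std::string>> out;
    out.reserve(counts.size());
    for (const auto& kv : counts) out.emplace_back(kv.second, kv.first);

    std::sort(out.begin(), out.end(),
              [](const std::pair<int, std::string>& a,
                 const std::pair<int, std::string>& b) {
                  if (a.first != b.first) return a.first > b.first; // by count, descending
                  return a.second < b.second;                       // by word, ascending
              });
    return out;
}
```

With an explicit tie-break, the output no longer depends on how the sort happens to arrange equal elements.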
     

Example 2: Student Ranking System Using STL and Functors

In this example, we create a student ranking system where we can sort students based on different fields like name and rank.

We create a Student class that contains the student's name, math score, physics score, total score, and rank. We then define a CmpFunctor class, an abstract base class whose operator() is pure virtual and takes two Student objects. Two functors derive from it, NameCmpFunctor and RankCmpFunctor, which compare two Student objects by the name and rank fields, respectively. Each takes a reverse flag that flips the direction of the comparison.

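We then define two functions, orderedbyNameField(vector<Student> &sc) and orderbyRankField(vector<Student> &sc, RankCmpFunctor &cf). The first orders the students by name using a NameCmpFunctor it constructs itself, and the second orders them by rank using the functor passed in. Both functions push the students into a priority_queue that uses the functor as its comparison object and then pop them one by one. Because a priority_queue pops its greatest element first, the output order is the opposite of the comparator's direction: a "less than" comparator yields descending output, and a "greater than" comparator (reverse set to true) yields ascending output.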
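How the comparator's direction maps to pop order is easy to get backwards, so here is a small sketch on plain integers. The helper pop_order is hypothetical and exists only for this illustration; std::less and std::greater are the standard comparison functors from <functional>.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper (not from the post): pushes the values into a
// priority_queue that uses Cmp, then returns them in the order they pop.
template <typename Cmp>
std::vector<int> pop_order(const std::vector<int>& values)
{
    std::priority_queue<int, std::vector<int>, Cmp> pq;
    for (int v : values) pq.push(v);

    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}

// pop_order<std::less<int>>({3, 1, 2})    gives 3, 2, 1 (largest first)
// pop_order<std::greater<int>>({3, 1, 2}) gives 1, 2, 3 (smallest first)
```

The same rule applies to the Student functors: the comparator describes which element has lower priority, not which one comes out first.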

In the main function, we create a vector of Student objects, sort them using the two functions defined earlier, and then print the sorted list of students.

#include<iostream>
#include<vector>
#include<string>
#include<queue>

using namespace std;

class Student{
public:
    string name;
    int math;
    int phy;
    int total;
    int rank;
};

// Abstract base comparator: operator() is pure virtual, and each derived class picks the field to compare.
class CmpFunctor{
public:
       bool reverse = false;
       CmpFunctor(bool rev):reverse(rev){};
       virtual bool operator() (const Student &left, const Student &right) = 0;  
};

class NameCmpFunctor: public CmpFunctor{

public:
      NameCmpFunctor(bool rev=false):CmpFunctor(rev){};
      virtual bool operator() (const Student &left, const Student &right) {
          if (reverse == true) return left.name > right.name;  
          else return left.name < right.name;
      };
};

class RankCmpFunctor: public CmpFunctor{

public:
      RankCmpFunctor(bool rev=false):CmpFunctor(rev){};
      virtual bool operator() (const Student &left, const Student &right) {
          if (reverse == true ) return left.rank > right.rank;  
          else return left.rank < right.rank; 
      };
};

vector<Student> orderedbyNameField(vector<Student> &sc)
{
      vector<Student> st;
      // reverse=true makes the comparator '>', so the smallest name sits on top
      // of the heap and the students pop in ascending name order
      priority_queue<Student, vector<Student>, NameCmpFunctor> pq(NameCmpFunctor(true));  
      for (auto x:  sc)   pq.push(x);
      while(!pq.empty()) {
          auto x = pq.top();
          st.push_back(x); 
          pq.pop();
      }
      return st;  
}

vector<Student> orderbyRankField(vector<Student> &sc, RankCmpFunctor &cf)
{
      vector<Student> st;
      priority_queue<Student, vector<Student>, RankCmpFunctor> pq(cf);
      for (auto x:  sc)   pq.push(x);
      while(!pq.empty()) {
          auto x = pq.top();
          st.push_back(x); 
          pq.pop();
      }
      return st;  
}


int main(void)
{
    vector<Student> sc;

    sc.push_back({"Ammy", 91, 85, 176, 1});
    sc.push_back({"Claus", 92, 83, 175, 2});
    sc.push_back({"Ann", 90, 80, 170, 3});
    sc.push_back({"Bob", 85, 80, 165, 4});

    // ascending by name
    vector<Student> scn = orderedbyNameField(sc);
    for (auto x: scn) {
        cout << x.name <<","<< x.math <<"," << x.phy <<"," << x.total <<","<< x.rank << endl;
    }
    cout << endl;

    // reverse=true: the lowest rank number pops first, so the output runs 1, 2, 3, 4
    RankCmpFunctor cf(true);
    vector<Student> ns = orderbyRankField(sc, cf);

    for (auto x: ns) {
        cout << x.name <<","<< x.math <<"," << x.phy <<"," << x.total <<","<< x.rank << endl;
    }
    cout << endl;

    // reverse=false: the highest rank number pops first, so the output runs 4, 3, 2, 1
    RankCmpFunctor cfa(false);
    vector<Student> nsa = orderbyRankField(sc, cfa);

    for (auto x: nsa) {
        cout << x.name <<","<< x.math <<"," << x.phy <<"," << x.total <<","<< x.rank << endl;
    }

    return 0;
}
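For comparison, the same orderings can also be produced without a heap by sorting a copy of the vector with std::sort and a lambda. This is a sketch of an alternative technique, not the approach used above. The Student struct is repeated so the block stands alone, and the names sorted_by_name and sorted_by_rank are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Mirrors the Student class from the example so this block is self-contained.
struct Student {
    std::string name;
    int math;
    int phy;
    int total;
    int rank;
};

// Hypothetical helper: returns a copy sorted by name, ascending.
std::vector<Student> sorted_by_name(std::vector<Student> students)
{
    std::sort(students.begin(), students.end(),
              [](const Student& a, const Student& b) { return a.name < b.name; });
    return students;
}

// Hypothetical helper: returns a copy sorted by rank; ascending unless
// descending is true.
std::vector<Student> sorted_by_rank(std::vector<Student> students,
                                    bool descending = false)
{
    std::sort(students.begin(), students.end(),
              [descending](const Student& a, const Student& b) {
                  return descending ? a.rank > b.rank : a.rank < b.rank;
              });
    return students;
}
```

With std::sort the comparator reads in the same direction as the output, which avoids the inversion that comes with priority_queue. The heap-based version remains a reasonable choice when elements arrive incrementally or only the top few are needed.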

Key Takeaway

These examples illustrate the power and flexibility of C++ STL in data processing. Whether you're working with simple data types or complex user-defined objects, STL provides efficient and easy-to-use tools to manage and process your data. The use of functors further enhances this ability, giving you the power to define custom comparison or operation methods that can be used with STL components.

Stay tuned for more deep dives into the world of C++ and happy coding!
