Behind the scenes of social networking apps are machine learning models that classify nodes based on the data they contain about users, such as education, location or political affiliation. The models use these classifications to recommend people and pages to each user. But bias creeps into these recommendations, because the models rely on user features that are highly correlated with sensitive attributes such as gender or skin color.
Researchers at the Penn State College of Information Sciences and Technology developed a novel framework that estimates sensitive attributes to help make fair recommendations.
The team found that their model, called FairGNN, maintains high performance on node classification while reducing bias, even when users supply only limited sensitive information. They trained the model on two real-world datasets: user profiles from Pokec, a popular social network in Slovakia, and a dataset of approximately 400 NBA basketball players. In the Pokec dataset, they treated each user's home region as the sensitive attribute and set the classification task to predict the user's working field. In the NBA dataset, they grouped players into those in the U.S. and those overseas, using location as the sensitive attribute, and set the classification task to predict whether each player's salary is above the median.
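To make the NBA setup concrete, here is a minimal sketch of how such a task could be framed: a binary label for "salary above the median" alongside a binary sensitive attribute for "overseas." The records, field names and the `build_task` helper are hypothetical and invented for illustration; they are not the researchers' data or code.

```python
from statistics import median

# Hypothetical toy records; names and values are made up for illustration.
players = [
    {"name": "A", "salary": 1_000_000, "overseas": False},
    {"name": "B", "salary": 5_000_000, "overseas": True},
    {"name": "C", "salary": 12_000_000, "overseas": False},
    {"name": "D", "salary": 3_000_000, "overseas": True},
]


def build_task(records):
    """Return the median salary, the class labels and the sensitive attribute."""
    cutoff = median(r["salary"] for r in records)
    # Label is 1 when the salary is strictly above the median, else 0.
    labels = [int(r["salary"] > cutoff) for r in records]
    # Sensitive attribute is 1 for overseas players, 0 for U.S. players.
    sensitive = [int(r["overseas"]) for r in records]
    return cutoff, labels, sensitive


cutoff, labels, sensitive = build_task(players)
print(cutoff, labels, sensitive)
```

The label is what the classifier is asked to predict; the sensitive attribute is kept separate so that predictions can later be compared across the two groups.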
They used the same datasets to test the model, first evaluating FairGNN in terms of both fairness and classification performance. Next, they performed “ablation studies” to measure how much each component contributes to the overall system. Finally, they tested whether FairGNN remains effective when different amounts of sensitive attribute information are provided in the training set.
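The article does not say which measures were used to score fairness. As one illustration, the sketch below computes statistical parity difference, a commonly used group-fairness measure, next to plain accuracy. The function names and toy predictions are assumptions made for this example, not the team's evaluation code.

```python
def statistical_parity_difference(preds, sensitive):
    """Absolute gap in positive-prediction rates between the two groups."""
    group0 = [p for p, s in zip(preds, sensitive) if s == 0]
    group1 = [p for p, s in zip(preds, sensitive) if s == 1]
    if not group0 or not group1:
        raise ValueError("both sensitive groups must be non-empty")
    return abs(sum(group0) / len(group0) - sum(group1) / len(group1))


def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical predictions, labels and group membership for eight nodes.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 0, 1, 1, 0]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(preds, sensitive)
acc = accuracy(preds, labels)
print(f"statistical parity difference: {spd:.2f}, accuracy: {acc:.2f}")
```

A value near zero for the first number means both groups receive positive predictions at similar rates; reporting it together with accuracy shows whether fairness was gained at the cost of classification performance.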
The experiments show that the framework makes the model considerably fairer. The findings could be useful in job applicant ranking, crime detection and financial loan applications, all domains where introducing bias is especially undesirable.