The social network, a crucial part of our life is plagued by online impersonation and fake accounts. According to the ‘Community Standards Enforcement Report’ published by Facebook on March 2018, about 583 million fake accounts were taken down just in quarter 1 of 2018 and as many as 3-4% of its active accounts during this time were still fake. In this project, we propose a model that could be used to classify an account as fake or genuine. This model uses Support Vector Machine as a classification technique and can process a large dataset of accounts at once, eliminating the need to evaluate each account manually. The community of concern to us here is Fake Accounts and our problem can be said to be a classification or a clustering problem.
In the present generation, the social life of everyone has become associated with the online social networks. Adding new friends and keeping in contact with them and their updates has become easier. The online social networks have impact on the science, education, grassroots organizing, employment, business, etc. Researchers have been studying these online social networks to see the impact they make on the people. Teachers can reach the students easily through this making a friendly environment for the students to study, teachers nowadays are getting themselves familiar to these sites bringing online classroom pages, giving homework, making discussions, etc. which improves education a lot. The employers can use these social networking sites to employ the people who are talented and interested in the work, their background check can be done easily.
- Naive Bayes algorithm having less accuracy.
- Because of Privacy Issues the Facebook dataset is very limited and a lot of details are not made public.
The Application Domain of the following project was Community Detection. Community detection is key to understanding the structure of complex networks, and ultimately extracting useful information from them. In this project, we came up with a framework through which we can detect a fake profile using machine learning algorithms so that the social life of people become secured.
1. Classification starts from the selection of profile that needs to be classified.
2. Once the profile is selected, the useful features are extracted for the purpose of classification.
3. The extracted features are then fed to trained classifier.
4. Classifier is trained regularly as new data is fed into the classifier.
5. Classifier then determines whether the profile is genuine or fake.
6. The result of classification algorithm is then verified and feedback is fed back into the classifier.
7. As the number of training data increases the classifier becomes more and more accurate in predicting the fake profiles.
- The social networking sites are making our social lives better but nevertheless there are a lot of issues with using these social networking sites.
- The issues are privacy, online bullying, potential for misuse, trolling, etc. These are done mostly by using fake profiles.
- In this project, we came up with a framework through which we can detect a fake profile using machine learning algorithms so that the social life of people become secured.
SOFTWARE AND HARDWARE REQUIREMENTS
- OS – Windows 7, 8 and 10 (32 and 64 bit)
- RAM – 4GB
The model presented in this project demonstrates that Support Vector Machine (SVM) is an elegant and robust method for binary classification in a large dataset. Regardless of the non-linearity of the decision boundary, SVM is able to classify between fake and genuine profiles with a reasonable degree of accuracy (>90%). This method can be extended on any platform that needs binary classification to be deployed on public profiles for various purposes. This project uses only publicly available information which makes it convenient for organizations that want to avoid any breach of privacy, but organizations can also use private data available to them to further extend the capabilities of the proposed model.
Since we have limited data to train the classifier, our approach is facing a high variance problem which can be observed in the learning curve as follows High variance problems can usually be mitigated by increasing the size of the dataset which should not be of much concern to Social Networks Organizations which already have fairly large datasets