Applying Machine Learning Models to Classify Xenophobic Tweets Against Asians, With Data Analysis of Hate Crimes

Gi Joon Chang; Seoyoon Choi; Gyeongmin Han; Heuiseo Kim; Inselbag Lee

doi:10.11648/j.ijsts.20210906.14

Peer-Reviewed

Published in International Journal of Science, Technology and Society (Volume 9, Issue 6)

Received: 23 September 2021 Accepted: 8 November 2021 Published: 19 November 2021

Abstract

This paper offers insight into the COVID-19 pandemic and its effect on people's attitudes towards certain minority groups, particularly Asians, Asian-Americans, and Pacific Islanders. Because the coronavirus was first identified in Wuhan, China, xenophobia and racism towards groups associated with the supposed origin of the COVID-19 pandemic have been on the rise. While these groups have also suffered violent physical attacks, this paper focuses on the online hate and xenophobia that Asians face because of their race, ethnicity, country of origin, or related characteristics. Python is employed as the primary programming language, and external libraries such as pandas, NumPy, scikit-learn (sklearn), WordCloud, and matplotlib are imported for handling data. To analyze racism against Asians, keywords such as “Asian Hate,” “Hate Crime,” and “anti-Asian” are used, and Python scripts sift through Google News articles containing these keywords to identify patterns in how the words are used. The frequencies of these keywords on online platforms such as Twitter are also analyzed from comma-separated values (CSV) files, and patterns of usage over time are identified for the periods before and after the COVID-19 pandemic began. Randomly selected tweets are classified into five categories: anti-Asian, not anti-Asian, not English, hate against other racial groups, and support towards Asians. The tweets are classified with three machine learning methods: logistic regression, support vector machine, and Naive Bayes. The models were trained on pre-classified data sets. The resulting classifications indicate how each tweet relates to xenophobia. This classification model of xenophobia is expected to be useful for censoring social media content and improving online chat etiquette. The goal of this classification model is to help end anti-Asian hatred and lower the overall level of racism in society.
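The classification step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract names scikit-learn and the three classifiers, but the TF-IDF features, the pipeline layout, the function names (build_models, classify), and the tiny hand-written training sentences are all assumptions made for this example.

```python
"""Sketch: compare three scikit-learn classifiers on five tweet categories."""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# The five categories named in the abstract.
CATEGORIES = [
    "anti-Asian",
    "not anti-Asian",
    "not English",
    "hate against other racial groups",
    "support towards Asians",
]

# Hypothetical placeholder examples written for this sketch (NOT the paper's data).
TRAIN = [
    ("they brought the virus here and should all leave", "anti-Asian"),
    ("keep those people out of our city they spread disease", "anti-Asian"),
    ("the weather is lovely for a walk today", "not anti-Asian"),
    ("my train was late again this morning", "not anti-Asian"),
    ("hola buenos dias a todos mis amigos", "not English"),
    ("bonjour tout le monde il fait beau", "not English"),
    ("i cannot stand that other group they ruin everything",
     "hate against other racial groups"),
    ("that other group is the reason for every problem",
     "hate against other racial groups"),
    ("stop asian hate we stand with our neighbors", "support towards Asians"),
    ("solidarity with the asian community against hate", "support towards Asians"),
]


def build_models():
    """Return one TF-IDF text pipeline per classifier named in the abstract."""
    return {
        "logistic regression": make_pipeline(
            TfidfVectorizer(), LogisticRegression(max_iter=1000)),
        "support vector machine": make_pipeline(
            TfidfVectorizer(), LinearSVC(random_state=0)),
        "naive Bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    }


def classify(texts, train=TRAIN):
    """Fit every model on the labeled examples and predict a category per text."""
    train_texts = [text for text, _ in train]
    train_labels = [label for _, label in train]
    results = {}
    for name, model in build_models().items():
        model.fit(train_texts, train_labels)
        results[name] = list(model.predict(texts))
    return results


# Unseen example texts, also hypothetical.
new_texts = ["we stand with the asian community", "guten morgen zusammen"]
predictions = classify(new_texts)
for name, labels in predictions.items():
    print(name, labels)
```

With only ten training sentences the predictions carry no real meaning. A realistic run would replace TRAIN with a pre-classified tweet data set and hold out part of it to compare the three models.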

Published in	International Journal of Science, Technology and Society (Volume 9, Issue 6)
DOI	10.11648/j.ijsts.20210906.14
Page(s)	281-288
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Asian Hate, COVID-19, Xenophobia, Racism, Online Hate

Cite This Article

APA Style

Gi Joon Chang, Seoyoon Choi, Gyeongmin Han, Heuiseo Kim, Inselbag Lee. (2021). Applying Machine Learning Models to Classify Xenophobic Tweets Against Asians, With Data Analysis of Hate Crimes. International Journal of Science, Technology and Society, 9(6), 281-288. https://doi.org/10.11648/j.ijsts.20210906.14

ACS Style

Gi Joon Chang; Seoyoon Choi; Gyeongmin Han; Heuiseo Kim; Inselbag Lee. Applying Machine Learning Models to Classify Xenophobic Tweets Against Asians, With Data Analysis of Hate Crimes. Int. J. Sci. Technol. Soc. 2021, 9(6), 281-288. doi: 10.11648/j.ijsts.20210906.14

AMA Style

Gi Joon Chang, Seoyoon Choi, Gyeongmin Han, Heuiseo Kim, Inselbag Lee. Applying Machine Learning Models to Classify Xenophobic Tweets Against Asians, With Data Analysis of Hate Crimes. Int J Sci Technol Soc. 2021;9(6):281-288. doi: 10.11648/j.ijsts.20210906.14

@article{10.11648/j.ijsts.20210906.14,
  author = {Gi Joon Chang and Seoyoon Choi and Gyeongmin Han and Heuiseo Kim and Inselbag Lee},
  title = {Applying Machine Learning Models to Classify Xenophobic Tweets Against Asians, With Data Analysis of Hate Crimes},
  journal = {International Journal of Science, Technology and Society},
  volume = {9},
  number = {6},
  pages = {281-288},
  doi = {10.11648/j.ijsts.20210906.14},
  url = {https://doi.org/10.11648/j.ijsts.20210906.14},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijsts.20210906.14},
 year = {2021}
}

TY  - JOUR
T1  - Applying Machine Learning Models to Classify Xenophobic Tweets Against Asians, With Data Analysis of Hate Crimes
AU  - Gi Joon Chang
AU  - Seoyoon Choi
AU  - Gyeongmin Han
AU  - Heuiseo Kim
AU  - Inselbag Lee
Y1  - 2021/11/19
PY  - 2021
N1  - https://doi.org/10.11648/j.ijsts.20210906.14
DO  - 10.11648/j.ijsts.20210906.14
T2  - International Journal of Science, Technology and Society
JF  - International Journal of Science, Technology and Society
JO  - International Journal of Science, Technology and Society
SP  - 281
EP  - 288
PB  - Science Publishing Group
SN  - 2330-7420
UR  - https://doi.org/10.11648/j.ijsts.20210906.14
VL  - 9
IS  - 6
ER  -

Author Information

Gi Joon Chang
Big Heart Christian School, YongIn, South Korea

Seoyoon Choi
Seoul International School, Seongnam, South Korea

Gyeongmin Han
Cardigan Mountain School, Canaan, United States

Heuiseo Kim
Palisades Park High School, Palisades Park, United States

Inselbag Lee
St. Mark’s School, Southborough, United States
