About this project

Project title: Prototype of a tool for generating text and structures for social networks
Call no. 02, Programme: 01.2.1-MITA-T-852 Inostartas

The General Data Protection Regulation (GDPR), applicable since 25 May 2018, imposes strict requirements on the collection, use and protection of personal data. It has affected not only companies and organisations that manage personal data in one form or another, but also research and experimental development: even when such data is available in public sources (search engines, social networking platforms, etc.), it may not be used for scientific or other research purposes unless it has been properly anonymised. Automating anonymisation is complex and often involves starting from scratch, as annotated corpora suitable for targeted research scenarios are rare. The task is even more difficult for social network data, which is hard to collect in the first place: it is difficult to access and is protected by privacy restrictions (especially content on Facebook) that prevent its collection, sharing and dissemination.

The inaccessibility of the required data limits the potential of artificial intelligence technologies, which require large amounts of data. To address this bottleneck, synthetically generated data is proposed as an alternative to real data: it would reflect the characteristics of real data (i.e. data created by people in social networks) but could be used for research purposes without compromising the privacy of social network users. The resulting large synthetic datasets can then be used to train artificial intelligence models for various purposes. One possible application is the prevention of cyber and terrorist attacks, by training models to recognise the signs of such attacks on social media.

The aim of the project is to develop a methodology and a prototype for generating synthetic social network data (messages and network structures). The prototype generates data that can be used to build artificial intelligence models and to simulate various events. It can generate messages in different styles, dynamic changes in the network structure, simulated dialogues and discussions, human speakers and bots, and fake news. The generated data can be used to evaluate the effectiveness of various artificial intelligence techniques on data that does not exist or is inaccessible, and to assess how information spreads in advertising campaigns or propaganda and how fake news spreads.
