DIAlog AI: Part Four: Behind the Green Screen: Is it time to Pull the CORD-19 on CT-BERTie the Bus's Q&A? CT=Clinical-Trial & Covid-Twitter & Conspiracy-Theory
DIAl-a-Docs Dirty Dozen 10 to 12: CT-BERTie & CO went OS at Warp Speed for CO-V-ID. CO-Author of Modelling Interventions & Geo-tags with Emer's Prestigious Paris Pill Dispenser from 1 Step to 1 Health
Dial-a-Docs 10) Alphabetical A2I Descent through Time to the Origin-Al BERT from Current CO-V-ID CONstraint Open Source COpies to “Google AI Language” Coming out as BERT Q&A pre-trained & CO-dependent on the COrpus.
A) PRESENTED NOV 2022 UK Clinical Trial BERT
Eva-Lisa Meldau (Data Scientist, Uppsala Monitoring Centre): “Safety monitoring and signal detection for the novel COVID-19 vaccines” & “Automated redaction of narratives from the UK Yellow Card Scheme using BERT” Nov 2022
Sources:
https://twitter.com/ISoPonline/status/1599863359804874752
https://isoponline.org/wp-content/uploads/2022/11/2022-ISoP-Boston-Seminar-final.pdf
B) Published JAN 2022 Received OCT 2021. Fake or real news about COVID-19? Pretrained transformer model to detect potential misleading news
“The primary goal of this paper is to educate society about the importance of accurate information and prevent the spread of fake information...
…categorize given tweets as either fake or real news…
..tested various deep learning models on the COVID-19 fake dataset. Finally, the CT-BERT and RoBERTa deep learning models outperformed other deep learning models like BERT, BERTweet, ALBERT, and DistilBERT…
…Fake news COVID-19 dataset: In the COVID-19 outbreak (2020), Constraint@AAAI 2021 workshop organizers provided the COVID-19 fake news English dataset [38] with the id, tweet, label (“Fake” and “Real”) ….collected from tweets, Instagram posts, Facebook posts, press releases, or any other popular media content…
…Using the Twitter API, real news was gathered from potential real tweets. Official accounts such as the Indian Council of Medical Research (ICMR), the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), Covid India Seva, and others may have real tweets. They give valuable COVID-19 information such as vaccine progress, dates, hotspots, government policies, and so on….
..demonstrate how to use a novel NLP application to detect real or fake COVID-19 tweets… assist individuals in avoiding hysteria about COVID-19 tweets… improvement of COVID-19 therapies and public health measures.”
Sources:
https://pubmed.ncbi.nlm.nih.gov/35039760
https://www.nitt.edu/home/academics/departments/ca/facultymca/alphonse/
C) Submitted DEC 2020 g2tmn at Constraint@AAAI2021: Exploiting CT-BERT and Ensembling Learning for COVID-19 Fake News Detection
“In this paper, we present our results at the Constraint@AAAI2021 Shared Task: COVID-19 Fake News Detection in English….
…using the transformer-based ensemble of COVID-Twitter-BERT (CT-BERT) models. ..models used, the ways of text preprocessing and adding extra data..
….best model achieved the weighted F1-score of 98.69 on the test set (the first place in the leaderboard) of this shared task that attracted 166 submitted teams”
Sources:
https://arxiv.org/abs/2012.11967v3
https://arxiv.org/pdf/2012.11967v3.pdf
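The leaderboard metric above is the weighted F1-score. As a minimal pure-Python sketch (not the organizers' scoring script), per-class F1 is averaged with each class weighted by its share of the test set:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1 averaged, weighted by each class's support."""
    labels = set(y_true)
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (support[label] / total) * f1
    return score
```

On a balanced two-class test set like the Constraint shared task's, this behaves much like macro F1; the weighting matters when the Fake/Real split is uneven.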
D) CONSTRAINT 2021: First Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situations, collocated with AAAI 2021, 8th February 2021
Source:
https://www.lcs2.in/CONSTRAINT-2021/
E) Submitted NOV 2020 Fighting an Infodemic: COVID-19 Fake News Dataset
“Real - Tweets from verified sources which give useful information on COVID-19. • Fake - Tweets, posts, articles which make claims and speculations about COVID-19 which are verified to be not true…
.. Content is related to the topic of COVID-19…..
Fake news data from public fact-verification websites and social media. Facebook posts, tweets, news pieces, Instagram posts, public statements, press releases, or any other popular media content are leveraged towards collecting fake news.
…Besides these, popular fact-verification websites like PolitiFact, Snopes, and Boomlive are also used, as they play a crucial role towards collating the manually adjudicated details of the veracity of claims becoming viral. These websites host COVID-19 and other generic topic-related verdicts. The factually verified (fake) content can be easily found from such websites.
Real News: .. crawl tweets from official and verified Twitter handles of the relevant sources using the Twitter API. The relevant sources are official government accounts, medical institutes, news channels, etc. We collect tweets from 14 such sources, e.g., the World Health Organization (WHO), Centers for Disease Control and Prevention (CDC), Covid India Seva, Indian Council of Medical Research (ICMR), etc….
Sources:
https://arxiv.org/pdf/2011.03327.pdf
https://arxiv.org/abs/2011.03327
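The dataset above is distributed as rows of (id, tweet, label). A minimal sketch of loading such a file and counting the label split; the sample rows here are invented placeholders, not actual dataset entries:

```python
import csv
import io
from collections import Counter

# Hypothetical three-column TSV mirroring the described schema (id, tweet, label).
SAMPLE = """id\ttweet\tlabel
1\tOfficial update on vaccine progress from @WHO\treal
2\tDrinking hot water cures the virus in one day\tfake
3\tCDC publishes new hotspot guidance\treal
"""

def label_distribution(tsv_text):
    """Count how many tweets carry each veracity label."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["label"] for row in reader)
```

Checking the Fake/Real balance this way is the usual first step before training, since a skewed split changes which evaluation metric (weighted vs macro F1) is informative.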
F) Submitted SEP 2021 Clinical Trial Information Extraction with BERT
“Comments: Health NLP 2021, IEEE International Conference on Healthcare Informatics (ICHI 2021)
…Natural language processing (NLP) of clinical trial documents can be useful in new trial design. Here we identify entity types relevant to clinical trial design and propose a framework called CT-BERT for information extraction from clinical trial text. We trained named entity recognition (NER) models to extract eligibility criteria entities by fine-tuning a set of pre-trained BERT models. We then compared the performance of CT-BERT with recent baseline methods including attention-based BiLSTM and Criteria2Query. The results demonstrate the superiority of CT-BERT in clinical trial NLP….
…In this study, we introduced a new framework CT-BERT and trained NER models to leverage BERT-based modelling for clinical trial information extraction. We studied how pre-trained BERT models may impact the NER performance….
…Collectively, CT-BERT shows significant improvement in model quality. Getting high accuracy in information extraction paves the way for automatic AI-driven clinical trial design.”
Sources:
https://arxiv.org/pdf/2110.10027.pdf
https://arxiv.org/abs/2110.10027
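The CT-BERT framework here fine-tunes BERT for named entity recognition over eligibility criteria. Token-level NER training typically starts from BIO tags; the sketch below shows the standard span-to-BIO conversion, with a hypothetical sentence and entity types ("Condition", "Value") that are illustrative, not necessarily the paper's schema:

```python
def spans_to_bio(tokens, spans):
    """Convert token-index entity spans to BIO tags.

    spans: list of (start_token, end_token_exclusive, entity_type).
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # Begin: first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # Inside: remaining entity tokens
    return tags

# Hypothetical eligibility-criteria sentence with illustrative entity spans.
tokens = ["Patients", "with", "type", "2", "diabetes", "aged", "18", "or", "older"]
spans = [(2, 5, "Condition"), (6, 7, "Value")]
```

Each BIO tag then becomes the classification target for the corresponding BERT token when fine-tuning a token-classification head.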
G) Submitted JUNE 2021 COBERT: COVID-19 Question Answering System Using BERT
“Abstract: The risks are most certainly not trivial, as decisions made on fallacious answers may endanger trust or the general well-being and security of the public. But with thousands of research papers being dispensed on the topic, it is becoming more difficult to keep track of the latest research...
..proposed COBERT: a retriever-reader dual algorithmic system that answers complex queries by searching a corpus of 59K coronavirus-related literature made accessible through the Coronavirus Open Research Dataset Challenge (CORD-19). The retriever is composed of a TF-IDF vectorizer capturing the top 500 documents with optimal scores. The reader, a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model fine-tuned on the SQuAD 1.1 dev dataset and built on top of the HuggingFace BERT transformers, refines the sentences from the filtered documents, which are then passed into a ranker which compares the logit scores to produce a short answer, the title of the paper and the source article of extraction.
BERT: stands for Bidirectional Encoder Representations from Transformers, the most recent refinement of a series of neural models that make substantial use of pretraining, and has prompted noteworthy gains in numerous natural language processing tasks, ranging from text classification to tasks like question answering over the corpus….
DistilBERT: It is a cheap, small, fast, and light BERT-based Transformer model… an English language model, pre-trained on the same data used to pre-train BERT (a concatenation of the Toronto Book Corpus and full English Wikipedia) using distillation with the supervision of the bert-base-uncased version of BERT. The model has 6 layers, 768 dimensions, and 12 heads, totalling 66M parameters….
Ranker: presents the top 3 answers based on a weighted score between the retriever score (based on TF-IDF cosine similarity, explained in Sect. 4.2) and the reader score (based on the DistilBERT Q-A pair probability). Cosine similarity, cos(q⃗, d⃗), as shown in Equation 4, helps to find the similarity between two vectors in an inner product space and conclude whether these vectors point in the same direction. We frequently calculate the similarity of documents in tasks such as text analysis.
Experimental Setup Dataset: The COBERT system uses the dataset of CORD-19: COVID-19 Open Research Dataset, collected by The White House with the help of leading research groups like the Allen Institute for AI and Kaggle [26]. The dataset consists of a collection of the work of several researchers and its analysis. It consists of around 59 thousand papers and around 41 thousand full texts [27], incorporating papers distributed in more than 3200 journals. Many texts are related to institutions situated in the United States (over 16 thousand papers), followed by the United Kingdom (over 3 thousand papers) and the European Union, and then Asian countries. Chinese organizations have seen a brilliant ascent this year (over 5K papers) because of China’s status as the principal focal point of the COVID-19 episode, thus making the source of data very much diverse”
Source:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8220121/
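COBERT's retriever scores documents with TF-IDF vectors and cosine similarity (the paper's Equation 4). A minimal pure-Python sketch of that retrieval step follows; it is not the paper's implementation (which keeps the top 500 CORD-19 documents), just the same mechanics on toy data:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build simple TF-IDF vectors: tf = raw count, idf = log(N / df)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))                      # document frequency per term
    idf = {term: math.log(n / count) for term, count in df.items()}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(u, v):
    """cos(q, d) = (q . d) / (|q| |d|) over sparse dict vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, docs, k=2):
    """Rank documents by cosine similarity between query and doc TF-IDF vectors."""
    vecs, idf = tf_idf_vectors(docs)
    qvec = {t: c * idf.get(t, 0.0) for t, c in Counter(query.lower().split()).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(qvec, vecs[i]), reverse=True)
    return ranked[:k]
```

In the full pipeline the retrieved documents would then be handed to the BERT reader, which extracts candidate answer spans for the ranker to score.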
H) Submitted MAY 2020 COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter
“COVID-Twitter-BERT (CT-BERT), a transformer-based model, pretrained on a large corpus of Twitter messages on the topic of COVID-19.
..10-30% marginal improvement compared to its base model, BERT-Large…
.. largest improvements are on the target domain. Pretrained transformer models, such as CT-BERT, are trained on a specific target domain and can be used for a wide variety of natural language processing tasks, including classification, question-answering and chatbots. CT-BERT is optimised to be used on COVID-19 content, in particular social media posts from Twitter….
…Method The CT-BERT model is trained on a corpus of 160M tweets about the coronavirus collected through the Crowdbreaks platform [7] during the period from January 12 to April 16, 2020. Crowdbreaks uses the Twitter filter stream API to listen to a set of COVID-19-related keywords in the English language. Prior to training, the original corpus was cleaned for retweet tags. Each tweet was pseudonymised by replacing all Twitter usernames with a common text token. A similar procedure was performed on all URLs to web pages…”
CrowdBreaks: “Goal: For many health-related issues human behaviour is of central importance for Public Health to design appropriate policies. Health behaviours are partially influenced by people's opinions, which have traditionally been assessed in surveys. Social media can be used to complement traditional surveys and serve as a low-cost, global, and real-time addition to the toolset of Public Health surveillance.”
Sources:
https://arxiv.org/pdf/2005.07503.pdf
https://arxiv.org/abs/2005.07503
https://github.com/digitalepidemiologylab/covid-twitter-bert
https://colab.research.google.com/github/digitalepidemiologylab/covid-twitter-bert/blob/master/CT_BERT_Huggingface_(GPU_training).ipynb
https://github.com/digitalepidemiologylab/crowdbreaks-welcome
https://github.com/crowdAI/crowdai
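The pseudonymisation step described above replaces every Twitter username with a common text token, and similarly for URLs. A regex sketch of that cleaning pass; the token strings themselves are assumptions for illustration, since the paper only says a common token was used:

```python
import re

# Placeholder tokens (assumed, not taken from the CT-BERT preprocessing code).
USER_TOKEN = "@<user>"
URL_TOKEN = "<url>"

def pseudonymise(tweet):
    """Replace Twitter handles and URLs with fixed placeholder tokens.

    URLs are replaced first so that an '@' inside a URL is not mistaken
    for a username mention.
    """
    tweet = re.sub(r"https?://\S+", URL_TOKEN, tweet)
    tweet = re.sub(r"@\w+", USER_TOKEN, tweet)
    return tweet
```

Mapping all handles and links to shared tokens keeps the vocabulary small and prevents the model from memorising identifying account names during pretraining.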
I) Submitted OCT 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding… “Google AI Language”
“We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.”
Sources:
https://arxiv.org/abs/1810.04805
https://arxiv.org/pdf/1810.04805.pdf
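BERT's bidirectional pretraining relies on masked language modelling: a fraction of tokens (15% in the paper) is hidden and the model must predict each from both its left and right context. A simplified, deterministic sketch of just the masking step; real BERT additionally sometimes keeps the original token or swaps in a random one (the 80/10/10 rule), which is omitted here for clarity:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Simplified BERT-style masking: pick ~mask_prob of positions and
    replace each chosen token with [MASK], remembering the originals
    as prediction targets."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = masked[pos]   # the model must recover this token
        masked[pos] = "[MASK]"
    return masked, labels
```

Because the [MASK] positions are predicted from tokens on both sides, the objective forces bidirectional context, unlike left-to-right language models.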
Dial-a-Docs 11A) A CT-BERT Author Citations
geo-tagged with Emer's Prestigious Paris Pill Dispenser (TB AMR) at the home of the Particle Accelerator to school/Lead us on “Life Sciences” & “Computer & Comms Sciences” in: [Lab + Epidemiology + Digital]
A Pigeon Poo-generated Pic & Word Chrono-Cloud thingee:
11 B) DIAl-a-Docs CT-BERTie the Bus Fuel Depot [Convergence]: Real World Data, Cable & Corpus Linkages to both Catholic Ascension Hospital Chain & Darwin’s Descension on an isle with the weirdest set of random coincidences Q/ Who Wrote this Script Again? A/
18th November 2019: “Google-Ascension deal reveals murky side of sharing health data: The Google-Ascension partnership is a 'PR failure' that demonstrates a need for greater transparency about what happens to patient health data….
…Ascension, a Catholic health system based in St. Louis, partnered with Google to transition the health system's infrastructure to the Google Cloud Platform, to use the Google G Suite productivity and collaboration tools, and to explore the tech giant's artificial intelligence and machine learning applications. By doing so, it is giving Google access to patient data, which the search giant can use to inform its own products….
Source:
https://www.techtarget.com/searchhealthit/feature/Google-Ascension-deal-reveals-murky-side-of-sharing-health-data
November 13th 2019: Technology that improves patients’ lives, caregivers’ experience…
The “secret” code name: For planning purposes, Ascension and Google named our collaboration Project Nightingale as a shorthand way of referring to it. The name reflects the work of Florence Nightingale, a trailblazing figure in nursing who greatly affected 19th- and 20th-century policies around proper care…
..About the data: Ascension’s clinical data, hosted in the Google Cloud Platform, is housed within an Ascension-owned virtual private space. Google is not permitted to use the data for marketing or research purposes. Hospitals and clinical software vendors across the country have converted or are in the process of converting to electronic health records stored in the cloud and soon the entire industry will be adopting this approach.
…And it’s secure: All of Google’s work with Ascension is in compliance with applicable regulations, including the Health Insurance Portability and Accountability Act (HIPAA), and is covered by a Business Associate Agreement (BAA) that governs Protected Health Information (PHI)”
“Artificial intelligence/machine learning will help provide insights, with a licensed clinician always making the final treatment decisions….
Last year, we provided $2 billion in care of persons living in poverty and other community benefit programs….
…We selected Google to help us on our journey of transformation. As a Catholic health ministry, we conducted an Ethics Review to ensure this collaboration is aligned with our Mission and Values, and that it is consistent with our Catholic identity.
Source:
https://www.ascension.org/News/News-Articles/2019/11/12/21/45/Technology-that-improve-patients-lives-caregivers-experience
November 15, 2019 Curie subsea cable set to transmit to Chile, with a pit stop to Panama
“Once again, we’re reminded that the cloud isn’t in the sky—it’s in the ocean”
June 29, 2019 Introducing Equiano, a subsea cable from Portugal to South Africa
“because Equiano is fully funded by Google, we’re able to expedite our construction timeline and optimize the number of negotiating parties. …first phase of the project, connecting South Africa with Portugal, is expected to be completed in 2021…
…..Between 2016 and 2018, Google invested US$47 billion in capex…”
“History of the Atlantic Cable & Undersea Communications
from the first submarine cable of 1850 to the worldwide fiber optic network…”
Sources:
https://cloud.google.com/blog/products/infrastructure/curie-subsea-cable-set-to-transmit-to-chile-with-a-pit-stop-to-panama
https://cloud.google.com/blog/products/infrastructure/introducing-equiano-a-subsea-cable-from-portugal-to-south-africa
https://atlantic-cable.com/CableCos/Ascension/
1 September 2021: Fibre Optic Cable Landed on St Helena (St Helena Govt, SHG)
“Sunday, 29 August 2021, was a ground-breaking day in St Helena’s digital history as the Island’s branch of the Equiano Subsea Cable was landed here…
… acknowledges the €21.5 million allocated by the EU under the EDF’11 programme to the territory, of which St Helena received around €17 million to support the delivery of the SHG Digital Strategy and to achieve the goals of the 10 Year Plan….
..driving the Island forward in the digital age...
….The high-speed fibre cable should offer opportunities for private sector development, distance learning, tele-medicine and e-commerce…
…Notes to Editors: In December 2019, SHG signed a contract with Google to connect St Helena Island to the Equiano Subsea Fibre Optic Cable, delivering St Helena’s first high-speed, fibre-optic connectivity.”
Source:
https://www.sainthelena.gov.sh/2021/news/fibre-optic-cable-landed-on-st-helena
Ascension govt: “Ascension, like other Overseas Territories and the Crown Territories, is not part of the United Kingdom. It has its own Constitution (shared with St Helena and Tristan da Cunha), is internally self-governing, makes its own laws, has a separate fiscal jurisdiction and has tax raising powers through the Governor. The United Kingdom is responsible for the defence, international relations, and internal security of the territory”
BBC: “Darwin discussed how to make Ascension more habitable for humans with his friend Joseph Hooker, later director of the Royal Botanic Gardens at Kew, who visited in 1843. Hooker devised a plan..
..Hooker, to his credit, knew his planting scheme would push out the endemic ferns. What he perhaps didn't realise was just how much havoc it would cause…”
Darwin Initiative: Lead Organisations: Gov of Ascension - AIGCD
The Darwin Initiative is a UK govt grants scheme that helps protect biodiversity, the natural environment and the local communities that live alongside it in developing countries… building environmental knowledge.. capacity building.. research.. implementing international biodiversity agreements ..
..Announced by the UK Government at the Rio Earth Summit in 1992. ..Since …awarded over £164m to more than 1,143 projects across 159 countries
…supports developing countries to conserve biodiversity and reduce poverty…
..funded by Defra (Department for Environment, Food & Rural Affairs), DFID (Department for International Development) and the FCO (Foreign & Commonwealth Office) in the UK …..provides grants to meet their objectives under: the Convention on Biological Diversity (CBD), the Nagoya Protocol on Access and Benefit-Sharing (ABS), the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA), and the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES).
…The Darwin Expert Committee (currently chaired by Professor Stephen Blackmore), consists of experts from government, academia, science and the private sector….
Sources:
https://www.ascension.gov.ac/
https://www.bbc.com/news/magazine-36076411
https://www.darwininitiative.org.uk/project/institution/lead/4220/
https://www.darwininitiative.org.uk/about-us/
Dial-a-Docs 12) BERTie & Fren's I/Os, Replications & Biases are COnstrained by the Fat Controller COmmand via the COrpus. Why did BERTie & Co go Visibly Bananas WarpSpeed Style for CO-V-ID?
Tech Crunch March 16 2020: “In a briefing on Monday, research leaders across tech, academia and the government joined the White House to announce an open data set full of scientific literature on the novel coronavirus……
…Sharing vital information across scientific and medical communities is key to accelerating our ability to respond to the coronavirus pandemic,” Chan Zuckerberg Initiative Head of Science Cori Bargmann said of the project….
….The Chan Zuckerberg Initiative hopes that the global machine learning community will be able to help the science community connect the dots on some of the enduring mysteries about the novel coronavirus as scientists pursue knowledge around prevention, treatment and a vaccine…..
The CORD-19 data set announcement is certain to roll out more smoothly than the White House’s last attempt at a coronavirus-related partnership with the tech industry. The White House came under criticism last week for President Trump’s announcement that Google would build a dedicated website for COVID-19 screening. In fact, the site was in development by Verily, Alphabet’s life science research group, and intended to serve California residents, beginning with San Mateo and Santa Clara County. (Alphabet is the parent company of Google.)”
Data Scientest: March 18 2020:
“….address [COVID-19] problem, researchers and leaders from the leading AI institutes, including Allen Institute for AI and Microsoft, and the federal government agency (i.e., the National Library of Medicine) have teamed together with extensive collaboration, resulting in the release of the COVID-19 Open Research Dataset (CORD-19) of scholarly literature about COVID-19, SARS-CoV-2, and other kinds of coronavirus. ..
COVID-19 Open Research Dataset Challenge (CORD-19): An AI challenge with AI2, CZI, MSR, Georgetown, NIH & The White House
“This dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, IBM, and the National Library of Medicine - National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.
… a resource of over 1,000,000 scholarly articles, including over 400,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease..
…A list of our initial key questions can be found under the Tasks section of this dataset. These key scientific questions are drawn from the NASEM’s SCIED (National Academies of Sciences, Engineering, and Medicine’s Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats) research topics and the World Health Organization’s R&D Blueprint for COVID-19.
Many of these questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights on these questions….”