Abstracts from PubMed, formatted for PostgreSQL databases.

Annotated vs. Raw meaning
-------------------------

"Annotated" refers to lemmatized, processed bags of words, each with a PubMed ID and a boolean label.

"Raw" refers to the raw sentences used to generate the annotated bags of words; these are also labeled.

All_sentences
-------------

Ex. The files below contain the same 1000 sentences, retrieved from PubMed in an attempt to recapitulate the RDoC concept of ‘Arousal’. The sentences in each file are labeled according to the file name and are meant to be combined in various ways with the other files.

- annotated_sentences_arousal_1_1000_false
- annotated_sentences_arousal_1_1000_nulled
- annotated_sentences_arousal_1_1000_true
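
As a minimal sketch of how the file-name convention could be read programmatically, the snippet below maps the last token of a file name to a label value. The helper `label_from_filename` and the mapping table are hypothetical, and the sketch assumes file names carry no extension, as in the listing above.

```python
# Assumed mapping from the file-name suffix to the label stored in the file.
SUFFIX_TO_LABEL = {"true": "t", "false": "f", "nulled": r"\N"}


def label_from_filename(name: str) -> str:
    """Return the label implied by the last underscore-separated token."""
    suffix = name.rsplit("_", 1)[-1]
    return SUFFIX_TO_LABEL[suffix]
```

For example, `label_from_filename("annotated_sentences_arousal_1_1000_false")` yields `"f"`.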

Column data and type

| Column | Data | Type |
| --- | --- | --- |
| 1 | pubmed id | int |
| 2 | sentence or bag of words | text or PostgreSQL array |
| 3 | label | PostgreSQL boolean (t, f, \N) |
| 4 | null (useful as an empty field for DeepDive to apply keys) | \N |
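
The column layout can be illustrated with a small parser. This is only a sketch: it assumes the files are tab-delimited in PostgreSQL's `COPY` text format (where `\N` marks NULL), which this README does not state, and the names `Row`, `parse_label`, and `parse_row` are hypothetical.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    pubmed_id: int
    text: str               # raw sentence, or the bag of words as array text
    label: Optional[bool]   # True for 't', False for 'f', None for \N
    key: None               # always NULL; left empty for DeepDive


def parse_label(field: str) -> Optional[bool]:
    """Convert a PostgreSQL boolean field (t, f, \\N) to a Python value."""
    if field == r"\N":
        return None
    if field == "t":
        return True
    if field == "f":
        return False
    raise ValueError(f"unexpected label: {field!r}")


def parse_row(line: str) -> Row:
    """Parse one line, assuming four tab-separated fields."""
    pubmed_id, text, label, _null = line.rstrip("\n").split("\t")
    return Row(int(pubmed_id), text, parse_label(label), None)
```

The bag-of-words column is left as text here; converting a PostgreSQL array literal into a Python list would need an extra step.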

Experimental folders (all other folders)
----------------------------------------

Sentences from the all_sentences folder were combined for use in DeepDive apps.

Within individual experiment folders:

Ex. Auditory_2_1000__vs__Psyc_1000__pdx__un_Aud_1_1000

- 1000 sentences labeled ‘t’ for auditory perception.
- 1000 sentences labeled ‘f’ for auditory perception (because they belong to a generalized psychology set instead).
- 1000 sentences labeled ‘\N’, drawn from an independent set of auditory perception sentences.

In our DeepDive apps, the ‘t’ and ‘f’ sentences are used for training, and the ‘\N’-labeled sentences are the ones to be predicted.
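
The training/prediction split described above can be sketched as follows. This is an illustration only, not how DeepDive itself consumes the data: it assumes rows are `(pubmed_id, text, label)` tuples with the label kept as the raw string, and `split_by_label` is a hypothetical helper.

```python
def split_by_label(rows):
    """Separate labeled rows (training) from \\N rows (to be predicted)."""
    train = [r for r in rows if r[2] in ("t", "f")]
    predict = [r for r in rows if r[2] == r"\N"]
    return train, predict


# Made-up rows for illustration; the ids and sentences are not from the data set.
example_rows = [
    (1, "a sentence about hearing", "t"),
    (2, "a sentence about general psychology", "f"),
    (3, "an unlabeled sentence about hearing", r"\N"),
]
train_rows, predict_rows = split_by_label(example_rows)
```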

Due to a peculiarity of the version of DeepDive we used, we also added ‘\N’ labels to all the ‘t’ and ‘f’ sentences. This let us evaluate the training sentences as well as the holdout sentences on a sentence-by-sentence basis, something DeepDive did not clearly support otherwise.

