condor.models package

Submodules

condor.models.base module

Common functionality for SQLAlchemy schemas.

class condor.models.base.AuditableMixing[source]

Bases: object

Table mixin that adds the common fields in our database. To make a table auditable, it adds created and modified timestamps, plus an eid field that serves as the primary key for convenience.

classmethod count(database)[source]
created = Column(None, DateTime(), table=None, nullable=False, default=ColumnDefault(<function ColumnDefault._maybe_wrap_callable.<locals>.<lambda>>))
eid = Column(None, Unicode(length=40), table=None, primary_key=True, nullable=False, default=ColumnDefault(<function ColumnDefault._maybe_wrap_callable.<locals>.<lambda>>))
classmethod find_by_eid(database, eid)[source]

Finds a model by its eid or a chunk of it. Returns None if no model matches the eid chunk.

Parameters:
  • database – SQLAlchemy session used to find the item.
  • eid – eid to match, can be a partial.
Returns:

the model if it is found, None otherwise.
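
The partial eid lookup described above can be illustrated with a small sketch. It uses the standard library sqlite3 module rather than a SQLAlchemy session, and it assumes that a "chunk" means a prefix of the eid. The function name find_by_eid_sketch, the table layout and the sample rows are all hypothetical.

```python
import sqlite3


def find_by_eid_sketch(connection, table, eid_chunk):
    """Return the first row whose eid starts with eid_chunk, or None.

    Assumption: a "chunk" is a prefix of the eid. The table name is
    interpolated into the SQL, so it must come from trusted code. Any
    % or _ inside eid_chunk acts as a LIKE wildcard in this sketch.
    """
    sql = "SELECT eid, description FROM {} WHERE eid LIKE ? ORDER BY eid LIMIT 1"
    cursor = connection.execute(sql.format(table), (eid_chunk + "%",))
    return cursor.fetchone()


# Hypothetical in-memory table standing in for a real model table.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE bibliography (eid TEXT PRIMARY KEY, description TEXT)"
)
connection.executemany(
    "INSERT INTO bibliography VALUES (?, ?)",
    [("a1b2c3", "first"), ("ffee00", "second")],
)

match = find_by_eid_sketch(connection, "bibliography", "a1b")
missing = find_by_eid_sketch(connection, "bibliography", "zzz")
```

Here match holds the row whose eid begins with "a1b", while missing is None because no eid starts with "zzz".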

classmethod latest(database)[source]

Finds the latest model in the given db.

Parameters:database – SQLAlchemy session used to find the item.
Returns:the latest model if it is found, None otherwise.
classmethod list(database, count=None)[source]
modified = Column(None, DateTime(), table=None, nullable=False, onupdate=ColumnDefault(<function ColumnDefault._maybe_wrap_callable.<locals>.<lambda>>), default=ColumnDefault(<function ColumnDefault._maybe_wrap_callable.<locals>.<lambda>>))
condor.models.base.eid_gen()[source]

Generates a unique eid based on a random string.

Returns:a unique random string.
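
As a rough illustration of how a 40-character eid could be produced, here is a minimal sketch. The eid column is declared as Unicode(length=40), which happens to match the length of a SHA-1 hex digest, so the sketch hashes random bytes with SHA-1. That choice and the name eid_gen_sketch are assumptions; the real eid_gen may work differently.

```python
import hashlib
import os


def eid_gen_sketch():
    """Return a 40-character hex string derived from random bytes.

    Hypothetical stand-in for condor.models.base.eid_gen; the real
    implementation may differ.
    """
    # 32 random bytes from the OS, hashed down to a 40-character hex digest.
    return hashlib.sha1(os.urandom(32)).hexdigest()


first = eid_gen_sketch()
second = eid_gen_sketch()
```

Each call draws fresh random bytes, so two calls are overwhelmingly likely to return different strings.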

condor.models.bibliography module

Tools for handling a set of documents called bibliography.

class condor.models.bibliography.Bibliography(**kwargs)[source]

Bases: condor.models.base.AuditableMixing, sqlalchemy.ext.declarative.api.Base

Describes a group of documents.

created
description
documents
eid
modified
queries
term_document_matrices
words(fields, normalizer_class)[source]

List of normalized words from the given fields.

Parameters:
  • fields – list of fields to check
  • normalizer_class – normalizer to use
Returns:

list of words

condor.models.document module

A tool for managing single documents within a bibliography.

class condor.models.document.Document(**kwargs)[source]

Bases: condor.models.base.AuditableMixing, sqlalchemy.ext.declarative.api.Base

Describes a single document.

bibliography
bibliography_eid
classmethod count(database, bibliography_eid)[source]

Unlike the usual count, this one counts the number of documents in a given bibliography.

created
description
eid
full_text

Retrieve the full text.

Returns:string with the full text

full_text_path
hash
keywords
language
classmethod list(database, bibliography_eid, count=None)[source]

Unlike the usual list, this one returns only the records related to a single bibliography.

static load_full_text(record, files, force=False)[source]
static mappings_from_files(file_names, record_type, full_text_path=None, force=False, **kwargs)[source]

Creates document mappings out of files.

Parameters:
  • file_names – paths to the files
  • record_type – type of record to extract
  • kwargs – extra fields to include in the mappings
  • full_text_path – path to look for full text pdf files
  • force – force reading the full text from pdf files
Returns:

an iterable over mappings

modified
raw_data(fields, normalizer_class)[source]

Get the raw data from the given fields in this record.

Parameters:
  • fields – fields of interest
  • normalizer_class – normalizer for the data
Returns:

list of normalized data

title

condor.models.query module

class condor.models.query.Query(**kwargs)[source]

Bases: condor.models.base.AuditableMixing, sqlalchemy.ext.declarative.api.Base

bibliography
bibliography_eid
contributor
created
eid
modified
query_string
results
topic
class condor.models.query.QueryResult(**kwargs)[source]

Bases: condor.models.base.AuditableMixing, sqlalchemy.ext.declarative.api.Base

created
document
document_eid
eid
modified
query
query_eid

condor.models.ranking_matrix module

class condor.models.ranking_matrix.RankingMatrix(**kwargs)[source]

Bases: condor.models.base.AuditableMixing, sqlalchemy.ext.declarative.api.Base

build_options
created
eid
kind
classmethod lsa_from_term_document_matrix(term_document_matrix, covariance)[source]

Builds an lsa ranking matrix.

This will cut the matrix so that it keeps the given covariance.

Parameters:
  • term_document_matrix – term document matrix to use
  • covariance (float) – amount of covariance to keep.
Returns:

the ranking matrix
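
One plausible reading of "cut the matrix so that it keeps the given covariance" is to keep the smallest number of singular values whose squared sum reaches the requested fraction of the total. The sketch below shows only that rank selection step in plain Python. The function name rank_for_covariance and the interpretation of covariance as a fraction between 0 and 1 are assumptions, not confirmed behaviour of lsa_from_term_document_matrix.

```python
def rank_for_covariance(singular_values, covariance):
    """Smallest rank whose squared singular values reach the fraction.

    Assumption: covariance is a fraction in (0, 1] of the total squared
    singular values; condor may define the cut differently.
    """
    total = sum(value ** 2 for value in singular_values)
    if total == 0:
        return 0
    kept = 0.0
    ordered = sorted(singular_values, reverse=True)
    for rank, value in enumerate(ordered, start=1):
        kept += value ** 2
        if kept / total >= covariance:
            return rank
    return len(singular_values)


# Squares are 16, 4, 1 and 0.25, so the first two cover about 94%.
rank = rank_for_covariance([4.0, 2.0, 1.0, 0.5], covariance=0.9)
```

With these illustrative values rank is 2, so a truncated decomposition would keep the two largest singular values and drop the rest.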

matrix
modified
query(tokens, limit=None, cosine=None)[source]

Find the most relevant documents in the index given these tokens.

The tokens are normalized and their language is guessed internally, so pass in the tokens exactly as entered by the user.

Parameters:
  • tokens – query string to search
  • limit – limit of documents to use
  • cosine – limit cosine to use
Returns:

list of documents
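
To illustrate how the limit and cosine parameters could interact, here is a minimal sketch of cosine similarity ranking over plain Python lists. It assumes that cosine is a minimum similarity a document must reach and that limit caps the number of results. The names rank_documents and cosine_limit, and the sample vectors, are hypothetical; the normalization and language guessing done by query are left out.

```python
import math


def cosine(left, right):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(left, right))
    norm = math.sqrt(sum(a * a for a in left)) * math.sqrt(sum(b * b for b in right))
    return dot / norm if norm else 0.0


def rank_documents(query_vector, document_vectors, limit=None, cosine_limit=None):
    """Return (name, score) pairs sorted from most to least similar.

    Assumptions: cosine_limit is a minimum similarity and limit caps the
    number of results. Both default to "no restriction".
    """
    scored = [
        (name, cosine(query_vector, vector))
        for name, vector in document_vectors.items()
    ]
    if cosine_limit is not None:
        scored = [pair for pair in scored if pair[1] >= cosine_limit]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical two-dimensional document vectors.
documents = {"doc_a": [1.0, 0.0], "doc_b": [1.0, 1.0], "doc_c": [0.0, 1.0]}
top_two = rank_documents([1.0, 0.0], documents, limit=2)
close_only = rank_documents([1.0, 0.0], documents, cosine_limit=0.9)
```

In this example top_two lists doc_a followed by doc_b, and close_only contains only doc_a because the other documents fall below the 0.9 threshold.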

ranking_matrix_path
term_document_matrix
term_document_matrix_eid

condor.models.term_document_matrix module

A plain term document matrix representation. The actual matrix is stored off site in a NumPy file.

class condor.models.term_document_matrix.TermDocumentMatrix(**kwargs)[source]

Bases: condor.models.base.AuditableMixing, sqlalchemy.ext.declarative.api.Base

Represents a term document matrix.

bibliography
bibliography_eid
bibliography_options
created
eid
classmethod from_bibliography_set(bibliography, regularise=True, fields=None, normalizer_class=None)[source]

Build a matrix from a document set.

This has the side effect of creating the matrix and terms files.

Parameters:
  • bibliography – a document set
  • regularise – apply TF-IDF regularization.
  • fields – fields of interest
  • normalizer_class – normalizer class to use
Returns:

a term document matrix.
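
The idea of building a term document matrix with optional TF-IDF weighting can be sketched in plain Python. This is not the condor implementation: it works on already tokenized documents, writes no files, and uses count * log(N / document frequency), which is only one common TF-IDF variant. The function name term_document_matrix_sketch and the sample documents are hypothetical.

```python
import math


def term_document_matrix_sketch(documents, regularise=True):
    """Build (terms, matrix) from a list of tokenized documents.

    matrix[i][j] is the weight of terms[i] in documents[j]. With
    regularise=True raw counts are scaled by log(N / document frequency);
    condor's actual weighting may differ.
    """
    terms = sorted({word for document in documents for word in document})
    counts = [[document.count(term) for document in documents] for term in terms]
    if not regularise:
        return terms, counts
    total = len(documents)
    weighted = []
    for row in counts:
        # Every term occurs in at least one document, so this is never zero.
        document_frequency = sum(1 for count in row if count > 0)
        idf = math.log(total / document_frequency)
        weighted.append([count * idf for count in row])
    return terms, weighted


# Hypothetical tokenized documents.
documents = [["latent", "semantic", "analysis"], ["semantic", "search"]]
terms, raw = term_document_matrix_sketch(documents, regularise=False)
_, weighted = term_document_matrix_sketch(documents, regularise=True)
```

A term that appears in every document, such as "semantic" here, gets a weight of zero under this variant, which is the usual reason for applying the weighting before ranking.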

matrix

Load the matrix stored off site.

matrix_path
modified
processing_options
ranking_matrices
term_list_path
words

Load the words stored off site.

Module contents