Pandas Mongo


Release 0.1.0

May 05, 2020

Contents

1 Overview
2 Quick Start
3 Installation
    3.1 Documentation
    3.2 Development
4 Installation
5 Quick Start
6 Reading dataframes from MongoDB using aggregation
7 Reference
    7.1 pdmongo
8 Contributing
    8.1 Bug reports
    8.2 Documentation improvements
    8.3 Feature requests and feedback
    8.4 Development
9 Authors
10 Changelog
    10.1 0.1.0 (2020-05-05)
    10.2 0.0.2 (2020-05-04)
    10.3 0.0.1 (2020-04-30)
    10.4 0.0.0 (2020-03-22)
11 Indices and tables
Python Module Index
Index


CHAPTER 2

Quick Start

Writing a pandas DataFrame to a MongoDB collection:

import pandas as pd
import pdmongo as pdm

# Build a small DataFrame and write it to the "MyCollection" collection.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
pdm.to_mongo(df, "MyCollection", "mongodb://localhost:27017/mydb")

Reading a MongoDB collection into a pandas DataFrame:

import pdmongo as pdm

df = pdm.read_mongo("MyCollection", [], "mongodb://localhost:27017/mydb")
print(df)


CHAPTER 3

Installation

pip install pdmongo

You can also install the in-development version with:

pip install https://github.com/pakallis/python-pandas-mongo/archive/master.zip

3.1 Documentation

https://python-pandas-mongo.readthedocs.io/

3.2 Development

To run all the tests, run:

tox

Note: to combine the coverage data from all the tox environments, run:

Windows:

    set PYTEST_ADDOPTS=--cov-append
    tox

Other:

    PYTEST_ADDOPTS=--cov-append tox


CHAPTER 4

Installation

At the command line:

pip install pdmongo


CHAPTER 5

Quick Start

Writing a pandas DataFrame to a MongoDB collection:

import pandas as pd
import pdmongo as pdm

# Build a small DataFrame and write it to the "MyCollection" collection.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
pdm.to_mongo(df, "MyCollection", "mongodb://localhost:27017/mydb")

Reading a MongoDB collection into a pandas DataFrame:

import pdmongo as pdm

df = pdm.read_mongo("MyCollection", [], "mongodb://localhost:27017/mydb")
print(df)


CHAPTER 6

Reading dataframes from MongoDB using aggregation

You can use an aggregation query to filter or transform data in MongoDB before fetching it into a DataFrame.

Reading a collection from MongoDB into a pandas DataFrame by using an aggregation query:

import pdmongo as pdm

query = [
    {"$match": {'A': 1}}
]
df = pdm.read_mongo("MyCollection", query, "mongodb://localhost:27017/mydb")
print(df)

The query takes the same pipeline (a list of stages) as the aggregate method of the pymongo package.
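Because the query is an ordinary Python list of stage dictionaries, it can be assembled by a helper before being handed to read_mongo. The sketch below is illustrative only: build_pipeline and read_filtered are hypothetical helper names, and the $project and $sort stages are standard MongoDB aggregation stages chosen for the example, not something the original text prescribes. The pdmongo call is wrapped in a function so that the pipeline can be inspected without a running MongoDB server.

```python
from typing import Any, Dict, List


def build_pipeline(a_value: int) -> List[Dict[str, Any]]:
    """Hypothetical helper: filter on A, keep only B, sort by B."""
    return [
        {"$match": {"A": a_value}},        # keep documents where A == a_value
        {"$project": {"_id": 0, "B": 1}},  # drop _id, keep only field B
        {"$sort": {"B": 1}},               # ascending order by B
    ]


def read_filtered(a_value: int, uri: str = "mongodb://localhost:27017/mydb"):
    """Run the pipeline through pdmongo (needs pdmongo and a reachable MongoDB)."""
    import pdmongo as pdm  # imported lazily so build_pipeline works without it

    return pdm.read_mongo("MyCollection", build_pipeline(a_value), uri)


pipeline = build_pipeline(1)
print(pipeline)
```

Calling read_filtered(1) would then return the matching documents as a DataFrame, assuming the collection and URI from the examples above exist.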


CHAPTER 7

Reference

7.1 pdmongo

pdmongo.read_mongo(collection: str, query: List[Dict[str, Any]], db: Union[str, pymongo.database.Database], index_col: Union[str, List[str], None] = None, extra: Optional[Dict[str, Any]] = None, chunksize: Optional[int] = None) → pandas.core.frame.DataFrame

Read MongoDB query into a DataFrame.

Returns a DataFrame corresponding to the result set of the query. Optionally provide an index_col parameter to use one of the columns as the index; otherwise the default integer index will be used.

Parameters

• collection (str) – Mongo collection to select for querying

• query (list) – Must be an aggregation query. The input will be passed to pymongo's aggregate method.

• db (pymongo.database.Database or database string URI) – The database to use

• index_col (str or list of str, optional, default: None) – Column(s) to set as index (MultiIndex).

• extra (dict, optional, default: None) – Dictionary of additional parameters to pass to the aggregate method.

• chunksize (int, default None) – If specified, return an iterator where chunksize is the number of docs to include in each chunk.

Returns DataFrame
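A minimal sketch of consuming the iterator that read_mongo is documented to return when chunksize is given. To keep the example runnable without a MongoDB server, the reader is passed in as an argument: count_rows_in_chunks and fake_read_mongo are hypothetical names, and the fake only imitates the documented call shape (collection, query, db, chunksize) by yielding lists of documents instead of DataFrames.

```python
from typing import Any, Callable, Iterable


def count_rows_in_chunks(
    read_mongo: Callable[..., Iterable[Any]],
    collection: str,
    db: str,
    chunksize: int,
) -> int:
    """Sum the sizes of the chunks yielded by a read_mongo-style callable."""
    total = 0
    # An empty pipeline ([]) selects every document in the collection.
    for chunk in read_mongo(collection, [], db, chunksize=chunksize):
        total += len(chunk)  # len() works for DataFrames and for lists
    return total


def fake_read_mongo(collection, query, db, chunksize=None):
    """Stand-in for pdmongo.read_mongo used only for this illustration."""
    rows = [{"A": 1}, {"A": 2}, {"A": 3}]
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]


total = count_rows_in_chunks(
    fake_read_mongo, "MyCollection", "mongodb://localhost:27017/mydb", 2
)
print(total)
```

With pdmongo installed and a server available, passing pdmongo.read_mongo in place of fake_read_mongo should process the collection chunk by chunk rather than loading it at once.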

pdmongo.to_mongo(frame: pandas.core.frame.DataFrame, name: str, db: Union[str, pymongo.database.Database], if_exists: Optional[str] = 'fail', index: Optional[bool] = True, index_label: Union[str, Sequence[str], None] = None, chunksize: Optional[int] = None) → Union[List[pymongo.results.InsertManyResult], pymongo.results.InsertManyResult]

Write records stored in a DataFrame to a MongoDB collection.

Parameters


• frame (DataFrame, Series)

• name (str) – Name of collection.

• db (pymongo.database.Database or database string URI) – The database to write to

• if_exists ({'fail', 'replace', 'append'}, default 'fail') –

– fail: If the collection exists, do nothing.

– replace: If the collection exists, drop it, recreate it, and insert data.

– append: If the collection exists, insert data. Create it if it does not exist.

• index (boolean, default True) – Write DataFrame index as a column.

• index_label (str or sequence, optional) – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

• chunksize (int, optional) – Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.
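The signature above says to_mongo returns either a single pymongo InsertManyResult or a list of them, so calling code may want to normalise the two cases. The sketch below assumes, without confirmation from this text, that the list form is what comes back when the write is split into batches. count_inserted is a hypothetical helper, and SimpleNamespace objects stand in for InsertManyResult, of which only the inserted_ids attribute is used.

```python
from types import SimpleNamespace
from typing import Any, List, Union


def count_inserted(result: Union[Any, List[Any]]) -> int:
    """Return the number of inserted documents for either return shape."""
    # Wrap a single result in a list so both shapes share one code path.
    results = result if isinstance(result, list) else [result]
    return sum(len(r.inserted_ids) for r in results)


# Stand-ins for pymongo.results.InsertManyResult (illustration only).
single = SimpleNamespace(inserted_ids=["id1", "id2"])
batched = [
    SimpleNamespace(inserted_ids=["id1"]),
    SimpleNamespace(inserted_ids=["id2", "id3"]),
]

print(count_inserted(single), count_inserted(batched))
```

In real use the argument would be the value returned by a call such as pdmongo.to_mongo(df, "MyCollection", uri, if_exists='append', chunksize=1000).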


CHAPTER 8

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

8.1 Bug reports

When reporting a bug please include:

• Your operating system name and version.

• Any details about your local setup that might be helpful in troubleshooting.

• Detailed steps to reproduce the bug.

8.2 Documentation improvements

Pandas Mongo could always use more documentation, whether as part of the official Pandas Mongo docs, in docstrings, or even on the web in blog posts, articles, and such.

8.3 Feature requests and feedback

The best way to send feedback is to file an issue at https://github.com/pakallis/python-pandas-mongo/issues.

If you are proposing a feature:

• Explain in detail how it would work.

• Keep the scope as narrow as possible, to make it easier to implement.

• Remember that this is a volunteer-driven project, and that code contributions are welcome :)


8.4 Development

To set up python-pandas-mongo for local development:

1. Fork python-pandas-mongo (look for the “Fork” button).

2. Clone your fork locally:

git clone git@github.com:pakallis/python-pandas-mongo.git

3. Create a branch for local development:

git checkout -b name-of-your-bugfix-or-feature

Now you can make your changes locally.

4. When you're done making changes, run all the checks and the docs builder with one tox command:

tox

5. Commit your changes and push your branch to GitHub:

git add .
git commit -m "Your detailed description of your changes."
git push origin name-of-your-bugfix-or-feature

6. Submit a pull request through the GitHub website.

8.4.1 Pull Request Guidelines

If you need some code review or feedback while you're developing the code, just make the pull request.

For merging, you should:

1. Include passing tests (run tox) [1].

2. Update documentation when there’s new API, functionality etc.

3. Add a note to CHANGELOG.rst about the changes.

4. Add yourself to AUTHORS.rst.

8.4.2 Tips

To run a subset of tests:

tox -e envname -- pytest -k test_myfeature

To run all the test environments in parallel (you need to pip install detox):

detox

[1] If you don't have all the necessary Python versions available locally you can rely on Travis - it will run the tests for each change you add in the pull request. It will be slower though...


CHAPTER 9

Authors

• Pavlos Kallis - https://pakallis.github.com


CHAPTER 10

Changelog

10.1 0.1.0 (2020-05-05)

• Added static typing

• Added mypy to Travis CI

• Removed unnecessary params

10.2 0.0.2 (2020-05-04)

• Dropped support for pypy3

10.3 0.0.1 (2020-04-30)

• Added read_mongo and basic support for reading MongoDB collections into pandas dataframes

• Added to_mongo and basic support for writing pandas dataframes to MongoDB collections

10.4 0.0.0 (2020-03-22)

• First release on PyPI.


CHAPTER 11

Indices and tables

• genindex

• modindex

• search


Python Module Index

pdmongo


Index

pdmongo (module)

read_mongo() (in module pdmongo)

to_mongo() (in module pdmongo)
