How to Use pandas in Python

pandas is a powerful Python data analysis toolkit. This article covers how to install pandas for Python on Ubuntu and how pandas works, walking through the pandas 0.20.x release notes and their examples.
Prof. Steve Barros, United Kingdom, Teacher
Published: 28-07-2017
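Before walking through the release notes, here is a minimal sketch of installing this release on Ubuntu. This assumes Python 3, apt, and network access are available; the version pin matches the release discussed below and can be dropped to get the latest pandas.

```shell
# Install pip for Python 3 (Ubuntu)
sudo apt-get update
sudo apt-get install -y python3-pip

# Install the pandas release discussed below; numpy is pulled in automatically
pip3 install --user pandas==0.20.3

# Verify the installation
python3 -c "import pandas; print(pandas.__version__)"
```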
pandas: powerful Python data analysis toolkit
Release 0.20.3
Wes McKinney & PyData Development Team
Jul 07, 2017

CHAPTER ONE: WHAT’S NEW

These are new features and improvements of note in each release.

1.1 v0.20.3 (July 7, 2017)

This is a minor bug-fix release in the 0.20.x series and includes some small regression fixes and bug fixes. We recommend that all users upgrade to this version.

What’s new in v0.20.3:
• Bug Fixes – Conversion – Indexing – I/O – Plotting – Reshaping – Categorical

1.1.1 Bug Fixes

• Fixed a bug in failing to compute rolling computations of a column-MultiIndexed DataFrame (GH16789, GH16825)
• Fixed a pytest marker failing downstream packages’ test suites (GH16680)

1.1.1.1 Conversion

• Bug in pickle compat prior to the v0.20.x series, when UTC is a timezone in a Series/DataFrame/Index (GH16608)
• Bug in Series construction when passing a Series with dtype='category' (GH16524)
• Bug in DataFrame.astype() when passing a Series as the dtype kwarg (GH16717)

1.1.1.2 Indexing

• Bug in Float64Index causing an empty array instead of None to be returned from .get(np.nan) on a Series whose index did not contain any NaNs (GH8569)
• Bug in MultiIndex.isin causing an error when passing an empty iterable (GH16777)
• Fixed a bug in slicing a DataFrame/Series that has a TimedeltaIndex (GH16637)

1.1.1.3 I/O

• Bug in read_csv() in which files weren’t opened as binary files by the C engine on Windows, causing EOF characters mid-field, which would fail (GH16039, GH16559, GH16675)
• Bug in read_hdf() in which reading a Series saved to an HDF file in ‘fixed’ format fails when an explicit mode='r' argument is supplied (GH16583)
• Bug in DataFrame.to_latex() where bold_rows was wrongly specified to be True by default, whereas in reality row labels remained non-bold whatever parameter was provided
(GH16707)
• Fixed an issue with DataFrame.style() where generated element ids were not unique (GH16780)
• Fixed loading a DataFrame with a PeriodIndex, from a format='fixed' HDFStore, in Python 3, that was written in Python 2 (GH16781)

1.1.1.4 Plotting

• Fixed regression that prevented RGB and RGBA tuples from being used as color arguments (GH16233)
• Fixed an issue with DataFrame.plot.scatter() that incorrectly raised a KeyError when categorical data is used for plotting (GH16199)

1.1.1.5 Reshaping

• PeriodIndex / TimedeltaIndex.join was missing the sort= kwarg (GH16541)
• Bug in joining on a MultiIndex with a category dtype for a level (GH16627)
• Bug in merge() when merging/joining with multiple categorical columns (GH16767)

1.1.1.6 Categorical

• Bug in DataFrame.sort_values not respecting the kind parameter with categorical data (GH16793)

1.2 v0.20.2 (June 4, 2017)

This is a minor bug-fix release in the 0.20.x series and includes some small regression fixes, bug fixes and performance improvements. We recommend that all users upgrade to this version.

What’s new in v0.20.2:
• Enhancements
• Performance Improvements
• Bug Fixes – Conversion – Indexing – I/O – Plotting – Groupby/Resample/Rolling – Sparse – Reshaping – Numeric – Categorical – Other

1.2.1 Enhancements

• Unblocked access to additional compression types supported in pytables: ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’ (GH14478)
• Series provides a to_latex method (GH16180)
• A new groupby method ngroup(), parallel to the existing cumcount(), has been added to return the group order (GH11642); see here.
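The ngroup() enhancement above is easiest to see next to cumcount() on a small frame. A minimal sketch (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'key': ['b', 'a', 'b', 'a', 'c']})
g = df.groupby('key')

# ngroup() numbers each *group* (0 .. n_groups-1, in sorted key order with
# the default sort=True), while cumcount() numbers each row *within* its group.
df['group_id'] = g.ngroup()
df['row_in_group'] = g.cumcount()
print(df)
```

Here keys sort as a, b, c, so rows with 'b' get group_id 1, 'a' gets 0, and 'c' gets 2, while row_in_group counts 0, 1, ... inside each group.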
1.2.2 Performance Improvements

• Performance regression fix when indexing with a list-like (GH16285)
• Performance regression fix for MultiIndexes (GH16319, GH16346)
• Improved performance of .clip() with scalar arguments (GH15400)
• Improved performance of groupby with categorical groupers (GH16413)
• Improved performance of MultiIndex.remove_unused_levels() (GH16556)

1.2.3 Bug Fixes

• Silenced a warning on some Windows environments about “tput: terminal attributes: No such device or address” when detecting the terminal size. This fix only applies to Python 3 (GH16496)
• Bug in using pathlib.Path or py.path.local objects with io functions (GH16291)
• Bug in Index.symmetric_difference() on two equal MultiIndexes resulting in a TypeError (GH13490)
• Bug in DataFrame.update() with overwrite=False and NaN values (GH15593)
• Passing an invalid engine to read_csv() now raises an informative ValueError rather than UnboundLocalError (GH16511)
• Bug in unique() on an array of tuples (GH16519)
• Bug in cut() when labels are set, resulting in incorrect label ordering (GH16459)
• Fixed a compatibility issue with IPython 6.0’s tab completion showing deprecation warnings on Categoricals (GH16409)

1.2.3.1 Conversion

• Bug in to_numeric() in which empty data inputs were causing a segfault of the interpreter (GH16302)
• Silence numpy warnings when broadcasting DataFrame to Series with comparison ops (GH16378, GH16306)

1.2.3.2 Indexing

• Bug in DataFrame.reset_index(level=) with single level index (GH16263)
• Bug in partial string indexing with a monotonic, but not strictly-monotonic, index incorrectly reversing the slice bounds (GH16515)
• Bug in MultiIndex.remove_unused_levels() that would not return a MultiIndex equal to the original
(GH16556)

1.2.3.3 I/O

• Bug in read_csv() when comment is passed in a space delimited text file (GH16472)
• Bug in read_csv() not raising an exception with nonexistent columns in usecols when it had the correct length (GH14671)
• Bug that would force importing of the clipboard routines unnecessarily, potentially causing an import error on startup (GH16288)
• Bug that raised IndexError when HTML-rendering an empty DataFrame (GH15953)
• Bug in read_csv() in which tarfile object inputs were raising an error in Python 2.x for the C engine (GH16530)
• Bug where DataFrame.to_html() ignored the index_names parameter (GH16493)
• Bug where pd.read_hdf() returns numpy strings for index names (GH13492)
• Bug in HDFStore.select_as_multiple() where start/stop arguments were not respected (GH16209)

1.2.3.4 Plotting

• Bug in DataFrame.plot with a single column and a list-like color (GH3486)
• Bug in plot where NaT in DatetimeIndex results in Timestamp.min (GH12405)
• Bug in DataFrame.boxplot where figsize keyword was not respected for non-grouped boxplots (GH11959)

1.2.3.5 Groupby/Resample/Rolling

• Bug in creating a time-based rolling window on an empty DataFrame (GH15819)
• Bug in rolling.cov() with offset window (GH16058)
• Bug in .resample() and .groupby() when aggregating on integers (GH16361)

1.2.3.6 Sparse

• Bug in construction of SparseDataFrame from scipy.sparse.dok_matrix (GH16179)

1.2.3.7 Reshaping

• Bug in DataFrame.stack with unsorted levels in MultiIndex columns (GH16323)
• Bug in pd.wide_to_long() where no error was raised when i was not a unique identifier (GH16382)
• Bug in Series.isin(..) with a list of tuples (GH16394)
• Bug in construction of a DataFrame with mixed dtypes including an all-NaT column
(GH16395)
• Bug in DataFrame.agg() and Series.agg() with aggregating on non-callable attributes (GH16405)

1.2.3.8 Numeric

• Bug in .interpolate(), where limit_direction was not respected when limit=None (the default) was passed (GH16282)

1.2.3.9 Categorical

• Fixed comparison operations considering the order of the categories when both categoricals are unordered (GH16014)

1.2.3.10 Other

• Bug in DataFrame.drop() with an empty-list with non-unique indices (GH16270)

1.3 v0.20.1 (May 5, 2017)

This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:
• New .agg() API for Series/DataFrame similar to the groupby-rolling-resample APIs, see here
• Integration with the feather-format, including a new top-level pd.read_feather() and DataFrame.to_feather() method, see here
• The .ix indexer has been deprecated, see here
• Panel has been deprecated, see here
• Addition of an IntervalIndex and Interval scalar type, see here
• Improved user API when grouping by index levels in .groupby(), see here
• Improved support for UInt64 dtypes, see here
• A new orient for JSON serialization, orient='table', that uses the Table Schema spec and that gives the possibility for a more interactive repr in the Jupyter Notebook, see here
• Experimental support for exporting styled DataFrames (DataFrame.style) to Excel, see here
• Window binary corr/cov operations now return a MultiIndexed DataFrame rather than a Panel, as Panel is now deprecated, see here
• Support for S3 handling now uses s3fs, see here
• Google BigQuery support now uses the pandas-gbq library, see here

Warning: pandas has changed the internal structure and layout of the codebase. This can affect imports that are not from the top-level pandas namespace; please see the changes here. Check the API changes and deprecations before updating.

Note: This is a combined release for 0.20.0 and 0.20.1. Version 0.20.1 contains one additional change for backwards-compatibility with downstream projects using pandas’ utils routines (GH16250).

What’s new in v0.20.0:
• New features
 – agg API for DataFrame/Series
 – dtype keyword for data IO
 – .to_datetime() has gained an origin parameter
 – Groupby Enhancements
 – Better support for compressed URLs in read_csv
 – Pickle file I/O now supports compression
 – UInt64 Support Improved
 – GroupBy on Categoricals
 – Table Schema Output
 – SciPy sparse matrix from/to SparseDataFrame
 – Excel output for styled DataFrames
 – IntervalIndex
 – Other Enhancements
• Backwards incompatible API changes
 – Possible incompatibility for HDF5 formats created with pandas 0.13.0
 – Map on Index types now return other Index types
 – Accessing datetime fields of Index now return Index
 – pd.unique will now be consistent with extension types
 – S3 File Handling
 – Partial String Indexing Changes
 – Concat of different float dtypes will not automatically upcast
 – Pandas Google BigQuery support has moved
 – Memory Usage for Index is more Accurate
 – DataFrame.sort_index changes
 – Groupby Describe Formatting
 – Window Binary Corr/Cov operations return a MultiIndex DataFrame
 – HDFStore where string comparison
 – Index.intersection and inner join now preserve the order of the left Index
 – Pivot Table always returns a DataFrame
 – Other API Changes
• Reorganization of the library: Privacy Changes
 – Modules Privacy Has Changed
 – pandas.errors
 – pandas.testing
 – pandas.plotting
 – Other Development Changes
• Deprecations
 – Deprecate .ix
 – Deprecate Panel
 – Deprecate groupby.agg() with a dictionary when renaming
 – Deprecate .plotting
 – Other Deprecations
• Removal of prior version deprecations/changes
• Performance Improvements
• Bug Fixes
 – Conversion
 – Indexing
 – I/O
 – Plotting
 – Groupby/Resample/Rolling
 – Sparse
 – Reshaping
 – Numeric
 – Other

1.3.1 New features

1.3.1.1 agg API for DataFrame/Series

Series and DataFrame have been enhanced to support the aggregation API. This is a familiar API from groupby, window operations, and resampling. This allows aggregation operations in a concise way by using agg() and transform(). The full documentation is here (GH1623).

Here is a sample:

In [1]: df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
   ...:                   index=pd.date_range('1/1/2000', periods=10))

In [2]: df.iloc[3:7] = np.nan

In [3]: df
Out[3]:
                   A         B         C
2000-01-01  1.474071 -0.064034 -1.282782
2000-01-02  0.781836 -1.071357  0.441153
2000-01-03  2.353925  0.583787  0.221471
2000-01-04       NaN       NaN       NaN
2000-01-05       NaN       NaN       NaN
2000-01-06       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN
2000-01-08  0.901805  1.171216  0.520260
2000-01-09 -1.197071 -1.066969 -0.303421
2000-01-10 -0.858447  0.306996 -0.028665

One can operate using string function names, callables, lists, or dictionaries of these. Using a single function is equivalent to .apply.

In [4]: df.agg('sum')
Out[4]:
A    3.456119
B   -0.140361
C   -0.431984
dtype: float64

Multiple aggregations with a list of functions:

In [5]: df.agg(['sum', 'min'])
Out[5]:
            A         B         C
sum  3.456119 -0.140361 -0.431984
min -1.197071 -1.071357 -1.282782

Using a dict provides the ability to apply specific aggregations per column. You will get a matrix-like output of all of the aggregators. The output has one column per unique function. Those functions not applied to a particular column will be NaN:

In [6]: df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']})
Out[6]:
            A         B
max       NaN  1.171216
min -1.197071 -1.071357
sum  3.456119       NaN

The API also supports a .transform() function for broadcasting results.
In [7]: df.transform(['abs', lambda x: x - x.min()])
Out[7]:
                   A                   B                   C
                 abs  <lambda>       abs  <lambda>       abs  <lambda>
2000-01-01  1.474071  2.671143  0.064034  1.007322  1.282782  0.000000
2000-01-02  0.781836  1.978907  1.071357  0.000000  0.441153  1.723935
2000-01-03  2.353925  3.550996  0.583787  1.655143  0.221471  1.504252
2000-01-04       NaN       NaN       NaN       NaN       NaN       NaN
2000-01-05       NaN       NaN       NaN       NaN       NaN       NaN
2000-01-06       NaN       NaN       NaN       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN       NaN       NaN       NaN
2000-01-08  0.901805  2.098877  1.171216  2.242573  0.520260  1.803042
2000-01-09  1.197071  0.000000  1.066969  0.004388  0.303421  0.979361
2000-01-10  0.858447  0.338624  0.306996  1.378353  0.028665  1.254117

When presented with mixed dtypes that cannot be aggregated, .agg() will only take the valid aggregations. This is similar to how groupby.agg() works. (GH15015)

In [8]: df = pd.DataFrame({'A': [1, 2, 3],
   ...:                    'B': [1., 2., 3.],
   ...:                    'C': ['foo', 'bar', 'baz'],
   ...:                    'D': pd.date_range('20130101', periods=3)})

In [9]: df.dtypes
Out[9]:
A             int64
B           float64
C            object
D    datetime64[ns]
dtype: object

In [10]: df.agg(['min', 'sum'])
Out[10]:
     A    B          C          D
min  1  1.0        bar 2013-01-01
sum  6  6.0  foobarbaz        NaT

1.3.1.2 dtype keyword for data IO

The 'python' engine for read_csv(), as well as the read_fwf() function for parsing fixed-width text files and read_excel() for parsing Excel files, now accept the dtype keyword argument for specifying the types of specific columns (GH14295). See the io docs for more information.
In [11]: data = "a  b\n1  2\n3  4"

In [12]: pd.read_fwf(StringIO(data)).dtypes
Out[12]:
a    int64
b    int64
dtype: object

In [13]: pd.read_fwf(StringIO(data), dtype={'a': 'float64', 'b': 'object'}).dtypes
Out[13]:
a    float64
b     object
dtype: object

1.3.1.3 .to_datetime() has gained an origin parameter

to_datetime() has gained a new parameter, origin, to define a reference date from where to compute the resulting timestamps when parsing numerical values with a specific unit specified. (GH11276, GH11745)

For example, with 1960-01-01 as the starting date:

In [14]: pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))
Out[14]: DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)

The default is set at origin='unix', which defaults to 1970-01-01 00:00:00, commonly called the ‘unix epoch’ or POSIX time. This was the previous default, so this is a backward compatible change.

In [15]: pd.to_datetime([1, 2, 3], unit='D')
Out[15]: DatetimeIndex(['1970-01-02', '1970-01-03', '1970-01-04'], dtype='datetime64[ns]', freq=None)

1.3.1.4 Groupby Enhancements

Strings passed to DataFrame.groupby() as the by parameter may now reference either column names or index level names. Previously, only column names could be referenced. This allows grouping by a column and an index level at the same time. (GH5677)

In [16]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
   ....:           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

In [17]: index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

In [18]: df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
   ....:                    'B': np.arange(8)},
   ....:                   index=index)

In [19]: df
Out[19]:
              A  B
first second
bar   one     1  0
      two     1  1
baz   one     1  2
      two     1  3
foo   one     2  4
      two     2  5
qux   one     3  6
      two     3  7

In [20]: df.groupby(['second', 'A']).sum()
Out[20]:
          B
second A
one    1  2
       2  4
       3  6
two    1  4
       2  5
       3  7

1.3.1.5 Better support for compressed URLs in read_csv

The compression code was refactored (GH12688). As a result, reading dataframes from URLs in read_csv() or read_table() now supports additional compression methods: xz, bz2, and zip (GH14570). Previously, only gzip compression was supported. By default, compression of URLs and paths is now inferred using their file extensions. Additionally, support for bz2 compression in the Python 2 C-engine improved (GH14874).

In [21]: url = 'https://github.com/{repo}/raw/{branch}/{path}'.format(
   ....:     repo='pandas-dev/pandas',
   ....:     branch='master',
   ....:     path='pandas/tests/io/parser/data/salaries.csv.bz2',
   ....: )

In [22]: df = pd.read_table(url, compression='infer')  # default, infer compression

In [23]: df = pd.read_table(url, compression='bz2')    # explicitly specify compression

In [24]: df.head(2)
Out[24]:
       S  X  E  M
0  13876  1  1  1
1  11608  1  3  0

1.3.1.6 Pickle file I/O now supports compression

read_pickle(), DataFrame.to_pickle() and Series.to_pickle() can now read from and write to compressed pickle files. Compression methods can be an explicit parameter or be inferred from the file extension. See the docs here.

In [25]: df = pd.DataFrame({
   ....:     'A': np.random.randn(1000),
   ....:     'B': 'foo',
   ....:     'C': pd.date_range('20130101', periods=1000, freq='s')})

Using an explicit compression type:

In [26]: df.to_pickle("data.pkl.compress", compression="gzip")

In [27]: rt = pd.read_pickle("data.pkl.compress", compression="gzip")

In [28]: rt.head()
Out[28]:
          A    B                   C
0  0.384316  foo 2013-01-01 00:00:00
1  1.574159  foo 2013-01-01 00:00:01
2  1.588931  foo 2013-01-01 00:00:02
3  0.476720  foo 2013-01-01 00:00:03
4  0.473424  foo 2013-01-01 00:00:04

The default is to infer the compression type from the extension (compression='infer'):

In [29]: df.to_pickle("data.pkl.gz")

In [30]: rt = pd.read_pickle("data.pkl.gz")

In [31]: rt.head()
Out[31]:
          A    B                   C
0  0.384316  foo 2013-01-01 00:00:00
1  1.574159  foo 2013-01-01 00:00:01
2  1.588931  foo 2013-01-01 00:00:02
3  0.476720  foo 2013-01-01 00:00:03
4  0.473424  foo 2013-01-01 00:00:04

In [32]: df["A"].to_pickle("s1.pkl.bz2")

In [33]: rt = pd.read_pickle("s1.pkl.bz2")

In [34]: rt.head()
Out[34]:
0    0.384316
1    1.574159
2    1.588931
3    0.476720
4    0.473424
Name: A, dtype: float64

1.3.1.7 UInt64 Support Improved

pandas has significantly improved support for operations involving unsigned, or purely non-negative, integers. Previously, handling these integers would result in improper rounding or data-type casting, leading to incorrect results. Notably, a new numerical index, UInt64Index, has been created (GH14937):

In [35]: idx = pd.UInt64Index([1, 2, 3])

In [36]: df = pd.DataFrame({'A': ['a', 'b', 'c']}, index=idx)

In [37]: df.index
Out[37]: UInt64Index([1, 2, 3], dtype='uint64')

• Bug in converting object elements of array-like objects to unsigned 64-bit integers (GH4471, GH14982)
• Bug in Series.unique() in which unsigned 64-bit integers were causing overflow (GH14721)
• Bug in DataFrame construction in which unsigned 64-bit integer elements were being converted to objects (GH14881)
• Bug in pd.read_csv() in which unsigned 64-bit integer elements were being improperly converted to the wrong data types (GH14983)
• Bug in pd.unique() in which unsigned 64-bit integers were causing overflow (GH14915)
• Bug in pd.value_counts() in which unsigned 64-bit integers were being erroneously truncated in the output (GH14934)

1.3.1.8 GroupBy on Categoricals

In previous versions, .groupby(..., sort=False) would fail with a ValueError when grouping on a categorical series with some categories not appearing in the data. (GH13179)

In [38]: chromosomes = np.r_[np.arange(1, 23).astype(str), ['X', 'Y']]

In [39]: df = pd.DataFrame({
   ....:     'A': np.random.randint(100),
   ....:     'B': np.random.randint(100),
   ....:     'C': np.random.randint(100),
   ....:     'chromosomes': pd.Categorical(np.random.choice(chromosomes, 100),
   ....:                                   categories=chromosomes,
   ....:                                   ordered=True)})

In [40]: df
Out[40]:
     A   B   C chromosomes
0   21  62  10          17
1   21  62  10           Y
2   21  62  10          13
3   21  62  10           8
4   21  62  10          22
5   21  62  10           3
6   21  62  10          19
..  ..  ..  ..         ...
93  21  62  10          17
94  21  62  10           Y
95  21  62  10           Y
96  21  62  10          22
97  21  62  10           5
98  21  62  10          20
99  21  62  10           X

[100 rows x 4 columns]

Previous behavior:

In [3]: df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()
---------------------------------------------------------------------------
ValueError: items in new_categories are not the same as in old categories

New behavior:

In [41]: df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()
Out[41]:
                 A      B     C
chromosomes
2             42.0  124.0  20.0
3            105.0  310.0  50.0
4             63.0  186.0  30.0
5             84.0  248.0  40.0
6             84.0  248.0  40.0
7             63.0  186.0  30.0
8            189.0  558.0  90.0
...            ...    ...   ...
20           126.0  372.0  60.0
21            42.0  124.0  20.0
22            84.0  248.0  40.0
X             63.0  186.0  30.0
Y            126.0  372.0  60.0
1              NaN    NaN   NaN
12             NaN    NaN   NaN

[24 rows x 3 columns]

1.3.1.9 Table Schema Output

The new orient 'table' for DataFrame.to_json() will generate a Table Schema compatible string representation of the data.

In [42]: df = pd.DataFrame(
   ....:     {'A': [1, 2, 3],
   ....:      'B': ['a', 'b', 'c'],
   ....:      'C': pd.date_range('2016-01-01', freq='d', periods=3),
   ....:     }, index=pd.Index(range(3), name='idx'))

In [43]: df
Out[43]:
     A  B          C
idx
0    1  a 2016-01-01
1    2  b 2016-01-02
2    3  c 2016-01-03

In [44]: df.to_json(orient='table')
Out[44]: '{"schema": {"fields":[{"name":"idx","type":"integer"},{"name":"A","type":"integer"},{"name":"B","type":"string"},{"name":"C","type":"datetime"}],"primaryKey":["idx"],"pandas_version":"0.20.0"}, "data": [{"idx":0,"A":1,"B":"a","C":"2016-01-01T00:00:00.000Z"},{"idx":1,"A":2,"B":"b","C":"2016-01-02T00:00:00.000Z"},{"idx":2,"A":3,"B":"c","C":"2016-01-03T00:00:00.000Z"}]}'

See IO: Table Schema for more information.

Additionally, the repr for DataFrame and Series can now publish this JSON Table schema representation of the Series or DataFrame if you are using IPython (or another frontend like nteract using the Jupyter messaging protocol). This gives frontends like the Jupyter notebook and nteract more flexibility in how they display pandas objects, since they have more information about the data. You must enable this by setting the display.html.table_schema option to True.

1.3.1.10 SciPy sparse matrix from/to SparseDataFrame

pandas now supports creating sparse dataframes directly from scipy.sparse.spmatrix instances. See the documentation for more information. (GH4343) All sparse formats are supported, but matrices that are not in COOrdinate format will be converted, copying data as needed.
In [45]: from scipy.sparse import csr_matrix

In [46]: arr = np.random.random(size=(1000, 5))

In [47]: arr[arr < .9] = 0

In [48]: sp_arr = csr_matrix(arr)

In [49]: sp_arr
Out[49]:
<1000x5 sparse matrix of type '<class 'numpy.float64'>'
    with 500 stored elements in Compressed Sparse Row format>

In [50]: sdf = pd.SparseDataFrame(sp_arr)

In [51]: sdf
Out[51]:
            0   1   2   3         4
0         NaN NaN NaN NaN       NaN
1         NaN NaN NaN NaN       NaN
2         NaN NaN NaN NaN       NaN
3         NaN NaN NaN NaN  0.997522
4         NaN NaN NaN NaN       NaN
5         NaN NaN NaN NaN  0.911034
6         NaN NaN NaN NaN       NaN
..        ...  ..  ..  ..      ...
993  0.925879 NaN NaN NaN       NaN
994       NaN NaN NaN NaN  0.955585
995       NaN NaN NaN NaN       NaN
996       NaN NaN NaN NaN       NaN
997       NaN NaN NaN NaN       NaN
998       NaN NaN NaN NaN  0.904855
999       NaN NaN NaN NaN       NaN

[1000 rows x 5 columns]

To convert a SparseDataFrame back to a sparse SciPy matrix in COO format, you can use:

In [52]: sdf.to_coo()
Out[52]:
<1000x5 sparse matrix of type '<class 'numpy.float64'>'
    with 500 stored elements in COOrdinate format>

1.3.1.11 Excel output for styled DataFrames

Experimental support has been added to export DataFrame.style formats to Excel using the openpyxl engine. (GH15530) For example, after running the following, styled.xlsx renders as below:

In [53]: np.random.seed(24)

In [54]: df = pd.DataFrame({'A': np.linspace(1, 10, 10)})

In [55]: df = pd.concat([df, pd.DataFrame(np.random.RandomState(24).randn(10, 4),
   ....:                                  columns=list('BCDE'))],
   ....:                axis=1)

In [56]: df.iloc[0, 2] = np.nan

In [57]: df
Out[57]:
      A         B         C         D         E
0   1.0  1.329212       NaN -0.316280 -0.990810
1   2.0 -1.070816 -1.438713  0.564417  0.295722
2   3.0 -1.626404  0.219565  0.678805  1.889273
3   4.0  0.961538  0.104011 -0.481165  0.850229
4   5.0  1.453425  1.057737  0.165562  0.515018
5   6.0 -1.336936  0.562861  1.392855 -0.063328
6   7.0  0.121668  1.207603 -0.002040  1.627796
7   8.0  0.354493  1.037528 -0.385684  0.519818
8   9.0  1.686583 -1.325963  1.428984 -2.089354
9  10.0 -0.129820  0.631523 -0.586538  0.290720

In [58]: styled = df.style.\
   ....:     applymap(lambda val: 'color: %s' % ('red' if val < 0 else 'black')).\
   ....:     highlight_max()

In [59]: styled.to_excel('styled.xlsx', engine='openpyxl')

See the Style documentation for more detail.

1.3.1.12 IntervalIndex

pandas has gained an IntervalIndex with its own dtype, interval, as well as the Interval scalar type. These allow first-class support for interval notation, specifically as a return type for the categories in cut() and qcut(). The IntervalIndex allows some unique indexing, see the docs. (GH7640, GH8625)

Warning: These indexing behaviors of the IntervalIndex are provisional and may change in a future version of pandas. Feedback on usage is welcome.
Previous behavior: the returned categories were strings representing intervals.

In [1]: c = pd.cut(range(4), bins=2)

In [2]: c
Out[2]:
[(-0.003, 1.5], (-0.003, 1.5], (1.5, 3], (1.5, 3]]
Categories (2, object): [(-0.003, 1.5] < (1.5, 3]]

In [3]: c.categories
Out[3]: Index(['(-0.003, 1.5]', '(1.5, 3]'], dtype='object')

New behavior:

In [60]: c = pd.cut(range(4), bins=2)

In [61]: c
Out[61]:
[(-0.003, 1.5], (-0.003, 1.5], (1.5, 3.0], (1.5, 3.0]]
Categories (2, interval[float64]): [(-0.003, 1.5] < (1.5, 3.0]]

In [62]: c.categories
Out[62]: IntervalIndex([(-0.003, 1.5], (1.5, 3.0]], closed='right', dtype='interval[float64]')

Furthermore, this allows one to bin other data with these same bins, with NaN representing a missing value similar to other dtypes.

In [63]: pd.cut([0, 3, 5, 1], bins=c.categories)
Out[63]:
[(-0.003, 1.5], (1.5, 3.0], NaN, (-0.003, 1.5]]
Categories (2, interval[float64]): [(-0.003, 1.5] < (1.5, 3.0]]

An IntervalIndex can also be used in Series and DataFrame as the index.

In [64]: df = pd.DataFrame({'A': range(4),
   ....:                    'B': pd.cut([0, 3, 1, 1], bins=c.categories)
   ....:                   }).set_index('B')

In [65]: df
Out[65]:
               A
B
(-0.003, 1.5]  0
(1.5, 3.0]     1
(-0.003, 1.5]  2
(-0.003, 1.5]  3

Selecting via a specific interval:

In [66]: df.loc[pd.Interval(1.5, 3.0)]
Out[66]:
A    1
Name: (1.5, 3.0], dtype: int64

Selecting via a scalar value that is contained in the intervals:

In [67]: df.loc[0]
Out[67]:
               A
B
(-0.003, 1.5]  0
(-0.003, 1.5]  2
(-0.003, 1.5]  3

1.3.1.13 Other Enhancements

• DataFrame.rolling() now accepts the parameter closed='right'|'left'|'both'|'neither' to choose the rolling window-endpoint closedness. See the documentation (GH13965)
• Integration with the feather-format, including a new top-level pd.read_feather() and DataFrame.to_feather() method, see here.
• Series.str.replace() now accepts a callable, as replacement, which is passed to re.sub (GH15055)
• Series.str.replace() now accepts a compiled regular expression as a pattern (GH15446)
• Series.sort_index accepts parameters kind and na_position (GH13589, GH14444)
• DataFrame and DataFrame.groupby() have gained a nunique() method to count the distinct values over an axis (GH14336, GH15197)
• DataFrame has gained a melt() method, equivalent to pd.melt(), for unpivoting from a wide to long format (GH12640)
• pd.read_excel() now preserves sheet order when using sheetname=None (GH9930)
• Multiple offset aliases with decimal points are now supported (e.g. 0.5min is parsed as 30s) (GH8419)
• .isnull() and .notnull() have been added to Index object to make them more consistent with the Series API (GH15300)
• New UnsortedIndexError (subclass of KeyError) raised when indexing/slicing into an unsorted MultiIndex (GH11897). This allows differentiation between errors due to lack of sorting or an incorrect key. See here
• MultiIndex has gained a .to_frame() method to convert to a DataFrame (GH12397)
• pd.cut and pd.qcut now support datetime64 and timedelta64 dtypes (GH14714, GH14798)
• pd.qcut has gained the duplicates='raise'|'drop' option to control whether to raise on duplicated edges (GH7751)
• Series provides a to_excel method to output Excel files (GH8825)
• The usecols argument in pd.read_csv() now accepts a callable function as a value (GH14154)
• The skiprows argument in pd.read_csv() now accepts a callable function as a value (GH10882)
• The nrows and chunksize arguments in pd.read_csv() are supported if both are passed (GH6774, GH15755)
• DataFrame.plot now prints a title above each subplot if subplots=True and title is a list of strings (GH14753)
• DataFrame.plot can pass the matplotlib 2.0 default color cycle as a single string as color parameter, see here.
(GH15516)
• Series.interpolate() now supports timedelta as an index type with method='time' (GH6424)
• Addition of a level keyword to DataFrame/Series.rename to rename labels in the specified level of a MultiIndex (GH4160)
• DataFrame.reset_index() will now interpret a tuple index.name as a key spanning across levels of columns, if this is a MultiIndex (GH16164)
• Timedelta.isoformat method added for formatting Timedeltas as an ISO 8601 duration. See the Timedelta docs (GH15136)
• .select_dtypes() now allows the string datetimetz to generically select datetimes with tz (GH14910)
• The .to_latex() method will now accept multicolumn and multirow arguments to use the accompanying LaTeX enhancements
• pd.merge_asof() gained the option direction='backward'|'forward'|'nearest' (GH14887)
• Series/DataFrame.asfreq() have gained a fill_value parameter, to fill missing values (GH3715)
• Series/DataFrame.resample.asfreq have gained a fill_value parameter, to fill missing values during resampling (GH3715)
• pandas.util.hash_pandas_object() has gained the ability to hash a MultiIndex (GH15224)
• Series/DataFrame.squeeze() have gained the axis parameter (GH15339)
• DataFrame.to_excel() has a new freeze_panes parameter to turn on Freeze Panes when exporting to Excel (GH15160)
• pd.read_html() will parse multiple header rows, creating a MultiIndex header (GH13434)
• HTML table output skips colspan or rowspan attribute if equal to 1.
(GH15403)
• pandas.io.formats.style.Styler template now has blocks for easier extension, see the example notebook (GH15649)
• Styler.render() now accepts **kwargs to allow user-defined variables in the template (GH15649)
• Compatibility with Jupyter notebook 5.0; MultiIndex column labels are left-aligned and MultiIndex row-labels are top-aligned (GH15379)
• TimedeltaIndex now has a custom date-tick formatter specifically designed for nanosecond level precision (GH8711)
• pd.api.types.union_categoricals gained the ignore_ordered argument to allow ignoring the ordered attribute of unioned categoricals (GH13410). See the categorical union docs for more information.
• DataFrame.to_latex() and DataFrame.to_string() now allow optional header aliases. (GH15536)
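Several of the enhancements above are easy to try in a short session. For example, the callable replacement for Series.str.replace() described earlier: the callable receives each regex match object and returns the replacement string, exactly as with re.sub. A minimal sketch (the data is illustrative; the regex=True keyword is required on newer pandas versions, where the default changed to literal matching):

```python
import pandas as pd

s = pd.Series(['foo 123', 'bar 45', 'baz'])

# Replace every run of digits with a '#' of the same length.
# The lambda receives a re.Match object; m.group(0) is the matched text.
result = s.str.replace(r'\d+', lambda m: '#' * len(m.group(0)), regex=True)
print(result.tolist())
```

Rows without a match are returned unchanged, so 'baz' passes through as-is.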
