<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>import Python, Finance, Scientific Computing - Latest Comments</title><link>http://wesmckinn-blog.disqus.com/</link><description></description><atom:link href="https://wesmckinn-blog.disqus.com/comments.rss" rel="self"></atom:link><language>en</language><lastBuildDate>Tue, 24 Dec 2013 16:19:49 -0000</lastBuildDate><item><title>Re: Strata NYC 2013 and PyData 2013 Talks</title><link>http://wesmckinney.com/blog/?p=697#comment-1176075475</link><description>&lt;p&gt;Hi Wes,&lt;/p&gt;&lt;p&gt;when exploring what could be used with Python for SEO work I stumbled upon pandas and your book as well.&lt;/p&gt;&lt;p&gt;Now, seeing the development of badger, I have a few questions:&lt;/p&gt;&lt;p&gt;a) what will be the future of pandas development?&lt;/p&gt;&lt;p&gt;b) is badger meant to replace pandas?&lt;/p&gt;&lt;p&gt;c) will badger be initially provided only as some kind of SaaS?&lt;/p&gt;&lt;p&gt;d) considering that you're an expert in dicing &amp;amp; slicing data, which tool do you consider more appropriate for SEO work? (although I prefer old-school desktop tools)&lt;/p&gt;&lt;p&gt;Sincerely,&lt;br&gt;Gour&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">gour_atmarama</dc:creator><pubDate>Tue, 24 Dec 2013 16:19:49 -0000</pubDate></item><item><title>Re: A Roadmap for Rich Scientific Data Structures in Python</title><link>http://wesmckinney.com/blog/?p=77#comment-1174880038</link><description>&lt;p&gt;Have you looked into RootPy? Not PyRoot but Rootpy. It does what you want and is integrated with scipy and numpy. No pandas, however. There are never enough pandas. 
&lt;a href="http://www.rootpy.org/" rel="nofollow noopener" target="_blank" title="http://www.rootpy.org/"&gt;http://www.rootpy.org/&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Merlyn</dc:creator><pubDate>Mon, 23 Dec 2013 13:34:07 -0000</pubDate></item><item><title>Re: Strata NYC 2013 and PyData 2013 Talks</title><link>http://wesmckinney.com/blog/?p=697#comment-1129303757</link><description>&lt;p&gt;You give DBAs too much credit. :) In the real world I see lots of single tables with VARCHAR columns. Good talk, nice to see DataPad coming along.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Gagi</dc:creator><pubDate>Tue, 19 Nov 2013 03:20:25 -0000</pubDate></item><item><title>Re: Adventures in Aggregating Data (Group By)</title><link>http://wesmckinney.com/blog/?p=8#comment-1106945171</link><description>&lt;p&gt;Could you ask this question on the pydata mailing list?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Wes McKinney</dc:creator><pubDate>Sun, 03 Nov 2013 01:17:39 -0000</pubDate></item><item><title>Re: Adventures in Aggregating Data (Group By)</title><link>http://wesmckinney.com/blog/?p=8#comment-1099153787</link><description>&lt;p&gt;Wes, I love pandas, and I've been working through your book and changing the way my organisation approaches data management and analysis. However, I can't for the life of me work out how to pass the values from more than one DataFrame column to a groupby.transform function. That is, I do not want to pass multiple columns one by one to the same function to transform; I want to transform one column while utilising values found in other columns. I am able to get the result by simply iterating through the groups and making lists of the data, but I want to do it with transform. 
Any tips?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Rory</dc:creator><pubDate>Mon, 28 Oct 2013 01:05:32 -0000</pubDate></item><item><title>Re: Whirlwind tour of pandas in 10 minutes</title><link>http://wesmckinney.com/blog/?p=647#comment-1063416795</link><description>&lt;p&gt;I second that, I'm trying to follow the video tutorial&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jeannie</dc:creator><pubDate>Sun, 29 Sep 2013 09:25:58 -0000</pubDate></item><item><title>Re: Whirlwind tour of pandas in 10 minutes</title><link>http://wesmckinney.com/blog/?p=647#comment-1042335927</link><description>&lt;p&gt;Where do you have your data from that you are using in your video?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Max Richter</dc:creator><pubDate>Fri, 13 Sep 2013 10:12:35 -0000</pubDate></item><item><title>Re: Whirlwind tour of pandas in 10 minutes</title><link>http://wesmckinney.com/blog/?p=647#comment-1042333482</link><description>&lt;p&gt;Where do you have your stock data from that you are using in your video?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Max Richter</dc:creator><pubDate>Fri, 13 Sep 2013 10:11:27 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-1032460189</link><description>&lt;p&gt;Maybe ask on StackOverflow?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Wes McKinney</dc:creator><pubDate>Fri, 06 Sep 2013 13:57:10 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-1032024726</link><description>&lt;p&gt;Hello,&lt;/p&gt;&lt;p&gt;Regarding your book, I have to congratulate you for bringing me into Python, as my first choice until now was R. 
I have a few difficulties trying to fetch JSON data from my online broker (can connect to the server, but when I stream data it comes as JSON). Is there any way to transform each incoming piece of information to pandas DataFrame?&lt;br&gt;Thanks.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Cristian</dc:creator><pubDate>Fri, 06 Sep 2013 06:07:24 -0000</pubDate></item><item><title>Re: Why I&amp;#8217;m not on the Julia bandwagon (yet)</title><link>http://wesmckinney.com/blog/?p=475#comment-1026724857</link><description>&lt;p&gt;As an update a year later Julia is still over-selling their performance on their homepage. A simple array addition in a loop is 70x slower than C++:&lt;/p&gt;&lt;p&gt;julia&amp;gt; function f()&lt;br&gt;  ans = [1.0 2.0]&lt;br&gt;  for i=1:1000^2&lt;br&gt;    ans += [1.0 2.0]&lt;br&gt;  end &lt;br&gt;  return ans&lt;br&gt;       end&lt;/p&gt;&lt;p&gt;Takes 0.23 secs on my system versus 0.0033 secs for C code. If you write scalar code that adds elements 1 and 2 it is only about 1.5x slower than C, but I don't want to write scalar code.&lt;/p&gt;&lt;p&gt;Also Dijkstra settled the 0 and 1 indexing debate in the 1960s:&lt;br&gt;&lt;a href="http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html" rel="nofollow noopener" target="_blank" title="http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html"&gt;http://www.cs.utexas.edu/us...&lt;/a&gt;&lt;/p&gt;&lt;p&gt;The language does look technically nice except the 1 indexing flaw. 
Maybe someone can fork and make a 0 indexed Julia.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Connelly Barnes</dc:creator><pubDate>Mon, 02 Sep 2013 15:09:26 -0000</pubDate></item><item><title>Re: Filtering out duplicate pandas.DataFrame rows</title><link>http://wesmckinney.com/blog/?p=340#comment-1024296682</link><description>&lt;p&gt;Thanks, by using "groupby" and "duplicated" I managed to get what I needed :)&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Nikita</dc:creator><pubDate>Sat, 31 Aug 2013 07:55:33 -0000</pubDate></item><item><title>Re: Filtering out duplicate pandas.DataFrame rows</title><link>http://wesmckinney.com/blog/?p=340#comment-1019777269</link><description>&lt;p&gt;Yes, use `duplicated`&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Wes McKinney</dc:creator><pubDate>Tue, 27 Aug 2013 18:22:12 -0000</pubDate></item><item><title>Re: Filtering out duplicate pandas.DataFrame rows</title><link>http://wesmckinney.com/blog/?p=340#comment-1018474122</link><description>&lt;p&gt;Great way to drop duplicates based on multiple columns. Thanks!&lt;br&gt;Is there a way to get the indices or values in a specific column of the data which is dropped? e.g. I want to know what value of C is in the row where A and B are found to have duplicates. &lt;br&gt;I really miss the which() function from R.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Nikita</dc:creator><pubDate>Mon, 26 Aug 2013 17:50:56 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-968248475</link><description>&lt;p&gt;Thanks a lot for the link. &lt;br&gt;Quora has a great collection of questions and answers. I was thinking of downloading some data to play around with. It does not have an API. 
What method do you suggest to get data from Quora?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Shishir Pandey</dc:creator><pubDate>Fri, 19 Jul 2013 02:18:31 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-967683525</link><description>&lt;p&gt;check out &lt;a href="http://data.stackexchange.com" rel="nofollow noopener" target="_blank" title="data.stackexchange.com"&gt;data.stackexchange.com&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Wes McKinney</dc:creator><pubDate>Thu, 18 Jul 2013 17:27:45 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-966632993</link><description>&lt;p&gt;I am interested in the script that you used to get data from stackexchange. Can you share that?&lt;br&gt;Thanks.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Shishir Pandey</dc:creator><pubDate>Thu, 18 Jul 2013 02:04:48 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-952233404</link><description>&lt;p&gt;Please transfer these comments to GitHub to start a discussion with the other developers about your needs. We're keen to improve the read_csv function in this regard.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Wes McKinney</dc:creator><pubDate>Thu, 04 Jul 2013 18:51:54 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-951490914</link><description>&lt;p&gt;Actually, I don't really need a read_csv option that will kick out any values that don't parse to numeric. I need one of these alternatives:&lt;/p&gt;&lt;p&gt;A) I do not want non-numeric values to be represented in different ways. 
Because when I try to clean up the data, I first have to handle "NaN", then I have to handle "None", then I need to handle "?", etc. I only want one representation of non-numerics, such as "NaN" only. Then I can handle all non-numerics once, with "NaN" statements.&lt;/p&gt;&lt;p&gt;B) If there are many ways to represent non-numerics, "NaN", "None", "Inf", "?", etc. - then I want an explicit list in the manual of all the non-numerics I must handle. This list should be easy to access, i.e. it should not be hidden somewhere. For instance in the read_csv manual: "The following non-numeric values can be found in a data frame after read_csv: NaN, None, Inf, ?, ...". Then I can explicitly handle each case.&lt;/p&gt;&lt;p&gt;What I don't want is the situation where I worry whether I have non-numerics left in my data. Have I handled all non-numerics? As of now, I only did a "pandas.drop", does that cover all "None", "NaN", etc.? I don't know. I hope I only have numerics left. But I don't know. I hope when we go live with my statarb algo, we don't lose a lot of money just because it is not clear how to wash the pandas data.&lt;/p&gt;&lt;p&gt;BTW. I prefer Pandas to R, and am trying to convert people from R to Pandas. Great job you are doing! :)&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mike</dc:creator><pubDate>Thu, 04 Jul 2013 06:45:33 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-948191883</link><description>&lt;p&gt;Great! This was a huge pain in R. Until then, do you have a link that shows how to catch all possible non-numeric values there are in Pandas? How can I catch all of those values? Must I look for "NaN", "none", ...? Do you have a complete list of non-numerics that I can look for? 
Or some code snippet?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mike</dc:creator><pubDate>Mon, 01 Jul 2013 14:57:49 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-944498300</link><description>&lt;p&gt;We've been planning to add a parsing option to read_csv that will "kick out" any values that don't parse to numeric. Haven't done so yet; your voice on the GitHub issue list with example data would be helpful.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Wes McKinney</dc:creator><pubDate>Thu, 27 Jun 2013 15:37:06 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-944105592</link><description>&lt;p&gt;I have recently switched to Pandas from R. One thing that annoyed me in R was that there were many types of NaN values. For instance, I removed all "NaN" with R.dropna or some similar command. Much later I discovered I had "N/A" values left in my dataset. So I had to remove those values as well. Maybe there were other non-values left, such as "?". How could I know? I would like a simple way to remove _all_ values that are not numbers, so I could catch all NaN, N/A, ?, etc. Could you please make sure that Pandas does not have this problem with pandas.dropna. It was a real pain to try to catch all non-values. So when I use pandas.dropna, I would like it to catch all values that are not floats. AFAIK, pandas.dropna only catches "NaN"?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mike</dc:creator><pubDate>Thu, 27 Jun 2013 10:15:55 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-932800114</link><description>&lt;p&gt;Hi Wes, I really appreciate the talk. The demo with the IPython shell is really useful and interesting! 
I'm already trying out pandas for data visualization using data from various internet sources. Thanks a great deal!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Boon Kwee</dc:creator><pubDate>Mon, 17 Jun 2013 00:36:01 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-932765091</link><description>&lt;p&gt;pandas could run on pypy but all of the Cython extensions would need to be ported to pure Python or C with cffi. I don't see any conflict between pandas and numba; I see good opportunities to use numba kernels to accelerate operations in pandas. I'm waiting for the pull request.&lt;/p&gt;&lt;p&gt;I don't yet see any strides being made to improve general data processing / data preparation (as it relates to business analytics, for example) in pypy or numba which is my main area of interest.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Wes McKinney</dc:creator><pubDate>Sun, 16 Jun 2013 23:10:15 -0000</pubDate></item><item><title>Re: PyCon Singapore 2013</title><link>http://wesmckinney.com/blog/?p=687#comment-932614063</link><description>&lt;p&gt;Great presentation; one thing I'm really curious about is how you see the future for Pandas (with its Cython setup) in a numba or pypy world? I have positive experiences with pypy, but for pandas (setting aside possible Cython-in-pypy issues) I would guess it means more that "the rest" of Python would get accelerated too? (and not pandas itself, as that has already been pushed/optimized as much as possible to the Cython/C level?). Would love to understand this better!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Carst Vaartjes</dc:creator><pubDate>Sun, 16 Jun 2013 18:12:22 -0000</pubDate></item></channel></rss>