3 data science books you can read for free

Data Visualization with JavaScript

If you are looking for a tutorial that teaches you how to make visualizations for the web, look no further. Data Visualization with JavaScript is a free online book for learning data visualization with JavaScript. It provides many examples and step-by-step instructions for creating graphs, charts, and other visualizations. Here is a quick list of the topics:

  • Graphs
  • D3.js
  • Interactive Charts
  • Geographic Plots
  • Timelines

Frontiers in Massive Datasets

Frontiers in Massive Datasets is a report about how science, business, communications, national security, and other fields need to learn to handle massive amounts of data. Whether the data has been sitting in a database for years or is only now streaming into the systems, massive data is a problem for almost every industry. The report covers many of the topics that need to be addressed when dealing with big data. Here is a very brief overview of the topics:

  • Limitations
  • Sampling
  • Building Models from Massive Data
  • Real-time Algorithms
  • 7 Computational Giants of Massive Data Analysis

Foundations of Data Science

Foundations of Data Science is a draft of a textbook written by John Hopcroft and Ravindran Kannan. It is intended as a computer science text that emphasizes probability and statistics rather than discrete mathematics. The authors argue that knowing how to work with data is a necessary skill for the computer scientists of the future. This is clearly the most technical and academic of the 3 books, but if that is your thing, you should really enjoy browsing through it. Here are some of the topics:

  • High-Dimensional Space
  • Clustering
  • Algorithms for Massive Data Problems
  • Singular Value Decomposition
  • Graphical Models

eScience courses

Quite a good resource for eScience courses in Sweden.

Below is a list of courses scheduled for HT 2014.

  1. Introduction to high performance computing/PDC summer school, August
  2. Matrix Computations in Statistics and with Applications, September, v39
  3. Introduction to programming in science and technology, September
  4. Computational Python, October, v42
  5. Numerical Linear Algebra, October, v42
  6. Scientific Visualisation, November 3-21
  7. Scientific Software Development Toolbox, December
  8. Topics in CFD, December 1-19
  9. Tentative Courses Autumn 2014

Details here: http://sese.nu/courses-autumn-2014/

Free courses: Johns Hopkins Data Science Specialization

The quantity and quality of data science education resources just took a step forward with the announcement of the new Johns Hopkins University Data Science Specialization series on the Coursera platform. The series consists of 9 free courses (if you want a formal certificate, there is a $49 fee for each course passed). The first course starts in April 2014. Each course runs one month.

The courses are taught by well-known MOOC instructors – biostatistics professors Roger Peng, Jeff Leek, and Brian Caffo, all of Johns Hopkins. Roger and Jeff also run the Simply Statistics blog. Here’s a complete list of all the courses:

The timing for the launch of this new program is excellent, as more and more people are looking to retool themselves in order to take part in the big data revolution. I personally beta tested the first 3 courses on the list and I think they’ll make an excellent addition to your resume. Highly recommended. You can get started HERE.

Online Resources for Windows HPC Server

The following resources for this release of Windows HPC Server 2008 are available online:

  • Windows HPC Server 2008 Technical Library (http://go.microsoft.com/fwlink/?LinkId=117920)
  • Windows HPC Server 2008 Command Reference (http://go.microsoft.com/fwlink/?LinkId=120724)
  • Windows HPC Server 2008 PowerShell Reference (http://go.microsoft.com/fwlink/?LinkId=120725)
  • HPC Server Basic Profile Web Service Operations Guide (http://go.microsoft.com/fwlink/?LinkId=122648)
  • Configuring Failover Clustering in Windows HPC Server 2008 Step-by-Step Guide (http://go.microsoft.com/fwlink/?LinkId=123894)
  • HPC Cluster Manager Help (http://go.microsoft.com/fwlink/?LinkId=124146)

OpenVPN for SMB Share on Windows HPC

Background (from a beginner's OpenVPN book):

Layer 2 and Layer 3 VPN: OpenVPN offers two basic modes, which run either as Layer 2 or Layer 3 VPN. Thus, OpenVPN tunnels on Layer 2 can also transport Ethernet frames, IPX packets, and Windows Network Browsing packets (NETBIOS), all of which are problems in most other VPN solutions.

Only one port in the firewall must be opened to allow incoming connections: Since OpenVPN 2.0, the special server mode allows multiple incoming connections on the same TCP or UDP port, while still using different configurations for every single connection.

Choosing the TUN/TAP devices as a networking model immediately offered a flexibility that other VPN solutions could not offer. While other SSL/TLS-based VPN solutions needed a browser to establish connections, OpenVPN would prepare almost real (but still virtual) network devices, on which almost all networking activities can be carried out.

A TUN device can be used like a virtual point-to-point interface, like a modem or DSL link. This is called routed mode because routes are set up to the VPN partner.

However, a TAP device can be used like a virtual Ethernet adapter. This enables the daemon listening on the interface to capture Ethernet frames, which is not possible with TUN devices. This mode is called bridging mode because the networks are connected as if over a hardware bridge. Applications can read/write to this interface. Software (the tunnel driver) will take all the data and use the cryptographic libraries of SSL/TLS to encrypt them. The data is packaged and sent to the other end of the tunnel. This packaging is done with standardized UDP or optional TCP packets. UDP should be the first choice, but TCP can be helpful in some cases. You are almost completely free to choose the configuration parameters such as protocol or port numbers, as long as both tunnel ends agree on the same figures.
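The point that both tunnel ends must agree on the same figures can be checked mechanically. Below is a minimal sketch in Python that compares the `proto`, `port`, `dev`, and `remote` directives of two OpenVPN config files. The helper names, the sample configs, and the host `vpn.example.com` are made up for illustration. The parser is deliberately simplified: it treats a missing directive as a mismatch instead of applying OpenVPN's defaults, and it ignores every other directive.

```python
def parse_openvpn_config(text: str) -> dict:
    """Extract proto, port and dev from OpenVPN-style config text (simplified)."""
    settings = {}
    for raw in text.splitlines():
        # '#' and ';' both start comments in OpenVPN config files.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        key, *args = line.split()
        if key in ("proto", "port", "dev") and args:
            settings[key] = args[0]
        elif key == "remote" and len(args) >= 2:
            # Client side: "remote <host> <port>" carries the port.
            settings["port"] = args[1]
    return settings


def normalize(key: str, value):
    """Make equivalent spellings comparable (assumption: prefix match is enough)."""
    if value is None:
        return None
    if key == "proto":
        return value.split("-")[0]   # tcp-server / tcp-client -> tcp
    if key == "dev":
        return value[:3]             # tun0 -> tun, tap1 -> tap
    return value


def tunnel_mismatches(server_text: str, client_text: str) -> list:
    """Return (directive, server_value, client_value) for every disagreement."""
    server = parse_openvpn_config(server_text)
    client = parse_openvpn_config(client_text)
    mismatches = []
    for key in ("proto", "port", "dev"):
        s = normalize(key, server.get(key))
        c = normalize(key, client.get(key))
        if s != c:
            mismatches.append((key, s, c))
    return mismatches


# Hypothetical configs for a routed (TUN) tunnel over UDP.
SERVER = """
proto udp
port 1194
dev tun
"""

CLIENT = """
remote vpn.example.com 1194   # made-up host
proto udp
dev tun
"""

if __name__ == "__main__":
    print(tunnel_mismatches(SERVER, CLIENT))
    print(tunnel_mismatches(SERVER, CLIENT.replace("dev tun", "dev tap")))
```

The second call flags the `dev` directive, since a TUN end and a TAP end operate at different layers and cannot form a working tunnel.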

From Wikipedia:

TAP (as in network tap) simulates an Ethernet device and it operates with layer 2 packets such as Ethernet frames. TUN (as in network TUNnel) simulates a network layer device and it operates with layer 3 packets such as IP packets. TAP is used to create a network bridge, while TUN is used with routing.
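To make the layer 2 versus layer 3 distinction concrete, here is a small Python sketch that parses the two kinds of data unit: an Ethernet frame, as a TAP device would carry, and a bare IPv4 packet, as a TUN device would carry. It does not open a real TUN/TAP device. The sample bytes are built by hand, and the addresses (`10.8.0.1`, `10.8.0.2`, and the MAC addresses) are made up for illustration.

```python
import socket
import struct


def parse_ethernet_frame(frame: bytes) -> dict:
    """Layer 2 (TAP): read the 14-byte Ethernet II header."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst_mac": dst.hex(":"),
        "src_mac": src.hex(":"),
        "ethertype": ethertype,      # 0x0800 means the payload is IPv4
        "payload": frame[14:],
    }


def parse_ipv4_packet(packet: bytes) -> dict:
    """Layer 3 (TUN): read the fixed 20-byte part of the IPv4 header."""
    (version_ihl, _tos, _total_len, _ident, _frag,
     _ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_len": (version_ihl & 0x0F) * 4,
        "protocol": protocol,        # 17 means UDP
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
    }


def build_sample_ipv4_packet() -> bytes:
    """Hand-built IPv4 header with no payload (checksum left at 0)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0, 64, 17, 0,
        socket.inet_aton("10.8.0.1"),
        socket.inet_aton("10.8.0.2"),
    )


def build_sample_frame(packet: bytes) -> bytes:
    """Wrap an IPv4 packet in an Ethernet II header (broadcast destination)."""
    dst = bytes.fromhex("ffffffffffff")
    src = bytes.fromhex("020000000001")
    return dst + src + struct.pack("!H", 0x0800) + packet


if __name__ == "__main__":
    packet = build_sample_ipv4_packet()      # what a TUN device would see
    frame = build_sample_frame(packet)       # what a TAP device would see
    eth = parse_ethernet_frame(frame)
    print(eth["src_mac"], hex(eth["ethertype"]))
    print(parse_ipv4_packet(eth["payload"]))
```

The frame is the packet plus 14 extra bytes of MAC addressing, which is why a TAP tunnel can carry non-IP traffic and broadcasts while a TUN tunnel carries only routed IP packets.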