Building a Safe and Stable Medical Cloud with VMware Virtualization Technology
Posted by Daniel J Su, [Dec-21st, 2016]
Four information technology trends are affecting every industry today: Cloud (cloud computing), Mobile (mobile devices), Social (social networking), and Big Data.
Take industry as an example. The US-listed Alibaba Group started by building an Internet shopping platform and has since derived a whole range of service sites from it; its payment platform has even branched into banking, attracting users to keep their deposits on the Internet, a development that has drawn the attention of even the State Council. It is an important case of the Internet shaking up the financial industry.
Industries on the mainland all face competition from BAT (Baidu, Alibaba, Tencent). Alibaba, for example, applies cloud computing technology on a large scale so that its compute resources can be extended almost without limit.
In the course of moving off its solutions from IBM, Oracle, and EMC, Alibaba reached the milestone of taking its last IBM UNIX host offline, and now runs an all-x86 virtualized cloud infrastructure that supports all of its service sites.
Virtualization is also one reason IBM Power Systems revenue keeps declining, which shows that IT infrastructure in every industry is moving to the cloud; only on such an environment can more new applications be supported and developed.
Telemedicine is riding the "cloud" wave
I recently attended a Software Association forum in Shanghai whose themes revolved around smart cities, healthcare, and the future development of the health industry, and around how the evolution of medical IT can enhance a hospital's competitiveness.
Forbes has likewise reported that the healthcare industry will embrace cloud computing, with 83% of medical institutions already using cloud-based apps.
IT support for a hospital runs from the "cloud" to the "client." The middle layer includes HIS, PACS, EMR/EHR, HR, and outpatient systems, along with the medical, nursing, case-management, pharmaceutical, and health cloud applications derived from them.
Underneath these applications, the cloud must provide support: the hospital's private cloud (its cloud data center) supplies virtualization technology for x86 servers, networking, and storage, so that the mobile medical applications in the upper layer can be used from all kinds of endpoints (hospital Windows workstations, iPad/Android tablets, personal and home notebooks, mobile nursing carts, smartphones, and so on), delivering various medical services through their apps.
VMware offers the world's most advanced technology for building a complete end-to-end cloud infrastructure. On the cloud side are VMware vSphere (server virtualization), VMware VSAN (storage virtualization), and VMware NSX (network virtualization); on the endpoint side are VMware Horizon (virtual desktops) and AirWatch (mobile device management).
Cloud virtualization promotes a smart, mobile healthcare environment
The current challenges of mobile healthcare are that medical information systems (including PACS/EMR/HIS) must support different devices, deliver mobile and fast service, ensure security and data protection for access, and stand ready at all times. The first step toward supporting mobility is therefore "desktop virtualization": the desktop environment is virtualized and centrally managed on the server side, and an efficient remote protocol lets users access it from any device.
Units in medical institutions that have already adopted virtual desktops include emergency and nursing-station operations, the human resources and information planning departments, dialysis rooms, clinics, medical affairs teams, and outreach clinics in remote areas.
Smart-healthcare scenarios for VMware Horizon, using tablets for mobile medical work, include: 1. physician ward rounds, 2. mobile nursing carts, 3. document sign-off, 4. radiologists reading and reporting studies, 5. physicians viewing patient records, and 6. on-call telemedicine, all without interrupting service.
In IT management, the application scenarios include:
1. security alerts for medical and care data,
2. streamlined software licensing fees,
3. easier use and maintenance of all types of medical USB devices, such as health-insurance card readers,
4. packaging old software for deployment on new operating systems,
5. upgrading XP PCs to Windows 7 and managing the physical machines,
6. easy desktop management with zero desktop interruption.
The virtual desktop infrastructure (VDI) provided by VMware supports phones, tablets, and laptops across a variety of operating systems and can be deployed to every mobile device. Healthcare workers can use an iPad or Android tablet to log in remotely to the familiar Windows environment to enter and query data, run analyses, and sign off documents, without changing existing habits and without being tied to a fixed workstation in the hospital; even on a business trip they can connect back to their hospital computer. On the security side, there is no need to worry that half-entered data will be lost to network instability and have to be re-entered, or that personal and other private information will leak.
VMware's virtual desktop and mobile security management technologies are far ahead of the industry, ranking first in the various comparisons. Today VMware is helping customers in the medical industry build low-cost, high-efficiency, green hospital private cloud architectures that let any endpoint device deliver a zero-interruption, familiar user experience for mobile healthcare without compromising data security.
Big Data and Hadoop MapReduce application development
Posted by Daniel J Su, [Dec-10th, 2016]
Even if Big Data is an abundant treasure trove, digging out those treasures takes more than the tools businesses have always used: relational databases, SQL syntax, and ETL (Extract, Transform, Load). Besides Hadoop, which has almost become synonymous with huge volumes of data, technologies within the same framework, such as MapReduce and HDFS, are things companies now must actively learn.
As Big Data captures overwhelming coverage in the professional IT media and at important forums, many businesses that previously knew little about it have had to abandon their wait-and-see attitude and spend time studying it, only to find that many well-known companies around the world are already embracing Big Data enthusiastically and reaping fruitful applications; so, without further ado, they set out to catch the wave.
But IT staff long versed in RDBMSs, SQL syntax, and schemas, once told by the boss to study massive data, find the material full of unfamiliar terms. Hadoop, always depicted with a yellow elephant, is the word that appears most often in anything related to Big Data; digging further, they discover it is hailed as the platform best suited to processing, storing, and querying huge volumes of data. With so much praise for Hadoop, even those who were not adept at it in the past and have no plans to get certified in it can at least no longer remain ignorant of it.
The main reason is that anyone who intends to open the door to the treasures of massive data almost certainly cannot do without the key called Hadoop.
Delving into the Hadoop software framework can make one's scalp tingle and tempt one to give up, but whether the topic is massive data or even cloud computing and other hot subjects, everything is related to this stuff, so no matter how painful it is, one can only bite the bullet and keep studying.
What, in the end, is bundled inside Hadoop? Its two most central projects are MapReduce, a programming model for distributed processing, and HDFS, a distributed file system; one handles computation and the other storage, and these two pillars firmly hold up the Hadoop architecture.
Turning first to MapReduce. Recall the familiar business intelligence (BI) mode of operation: the information an analytical model needs must be pooled from multiple systems, which inevitably requires the procedure many IT people dread, ETL (Extract, Transform, Load). In the Big Data world, MapReduce plays a role rather like ETL, responsible for processing the raw data.
MapReduce can be divided into two parts, "Map" and "Reduce." Map is a function that produces a list from a single input: as the name suggests, it takes each piece of data, pairs it with a key, and emits intermediate key-value data. Reduce is a function that produces a single value from a list: it takes the many intermediate values belonging to a key and aggregates them into one result.
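To make the Map and Reduce halves concrete, here is a minimal word-count sketch in Python, written in the style of scripts that could be plugged into Hadoop Streaming; the word-count task, file name, and command shown are illustrative assumptions rather than anything from the original article.

    #!/usr/bin/env python
    """A word-count sketch in the MapReduce style (usable with Hadoop Streaming).

    Run as:  wordcount.py map    (the Map phase)
             wordcount.py reduce (the Reduce phase)
    """
    import sys
    from collections import defaultdict

    def do_map():
        # Map: take each raw line, pair every word (the key) with the value 1.
        for line in sys.stdin:
            for word in line.strip().split():
                print(f"{word}\t1")          # intermediate "key <TAB> value" record

    def do_reduce():
        # Reduce: collapse the many intermediate values of each key into one total.
        counts = defaultdict(int)
        for line in sys.stdin:
            word, value = line.rsplit("\t", 1)
            counts[word] += int(value)
        for word, total in sorted(counts.items()):
            print(f"{word}\t{total}")

    if __name__ == "__main__":
        do_map() if sys.argv[1:] == ["map"] else do_reduce()

Run locally as "python wordcount.py map < input.txt | sort | python wordcount.py reduce"; the sort in the middle plays the role of the shuffle step that groups each key's intermediate values before Reduce runs.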
In the Hadoop architecture, many units of work called "Workers" accept dispatch from a "Master" (the JobTracker assigns tasks to TaskTrackers, which pass them on) and execute the Map and Reduce tasks separately; when the Workers finish their work, they report the results back to the TaskTracker.
Second, HDFS, the distributed file system, also cannot be separated from the Master node just mentioned. Besides the JobTracker and TaskTracker, which dispatch computing tasks, there are two other key players, the NameNode and the DataNode, which govern how data is distributed and stored.
The NameNode has similarities with a traditional file system: a file is split into many blocks. Traditional file systems, however, store those blocks on the same physical host, whereas the NameNode does not; it disperses the blocks across different DataNodes. Anyone who understands Linux will feel a sense of déjà vu, because the NameNode looks much like the inode in a Linux file system: if someone asks where all the blocks of a certain file are located, only a key player like the inode, or here the NameNode, can give the answer.
Thus, when a user needs to read a particular file, the NameNode can point the client at, say, five hosts that each hold one of its blocks; the client reads all five blocks at once and then combines them into the complete file. This model is very efficient, whereas reading blocks 1 through 5 sequentially from a single server would, among other things, repeatedly incur read-lock (read lock) overhead and be far slower.
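The parallel read described above can be pictured with the short sketch below; the helper functions, host names, and block layout are hypothetical stand-ins for what a real HDFS client does over RPC, not actual HDFS API calls.

    # Conceptual sketch of the parallel block read; everything here is invented.
    from concurrent.futures import ThreadPoolExecutor

    def ask_namenode_for_blocks(path):
        # Hypothetical: the NameNode answers "which DataNode holds which block?"
        return [("datanode-1", 0), ("datanode-2", 1), ("datanode-3", 2),
                ("datanode-4", 3), ("datanode-5", 4)]

    def read_block(location):
        host, block_id = location
        # Hypothetical: fetch one block's contents directly from that DataNode.
        return f"<bytes of block {block_id} from {host}>"

    def read_file(path):
        locations = ask_namenode_for_blocks(path)
        # Read all five blocks in parallel, then reassemble them in block order.
        with ThreadPoolExecutor(max_workers=len(locations)) as pool:
            blocks = list(pool.map(read_block, locations))
        return "".join(blocks)

    print(read_file("/user/demo/report.txt"))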
That said, for those without much experience, writing MapReduce programs is not so simple; sometimes a seemingly insignificant little mistake can leave a developer banging into walls for a long time before getting back on track. Fortunately, as the massive-data topic heats up, some vendors have begun to design quite approachable Hadoop packages. Microsoft, for example, has packaged technology accumulated from Bing search and SQL Server data mining into templates offered through the Azure Marketplace; with their help, application developers can avoid many costly mistakes, finish their MapReduce programs faster, and improve the correctness of the program logic.
As the related aids mature, IT staff old and young alike can get up to speed quickly; Hadoop and MapReduce are indeed not as terrifying as imagined.
Beyond the "twin arrows" of MapReduce and HDFS, the Hadoop software framework contains many other weapons worth exploiting by companies determined to invest in massive data. The difficulty of writing MapReduce programs mentioned above, for instance, can be greatly eased by the auxiliary tool "Mahout": Mahout is a library built on MapReduce that already offers a number of ready-made routines for developers to call, significantly reducing the burden of programming.
There is also something called "Pig" that can relieve the pressure of writing MapReduce programs. Pig provides a language called Pig Latin designed specifically for analyzing huge volumes of data; because commands such as GROUP, FILTER, and JOIN feel very familiar, it is easy to use, and the Pig scripts produced are automatically converted into MapReduce Java programs, which amounts to yet another shortcut.
Likewise, "HBase," a column-oriented database system built on Hadoop, and "Hive" with its SQL-like (but not SQL) language "HiveQL," serve the same purpose as the aforementioned Mahout and Pig Latin: they deliver a considerable degree of efficiency and make it much simpler for newcomers to get a first glimpse of the frontier of massive data.
Taking the search-engine route: an easy way into the Big Data world
For those who frequently attend massive-data seminars or occasionally read the relevant literature, Splunk must be a familiar name. The company, which positions itself as a "time-series search engine" and the "Google of IT," uses a very particular approach that lets businesses that profess to know nothing about Hadoop and have never deployed any BI tools also enter the Big Data world quickly; this, in fact, is where many of its selling points lie.
The reason Splunk can make such a unique claim lies mainly in the versatility of its single platform: data collection, computation, storage, query, indexing, analysis, monitoring, and display are all covered. Compared with, on one hand, the trouble of building an enterprise Hadoop environment, and on the other, having to write programs for each analysis topic and manually feed data into the Hadoop data warehouse, the all-embracing Splunk has no small appeal.
Splunk's strength, however, lies in analyzing "machine data." Although machine data is the fastest-growing part of the massive-data universe, it is still only part of the picture rather than the whole (Splunk cannot, for example, analyze image data), so rather than pretending otherwise, it is better to integrate with Hadoop to maximize the synergy.
Splunk has indeed done so, launching a package that integrates with Hadoop. Users can push Splunk data into Hadoop to facilitate further research, and can also pull Hadoop data into Splunk for visual analysis, report production, and other tasks. By joining forces, the two also help make up for Hadoop's Achilles heel of weak immediacy.
Big Data in Critical Business
Posted by Daniel Su, [Nov-18th, 2016]
If a company has a new data-analysis technology but no clear commercial application, Big Data has no value. Conversely, for the massive-data analysis problems that could not be solved in the past, once a commercial target is found, Big Data becomes the big hero.
Big Data is a topic with a heavy technical component. Breakthroughs in data processing, such as the arrival of distributed frameworks like MapReduce and Hadoop, simply give us more ways to meet future data-processing and analysis challenges: rapidly growing data volumes, faster flows of information, and unstructured data increasing geometrically.
However, these new data-analysis techniques alone will not make companies smarter or more profitable. Without clear commercial applications, Big Data brings no value. Yet knowing where the business applications of Big Data lie is a big problem for IT departments.
Big Data vendors say there is no shortage of information executives asking: what can Big Data actually do? Foreign companies face the same problem. At a recent Big Data seminar, many CIOs raised exactly this issue: it is hard for the IT department to find application opportunities for Big Data, whereas company management or business-unit heads are better placed to spot opportunities to exploit it. Those executives, however, often do not understand the technology and are unclear about how far Big Data analysis has advanced, so naturally they cannot think of which business problems Big Data could improve, let alone which new opportunities it could create.
Many CIOs also mentioned that their IT departments mainly deal with structured data; coming up with new uses or process improvements from relational-database data is not difficult for IT staff, but they are unfamiliar with unstructured data, and expecting new applications from it is much harder.
So if the IT department implements Big Data technology first and only then looks for ways to use it, the outcome may not be good. This differs from cloud computing: even if you start in the wrong direction there, you always begin by virtualizing the IT infrastructure, and even if you never reach the goal originally set, virtualization alone at least produces tangible results.
EMC's CTO recently pointed out in an interview that he has seen a change in how Big Data is developing: many companies in the U.S. have moved from exploring specific technologies to looking for Big Data application opportunities from a commercial point of view, which he believes is a good development for Big Data.
Big Data processing tools let you take on the analysis of huge volumes of data; applied to a clear purpose, they can bring great benefits to a company or to society, but a weapon without a clear target means Big Data ends up useless.
At a recent Big Data event, I was very impressed by one Japanese company's Big Data program: by collecting onboard-computer data from its cars and recording sudden braking, abrupt lane changes, and other unexpected situations, then analyzing the statistics, it could identify the locations where drivers typically brake hard, observe the actual traffic conditions at those sites, and adjust the traffic signals or road rules there accordingly. The result was a concrete reduction in traffic accidents, a great benefit to the public.
Such a plan must collect onboard-computer data from a large number of vehicles in order to identify all the locations where accidents are likely, so it inevitably faces the challenge of analyzing a great deal of data.
In fact, once you understand what Big Data technology can do, good application cases appear in every industry: banks use it to predict ever more fickle global financial markets and rapidly redeploy global investments; a food company analyzes abnormal weather to adjust crop-planting strategies at its farms around the world; even movie studios can use Big Data technology to store an actor's every line and quickly find the relevant takes in each clip, cutting together the most moving result.
Massive-data analysis problems that used to be unsolvable can now be helped technically by Big Data; all that remains is to find the commercial application to aim at.
Understanding the Five Key Concepts of NoSQL
Posted by Daniel J Su, [Oct-25th, 2016]
The concept of NoSQL databases was proposed as far back as 1998, but the technology did not become mainstream then. Only in recent years has the rise of websites with large amounts of user-contributed content created demand for distributed databases: these sites hold huge volumes of user-contributed data that keep growing. Traditional commercial relational databases can meet such expanding data growth by means of database clustering, but the hardware and software investment grows along with the expansion.
To solve the storage and scaling problems of massive data at the TB or even PB level, website operators began developing various low-cost, open-source distributed databases; Google's in-house BigTable is one of the best examples. Others such as Amazon and Yahoo have also developed this type of NoSQL database in recent years, and even Microsoft's Azure cloud platform uses NoSQL technology to access data.
Similarly, social sites such as Facebook, Twitter, and Zynga make extensive use of NoSQL database technology to handle their huge volumes of user-interaction data. To cope with rapidly growing user-contributed data, Facebook developed the NoSQL database Cassandra and runs it on a cluster of more than 600 compute cores, storing more than 120TB of inbox message data.
In 2009, the open-source community revived the term NoSQL as a collective name for distributed, non-relational databases.
In fact, NoSQL covers dozens of database systems and, unlike relational databases, has no single body of common theory behind it. There are, however, a few key points one must know; master them and you have a basic understanding of NoSQL databases.
(1) NoSQL is Not Only SQL
Because SQL is the standard query language of relational databases, the term NoSQL originally referred to database systems that do not provide the SQL query language. Most of these are open-source distributed database systems, but a few commercial NoSQL systems exist as well, such as the data-storage features on Microsoft's Azure platform.
More recently the open-source community has offered another definition, reading NoSQL as "Not Only SQL," meaning not just SQL: relational and NoSQL databases are mixed to get the best storage results. For example, a site may use NoSQL technology at the front end to store large volumes of user-state data, while keeping other information in a relational database to enjoy the benefits of SQL syntax.
(2) Adding machines automatically expands storage capacity
Another important feature of NoSQL databases is horizontal scalability: simply add a new server node and the capacity of the database system keeps growing. Scaling out can be done with low-cost commodity computers, unlike relational database clusters, which often require servers with greater performance and capacity. NoSQL databases can therefore be used to build TB- or PB-scale databases at lower cost.
Some NoSQL databases can even expand capacity online, without downtime and without affecting applications.
Cassandra, for example, can expand with new nodes dynamically: as soon as a new database node is started, the existing nodes automatically copy data to it and the access load is rebalanced. There is no need for the usual database-sharding routine of manually de-normalizing the database, splitting tables, copying data, and re-pointing application connections.
In simple terms, horizontal scaling means that as long as you can add new server hardware, database capacity grows automatically; from a management point of view, it also reduces the manpower needed for long-term database maintenance.
(3) Breaking the limits of the schema field structure
A relational database must establish the relationships between table fields through the database schema, which is usually designed in advance; changing a field once the system is live is very difficult, especially when the schema governs a huge amount of data. Twitter, for example, once ran a single ALTER TABLE command to change a table definition for a week just to adjust data fields.
NoSQL databases instead adopt the Key-Value data model to handle huge volumes of transaction data. The Key-Value model simplifies the data structure so that one Key corresponds to one Value; individual records are unrelated to each other, so the data can be partitioned or rearranged at will and distributed to different servers as replicas.
Some NoSQL databases add the concept of Columns, a refinement that lets several Keys map to one Value. Cassandra, for instance, provides four- or five-level Key-Value structures, so three Keys can address a single value: the three keys "user account," "personal profile," and "birthday" retrieve a particular user's date of birth. A Column-style NoSQL database is more flexible than one with only plain Key-Value structures, reducing the difficulty of writing data-access programs.
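A plain-Python sketch may help picture the multi-level key idea; the account, column-family, and column names below are invented for illustration, and a real column store such as Cassandra implements the same idea with its own storage structures.

    # Three keys ("user account" -> "personal profile" -> "birthday") address one value.
    store = {
        "alice": {                          # key 1: user account
            "profile": {                    # key 2: column family ("personal profile")
                "birthday": "1990-05-17",   # key 3: column name -> the value
                "city": "Taipei",
            }
        }
    }

    def get(account, family, column):
        """Look a value up by three keys, returning None if any level is missing."""
        return store.get(account, {}).get(family, {}).get(column)

    print(get("alice", "profile", "birthday"))   # -> 1990-05-17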
Because NoSQL databases have no schema, they cannot support standard SQL for querying data. They typically expose a simple API for adding, updating, or deleting records; some provide an SQL-like Select query mechanism, but complex Join operations are usually not possible. Google App Engine, for example, provides the GQL syntax so developers can query data stored in BigTable.
(4) Data becomes consistent, sooner or later
To ensure integrity, relational databases use transactions (Transaction) so that data access is not disturbed mid-operation. Database transactions have the ACID properties: during SQL execution the transaction is the smallest unit of operation (Atomicity), the whole transaction keeps the database consistent (Consistency), concurrent transactions are isolated so their data is not affected by one another (Isolation), and once committed a transaction's changes persist (Durability).
But the ACID model makes a database hard to scale out, so most NoSQL databases are not designed around transactions; they follow a different theory, CAP, instead.
The CAP theorem covers three properties: consistency (Consistent), availability (Availability), and partition tolerance (Partition Tolerance). In theory a system cannot provide all three at once, so NoSQL databases typically pick two of them, usually CP or AP.
Most NoSQL databases choose a CP design, but the meaning of data consistency in a NoSQL database differs from that in a relational database. NoSQL databases adopt eventual consistency ("the data will agree sooner or later"): because the design distributes replicas of the data across different nodes, each node can process transactions on its own copy and then synchronize with the others. There is a time gap during synchronization, so reading the same data from different nodes at the same moment can return inconsistent results.
To preserve a scalable, decentralized architecture, NoSQL databases allow this situation and only guarantee that the data will eventually agree. Conflicts or missing data arising within that short window must be resolved by the developers themselves, or the NoSQL database should be reserved for data with lower accuracy requirements: Facebook's Like counter, for example, being off by a few is hardly a problem and is barely noticeable to users, so it is well suited to NoSQL storage. When adopting a NoSQL database, developers must first assess the nature of the data and whether the risk of data loss is acceptable.
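The following toy simulation, built entirely on invented assumptions (two in-process replicas and a last-writer-wins merge), only illustrates the time gap described above; real NoSQL systems use far more sophisticated replication and conflict-resolution machinery.

    import time

    class Replica:
        def __init__(self, name):
            self.name, self.data = name, {}

        def write(self, key, value):
            self.data[key] = (value, time.time())   # keep a timestamp per write

        def read(self, key):
            entry = self.data.get(key)
            return entry[0] if entry else None

    def sync(a, b):
        """Last-writer-wins merge of the two replicas."""
        for key in set(a.data) | set(b.data):
            newest = max((r.data[key] for r in (a, b) if key in r.data),
                         key=lambda pair: pair[1])
            a.data[key] = b.data[key] = newest

    node_a, node_b = Replica("A"), Replica("B")
    node_a.write("likes:post42", 101)        # the write lands on node A first
    print(node_b.read("likes:post42"))       # None -> node B is still stale
    sync(node_a, node_b)                     # replication catches up
    print(node_b.read("likes:post42"))       # 101 -> eventually consistent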
(5) Immature products and risky version upgrades
NoSQL databases emerged to solve the problem of soaring user-contributed data as Web 2.0 and social-networking sites became prevalent in recent years. Many NoSQL databases appeared only two or three years ago, so their features are incomplete, mature and stable releases are scarce, and upgrades easily introduce incompatibilities.
On the other hand, these databases are mostly accessed through APIs; if a new version adds features, the parameters or the way the API is called may change, and developers have to modify their applications to keep reading the database correctly. Even the database's own file format can change, and after upgrading, the new version may not read the old files until a format-conversion job is run.
When choosing a NoSQL database, one approach is to pick the databases used by well-known sites, because those sites are usually also the main contributors to the databases they adopted to solve their own problems, and therefore improve them more actively.
Also weigh your own team's technical capacity and its ability to keep learning as the technology develops abroad. NoSQL databases offer another low-cost road to distributed databases, and features such as automatic node expansion save database-maintenance manpower; in return, however, you bear the risks of a technology that is not yet mature.
A quick guide to the four mainstream categories of NoSQL databases
Long before the term NoSQL became popular, a variety of non-relational databases had already appeared. They have different characteristics, and it is hard to understand them all through a single set of ideas the way one can with relational databases; each NoSQL database's features and applications must be understood individually.
Four kinds draw the most attention: Key-Value databases, in-memory databases (In-memory Database), graph databases (Graph Database), and document databases (Document Database).
Type 1: Key-Value databases
Key-Value databases are the largest category of NoSQL databases. Their defining feature is the Key-Value data structure, which drops the field schema (Schema) that relational databases rely on; each record is independent of the others, which is what makes distribution and high scalability possible.
Google's BigTable, Hadoop's HBase, Amazon's Dynamo, Cassandra, and Hypertable are all Key-Value databases.
Google built its own BigTable on the Google File System (GFS), specifically for Google's own applications: data for Gmail, Google Reader, Google Maps, YouTube, and other services is stored in BigTable. Google now also opens BigTable to outside users through the Google App Engine service.
BigTable behaves like one giant table that integrates the tables of many machines; a single data table can store content at the PB scale. Google App Engine provides the GQL query language, letting developers use a Select-like syntax to query data in BigTable, but unlike SQL, GQL cannot perform cross-table Join queries.
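As a rough sketch of what a GQL query looks like from the legacy Python App Engine SDK, the example below defines an invented Greeting model and queries it; the model, its properties, and the data are assumptions for illustration, not anything from the article.

    # Runs inside a legacy (Python 2) Google App Engine application; the Greeting
    # model and its properties are invented for this example.
    from google.appengine.ext import db

    class Greeting(db.Model):
        author = db.StringProperty()
        content = db.TextProperty()
        date = db.DateTimeProperty(auto_now_add=True)

    # GQL looks like SQL's SELECT, but only queries one "kind" -- no Join allowed.
    recent = db.GqlQuery(
        "SELECT * FROM Greeting WHERE author = :1 ORDER BY date DESC LIMIT 10",
        "alice")

    for greeting in recent:
        print(greeting.content)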
Because Google did not release BigTable or its related cloud platform, the Hadoop platform later emerged as another implementation of Google's cloud-computing reference architecture and developed the HBase distributed database. HBase is used to store the data processed by Hadoop MapReduce parallel computations. Like Google's BigTable, it stores a large number of rows in a table; each row has a primary Key value and any number of column fields.
Amazon's Dynamo is a distributed database used in Amazon's web services, such as the S3 storage service; it also takes the Key-Value storage approach to build a distributed, highly available environment. Amazon's shopping cart uses Dynamo. Dynamo copies data into replicas on many servers that periodically synchronize with one another; because it cannot guarantee that every replica is synchronized instantly, Amazon developed additional conflict-resolution techniques to handle conflicting or lost updates and keep the data consistent.
Within the Key-Value category there is also a recently very popular NoSQL database: Cassandra. Released by Facebook in 2008 as a distributed database supporting the Java platform, Cassandra stores as much as 120TB of Facebook's on-site inbox data. Maintenance was handed to the Apache Foundation in March 2009, and it is now one of the Apache top-level projects under active development.
Unlike HBase's master-slave distributed architecture, every Cassandra node in a cluster is equal; there is no master-slave relationship. To build a distributed Cassandra database you need as few as two server nodes, whose roles are almost identical; you only need to specify in the configuration file the IP addresses through which they communicate. Once the databases are started, the two nodes replicate data to each other, store it in a distributed fashion, and balance the access load.
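A minimal sketch of talking to such a two-node cluster from Python, assuming the DataStax cassandra-driver package and two invented node addresses; the keyspace, table, and sample row are made up, and the CQL interface shown postdates the Thrift-era Cassandra the article describes.

    from cassandra.cluster import Cluster

    # Contact points: the two (hypothetical) peer nodes of the small cluster.
    cluster = Cluster(["10.0.0.1", "10.0.0.2"])
    session = cluster.connect()

    session.execute(
        "CREATE KEYSPACE IF NOT EXISTS demo "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}")
    session.set_keyspace("demo")
    session.execute(
        "CREATE TABLE IF NOT EXISTS users (account text PRIMARY KEY, birthday text)")

    # Writes and reads can go to either node; the cluster replicates behind the scenes.
    session.execute("INSERT INTO users (account, birthday) VALUES (%s, %s)",
                    ("alice", "1990-05-17"))
    row = session.execute("SELECT birthday FROM users WHERE account = %s",
                          ("alice",)).one()
    print(row.birthday)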
Type 2: In-memory databases, the caching tool of choice for well-known websites
In-memory databases (In-memory Database) are NoSQL databases that keep data in memory; they include Memcached, Redis, Velocity, tuple spaces, and so on. Memcached and Redis are in fact Key-Value style databases as well, but by keeping data in memory they improve read efficiency. They are most commonly used to cache web pages, speeding up page delivery and reducing disk reads, although the data does not survive a system shutdown.
Memcached appeared in 2003 and has become an important tool for many well-known websites to improve browsing performance; YouTube, Facebook, Zynga, Twitter, and others all use it, and Google App Engine's application-hosting service also offers a Memcached service.
FarmVille, the most popular game on Facebook, uses Memcached to keep gameplay smooth. With up to a million users logging in every day, FarmVille adopts a two-layer architecture so players never wait on reads and writes during play: Memcached holds the users' in-game state, which is later written in batches to a back-end MySQL database and stored on disk. The risk of this architecture is that if the system crashes, an entire batch of data held only in memory is lost.
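Below is a cache-aside sketch of the general pattern described above, assuming the pymemcache client library, a local memcached on port 11211, and a placeholder function standing in for the slower MySQL read; none of these names come from the article.

    import json
    from pymemcache.client.base import Client

    cache = Client(("localhost", 11211))

    def load_profile_from_mysql(user_id):
        # Placeholder for the slow, durable back-end read (MySQL in the article).
        return {"user_id": user_id, "coins": 1200, "level": 17}

    def get_profile(user_id):
        key = f"profile:{user_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)                    # fast path: served from memory
        profile = load_profile_from_mysql(user_id)
        cache.set(key, json.dumps(profile), expire=300)  # cache for 5 minutes
        return profile

    print(get_profile(42))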
Besides the veteran Memcached, a new open-source in-memory database, Redis, appeared in 2009. Beyond serving as a distributed cache, the biggest difference between Redis and Memcached is that Redis provides data structures that keep stored data automatically sorted, letting developers retrieve data already in order.
Redis gained VMware's sponsorship in March of this year. Version 2.0, released in September, adds designs such as virtual memory so that developers can keep more data than physical memory alone would allow. The U.S. classified-ads site Craigslist and the code-hosting site GitHub both use Redis to speed up access.
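The "automatically sorted" structures mentioned above correspond to Redis sorted sets; the sketch below uses the redis-py client with an invented leaderboard key and scores, and assumes a Redis server running on localhost.

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Each member is stored with a score; Redis keeps the set ordered by score.
    r.zadd("game:leaderboard", {"alice": 3200, "bob": 2750, "carol": 4100})

    # Read back the top three players, highest score first, already sorted.
    top3 = r.zrevrange("game:leaderboard", 0, 2, withscores=True)
    for rank, (player, score) in enumerate(top3, start=1):
        print(rank, player.decode(), int(score))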
Type 3: Document databases for storing unstructured data
Document databases are used mainly to store unstructured documents, and the most common unstructured data is the HTML page. An HTML page does not have the fixed fields of an ordinary form, where each field has a specific data type and size: a page has a Head and a Body, the Body may contain ten paragraphs, and the paragraphs contain text, links, images, and so on. The data structures in document databases are therefore often loose, tree-like structures.
Many document databases are commercial systems; the concept traces back to the way IBM's Lotus Notes stores documents, and XML databases are also document databases. Common open-source document databases include CouchDB, MongoDB, and Riak.
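To illustrate the loose, tree-like documents described above, here is a small sketch using the pymongo client against a local MongoDB server; the collection name and document fields are invented.

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    pages = client.demo.pages          # database "demo", collection "pages"

    # Documents in the same collection need not share a schema.
    pages.insert_one({
        "url": "https://example.com/a",
        "head": {"title": "Example A"},
        "body": [
            {"type": "text", "content": "First paragraph"},
            {"type": "image", "src": "/img/a.png"},
        ],
    })

    doc = pages.find_one({"url": "https://example.com/a"})
    print(doc["head"]["title"])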
Type 4: Graph databases for recording social relationships
The last category is the graph database, which is not designed to handle images; it refers to databases that store data as a graph of relationships between nodes, for example a tree structure for organizational hierarchies or a mesh structure for friend relationships. Geographic map systems typically use a graph database to store the relationships between each point on the map and its neighbors, or to compute the shortest distance between two points; the same idea can be used to compute the shortest "distance" between two people in a social network. The great strength of graph databases is their capacity for complexity: the more complex the relationships in the data, the more suitable a graph database becomes.
There is no standard way to structure such data, but basic graph data consists of three elements: nodes (Node), relationships (Relation), and properties (Property). A Facebook account, for example, is a node; a friendship is recorded as a relationship; and properties describe the account's personal details. The result can be drawn as a network map of the friendships between users, as the sketch below illustrates. Common graph databases include Neo4j, InfoGrid, and AllegroGraph.
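The toy example below illustrates nodes, relationships, properties, and the "shortest distance between people" idea, using the networkx library as a stand-in for a real graph database; the accounts and friendships are invented.

    import networkx as nx

    g = nx.Graph()

    # Nodes with properties, and relationships ("FRIEND") between them.
    g.add_node("alice", city="Taipei")
    g.add_node("bob", city="Tokyo")
    g.add_node("carol", city="Seoul")
    g.add_node("dave", city="Osaka")
    g.add_edge("alice", "bob", relation="FRIEND")
    g.add_edge("bob", "carol", relation="FRIEND")
    g.add_edge("carol", "dave", relation="FRIEND")

    # Shortest chain of friendships between two people.
    print(nx.shortest_path(g, "alice", "dave"))   # ['alice', 'bob', 'carol', 'dave']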
These four categories are a fairly streamlined way to distinguish NoSQL databases and quickly grasp their characteristics and differences. Wikipedia, by contrast, divides NoSQL into roughly ten categories from an application point of view, in particular separating Key-Value stores into further sub-types; that classification helps in understanding even more NoSQL database features.
Is Big Data just hype?
Posted by Daniel J Su, [Oct-13th, 2016]
Looking at the trends, cloud computing can be called a major transformation of computing architecture, while big data is a great leap forward for data technology. Computing and data are precisely the two dimensions of information architecture.
About five or six years ago, cloud computing was said to be mostly hype; every vendor talked about it enthusiastically, and it seemed no IT vendor could do without it.
A provider of web mail claimed it had long been practicing the cloud model; an e-commerce company was touted as a cloud stock; even convenience-store pickup services were simply called the cloud convenience store. It seemed that without adding the word "cloud," one would become obsolete in this era.
Today, five years later, Big Data has taken over the stage. Anything related to data, including databases, data warehousing, storage systems, or even file transfer, now claims to be a big data offering.
The situation is identical to cloud computing five years ago. Is Big Data, too, mostly hype?
The Economist Intelligence Unit, the research arm of "The Economist," recently published a survey of executives on big data, pointedly titled "hype and hope." The results show that most executives agree big data helps their companies and can even improve revenue, but actual enterprise investment in big data falls far short of those expectations.
According to the survey, more than 90 percent of executives agree that big data helps them understand customers and can therefore further increase revenue. Nearly half of them (45 percent) have even higher expectations, estimating the revenue boost at more than 25 percent.
In addition, over 70 percent of executives agree that big data can improve their companies' productivity, profitability, and capacity for innovation. However, enterprises are adopting big data slowly, which is at odds with these high expectations: nearly 58 percent of large enterprises have not yet made concrete progress in this area.
Given how enthusiastically vendors talk about big data today, compared with the slow pace of enterprise adoption, part of the current buzz surely is hype. Yet the strong endorsement of big data by business executives cannot be ignored either.
Most corporate executives report that many obstacles to big data exist, but most are long-standing internal problems, such as a lack of communication between departments and departmental self-interest. Clearly, it will still take considerable time before big data truly lands in most enterprises.
However, technological development is like a rocket taking off: it only flies higher. Cloud computing, once thought to be deliberate hype, has proven its value in the course of technological development. Those who dismissed it as hype have been left aside, while those who worked on it in earnest are gradually showing results people can see.
Perhaps there is excessive hype around big data today, but technological development waits for no one; big data is bound to follow the same trend as cloud computing and prove its worth in the next stage of technological development.
In fact, some advanced applications already prove the value of big data. The U-Air air-pollution forecasting system developed by Microsoft Research Asia is a good example: it can predict, in near real time, the air quality at any corner of a city, with an accuracy rate of more than 80 percent.
Conventional air-quality forecasting analyzes the historical data of monitoring stations, but that only reflects the air quality in the areas around the stations, while a city's traffic, construction, and crowds greatly affect air quality; even places near a monitoring station can have very different air quality because of traffic flow. Forecasts based only on historical station data therefore always diverge greatly from the actual situation, with an accuracy rate below 60 percent.
The problem is not that meteorologists fail to consider the other factors affecting air quality, but that data-analysis techniques could not readily handle large amounts of heterogeneous data. Microsoft Research Asia's breakthrough was to use big data and machine-learning technology to analyze the relationships among heterogeneous data such as historical weather records, traffic, crowd movement, urban locations (railway stations, buildings, hotels, parking lots, parks, and so on), and road construction, in order to find accurate predictive models.
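As a heavily simplified sketch of the general idea (not the actual U-Air method), the example below trains a regression model on a few invented, heterogeneous city features; the feature names, synthetic data, and choice of a random forest are assumptions for illustration only.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n = 500

    # Invented features: traffic volume, nearby points of interest, humidity,
    # wind speed, and ongoing road-construction sites near a grid cell.
    X = np.column_stack([
        rng.uniform(0, 1000, n),   # traffic volume
        rng.integers(0, 50, n),    # points of interest nearby
        rng.uniform(20, 95, n),    # humidity (%)
        rng.uniform(0, 12, n),     # wind speed (m/s)
        rng.integers(0, 5, n),     # construction sites
    ])
    # Synthetic "observed" pollution target, made up for the sake of the example.
    y = 0.05 * X[:, 0] + 1.5 * X[:, 4] - 2.0 * X[:, 3] + rng.normal(0, 5, n)

    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    print(model.predict([[600, 30, 70, 2.0, 3]]))   # estimate for one grid cell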
On the other hand, thanks to big-data processing methods, the U-Air system can analyze the large volume of data for an entire city within five minutes, so it can provide real-time air-quality forecasts, making the technique genuinely useful in practical applications.
Without big data technology, the current bottleneck in air-pollution monitoring might never have been broken. Similar examples of big data surpassing past limitations are appearing all over the world.
Looking at the trends, cloud computing is a major transformation of computing architecture, and big data is a great leap forward for data technology. Computing and data are precisely the two dimensions of information architecture; when cloud and big data are both in place, do they not herald a new era of IT?
Hitachi Data Systems: CEO leadership is the key to successful Big Data implementation
Posted by Daniel J Su, [Sep-22nd, 2016]
Because Big Data touches on data processing and data analysis, it is easily treated as a purely technical issue. However, Neville Vincent of Hitachi Data Systems (HDS) reminds CEOs not to lump big data together with other information technology projects; doing so dooms it to failure.
Neville Vincent argues that today's CEOs should definitely embrace big data, and he offers three reasons. The first is the company's information assets: CEOs know that employees are their most important asset, but may not realize that the data found everywhere in the company is also a vital asset, second in importance only to the employees themselves.
Second, he believes that putting this data asset to use will help improve revenue, profitability, and productivity. Finally, when companies treat data as an asset, data that was originally scattered throughout the organization is gradually brought under centralized management, which also resolves the problem of information silos: with a single pooled architecture, inter-departmental information flow problems can be overcome, improving collaboration and communication efficiency across departments.
Although HDS, the company Neville Vincent works for, sells big-data-related technologies and naturally wants to draw chief executives' attention to big data, his observation is that big data is not a fad: some companies have realized that if they do not follow up on big data immediately, their competitors may overtake them within five years.
In Australian retail, the top two supermarket chains, Woolworths and Coles, both made big-data moves this year. Woolworths, the largest chain, not only expanded its customer analytics but also, unexpectedly, invested 20 million Australian dollars to acquire a half stake in Quantium, Australia's largest data-analytics firm. On one hand the deal secures key technology; on the other, Woolworths also plans in the future to sell de-identified analysis of its data to other companies.
CEOs today face pressure from many directions, including market competition and shareholders' demands for better returns, and therefore pressure to increase profits. Neville Vincent said that when he asks chief executives how they obtain the information they need to cope with this pressure, the universal answer is reading newspapers, websites, and the like. In other words, the CEO is personally analyzing data, but the human brain, powerful as it is, lacks the scalability of a computer. Once CEOs learn that, for example, a rival's social network can be analyzed to find the key insights for improving productivity and profit, they will quickly adopt big-data applications.
However, even executives who understand the importance of data should not simply hand big data over to the information department to implement. Neville Vincent pointed out that people often ask: should big data start with marketing, sales, or research and development? Should the CIO or the CMO lead it? His answer always points back to the chief executive.
Neville Vincent said that because it is the chief executive who sets the company's goals, big data can only succeed if the objectives to be achieved are determined first; the relevant technologies and professionals, such as data scientists, then assist in the implementation so that the goals the chief executive has set can be reached.
If the CEO does not lead, what happens is that marketing, sales, and other departments each adopt big-data technology to solve their own problems; the company ends up with no shortage of small big-data projects but never realizes the full value. Neville Vincent believes the CEO should prevent these big-data projects from becoming isolated camps, focus resources on the business goals, and use a shared information platform so that every department can make flexible use of the data and contribute more value.
Of course, the technical problems of big data should still be handled by a professional IT department. Neville Vincent pointed out, however, that the chief executive should involve the information department as early as possible, while the business plan is still taking shape, rather than waiting until the decisions are made and only then handing the big-data project to IT to build.
Big data must be approached from a business point of view, and the information department must become part of the business process, for big data to succeed.
Demystifying Amazon Web Services
Posted by Daniel J Su,